Help with grammar

Hi!

Can someone explain to me why my grammar isn't working? Unfortunately, I
can't figure it out :-(

Full script attached (42 lines). The newlines in the script are
always just "\n".

The output:

TOP
|  request-line
|  |  method
|  |  * MATCH "CONNECT"
|  |  request-uri
|  |  * MATCH "ssl.gstatic.com:443"
|  |  http-version
|  |  * MATCH "HTTP/1.1"
|  |  crlf
|  |  * MATCH "\n"
|  * MATCH "CONNECT ssl.gstatic.com:443 HTTP/1.1\n"
|  headers
|  |  header
|  |  * MATCH "Proxy-Connection"
|  |  header-value
|  |  * MATCH "keep-alive\n"
|  |  crlf
|  |  * FAIL
|  * FAIL
* FAIL
Nil


It matches the newline after the request line, but not the one after the header.


Best regards,
David Santiago

Attachment: test.raku

    use Grammar::Tracer;

    grammar http_request {
        rule TOP {
                <request-line><headers>
        }
        rule request-line {
            <method> <request-uri> <http-version><.crlf>
        }
        regex method {
            :i 'options'|'get'|'head'|'post'|'put'|'delete'|'trace'|'connect'
        }
        token request-uri {
            <graph>+
        }
        token http-version {
            :i  'http/' \d '.' \d
        }
        rule headers {
            <header>':' <header-value><.crlf>
        }
        regex header{
            <:alpha+ [\-] >+
        }
        rule header-value {
            <graph>+
        }
        token crlf {
            \x[0a]
        }
    }

    my Str $request=q:to/END/;
    CONNECT ssl.gstatic.com:443 HTTP/1.1
    Proxy-Connection: keep-alive
    User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36

    END

    say http_request.parse($request);
demanuel
5/21/2020 8:40:08 PM
perl.perl6.users

On 2020-05-21 David Santiago <demanuel@gmail.com> wrote:
> Can someone explain to me why my grammar isn't working? Unfortunately, I
> can't figure it out :-(

Mixing ``rule``, ``token``, and ``regex`` apparently at random doesn't
make for a good grammar…

The text at
https://docs.raku.org/language/grammar_tutorial#The_technical_overview
is a bit confusing.

This https://docs.raku.org/language/regexes#Sigspace is more precise:
a ``rule`` inserts a ``<.ws>`` wherever there's whitespace in the
source code, so your::

   rule header-value { <graph>+ }

is equivalent to::

  token header-value { <graph>+ <.ws> }

which, as you saw in the trace, eats up the newline.
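A minimal sketch that isolates the effect. Two made-up grammars, ``WithRule`` and ``WithToken``, parse only ``"keep-alive\n"`` and differ solely in how ``header-value`` is declared. The sketch assumes the default ``<ws>`` inherited from ``Grammar``:

```raku
# Hypothetical grammars, for illustration only: they differ solely in
# whether header-value is declared with `rule` or `token`.
grammar WithRule {
    token TOP          { <header-value> <.crlf> }
    rule  header-value { <graph>+ }     # behaves like: token { <graph>+ <.ws> }
    token crlf         { \x[0a] }
}

grammar WithToken {
    token TOP          { <header-value> <.crlf> }
    token header-value { <graph>+ }     # no implicit <.ws>, newline left alone
    token crlf         { \x[0a] }
}

my $line = "keep-alive\n";

# The rule's trailing <.ws> swallows the newline ...
say ~WithRule.subparse($line, :rule<header-value>);   # keep-alive, plus the newline
# ... so <.crlf> has nothing left to match and the whole parse fails.
say WithRule.parse($line);                            # Nil
# The token stops before the newline, so <.crlf> can match it.
say ~WithToken.parse($line)<header-value>;            # keep-alive
```

Rules and tokens both ratchet, so the failing parse cannot backtrack into ``header-value`` to get the newline back.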

Short version: the only ``rule``s should be ``TOP``, ``request-line``,
and ``headers``; all the others should be ``token``s.

Extending the grammar to recognise more than one header is left as an
exercise.
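Here is one possible sketch of that exercise. It rests on a few assumptions: every line ends in a bare "\n" (as David says his script does), the request ends with an empty line, and each header is ``Name: value`` with optional spaces after the colon. ``header-line`` is a name introduced here, and ``header-value`` uses ``\N+`` instead of ``<graph>+`` so that values containing spaces still match. Everything is declared as a ``token`` and the single spaces are written out explicitly, so no implicit ``<.ws>`` can swallow a line ending:

```raku
# header-line is a name introduced for this sketch; the other names
# come from the original script.
grammar http_request {
    # Request line, any number of header lines, then the empty line.
    token TOP          { <request-line> <header-line>* <.crlf> }
    # Spaces are spelled out: a token never inserts <.ws> on its own.
    token request-line { <method> ' ' <request-uri> ' ' <http-version> <.crlf> }
    token method       { :i 'options'|'get'|'head'|'post'|'put'|'delete'|'trace'|'connect' }
    token request-uri  { <graph>+ }
    token http-version { :i 'http/' \d '.' \d }
    # One "Name: value" line; the value runs to the end of the line.
    token header-line  { <header> ':' ' '* <header-value> <.crlf> }
    token header       { <:alpha+ [\-] >+ }
    token header-value { \N+ }
    token crlf         { \x[0a] }
}

my $request = "CONNECT ssl.gstatic.com:443 HTTP/1.1\n"
            ~ "Proxy-Connection: keep-alive\n"
            ~ "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
            ~ "(KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36\n"
            ~ "\n";

my $m = http_request.parse($request) or die "request did not parse";
for $m<header-line>.list -> $h {
    say "$h<header> => $h<header-value>";
}
```

If it parses, it should print one ``name => value`` line for each of the two headers. ``header-line*`` stops at the empty line because ``header`` needs at least one letter or hyphen, and the final ``<.crlf>`` then consumes that empty line.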

--
	Dakkar - <Mobilis in mobile>
	GPG public key fingerprint = A071 E618 DD2C 5901 9574
	                             6FE2 40EA 9883 7519 3F88
	                    key id = 0x75193F88
dakkar
5/21/2020 9:01:00 PM
On Thu, May 21, 2020 at 08:40:08PM +0000, David Santiago wrote:
> Can someone explain to me why my grammar isn't working? Unfortunately, I
> can't figure it out :-(
> 
> |  headers
> |  |  header
> |  |  * MATCH "Proxy-Connection"
> |  |  header-value
> |  |  * MATCH "keep-alive\n"
> |  |  crlf
> |  |  * FAIL
> |  * FAIL
> * FAIL
> Nil

Notice how <header-value> is capturing the newline in "keep-alive\n"?  That means there's not a newline for the <.crlf> subrule that follows, and thus the match fails.

Try changing "rule header-value" to be a "token" instead.  That will prevent it from consuming any whitespace immediately following the <graph>+ sequence.  When I tried your script with header-value defined as a token, it got a lot farther into the match:

  $ rakudo test.raku
  TOP
  |  request-line
  |  |  method
  |  |  * MATCH "CONNECT"
  |  |  request-uri
  |  |  * MATCH "ssl.gstatic.com:443"
  |  |  http-version
  |  |  * MATCH "HTTP/1.1"
  |  |  crlf
  |  |  * MATCH "\n"
  |  * MATCH "CONNECT ssl.gstatic.com:443 HTTP/1.1\n"
  |  headers
  |  |  header
  |  |  * MATCH "Proxy-Connection"
  |  |  header-value
  |  |  * MATCH "keep-alive"
  |  |  crlf
  |  |  * MATCH "\n"
  |  * MATCH "Proxy-Connection: keep-alive\n"
  * MATCH "CONNECT ssl.gstatic.com:443 HTTP/1.1\nProxy-Connection: keep-"
  Nil


Personally, I would likely define <header-value> to be something more like

    token header-value { \N+ }

which gets any sequence of non-newline characters, since some of the headers coming afterwards contain spaces and characters which aren't part of <graph>.
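Here is a quick way to see the difference, using part of the User-Agent value from the request above. This is a sketch; the "Next-Header: x" line in the second test string is made up:

```raku
# Header value taken from the request in this thread.
my $value = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36';

# <graph> excludes spaces, so <graph>+ stops at the first one.
say ~($value ~~ / <graph>+ /);          # Mozilla/5.0

# \N is any character except a newline, so \N+ runs to the end of the line
# and leaves the newline itself for <.crlf>.
say ~("$value\nNext-Header: x" ~~ / \N+ /);
```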

Pm
pmichaud
5/21/2020 9:05:53 PM
Thank you all for your replies.

I was able to fix it, and I understand grammars better now :-)

Regards,
David Santiago

demanuel
5/23/2020 8:58:45 AM