RFC: White space within braced regex constructs

The question is: should we allow space within the braces of things like
\b{}, {m,n} quantifiers, etc., regardless of the /x modifier setting?

Space is already allowed within Unicode property definitions:

	 \p{ foo = bar }

is perfectly legal without /x.  This was because the Unicode standard 
required it.  The space is only valid adjacent to the braces and the 
equals sign.
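
A minimal sketch of that behaviour, assuming a perl with the usual Unicode property support. The Script=Greek property and the sample character are chosen only for illustration; they do not come from the message above.

```perl
use strict;
use warnings;

my $alpha = "\x{3B1}";   # GREEK SMALL LETTER ALPHA, an arbitrary sample

# The same property, written compactly and with blanks next to the
# braces and the equals sign. Neither pattern uses /x.
my $compact = $alpha =~ /\p{Script=Greek}/     ? 1 : 0;
my $spaced  = $alpha =~ /\p{ Script = Greek }/ ? 1 : 0;

print "compact: $compact, spaced: $spaced\n";
```

If both forms report a match, the blanks were ignored even though /x is off.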

I believe this is the only case where it is legal, however.  You can't 
say \x{ df } or \b{ wb }, for example, even under /x.

It has long been planned to bring Perl to parity with other languages by
allowing the lower bound in a curly quantifier to be omitted: a{,3} would
have a lower bound of 0.  We are now in a position to do that.  We could
choose to allow white space within this construct  1) never; 2) always;
3) with /x.

I don't really know what is the right decision.
public
10/16/2020 3:20:38 AM
perl.perl5.porters

Karl Williamson writes:

> The question is should we allow space within the braces of things like
> \b{} {m,n} quantifiers, etc, regardless of the /x modifier setting?
>
> ... omit the lower bound in a curly quantifier, a{,3} would have a
> lower bound of 0.  We are now in a position to do that. We could
> choose to allow white space within this construct  1) never; 2)
> always; 3) with /x.
>
> I don't really know what is the right decision.

What's the disadvantage to always allowing whitespace?

Triggering on /x seems pointless. Outside of braces, whitespace
characters are normally literal, so /x changes their interpretation from
one valid meaning to a different one.

But spaces inside {m,n} are currently an error (“Unescaped left brace in
regex is illegal here in regex”). Nobody is currently using them. If
we're going to start skipping spaces in there, it seems unnecessarily
petty to throw an error unless the user enables /x. If somebody writes
{2, 3}, we unambiguously know what they mean.

Is there a significant efficiency or complexity of implementation
disadvantage to allowing whitespace in there?

The main disadvantage I can think of is backwards compatibility:
somebody adding spaces inside braces will find their code doesn't run on
older versions of perl. Or somebody uses spaces in an example, which
another user can't get to work. But that's also true for any
improvements to Perl; for years, many people avoided C<say>, to remain
compatible with pre-v5.10 perls.
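
One way to see what a particular perl does with such patterns is a small probe along these lines. It is only a sketch: the classify helper and the sample patterns are invented for illustration, and the results differ between perl versions, so no output is shown.

```perl
use strict;
use warnings;

# Compile the pattern at run time and report how this perl treats it:
#   'error'      - the pattern does not compile
#   'quantifier' - the braces act as a quantifier (matches 'aaa')
#   'literal'    - the braces are taken literally (matches its own text)
sub classify {
    my ($pattern) = @_;
    no warnings;    # silence brace-related deprecation warnings
    my $re = eval { qr/^(?:$pattern)$/ };
    return 'error'      unless defined $re;
    return 'quantifier' if 'aaa' =~ $re;
    return 'literal'    if $pattern =~ $re;
    return 'other';
}

for my $pattern ('a{2,3}', 'a{2, 3}', 'a{ 2,3 }', 'a{,3}') {
    printf "%-10s %s\n", $pattern, classify($pattern);
}
```

Running the same script under an older and a newer perl shows directly whether an example posted with spaces inside the braces would behave the same for both users.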

Smylers
Smylers
10/16/2020 7:47:20 AM
Karl Williamson <public@khwilliamson.com> wrote:
:The question is should we allow space within the braces of things like 
:\b{} {m,n} quantifiers, etc, regardless of the /x modifier setting?
:
:Space is already allowed within Unicode property definitions
:
:	 \p{ foo = bar }
:
:is perfectly legal without /x.  This was because the Unicode standard 
:required it.  The space is only valid adjacent to the braces and the 
:equals sign.
:
:I believe this is the only case where it is legal, however.  You can't 
:say \x{ df } or \b{ wb }, for example, even under /x.
:
:It has long been planned to bring Perl to parity with other languages so 
:as to be able to omit the lower bound in a curly quantifier, a{,3} would 
:have a lower bound of 0.  We are now in a position to do that.  We could 
:choose to allow white space within this construct  1) never; 2) always; 
:3) with /x.
:
:I don't really know what is the right decision.

I feel we should absolutely allow whitespace next to the punctuation in
\x{df} and {1,10} under /x. I think the value of it absent /x is a lot
lower, enough so that if there are any backcompat concerns we probably
shouldn't change it. So I'd go for (3).

I don't think we should allow whitespace within the numbers in either case
(\x{d f}, {1,1 0}).

It is a shame, though, that the error message you get is about
"unescaped left brace" - if the scenario Smylers suggests arises,
where someone on an older perl tries to use a regexp suggested by
someone used to the newer semantics, the error message will trigger
exactly the wrong attempt to "fix" the problem.

(Still better than silently wrong, as it would be for perl < 5.22).

Hugo
hv
10/16/2020 10:26:32 AM
On Fri, 16 Oct 2020 11:26:32 +0100, hv@crypt.org wrote:

> Karl Williamson <public@khwilliamson.com> wrote:
> :The question is should we allow space within the braces of things like
> :\b{} {m,n} quantifiers, etc, regardless of the /x modifier setting?
> :
> :Space is already allowed within Unicode property definitions
> :
> :	 \p{ foo = bar }
> :
> :is perfectly legal without /x.  This was because the Unicode standard
> :required it.  The space is only valid adjacent to the braces and the
> :equals sign.
> :
> :I believe this is the only case where it is legal, however.  You can't
> :say \x{ df } or \b{ wb }, for example, even under /x.
> :
> :It has long been planned to bring Perl to parity with other languages so
> :as to be able to omit the lower bound in a curly quantifier, a{,3} would
> :have a lower bound of 0.  We are now in a position to do that.  We could
> :choose to allow white space within this construct  1) never; 2) always;
> :3) with /x.
> :
> :I don't really know what is the right decision.
>
> I feel we should absolutely allow whitespace next to the punctuation
> in \x{df} and {1,10} under /x. I think the value of it absent /x is a
> lot lower, enough so that if there are any backcompat concerns we
> probably shouldn't change it. So I'd go for (3).

/me would also prefer {3}

Personally I see a big difference between \x{...} and {x,y}, and see
implementing whitespace in the two as separate issues.

> I don't think we should allow whitespace within the numbers in either
> case (\x{d f}, {1,1 0}).

\x{d f}  and {1,1 0}: NO
\x{ df } and {1, 10}: YES

both only under /x
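
The YES/NO split above can be written down as a small check. This is only a sketch of the proposed rule, not perl's actual parser; the helper names quantifier_body_ok and hex_body_ok are hypothetical, and the check ignores whether /x is in effect.

```perl
use strict;
use warnings;

# Hypothetical rule: blanks are fine next to the braces and the comma,
# but never inside a number. Input is the text between the braces.
sub quantifier_body_ok {
    my ($body) = @_;
    return $body =~ /\A \h* (?: \d+ | \d* \h* , \h* \d+ | \d+ \h* , ) \h* \z/x ? 1 : 0;
}

# Same idea for the hex digits inside \x{...}.
sub hex_body_ok {
    my ($body) = @_;
    return $body =~ /\A \h* [0-9A-Fa-f]+ \h* \z/x ? 1 : 0;
}

for my $body ('1,10', '1, 10', ' 1 , 10 ', '1,1 0', ',3') {
    printf "{%s} => %s\n", $body, quantifier_body_ok($body) ? 'YES' : 'NO';
}
for my $body ('df', ' df ', 'd f') {
    printf "\\x{%s} => %s\n", $body, hex_body_ok($body) ? 'YES' : 'NO';
}
```

The sketch also accepts an omitted lower bound such as {,3}, following the proposal at the top of the thread.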

> It is a shame, though, that the error message you get is about
> "unescaped left brace" - if the scenario Smylers suggests arises,
> where someone on an older perl tries to use a regexp suggested by
> someone used to the newer semantics, the error message will trigger
> exactly the wrong attempt to "fix" the problem.
>
> (Still better than silently wrong, as it would be for perl < 5.22).
>
> Hugo

-- 
H.Merijn Brand  https://tux.nl  Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.33      porting perl5 on HP-UX, AIX, and Linux
https://useplaintext.email                 https://www.test-smoke.org
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/

perl5
10/16/2020 11:37:34 AM
On Thu, Oct 15, 2020, at 11:20 PM, Karl Williamson wrote:
> The question is should we allow space within the braces of things like 
> \b{} {m,n} quantifiers, etc, regardless of the /x modifier setting?

I think we should pick one and stick to it.  I would pick "always allow the whitespace", because I think it's easy to read and matches up to the behavior of an interpolated @{...} for example.

No backward compatibility concerns have yet come to my mind.  (I am not worried about "but code on new perls won't necessarily run on old perls", because that is not a backward compatibility concern.)

If we go the opposite direction and say that /x is needed for that whitespace, I think that will be fine, too.  It's just not what I'd do. 

-- 
rjbs
perl
10/17/2020 8:53:00 PM
On 2020-10-17 1:53 p.m., Ricardo Signes wrote:
> On Thu, Oct 15, 2020, at 11:20 PM, Karl Williamson wrote:
>> The question is should we allow space within the braces of things like
>> \b{} {m,n} quantifiers, etc, regardless of the /x modifier setting?
> 
> I think we should pick one and stick to it.  I would pick "always allow the 
> whitespace", because I think it's easy to read and matches up to the behavior of 
> an interpolated @{...} for example.

I also vote for whitespace being allowed unconditionally (no matter whether /x
is present or not) inside curly brace constructs.  Then /x unambiguously only
affects the meaning of things outside the curly brace constructs.  This is a
much cleaner, more predictable, and easier-to-use language design. -- Darren Duncan
darren
10/18/2020 12:43:56 AM
On Sat, Oct 17, 2020 at 05:43:56PM -0700, Darren Duncan wrote:
> I also vote for whitespace being allowed unconditionally (no matter whether
> /x is present or not) inside curly brace constructs.  Then /x unambiguously
> only affects the meaning of things outside the curly brace constructs.  This
> is a much cleaner and more predictable or easy to use language design. --
> Darren Duncan

+1

-- 
You live and learn (although usually you just live).
davem
10/19/2020 10:02:55 AM
On 10/19/20 4:02 AM, Dave Mitchell wrote:
> On Sat, Oct 17, 2020 at 05:43:56PM -0700, Darren Duncan wrote:
>> I also vote for whitespace being allowed unconditionally (no matter whether
>> /x is present or not) inside curly brace constructs.  Then /x unambiguously
>> only affects the meaning of things outside the curly brace constructs.  This
>> is a much cleaner and more predictable or easy to use language design. --
>> Darren Duncan
> 
> +1
> 

The effective rule would be that any tokens within braces may be
preceded or followed by white space.  Should it be just horizontal white
space?  I think so.

A complication is that certain braced constructs can occur in double
quoted strings, such as \x{fb00}.  Would they follow the same rules?
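
For reference, the distinction between horizontal and general white space can be seen with the existing \h and \s escapes. This is a small illustration only, assuming perl 5.10 or later for \h; it says nothing about how the braced constructs themselves would be parsed.

```perl
use strict;
use warnings;

# \h matches horizontal white space (space, tab); \s also matches newline.
my %name = (" " => 'space', "\t" => 'tab', "\n" => 'newline');

for my $char (" ", "\t", "\n") {
    printf "%-8s \\h:%s \\s:%s\n", $name{$char},
        ($char =~ /\A\h\z/ ? 'yes' : 'no'),
        ($char =~ /\A\s\z/ ? 'yes' : 'no');
}
```

Under a horizontal-only rule, a newline between a brace and a token would not be skipped.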
public
10/21/2020 12:41:20 AM