&, &&, and backtracking.

------=_Part_20601_33098141.1189046184291
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

How do C<&> and C<&&> differ with respect to backtracking?  For instance,

    "foobar" ~~ / <[a..z]>+ & [ ... ] /;

Both sides of the C<&> happen in parallel, so I would guess that they
both match "foo" then stop. Please correct me if that's wrong.

Were we using the procedural conjunction:

    "foobar" ~~ / <[a..z]>+ && [ ... ] /;

I would guess that the LHS matches as much as it can ("foobar"), then
the RHS matches "foo". Since "foobar" ne "foo", the regex engine
backtracks on the LHS matching "fooba" and then the RHS matches "foo"
again. Again, since "fooba" ne "foo", the regex engine backtracks as
before and this behavior continues until both sides match "foo" (or, in
the general case, it can determine that the conjunctions can't match the
same portion of the string)

Or it's much simpler than that and both of the regexes above just fail
because of the greediness of C<+> and there is no intra-conjunction
backtracking.

So ... anyone care to enlighten me on how this is supposed to work?

Thanks,

-Scott
-- 
Jonathan Scott Duff
perlpilot@gmail.com

------=_Part_20601_33098141.1189046184291--
0
perlpilot
9/6/2007 2:36:24 AM
perl.perl6.language 6835 articles. 0 followers. Follow

7 Replies
1116 Views

Similar Articles

[PageSpeed] 18

Jonathan Scott Duff wrote:
> How do C<&> and C<&&> differ with respect to backtracking?  For instance,
>
>     "foobar" ~~ / <[a..z]>+ & [ ... ] /;
>
> Both sides of the C<&> happen in parallel, so I would guess that they
> both match "foo" then stop. Please correct me if that's wrong.

As written, this match would fail, since '<[a..z]>+' would match
"foobar" while '[ ... ]' would match "foo".  '&' requires that both
sides match the same start and end points.  I suppose that it might be
worth considering requiring only the start points to match, and having
the conjoined match use the earliest end point; so that the above
match would then match "foo" instead of failing.  But it's entirely
possible that there's something faulty with this approach.

The difference between '&' and '&&' is that '&' evaluates the two
sides in an arbitrary order, possibly even in parallel; '&&' always
evaluates the left side first, and only bothers with the right side if
the left side matched.

-- 
Jonathan "Dataweaver" Lang
0
dataweaver
9/6/2007 9:03:24 AM
On Wed, Sep 05, 2007 at 09:36:24PM -0500, Jonathan Scott Duff wrote:
> How do C<&> and C<&&> differ with respect to backtracking?  For instance,
> 
>     "foobar" ~~ / <[a..z]>+ & [ ... ] /;
> 
> Both sides of the C<&> happen in parallel, so I would guess that they
> both match "foo" then stop. Please correct me if that's wrong.

I think the phrase "happen in parallel" overstates things a bit
here.  From S05:

    The & form is considered declarative rather than procedural;
    it allows the compiler and/or the run-time system to decide 
    which parts to evaluate first, and it is erroneous to assume 
    either order happens consistently.  The && form guarantees 
    left-to-right order, and backtracking makes the right argument 
    vary faster than the left.

So, to answer your original question, I think the most we can
say is that C<&&> guarantees a specific order of evaluation,
while C<&> allows the pattern matcher to choose an ordering.

> Were we using the procedural conjunction:
> 
>     "foobar" ~~ / <[a..z]>+ && [ ... ] /;
> 
> I would guess that the LHS matches as much as it can ("foobar"), then
> the RHS matches "foo" [...and then backtracks the LHS until a 
> conjunctional match is found...]
>
> Or it's much simpler than that and both of the regexes above just fail
> because of the greediness of C<+> and there is no intra-conjunction
> backtracking.

I think we definitely allow intra-conjunction backtracking.
PGE implements it that way.


On a somewhat similar question, what happens with a pattern
such as

    "foobar" ~~ / foo.+? | fooba /

The LHS initially matches "foob", but with backtracking could
eventually match "foobar".  Do the longest-token semantics
in this case cause the RHS to be dispatched first, even
though the token declaration of the LHS _could_ match a 
longer token prefix?  

Thanks,

Pm
0
pmichaud
9/6/2007 6:25:12 PM
On Thu, Sep 06, 2007 at 01:25:12PM -0500, Patrick R. Michaud wrote:
: > Were we using the procedural conjunction:
: > 
: >     "foobar" ~~ / <[a..z]>+ && [ ... ] /;
: > 
: > I would guess that the LHS matches as much as it can ("foobar"), then
: > the RHS matches "foo" [...and then backtracks the LHS until a 
: > conjunctional match is found...]
: >
: > Or it's much simpler than that and both of the regexes above just fail
: > because of the greediness of C<+> and there is no intra-conjunction
: > backtracking.
: 
: I think we definitely allow intra-conjunction backtracking.
: PGE implements it that way.

That's what I think.

: On a somewhat similar question, what happens with a pattern
: such as
: 
:     "foobar" ~~ / foo.+? | fooba /
: 
: The LHS initially matches "foob", but with backtracking could
: eventually match "foobar".  Do the longest-token semantics
: in this case cause the RHS to be dispatched first, even
: though the token declaration of the LHS _could_ match a 
: longer token prefix?  

Yow.  ICATBW.  Non-greedy matching is somewhat antithetical to
longest-token matching.  But basically it boils down to this:
Does the longest-token matcher ignore the ?  and do

    foo.+ | fooba

or is there an implicit ordering above and beyond the DFA engine of

    foob | fooba ||
    fooba | fooba ||
    foobar | fooba ||

I think longest-token semantics have to trump minimal matching here,
and my argument is this.  Most uses of *? have additional information
on what terminates it, either implicitly in what it is matching, or
explicitly in the next bit of regex.  That is, you'd typically see
either

    foo\w+? | fooba

or

    foo.+? <wb> | fooba

In either case, the clear intent is to match foobar over fooba.
Therefore I think the DFA matcher just strips ? and does its ordinary
character by character match, relying on that extra info to match
the real extent of the quantifier.

Larry
0
larry
9/6/2007 7:37:37 PM
On Thu, 2007-09-06 at 12:37 -0700, Larry Wall wrote:
> Yow.  ICATBW.

The what now?


-'f


0
geoff
9/6/2007 8:14:43 PM
On Thu, Sep 06, 2007 at 12:37:37PM -0700, Larry Wall wrote:
> On Thu, Sep 06, 2007 at 01:25:12PM -0500, Patrick R. Michaud wrote:
> : On a somewhat similar question, what happens with a pattern
> : such as
> : 
> :     "foobar" ~~ / foo.+? | fooba /
> : 
> : The LHS initially matches "foob", but with backtracking could
> : eventually match "foobar".  Do the longest-token semantics
> : in this case cause the RHS to be dispatched first, even
> : though the token declaration of the LHS _could_ match a 
> : longer token prefix?  
> 
> Yow.  ICATBW.  Non-greedy matching is somewhat antithetical to
> longest-token matching.  

I agree.  One thought I had was that perhaps non-greedy matching
could also terminate the token prefix.

> [...]
> I think longest-token semantics have to trump minimal matching here,
> and my argument is this.  Most uses of *? have additional information
> on what terminates it, either implicitly in what it is matching, or
> explicitly in the next bit of regex.  That is, you'd typically see
> either
>     foo\w+? | fooba
> or
>     foo.+? <wb> | fooba
> 
> In either case, the clear intent is to match foobar over fooba.
> Therefore I think the DFA matcher just strips ? and does its ordinary
> character by character match, relying on that extra info to match
> the real extent of the quantifier.

Does this still hold true for a non-greedy quantifier in the
middle of an expression... ?  I.e.,

    "foobazbar deborah" ~~ /foo .+? b.r | fooba | foobazb /

matches "foobazbar debor" ?

(I completely grant that the examples I'm coming up with here
may be completely nonsensical in real application, but I'm
just exploring the space a bit.)

Pm
0
pmichaud
9/6/2007 9:02:19 PM
On Thu, Sep 06, 2007 at 04:02:19PM -0500, Patrick R. Michaud wrote:
: I agree.  One thought I had was that perhaps non-greedy matching
: could also terminate the token prefix.

Well, that's more or less arguing it the other way.  It kind of assumes
your fooba-ish arguments are smart enough to test for non-r after.

: > [...]
: > I think longest-token semantics have to trump minimal matching here,
: > and my argument is this.  Most uses of *? have additional information
: > on what terminates it, either implicitly in what it is matching, or
: > explicitly in the next bit of regex.  That is, you'd typically see
: > either
: >     foo\w+? | fooba
: > or
: >     foo.+? <wb> | fooba
: > 
: > In either case, the clear intent is to match foobar over fooba.
: > Therefore I think the DFA matcher just strips ? and does its ordinary
: > character by character match, relying on that extra info to match
: > the real extent of the quantifier.
: 
: Does this still hold true for a non-greedy quantifier in the
: middle of an expression... ?  I.e.,
: 
:     "foobazbar deborah" ~~ /foo .+? b.r | fooba | foobazb /
: 
: matches "foobazbar debor" ?

I suppose one approach would be to limit the DFA to the longest known
constant string, and then rerun any variable branches individually
to see if they will actually match something that long.  Alternately,
if the DFA engine is smart enough to kick out all the matches of the
"foo .+? b.r" branch then it could conceivably be taught to return
the shortest one rather than the longest one.  Course, the question
after than is, what if you mix minimal with maximal quantifiers?

Another consideration is the efficiency of matching that pattern
against "foobazbar" ~ 'b' x 100000000.  That would definitely run
faster if you cut off the DFA after the longest known token, or as
soon as the DFA reduces to one possibility, presuming you can know
such a thing.

Any way we work it, there will be a way to use it wrong.

Larry
0
larry
9/6/2007 10:49:42 PM
On Thu, Sep 06, 2007 at 03:49:42PM -0700, Larry Wall wrote:
: On Thu, Sep 06, 2007 at 04:02:19PM -0500, Patrick R. Michaud wrote:
: : I agree.  One thought I had was that perhaps non-greedy matching
: : could also terminate the token prefix.
: 
: Well, that's more or less arguing it the other way.  It kind of assumes
: your fooba-ish arguments are smart enough to test for non-r after.

The more I think about it, the more I like this approach.  Most token
should be matched in ratchet mode, and I think minimal matching
is perhaps a form of cheating: "Look for the nearest end of this
without me actually bothering to parse the middle."  And certainly
any reasonable grammer is going to be checking that a keyword is
properly terminated by a word boundary anyway.

Larry
0
larry
9/6/2007 11:02:04 PM
Reply:

Similar Artilces:

&& In Generated Javascript Becomes &amp;&amp;
I am trying to write client side scripts in c#, asp.net V 1.4. The && operator is generated in the javascript as &amp;amp;, which causes errors.  How can I avoid this? (Code below)  Thanks in advance.   script = "<script language='javascript' type='text/javascript'>\n"; script += "if (Form1." + tbxA.ClientID + ".value == '' && "; script += "Form1." + tbxB.ClientID + ".value == '') {"; script += "Form1." + cbxC.ClientID + ".Checked = false;}"; ...

Convert &amp; to &
I have a Gridview that is populated by the 3 different users from selections made in a prior Form.  This gridview summarizes everything.  I then step through this Gridview and write the data to a history file on SQL. The problem I am having is, one column in this Gridview is a name column which contains the "&" symbol.  When I iterate the Gridview and populate the history file the & becomes &amp;.  I can't seem to figure out how to convert this properly.  Both data fields are nvarchar type.  Please help anyone. This is the code to tak...

& turning to &amp;
I'm doing an operation reformat and trying to use the & in a text box. In the trace it looks ok in the policy, but when it gets to the actual xml value it turns into this: <modify-attr attr-name="EG"> <remove-all-values/> <add-value> <value>&amp;</value> </add-value> </modify-attr> Can I fix this in policy builder or by escaping the character somehow? Or do I need a stylesheet to fix the problem? -- nate_spears ------------------------------------------------------------------------ It isn't a p...

If & =&amp; in xml then { or }= what?
Hi frnds, some problem when using xml & is not compatible in xml file so we replace & with &amp;    like that i want to know what the characters which are not compatible with xml syntax for example when i m using { or } in xml file then its shwoing error msg. if anyone knows the soln then plz reply me. Thanks in advance.Regards,Hasan Mohiuddin Farooqihasan_farooqi@yahoo.co.in Hi Dear,Please find the Table. It contains the all special character list... quot " U+0022 (34) HTML 2.0 HTMLspecial ISOnum quotation mark (= APL quote) amp &am...

&amp; instead of &
I have gridview with the following code on row commandDim row As GridViewRow = gvBusinessLines.Rows(e.CommandArgument)txtBusinessLine.Text = CStr(row.Cells(1).Text) ''''''''''''''''''''''''''''''''''''''''''''''' When I have this text in gridview for example:"test & test2", when clicking on the button of the gridview, textbox is containing test &amp; test2How to avoide this other than re...

& where shown as &amp;
hai,     At runtime i will add values from textbox to datagrid using datatables.what my problem was , when i add as he & she in textbox it show as he &amps; she in the grid.how to solve this. Thanks in advance   cool.mugil:    At runtime i will add values from textbox to datagrid using datatables.what my problem was , when i add as he & she in textbox it show as he &amps; she in the grid.how to solve this.  Make sure you are not HTML encoding the input values.  Hai,    How to do stop html enc...

Replacement of & , &amp;
I was trying this java script inside a mozilla extension but the browser shows me error this.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;'); XML Parsing Error: not well-formed at replace(/&/g,'&amp;') Any thoughts ? Thanks On Fri, 30 May 2008 10:26:19 -0700 (PDT), sb wrote: > I was trying this java script inside a mozilla extension but the > browser shows me error > this.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;'); > ...

Kfarbair.com
Name: LeceHoigree Email: 101atmanga-kenseidotcom Product: eBay Companion Summary: Kfarbair.com - &#1489;&#1497;&#1514; &#1502;&#1500;&#1493;&#1503; &#1499;&#1508;&#1512;&#1497; & &#1495;&#1491;&#1512;&#1497;&#1501; &#1502;&#1512;&#1493;&#1493;&#1495;&#1497;&#1501; & &#1499;&#1508;&#1512; &#1489;&#1506;&#1497;&#1512; Comments: <a href=http://www.kfarbair.com><img>http://www.kfarbair.com/_images/_photos/photo_big8.jpg</img></a> ...

&amp;
C:\Program Files\ASP.NET Starter Kits\ASP.NET Portal (VBSDK)\PortalVBSDK\admin\Cardholders.aspx(174) : error BC30451: Name 'amp' is not declared.            __output.Write(Global.GetApplicationPath(Request) &amp; "/ASPNETPortal.css")                                                   ...

&&
Hi all=2C I have some arrays and like to trigger some events depending on which = array or combination of arrays contains data=2E What I tried was = something like=3A snippet if(=40dates =26=26 !=40themes =26=26 !=40cities) =7B =23do something with the data in =40dates =7D elsif(!=40dates =26=26 =40themes =26=26 !=40cities) =7B =23do somthing with =40themes =7D =2E=2E=2E Alas! I doesn=27t work=2E Using =22and=22 instead of =22=26=26=22 doesn=27= t help either=2E Or do I have to use =22=26=22 instead=2E Although it doesn=27t work eithe= r=2E = instead = ...

What the *$&@)%#(&!!???
Ok, a little off topic but I need help ASAP. I installed a mapping program this morning and ever since then, my computer has been going nuts. My IIS wouldn't work anymore...it wouldn't show any of my virtual directories...said there were none to manage. I re-installed that and it works now. *i think* Then I tried to browse to one of my local sites ... such as http://localhost/somesite, or http://localhost/dotnetnuke, etc. None of them work! They all point to the correct folders and have the default.aspx at the top of the documents list. All I get is a blank white page when I browse t...

@#%&*!@+%$&@
Name: Rick Kimmel Email: rick_kimmelatbellsouthdotnet Product: Firefox Summary: @#%&*!@+%$&@ Comments: Since I downloaded the latest Firefox version, every time I open it I'm assailed with a prompt telling me I need some kind of goddamned Registry Defender and it pisses me off! I already have registry software and it was free, thank you and I don't need yours. I'm looking around to find a new browser since Mozilla seems to have gone to Hell! Browser Details: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CL...

and, or, &&, ||
Hello! I remember someone mentioning that there may be a precedence problem with "unexpected results" during a mathematical conditional test, using the keywords "and" and "or"...instead of the symbols (&& and ||). Is this true? Can anyone give me an example? Thanks much! cl __________________________________________________ Do You Yahoo!? Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger. http://im.yahoo.com On Tue, 25 Sep 2001, Christine Lenda wrote: > I remember someone mentioning that there may b...

Kfarbair.com
Name: creammott Email: 107atmanga-kenseidotcom Product: eBay Companion Summary: Kfarbair.com - &#1489;&#1497;&#1514; &#1502;&#1500;&#1493;&#1503; &#1499;&#1508;&#1512; &#1489;&#1506;&#1497;&#1512; & &#1513;&#1497;&#1512;&#1493;&#1514; &#1495;&#1491;&#1512;&#1497;&#1501; & &#1513;&#1511;&#1496; Comments: <a href=http://www.kfarbair.com><img>http://www.kfarbair.com/_images/logo.png</img></a> &#1502;&#1500;&#1493;&#1503; <a ...

Web resources about - &, &&, and backtracking. - perl.perl6.language

Backtracking - Wikipedia, the free encyclopedia
The classic textbook example of the use of backtracking is the eight queens puzzle , that asks for all arrangements of eight chess queens on ...

Iran says Western powers backtracking as nuclear deadline expires
Iran accused major powers on Friday of backtracking on previous pledges and throwing up new &quot;red lines&quot; at nuclear talks, after the ...

Right-wing think tank IPA says George Brandis is backtracking on race hate laws
A political fight is brewing between Attorney-General George Brandis and the Institute of Public Affairs.

Greece bailout deal: Backtracking on promises?
Athens claims success in comprise reached with creditors that would end austerity, but new reforms could be costly.

Candy Crowley not backtracking on Romney’s Benghazi gaffe
Ryan claims this morning that debate moderator Crowley is backtracking on her correction of Romney. She says "nuh uh." .

Sydney siege sees Uber raise prices before backtracking
Cab-ordering firm Uber is facing criticism again, this time for charging passengers far more during the cafe hostage crisis in Sydney, Australia. ...


Instapology: A Photo Tour of CEO Backtracking
A brief photographic history of CEO backtracking from public outcry

WWE stockholder lawsuit heats up over backtracking whistleblower
Did WWE "lie" to their investors over the potential of their domestic TV negotiations last year? Confidential Witness Number 1, aka former WWE ...

Emanuel's repeated backtracking shows he underestimated McDonald video fallout
Chicago Tribune Emanuel's repeated backtracking shows he underestimated McDonald video fallout Chicago Tribune Mayor Rahm Emanuel is interviewed ...

Resources last updated: 12/7/2015 4:01:23 PM