[PATCH 5.005_58] REx documentation

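The reworded \G entry says the anchor sits at pos(). A successful m//g in scalar context leaves pos() at the end of its match, and \G anchors the next match there. Below is a minimal sketch, assuming a modern perl 5. The string and variable names are made up for illustration.

```perl
use strict;
use warnings;

# \G matches only at pos() of the target string; a successful m//g in
# scalar context leaves pos() at the end of what it matched.
my $s = "aaabbb";

$s =~ /a+/g or die "no leading a's";   # matches "aaa"
my $after_g = pos($s);                 # offset just past "aaa"

# Continue exactly where the previous m//g left off.
my $continues = ($s =~ /\Gb/g) ? "yes" : "no";   # "b" sits at offset 3

# Assigning to pos() moves the point where \G anchors.
pos($s) = 0;
my $from_start = ($s =~ /\Gb/g) ? "yes" : "no";  # offset 0 holds "a"

print "pos after m//g: $after_g\n";
print "\\Gb after m//g: $continues\n";
print "\\Gb after pos() = 0: $from_start\n";
```

The last match fails even though the string does contain a "b". This is because the assignment to pos() moved the \G anchor back to offset 0.
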
--- ./pod/perlre.pod~~	Mon Aug  2 16:20:36 1999
+++ ./pod/perlre.pod	Fri Aug 27 15:53:30 1999
@@ -289,7 +289,8 @@ Perl defines the following zero-width as
     \A	Match only at beginning of string
     \Z	Match only at end of string, or before newline at the end
     \z	Match only at end of string
-    \G	Match only where previous m//g left off (works only with /g)
+    \G	Match only at pos() (e.g. at the end-of-match
+	of the previous m//g)
 
 A word boundary (C<\b>) is a spot between two characters
 that has a C<\w> on one side of it and a C<\W> on the other side
@@ -383,7 +384,13 @@ Today it is more common to use the quote
 metaquoting escape sequence to disable all metacharacters' special
 meanings like this:
 
-    /$unquoted\Q$quoted\E$unquoted/
+    /$unquoted\Q$quoted\E$unquoted/;
+
+Beware that if you put I<literal> backslashes (those not inside
+interpolated variables) between C<\Q> and C<\E>, double-quotish
+backslash interpolation may lead to confusing results.  If you
+I<need> to use literal backslashes within the scope of C<\Q>,
+consult L<perlop/"Gory details of parsing quoted constructs">.
 
 =head2 Extended Patterns
 
@@ -394,7 +401,7 @@ the parentheses.  The character after th
 the extension.
 
 The stability of these extensions varies widely.  Some have been
-part of the core language for many years.  Others are experimental
+part of the core language for many years.  Others are still experimental
 and may change without warning or be completely removed.  Check
 the documentation on an individual feature to verify its current
 status.
@@ -502,8 +509,8 @@ only for fixed-width look-behind.
 
 =item C<(?{ code })>
 
-B<WARNING>: This extended regular expression feature is considered
-highly experimental, and may be changed or deleted without notice.
+B<WARNING>: This extended regular expression feature is still
+experimental.
 
-This zero-width assertion evaluate any embedded Perl code.  It
+This zero-width assertion evaluates any embedded Perl code.  It
 always succeeds, and its C<code> is not interpolated.  Currently,
@@ -564,8 +571,9 @@ module.  See L<perlsec> for details abou
 
 =item C<(?p{ code })>
 
-B<WARNING>: This extended regular expression feature is considered
-highly experimental, and may be changed or deleted without notice.
+B<WARNING>: This extended regular expression feature is still
+highly experimental.  While its semantics are largely settled,
+a simpler syntax for it still needs to be designed.
 
 This is a "postponed" regular subexpression.  The C<code> is evaluated
 at run time, at the moment this subexpression may match.  The result
@@ -589,14 +597,13 @@ The following pattern matches a parenthe
 
 =item C<(?E<gt>pattern)>
 
-B<WARNING>: This extended regular expression feature is considered
-highly experimental, and may be changed or deleted without notice.
-
 An "independent" subexpression, one which matches the substring
 that a I<standalone> C<pattern> would match if anchored at the given
-position--but it matches no more than this substring.  This
+position, and it matches I<only> this substring.  This
 construct is useful for optimizations of what would otherwise be
-"eternal" matches, because it will not backtrack (see L<"Backtracking">).
+"eternal" matches, because it will not backtrack (see L<"Backtracking">),
+as well as in many places where "grab all you can, and do not give
+anything back" semantics are desirable.
 
 For example: C<^(?E<gt>a*)ab> will never match, since C<(?E<gt>a*)>
 (anchored at the beginning of string, as above) will match I<all>
@@ -619,7 +626,7 @@ Consider this pattern:
 
     m{ \(
 	  ( 
-	    [^()]+ 
+	    [^()]+		# x+ inside (group)+
           | 
             \( [^()]* \)
           )+
@@ -639,7 +646,7 @@ hung.  However, a tiny change to this pa
 
     m{ \( 
 	  ( 
-	    (?> [^()]+ )
+	    (?> [^()]+ )	# Change x+ to (?> x+ )
           | 
             \( [^()]* \)
           )+
@@ -656,12 +663,30 @@ On simple groups, such as the pattern C<
 effect may be achieved by negative look-ahead, as in C<[^()]+ (?! [^()] )>.
 This was only 4 times slower on a string with 1000000 C<a>s.
 
+The "grab all you can, and do not give anything back" semantics are
+desirable in many situations where a simple C<()*> looks at first sight
+like the correct solution.  Suppose we parse text with comments delimited
+by C<#> followed by optional (horizontal) whitespace.  Contrary to its
+appearance, C<#[ \t]*> I<is not> a correct subexpression for the comment
+delimiter, because it can give whitespace back.  The answer is one of
+
+    (?>#[ \t]*)
+    #[ \t]*(?![ \t])
+
+For example, to grab non-empty comments into $1, use one of
+
+    / (?> \# [ \t]* ) (        .+ ) /x;
+    /     \# [ \t]*   ( [^ \t] .* ) /x;
+
+It is a judgement call which one of these expressions better reflects
+the above specification of comments.
+
 =item C<(?(condition)yes-pattern|no-pattern)>
 
 =item C<(?(condition)yes-pattern)>
 
-B<WARNING>: This extended regular expression feature is considered
-highly experimental, and may be changed or deleted without notice.
+B<WARNING>: This extended regular expression feature is still
+experimental.
 
 Conditional expression.  C<(condition)> should be either an integer in
 parentheses (which is valid if the corresponding pair of parentheses
@@ -684,7 +709,10 @@ themselves.
 A fundamental feature of regular expression matching involves the
 notion called I<backtracking>, which is currently used (when needed)
 by all regular expression quantifiers, namely C<*>, C<*?>, C<+>,
-C<+?>, C<{n,m}>, and C<{n,m}?>.
+C<+?>, C<{n,m}>, and C<{n,m}?>.  Although internally the regex engine
+may use mechanisms other than backtracking to find a match,
+the metaphor of backtracking gives humans a convenient way
+to predict how the regex engine will behave.
 
 For a regular expression to match, the I<entire> regular expression must
 match, not just part of it.  So if the beginning of a pattern containing a
@@ -857,10 +885,11 @@ is not a zero-width assertion, but a one
 
 B<WARNING>: particularly complicated regular expressions can take
 exponential time to solve because of the immense number of possible
-ways they can use backtracking to try match.  For example, this will
+ways they can use backtracking to try to match.  For example, without
+internal optimizations done by the regular expression engine, this would
 take a painfully long time to run
 
-    /((a{0,5}){0,5}){0,5}/
+    'aaaaaaaaaaaa' =~ /((a{0,5}){0,5}){0,5}[c]/
 
 And if you used C<*>'s instead of limiting it to 0 through 5 matches,
 then it would take forever--or until you ran out of stack space.
@@ -1003,7 +1032,7 @@ may match zero-length substrings.  Here'
     @chars = split //, $string;		  # // is not magic in split
     ($whitewashed = $string) =~ s/()/ /g; # parens avoid magic s// /
 
-Thus Perl allows the C</()/> construct, which I<forcefully breaks
+Thus Perl allows such constructs by I<forcefully breaking
 the infinite loop>.  The rules for this are different for lower-level
 loops given by the greedy modifiers C<*+{}>, and for higher-level
 ones like the C</g> modifier or split() operator.
@@ -1043,6 +1072,8 @@ position one notch further in the string
 
 The additional state of being I<matched with zero-length> is associated with
 the matched string, and is reset by each assignment to pos().
+Zero-length matches at the end of the previous match are ignored
+during C<split>.
 
 =head2 Creating custom RE engines
 
@@ -1093,8 +1124,8 @@ part of this regular expression needs to
 
 =head1 BUGS
 
-This manpage is varies from difficult to understand to completely
-and utterly opaque.
+As with many other parts of Perl documentation, this manpage may benefit from
+separating "User manual"-style sections from "Reference manual"-style ones.
 
 =head1 SEE ALSO
 
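The new comment-delimiter paragraph can be checked with a small script, and so can the ^(?>a*)ab example it builds on. This is a sketch, assuming a modern perl 5. The sample lines and the %re/%got names are hypothetical. On a bare "#   " line, the naive pattern still captures a single blank. Both suggested forms fail to match that line.

```perl
use strict;
use warnings;

# Three ways to capture the text of a "#" comment into $1.
my %re = (
    naive    => qr/ \# [ \t]* ( .+ ) /x,          # [ \t]* may give blanks back
    atomic   => qr/ (?> \# [ \t]* ) ( .+ ) /x,    # (?>...) never gives back
    nonblank => qr/ \# [ \t]* ( [^ \t] .* ) /x,   # text must start non-blank
);

# Hypothetical input: a real comment and a bare delimiter.
my %got;
for my $line ("#   some text", "#   ") {
    for my $name (sort keys %re) {
        my ($text) = $line =~ $re{$name};         # empty list if no match
        $got{$line}{$name} = $text;
        printf "%-15s %-8s => %s\n", "'$line'", $name,
            defined $text ? "'$text'" : 'no match';
    }
}

# Why ^(?>a*)ab never matches: (?>a*) keeps every "a" it took,
# leaving none for the literal "a" that follows it.
my $atomic_ab = ("aaab" =~ /^(?>a*)ab/) ? 1 : 0;
my $plain_ab  = ("aaab" =~ /^a*ab/)     ? 1 : 0;
print "atomic: $atomic_ab, plain: $plain_ab\n";
```

As the patch says, choosing between the two suggested forms is a judgement call. The nonblank form has the advantage of not needing the (?>...) extension at all.
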
ilya
8/27/1999 11:02:18 PM
perl.perl5.porters


You made a lot of changes from:

    B<WARNING>: This extended regular expression feature is considered
    highly experimental, and may be changed or deleted without notice.

to:

    B<WARNING>: This extended regular expression feature is still
    experimental.

I'd like to know why.  Anything experimental needs flashing lights all
over it.  I don't think that should be toned down the way you have
proposed doing.

You also seem to have decided that one or more aspects of this aren't
even experimental anymore, such as /(?>pattern)/.  When did Sarathy or
Larry announce that this was now considered non-experimental?

--tom
tchrist
8/27/1999 11:07:45 PM
