Zero-width split() match creates empty trailing strings but not empty leading strings

The perlfunc documentation spells this out clearly, and it matches what I see:

$ perl -e 'for (split(//, "fob", -1)) { print "$_\n"; }' | sed -e 's/^$/<blank>/'
f
o
b
<blank>

The question on my mind is why. In particular, is it a decision worth
replicating in another language's libraries?

Thanks for any pointers. I tried to find the source for split, and I
think I may have found it in pp_split in pp.c. But there's not really
any reason to expect the source code to include the justification, and
I couldn't find one.

http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/base/Splitter.html
cpovirk
1/18/2013 11:02:10 PM
perl.perl5.porters


Hi Chris,

* Chris Povirk <cpovirk@google.com> [2013-01-19 00:05]:
> The perlfunc documentation spells this out clearly, and it matches
> what I see:
>
> $ perl -e 'for (split(//, "fob", -1)) { print "$_\n"; }' | sed -e 's/^$/<blank>/'
> f
> o
> b
> <blank>
>
> The question on my mind is why.

It’s not an accident of implementation exactly, but you might call it an
accident of semantics. It’s due to `split` being defined in terms of
pattern matching and due to how pattern matching operates. Consider the
output you get from the following:

    perl -Mre=debug -E 'sub x{say "-"x72} $_ = "fob"; x; x while /(?:)/g; x'

(This pattern yields a regexp program identical to that in `split //`.)

Note that the regexp engine starts matching at position 0, then bumps
along the string whenever it detects a match at the same position where
it previously matched (these are the “Match possible, but” lines in the
debug output).

Note well that it succeeds matching the empty pattern at position 3,
i.e. just beyond the end of the string. That last one is where the
trailing empty field comes from.

The curious thing here is that it also succeeds matching the empty
pattern at the *start* of the string – yet the leading empty field is
never present in the output from `split`! Evidently, `split` actively
suppresses this initial empty field.

> Thanks for any pointers. I tried to find the source for split, and I
> think I may have found it in pp_split in pp.c. But there's not really
> any reason to expect the source code to include the justification, and
> I couldn't find one.
>
> http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/base/Splitter.html

Presumably `split` suppresses the leading empty field because that one
cannot be suppressed selectively by the user as trailing empty fields
can, and it would therefore always present an obstacle to step around
in user code.

My best guess is that the user’s ability to expose the trailing empty
match using a limit of -1 was deemed harmless, but conversely, giving
the user a corresponding ability to expose the leading empty match was
deemed not worthwhile.

If this reasoning is correct, then the demonstrated behaviour with the
explicit limit and the trailing empty field has no particular semantic
worth, negative or positive, and is essentially arbitrary. The aim was
simply to make `split` DWIM in the simple case.

> In particular, is it a decision worth replicating in another
> language's libraries?

In light of the above I’d say the answer is: do as you will.

If your splitter function is defined in terms of pattern matching, and
you follow the example of Perl’s `split` regarding an absent vs negative
limit, and your regexp engine operates in a way that would lead to these
empty matches, etc. – then you might follow Perl’s example and simply
suppress only the leading empty field while leaving the absent limit to
suppress the trailing fields.

If this is not how your splitter function works – then I don’t see the
reason to go out of your way in order to emulate the behaviour of Perl’s
`split` either.

HTH,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
pagaltzis
3/10/2013 5:32:09 PM
Thank you. That's very helpful. It should be enough for us to make a
decision one way or the other. (And I'm sure that I'll find another
use for re=debug now that I know about it.)
cpovirk
3/11/2013 2:33:59 PM