split and nils?

Hi All,

What is with the starting and ending Nils?  There are only four
elements, so why are there now six?

And how do I correct this?

$ p6 'my Str $x="abcd";
      for split( "",@$x ).kv -> $i,$j {
      say "Index <$i> = <$j> = ord <" ~ ord($j) ~ ">";}'

Use of Nil in string context
   in block  at -e line 1
Index <0> = <> = ord <>         <----------------- nil ???
Index <1> = <a> = ord <97>
Index <2> = <b> = ord <98>
Index <3> = <c> = ord <99>
Index <4> = <d> = ord <100>
Use of Nil in string context
   in block  at -e line 1
Index <5> = <> = ord <>         <----------------- nil ???


Many thanks,
-T
0
perl6
2/6/2019 5:04:55 AM

The reason there is a Nil, is you asked for the ord of an empty string.

    "".ord =:= Nil

The reason there are two empty strings is you asked for them.

When you split with "", it will split on every character boundary,
which includes before the first character, and after the last.
That's literally what you asked for.

    my Str $x = "abcd";
    say split( "", $x ).perl;
    # ("", "a", "b", "c", "d", "").Seq

Perl6 doesn't treat this as a special case like other languages do.
You basically asked for this:

    say split( / <after .> | <before .> /, $x ).perl;
    # ("", "a", "b", "c", "d", "").Seq

Perl6 gave you what you asked for.

That is actually useful btw:

    say split( "", "abcd" ).join("|");
    # |a|b|c|d|

You should be using `comb`, not `split`, if you want a list of characters.

    # these are all identical
    'abcd'.comb.kv
    'abcd'.comb(1).kv
    comb( 1, 'abcd' ).kv
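
For anyone who wants to stay with `split`, here is a minimal sketch of the options, assuming a reasonably current Rakudo. The `:skip-empty` named argument drops the empty strings at both ends, while `comb` never produces them. The `// 'none'` fallback is only one illustrative way to avoid the Nil warning.

```raku
my Str $x = "abcd";

# split on "" also yields the empty strings before and after the text
say split( "", $x ).elems;                 # 6

# :skip-empty discards those empty strings
say split( "", $x, :skip-empty ).elems;    # 4

# comb returns only the characters
say $x.comb.elems;                         # 4

# if the empty strings are kept, give ord's Nil a printable fallback
for split( "", $x ).kv -> $i, $j {
    say "Index <$i> = <$j> = ord <{ $j.ord // 'none' }>";
}
```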

Also why did you add a pointless `@` to `$x` ?
(Actually I'm fairly sure I know why.)

0
b2gills
2/6/2019 1:19:06 PM

Hi Brad,

Thank you!

So it is a "feature" of split.  Split sees the non-existent
index before the start and the non-existent index after
the end as something.  Mumble. Mumble.

To answer your question about the stray "@":  I forgot
to remove it.

But it brings up an inconsistency in Perl 6.

This works and also is the source of the stray "@" I forgot
to remove from the split example.


$ p6 'my Buf $x=Buf.new(0x66,0x61,0x62,0x63); for @$x.kv -> $i, $j {say "Index <$i> = <$j> = chr <" ~ chr($j) ~ ">";}'

Index <0> = <102> = chr <f>
Index <1> = <97> = chr <a>
Index <2> = <98> = chr <b>
Index <3> = <99> = chr <c>



So, this should also work, but does not:

$ p6 'my Str $x="abcd"; for @$x.kv -> $i, $j {say "Index <$i> = <$j> = ord <" ~ ord($j) ~ ">";}'

Index <0> = <abcd> = ord <97>


Strings only have one index (0), which is why we have the substr command.

$ p6 'my Str $x="abcd"; say $x[0];'
abcd


So all the rules for other arrays go out the window for
a Str.  A string is an array of one cell.  And if I
truly want an array of characters, I need to use Buf
and not Str.  Only problem is that Str has all the cool
tools.

-T
0
perl6
2/6/2019 7:55:25 PM
On 2/6/19 5:19 AM, Brad Gilbert wrote:
> The reason there is a Nil, is you asked for the ord of an empty string.
> 
>      "".ord =:= Nil
> 
> The reason there are two empty strings is you asked for them.

What would be the most practical way of converting a string to an
array of characters?

$x="abc" goes to @y[0]="a", @y[1]="b", @y[2]="c"
0
perl6
2/6/2019 7:57:25 PM

Brad told you already: use comb.


0
perl6
2/6/2019 8:12:27 PM

On Wed, Feb 6, 2019, 11:57 AM ToddAndMargo via perl6-users <
perl6-users@perl.org> said


What would be the most practice way of converting a string to and
array of characters


Brad said-

You should be using `comb` if you want a list of characters not `split`.

# these are all identical
'abcd'.comb.kv
'abcd'.comb(1).kv
comb( 1, 'abcd' ).kv

0
not
2/6/2019 8:17:36 PM


Hi Laurent,

Pretty!  Thank you!

$ p6 'my Str $x="abcd"; for $x.comb.kv -> $i, $j {say "Index <$i> = <$j> = ord <" ~ ord($j) ~ ">";}'

Index <0> = <a> = ord <97>
Index <1> = <b> = ord <98>
Index <2> = <c> = ord <99>
Index <3> = <d> = ord <100>

Certainly very practical.  If dealing with large strings, is
it the most efficient?

-T
0
perl6
2/6/2019 8:38:01 PM
On Wed, Feb 06, 2019 at 12:38:01PM -0800, ToddAndMargo via perl6-users wrote:
> $ p6 'my Str $x="abcd"; for $x.comb.kv -> $i, $j {say "Index <$i> = <$j> =
> ord <" ~ ord($j) ~ ">";}'
> 
> Index <0> = <a> = ord <97>
> Index <1> = <b> = ord <98>
> Index <2> = <c> = ord <99>
> Index <3> = <d> = ord <100>
> 
> Certainly very practical.  If dealing with large strings, is
> it the most efficient?

.comb is intended to be more efficient than .split for this particular application, yes.

"comb" is about obtaining the substrings you're looking for (individual characters in this case); "split" is about finding substrings between the things you're looking for.
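
A small sketch of that distinction, using a made-up comma-separated string (the variable name and data are illustrative only):

```raku
my $line = "a1,b22,c333";

# split: name the separator, get what lies between the separators
say $line.split(",").join("|");      # a1|b22|c333

# comb: name what you want, get every match of it
say $line.comb(/\d+/).join("|");     # 1|22|333

# with no argument, comb yields the individual characters
say "abcd".comb.join("|");           # a|b|c|d
```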

Pm
0
pmichaud
2/6/2019 8:58:37 PM

Thank you!
0
perl6
2/6/2019 9:01:48 PM

Leave off the '.kv' to get a Seq (array-like-thing)

> my $letters='abc def g';

abc def g

> $letters.comb().perl

("a", "b", "c", " ", "d", "e", "f", " ", "g").Seq

> ($letters.comb())[0,3,4,6]

(a   d f)

> my @letter_array = $letters.comb()

[a b c   d e f   g]

> @letter_array[1]

b

-y


0
not
2/6/2019 9:20:12 PM

Hi Yary,

Thank you!

I do the

for $x.comb.kv -> $i, $j {say "Index <$i> = <$j> = ord <" ~ ord($j) ~ ">";}

thing a lot as I am constantly reading web pages and sometimes
the unprintable bizarre characters I get can astound.  So
I am looking to see what the ASCII values are of what I am
actually reading.

One trick I use, if there is weird stuff I cannot see after
what I want, is a greedy regex:

    "abc" ~ weird ~ weird ~ weird ~~ s/ abc .* /abc/;

Which could also be written

    "abc" ~ weird ~ weird ~ weird ~~ s/ (abc) .*/$0/;

Problem solved and I did not even have to figure out how
to drop a 0x01.
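
A sketch of both tasks, for reference: listing the code point and Unicode name of each character, and dropping control characters outright. The sample string is invented, and using `<:Cc>` (the Unicode "control" general category) as the definition of "weird" is an assumption.

```raku
my $raw = "abc\x[01]\x[02]";

# show index, code point and Unicode name of each character
for $raw.comb.kv -> $i, $c {
    say "Index <$i> = ord <{ $c.ord }> = name <{ $c.uniname }>";
}

# remove every control character, wherever it appears
my $clean = $raw.subst( / <:Cc> /, "", :g );
say $clean;          # abc
say $clean.chars;    # 3
```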

Regexes are kind of fun, well, once you get the hang of them.

-T
0
perl6
2/6/2019 10:33:27 PM



And you read through my typos too!

$ p6 'my Str $x="abcd"; for $x.comb.kv -> $i, $j {say "Index <$i> = <$j> = ord <" ~ ord($j) ~ ">";}'

Index <0> = <a> = ord <97>
Index <1> = <b> = ord <98>
Index <2> = <c> = ord <99>
Index <3> = <d> = ord <100>
0
perl6
2/6/2019 10:36:24 PM
First off a Str is a singular value, not a list.

Which is a good thing.

    my $a = "abc";
    my $b = "\c[COMBINING ACUTE ACCENT]";

    say $a.chars; # 3
    say $b.chars; # 1

If they were a list, then combining them should create something that
is 4 chars long, but it doesn't.

    say ($a ~ $b).chars; # 3
    say "$a$b".chars; # 3

Also let's look at the Unicode names for the characters

    .say for $a.uninames;
    # LATIN SMALL LETTER A
    # LATIN SMALL LETTER B
    # LATIN SMALL LETTER C

    .say for $b.uninames;
    # COMBINING ACUTE ACCENT

Now what are the Unicode names for the combined Str?

    .say for ($a ~ $b).uninames;
    # LATIN SMALL LETTER A
    # LATIN SMALL LETTER B
    # LATIN SMALL LETTER C WITH ACUTE
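
The same point can be checked from the character counts. A minimal sketch, assuming Rakudo's default grapheme-based handling of Str; `$s` is just an illustrative name.

```raku
my $s = "abc" ~ "\c[COMBINING ACUTE ACCENT]";

say $s.chars;        # 3  graphemes, as seen by Str
say $s.comb.elems;   # 3  comb also works on graphemes
say $s.NFD.elems;    # 4  code points once decomposed
```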

---

The `@` is also perfectly consistent.

    my $a = [1,2];
    my $b = 'abc';
    my $c = 45;
    my $d = List;
    my $e = List.new;

    say $a.perl;   # $[1, 2] # The `$` makes it act like a singular value
    say $b.perl;   # "abc"
    say $c.perl;   # 45
    say $d.perl;   # List
    say $e.perl;   # $( )

    say @$a.perl;  # [1, 2]
    say @$b.perl;  # ("abc",)
    say @$c.perl;  # (45,)
    say @$d.perl;  # (List,)
    say @$e.perl;  # ()

The `@$…` is short for `@($…)`

    say @('abc').perl; # ("abc",)
    say @(45).perl; #(45,)
    say @(List).perl; # (List,)
    say @(List.new).perl; # ()

It removes the item context from something that is [an instance of] a list, or
it creates a single element list with that value.

Basically @(…) always returns a list, and singular items act like a
list with a single value in them.

    say 123.elems; # 1
    say "abc".elems; # 1

    say "".elems; # 1
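
Applied to the original question, a sketch (variable names are illustrative): `@$x` wraps the whole Str in a one-element list, whereas assigning `$x.comb` to an array gives one element per character.

```raku
my Str $x = "abcd";

say @$x.elems;         # 1, the whole string is the single element
say @$x.perl;          # ("abcd",)

my @chars = $x.comb;   # one element per character
say @chars.elems;      # 4
say @chars[1];         # b
```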

---

Since a Str is an opaque object, the internals can store them however they like.

Someone posted Perl6 code and the C[++] equivalent in #perl6 once.
They reported that the Perl6 code was faster.
My guess is that since C[++] treats strings as an array it has to copy
the strings repeatedly.

So Perl6 is faster because a Str is a singular object.

The way MoarVM deals with strings is structurally similar to the
following Perl6ish pseudo code:
(There are mistakes, but they should not detract from it I hope. Some
of the mistakes are even intentional.)

    role STRING {}

    # strings that are valid ASCII
    class STRING_RAW8 does STRING {
        has int32 $.length;
        has Buf[int8] $.buffer;
    }
    # strings which contain Unicode outside of the ASCII range
    class STRING_NFG32 does STRING {
        has int32 $.length;
        has Buf[int32] $.buffer;
    }
    # strings made from other strings
    class STRING_CONCAT does STRING {
        has STRING @.a;
    }
    # a string made out of part of another string
    class STRING_SUBSTR does STRING {
        has STRING $.ref;
        has int32 $.position;
        has int32 $.length;
    }

So when you write something like this:

    use v6;
    my $a = "123";
    my $b = "⅒";
    my $c = "$a and $b";

That turns into this:

    # pseudo Perl6ish
    my Str $a = STRING_RAW8( "123" );
    my Str $b = STRING_NFG32( "⅒" );
    my Str $TEMP = STRING_RAW8( " and " );
    my Str $c = STRING_CONCAT( $a, $TEMP, $b );

If you then get a substring out of it:

    use v6;
    my Str $d = $c.substr(0,2);

It does something like

    # pseudo Perl6ish
    my STRING $d = STRING_SUBSTR( $c, 0, 2 );

    # the whole structure
    STRING_SUBSTR(
        STRING_CONCAT(
            STRING_RAW8( "123" ),
            STRING_RAW8( " and " ),
            STRING_NFG32( "⅒" )
        ),
        0,
        2
    )

At no point were the contents of the STRING ever copied.
In fact it didn't have to read the contents of the STRING at all.

(In the Real World, string concatenation does have to look at the
first and last characters of each segment for ones that will combine.)

---

Basically C has to copy the contents of strings while MoarVM can just
copy pointers to string objects.

If Perl6 treated strings as an Array then some of this performance
improvement wouldn't quite work as well.

Let's pretend that it acts like an Array:

    use v6;
    my Str $e = $c[0,1];

That would result in the following Perl6ish VM code:

    # pseudo Perl6ish
    my $e = STRING_CONCAT( STRING_SUBSTR( $c, 0, 1 ), STRING_SUBSTR( $c, 1, 1 ) );

    # the whole structure
    STRING_CONCAT(
        STRING_SUBSTR(
            STRING_CONCAT(
                STRING_RAW8( "123" ),
                STRING_RAW8( " and " ),
                STRING_NFG32( "⅒" )
            ),
            0,
            1
        ),
        STRING_SUBSTR(
            STRING_CONCAT(
                STRING_RAW8( "123" ),
                STRING_RAW8( " and " ),
                STRING_NFG32( "⅒" )
            ),
            1,
            1
        )
    )

Translating that back into real Perl6

    use v6;
    my Str $e = substr( $c, 0, 1 ) ~ substr( $c, 1, 1 );

So if Perl6 did treat Str as an Array, then it would be slower, and
use more memory.
It also might not be able to handle Unicode correctly.

Also my guess is that the majority of string-related bugs in other languages are
caused by them treating strings as an array of characters.
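
In practice, both spellings are available without Str pretending to be an Array. A small sketch, reusing the `$c` value from earlier in this message:

```raku
my $c = "123 and ⅒";

# substr addresses characters by position and length
say $c.substr(0, 2);           # 12

# comb produces a list that can be indexed or sliced, then rejoined
say $c.comb[0, 1].join;        # 12
say $c.comb[*-1];              # ⅒
```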


0
b2gills
2/7/2019 4:35:56 PM