Fastest way to convert from a Buf to a Str?

Hi All,

I need to read a file into a buffer (NO CONVERSIONS!)
and then convert it to a string (again with no
conversions).

I have been doing this:

    for ( @$BinaryFile ) -> $Char { $StrFile ~= chr($Char); }

But it takes a bit of time.  What is the fastest way to do this?

I guess there is no way to create/declare a variable that is
both Buf and Str at the same time?  That would mean I would not
have to convert anything.  I used to get away with this under
Modula 2 all the time.

$ p6 'my $B = Buf.new(0x66, 0x66, 0x77); $B.Str ~= "z";'
Cannot use a Buf as a string, but you called the Str method on it
   in block <unit> at -e line 1

$ p6 'my $B = Buf.new(0x66, 0x66, 0x77); Str($B) ~= "z";'
Cannot use a Buf as a string, but you called the Str method on it
   in block <unit> at -e line 1


Many thanks,
-T

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A computer without Microsoft is like
a chocolate cake without the mustard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0
perl6
2/3/2019 3:22:43 AM
perl.perl6.users

6 Replies

This:

    for ( @$BinaryFile ) -> $Char { $StrFile ~= chr($Char); }

is better written as

    my $StrFile = $BinaryFile.map(*.chr).reduce(* ~ *);

It is also exactly equivalent to just decoding the bytes as Latin-1:

    # if $BinaryFile is a Buf
    my $StrFile = $BinaryFile.decode('latin1');

    # if it isn't
    my $StrFile = Buf.new($BinaryFile).decode('latin1');

If you don't otherwise need $BinaryFile, you can read the file as Latin-1 directly:

    my $fh = open 'test', :enc('latin1');
    my $StrFile = $fh.slurp;

or

    my $StrFile = 'test'.IO.slurp(:enc('latin1'));
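
A minimal sketch of how one might check that the Latin-1 route is byte-for-byte, assuming a small made-up buffer (the sample bytes and the names `$buf` and `$str` are illustrative, not taken from the real file):

```perl6
# Made-up sample data: four NUL bytes as a delimiter, plus a high byte (0xFF).
my $buf = Buf.new(0x66, 0x00, 0x00, 0x00, 0x00, 0xFF, 0x77);

# Latin-1 maps every byte 0..255 to the code point with the same number,
# so no byte is dropped or rejected, NULs included.
my $str = $buf.decode('latin1');

# Code points of the string, to compare by eye with the bytes above.
say $str.ords;

# Encoding back should reproduce the original bytes; expected to print True.
say $str.encode('latin1').list.join(' ') eq $buf.list.join(' ');

# The delimiter can then be searched for with ordinary Str methods.
say $str.index("\0" x 4);
```

One thing to watch: Perl 6 treats "\r\n" as a single character, so character positions in the Str need not equal byte offsets if the data contains CR LF pairs.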

---

Buf and Str used to be treated more alike, and it was very confusing.

There should be more methods on Buf that work like the methods on Str,
but that is about it.

Having a string act like a buffer in Modula 2 probably works fine
because it barely supports Unicode at all.

Here is an example of why it can't work like that in Perl 6:

    my $a = 'a';
    my $b = "\c[COMBINING ACUTE ACCENT]";

    my $c = $a ~ $b;
    my $d = $a.encode ~ $b.encode;
    my $e = Buf.new($a.encode) ~ Buf.new($b.encode);

    say $a.encode; # utf8:0x<61>
    say $b.encode; # utf8:0x<CC 81>

    say $c.encode; # utf8:0x<C3 A1>

    say $d; # utf8:0x<61 CC 81>
    say $e; # Buf:0x<61 CC 81>

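Notice that `$c.encode` and `$d` are different even though they are
made from the same parts.  Concatenating the strings composes 'a' and
the combining accent into the single character U+00E1, which encodes
as C3 A1; concatenating the encoded bytes composes nothing.
`$d` and `$e` hold the same bytes because they are dealing with lists
of numbers, not strings.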
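
To make the difference concrete, here is a small sketch that counts characters and bytes for the same values (same `$a`, `$b`, `$c` and `$d` as in the example; the counts in the comments are what I would expect from the output shown there):

```perl6
my $a = 'a';
my $b = "\c[COMBINING ACUTE ACCENT]";

my $c = $a ~ $b;                  # Str concatenation: composes to one character
my $d = $a.encode ~ $b.encode;    # byte concatenation: nothing is composed

say $c.chars;          # 1  (one character)
say $c.encode.elems;   # 2  (bytes C3 A1)
say $d.elems;          # 3  (bytes 61 CC 81)
```

So the number of characters in a Str and the number of bytes it came from can differ in both directions, which is why a Str cannot simply stand in for a Buf.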

0
b2gills
2/3/2019 6:15:51 AM

On 2/2/19 10:15 PM, Brad Gilbert wrote:
> This:
> 
>      for ( @$BinaryFile ) -> $Char { $StrFile ~= chr($Char); }
> 
> is better written as
> 
>      my $StrFile = $BinaryFile.map(*.chr).reduce(* ~ *);

Hi Brad,

Thank you!

I want ZERO decoding.  I want exactly the same bytes in the
string as are in the Buffer.  And it has to be done FAST.

Are you saying this is the fastest way?

     my $StrFile = $BinaryFile.map(*.chr).reduce(* ~ *);

Please keep in mind: NO DECODING!

-T
0
perl6
2/3/2019 6:29:57 AM

Are all characters in the range 0-255, i.e. Latin-1 characters?

You could then try: my $str = $buf.decode("latin-1");

There's one potential issue if your data could contain DOS line endings
("\r\n"), which will get translated to a single logical "\n" in the decoded
string.

- David



0
david
2/3/2019 9:55:46 AM
On 2/3/19 1:55 AM, David Warring wrote:
> Are all characters in the range 0-255, i.e. Latin-1 characters?
>
> You could then try: my $str = $buf.decode("latin-1");
>
> There's one potential issue if your data could contain DOS line
> endings ("\r\n"), which will get translated to a single logical "\n"
> in the decoded string.
>
> - David

Hi David,

It has to be an exact match.  That includes all carriage returns,
line feeds, form feeds, EOFs, tabs, etc.  But thank you anyway.
:-)

-T
0
perl6
2/3/2019 10:11:35 AM
On 2019-02-02 7:22 PM, ToddAndMargo via perl6-users wrote:
> I need to read a file into a buffer (NO CONVERSIONS!)
> and then convert it to a string (again with no
> conversions).

I think you're making an impossible request.  If preserving exact bytes is
important, then you want to keep your data in a type that represents a sequence
of bytes, such as Blob or Buf.  A Str represents a sequence of characters, which
are NOT bytes, so wanting a Str amounts to saying you don't care about the
bytes.  Given what you keep saying, I'd say skip the Str and just use
Buf or Blob etc., full stop. -- Darren Duncan
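
A rough sketch of what staying in bytes could look like, assuming a hypothetical helper `buf-index` (not a built-in; the name, signature and sample bytes are made up for illustration) that looks for a byte sequence such as four NULs inside a Blob:

```perl6
# Hypothetical helper, not part of Perl 6: offset of the first occurrence
# of $needle in $haystack, or Nil if it never occurs.
sub buf-index(Blob $haystack, Blob $needle) {
    my $n = $needle.elems;
    for 0 .. $haystack.elems - $n -> $i {
        # Compare byte by byte; no decoding into a Str takes place.
        return $i if all((^$n).map({ $haystack[$i + $_] == $needle[$_] }));
    }
    Nil
}

my $data  = Buf.new(0x66, 0x66, 0x00, 0x00, 0x00, 0x00, 0x77);   # made-up bytes
my $delim = Buf.new(0, 0, 0, 0);                                 # four NULs

say buf-index($data, $delim);   # should print 2
```

This is written for clarity rather than speed; it only shows that the search needs nothing from Str.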
0
darren
2/4/2019 1:26:51 AM
On 2/3/19 5:26 PM, Darren Duncan wrote:
> On 2019-02-02 7:22 PM, ToddAndMargo via perl6-users wrote:
>> I need to read a file into a buffer (NO CONVERSIONS!)
>> and then convert it to a string (again with no
>> conversions).
>
> I think you're making an impossible request.

Don't forget that I think everyone on this list is
a bloody genius.

> If preserving exact bytes is important, then you want to keep your
> data in a type that represents a sequence of bytes, such as Blob or
> Buf.  A Str represents a sequence of characters, which are NOT bytes,
> so if you're wanting to have a Str that is saying you don't care
> about the bytes.  Given what you keep saying, I'd say skip the Str
> and just use Buf or Blob etc full stop. -- Darren Duncan



Hi Darren,

    for ( @$BinaryFile ) -> $Char { $StrFile ~= chr($Char); }

Does the trick, but it takes up to 15 seconds.  Way
too slow.

I have another post looking to see if any of the other
decodes will work.  So maybe...

My big issue is that the data I am looking through uses four
NULs in a row as a delimiter.  If these get dropped, I
won't be able to find anything.

Your idea about just skipping Str is along the lines I have
also been thinking.  Brad has been helping me with "index" for
a Buf.  I haven't had a shot at trying his corrections to my
code yet.

-T
0
perl6
2/4/2019 2:33:38 AM