Making join() respect string-concat operator

(this is a simpler rehash of
  https://www.nntp.perl.org/group/perl.perl5.porters/2010/06/msg160741.html )

I would like core's join() operator to respect the string-concat
operator overloading of any arguments passed to it.

I.e. that the result of

  join( $sep, $x, $y, $z )

always be indistinguishable from the result of

  $x . $sep . $y . $sep . $z

even if any of $sep, $x, $y or $z has operator overloading.

In particular, if any of those concat operations returns an object rather
than a plain string, then the overall join() should return that object
too.

As things currently stand, join() always stringifies each argument
individually, then yields a plain string containing all the characters
concatenated into it.
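
A minimal sketch of the difference described above. The Tagged class is a
hypothetical stand-in for a string-like class (it is not the real
String::Tagged), and concat_join() is an illustrative helper that folds its
arguments with '.', which is roughly the behaviour being requested of join():

```perl
use strict;
use warnings;

# Hypothetical stand-in for a string-like class; not the real String::Tagged.
package Tagged;
use Scalar::Util qw(blessed);
use overload
    '""' => sub { $_[0]{text} },
    '.'  => \&_concat;

sub new {
    my ($class, $text, @tags) = @_;
    return bless { text => $text, tags => [@tags] }, $class;
}

# Concat handler: returns a new object and keeps the tags of both operands.
sub _concat {
    my ($self, $other, $swapped) = @_;
    $other = Tagged->new($other)
        unless blessed($other) && $other->isa('Tagged');
    my ($left, $right) = $swapped ? ($other, $self) : ($self, $other);
    return Tagged->new(
        $left->{text} . $right->{text},
        @{ $left->{tags} }, @{ $right->{tags} },
    );
}

package main;

# Hypothetical helper with the proposed semantics: fold the list with '.'.
sub concat_join {
    my ($sep, @items) = @_;
    return '' unless @items;
    my $result = shift @items;
    $result = $result . $sep . $_ for @items;
    return $result;
}

my $x = Tagged->new('foo', 'bold');
my $y = Tagged->new('bar', 'italic');

my $joined  = join(', ', $x, $y);          # stringifies each argument
my $chained = $x . ', ' . $y;              # goes through the '.' overload
my $folded  = concat_join(', ', $x, $y);   # same result as the chain

printf "%-14s %-8s %s\n", 'join()',        ref($joined)  || 'plain', $joined;
printf "%-14s %-8s %s\n", 'explicit .',    ref($chained) || 'plain', $chained;
printf "%-14s %-8s %s\n", 'concat_join()', ref($folded)  || 'plain', $folded;
```

With current join(), the tags carried by $x and $y are lost; the explicit
chain and the fold both keep them.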

I am happy to write docs and tests, and implement this.

---

Stating my interest: I am the author of https://metacpan.org/pod/String::Tagged

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk      |  https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/  |  https://www.tindie.com/stores/leonerd/
leonerd
6/4/2019 5:36:25 PM

On Tue, Jun 04, 2019 at 06:36:25PM +0100, Paul "LeoNerd" Evans wrote:
> (this is a simpler rehash of
>   https://www.nntp.perl.org/group/perl.perl5.porters/2010/06/msg160741.html )
> 
> I would like core's join() operator to respect the string-concat
> operator overloading of any arguments passed to it.
> 
> I.e. that the result of
> 
>   join( $sep, $x, $y, $z )
> 
> always be indistinguishable from the result of
> 
>   $x . $sep . $y . $sep . $z
> 
> even if any of $sep, $x, $y or $z has operator overloading.
> 
> In particular, if any of those concat operations returns an object rather
> than a plain string, then the overall join() should return that object
> too.
> 
> As things currently stand, join() always stringifies each argument
> individually, then yields a plain string containing all the characters
> concatenated into it.


I'm not very keen on the idea. It's making join() into a very special
case when it comes to overloading. If we were to go down the path of
making perl builtin functions overloadable, then we really ought to
add a facility for overloading them directly (cf. the mathematical
functions sin, cos etc., which are already overloadable).

Not that I'm terribly keen on that either.

Your proposal would change how often $sep is evaluated. At the moment
if it is magical, its magic is called once. Then if the result of the
magic call is overloaded, the '""' method is called once.

If the join is treated as
    $x . $sep . $y . $sep . $z
then the magic on $sep would be called multiple times, each time
potentially returning a different value.
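
A small sketch of that point, using a tied scalar as the "magical" $sep and
counting how often its FETCH method runs. The CountingSep class is
hypothetical, written only for this illustration, and the exact counts may
depend on the perl version, so none are stated here:

```perl
use strict;
use warnings;

# Hypothetical tie class that counts how often the value is fetched.
package CountingSep;
sub TIESCALAR {
    my ($class, $value) = @_;
    return bless { value => $value, fetches => 0 }, $class;
}
sub FETCH { my ($self) = @_; $self->{fetches}++; return $self->{value} }
sub STORE { my ($self, $value) = @_; $self->{value} = $value }

package main;

tie my $sep, 'CountingSep', ', ';
my $counter = tied $sep;
my ($x, $y, $z) = qw(a b c);

# join() reads the separator up front.
$counter->{fetches} = 0;
my $joined = join($sep, $x, $y, $z);
my $join_fetches = $counter->{fetches};

# The explicit chain mentions $sep twice, so it is read at each mention.
$counter->{fetches} = 0;
my $chained = $x . $sep . $y . $sep . $z;
my $concat_fetches = $counter->{fetches};

print "join():     FETCH ran $join_fetches time(s)\n";
print "explicit .: FETCH ran $concat_fetches time(s)\n";
```

If FETCH returned a different value on each call, the two forms could also
produce different strings.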

Also, if we're going down the path of "find places where perl concatenates
strings and if so call the overload concatenation method", then where do
we stop? Do we support overloading in Perl_sv_catsv() and friends?

At the moment it's reasonably clear that where you explicitly use the '.'
operator (and double-quoted strings, which are syntactic sugar for
concatenation) you should expect concat overloading to be honoured.

We should keep a clear distinction between:

    concat operator '.' appears in the source code

and

    perl concatenates two strings for some reason.

Only the first should support concat overloading.
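
A rough illustration of where that line currently falls, assuming a
hypothetical Counted class written for this sketch. The explicit forms
('.', '.=', interpolation) reach the '.' handler and return an object;
sprintf() and join() only stringify:

```perl
use strict;
use warnings;

# Hypothetical class that counts calls to its '.' handler.
package Counted;
our $concat_calls = 0;
use overload
    '""' => sub { $_[0]{text} },
    '.'  => sub {
        my ($self, $other, $swapped) = @_;
        $concat_calls++;
        my $other_text = ref($other) ? $other->{text} : $other;
        my $text = $swapped ? $other_text . $self->{text}
                            : $self->{text} . $other_text;
        return bless { text => $text }, ref $self;
    };
sub new { my ($class, $text) = @_; return bless { text => $text }, $class }

package main;

my $obj = Counted->new('foo');

# Concatenation that appears in the source code.
my $explicit     = $obj . '!';
my $interpolated = "$obj!";
my $appended     = '>';
$appended .= $obj;

# Concatenation that perl performs internally.
my $formatted = sprintf('%s!', $obj);
my $joined    = join('', $obj, '!');

for my $case (
    [ 'explicit .'    => $explicit ],
    [ 'interpolation' => $interpolated ],
    [ '.='            => $appended ],
    [ 'sprintf'       => $formatted ],
    [ 'join'          => $joined ],
) {
    my ($label, $value) = @$case;
    printf "%-14s %s\n", $label, ref($value) || 'plain string';
}
print "'.' handler calls: $Counted::concat_calls\n";
```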


-- 
Fire extinguisher (n) a device for holding open fire doors.
davem
6/5/2019 9:30:30 AM
On Wed, Jun 5, 2019 at 5:30 AM Dave Mitchell <davem@iabyn.com> wrote:
> At the moment it's reasonably clear that where you explicitly use the '.'
> operator (and double-quoted strings, which are syntactic sugar for
> concatenation) you should expect concat overloading to be honoured.
[...]
>
> We should keep a clear distinction between:
>
>     concat operator '.' appears in the source code
>
> and
>
>     perl concatenates two strings for some reason.

To me "" seems like the latter, not the former.

This is probably documented somewhere but I find it surprising that
"$foo" calls concat overloading on $foo before string overloading.

Anyway... my concern with changing this is the performance
implications Dave mentioned, and also what problems this change in
behaviour might introduce to existing code. We can test CPAN for that
but not DARKPAN.

-- Matthew Horsfall (alh)
wolfsage
6/10/2019 2:22:09 PM
On Mon, Jun 10, 2019 at 10:22:09AM -0400, Matthew Horsfall (alh) wrote:
> On Wed, Jun 5, 2019 at 5:30 AM Dave Mitchell <davem@iabyn.com> wrote:
> > At the moment it's reasonably clear that where you explicitly use the '.'
> > operator (and double-quoted strings, which are syntactic sugar for
> > concatenation) you should expect concat overloading to be honoured.
> [...]
> >
> > We should keep a clear distinction between:
> >
> >     concat operator '.' appears in the source code
> >
> > and
> >
> >     perl concatenates two strings for some reason.
> 
> To me "" seems like the latter, not the former.
> 
> This is probably documented somewhere but I find it surprising that
> "$foo" calls concat overloading on $foo before string overloading.

I'm not sure I follow. "$foo" calls '""' overloading, not '.':

This:

    use overload
        '""' => sub { print "STRFY($_[0][0])\n"; $_[0][0] },
        '.'  => sub { print "CONCAT($_[0][0], $_[1][0])\n";
                        bless [ $_[0][0] . $_[1][0] ] }
        ;

    my $s = bless [ "foo" ];
    my $t = "$s";

outputs:

    STRFY(foo)


-- 
Diplomacy is telling someone to go to hell in such a way that they'll
look forward to the trip
davem
6/11/2019 11:22:22 AM
On Tue, Jun 11, 2019 at 7:22 AM Dave Mitchell <davem@iabyn.com> wrote:
> I'm not sure I follow. "$foo" calls '""' overloading, not '.':

Sorry, my example had one too few arguments:

alh@hyrule:~$ perl
   use overload
        '""' => sub { print "STRFY($_[0][0])\n"; $_[0][0] },
        '.'  => sub { print "CONCAT($_[0][0], $_[1][0])\n";
                        bless [ $_[0][0] . $_[1][0] ] }
        ;

    my $s = bless [ "foo" ];
    my $t = "$s $s";
CONCAT(foo, )
CONCAT(foo, foo)

-- Matthew Horsfall (alh)
wolfsage
6/11/2019 2:29:28 PM
On Tue, Jun 11, 2019 at 10:29:28AM -0400, Matthew Horsfall (alh) wrote:
> Sorry, my example had one too few arguments:
> 
> alh@hyrule:~$ perl
>    use overload
>         '""' => sub { print "STRFY($_[0][0])\n"; $_[0][0] },
>         '.'  => sub { print "CONCAT($_[0][0], $_[1][0])\n";
>                         bless [ $_[0][0] . $_[1][0] ] }
>         ;
> 
>     my $s = bless [ "foo" ];
>     my $t = "$s $s";
> CONCAT(foo, )
> CONCAT(foo, foo)

Yeah, but double-quotish string interpolation is well understood(*) to
be just syntactic sugar for string concatenation, and indeed compiles to
OP_CONCAT ops.

(*) for some definition of "well".

-- 
Music lesson: a symbiotic relationship whereby a pupil's embellishments
concerning the amount of practice performed since the last lesson are
rewarded with embellishments from the teacher concerning the pupil's
progress over the corresponding period.
davem
6/12/2019 9:32:15 AM
There are some exciting discussions happening on this thread, but not
really going in the direction I intended to go.

Let me maybe start again.

  TL;DR: I want to be able to write stringy algorithms that work nicely
    on overloaded objects, the way that numerical algorithms already do.

Observe, in Perl, that we have a rich set of numerical operators for
doing all sorts of complicated maths work. Observe also that they all
form a nice well-behaved set with respect to operator overloading,
allowing such modules as Math::BigRat to exist. This allows someone to
write a numerical algorithm, say, without any knowledge of bigrats or
upfront design for them, and yet because the operators all play
nicely, a user can trade runtime for precision and use Math::BigRat to
get answers as precise as they want.
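
As a small sketch of that property: the mean() function below is an
illustrative example written with no knowledge of Math::BigRat, yet it
returns an exact rational when handed Math::BigRat objects, because '+=' and
'/' dispatch to the class's overloads:

```perl
use strict;
use warnings;
use Math::BigRat;

# A generic numerical routine; nothing here mentions Math::BigRat.
sub mean {
    my @values = @_;
    my $sum = 0;
    $sum += $_ for @values;
    return $sum / scalar @values;
}

# Plain floating-point inputs give a floating-point answer.
my $float_mean = mean(1/3, 1/6);

# Math::BigRat inputs give an exact rational answer from the same code.
my $exact_mean = mean(Math::BigRat->new('1/3'), Math::BigRat->new('1/6'));

print "floats:  $float_mean\n";
print "bigrats: $exact_mean\n";
```

There is no equivalent for strings: a routine built on substr(), split() or
join() flattens a string-like object to a plain string along the way.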

Compare this to the relatively operator-poor world of strings. I have
written a module, String::Tagged, which acts and feels like a string
but stores extra data in extents across it, typically used for
formatting or similar. Because perl's string operators aren't anywhere
near as nicely overloadable, I can't put these objects into some random
string-processing module, say, Text::Wrap, and have it Just Work in
anywhere near the same neatness as Math::BigRat works for numbers.

It does feel a shame that Perl, a language that is traditionally good
at text manipulation, can only manipulate plain strings in such a nice
way, and doesn't let you add extra semantics to string-like object types
and actually have them behave with the core operators.

I did start off on an experiment to see how easily I could fix this;
already I've created overload::substr which adds a new `substr`
overloading slot:

  https://metacpan.org/pod/overload::substr

This actually works, provided the module is loaded early (because it
does Evil Evil Things). With this module loaded, any string-like object
class such as String::Tagged now behaves nicely with respect to the
substr() core function. Other modules, such as Text::Wrap, don't now
have to care at all - they can operate on String::Tagged (or any other
stringy object class) transparently.

Via a bit of learning how the regexp engine works I could use the same
trick to make split() use the same substr operator to extract the
"pieces", as well as maybe make regexp match and substitution work.
While I'm there I could have join() behave with string concat.

I am somewhat hesitant to do that in the manner used in this module,
because of the Evil Evil Things alluded to above. What I do is replace
the core PL_ppaddr[OP_SUBSTR] with a pointer to my own overloaded logic:

  https://metacpan.org/source/PEVANS/overload-substr-0.03/lib/overload/substr.xs#L201

This doesn't work for any code that has already been compiled, so the
entire hack is very sensitive to the exact order that modules are
`use`d. In addition, I don't know of a way that a module which isn't
core perl can nicely provide the $1, $2, ... variables in a lazy way. I
could perhaps pick a static number, say 10, and fix those, but then it
would break for $11. Only core perl, via its various magic tables, can
implement that at all.

It would be nice if support for these overloaded operators was moved
into core perl, so that module load order didn't matter, and then it
would become possible to write nice string-like object classes which
actually behave properly against text processing algorithms.

Such as Perl is famous for.

Thanks,

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk      |  https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/  |  https://www.tindie.com/stores/leonerd/
leonerd
6/14/2019 3:08:09 PM
On Fri, Jun 14, 2019 at 10:08 AM Paul "LeoNerd" Evans <
leonerd@leonerd.org.uk> wrote:

> Compare this to the relatively operator-poor world of strings. I have
> written a module, String::Tagged, which acts and feels like a string
> but stores extra data in extents across it, typically used for
> formatting or similar.



For what it's worth, I like to refer to compound objects that contain many
smaller strings (formatted documents, deferred template-filling outputs,
etc) as "ropes." Because they're made of many strings.

-- 
"Plant yourself like a tree beside the river of truth and tell the whole
world: No, you move."

davidnicol
6/14/2019 5:41:26 PM
On Fri, Jun 14, 2019 at 1:42 PM David Nicol <davidnicol@gmail.com> wrote:

> For what it's worth, I like to refer to compound objects that contain many
> smaller strings (formatted documents, deferred template-filling outputs,
> etc) as "ropes." Because they're made of many strings.
>

I've also seen "twine": https://metacpan.org/pod/XML::Easy::NodeBasics#Twine

-Dan

grinnz
6/14/2019 6:21:54 PM
On Fri, Jun 14, 2019 at 6:42 PM David Nicol <davidnicol@gmail.com> wrote:
> For what it's worth, I like to refer to compound objects that contain many smaller strings (formatted documents, deferred template-filling outputs, etc) as "ropes." Because they're made of many strings.


 Not to be confused with https://en.wikipedia.org/wiki/Rope_(data_structure)?
rich
6/14/2019 6:53:50 PM