I would love to write: overloaded string operators

((A followup to my message
  Subject: Metathread - Programs I would love to write
))

I would love to write

  use Hypothetical::POD::Parser;
  use String::Tagged::Terminal;
  use Text::Wrap qw( wrap );

  my $text = Hypothetical::POD::Parser->parse("some_file.pod");

  $text = wrap("", "", $text);

  String::Tagged::Terminal->new($text)
    ->print_to_terminal;

This would be nice, as it combines two useful modules:

  * String::Tagged and its various subclasses provide an object class
    that stores a string along with name/value extents within it.
    The hypothetical parser module would return one of these to contain
    the formatting information parsed out of the given POD file.
    String::Tagged::Terminal then uses those to render the formatting
    to the terminal.

  * Text::Wrap conveniently splits paragraphs of text at word
    boundaries, wrapping them into lines no wider than the terminal, so
    words are not split in the middle. It makes nicely printed output.

However, this program currently does not work as expected. The
String::Tagged instance passed in to wrap() gets converted down to a
plain perl string by its stringification operator, so the output that
gets printed contains the right words but has lost all its formatting
information.

(There's nothing particular about Text::Wrap in this example; any
text-handling CPAN module would do this. I am just using Text::Wrap as
a simple example here.)
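
The loss of formatting described above can be reproduced with a small
stand-in class. This is only a sketch: My::TaggedString and its
[start, length, name] tag layout are hypothetical names invented for
illustration, and are not the real String::Tagged API.

```perl
use strict;
use warnings;
use Text::Wrap qw( wrap );

# Hypothetical stand-in for String::Tagged: a string plus a list of
# [start, length, name] tags. Only stringification is overloaded.
package My::TaggedString {
    use overload '""' => sub { $_[0]{text} };

    sub new {
        my ( $class, $text, @tags ) = @_;
        return bless { text => $text, tags => [@tags] }, $class;
    }

    sub tags { return @{ $_[0]{tags} } }
}

my $text = My::TaggedString->new( "Hello world", [ 6, 5, "bold" ] );

# wrap() only ever sees the result of the "" overload, so what comes
# back is a plain perl string with no tags attached.
my $wrapped = wrap( "", "", $text );

printf "before: %s with %d tag(s)\n", ref($text), scalar( () = $text->tags );
printf "after:  %s\n", ref($wrapped) ? ref($wrapped) : "plain string, no tags";
```

The original object still holds its tag afterwards; it is only the
value returned by wrap() that has been flattened.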

This problem arises because Text::Wrap is written using regular perl
string operations like split(), substr(), regexp matches and join(),
none of which can currently be overloaded by a class.
Before I get into more detail on that, I first want to establish the
overall theme for this thread; namely that:

  We believe that providing operator overloads for core string
  operators like split(), substr(), regexps and join() is a useful
  ability to try to achieve.

(Compare to all of the number operator overloads like addition, sqrt()
and sin(), for example)
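
For comparison, here is a sketch of what the numeric side already
allows. My::Dual is a hypothetical class invented for illustration (a
value carried together with its derivative); the '+', 'sqrt' and 'sin'
keys are existing overload names. The function f() is written as
ordinary numeric code and works unchanged on the objects.

```perl
use strict;
use warnings;

# Hypothetical class: a value together with its derivative.
package My::Dual {
    use overload
        '+' => sub {
            my ( $x, $y ) = @_;    # operand order is irrelevant for addition
            $y = My::Dual->new( $y, 0 ) unless ref($y);
            return My::Dual->new( $x->{v} + $y->{v}, $x->{d} + $y->{d} );
        },
        'sqrt' => sub {
            my ($x) = @_;
            my $root = sqrt( $x->{v} );
            return My::Dual->new( $root, $x->{d} / ( 2 * $root ) );
        },
        'sin' => sub {
            my ($x) = @_;
            return My::Dual->new( sin( $x->{v} ), $x->{d} * cos( $x->{v} ) );
        };

    sub new {
        my ( $class, $v, $d ) = @_;
        return bless { v => $v, d => $d }, $class;
    }
}

# Generic numeric code, written with no knowledge of My::Dual.
sub f {
    my ($x) = @_;
    return sqrt($x) + sin($x);
}

my $plain = f(4);
my $dual  = f( My::Dual->new( 4, 1 ) );

printf "plain:  f(4) = %.4f\n", $plain;
printf "object: value %.4f, derivative %.4f\n", $dual->{v}, $dual->{d};
```

The string operators have no equivalent hooks, which is why the same
trick does not work for code like Text::Wrap.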

If we generally agree it'd be nice to be able to write this sort of
thing, then I'll expand more on why it currently doesn't work, why I
believe p5p are the place to begin solving it, and then we can work on
how to fix it.

(There has previously been some discussion on this issue with respect
 to join(), but it got derailed into a minor complication of
 stringification vs. concat operators.

  https://www.nntp.perl.org/group/perl.perl5.porters/2019/06/msg255011.html
)

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk      |  https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/  |  https://www.tindie.com/stores/leonerd/
leonerd
7/3/2019 11:32:19 AM
perl.perl5.porters

"Paul \"LeoNerd\" Evans" <leonerd@leonerd.org.uk> wrote:
[...]
:This problem is due to the fact that Text::Wrap is written using
:regular perl string operations like split(), substr(), regexp matches
:and join(), which do not have ways to define operator overloading.
:Before I get into more detail on that, I first want to establish the
:overall theme for this thread; namely that:
:
:  We believe that providing operator overloads for core string
:  operators like split(), substr(), regexps and join() is a useful
:  ability to try to achieve.

I'd love to have that sort of functionality, at the (rare) point I want
to use it, and I think it conceptually fits well with the rest of perl's
existing overloading model.

However such support is likely to come with additional maintenance
overhead and possibly additional runtime cost, so I think the onus
would be on any proposed implementation to demonstrate that such costs
are commensurate with what is, in the grand scheme of things, a relatively
small benefit.

There is also likely to be a fair bit of new complexity in the rules
around fallback behaviour, which is load on the programmer. Those rules
would want to simultaneously DWIM and be simple and easy to remember.

Hugo
hv
7/5/2019 1:12:06 PM
On Fri, 05 Jul 2019 14:12:06 +0100
hv@crypt.org wrote:

> I'd love to have that sort of functionality, at the (rare) point I
> want to use it, and I think it conceptually fits well with the rest
> of perl's existing overloading model.
> 
> However such support is likely to come with additional maintenance
> overhead and possibly additional runtime cost, so I think the onus
> would be on any proposed implementation to demonstrate that such costs
> are commensurate with what is, in the grand scheme of things, a
> relatively small benefit.

The runtime cost is certainly a valid concern, though largely
mitigated by the fact that overloading comes with AMAGIC, and
pretty much every operator has to check for the presence of magic
anyway just for doing regular reads/writes. All of these operators have
already checked for magic, and in the common case of there not being
any they already know no overloading is going on, and can proceed
as normal.

Obviously I'll have to test and benchmark it to be sure, but I expect
the likely impact of any overloading tests to be fairly minimal in the
grand scheme of things.

In any case, it is hopefully easy enough to justify by the improved
ability to write such flexible code, using arguments analogous to those
that must have initially been made for allowing numbers to have
overloaded addition, despite the cost that imposes on every use of the
`$x + $y` operator anywhere.

> There is also likely to be a fair bit of new complexity in the rules
> around fallback behaviour, which is load on the programmer. Those
> rules would want to simultaneously DWIM and be simple and easy to
> remember.

Yes this is honestly my biggest concern - the fallbacks especially.

My original proposal started - I thought quite modestly - by just asking
for join() to respect string concatenation overloading and in particular
I wanted to make it semantically equivalent to

  reduce { $a . $sep . $b } @strings

but already that seemed to cause some upset, with a suggestion that
join() ought to be subject to its own operator name. Given that join()
takes an entire list of values, it is hard to see at first glance
what the dispatch rules would be for such an operator.
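
To make the difference concrete, here is a sketch comparing join()
with the reduce form on a class that overloads concatenation. The
class My::TaggedString and its [start, length, name] tag layout are
hypothetical, made up for this example; the behaviour shown for join()
is that of a perl without any of the changes being discussed.

```perl
use strict;
use warnings;
use List::Util qw( reduce );

# Hypothetical tagged-string class overloading "" and the . operator.
package My::TaggedString {
    use overload
        '""' => sub { $_[0]{text} },
        '.'  => sub {
            my ( $self, $other, $swapped ) = @_;
            my ( $left, $right ) = $swapped ? ( $other, $self ) : ( $self, $other );
            my ( $ltext, @ltags ) = _parts($left);
            my ( $rtext, @rtags ) = _parts($right);
            my $shift = length $ltext;
            # Tags from the right-hand side move along by the left length.
            return My::TaggedString->new(
                $ltext . $rtext,
                @ltags, map { [ $_->[0] + $shift, $_->[1], $_->[2] ] } @rtags,
            );
        };

    sub new {
        my ( $class, $text, @tags ) = @_;
        return bless { text => $text, tags => [@tags] }, $class;
    }

    sub tags { return @{ $_[0]{tags} } }

    # Accept either an object or a plain string operand.
    sub _parts {
        my ($s) = @_;
        return ref($s) ? ( $s->{text}, @{ $s->{tags} } ) : ($s);
    }
}

my @strings = (
    My::TaggedString->new( "bold", [ 0, 4, "bold" ] ),
    My::TaggedString->new("plain"),
    My::TaggedString->new( "ital", [ 0, 4, "italic" ] ),
);
my $sep = ", ";

my $joined  = join $sep, @strings;                  # stringifies each element
my $reduced = reduce { $a . $sep . $b } @strings;   # uses the . overload

printf "join:   %s\n", ref($joined)  || "plain string";
printf "reduce: %s\n", ref($reduced) || "plain string";
printf "  tag %s at %d+%d\n", @{$_}[ 2, 0, 1 ] for $reduced->tags;
```

Both produce the same characters; only the reduce form keeps the tags,
with the second one shifted to its new position.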

My other precedent on this idea is `overload::substr`[1] which only
overloads the substr() operator. I did have ideas of using it to
synthesize the split() and regexp match operators out of it, because
all of those can be performed by just substr'ing on appropriate
positions. It remains to be seen whether that would be a good plan of
attack still, or whether a more complete set of overloads with fallback
rules between them would be better.
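
As a rough sketch of the "synthesize split() out of substr()" idea:
find the separator positions on the plain stringified text, then cut
the pieces out of the object with a tag-aware substr. Everything here
is hypothetical (the My::TaggedString class, its substr_tagged method
and the split_via_substr function); it does not use overload::substr
itself, and it ignores much of what the real split() does (limits,
captures, empty matches, trailing empty fields).

```perl
use strict;
use warnings;

# Hypothetical tagged-string class; tags are [start, length, name].
package My::TaggedString {
    use overload '""' => sub { $_[0]{text} };

    sub new {
        my ( $class, $text, @tags ) = @_;
        return bless { text => $text, tags => [@tags] }, $class;
    }

    sub tags { return @{ $_[0]{tags} } }

    # Tag-preserving equivalent of substr( $str, $start, $len ).
    sub substr_tagged {
        my ( $self, $start, $len ) = @_;
        my $end = $start + $len;
        my @tags;
        for my $tag ( $self->tags ) {
            my ( $ts, $tl, $name ) = @$tag;
            # Clip the tag to the requested range, then rebase it.
            my $from = $ts > $start      ? $ts       : $start;
            my $to   = $ts + $tl < $end  ? $ts + $tl : $end;
            push @tags, [ $from - $start, $to - $from, $name ] if $to > $from;
        }
        return My::TaggedString->new( substr( $self->{text}, $start, $len ), @tags );
    }
}

# A split() built only from a regexp match on the plain text plus
# substr_tagged() calls on the object.
sub split_via_substr {
    my ( $pattern, $str ) = @_;
    my $plain = "$str";
    my @pieces;
    my $pos = 0;
    while ( $plain =~ /$pattern/g ) {
        my ( $mstart, $mend ) = ( $-[0], $+[0] );
        next if $mend == $mstart;    # this sketch skips empty matches
        push @pieces, $str->substr_tagged( $pos, $mstart - $pos );
        $pos = $mend;
    }
    push @pieces, $str->substr_tagged( $pos, length($plain) - $pos );
    return @pieces;
}

my $str    = My::TaggedString->new( "foo bar baz", [ 4, 3, "bold" ] );
my @pieces = split_via_substr( qr/ /, $str );

for my $piece (@pieces) {
    my @names = map { $_->[2] } $piece->tags;
    printf "%s: %s\n", "$piece", @names ? "@names" : "no tags";
}
```

Whether this position-based approach scales to regexp captures and
substitutions is exactly the open question raised above.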


[1]: https://metacpan.org/pod/overload::substr
     and in particular the 4th bullet point of
     https://metacpan.org/pod/overload::substr#TODO

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk      |  https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/  |  https://www.tindie.com/stores/leonerd/
leonerd
7/5/2019 6:13:07 PM
Ah, I've been toying with doing something like String::Tagged for many
years now (my name would probably be Text::Properties).

Being able to overload string operations would make the idea more
useful, though I would be surprised if it turned out to be practical
to implement at this late date.


doomvox
7/5/2019 7:41:28 PM
Allowing the core string manipulation ops to do indirect method calls
when their operand is a blessed reference would be a step in the right
direction. Until then, could you rewrite these libraries to use method
calls, with methods wrapping the core ops placed in UNIVERSAL:: to
handle plain scalars? That is fragile: it would break on strings that
happen to be package names, but only if those packages have methods
named after these reserved words.

$ perl -le 'sub S::split{print "indirect"}; my $s=bless \(my $o),"S"; split $s,2,3'
$ perl -le 'sub S::split{print "direct"}; my $s=bless \(my $o),"S"; $s->split(2,3)'
direct
$


That is, implementing "rich text" is a SMOP, but the interoperability
semantics are going to be tricky.

davidnicol
7/8/2019 5:37:51 PM