[Pugs] A couple of string interpolation edge cases

I stumbled across a couple of interesting quote interpolation
edge cases:

Case 1
------

# cat ttt.p6
my $x = "{";

# pugs ttt.p6

unexpected end of input
expecting "\"", "$!", "$/", "\\" or block
NonTerm SourcePos "ttt.p6" 2 1

Is this a bug?

Case 2
------

# cat q1.pl
my $x = "$";
print "x='$x'\n";

# perl -w q1.pl
Final $ should be \$ or $name at q1.pl line 1, within string
syntax error at q1.pl line 1, near "= "$""
Execution of q1.pl aborted due to compilation errors.

# pugs q1.pl
x='$'

Wow, is pugs better than p5 here? ;-)

/-\


0
ajsavige
3/26/2005 4:32:18 AM
perl.perl6.compiler


On Sat, Mar 26, 2005 at 03:32:18PM +1100, Andrew Savige wrote:
: I stumbled across a couple of interesting quote interpolation
: edge cases:
: 
: Case 1
: ------
: 
: # cat ttt.p6
: my $x = "{";
: 
: # pugs ttt.p6
: 
: unexpected end of input
: expecting "\"", "$!", "$/", "\\" or block
: NonTerm SourcePos "ttt.p6" 2 1
: 
: Is this a bug?

No, but it could probably use a better error message about a possible
runaway closure in the string, much as Perl 5 warns about runaway
strings.
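
To make the "runaway closure" diagnostic concrete, here is a minimal sketch in Python (not Perl 6, and not how Pugs does it). It scans one double-quoted literal, tracks `{ ... }` nesting, and reports a closure that is still open at end of input. The function name and the wording of the messages are assumptions made for illustration.

```python
def check_interpolating_string(source: str) -> str:
    """Scan one double-quoted literal and return its raw body.

    Raises SyntaxError with a hint when the input ends inside a { ... }
    closure. Simplification: quotes inside a closure are treated as code.
    """
    if not source.startswith('"'):
        raise ValueError("expected a double-quoted literal")
    depth = 0             # nesting depth of { } closures
    closure_start = None  # index of the outermost unclosed brace
    i = 1
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            i += 2        # skip the backslash and the escaped character
            continue
        if ch == "{":
            if depth == 0:
                closure_start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
        elif ch == '"' and depth == 0:
            return source[1:i]
        i += 1
    if depth > 0:
        raise SyntaxError(
            "unexpected end of input: possible runaway closure "
            f"starting at column {closure_start + 1}; "
            "write \\{ for a literal brace"
        )
    raise SyntaxError("unexpected end of input: possible runaway string")
```

Fed the text of ttt.p6 from Case 1, this raises the closure hint instead of a bare "unexpected end of input".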

: Case 2
: ------
: 
: # cat q1.pl
: my $x = "$";
: print "x='$x'\n";
: 
: # perl -w q1.pl
: Final $ should be \$ or $name at q1.pl line 1, within string
: syntax error at q1.pl line 1, near "= "$""
: Execution of q1.pl aborted due to compilation errors.
: 
: # pugs q1.pl
: x='$'
: 
: Wow, is pugs better than p5 here? ;-)

Well, yes, actually.  Perl 5 manages to catch the error only because
it does bogus terminator-parsing lookahead.  Pugs ought to have got a
syntax error on the backslash, I think, and reported a runaway quote
as the likely cause.

Though in fact, with a backtracking parser you should be able to
determine the point at which a different decision would have produced
a successful parse.  We shouldn't generate code from the successful
parse once we're in error state, but it could give us much better
diagnostics in certain cases if we install backtracking decision points
for things that look suspiciously like they meant something else.
In this particular case the parser could have intuited exactly what
went wrong.

Hmm, well, if it got that far.  Given strict being on by default,
this particular example should probably just die on the fact that $"
isn't declared, since there's no $" in Perl 6.

Larry
0
larry
3/26/2005 6:03:45 AM
On Fri, Mar 25, 2005 at 10:03:45PM -0800, Larry Wall wrote:
> Hmm, well, if it got that far.  Given strict being on by default,
> this particular example should probably just die on the fact that $"
> isn't declared, since there's no $" in Perl 6.

Is $" okay as a variable name?  Is everything from perlvar.pod legal? :)

    my $" = 3;

Pugs parses that because it only considers $! and $/ as legal
symbolic variable names.

Thanks,
/Autrijus/

0
autrijus
3/26/2005 6:11:29 AM
Andrew Savige writes:
> I stumbled across a couple of interesting quote interpolation
> edge cases:
> 
> Case 1
> ------
> 
> # cat ttt.p6
> my $x = "{";
> 
> # pugs ttt.p6
> 
> unexpected end of input
> expecting "\"", "$!", "$/", "\\" or block
> NonTerm SourcePos "ttt.p6" 2 1
> 
> Is this a bug?

No.  Braces in strings are interpolator markers:

    say "The output of foo is { foo() }";

> Case 2
> ------
> 
> # cat q1.pl
> my $x = "$";
> print "x='$x'\n";
> 
> # perl -w q1.pl
> Final $ should be \$ or $name at q1.pl line 1, within string
> syntax error at q1.pl line 1, near "= "$""
> Execution of q1.pl aborted due to compilation errors.
> 
> # pugs q1.pl
> x='$'

Hmm, given that we require no whitespace between the $ and the
identifier now, we might treat a $ not followed by \w or / as literal.
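
A rough sketch of that rule, written in Python for the sake of something runnable: a `$` interpolates only when it is immediately followed by word characters or `/`, and is otherwise left as a literal. The `interpolate` helper and the dictionary of variables are illustrative assumptions, not anything Pugs provides.

```python
import re

# "$name" and "$/" are variable references; any other "$" stays literal.
_VAR = re.compile(r"\$(\w+|/)")


def interpolate(template: str, variables: dict) -> str:
    """Expand $name references; an unknown name raises KeyError (strict-like)."""
    def replace(match):
        return str(variables[match.group(1)])
    return _VAR.sub(replace, template)
```

Under this rule the q1.pl string from Case 2 is simply a one-character string, so `interpolate("x='$x'", {"x": interpolate("$", {})})` gives `x='$'`, matching what pugs printed.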

Luke
0
luke
3/26/2005 6:14:22 AM
On Sat, Mar 26, 2005 at 02:11:29PM +0800, Autrijus Tang wrote:
: On Fri, Mar 25, 2005 at 10:03:45PM -0800, Larry Wall wrote:
: > Hmm, well, if it got that far.  Given strict being on by default,
: > this particular example should probably just die on the fact that $"
: > isn't declared, since there's no $" in Perl 6.
: 
: Is $" okay as a variable name?  Is everything from perlvar.pod legal? :)

Considering nobody's written perlvar.pod for Perl 6 yet, yeah, everything
in that pod is legal.  :-)

:     my $" = 3;
: 
: Pugs parses that because it only considers $! and $/ as legal
: symbolic variable names.

$! will be a legal variable name.  $/ is going away, as is $", which
means they fail under "use strict", but they'd still autocreate
globals under laxity as Perl 5 does.  (I know Perl 5 exempted all
special variables from strict, but I don't see why we have to do
that for Perl 6.  Merely having $_ in the lexical scope or $*! in the
global scope should be sufficient declaration to get around strict.
Though perhaps we can exempt people from having to write $*! under
strict.  In fact, that probably goes for all predeclared $* names,
so $IN is legal for $*IN as long as you don't have "my $IN" hiding
it.  Another way to look at it is that * variables are basically
autodeclared "our" implicitly in the outermost lexical scope.)

Sigh, I'd better rough it all in here, even if I don't have time to
do a good job on it.  Maybe somebody can beat this into a real S28 pod.

$? and $@ are gone, merged in with $!.  (Frees up ? twigil for $?FOO
syntax.)  $^E is merged too.  $! is an object with as much info as
you'd like on the current exception (unthrown outside of CATCH, thrown
inside).  Unthrown exceptions are typically interesting values of undef.

$$ is now $*PID.  ($$foo is now unambiguous.)

$0 is gone in favor of $*PROGRAM_NAME or some such.

Anything that varied with the selected output filehandle like $|
is now a method on that filehandle, and the variables don't exist.
(The p5-to-p6 translator will probably end up depending on some
$Perl5ish::selected_output_filehandle variable to emulate Perl 5's
single-arg select().)  Likewise $/ and $. should be attached to
a particular input filehandle.  (In fact, $/ is now the result of
the last regular expression match, though we might keep the idea of
$. around in some form or other just because it's awfully handy for
error messages.  But the localizing $. business is yucky.  We have
to clean that up.)

All the special format variables ($%, $=, $-, $:, $~, $^, $^A, $^L)
are gone.  (Frees up the = twigil for %= POD doc structures and
old __DATA__ stream, the : twigil for private attributes, and the ~
twigil for autodeclared parameters.)

$`, $', and $+ don't exist any more, but you can dig that info out
of $/'s structures.  Shortcuts into $/ include $1, $2, and such, and
the newfangled $<foo> things.  Also, $& is changed to $0 for the whole
matched string.  $` and $' may be $<pre> and $<post>, but you probably
have to explicitly match <pre> and <post> to get them remembered,
so we don't have a repeat of the Perl 5 sawampersand fiasco.  <pre>
and <post> would automatically exclude themselves from $0.  Or you
need some special flag to remember them, maybe.

%+ and %- are gone.  $0, $1, $2,  etc. are all objects that know
where they .start and .end.  (Mind you, those methods return magical
positions that are Unicode level independent.)

$* and $# have been deprecated half of forever and are gone.  $[
is a fossil that I suppose could turn into an evil pragma, if we
try to translate it at all.  (Frees up * twigil for $*FOO syntax.)

$(, $), $<, and $> should all change to various $*FOO names.  $] is either
something in $* or a trait of the Perl namespace.  Likewise $^V, if
they aren't in fact merged.

${...} is reserved for hard refs only now.  ($::(...) must be used
for symbolics refs.)  ${^foo} should just change to $*foo or $*_foo
or some such.

$; is gone because the multidim hash hack is gone.  $" is gone,
replaced by @foo.join(":") or some such.  Likewise for $, in print
statements.

We never did find a use for $}, thank goodness.

And we still are keeping $_ around, though it's lexically scoped.

Let's see, what other damage can we do to perlvar.  $a and $b are
no longer special.  No bareword filehandles.  $*IN, $*OUT, $*ERR.
Args come in @*ARGS rather than @ARGV.  (Environment still in %ENV,
will wonders never cease.)  I don't know whether @INC and %INC will
make as much sense when we're looking up installed modules in a database,
though I suppose you still have to let the user add places to look.

%SIG is now %*SIG.  The __DIE__ and __WARN__ hooks should be brought
out as separate &*ON_DIE and &*ON_WARN variables--they really
have nothing to do with signals.  I suppose we could even do away
with %SIG and replace it with &*ON_SIGINT and such, though then we'd
lose a bit of signal introspection which would have to be provided
some other way.  Oh, and we probably ought to split out &?ON_PARSEERROR
from &*ON_DIE to get rid of the $^S fiasco of Perl 5.

$^C, $^D, $^F, $^I, $^M, $^O, $^P, $^S, $^T, $^V, $^X are all renamed
to something $*FOOish, at least the ones that aren't going away entirely.

$^W is too blunt an instrument even in Perl 5, so it's probably gone.

I'm not quite sure what to do with $^N or $^R yet.  Most likely they
end up as something $<foo>ish, if they stay.

You weren't ever supposed to know about $^H and %^H.  Or %{^FNORD}...

Other things might show up as global variables in support of
command-line options, like $*ARGVOUT or @*F.  Some of the special
variables we've blissfully relegated to the trash heap might
creep back in as global variables that just happen to know about
$*Perl5ish::current_selected_filehandle and such, but we should
probably try to keep them as lvalue subs in &Perl5ish::ors() and such.

Anyway, it's all negotiable, except for the parts that aren't.

Larry
0
larry
3/26/2005 8:27:24 AM
--- Luke Palmer wrote:
> Andrew Savige writes:
> > I stumbled across a couple of interesting quote interpolation
> > edge cases:

Just toppled over the edge of another two sand traps.

Case 3
------

# cat q7.p6
my $x = '\\x';
print "x='$x'\n";

# perl -w q7.p6
x='\x'

# pugs q7.p6
x='\\x'

Case 4
------

# cat q8.p6
my $x = '\\';
print "x='$x'\n";

# perl -w q8.p6
x='\'

# pugs q8.p6

unexpected "'"
expecting word character, "::", term postfix, operator, postfix conditional,
postfix loop, postfix iteration, ";" or end of input
NonTerm SourcePos "q8.p6" 2 13

/-\


0
ajsavige
3/26/2005 9:06:29 AM
Larry Wall creates Sish28:
> On Sat, Mar 26, 2005 at 02:11:29PM +0800, Autrijus Tang wrote:
> : On Fri, Mar 25, 2005 at 10:03:45PM -0800, Larry Wall wrote:
> : > Hmm, well, if it got that far.  Given strict being on by default,
> : > this particular example should probably just die on the fact that $"
> : > isn't declared, since there's no $" in Perl 6.
> : 
> : Is $" okay as a variable name?  Is everything from perlvar.pod legal? :)
> 
> Considering nobody's written perlvar.pod for Perl 6 yet, yeah, everything
> in that pod is legal.  :-)
> 
> :     my $" = 3;
> : 
> : Pugs parses that because it only considers $! and $/ as legal
> : symbolic variable names.
> 
> $! will be a legal variable name.  $/ is going away, 

By which you mean that $/ is turning into a special $0.

> Anything that varied with the selected output filehandle like $|
> is now a method on that filehandle, and the variables don't exist.
> (The p5-to-p6 translator will probably end up depending on some
> $Perl5ish::selected_output_filehandle variable to emulate Perl 5's
> single-arg select().)

I think $| et al. could just translate to methods on $*OUT, and select
would look like this:

    sub perl5_select($fh) {
        $*OUT = $fh;
    }

Is there some subtlety that that doesn't cover?

> %+ and %- are gone.  $0, $1, $2,  etc. are all objects that know
> where they .start and .end.  (Mind you, those methods return magical
> positions that are Unicode level independent.)

Uh, it might be a bad idea to make $# objects.  It might not, but it
might.  I think it would be fine if they turned into regular strings
upon assignment (and to pass their full objecthood around, you'd have to
backwhack them).  But the problem with keeping them objects is that if
you put them somewhere else and change them, they turn back into regular
strings without .start and .end, which may be a hard-to-track-down bug
if you're thinking that they stay objects... haven't really thought
about this much (and my head is irritatingly foggy at the moment).

> $; is gone because the multidim hash hack is gone.

Funny, I never used the multidim hash hack, I just emulated it:

    $hash{"$foo$;$bar"} = $value;

> We never did find a use for $}, thank goodness.

Isn't that the "enable all of Damian's unpublished modules" variable?

> $^W is too blunt an instrument even in Perl 5, so it's probably gone.

Well, almost.  When writing a recent module, I found that one of the
modules I was using was spitting out an error from its own internal code
on one of my calls, and there was nothing wrong with the call.  I
submitted a bug report to the author, and searched for a way to shut it
up so my users wouldn't complain at me.  I ended up having to use $^W
at compile time (and it looks very hackish).  We ought to have a
(perhaps not quite as hackish) ability to say "there's no reason for
that warning, but I can't modify your code, so just be quiet".

> I'm not quite sure what to do with $^N or $^R yet.  Most likely they
> end up as something $<foo>ish, if they stay.

For $^N, how about $/[-1]?

Luke
0
luke
3/26/2005 10:37:41 AM
On Sat, 2005-03-26 at 00:27 -0800, Larry Wall wrote:

> $$ is now $*PID.  ($$foo is now unambiguous.)
> 
> $0 is gone in favor of $*PROGRAM_NAME or some such.

You know, Java did one thing in this respect that I liked, and managed
to do it in a way that I couldn't stand. The idea of program as object
was nice, but they made the programmer manage it, which was really kind
of silly.

If you think of the OS-level shell around a Perl interpreter as an
object, and make perl manage that for you, then this falls out rather
nicely:

	$*PID := $*PROC.pid;
	$*PPID := $*PROC.ppid;
	$*PROGRAM_NAME := ~$*PROC;

Perhaps even some often-used data could be shoved in there:

	$life = time() - $*PROC.start_time;

In fact, it seems like a good place for any OS-level globals:

	$*IN := $*PROC.pio_in // $*PROC.stdin;
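
For readers who want to see the shape of this, a small Python sketch of a process object with convenience aliases layered on top. The class name `Proc` and the attribute names are assumptions that mirror the Perl 6 spellings above. `start_time` here records when the object was created, which only approximates the real process start.

```python
import os
import sys
import time


class Proc:
    """Proxy object for the OS-level process running the interpreter."""

    def __init__(self):
        # Approximation: creation time of this object, not of the process.
        self.start_time = time.time()

    @property
    def pid(self) -> int:
        return os.getpid()

    @property
    def ppid(self) -> int:
        return os.getppid()

    def __str__(self) -> str:
        # Stringifying the process yields the program name.
        return sys.argv[0]


PROC = Proc()


def PID() -> int:
    """Alias that always reads through the process object."""
    return PROC.pid
```

The alias is a function rather than a copied value so that it keeps tracking the object, which is roughly what binding with `:=` would give.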

If we consider $*PROC to be the invocant of the implicit "main", then:

	say "I am number {.pid}, who is number 1?";

works just fine in global context. This also gives you a nice simple way
to drill down into your interpreter / runtime / VM / whatever state:

	say "I'm {.name} running under {.interp.name}";


0
ajs
3/26/2005 2:59:10 PM
On Sat, Mar 26, 2005 at 03:37:41AM -0700, Luke Palmer wrote:
: > $! will be a legal variable name.  $/ is going away, 
: 
: By which you mean that $/ is turning into a special $0.

I'd say that $0 is a specialization of $/, but yes, basically, they
both represent the current match result, albeit differently.  $0 is
explicitly what would have been returned by $1 if you'd put parens
around the entire match, which is not quite the same as the complete match
result.

: > Anything that varied with the selected output filehandle like $|
: > is now a method on that filehandle, and the variables don't exist.
: > (The p5-to-p6 translator will probably end up depending on some
: > $Perl5ish::selected_output_filehandle variable to emulate Perl 5's
: > single-arg select().)
: 
: I think $| et al. could just translate to methods on $*OUT, and select
: would look like this:
: 
:     sub perl5_select($fh) {
:         $*OUT = $fh;
:     }
: 
: Is there some subtlety that that doesn't cover?

Like, it renders standard output nameless?  In Perl 5, the selected output
handle is a level of indirection above the standard names for the streams
attached to fd 0, 1, and 2.  Saying select(FH) doesn't change the meaning
of STDOUT.
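
The indirection can be sketched as follows, in Python and with in-memory streams standing in for real file descriptors. The `Handles` class is hypothetical. The point it shows is that selecting a handle moves a separate "current output" pointer and leaves the name for standard output alone.

```python
import io


class Handles:
    """Keeps the standard name and the selected handle as separate slots."""

    def __init__(self):
        self.stdout = io.StringIO()  # stands in for the stream on fd 1
        self.selected = self.stdout  # default target of a bare print

    def select(self, fh):
        """Perl 5-style one-arg select: returns the previously selected handle."""
        previous = self.selected
        self.selected = fh
        return previous

    def print(self, text, fh=None):
        # A bare print goes to the selected handle; an explicit handle wins.
        target = fh if fh is not None else self.selected
        target.write(text)
```

Assigning the new handle straight to `stdout`, as in the proposed perl5_select, would collapse the two slots into one and lose the way back to the original stream.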

: > %+ and %- are gone.  $0, $1, $2,  etc. are all objects that know
: > where they .start and .end.  (Mind you, those methods return magical
: > positions that are Unicode level independent.)
: 
: Uh, it might be a bad idea to make $# objects.  It might not, but it
: might.  I think it would be fine if they turned into regular strings
: upon assignment (and to pass their full objecthood around, you'd have to
: backwhack them).  But the problem with keeping them objects is that if
: you put them somewhere else and change them, they turn back into regular
: strings without .start and .end, which may be a hard-to-track-down bug
: if you're thinking that they stay objects... haven't really thought
: about this much (and my head is irritatingly foggy at the moment).

My head is always irritatingly foggy.  :-)

Anyway, I'm thinking of them more as COW objects, and they'd have to know
if their original string was yanked out from under them in any case, so
that's probably the correct moment to invalidate .start and .end, if
we even bother.

: > $; is gone because the multidim hash hack is gone.
: 
: Funny, I never used the multidim hash hack, I just emulated it:
: 
:     $hash{"$foo$;$bar"} = $value;

Well, guess how we'll emulate it in Perl 6.  :-)

: > We never did find a use for $}, thank goodness.
: 
: Isn't that the "enable all of Damian's unpublished modules" variable?

Shh.  Impressionable people are listening.

: > $^W is too blunt an instrument even in Perl 5, so it's probably gone.
: 
: Well, almost.  When writing a recent module, I found that one of the
: modules I was using was spitting out an error from its own internal code
: on one of my calls, and there was nothing wrong with the call.  I
: submitted a bug report to the author, and searched for a way to shut it
: up so my users wouldn't complain at me.  I ended up having to use $^W
: at compile time (and it looks very hackish).  We ought to have a
: (perhaps not quite as hackish) ability to say "there's no reason for
: that warning, but I can't modify your code, so just be quiet".

Yes, we need to be able to suppress warnings in dynamic scopes as well
as lexical, but that's probably not a scalar proposition anymore, unless
the replacement for $^W is taken as a pointer to a hash of potential
warnings.  Presumably you could temporize the whole hash to suppress
all warnings, or individual elements to suppress individual warnings.
But maybe that's a good place for temporized methods instead, and then
we could name sets of warnings.  Or maybe there's yet some other approach
that makes more sense.  We want to encourage people to suppress only
the exact warnings they want to suppress, and not just cudgel other
modules into silence.
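
One way to picture the "hash of potential warnings" idea is the Python sketch below. The category names, the `warn` function and the `suppress` helper are all made up for illustration. Individual entries are switched off for the duration of a dynamic scope and restored on the way out, even if the block dies.

```python
from contextlib import contextmanager

# Hypothetical table of warning categories; True means enabled.
enabled = {"uninitialized": True, "deprecated": True, "numeric": True}
emitted = []  # collected here instead of printed, to keep the sketch testable


def warn(category: str, message: str) -> None:
    if enabled.get(category, True):
        emitted.append(f"{category}: {message}")


@contextmanager
def suppress(*categories):
    """Temporarily disable only the named categories."""
    saved = {c: enabled.get(c, True) for c in categories}
    try:
        for c in categories:
            enabled[c] = False
        yield
    finally:
        enabled.update(saved)  # restore the previous settings
```

Code called inside `with suppress("deprecated"):` stays quiet about that one category while every other warning still gets through.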

: > I'm not quite sure what to do with $^N or $^R yet.  Most likely they
: > end up as something $<foo>ish, if they stay.
: 
: For $^N, how about $/[-1]?

I guess that makes some sense.  I was thinking of $/[-$n] as relative
to the current match position, but hadn't thought it through to the
point of deciding how to count those.  $^N mandates counting based on
right parentheses rather than left, which I guess makes sense.  So
let's say that $/[-2] means (one) rather than the incomplete ((three)two):

    /(one)((three) { $/[-2] } two)/
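
Counting by right parentheses can be checked mechanically. The Python sketch below is an illustration only: it ignores escapes and non-capturing syntax, and `closed_groups` is a made-up helper. It lists the groups that have already closed at a given point in the pattern, in closing order, so negative indices count back from the most recently closed group.

```python
def closed_groups(pattern: str, upto: int) -> list:
    """Return the text of each group closed before index `upto`,
    ordered by the position of its closing parenthesis."""
    open_stack = []  # indices of "(" not yet matched
    closed = []
    for i, ch in enumerate(pattern[:upto]):
        if ch == "(":
            open_stack.append(i)
        elif ch == ")" and open_stack:
            start = open_stack.pop()
            closed.append(pattern[start + 1:i])
    return closed
```

At the position of the embedded block in `(one)((three) { $/[-2] } two)`, the closed groups are `one` and `three`, so index -1 is `three` and index -2 is `one`; the enclosing group is still open and does not appear.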

I note that this is another difference between $/ and $0, since $/
is representing the current state of the match, while $0 isn't bound
till the match succeeds (unless you explicitly bind it earlier, which
is yet another difference between $0 and $/, since you can't bind $/
to mean a portion of itself).

Larry
0
larry
3/26/2005 8:29:37 PM
Larry Wall wrote:

>%+ and %- are gone.  $0, $1, $2,  etc. are all objects that know
>where they .start and .end.  (Mind you, those methods return magical
>positions that are Unicode level independent.)
>
How can you have a level independent position?
The matching itself happens at a specified level. (Note that which level 
the match happens at can change what is matched.) So it makes sense that 
all the positions that come out of it are in terms of that level.

Now, that position can be translated to a lower level, but not to an 
upper level, since you can happily land in the middle of a char.

This is part of what I'm having trouble with your concept of a Str being 
at several levels at once: There's no reliable way to have a notion of 
"position", except to have it as attached to the highest possible level, 
and the second someone does something at lower level, you void the 
position, and possibly the ability to remain at that high level.

I still see my notion of a Str having only one level and encoding at a 
time as being preferable. Having the ability to recast a string to other 
levels/encoding should be easy, and many builtins should do that 
recasting for you.

I do _not_ see $/ & friends getting ported across a recasting. .pos can 
be translated if new level <= old level, otherwise gets set to undef.

Please convince me your view works in practice. I'm not seeing it work 
well when I attempt to define the relevant parts of S29. But I might 
just be dense on this.

-- Rod Adams
0
rod
3/26/2005 8:37:24 PM
On Sat, Mar 26, 2005 at 09:59:10AM -0500, Aaron Sherman wrote:
: On Sat, 2005-03-26 at 00:27 -0800, Larry Wall wrote:
: 
: > $$ is now $*PID.  ($$foo is now unambiguous.)
: > 
: > $0 is gone in favor of $*PROGRAM_NAME or some such.
: 
: You know, Java did one thing in this respect that I liked, and managed
: to do it in a way that I couldn't stand. The idea of program as object
: was nice, but they made the programmer manage it, which was really kind
: of silly.

Well, there is a process object, but it actually exists inside the
operating system.  It's a little silly to force people to name their
own process all the time.  I think we can assume that global variables
belong to the current process, sort of on the "you're soaking in it"
principle.

: If you think of the OS-level shell around a Perl interpreter as an
: object, and make perl manage that for you, then this falls out rather
: nicely:
: 
: 	$*PID := $*PROC.pid;
: 	$*PPID := $*PROC.ppid;
: 	$*PROGRAM_NAME := ~$*PROC;
: 
: Perhaps even some often-used data could be shoved in there:
: 
: 	$life = time() - $*PROC.start_time;
: 
: In fact, it seems like a good place for any OS-level globals:
: 
: 	$*IN := $*PROC.pio_in // $*PROC.stdin;

We can certainly have various objects proxying for various contexts.
It's not clear how those should be broken out though.  To me, an OS
isn't a process, and there's not necessarily going to be a one-to-one
correspondence.

: If we consider $*PROC to be the invocant of the implicit "main", then:
: 
: 	say "I am number {.pid}, who is number 1?";
: 
: works just fine in global context. This also gives you a nice simple way
: to drill down into your interpreter / runtime / VM / whatever state:
: 
: 	say "I'm {.name} running under {.interp.name}";

That's an interesting idea, the more so now that we're leaning away
from .foo ever assuming the current topic unless it also happens to
be the invocant.  But it probably wouldn't do to have one common name for
the .pid outside of methods and force people to use a different name
inside methods.  Here's where $*PID works much better, because it can
be the same everywhere.

Larry
0
larry
3/26/2005 8:48:12 PM
On Sat, 2005-03-26 at 12:48 -0800, Larry Wall wrote:
> On Sat, Mar 26, 2005 at 09:59:10AM -0500, Aaron Sherman wrote:

> Well, there is a process object, but it actually exists inside the
> operating system.  It's a little silly to force people to name their
> own process all the time.  I think we can assume that global variables
> belong to the current process, sort of on the "you're soaking in it"
> principle.

That seems to be a self-limiting position. It leads (as it did in Perl
5) to a desire to reduce the number of times you add access to new OS
features (as it requires global namespace suckage, though not as bad as
in Perl 5), and you'll still split out an object, module or data
structure to contain all of the information that's not in Perl proper
because it's platform specific (e.g. current drive letter context under
DOS).

I agree that $*PID is a useful alias for $*PROC.pid (though the extra *
still bothers me), but providing a unified API for interacting with
myself as an OS-level construct seems to make sense.

That's perhaps just my preference. I'm a hybrid OO/procedural guy, so I
tend to reach into the OO toolbox whenever I think it will make my life
easier.

> : If you think of the OS-level shell around a Perl interpreter as an
> : object[...]
> We can certainly have various objects proxying for various contexts.
> It's not clear how those should be broken out though.  To me, an OS
> isn't a process, and there's not necessarily going to be a one-to-one
> correspondence.

True enough, and you would certainly NOT:

	my $sock = $*PROC.socket;

That makes no sense at all. However, things like "what IO layer am I
using" or "am I a thread" are perfectly valid questions to pose of a
process abstraction.

> : If we consider $*PROC to be the invocant of the implicit "main", then:
> : 
> : 	say "I am number {.pid}, who is number 1?";

> That's an interesting idea, the more so now that we're leaning away
> from .foo ever assuming the current topic unless it also happens to
> be the invocant.  But it probably wouldn't do to have one common name for
> the .pid outside of methods and force people to use a different name
> inside methods.  Here's where $*PID works much better, because it can
> be the same everywhere.

Well, it's always:

	$*PROC.pid

The invocant goodness is just handy in a certain circumstance (what *is*
main's invocant, out of curiosity? I guess it could be the interpreter
context, but that should probably have some relationship to your process
info anyway (either is or does ... probably does).) If I were writing
Learning Perl 6, I would teach "$*PID" and/or "$*PROC.pid", but not
".pid".


0
ajs
3/26/2005 10:50:31 PM
On Sat, Mar 26, 2005 at 02:37:24PM -0600, Rod Adams wrote:
: Larry Wall wrote:
: 
: >%+ and %- are gone.  $0, $1, $2,  etc. are all objects that know
: >where they .start and .end.  (Mind you, those methods return magical
: >positions that are Unicode level independent.)
: >
: How can you have a level independent position?

By not confusing positions with numbers.  They're just pointers into
a particular string.

: The matching itself happens at a specified level. (Note that which level 
: the match happens at can change what is matched.) So it makes sense that 
: all the positions that come out of it are in terms of that level.

When we're dealing with mostly variable length encodings, it makes
more sense that the positions come out as string pointers that only
convert to numbers grudgingly under duress.  If you're just going to
feed a position back into a substr() or as the start position of the
next index(), there's no reason to translate it to a number and back
to a pointer.  It's a lot more efficient if you don't.

: Now, that position can be translated to a lower level, but not to an 
: upper level, since you can happily land in the middle of a char.

I talked about this problem in one of the As.  I think the fail soft
approach is to round to the next "ceiling" boundary and issue a warning.

: This is part of what I'm having trouble with your concept of a Str being 
: at several levels at once: There's no reliable way to have a notion of 
: "position", except to have it as attached to the highest possible level, 
: and the second someone does something at lower level, you void the 
: position, and possibly the ability to remain at that high level.

A position that is a pointer can be true for all levels simultaneously.
It has the additional benefit of a type that is subtype constrained
to operate with other values from the same string, so if you subtract
two pointers from different strings, you can actually detect the error.
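
A minimal Python sketch of such a position, under the assumption that it is stored as a UTF-8 byte offset plus a reference to its string. `StrPos` and its method names are invented here. It converts to a number only when asked, at whichever level is asked for, and refuses arithmetic across strings.

```python
class StrPos:
    """A position bound to one particular string (stored as a byte offset)."""

    def __init__(self, text: str, byte_offset: int):
        self.text = text
        self.data = text.encode("utf-8")
        self.byte_offset = byte_offset

    @classmethod
    def from_codes(cls, text: str, n: int) -> "StrPos":
        """Build a position from a count of code points."""
        return cls(text, len(text[:n].encode("utf-8")))

    def as_bytes(self) -> int:
        return self.byte_offset

    def as_codes(self) -> int:
        # Assumes the offset sits on a code point boundary.
        return len(self.data[:self.byte_offset].decode("utf-8"))

    def __sub__(self, other: "StrPos") -> int:
        if other.text is not self.text:
            raise TypeError("positions belong to different strings")
        return self.byte_offset - other.byte_offset  # difference in bytes
```

The same object answers `as_bytes()` and `as_codes()` without being rebuilt, and subtracting positions taken from two different strings raises instead of yielding a meaningless number.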

: I still see my notion of a Str having only one level and encoding at a 
: time as being preferable. Having the ability to recast a string to other 
: levels/encoding should be easy, and many builtins should do that 
: recasting for you.

And I still see that you can have your view if you install a pragma
that forces all incoming strings to a single level.  But I think we
can do that lazily, or not at all, in many cases.

The basic underlying problem is that there is no simple mapping from
math to Unicode.  The language that lets people express their solution
in terms of Unicode instead of in terms of math is going to have a leg
up on the future, at least in the Unicode problem space.  Strings were
never arrays in Perl, and they're only getting further apart as the
world makes greater demands on strings to represent human language.

So I'd much rather introduce an abstraction like "string position"
now that is not a number.  It's a dimensional value, where the scaling
of the dimensionality is bound to a particular string.  You can have
a pragma that says, "Untyped numbers are assumed to be meters, kilograms,
and seconds", and a different lexical scope might have a pragma that
says "Untyped numbers are assumed to be centimeters, grams, and seconds."
These scopes can get along as long as they don't try to exchange untyped
integers.  Or if they do, they have some way of ascertaining what an
untyped integer meant when it was generated.

: I do _not_ see $/ & friends getting ported across a recasting. .pos can 
: be translated if new level <= old level, otherwise gets set to undef.

The interesting thing about a pointer is that you can pass it through
a higher level transparently as long as you don't actually try to
use it.  But if you do try to use it, I think undef is overkill.
Just as a float stuffed into an int truncates, we should just pick
a direction to find the next boundary and go from there, maybe with
a loss of precision warning.  The right way to suppress the warning
would be to install an explicit function that rounds up or down.
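
Here is how that might look for UTF-8 bytes versus code points, sketched in Python with hypothetical function names. The explicit `round_up` and `round_down` are silent; the implicit `coerce_position` rounds toward the next boundary and warns when it had to move.

```python
import warnings


def _is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (byte & 0xC0) == 0x80


def round_up(data: bytes, offset: int) -> int:
    """Move forward to the next code point boundary (no warning)."""
    while offset < len(data) and _is_continuation(data[offset]):
        offset += 1
    return offset


def round_down(data: bytes, offset: int) -> int:
    """Move back to the previous code point boundary (no warning)."""
    while 0 < offset < len(data) and _is_continuation(data[offset]):
        offset -= 1
    return offset


def coerce_position(data: bytes, offset: int) -> int:
    """Implicit conversion: round up and warn about the lost precision."""
    rounded = round_up(data, offset)
    if rounded != offset:
        warnings.warn(
            f"position {offset} is inside a character; rounded up to {rounded}",
            stacklevel=2,
        )
    return rounded
```

Calling one of the explicit rounding functions first is what keeps the warning from firing, in the same way an explicit truncation documents a float-to-int conversion.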

: Please convince me your view works in practice. I'm not seeing it work 
: well when I attempt to define the relevant parts of S29. But I might 
: just be dense on this.

Well, let's work through an example.

    multi method substr(Str $s: Ptr $start, PtrDiff ?$len, Str ?$repl)

Depending on the typology of Ptr and PtrDiff, we can either coerce
various dimensionalities into an appropriate Ptr and PtrDiff type
within those classes, or we could rely on MMD to dispatch to a suite
of substr implementations with more explicit classes.  Interestingly,
since Ptrs aren't integers, we might also allow

    multi method substr(Str $s: Ptr $start, Ptr ?$end, Str ?$repl)

which might be a more natural way to deal with variable length encodings,
and we just leave the "lengthy" version in there for old times' sake.

We could go as far as to allow a range as the second argument:

    $x = substr($a, $start..^$end);

or its evil twin:

    $x = $a[$start..^$end];

Of course, with the evil twin notation you lose the "repl" facility.

But in fact, we probably can't actually allow both of

    multi method substr(Str $s: Ptr $start, PtrDiff ?$len, Str ?$repl)
    multi method substr(Str $s: Ptr $start, Ptr ?$end, Str ?$repl)

since MMD might well tie on who to dispatch

    substr($a, 5, 10)

to, unless we forced it to Perl 5 interpretation in case of tie.  So I'm
guessing we put in

    $x = $a[$start..^$end];

for the non-destructive slicing of a string, and leave substr() with
Perl 5 semantics, in which case it's just a SMOP to coerce the user's

    substr($a, 5, 10);

to something that effectively means

    substr($a, Ptr.new($a, 5, $?UNI_LEVEL), PtrDiff.new(10, $?UNI_LEVEL));

Actually, in this case, I expect we're actually calling into

    multi method substr(Str $s: PtrDiff $start, PtrDiff ?$len, Str ?$repl)

where $start will be counted from the beginning of the string, so the
call is effectively

    substr($a, PtrDiff.new(5, $?UNI_LEVEL), PtrDiff.new(10, $?UNI_LEVEL));

Okay, that looks scary, but if as in my previous message we define
"chars" as the highest Unicode level allowed by the context and the
string, then we can just write that in some notation resembling:

    substr($a, 5`Chars, 10`Chars);

or whatever notation we end up with for labeling units on numbers.
Even if we don't define "chars" that way, they just end up labeled with
the current level (here assuming "Codes"):

    substr($a, 5`Codes, 10`Codes);

or whatever.

But this is all implicit, which is why you can just write

    substr($a, 5, 10);

and have it DWYM.

Now, I admit that I've handwaved the tricksy bit, which is, "How do
you know, Larry, that substr() wants 5`Codes rather than 5`Meters?
It's all very well if you have a single predeclared subroutine and
can look at the signature at compile time, but you wrote those as multi
methods up above, so we don't know the signature at compile time."

Well, that's correct, we don't know it at compile time.  But what
*do* we know?  We know we have a number, and that it was generated
in a context where, if you use it like a string position, it should
turn into a number of code points, and if you use it like a weight,
it should turn into a number of kilograms (or pounds, if you're NASA).

In other words, the effective type of that literal 5 is not "Int",
but "Int|Codes|Meters|Kilograms|Seconds|Bogomips" or some such.
And if MMD can handle that in an argument type and match up Codes as
a subtype of Ptr, and if we write our method signature only in the
abstract types like Ptr, we're pretty much home free.  That certainly
simplifies how you write S29, though I don't know if the MMD folks
will be terribly happy with the notion of dispatching arguments with
junctional types.  But it does fall out of the design rather naturally.
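
One way to picture the dispatch described above, as a rough Python sketch rather than anything resembling real Perl 6 MMD. The class names mirror the types mentioned in the text, but `Literal`, `as_type`, and the tiny type hierarchy are hypothetical. The literal carries every reading its context allows, and the callee, written only in terms of the abstract `Ptr`, picks the one reading that fits.

```python
class Unit: pass
class Ptr(Unit): pass          # abstract string position/length
class Codes(Ptr): pass         # counted in code points
class Length(Unit): pass
class Meters(Length): pass
class Mass(Unit): pass
class Kilograms(Mass): pass

class Literal:
    """A bare number plus the readings its lexical context allows."""
    def __init__(self, value, *readings):
        self.value = value
        self.readings = readings

    def as_type(self, wanted):
        """Return (reading, value) for the one reading that fits `wanted`."""
        matches = [r for r in self.readings if issubclass(r, wanted)]
        if len(matches) != 1:
            raise TypeError(
                f"{self.value} has {len(matches)} readings as {wanted.__name__}")
        return matches[0], self.value

def substr(s, start, length):
    # The signature only knows about the abstract type Ptr.
    _, i = start.as_type(Ptr)
    _, n = length.as_type(Ptr)
    return s[i:i + n]          # Python str indexes by code point

five = Literal(5, Codes, Meters, Kilograms)
ten = Literal(10, Codes, Meters, Kilograms)
print(substr("The quick brown fox jumps", five, ten))   # uick brown
```

The same `five` handed to a routine asking for `Length` would resolve to `Meters` instead, which is the point of keeping the alternatives around until the call.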

I know that offhand this seems like a lot of needless complication,
and it's a little bit hard to explain, but I believe it's very close
to how we actually use context in real language, so I think people
will find it intuitively obvious once they start using it.  More to
the point, it will produce very *clean* code, uncluttered with most of
the crufty conversions and coercions you find in singly typed languages
that compiler writers tend to love, and programmers tend to hate.

And you can still put in all that cruft if you want to.  You can even
force yourself to have to do it.  But to me, it feels a bit like slavery,
so I'm still looking for a land flowing with milk and honey, even if
there are a few giants in it.

Er, sorry for waxing poetic.

Larry
0
larry
3/27/2005 3:29:06 AM
Larry Wall wrote:

 > ... SKIP ...

> Okay, that looks scary, but if as in my previous message we define
> "chars" as the highest Unicode level allowed by the context and the
> string, then we can just write that in some notation resembling:
> 
>     substr($a, 5`Chars, 10`Chars);
> 
> or whatever notation we end up with for labeling units on numbers.
> Even if we don't define "chars" that way, they just end up labeled with
> the current level (here assuming "Codes"):
> 
>     substr($a, 5`Codes, 10`Codes);
> 
> or whatever.
> 
> But this is all implicit, which is why you can just write
> 
>     substr($a, 5, 10);
> 
> and have it DWYM.
> 
> Now, I admit that I've handwaved the tricksy bit, which is, "How do
> you know, Larry, that substr() wants 5`Codes rather than 5`Meters?
> It's all very well if you have a single predeclared subroutine and
> can look at the signature at compile time, but you wrote those as multi
> methods up above, so we don't know the signature at compile time."
> 
> Well, that's correct, we don't know it at compile time.  But what
> *do* we know?  We know we have a number, and that it was generated
> in a context where, if you use it like a string position, it should
> turn into a number of code points, and if you use it like a weight,
> it should turn into a number of kilograms (or pounds, if you're NASA).
> 
> In other words, the effective type of that literal 5 is not "Int",
> but "Int|Codes|Meters|Kilograms|Seconds|Bogomips" or some such.
> And if MMD can handle that in an argument type and match up Codes as
> a subtype of Ptr, and if we write our method signature only in the
> abstract types like Ptr, we're pretty much home free.  That certainly
> simplifies how you write S29, though I don't know if the MMD folks
> will be terribly happy with the notion of dispatching arguments with
> junctional types.  But it does fall out of the design rather naturally.
> 
 > ...

So do you actually envision perl6 to allow a junction of units on 
numbers? This would have huge implications, depending on what exactly is 
possible with these units...


Would/could some of these DWIM in perl6?

     # import proper MMD-subs for + - * etc...
     use MyUnitConversions;

     my $length = 1.3`Meters + 4.6`Yards;
     my $weight = 4`Pounds - 1`Kilograms;
     my $money = 12`€ + 5.78`£ + 12`US$;

Then, how would I specify the unit of the result?

     my $unit = 1`Minutes|Kilograms|Meters;

     my $time_mins`Minutes = $unit * 5.45; # 5.45 Minutes
     my $time_secs`Seconds = $unit * 5.45; # 327 Seconds

     my $weight`Weight = $unit * 2.34;
     my $length`Length = $unit * 12.56;

But these would need MMD based on return-type which is out IIRC?
So perhaps just simple:

     my $time_mins = unit_conversion($unit * 5.45, Minutes);
     my $time_secs = unit_conversion($unit * 5.45, Seconds);

Or using some infix-op:

     my $time_mins = ($unit * 5.45) ` Minutes;
     my $time_secs = ($unit * 5.45) ` Seconds;


And what about hex/dec/oct/bin? Could these be included in this system?

     my $number`Dec = 12`Oct + 010101`Bin + 0cab`Hex;

And then we of course definitely need

     my $length = 12`Oct&Miles + 0c`Hex&Kilometers;    :)


Not sure if I'm on the right track at all - but Units on numbers with MMD 
gives some really nice ideas to a mathematically minded person...

     my $speed_a = 78`Kilometers / 2`Hour;
     my $speed_b = 50`(Miles/Hour);  # or 50`Miles_per_Hour
     my $delta = $speed_a - $speed_b;
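
For comparison, the "simple function" variant of these ideas can be sketched in Python. Everything here is hypothetical (the `Quantity` class, the `TO_BASE` table, the rule that a sum keeps the left operand's unit); it only shows that asking for the result unit explicitly avoids any dispatch on return type.

```python
# Scale factor from each unit to the base unit of its dimension.
TO_BASE = {
    "Meters":     ("length", 1.0),
    "Yards":      ("length", 0.9144),
    "Kilometers": ("length", 1000.0),
    "Seconds":    ("time", 1.0),
    "Minutes":    ("time", 60.0),
}

class Quantity:
    def __init__(self, value, unit):
        self.value, self.unit = value, unit

    def to(self, unit):
        """Explicit conversion, in the spirit of unit_conversion($x, Minutes)."""
        dim_from, k_from = TO_BASE[self.unit]
        dim_to, k_to = TO_BASE[unit]
        if dim_from != dim_to:
            raise TypeError(f"cannot convert {self.unit} to {unit}")
        return Quantity(self.value * k_from / k_to, unit)

    def __add__(self, other):
        # The sum keeps the left operand's unit; use .to() for anything else.
        return Quantity(self.value + other.to(self.unit).value, self.unit)

length = Quantity(1.3, "Meters") + Quantity(4.6, "Yards")
print(length.unit, round(length.value, 5))                       # Meters 5.50624
print(round(Quantity(5.45, "Minutes").to("Seconds").value, 6))   # 327.0
```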


-- 
Markus Laire
<Jam. 1:5-6>
0
markus
3/28/2005 1:47:24 PM
Markus Laire wrote:

> Larry Wall wrote:
>
>>
>> Now, I admit that I've handwaved the tricksy bit, which is, "How do
>> you know, Larry, that substr() wants 5`Codes rather than 5`Meters?
>> It's all very well if you have a single predeclared subroutine and
>> can look at the signature at compile time, but you wrote those as multi
>> methods up above, so we don't know the signature at compile time."
>>
>> Well, that's correct, we don't know it at compile time.  But what
>> *do* we know?  We know we have a number, and that it was generated
>> in a context where, if you use it like a string position, it should
>> turn into a number of code points, and if you use it like a weight,
>> it should turn into a number of kilograms (or pounds, if you're NASA).
>>
>> In other words, the effective type of that literal 5 is not "Int",
>> but "Int|Codes|Meters|Kilograms|Seconds|Bogomips" or some such.
>> And if MMD can handle that in an argument type and match up Codes as
>> a subtype of Ptr, and if we write our method signature only in the
>> abstract types like Ptr, we're pretty much home free.  That certainly
>> simplifies how you write S29, though I don't know if the MMD folks
>> will be terribly happy with the notion of dispatching arguments with
>> junctional types.  But it does fall out of the design rather naturally.
>>
> > ...
>
> So do you actually envision perl6 to allow a junction of units on 
> numbers? This would have huge implications, depending on what exactly 
> is possible with these units...
>
>
> Would/could some of these DWIM in perl6?
>
>     # import proper MMD-subs for + - * etc...
>     use MyUnitConversions;
>
>     my $length = 1.3`Meters + 4.6`Yards;
>     my $weight = 4`Pounds - 1`Kilograms;
>     my $money = 12`€ + 5.78`£ + 12`US$;
>
> Then, how would I specify the unit of the result?
>
The real "fun" in determining what should happen with units comes 
when you do operations that _change_ the units.

  my $Current    = 5`Amps;
  my $Resistance = 10`Ohms;
  my $Power      = $Current * $Resistance; # Do I get 50`Watts here?


  my $theta  = 45`Degrees;
  my $x      = cos($theta);  # no units on $x
  my $theta2 = acos($x);     # in radians? or does $x carry a
                             # "used to be Degrees" property?


  my $distance1    = 100`Meters;
  my $distance2    = 0.25`Kilometers;
  my $timeinterval = 5`Seconds;
  my $velocity1    = $distance1 / $timeinterval;
  my $velocity2    = $distance2 / $timeinterval;
  my $acceleration = ($velocity2-$velocity1)/$timeinterval;
      # is $acceleration something like 30`Meters/Second/Second ?
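
Working the first and third examples through by hand shows the kind of bookkeeping involved. The Python sketch below (hypothetical `Q` and `Dim` names, and a deliberately tiny table of named dimensions) tracks each value's dimension as a tuple of exponents over SI base units, so multiplication adds exponents and division subtracts them. Under the standard SI definitions, amps times ohms comes out with the dimension of volts rather than watts, and the velocity example works out to 6 meters per second squared.

```python
from collections import namedtuple

# Exponents of the SI base units metre, kilogram, second, ampere.
Dim = namedtuple("Dim", "m kg s A")

NAMED = {
    Dim(2, 1, -3, -1): "Volts",
    Dim(2, 1, -3, 0): "Watts",
    Dim(1, 0, -2, 0): "Meters/Second/Second",
}

class Q:
    def __init__(self, value, dim):
        self.value, self.dim = value, dim

    def __mul__(self, other):      # exponents add
        return Q(self.value * other.value,
                 Dim(*(a + b for a, b in zip(self.dim, other.dim))))

    def __truediv__(self, other):  # exponents subtract
        return Q(self.value / other.value,
                 Dim(*(a - b for a, b in zip(self.dim, other.dim))))

    def __sub__(self, other):      # only like dimensions may be subtracted
        if self.dim != other.dim:
            raise TypeError("unit mismatch")
        return Q(self.value - other.value, self.dim)

    def name(self):
        return NAMED.get(self.dim, str(self.dim))

AMPS, OHMS = Dim(0, 0, 0, 1), Dim(2, 1, -3, -2)
METERS, SECONDS = Dim(1, 0, 0, 0), Dim(0, 0, 1, 0)

product = Q(5, AMPS) * Q(10, OHMS)
print(product.value, product.name())            # 50 Volts

t = Q(5, SECONDS)
accel = (Q(250, METERS) / t - Q(100, METERS) / t) / t   # 0.25 km is 250 m
print(accel.value, accel.name())                # 6.0 Meters/Second/Second
```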

Don't forget fun operations like C<$x**2> and others, which should be 
nicely reversible, or used as is. (Think cubic meters for volume). And 
there will likely always be corner cases like C<exp(2*log(5`Feet)) != 
5`Feet**2>, which I would be very surprised if Perl caught.

Another fun thing is dealing with ambiguous unit names. "Pound" can 
refer to Force, Mass, or Money. "1`Gallon" can be either "4`Quarts" or 
"5`Quarts" depending on which side of the Atlantic you're on. "Ounce" 
can be either Mass or Liquid Volume, _and_ has US/UK issues. Having to 
specify "5`FluidUSOunces" can get tedious, though hopefully short 
aliases can be made. Then there's the fun of currency, where the 
exchange rates are time dependent. (Do we start saying "5`US$.at($time)"?)


All that being said, I think it would be great if we could come up with 
a way of integrating units into Perl 6. But I'd want the following features:
 - It's more than just a "toy" for solving a few minor problems like 
specifying characters.
 - It's fairly comprehensive in that it should be easy to write 
functions which mutate the units intelligently, and know when to flag a 
type mismatch.
 - It all goes away by default for the user who doesn't want to bother 
with it.


-- Rod Adams
0
rod
3/28/2005 8:07:18 PM
Yow -- units would be extra cool for perl6: I know of no other language that
has units support built in.  It would go a long way toward making perl6 the
language of choice for students in the physical sciences...

The perl5 CPAN modules already have a pretty good unit system that could be
ported to the junctive strategy.  The problem of resolving ambiguous units is
subject to DWIMming based on context; my own units engines always include a
context field that lets you choose what context should be used for further
unit string parsing (e.g. "SI", "currency", etc.).  If not specified, any
unambiguous units in a compound statement can be used to guess an appropriate
context for the ambiguous ones.  It's not clear how that would fit into the
junction scheme, but that might bear some thinking about...
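
A toy version of that context guessing, in Python. The table of meanings, the context names, and the `resolve` function are invented for the example and are not taken from any existing units module.

```python
# Each unit name maps to the contexts in which it has a meaning.
MEANINGS = {
    "pound":    {"mass": "pound (avoirdupois)", "currency": "pound sterling"},
    "ounce":    {"mass": "ounce (avoirdupois)", "volume": "fluid ounce"},
    "kilogram": {"mass": "kilogram"},
    "dollar":   {"currency": "US dollar"},
}

def resolve(names, context=None):
    """Resolve unit names, guessing the context from the unambiguous ones."""
    if context is None:
        hints = {next(iter(MEANINGS[n])) for n in names if len(MEANINGS[n]) == 1}
        if len(hints) != 1:
            raise ValueError("cannot guess a context; please specify one")
        context = hints.pop()
    return [MEANINGS[n][context] for n in names]

print(resolve(["pound", "kilogram"]))   # ['pound (avoirdupois)', 'kilogram']
print(resolve(["pound", "dollar"]))     # ['pound sterling', 'US dollar']
```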

Another point:  one should probably worry more about making the unit parser
extensible, than about making it complete.  The main symptom of an incomplete
or confused parser is a bunch of units-junk that is usually parsable by a
human but not in "simplest form" (e.g. "barn megaparsec tsp^-1" or some such
[0.63]); that's generally not a fatal problem, especially if the user has
access to the units database and can add another simplifying resolution.

Yet another point: there are plenty of non-obvious reductions that people
worry about, such as "N m" -> "J" (energy) but "m N" -> "m N" (torque); but
it's probably not worth worrying about such things: if the coder knows that
s/he wants a torque, s/he should be able to ask for reduction to a particular
form [e.g. 'units( $val, $template )' should exist and return $val in whatever
units $template has, if possible.]





Quoth Rod Adams on Monday 28 March 2005 01:07 pm,
> Markus Laire wrote:
> > Larry Wall wrote:
> >> Now, I admit that I've handwaved the tricksy bit, which is, "How do
> >> you know, Larry, that substr() wants 5`Codes rather than 5`Meters?
> >> It's all very well if you have a single predeclared subroutine and
> >> can look at the signature at compile time, but you wrote those as multi
> >> methods up above, so we don't know the signature at compile time."
> >>
> >> Well, that's correct, we don't know it at compile time.  But what
> >> *do* we know?  We know we have a number, and that it was generated
> >> in a context where, if you use it like a string position, it should
> >> turn into a number of code points, and if you use it like a weight,
> >> it should turn into a number of kilograms (or pounds, if you're NASA).
> >>
> >> In other words, the effective type of that literal 5 is not "Int",
> >> but "Int|Codes|Meters|Kilograms|Seconds|Bogomips" or some such.
> >> And if MMD can handle that in an argument type and match up Codes as
> >> a subtype of Ptr, and if we write our method signature only in the
> >> abstract types like Ptr, we're pretty much home free.  That certainly
> >> simplifies how you write S29, though I don't know if the MMD folks
> >> will be terribly happy with the notion of dispatching arguments with
> >> junctional types.  But it does fall out of the design rather naturally.
> >>
> > > ...
> >
> > So do you actually envision perl6 to allow a junction of units on
> > numbers? This would have huge implications, depending on what exactly
> > is possible with these units...
> >
> >
> > Would/could some of these DWIM in perl6?
> >
> >     # import proper MMD-subs for + - * etc...
> >     use MyUnitConversions;
> >
> >     my $length = 1.3`Meters + 4.6`Yards;
> >     my $weight = 4`Pounds - 1`Kilograms;
> >     my $money = 12`€ + 5.78`£ + 12`US$;
> >
> > Then, how would I specify the unit of the result?
>
> The real "fun" in determining what should happen with units comes
> when you do operations that _change_ the units.
>
>   my $Current    = 5`Amps;
>   my $Resistance = 10`Ohms;
>   my $Power      = $Current * $Resistance; # Do I get 50`Watts here?
>
>
>   my $theta  = 45`Degrees;
>   my $x      = cos($theta);  # no units on $x
>   my $theta2 = acos($x);     # in radians? or does $x carry a
>                              # "used to be Degrees" property?
>
>
>   my $distance1    = 100`Meters;
>   my $distance2    = 0.25`Kilometers;
>   my $timeinterval = 5`Seconds;
>   my $velocity1    = $distance1 / $timeinterval;
>   my $velocity2    = $distance2 / $timeinterval;
>   my $acceleration = ($velocity2-$velocity1)/$timeinterval;
>       # is $acceleration something like 30`Meters/Second/Second ?
>
> Don't forget fun operations like C<$x**2> and others, which should be
> nicely reversible, or used as is. (Think cubic meters for volume). And
> there will likely always be corner cases like C<exp(2*log(5`Feet)) !=
> 5`Feet**2>, which I would be very surprised if Perl caught.
>
> Another fun thing is dealing with ambiguous unit names. "Pound" can
> refer to Force, Mass, or Money. "1`Gallon" can be either "4`Quarts" or
> "5`Quarts" depending on which side of the Atlantic you're on. "Ounce"
> can be either Mass or Liquid Volume, _and_ has US/UK issues. Having to
> specify "5`FluidUSOunces" can get tedious, though hopefully short
> aliases can be made. Then there's the fun of currency, where the
> exchange rates are time dependent. (Do we start saying "5`US$.at($time)"?)
>
>
> All that being said, I think it would be great if we could come up with
> a way of integrating units into Perl 6. But I'd want the following features:
>  - It's more than just a "toy" for solving a few minor problems like
> specifying characters.
>  - It's fairly comprehensive in that it should be easy to write
> functions which mutate the units intelligently, and know when to flag a
> type mismatch.
>  - It all goes away by default for the user who doesn't want to bother
> with it.
>
>
> -- Rod Adams
0
deforest
3/28/2005 8:30:14 PM
On Mon, 2005-03-28 at 15:07, Rod Adams wrote:
> Markus Laire wrote:

> > So do you actually envision perl6 to allow a junction of units on 
> > numbers? This would have huge implications, depending on what exactly 
> > is possible with these units...

> >     # import proper MMD-subs for + - * etc...
> >     use MyUnitConversions;
> >
> >     my $length = 1.3`Meters + 4.6`Yards;
> >     my $weight = 4`Pounds - 1`Kilograms;
> >     my $money = 12`€ + 5.78`£ + 12`US$;
> >
> > Then, how would I specify the unit of the result?

You ask for the unit you want. I'm not sure I like or dislike this
syntax, but it's easy enough to see that:

	12`€

is just

	12 but €

where € is probably an alias for the class (role?) Units::Money::Euro.
Addition would be defined in a base class in such a way that conversion
to an appropriate intermediate unit would be done and then addition
performed. The derived classes would provide addition only for the
special case where both operands were in the derived unit (for
performance).
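
In Python terms that class layout might look roughly like the sketch below. The class names, the `rate_to_base` attribute, and above all the exchange rate are made up for the example; real currency conversion is time dependent, as noted earlier in the thread.

```python
class Money:
    """Base class: adds by going through an intermediate (base) unit."""
    rate_to_base = 1.0     # value of one unit in the arbitrary base currency

    def __init__(self, amount):
        self.amount = amount

    def to(self, cls):
        """Ask for the unit you want."""
        return cls(self.amount * self.rate_to_base / cls.rate_to_base)

    def __add__(self, other):
        base = (self.amount * self.rate_to_base
                + other.amount * other.rate_to_base)
        return type(self)(base / self.rate_to_base)

class Euro(Money):
    rate_to_base = 1.0

    def __add__(self, other):
        # Fast path: both operands are already euros, so skip the conversion.
        if type(other) is Euro:
            return Euro(self.amount + other.amount)
        return super().__add__(other)

class Dollar(Money):
    rate_to_base = 0.5     # made-up rate, purely for illustration

total = Euro(12) + Dollar(12)
print(type(total).__name__, total.amount)     # Euro 18.0
print(total.to(Dollar).amount)                # 36.0
```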

Now you can ask for whatever you like:

	say "We have {€.new $money}€"

Though you might have some snazzy way of saying that.

> The real "fun" in determining what should happen with units comes 
> when you do operations that _change_ the units.
> 
>   my $Current    = 5`Amps;
>   my $Resistance = 10`Ohms;
>   my $Power      = $Current * $Resistance; # Do I get 50`Watts here?

Again, if Amps and Ohms know that they can do that, then you're all set.
Otherwise, you just construct a new value like so:

	my Watts $Power = $Current * $Resistance;

Which again are probably all aliases for Units::Physics::*

-- 
Aaron Sherman <ajs@ajs.com>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback


0
ajs
3/28/2005 8:40:14 PM
On Mon, Mar 28, 2005 at 03:40:14PM -0500, Aaron Sherman wrote:
: Now you can ask for whatever you like:
: 
: 	say "We have {€.new $money}€"
: 
: Though you might have some snazzy way of saying that.

Just by the by, that's illegal syntax.  Methods with arguments
require parens.  You could, however, say

    say "We have {new €: $money}€"

: > The real "fun" in determining what should happen with units comes 
: > when you do operations that _change_ the units.

Doing the proper dimensional analysis, that's really just a specialized
form of type inferencing, I expect.

: >   my $Current    = 5`Amps;
: >   my $Resistance = 10`Ohms;
: >   my $Power      = $Current * $Resistance; # Do I get 50`Watts here?
: 
: Again, if Amps and Ohms know that they can do that, then you're all set.
: Otherwise, you just construct a new value like so:
: 
: 	my Watts $Power = $Current * $Resistance;
: 
: Which again are probably all aliases for Units::Physics::*

I've always thought that we should make use of the database of the
"units" program for standardized names of units.  The units database
has a pretty good list of which units are just differently scaled
units of the actual underlying fundamental dimensions, and a lot
of encoded experience in distinguishing ambiguous units names.  It'd
be a shame to reinvent all that.

But the basic underlying principle here is just the same as with
characters.  Define your formal parameters in terms of the fundamental
dimensions like Length and Time, and let the computer worry about
the scaling issues.  I consider types like Velocity and Acceleration
to also be fundamental in that sense, even though composed of other
fundamental dimensions, so it's not necessary to write Length/Time
or Length/Time**2 (however you care to spell those types).

It is this sense in which Position or Ptr is fundamental, even though
it is a location in a particular string at a particular time.

Larry
0
larry
3/28/2005 9:00:15 PM
On Mon, Mar 28, 2005 at 01:30:14PM -0700, Craig DeForest wrote:
: Yow -- units would be extra cool for perl6: I know of no other language that 
: has units support built in.  It would go a long way toward making perl6 the 
: language of choice for students in the physical sciences...

Well, yes.  I certainly would have liked it back when I was taking
physics, but I had to make do with BASIC/PLUS.  One of the reasons I
eventually got hired on at the college's computer center was that the
director was impressed by the fact that my physics program to calculate
the gravitational constant from my experimental data was named "G".
He said, "Everyone else always names their programs with the full six
available characters."  So he knew I was a Lazy Critter from the start.

Larry
0
larry
3/28/2005 9:06:31 PM
On Mon, 2005-03-28 at 16:00, Larry Wall wrote:

> I've always thought that we should make use of the database of the
> "units" program for standardized names of units.  The units database
> has a pretty good list of which units are just differently scaled
> units of the actual underlying fundamental dimensions, and a lot
> of encoded experience in distinguishing ambiguous units names.  It'd
> be a shame to reinvent all that.

That makes fine sense, and I think it would be fairly trivial to
generate a set of roles from the Units database at run-time,
pre-compiled with the source or both (selectable in some way).

Of course, there are going to be people who have to re-define chunks of
that namespace because they have special needs (e.g. money -- this is
such a huge bear of a problem that it can only be solved for the
domain-specific cases), but that's fine, and does not preclude your
suggestion.

Here's a start:

perl -nle '
  print("# $_"), next if /^\s*($|\#)/;    # pass blank and comment lines through
  $c = ""; s/\s+\#.*// && ($c = $&);      # peel off any trailing comment
  ($unit, $def) = split /\s+/, $_, 2;     # unit name, then its definition
  if ($def eq "!") {                      # "!" marks a primitive unit
    $base{$unit} = 1;
    print "class units::$unit does unit { ... }$c";
  } elsif ($unit =~ /-$/) {
    print "# No handling for prefixes yet ($unit=$def)$c";
  } elsif ($base{$def}) {                 # plain alias of a primitive unit
    print "class units::$unit is units::$def;$c";
  } else {
    print "# No handling for derived units yet ($unit=$def)$c";
  }
' < /usr/share/units.dat

-- 
Aaron Sherman <ajs@ajs.com>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback


0
ajs
3/28/2005 10:40:33 PM
The problem with using the units(1) database is that it only deals with 
multiplicative relations -- so, e.g., it won't handle temperature.
 
Units resolvers are not so hard to come by -- the strategy is to try to break
each compound unit out into a collection of fundamental quantities that
are interconvertible (e.g. all lengths get converted to "meters"), while 
keeping track of the conversion constant.  I haven't looked at units(1) in
a while, but it used to do this to both the original units and the destination
units, then take the ratio of the accumulated constants.  That's why the 
"conformability" error messages are sometimes a little weird: they always
refer to the fully expanded SI representation of each unit.
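
That strategy can be sketched in a few lines of Python. This is an illustration of the approach as described here, not of how units(1) is actually implemented; the `UNITS` table, the `name^power` term syntax, and both function names are assumptions.

```python
# Each unit expands to (constant, {fundamental quantity: exponent}).
UNITS = {
    "meter":     (1.0,      {"m": 1}),
    "second":    (1.0,      {"s": 1}),
    "kilometer": (1000.0,   {"m": 1}),
    "mile":      (1609.344, {"m": 1}),
    "hour":      (3600.0,   {"s": 1}),
}

def reduce_unit(expr):
    """Reduce e.g. 'mile hour^-1' to (constant, fundamental exponents)."""
    const, dims = 1.0, {}
    for term in expr.split():
        name, _, power = term.partition("^")
        p = int(power) if power else 1
        k, base = UNITS[name]
        const *= k ** p
        for d, e in base.items():
            dims[d] = dims.get(d, 0) + e * p
    return const, {d: e for d, e in dims.items() if e != 0}

def convert(value, src, dst):
    k_src, d_src = reduce_unit(src)
    k_dst, d_dst = reduce_unit(dst)
    if d_src != d_dst:
        # Only the fully expanded forms are known at this point, which is
        # why such messages talk about base units.
        raise ValueError(f"conformability error: {d_src} vs {d_dst}")
    return value * k_src / k_dst      # ratio of the accumulated constants

print(round(convert(50, "mile hour^-1", "kilometer hour^-1"), 4))   # 80.4672
```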


Quoth Aaron Sherman on Monday 28 March 2005 03:40 pm,
> On Mon, 2005-03-28 at 16:00, Larry Wall wrote:
> > I've always thought that we should make use of the database of the
> > "units" program for standardized names of units.  The units database
> > has a pretty good list of which units are just differently scaled
> > units of the actual underlying fundamental dimensions, and a lot
> > of encoded experience in distinguishing ambiguous units names.  It'd
> > be a shame to reinvent all that.
>
> That makes fine sense, and I think it would be fairly trivial to
> generate a set of roles from the Units database at run-time,
> pre-compiled with the source or both (selectable in some way).
>
> Of course, there are going to be people who have to re-define chunks of
> that namespace because they have special needs (e.g. money -- this is
> such a huge bear of a problem that it can only be solved for the
> domain-specific cases), but that's fine, and does not preclude your
> suggestion.
>
> here's a start:
>
> perl -nle 'while(<>) {print("# $_"),next if /^\s*($|\#)/;$c="";s/\s+\#.*//
> && ($c=$&);($unit,$def)=split /\s+/, $_, 2;if ($def eq "!")
> {$base{$unit}=1;print "class units::$unit does unit { ... }$c"} elsif
> ($unit =~ /-$/){print "# No handling for prefixes yet
> ($unit=$def)$c"}elsif($base{$def}){print "class units::$unit is
> units::$def;$c"}else{print "# No handling for derived units yet
> ($unit=$def)$c"}}' < /usr/share/units.dat
0
deforest
3/28/2005 10:48:23 PM
On Mon, 2005-03-28 at 17:48, Craig DeForest wrote:
> The problem with using the units(1) database is that it only deals with 
> multiplicative relations -- so, e.g., it won't handle temperature.

Well, that's fine. You don't have to get everything from one source.
Larry is right though, units is a fine starting point.

I'm still of the basic opinion that all of the logic can be handled
through roles (classes?) and MMD, rather than trying to code up a units
converter in parallel to the type system.

-- 
Aaron Sherman <ajs@ajs.com>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback


0
ajs
3/28/2005 10:56:55 PM
On Monday 28 March 2005 05:48 pm, Craig DeForest wrote:
> The problem with using the units(1) database is that it only deals with
> multiplicative relations -- so, e.g., it won't handle temperature.

andrew@twisted:~$ units
2084 units, 71 prefixes, 32 nonlinear units

Among those "nonlinear units" are such units as tempF and tempC, defined as 
functions of a base unit tempK; units(1) seems to be at least decently 
capable with such things.
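
To make the "functions of a base unit" idea concrete, here is a small Python sketch. The unit names follow the ones quoted above; the table layout and `convert_temp` are hypothetical and say nothing about how units(1) stores its definitions.

```python
# Each nonlinear unit is a pair of functions: to kelvin, and from kelvin.
TEMPERATURE = {
    "tempK": (lambda k: k, lambda k: k),
    "tempC": (lambda c: c + 273.15, lambda k: k - 273.15),
    "tempF": (lambda f: (f - 32) * 5 / 9 + 273.15,
              lambda k: (k - 273.15) * 9 / 5 + 32),
}

def convert_temp(value, src, dst):
    to_kelvin, _ = TEMPERATURE[src]
    _, from_kelvin = TEMPERATURE[dst]
    return from_kelvin(to_kelvin(value))

print(round(convert_temp(212, "tempF", "tempC"), 6))   # 100.0
print(convert_temp(0, "tempC", "tempF"))               # 32.0
```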

--Andrew
0
arodland
3/28/2005 11:03:42 PM
Larry Wall wrote:

>On Sat, Mar 26, 2005 at 02:37:24PM -0600, Rod Adams wrote:
>
>: Please convince me your view works in practice. I'm not seeing it work 
>: well when I attempt to define the relevant parts of S29. But I might 
>: just be dense on this.
>
>Well, let's work through an example.
>
>    multi method substr(Str $s: Ptr $start, PtrDiff ?$len, Str ?$repl)
>
>Depending on the typology of Ptr and PtrDiff, we can either coerce
>various dimensionalities into an appropriate Ptr and PtrDiff type
>within those classes, or we could rely on MMD to dispatch to a suite
>of substr implementations with more explicit classes.  Interestingly,
>since Ptrs aren't integers, we might also allow
>
>    multi method substr(Str $s: Ptr $start, Ptr ?$end, Str ?$repl)
>
>which might be a more natural way to deal with variable length encodings,
>and we just leave the "lengthy" version in there for old times' sake.
>...snip...
>for the non-destructive slicing of a string, and leave substr() with
>Perl 5 semantics, in which case it's just a SMOP to coerce the user's
>
>    substr($a, 5, 10);
>
>to something that effectively means
>
>    substr($a, Ptr.new($a, 5, $?UNI_LEVEL), PtrDiff.new(10, $?UNI_LEVEL));
>
>Actually, in this case, I expect we're actually calling into
>
>    multi method substr(Str $s: PtrDiff $start, PtrDiff ?$len, Str ?$repl)
>
>where $start will be counted from the beginning of the string, so the
>call is effectively
>
>    substr($a, PtrDiff.new(5, $?UNI_LEVEL), PtrDiff.new(10, $?UNI_LEVEL));
>
>Okay, that looks scary, but if as in my previous message we define
>"chars" as the highest Unicode level allowed by the context and the
>string, then we can just write that in some notation resembling:
>
>    substr($a, 5`Chars, 10`Chars);
>
>or whatever notation we end up with for labeling units on numbers.
>Even if we don't define "chars" that way, they just end up labeled with
>the current level (here assuming "Codes"):
>
>    substr($a, 5`Codes, 10`Codes);
>
>or whatever.
>
>But this is all implicit, which is why you can just write
>
>    substr($a, 5, 10);
>
>and have it DWYM.
>  
>
I see some danger here. In particular, there is a huge difference 
between a Ptr (position), and a PtrDiff (length). I'm going to rename 
these classes StrPos and StrLen for the time being.

A StrPos can have multiple char units associated with it, and has the 
ability morph between them. However, it is also strictly bound to a 
given string.

A StrLen can only have one char unit associated with it, since there is 
no binding string and anchors with which to reliably map how many cpts 
there are to so many lchars.

I see the following operations being possible at a logical level:

  StrPos = StrPos + StrLen
  StrLen = StrPos - StrPos  # must specify units (else implied),
                            # and must be same base Str
  StrLen = StrLen + StrLen  # if same units.
  StrLen = StrLen + Int
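
These rules can be expressed as a pair of small classes. The Python sketch below is illustrative only: it models a single unit level for positions, and the attribute names are made up. It does show that the operations listed above are the only ones that succeed.

```python
class StrLen:
    def __init__(self, count, unit="codes"):
        self.count, self.unit = count, unit

    def __add__(self, other):
        if isinstance(other, int):                  # StrLen = StrLen + Int
            return StrLen(self.count + other, self.unit)
        if isinstance(other, StrLen):               # StrLen = StrLen + StrLen
            if other.unit != self.unit:
                raise TypeError("StrLen units differ")
            return StrLen(self.count + other.count, self.unit)
        return NotImplemented

class StrPos:
    def __init__(self, base, index):
        self.base, self.index = base, index         # bound to one string

    def __add__(self, other):                       # StrPos = StrPos + StrLen
        if isinstance(other, StrLen):
            return StrPos(self.base, self.index + other.count)
        return NotImplemented                       # StrPos + Int is refused

    def __sub__(self, other):                       # StrLen = StrPos - StrPos
        if isinstance(other, StrPos):
            if other.base is not self.base:
                raise ValueError("positions belong to different strings")
            return StrLen(self.index - other.index)
        return NotImplemented

s = "hello world"
p = StrPos(s, 2)
q = p + StrLen(3)
print(q.index, (q - p).count)    # 5 3
```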
   
So I see the following cases of Substr happening:

  multi sub substr(Str $s, StrPos $start  : StrPos ?$end,     ?$replace)

Where $start and $end must be anchored to $s

  multi sub substr(Str $s, StrPos $start,   StrLen $length  : ?$replace)

Same restriction on $start,

  multi sub substr(Str $s, StrLen $offset : StrLen ?$length,  ?$replace)

Where $offset gets used as C<$s.start + $offset> and kicked over to case #2.

Hmm. Okay, that's not dangerous, just a lot to look at.


What gets dangerous is letting users think of a StrPos as a number, 
since it's not. Only StrLen's get to pretend to be numbers. StrPos 
should have some nifty methods to return StrLen's relative to its base 
Str's .start, and those StrLens can look like a number, but the StrPos 
never gets to ever look like a number.

Make it so that StrLen "does Int", and there's a 
C«coerce:<as>(Int,StrLen)» with default units of your "Chars as highest 
supported by string applied to", and I think we're getting somewhere.

We need to define what happens to a StrPos when its base Str goes away. 
Having it assume some nifty flavor of undef would do the trick. This 
implies that a Str knows all the StrPos's hanging off it, so the 
destructor can undef them. But that shouldn't pose a problem for p6c.

>Now, I admit that I've handwaved the tricksy bit, which is, "How do
>you know, Larry, that substr() wants 5`Codes rather than 5`Meters?
>It's all very well if you have a single predeclared subroutine and
>can look at the signature at compile time, but you wrote those as multi
>methods up above, so we don't know the signature at compile time."
>
>Well, that's correct, we don't know it at compile time.  But what
>*do* we know?  We know we have a number, and that it was generated
>in a context where, if you use it like a string position, it should
>turn into a number of code points, and if you use it like a weight,
>it should turn into a number of kilograms (or pounds, if you're NASA).
>  
>
I don't see the need for all this. Make a C«coerce:<as>(Int,StrLen)» as 
mentioned above, and the MMD should be able to figure out that it can 
take the Int peg and hammer it into the StrLen hole. Then leave it up to 
the coerce sub to complain if the Int happens to have units that make 
the peg not fit.

>I know that offhand this seems like a lot of needless complication,
>and it's a little bit hard to explain, but I believe it's very close
>to how we actually use context in real language, so I think people
>will find it intuitively obvious once they start using it.
>
As one who is seriously thinking about diving head first into the world 
of Natural Language Processing (aka getting a PhD in it), I can tell you 
that determining how we humans actually infer context out of language is 
a mind-bogglingly complex task, and that there are no simple rules to it. 
It is still very much the case that it's easier to teach humans to talk 
like a computer than have a computer understand the human language 
(easier here is defined as simply being possible). We can make the 
computer language look and feel a lot more like a "regular" language, 
but in the end you're still training the human, not the computer. 
(Fixing this problem is what draws me to the NLP arena.)

That said, I think that most any solution we pick that lets us easily 
dance up and down the Unicode tree with something significantly less 
onerous than Java will become near intuitive once people start using it. 
Consider that anyone on this list has what amounts to an intuitive 
understanding of what

    next if /^\s*$/;

means, even though it looks little like the pure English equivalent: 
"Skip blank lines." (Though we did put the verb first in each case).

What counts is how much translation has to be done from what's in the 
programmer's head into something the computer can grok unambiguously. 
This is not necessarily the same thing as matching how we use language, 
but I will agree there are often corollaries. The Perl programmer 
thinking something like "Skip blank lines" will translate that to C<next 
if /^\s*$/;>, whereas the one thinking "When I get a blank line, skip 
it" will generate C<if /^\s*$/ {next}>.

Where I'm going with this: your statement of "it's very close to how we 
actually use context in real language" is better said "it's very close 
to one of the common ways we actually use context in real language".  
The OO method is another common way, where the Direct Object of the 
sentence is everything. OOP also happens to be a useful way of mapping 
certain tasks into a computer language. There are many other context 
models that we use, none significantly better than the rest in general, 
but each can stomp any other way in specific.

Therefore, TMTOWTDI is a Good Thing.

Since strings are so fundamental to what Perl is, we should be able to 
support several context models and WTDI at once, without prejudice or 
having to declare what we're doing too much. Now all that's left is 
figuring out which contexts are meaningful, and figuring out how to get 
them all at once.


-- Rod Adams

PS - I don't think we're that far away from each other on this stuff. 
We're just looking at it from different sides.
0
rod
3/29/2005 7:07:20 AM
Craig DeForest wrote:
> Yet another point: there are plenty of non-obvious reductions that people 
> worry about, such as "N m" -> "J" (energy) but "m N" -> "m N" (torque); but
> it's probably not worth worrying about such things: if the coder knows that 
> s/he wants a torque, s/he should be able to ask for reduction to a particular
> form [e.g. 'units( $val, $template )' should exist and return $val in whatever 
> units $template has, if possible.]

Oh, and don't underestimate the usefulness of doing things in 
non-base-ten mixed units, and useful fractions: my height is 
5`ft+(6+1/4)`in, not 66.25`in, thank you.
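
A small sketch of what keeping mixed units and exact fractions could look like, written in Python since the backtick unit syntax is only a proposal. The function names are made up for illustration; the one hard fact assumed is 12 inches to the foot.

```python
from fractions import Fraction

INCHES_PER_FOOT = 12


def to_inches(feet, inches):
    """Collapse a feet-plus-inches height into an exact number of inches."""
    return Fraction(feet) * INCHES_PER_FOOT + Fraction(inches)


def to_feet_inches(total_inches):
    """Split exact inches back into (whole feet, remaining inches)."""
    feet, inches = divmod(Fraction(total_inches), INCHES_PER_FOOT)
    return int(feet), inches


def format_height(total_inches):
    """Render inches as mixed units with a proper fraction, not a decimal."""
    feet, inches = to_feet_inches(total_inches)
    whole, part = divmod(inches, 1)
    text = f"{feet} ft {int(whole)}"
    if part:
        text += f" {part.numerator}/{part.denominator}"
    return text + " in"


height = to_inches(5, 6 + Fraction(1, 4))
```

Here height is exactly 265/4 inches, the same value as 66.25, and format_height(height) renders it back in the mixed form rather than the decimal one.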

In any case, I'd love something like this, and I suspect many other 
people would as well... but remember, again, extensibility is key.

C< $d=22`AWG; $a=pi*($d/2)**2; print $a`mm**2." qm"; # buying wire in 
Germany> doesn't work very well if there is no way to specify insane 
units like AWG.
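
For what it's worth, here is roughly what I mean by extensibility, sketched in Python. The registry and the names register_length_unit and to_mm are hypothetical. The AWG formula is the standard one (diameter = 0.127 mm * 92 ** ((36 - gauge) / 39)), which is not a simple scale factor and so cannot be expressed as a plain multiplier.

```python
import math

# Hypothetical registry: unit name -> function returning millimetres.
LENGTH_UNITS = {
    "mm": lambda v: v,
    "in": lambda v: v * 25.4,
}


def register_length_unit(name, converter):
    """Let user code teach the system a new length unit."""
    LENGTH_UNITS[name] = converter


def to_mm(value, unit):
    """Convert a value in a registered unit to millimetres."""
    return LENGTH_UNITS[unit](value)


def awg_to_mm(gauge):
    """Wire diameter in mm for an American Wire Gauge number."""
    return 0.127 * 92 ** ((36 - gauge) / 39)


register_length_unit("AWG", awg_to_mm)

diameter_mm = to_mm(22, "AWG")
area_mm2 = math.pi * (diameter_mm / 2) ** 2
```

The point of the design is that AWG is added from outside the core table, and that a unit is a conversion function rather than a constant.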

25`USD*1.06+5`EUR can't be computed at all without extensibility, 
because the conversion rate from USD to EUR doesn't stay static over 
time.  For that matter, monetary conversions are going to take some 
effort to get right, though they will be very useful if they are gotten 
right, because going through a common intermediary isn't correct.  You 
won't get the same results converting USD to EUR to JPY as you will from 
converting USD to JPY.  (OTOH, if you want to convert from DEM to FRF, 
you /must/ convert to EUR in the middle, or you will get the wrong 
result.  Of course, neither the DEM nor the FRF have existed in several 
years, so it probably isn't that important...)
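
A sketch of the two cases in Python, under stated assumptions. The floating rates below are invented placeholders, not real market data, and set_rate/convert are hypothetical names. The fixed euro rates (1 EUR = 1.95583 DEM = 6.55957 FRF) are the official ones, and as I understand the triangulation rule, the intermediate euro amount is rounded to no fewer than three decimals before converting on.

```python
from decimal import Decimal, ROUND_HALF_UP

# Floating currencies: each pair is quoted separately and can change over time.
RATES = {}


def set_rate(src, dst, rate):
    """Record the current quote for one specific currency pair."""
    RATES[(src, dst)] = Decimal(rate)


def convert(amount, src, dst):
    """Convert using the direct quote only; KeyError if none is registered."""
    return Decimal(amount) * RATES[(src, dst)]


# Legacy euro-zone currencies: irrevocably fixed units per euro.
EUR_FIXED = {"DEM": Decimal("1.95583"), "FRF": Decimal("6.55957")}


def convert_legacy(amount, src, dst):
    """Triangulate through EUR, rounding the euro amount to 3 decimals."""
    euros = (Decimal(amount) / EUR_FIXED[src]).quantize(
        Decimal("0.001"), rounding=ROUND_HALF_UP)
    return (euros * EUR_FIXED[dst]).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP)


# Placeholder quotes, made up purely to show that the two paths disagree.
set_rate("USD", "EUR", "0.77")
set_rate("EUR", "JPY", "138.5")
set_rate("USD", "JPY", "107.0")

direct = convert(100, "USD", "JPY")
via_eur = convert(convert(100, "USD", "EUR"), "EUR", "JPY")
francs = convert_legacy(100, "DEM", "FRF")
```

With these made-up quotes, direct and via_eur come out different, which is why a single common intermediary is wrong for floating currencies even though it is mandatory for the fixed ones.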

	-=- James Mastros,
	Who certainly looks forward to this.
0
james
3/29/2005 7:56:56 AM
Craig DeForest wrote:
> Yow -- units would be extra cool for perl6: I know of no other language that 
> has units support built in.  It would go a long way toward making perl6 the 
> language of choice for students in the physical sciences...

Well, my HP48 pocket calculator used to have it :)
-- 
TSa (Thomas Sandlaß)

0
Thomas
3/29/2005 3:49:07 PM
Craig DeForest wrote:
> Yow -- units would be extra cool for perl6: I know of no other language that 
> has units support built in.  It would go a long way toward making perl6 the 
> language of choice for students in the physical sciences...

Frink is built around this idea: http://c2.com/cgi/wiki?FrinkLanguage

-
osfameron
0
hakim
3/29/2005 4:24:20 PM
Larry Wall wrote:
> On Sat, Mar 26, 2005 at 02:37:24PM -0600, Rod Adams wrote:
> : How can you have a level independent position?
>
> By not confusing positions with numbers.  They're just pointers into
> a particular string.

I'm not the Unicode guru but my understanding is that all composition
sequences are finite and stateless with respect to everything before
and after them in the string.  Which brings me to the question if these
positions are defined like positions in Emacs as lying *between* the
chars?  Then the set of positions of a higher level is a subset of the
positions of lower levels.

With defining position as between chars many operations on strings are
downwards compatible between levels, e.g. splitting. If one determines
e.g. an insert position on a higher level there's no problem in letting
the actual insertion being handled by a lower level.  With fractional
positions on higher levels some degree of upward or tunneling
compatibility can be achieved.
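
To illustrate what I mean by between-char positions and the subset relation, here is a sketch in Python (not Perl 6). All positions are expressed as UTF-8 byte offsets so the levels can be compared. The grapheme level is deliberately simplified to "base character plus following combining marks" and is not the full Unicode segmentation algorithm; the function names are my own.

```python
import unicodedata


def byte_boundaries(s):
    """Every position between bytes of the UTF-8 encoding."""
    return list(range(len(s.encode("utf-8")) + 1))


def codepoint_boundaries(s):
    """Byte offsets of every position between code points."""
    offsets = [0]
    for ch in s:
        offsets.append(offsets[-1] + len(ch.encode("utf-8")))
    return offsets


def grapheme_boundaries(s):
    """Byte offsets between clusters (simplified: base + combining marks)."""
    cp = codepoint_boundaries(s)
    result = [0]
    for i in range(1, len(s)):
        # A combining mark never starts a new cluster.
        if unicodedata.combining(s[i]) == 0:
            result.append(cp[i])
    if s:
        result.append(cp[-1])
    return result


def insert_at(s, byte_offset, text):
    """Perform the insertion at the lowest level, on the encoded bytes."""
    raw = s.encode("utf-8")
    patched = raw[:byte_offset] + text.encode("utf-8") + raw[byte_offset:]
    return patched.decode("utf-8")


sample = "e\u0301a"  # 'e', COMBINING ACUTE ACCENT, 'a'
```

For sample, the grapheme positions are a subset of the code point positions, which are a subset of the byte positions, and an insert position found at the grapheme level can be handed straight to insert_at without splitting a character.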

BTW, will bidirectionality be supported? Does it make sense to reflect
it in the StrPos type such that $pos_start < $pos_end means a non-empty
left to right string, $pos_start > $pos_end is a non-empty right to left
string and $pos_start == $pos_end delimit an empty (sub)string? As a
natural consequence the sign indicates direction with negative length
being right to left.  And that leads to two times two types of iterators:
left to right, right to left, start to end and end to start.

All the above leads me to rant about an array like type. Please forgive
me if the following is not proper Perl6. My point is to illustrate how
I imagine the future communication between implementor and user of such
a class.  Actually some POD support for extracting the type information
into the documentation would be great, too!

And yes, the :analyse should be made lazy. The distinction between the
first and second index method could be even more specific by using
type 'Index ^ List of Str where { $_.elems == 1 }' to convey the
information that indexing with a list of one element doesn't result
in a List of Str but a plain Str. OTOH this will incur a performance
penalty and violate the intuitive notion "list in, list out".

class StrPosArray does Array where { ::Index does StrPos }
{
    has Str    $:data;
    has StrPos @:pos;

    multi method postcircumfix:<[ ]>
        (:          Index $i ) returns         Str {...}
    multi method postcircumfix:<[ ]>
        (: List  of Index $i ) returns List of Str {...}
    multi method postcircumfix:<[ ]>
        (: Range of Index $i ) returns List of Str {...}
    multi method postcircumfix:<[ ]>
        (:            Int $i ) returns         Str {...}

    # more stuff here for push, pop, shift etc.

    method infix:<=> (: Str $rhs ) returns ::?CLASS
    {
       $:data = $rhs;
       :analyse;
    }

    method :analyse ()
    {
       # scan $:data for all between char positions
       # and store them into @:pos
    }
}

Question:
   does the compiler go over this source in multiple passes
   such that the declaration of :analyse is known before its
   usage in infix:<=>?
-- 
TSa (Thomas Sandlaß)


0
Thomas
3/31/2005 1:03:09 PM
On Thu, Mar 31, 2005 at 03:03:09PM +0200, Thomas Sandlaß wrote:
: Larry Wall wrote:
: >On Sat, Mar 26, 2005 at 02:37:24PM -0600, Rod Adams wrote:
: >: How can you have a level independent position?
: >
: >By not confusing positions with numbers.  They're just pointers into
: >a particular string.
: 
: I'm not the Unicode guru but my understanding is that all composition
: sequences are finite and stateless with respect to everything before
: and after them in the string.  Which brings me to the question if these
: positions are defined like positions in Emacs as lying *between* the
: chars?  Then the set of positions of a higher level is a subset of the
: positions of lower levels.

Yes, that's how I've been thinking of them.  Thanks for making that explicit.

: With defining position as between chars many operations on strings are
: downwards compatible between levels, e.g. splitting. If one determines
: e.g. an insert position on a higher level there's no problem in letting
: the actual insertion being handled by a lower level.  With fractional
: positions on higher levels some degree of upward or tunneling
: compatibility can be achieved.

That's my feeling.

: BTW, will bidirectionality be supported? Does it make sense to reflect
: it in the StrPos type such that $pos_start < $pos_end means a non-empty
: left to right string, $pos_start > $pos_end is a non-empty right to left
: string and $pos_start == $pos_end delimit an empty (sub)string? As a
: natural consequence the sign indicates direction with negative length
: being right to left.  And that leads to two times two types of iterators:
: left to right, right to left, start to end and end to start.

Offhand I'd rather have end < start be undefined, I think, but I
suppose we could give it a meaning if it turns out not to be an
easily generated degenerate case like 0..-1.  On the other hand,
I think right-to-left might deserve more Huffman visibility than an
itty-bitty sign that might be hidden down in a variable.

But then, we've played games with signs in substr and splice before.
It's not clear that people would want substr($x, -3) to return the
characters in reversed order, though.

: All the above leads me to rant about an array like type. Please forgive
: me if the following is not proper Perl6. My point is to illustrate how
: I imagine the future communication between implementor and user of such
: a class.  Actually some POD support for extracting the type information
: into the documentation would be great, too!
: 
: And yes, the :analyse should be made lazy. The distinction between the
: first and second index method could be even more specific by using
: type 'Index ^ List of Str where { $_.elems == 1 }' to convey the
: information that indexing with a list of one element doesn't result
: in a List of Str but a plain Str. OTOH this will incur a performance
: penalty and violate the intuitive notion "list in, list out".

MEGO.

: class StrPosArray does Array where { ::Index does StrPos }
: {
:    has Str    $:data;
:    has StrPos @:pos;
: 
:    multi method postcircumfix:<[ ]>
:        (:          Index $i ) returns         Str {...}
:    multi method postcircumfix:<[ ]>
:        (: List  of Index $i ) returns List of Str {...}
:    multi method postcircumfix:<[ ]>
:        (: Range of Index $i ) returns List of Str {...}
:    multi method postcircumfix:<[ ]>
:        (:            Int $i ) returns         Str {...}
: 
:    # more stuff here for push, pop, shift etc.
: 
:    method infix:<=> (: Str $rhs ) returns ::?CLASS
:    {
:       $:data = $rhs;
:       :analyse;
:    }
: 
:    method :analyse ()
:    {
:       # scan $:data for all between char positions
:       # and store them into @:pos
:    }
: }
: 
: Question:
:   does the compiler go over this source in multiple passes
:   such that the declaration of :analyse is known before its
:   usage in infix:<=>?

No, you just throw in a forward declaration with {...} in that case.

Larry
0
larry
4/2/2005 7:40:17 PM
Larry Wall wrote:

>On Thu, Mar 31, 2005 at 03:03:09PM +0200, Thomas Sandlaß wrote:
>
>: BTW, will bidirectionality be supported? Does it make sense to reflect
>: it in the StrPos type such that $pos_start < $pos_end means a non-empty
>: left to right string, $pos_start > $pos_end is a non-empty right to left
>: string and $pos_start == $pos_end delimit an empty (sub)string? As a
>: natural consequence the sign indicates direction with negative length
>: being right to left.  And that leads to two times two types of iterators:
>: left to right, right to left, start to end and end to start.
>
>Offhand I'd rather have end < start be undefined, I think, but I
>suppose we could give it a meaning if it turns out not to be an
>easily generated degenerate case like 0..-1.  On the other hand,
>I think right-to-left might deserve more Huffman visibility than an
>itty-bitty sign that might be hidden down in a variable.
>
>But then, we've played games with signs in substr and splice before.
>It's not clear that people would want substr($x, -3) to return the
>characters in reversed order, though.
>
I don't see how rtl vs ltr changes how we process strings. It's purely a 
display problem. I seriously doubt that someone working with a rtl 
language would ever wish to count the characters ltr. And note that we 
are calling the positions "start" and "end", not "left" and "right".

If I'm missing something basic here, let me know.

-- Rod Adams
0
rod
4/3/2005 9:03:40 AM