TL;DR: die()ing from inside call_sv()ed code doesn't pop the CXt_SUB
frame with retop==NULL - why?
Background: Today I have a question, related to my Future::AsyncAwait
work, for which I now have a TPF grant :) I'm looking into a failure
case wherein the actual C-level stack (as shown by e.g. gdb's `bt`)
disagrees with the Perl-level context ("CX") stack once a certain
condition has occurred. Looking at this condition, I can't really see
what I'm doing differently from any other normal (XS) perl code, so I
thought I'd ask here whether anyone has any insight.
The scenario concerns the operation of the API function `call_sv()` (and
friends like `call_method()` which are just wrappers of it).
Under normal, non-exceptional circumstances in which no code dies, the
C stack and the Perl-level context stack agree.
    /* some amount of context stack is here; cxstack_ix has a value */
    call_sv(cv, G_SCALAR);   /* cv: some CV ref... */
    /* at this point, cxstack_ix has returned to the same height */
During the operation of `call_sv()` the context stack gets more entries
pushed to it, but they are popped again by the time it returns, so the
whole operation appears nice and neat. At the C level, `call_sv()`
appears on the C stack and makes a nested call to the main runops
loop, which eventually terminates, allowing `call_sv()` to return. If
you consider the ranges of entries on the C stack and the Perl CX
stack, they form properly nested regions. All well and good.
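The agreement described above can be stated as an invariant: whatever CX frames a nested call pushes, it should pop again before returning to its C caller. The following is a toy C model of that check, not perl code; `cx_ix`, `cx_push()`, `cx_pop()`, `balanced_body()`, `leaky_body()` and `nested_call_is_balanced()` are all hypothetical names, with `cx_ix` merely standing in for `cxstack_ix`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a context stack: only its height matters here. */
#define CX_MAX 64
static int cx_ix = -1;            /* stands in for cxstack_ix */

static void cx_push(void) { assert(cx_ix + 1 < CX_MAX); cx_ix++; }
static void cx_pop(void)  { assert(cx_ix >= 0); cx_ix--; }

/* A well-behaved callee: pushes its frame, runs, pops it again. */
static void balanced_body(void)
{
    cx_push();
    /* ... the body of the called sub would run here ... */
    cx_pop();
}

/* A misbehaving callee: leaves its frame behind on the way out. */
static void leaky_body(void)
{
    cx_push();
}

/* Make one nested call and report whether the CX stack came back to the
 * height it had on entry, i.e. whether the C-level call and the CX
 * frames it created form properly nested regions. */
static bool nested_call_is_balanced(void (*body)(void))
{
    int height_before = cx_ix;
    body();
    return cx_ix == height_before;
}
```

A well-behaved body such as `balanced_body()` leaves the height unchanged, while `leaky_body()` leaves one frame behind: the same kind of C/CX disagreement that the exception case further down runs into.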
Internally, what `call_sv()` does is:
1) creates a new temporary LOGOP with zeroed-out fields
2) sets PL_op to point to it
3) pushes the SV to invoke onto the argument stack with PUSHs()
Then in this case where the G_EVAL flag isn't set (as here):
4) calls the CALL_BODY_SUB() macro to invoke it, which itself directly
calls the pp function for OP_ENTERSUB
5) OP_ENTERSUB does a bunch of things, one key part of which is to
call `cx_pushblock()` + `cx_pushsub()` to create the CXt_SUB
context on the CX stack. Since PL_op is the new temporary op
created at step 1, the PL_op->op_next field will be NULL, and thus
this new CXt_SUB entry will have NULL as its retop field. This part
is important.
6) CALL_BODY_SUB() then calls CALLRUNOPS(), which re-enters the main
runops loop, nested on the C stack.
7) The main runops loop runs the body of the sub, which ends by
invoking an OP_LEAVESUB.
8) This OP_LEAVESUB pops all the context frames that OP_ENTERSUB
pushed in step 5, ultimately returning the retop pointer back to
the runops loop. Because of the zeroed op_next field from step 1,
this is a NULL pointer.
9) This NULL pointer is a signal to the (inner nested) runops loop to
stop its work. It thus returns to the call_sv() function that invoked
it, which in turn returns to its caller.
At this point, the C stack has been wound back to how we started, as
has the Perl CX stack. The two are in agreement.
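Steps 1 to 9 can be mimicked with a small, self-contained C model. This is a sketch under heavy simplifying assumptions, not perl's implementation: an op is reduced to a `pp` function pointer plus `op_next` and `op_other`, a CX frame holds only its `retop`, and `call_sv_model()`, `pp_xs()` and `run_example()` are hypothetical stand-ins for `call_sv()`, an XS caller and the toplevel program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified op dispatch: each op's pp function does its
 * work and returns the next op to run. */
typedef struct op OP;
struct op {
    OP *(*pp)(OP *self);
    OP *op_next;
    OP *op_other;                 /* for entersub: first op of the body */
};

/* A CXt_SUB-like frame; the only field modelled is the return op. */
typedef struct { OP *retop; } CX;

#define CX_MAX 64
static CX  cxstack[CX_MAX];
static int cxstack_ix = -1;
static int runops_depth = 0, max_runops_depth = 0;

/* Like OP_ENTERSUB (step 5): push a frame whose retop is op_next. */
static OP *pp_entersub(OP *self)
{
    assert(cxstack_ix + 1 < CX_MAX);
    cxstack[++cxstack_ix].retop = self->op_next;
    return self->op_other;
}

/* Like OP_LEAVESUB (step 8): pop the frame and hand back its retop. */
static OP *pp_leavesub(OP *self)
{
    (void)self;
    assert(cxstack_ix >= 0);
    return cxstack[cxstack_ix--].retop;
}

static OP *pp_nextstate(OP *self) { return self->op_next; }

/* Like the runops loop (steps 6, 7, 9): dispatch until an op gives NULL. */
static void runops(OP *op)
{
    if (++runops_depth > max_runops_depth)
        max_runops_depth = runops_depth;
    while (op)
        op = op->pp(op);
    runops_depth--;
}

/* Like call_sv() without G_EVAL (steps 1 to 6). */
static void call_sv_model(OP *body)
{
    OP myop = { pp_entersub, NULL, body }; /* zeroed op_next: retop NULL */
    OP *next = myop.pp(&myop);             /* run entersub directly */
    if (next)
        runops(next);                      /* nested runops loop */
}

/* Example: a toplevel op calls call_sv_model() on a two-op sub body. */
static int height_before, height_after;
static OP body_leave = { pp_leavesub, NULL, NULL };
static OP body_state = { pp_nextstate, &body_leave, NULL };

static OP *pp_xs(OP *self)
{
    height_before = cxstack_ix;
    call_sv_model(&body_state);
    height_after = cxstack_ix;
    return self->op_next;
}

static void run_example(void)
{
    OP outer_end = { pp_nextstate, NULL, NULL };
    OP outer_xs  = { pp_xs, &outer_end, NULL };
    runops(&outer_xs);
}
```

After `run_example()`, the CX height seen by the XS-like op is the same before and after `call_sv_model()`, and `max_runops_depth` is 2, reflecting the single nested runops loop. It is the NULL retop handed back by `pp_leavesub()` that ends the inner loop, precisely because the temporary op's `op_next` was zero.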
What I don't understand is what happens on an exception: what
happens if the code body inside the SV we're `call_sv()`'ing in fact
dies?
What I am discovering in my Future::AsyncAwait scenario is that the
OP_DIE causes a large amount of C stack unwinding as a result of the
setjmp/longjmp magic behind the JMPENV_* macros, such that the
`call_sv()` frame is unwound completely. At this point the interpreter
has returned to the _outer_ toplevel runops loop, the one invoked
by Perl itself to run the toplevel of my actual script. But it hasn't
unwound the CX stack, so that inner CXt_SUB frame, the one with the
NULL retop, still exists on it. The C stack and the Perl CX stack have
become out of sync.
As a result, the outer runops loop carries on running code until the
next time an OP_LEAVESUB tries to return back past the frame that
should have been tidied away. That OP_LEAVESUB pops the stale frame
and hands its NULL retop to the main runops loop, which therefore
stops, signalling to Perl that the main program has finished and that
it's time for an orderly run of END blocks and global destruction.
This results in the failure seen at
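The symptom can be reproduced in a toy C model by adding a toplevel `setjmp()` standing in for the JMPENV and a die op that `longjmp()`s to it. This is a hypothetical sketch of the behaviour described, not of perl's real die handling: names such as `unwind_height`, `restartop`, `pp_xs()` and `run_scenario()` are assumptions, and the die is assumed to be caught inside the outer sub so that the toplevel resumes at a restart op there (loosely analogous to perl's PL_restartop). With `unwind_on_die` false, the die leaves the CX stack untouched.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical model: ops, a CX stack of retops, one toplevel jmp_buf. */
typedef struct op OP;
struct op {
    OP *(*pp)(OP *self);
    OP *op_next;
    OP *op_other;                  /* for entersub: first op of the body */
};
typedef struct { OP *retop; } CX;

#define CX_MAX 64
static CX      cxstack[CX_MAX];
static int     cxstack_ix = -1;
static jmp_buf top_env;            /* stands in for the toplevel JMPENV */
static OP     *restartop;          /* where the toplevel resumes after die */
static int     unwind_height = -2; /* -2: die leaves the CX stack alone */
static int     reached_after;      /* did the op after the outer call run? */

static OP *pp_entersub(OP *self)
{
    assert(cxstack_ix + 1 < CX_MAX);
    cxstack[++cxstack_ix].retop = self->op_next;
    return self->op_other;
}

static OP *pp_leavesub(OP *self)
{
    (void)self;
    assert(cxstack_ix >= 0);
    return cxstack[cxstack_ix--].retop;
}

static OP *pp_nextstate(OP *self) { return self->op_next; }
static OP *pp_after(OP *self) { reached_after = 1; return self->op_next; }

/* A die: optionally pop CX frames, then longjmp past every C frame
 * between here and the toplevel, including call_sv_model()'s. */
static OP *pp_die(OP *self)
{
    (void)self;
    if (unwind_height >= -1)
        cxstack_ix = unwind_height;
    longjmp(top_env, 3);
}

static void runops(OP *op) { while (op) op = op->pp(op); }

/* call_sv() without G_EVAL: the temporary op has op_next == NULL, so the
 * frame pushed for the callee has a NULL retop. */
static void call_sv_model(OP *body)
{
    OP myop = { pp_entersub, NULL, body };
    OP *next = myop.pp(&myop);
    if (next)
        runops(next);
}

static OP inner_die = { pp_die, NULL, NULL };

static OP *pp_xs(OP *self)         /* XS code inside the outer sub */
{
    call_sv_model(&inner_die);
    return self->op_next;          /* not reached: the body dies */
}

/* Outer sub: XS call, then (restart point) nextstate, then leavesub. */
static OP outer_leave  = { pp_leavesub, NULL, NULL };
static OP outer_resume = { pp_nextstate, &outer_leave, NULL };
static OP outer_xs     = { pp_xs, &outer_resume, NULL };
/* Main program: call the outer sub, then run one more op. */
static OP main_after   = { pp_after, NULL, NULL };
static OP main_call    = { pp_entersub, &main_after, &outer_xs };

/* Returns 1 if the main program got past the outer sub call. */
static int run_scenario(int unwind_on_die)
{
    OP *volatile op = &main_call;
    cxstack_ix = -1;
    reached_after = 0;
    unwind_height = unwind_on_die ? 0 : -2; /* 0 keeps the outer frame */
    restartop = &outer_resume;
    if (setjmp(top_env) != 0)
        op = restartop;            /* C stack unwound; CX stack maybe not */
    runops(op);
    return reached_after;
}
```

`run_scenario(0)` returns 0: the outer sub's OP_LEAVESUB pops the stale NULL-retop frame, the outer loop stops, the op after the outer call never runs, and the outer sub's own frame is left behind. `run_scenario(1)`, in which the die first pops the CX stack back to the outer sub's frame, returns 1.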
What puzzles me is that I'm not really doing anything odd with my
`call_sv()` (well, `call_method()`) calls. Comparing my code with lots
of other XS examples (many of which I've written myself), I don't see
any other code that has this trouble. F:AA seems to be unique in
arriving at this disconnection.
Is anyone able to shed any light on this puzzle?
Paul "LeoNerd" Evans
firstname.lastname@example.org | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/