The future of XPCOM memory management

AddRef and Release constitute a contract between every XPCOM object
and all its users.  The contract governs object lifetimes,
finalization order, and memory management.

Advantages of this specific contract:

  1. It's relatively simple.
  2. It requires no global coordination.
  3. Prompt destruction.  If there are no cycles, objects are
     destroyed as soon as they're no longer needed.
  4. It destroys objects in the right order.

Disadvantages:

  1. It requires manual bookkeeping throughout the codebase
     (nsCOMPtr, NS_ADDREF, NS_RELEASE, already_AddRefed,
     kungFuDeathGrip, etc.)  This clutters up the code, and all the
     virtual method calls and AtomicIncrements can't be good.
  2. The problem of reference cycles is built in.
  3. Interacting with other memory management schemes is painful
     and slow.  (See cycle collector, XPConnect.)
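To make disadvantage 2 concrete, here is a minimal sketch of the AddRef/Release contract. It is not the real nsISupports (no QueryInterface, no threadsafe counting); RefCounted, Node and gLiveObjects are hypothetical names. A chain is destroyed promptly, but two objects holding strong references to each other keep each other's count at 1 forever:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal stand-in for an XPCOM-style intrusive refcount.
static int gLiveObjects = 0;  // objects constructed but not yet destroyed

class RefCounted {
 public:
  RefCounted() { ++gLiveObjects; }
  virtual ~RefCounted() { --gLiveObjects; }

  uint32_t AddRef() { return ++mRefCnt; }
  uint32_t Release() {
    uint32_t count = --mRefCnt;
    if (count == 0) {
      delete this;  // prompt destruction when the last reference goes away
    }
    return count;
  }

 private:
  uint32_t mRefCnt = 0;
};

// A node holding one strong (owning) reference to another node.
class Node : public RefCounted {
 public:
  ~Node() override {
    if (mNext) mNext->Release();  // drop our strong edge when we die
  }
  void SetNext(Node* aNext) {
    if (aNext) aNext->AddRef();  // AddRef first so self-assignment is safe
    if (mNext) mNext->Release();
    mNext = aNext;
  }

 private:
  Node* mNext = nullptr;
};

// a -> b, no cycle: dropping the caller's references frees both.
int LiveDeltaAfterChain() {
  const int before = gLiveObjects;
  Node* a = new Node; a->AddRef();
  Node* b = new Node; b->AddRef();
  a->SetNext(b);  // b's count is now 2
  b->Release();   // only a owns b now
  a->Release();   // a hits 0; ~Node releases b, which also hits 0
  return gLiveObjects - before;
}

// a <-> b: dropping the caller's references leaves both alive forever.
int LiveDeltaAfterCycle() {
  const int before = gLiveObjects;
  Node* a = new Node; a->AddRef();
  Node* b = new Node; b->AddRef();
  a->SetNext(b);
  b->SetNext(a);
  b->Release();  // b still has count 1 (held by a)
  a->Release();  // a still has count 1 (held by b)
  return gLiveObjects - before;  // both leaked: unreachable but never freed
}
```

This is the leak the cycle collector exists to find after the fact.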

In Mozilla 2, we should change this to require less effort from the
programmer and less clutter throughout the code.  This means coming
up with a new contract that cooperates with garbage collection,
rather than fighting it.

More on this in a few hours.

-j

Jason
8/28/2007 1:34:50 PM
mozilla.dev.tech.xpcom

On Aug 28, 9:34 am, Jason Orendorff <jason.orendo...@gmail.com> wrote:
> More on this in a few hours.

I don't think there's such a thing as a contract that (a) is simple,
(b) is efficient, (c) supports GC, (d) supports refcounting for objects
that want it, and (e) really hides memory management implementation
details.  So there will be design tradeoffs here.

For the sake of having a concrete proposal to chew on, I propose the
following:

  * Drop AddRef and Release from nsISupports.

  * Require all XPCOM objects to be MMgc GCObjects or
    GCFinalizedObjects, allocated from the same GC allocator as all
    JavaScript objects.

    Read more about MMgc here:
      http://developer.mozilla.org/en/docs/MMgc

  * Change any code that depends on objects being destroyed in a
    specific order, or at a specific time, to use some explicit means of
    ensuring that it really happens that way, rather than depending on
    reference counting.

  * Use static tools to replace our uses of nsCOMPtr and friends with
    the MMgc equivalents, and replace nsISupportsWeakReference with MMgc
    GCWeakRefs.

  * Add thread-safety to MMgc using the Spidermonkey request model.

  * Delete the cycle collector.

How did I do?  This proposal nails goal (c), does quite well on (a) and
OK on (b), and ignores goals (d) and (e).  Maybe you can do better.
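For contrast, a toy sketch of what "no AddRef/Release, the collector decides" means. This is NOT MMgc's API: ToyHeap, GCNode and Collect are hypothetical, and it is a plain stop-the-world mark-and-sweep, assumed only to show that a cycle which refcounting would leak is reclaimed once nothing roots it:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Objects just hold plain pointers; there are no refcounts anywhere.
struct GCNode {
  std::vector<GCNode*> edges;
  bool marked = false;
};

class ToyHeap {
 public:
  GCNode* Alloc() {
    mObjects.push_back(std::make_unique<GCNode>());
    return mObjects.back().get();
  }
  void AddRoot(GCNode* n) { mRoots.push_back(n); }
  void ClearRoots() { mRoots.clear(); }
  std::size_t Size() const { return mObjects.size(); }

  void Collect() {
    // Mark: everything reachable from a root is live.
    std::vector<GCNode*> stack(mRoots.begin(), mRoots.end());
    while (!stack.empty()) {
      GCNode* n = stack.back();
      stack.pop_back();
      if (n->marked) continue;
      n->marked = true;
      for (GCNode* e : n->edges) stack.push_back(e);
    }
    // Sweep: keep marked objects (clearing their marks for next time);
    // unmarked ones are destroyed when mObjects is replaced below.
    std::vector<std::unique_ptr<GCNode>> survivors;
    for (auto& obj : mObjects) {
      if (obj->marked) {
        obj->marked = false;
        survivors.push_back(std::move(obj));
      }
    }
    mObjects = std::move(survivors);
  }

 private:
  std::vector<std::unique_ptr<GCNode>> mObjects;
  std::vector<GCNode*> mRoots;
};
```

With a <-> b rooted at a, Collect() keeps both; after ClearRoots(), the next Collect() frees both nodes of the cycle, which refcounting alone never would.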

I'll be working on the following bug quite soon, so now is the time to
speak up.

  Bug 393034 - Allocate DOM objects using MMgc
  https://bugzilla.mozilla.org/show_bug.cgi?id=393034

-j

Jason
8/28/2007 8:32:44 PM
I definitely think we should try to get rid of reference counting in
favor of garbage collection. There are a few things that I'm worried
about and that need investigation.

How complicated is embedding going to be? I suspect it's going to be
more complicated than now in some ways, since interacting with a GC
engine is probably trickier than simply calling AddRef. At the same
time, environments that use GC, such as Java, should have an easier time
avoiding leaks.

How is performance going to be during GC? While we're not writing a
real-time app, we don't want the UI to lock up for seconds while GC is
running. If we get better at detecting inactivity from the user, we
might get away with more here since we can run GC while the user is busy
simply looking at a webpage.

Related to the above, should we attempt to use incremental GC? From what
I understand this should be entirely possible with MMgc. However, it
requires that all pointers use special smart pointers, including
pointers that are currently raw pointers. This seems a little bit scary
and easy to forget, but might be very nice for performance.

/ Jonas

Jonas
8/28/2007 10:04:30 PM
One thing not raised in your proposal is what to do with objects that
are currently *not* refcounted. Two good examples are strings (nsString
and friends) and arrays, for example nsTArray.

If we turn all currently refcounted objects into GCFinalizedObject, then
any nsString and nsTArray inline members will get their destructor 
called when the hosting object is destroyed. Would that be a big 
overhead? The devmo docs discourage GCFinalizedObject.

An alternative is to make nsString and nsTArray inherit GCObject, and 
make them allocate their internal buffers using GC::Alloc.

However this doesn't fully work for nsTArray since we would still not be 
finalizing the objects in the array. We could of course say that you're 
not allowed to stick objects that need finalizing in nsTArray, but that 
would probably break a good number of current users, for example 
PathExpr::mItems in txExpr.h. This contains a number of PathExprItems like:

      class PathExprItem {
      public:
          nsAutoPtr<Expr> expr;
          PathOperator pathOp;
      };

One way to fix this one example would be to make PathExpr::PathExprItem 
and Exprs be GCObjects too, and so on. However with this strategy we 
would likely be forced to convert a very large number of objects into 
GCObject. This certainly sounds doable, but it seems like a lot of work, 
much of it risky.

We can't simply make nsTArray a GCFinalizedObject, for two reasons. 
First of all we don't really want to pay the overhead of a vtable 
pointer. This class is just 4 bytes big when empty, so it would double 
in size. Second, nsTArrays often appear as inline members in other
classes. If such classes are GCObjects then the garbage collector will 
not be able to detect the inline GCFinalizedObject and finalize it.
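The first objection is simple arithmetic. A sketch with hypothetical stand-in types (PlainArray and FinalizedArray are not nsTArray): an empty array is a single header pointer, and a virtual destructor, as a GCFinalizedObject-style base would require, adds a vtable pointer next to it. That is 4 -> 8 bytes on the 32-bit builds described above, 8 -> 16 on 64-bit. The vtable layout is an ABI convention (Itanium C++ ABI, MSVC), not something the C++ standard guarantees.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an empty nsTArray: one header pointer.
struct PlainArray {
  void* mHdr = nullptr;
};

// The same object with a virtual destructor, which forces a vtable pointer.
struct FinalizedArray {
  virtual ~FinalizedArray() = default;
  void* mHdr = nullptr;
};

// Typically 2 on mainstream ABIs: vtable pointer + header pointer.
constexpr std::size_t kGrowthFactor =
    sizeof(FinalizedArray) / sizeof(PlainArray);
```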

/ Jonas

Jonas
8/28/2007 11:32:24 PM
Jason Orendorff wrote:
>   * Add thread-safety to MMgc using the Spidermonkey request model.

We currently don't pay the cost of thread-safety for most XPCOM objects, 
since most objects are main-thread-only.  Would doing this impose 
significant performance penalties for locking or atomic operations?  Or 
would most operations still be cheap?

>   * Delete the cycle collector.

Are we dropping the multi-language aspects of the platform (introduced 
in this milestone, at considerable effort in 
https://bugzilla.mozilla.org/show_bug.cgi?id=255942 )?  Or is there a 
good way for python to use MMgc as well?

(That said, the python stuff hasn't caught up with the cycle collector, 
and would probably leak a lot if it were used as intensively as we use 
JS, so I'm not sure how seriously we should take it.)

-David

-- 
L. David Baron                                 http://dbaron.org/
Mozilla Corporation                       http://www.mozilla.com/
L. David Baron
8/28/2007 11:46:58 PM
On Aug 29, 9:46 am, "L. David Baron" <dba...@dbaron.org> wrote:
> Jason Orendorff wrote:

> >   * Delete the cycle collector.
>
> Are we dropping the multi-language aspects of the platform (introduced
> in this milestone, at considerable effort in https://bugzilla.mozilla.org/show_bug.cgi?id=255942)?  Or is there a
> good way for python to use MMgc as well?
>
> (That said, the python stuff hasn't caught up with the cycle collector,
> and would probably leak a lot if it were used as intensively as we use
> JS, so I'm not sure how seriously we should take it.)

Yes, this will be a challenge.  I can't picture how dropping refcounts
would work in the general case with Python.  The XPCOM objects exposed
by Python can be made GCObjects - but I'm not sure how we would
integrate the rest of the Python universe.  E.g., assuming we have an
arbitrary number of Python objects holding pointers to xpcom objects,
I'm not sure how we would tell the GC about all such references - and
without that knowledge, the GC would clean up objects that are still
referenced.  Deep hacks specific to Python and MMgc might be possible,
but that still screws Perl, etc.

Building a new version of Python that uses MMgc might be possible but
(a) it might not be and (b) every Python extension module would also
need to either (likely) change or (if we are extremely lucky) be
rebuilt.

To summarize, I see that having an external language integrate with
MMgc is no more - and no less - difficult than integrating with the
existing SpiderMonkey GC - and I'm not aware of anyone who believes
that is feasible in either the general case, or even just the specific
cases on the table (i.e., existing languages with xpcom bindings).

On the other side of the coin though, the future may be closer to
something like .NET, where languages like Python are reimplemented on
top of a new VM - in which case the GC comes "for free" - but in such
a world xpcom doesn't make as much sense anyway - the VM itself can
make cross-language calls.  So maybe we are asking the wrong question
- what is the future of XPCOM itself, not just its memory management?

Cheers,

Mark

mhammond
8/29/2007 1:07:55 AM
Replying to several messages/thoughts at once:

1. Losing the cycle collector's support for other languages is
necessary to get C++ and JS on a better footing -- a shared GC heap.
But the idea and even code could be harvested for use by other
language runtimes, since we will still face uncollectable cycle
hazards interfacing Java, Python, etc. to C++ and JS.

2. Other languages in Mozilla 2 should prefer to integrate at the VM
level, on Tamarin. See IronMonkey.

3. Jonas's first point: We should avoid making non-refcounted XPCOM
classes be GC-objects without good evidence doing so wins in time and
space overhead. But (separate topic/thread) we hope to use std::string
and the like more where possible, and use Taras's elsa-based tools to
write the mega-patches for us.

4. Jonas's second point: MMgc needs to become more conservative about
interior objects. Currently it does not back up from a pointer to an
interior (via MI or explicit member embedding) to the outermost
(allocation) object.

5. David's first point: the request model is already followed (file
bugs if you can) in Mozilla code (this was not always so). We aim to
keep it, but it does not mean there is any thread-safety cost imposed
on GC'ed objects. Only that the embedding must begin, end, suspend,
resume, and yield requests appropriately (could use some static
analysis help here too). The JS objects that SpiderMonkey creates
already use the request model to do optimistic lock-free
synchronization, so no change there. And we are not imposing such
synchronization on other objects for Mozilla 2, as far as I can see.

/be

brendan
8/29/2007 1:09:55 AM
On Aug 28, 6:07 pm, mhammond <mhamm...@skippinet.com.au> wrote:
> The XPCOM objects exposed
> by Python can be made a GCObject - but I'm not sure how we would
> integrate the rest of the Python universe - eg, assuming we have an
> arbitrary number of Python objects holding pointers to xpcom objects,

MMgc is conservative. So long as Python allocates memory to be scanned
for pointers to MMgc allocations using MMgc's malloc wrapper, MMgc
will find these pointers. You need an MMgc GCRoot subclass (one or
more, perhaps at most one per other language implementation) to root
all the Python objects that are not (or not guaranteed to be) pointed
at by pointers in memory MMgc scans.
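A rough illustration of what "conservative" means here, under stated assumptions: the scanner does not know which words are pointers, so any aligned word whose value lands inside a known GC allocation (interior addresses included) is treated as a reference. Allocation and ConservativeScan are hypothetical; MMgc's actual scanner, its malloc wrapper and its allocation flags are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// A known GC allocation: [begin, begin + size).
struct Allocation {
  std::uintptr_t begin;
  std::size_t size;
};

// Returns indices into `allocs` that some word in `buf` might point into.
std::vector<std::size_t> ConservativeScan(const void* buf, std::size_t bytes,
                                          const std::vector<Allocation>& allocs) {
  std::vector<std::size_t> found;
  const auto* base = static_cast<const unsigned char*>(buf);
  for (std::size_t off = 0; off + sizeof(std::uintptr_t) <= bytes;
       off += sizeof(std::uintptr_t)) {
    std::uintptr_t word;
    std::memcpy(&word, base + off, sizeof word);  // read the word safely
    for (std::size_t i = 0; i < allocs.size(); ++i) {
      // Compare against the whole range so interior pointers count too.
      if (word >= allocs[i].begin && word < allocs[i].begin + allocs[i].size) {
        found.push_back(i);
      }
    }
  }
  return found;
}
```

The cost of this approach is that an integer which happens to look like an address keeps an object alive (false retention); a real pointer is never missed, as long as the memory holding it is scanned.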

> I'm not sure how we would tell the GC about all such references - and
> without that knowledge, the GC would cleanup objects that are
> referenced.

That would be bad, so let's not ;-).

> Deep hacks specific to Python and MMgc might be possible,
> but that still screws Perl, etc.

Perl is not helped much by PyXPCOM, right? How does this differ with
XPCOM-on-MMgc?

> Building a new version of Python that uses MMgc might be possible but
> (a) it might not be and (b) every Python extension module would also
> need to either (likely) change or (if we are extremely lucky) be
> rebuilt.

This is not the way, agreed.

> To summarize, I see that having an external language integrate with
> such MMgc is no more - and no less - difficult than integrating with
> the existing spidermonkey GC -

It's easier because MMgc is conservative -- but I should add that it's
harder if MMgc is used in incremental mode, because you need to impose
a write barrier on Python.

> and I'm not aware of anyone who
> believes that is feasible in either the general case, or even just the
> specific cases on the table (ie, existing languages with xpcom
> bindings)

GC-to-GC cycle collection is easier than refcount cycle collection.
See http://www.cs.cmu.edu/~roc/HetGC.html (I hope this is sound -- I
never did proofs ;-) and Parley from IBM research, where the best link
I can find is:

http://researchweb.watson.ibm.com/vee04/video.html#grove

> On the other side of the coin though, the future may be closer to
> something like .NET, where languages like Python are reimplemented on
> top of a new VM - in which case the GC comes "for free" - but in such
> a world xpcom doesn't make as much sense anyway - the VM itself can
> make cross-language calls.

Precisely -- wherefore IronMonkey.

> So maybe we are asking the wrong question
> - what is the future of XPCOM itself, not just its memory management?

And indeed Jason posted a separate thread on that topic. See you
there :-).

/be

brendan
8/29/2007 1:23:09 AM
On Aug 28, 6:23 pm, "bren...@mozilla.org" <bren...@mozilla.org> wrote:
> On Aug 28, 6:07 pm, mhammond <mhamm...@skippinet.com.au> wrote:
>
> > The XPCOM objects exposed
> > by Python can be made a GCObject - but I'm not sure how we would
> > integrate the rest of the Python universe - eg, assuming we have an
> > arbitrary number of Python objects holding pointers to xpcom objects,
>
> MMgc is conservative. So long as Python allocates memory to be scanned
> for pointers to MMgc allocations using MMgc's malloc wrapper, MMgc
> will find these pointers.

I should have written "using an appropriate malloc wrapper". Ideally
only memory that might contain pointers to XPGC (heh) objects would be
allocated with the kContainsPointer MMgc flag.

That may be hard to hook into C-Python. Is it?

/be

brendan
8/29/2007 1:26:03 AM
mhammond wrote:

> Yes, this will be a challenge.  I can't picture how dropping refcounts
> would work in the general case with Python.  The XPCOM objects exposed
> by Python can be made a GCObject - but I'm not sure how we would
> integrate the rest of the Python universe - eg, assuming we have an
> arbitrary number of Python objects holding pointers to xpcom objects,
> I'm not sure how we would tell the GC about all such references - and

Can't you just root them while python holds the external references?

--BDS

Benjamin
8/29/2007 12:11:29 PM
Jonas Sicking wrote:

> How complicated is embedding going to be? I suspect it's going to be
> more complicated than now in some ways, since interacting with a GC
> engine is probably trickier than simply calling AddRef. At the same
> time, environments that use GC, such as java, should have an easier time
> avoiding leaks.

I believe that embedders should never see any of this: embedders should not
use XPCOM any more, we should expose real platform-native embedding layers
with a stable API.

--BDS
Benjamin
8/29/2007 3:03:22 PM
Jason Orendorff wrote:


>   * Use static tools to replace our uses of nsCOMPtr and friends with
> the MMgc
>     equivalents, and replace nsISupportsWeakReference with MMgc
> GCWeakRefs.

There are two different uses of weakrefs in our tree:

1) Weak refs used to avoid cycles: A holds a strong-ref to B, which holds a
weak ref to A. This is more COM-safe than holding a raw pointer to A. This
pattern should simply be replaced by object pointers, which I think means
DWB(MyObject*)

>   * Add thread-safety to MMgc using the Spidermonkey request model.

This is the part that has me extremely worried. We would have to propagate
the request model throughout all of our XPCOM code, which is a very tricky
task. As I understand it, the invariants for requests are:

1) GC references may change only within a request
2) blocking (or long-running) activity should not take place within a request

I think that keeping track of whether we're currently in a request would be
a major headache.

Brendan may kill me for this, but I think that we can and should assert
single-threaded behavior for all of JS and "XPCOM/MMGC": that is, MMGC
should only be on the main thread.

There are necko classes which will want to be accessible via MMGC/XPCOM and
also be internally threadsafe, but I believe that they can root themselves on
the main thread and perform all of their multi-thread networking using an
internal threadsafe reference-counting and proxying scheme that is "not XPCOM".

> I'll be working on the following bug quite soon, so now is the time to
> speak up.
> 
>   Bug 393034 - Allocate DOM objects using MMgc
>   https://bugzilla.mozilla.org/show_bug.cgi?id=393034

Are we sure that we want to get into the "revamp XPCOM" game just to get
going with fast-path DOM? Approach C, which you mention in comment 14,
makes a lot of sense without rewriting all of XPCOM:

1) use MMGC for internal DOM references
2) keep using XPCOM for "external" references - XPCOM refs mean the object
is rooted
3) teach XPConnect to use MMGC references for DOM objects

--BDS
Benjamin
8/29/2007 3:22:22 PM
Benjamin Smedberg wrote:
> mhammond wrote:
> 
>> Yes, this will be a challenge.  I can't picture how dropping refcounts
>> would work in the general case with Python.  The XPCOM objects exposed
>> by Python can be made a GCObject - but I'm not sure how we would
>> integrate the rest of the Python universe - eg, assuming we have an
>> arbitrary number of Python objects holding pointers to xpcom objects,
>> I'm not sure how we would tell the GC about all such references - and
> 
> Can't you just root them while python holds the external references?

You can root at some cost. But roots aren't cheap, and IIRC Mark used 
delegated Python incref/decref to call XPCOM AddRef/Release directly -- 
cheaper and in sync with Python's ref-counting with background GC memory 
management. And with roots, you can still have cycles between heaps (but 
these were not addressed by PyXPCOM, and as dbaron noted, the XPCOM 
cycle collector's stubs for PyXPCOM need to be fleshed out and tested).

Roots are not the answer for interior nodes. With SpiderMonkey,
delegated trace (formerly mark) and finalize class hooks help. But
whenever two memory managers collide, you want cheaper-than-global-root
edge tracing, and for leak-proofing you must do something further to
deal with cycles.

Life is better with a single GC, which is why we are unifying C++ and JS 
on top of MMgc for Mozilla 2, and why we are supporting IronMonkey to 
add other languages on top of a common memory manager and JITting VM 
(along with memory safety and other benefits as motivation, in addition 
to avoiding multi-heap integration and cycle-breaking hassles).

/be
Brendan
8/29/2007 7:01:58 PM
On Aug 29, 11:22 am, Benjamin Smedberg <benja...@smedbergs.us> wrote:
> Jason Orendorff wrote:
> >   * Add thread-safety to MMgc using the Spidermonkey request model.
>
> This is the part that has me extremely worried. We would have to propagate
> the request model throughout all of our XPCOM code, which is a very tricky
> task. As I understand it, the invariants for requests are:
>
> 1) GC references may change only within a request
> 2) blocking (or long-running) activity should not take place within a request
>
> I think that keeping track of whether we're currently in a request would be
> a major headache.

I am not too worried.  I think I see how this is going to work.

All threads will be in a request all the time, except when doing
blocking I/O or CPU-bound, non-GC-touching stuff.  You'll have to
suspend the request before doing that kind of thing, and resume it
afterwards.  I imagine we'll have a C++ object that knows how to do
this.  (Like nsAutoLock. "nsAutoSuspendRequest", maybe.)
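One way such a guard could look, sketched against a toy request model rather than the real SpiderMonkey or MMgc entry points. RequestModel, RunGC and AutoSuspendRequest are hypothetical (the last is named after the suggestion above). A thread leaves its request before blocking and re-enters on scope exit, waiting if a GC is pending; the collector runs only when no thread is inside a request.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

class RequestModel {
 public:
  // Enter a request; blocks while a GC is pending or running.
  void BeginRequest() {
    std::unique_lock<std::mutex> lock(mMutex);
    mCond.wait(lock, [this] { return !mGCPending; });
    ++mActiveRequests;
  }

  void EndRequest() {
    std::lock_guard<std::mutex> lock(mMutex);
    --mActiveRequests;
    mCond.notify_all();  // a GC waiting for quiescence may now proceed
  }

  // The collecting thread must not itself be inside a request here.
  template <typename F>
  void RunGC(F&& collect) {
    std::unique_lock<std::mutex> lock(mMutex);
    mCond.wait(lock, [this] { return !mGCPending; });  // one GC at a time
    mGCPending = true;  // new requests now block in BeginRequest
    mCond.wait(lock, [this] { return mActiveRequests == 0; });
    lock.unlock();
    collect();  // no thread is in a request, so nobody mutates the graph
    lock.lock();
    mGCPending = false;
    mCond.notify_all();  // release threads blocked in BeginRequest
  }

  int ActiveRequests() {
    std::lock_guard<std::mutex> lock(mMutex);
    return mActiveRequests;
  }

 private:
  std::mutex mMutex;
  std::condition_variable mCond;
  int mActiveRequests = 0;
  bool mGCPending = false;
};

// RAII guard in the spirit of nsAutoLock: suspend for the scope's duration.
class AutoSuspendRequest {
 public:
  explicit AutoSuspendRequest(RequestModel& aModel) : mModel(aModel) {
    mModel.EndRequest();
  }
  ~AutoSuspendRequest() { mModel.BeginRequest(); }
  AutoSuspendRequest(const AutoSuspendRequest&) = delete;
  AutoSuspendRequest& operator=(const AutoSuspendRequest&) = delete;

 private:
  RequestModel& mModel;
};
```

If a thread blocks without the guard, RunGC waits forever for the active-request count to reach 0, while every other thread piles up in BeginRequest: the hang described in the next paragraph.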

Finding those places would be the only hard part. But if you miss one,
it should be *real* easy to spot and debug. Firefox will seem to hang.
You'll attach a debugger, and all threads will be sitting in
MMgc::waitForGC except for one, which will be blocked on DNS or
compositing video buffers.

> Are we sure that we want to get into the "revamp XPCOM" game just to get
> going with fast-path DOM? The approach you mention in comment 14 approach C
> makes a lot of sense, without rewriting all of XPCOM:
> [...]

Well, my short-term plans are still incremental.  But this
conversation has been dormant since December.  I really want to know
the long-term plan, for both short-term and long-term reasons.

-j

Jason
8/29/2007 7:05:04 PM
Benjamin Smedberg wrote:
> Jonas Sicking wrote:
> 
>> How complicated is embedding going to be? I suspect it's going to be
>> more complicated than now in some ways, since interacting with a GC
>> engine is probably trickier than simply calling AddRef. At the same
>> time, environments that use GC, such as java, should have an easier time
>> avoiding leaks.
> 
> I believe that embedders should never see any of this: embedders should not
> use XPCOM any more, we should expose real platform-native embedding layers
> with a stable API.

Or an unstable API. It's what everyone else does. Stability arises 
through a "conversation" between producers and consumers, and over time 
increases, until there's a "big shift". This came up in face-to-face 
meetings, and we should hash it out.

New thread in .embedding, cross-posted here with followup-to: set?

/be
Brendan
8/29/2007 7:10:09 PM
Jonas Sicking wrote:
> Related to the above, should we attempt to use incremental GC. From what
> I understand this should be entirely possible with MMgc. However it 
> requires that all pointers use special smart-pointers. Including 
> pointers that are currently raw-pointers. This seems a little bit scary 
> and easy to forget, but might be very nice for performance.

Raw pointers should be banned in incremental GC settings. We can use
static analysis to enforce this.

And anyway, we really do need to understand ownership at every edge in
the graph. Right now, a raw pointer is a giant question mark that should
raise alarms about either manual-over-refcounted leak bugs, or else
manually-dropped-early or just plain-old-raw-weak-pointer and therefore
dangling-pointer, exploitable bugs.

/be
Brendan
8/29/2007 7:11:40 PM
On Aug 29, 8:11 am, Benjamin Smedberg <benja...@smedbergs.us> wrote:
> mhammond wrote:
> > Yes, this will be a challenge.  I can't picture how dropping refcounts
> > would work in the general case with Python.  The XPCOM objects exposed
> > by Python can be made a GCObject - but I'm not sure how we would
> > integrate the rest of the Python universe - eg, assuming we have an
> > arbitrary number of Python objects holding pointers to xpcom objects,
> > I'm not sure how we would tell the GC about all such references - and
>
> Can't you just root them while python holds the external references?

Yes, but you'd leak any garbage cycles that include both Python and
XPCOM objects--they would be rooted.  That can be fixed, too, with
enough effort.  It's not simple.
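Why rooting alone leaks, as a toy model with hypothetical types (GCObj, ForeignObj, Collect; neither CPython nor MMgc is modeled): the foreign heap stands in for CPython's refcounting, and a GC object counts as rooted while any foreign object refers to it. A cycle running through both heaps keeps that root alive, so the GC side is never finalized and the foreign refcount never drops.

```cpp
#include <cassert>
#include <vector>

struct ForeignObj;

// An object in the tracing heap.
struct GCObj {
  int externalRefs = 0;        // > 0 means rooted by the foreign heap
  ForeignObj* held = nullptr;  // strong (refcounted) edge into the foreign heap
  bool alive = true;
};

// An object in the refcounted foreign heap (standing in for a Python object).
struct ForeignObj {
  int refcnt = 0;
  GCObj* held = nullptr;  // edge into the GC heap; roots its target
  bool alive = true;
};

// Drop one reference; at zero the object dies and unroots its target.
void ReleaseForeign(ForeignObj* f) {
  if (--f->refcnt == 0) {
    f->alive = false;
    if (f->held) --f->held->externalRefs;
  }
}

// Simplification: no GC-to-GC edges, so a GC object is reachable exactly
// when it is rooted. Unrooted objects are finalized, releasing what they hold.
void Collect(const std::vector<GCObj*>& heap) {
  for (GCObj* g : heap) {
    if (g->alive && g->externalRefs == 0) {
      g->alive = false;
      if (g->held) ReleaseForeign(g->held);
    }
  }
}
```

Without the back edge from the foreign object, the same Collect() finalizes the GC object and the foreign object dies with it. Fixing the cyclic case means tracing through the foreign heap, which is the "enough effort" part.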

-j

0
Jason
8/29/2007 7:17:08 PM
Brendan Eich wrote:

> Or an unstable API. It's what everyone else does. Stability arises
> through a "conversation" between producers and consumers, and over time
> increases, until there's a "big shift". This came up in face-to-face
> meetings, and we should hash it out.
> 
> New thread in .embedding, cross-posted here with followup-to: set?

Sure. The important point was not the stability or instability of the
embedding API, but that it is entirely decoupled from "XPCOMGC".

--BDS
Benjamin
8/29/2007 7:25:54 PM
Benjamin Smedberg wrote:
> Brendan may kill me for this, but I think that we can and should assert
> single-threaded behavior for all of JS and "XPCOM/MMGC": that is, MMGC
> should only be on the main thread.

Jason addressed the request model fear. Main thread code can't block 
indefinitely for i/o already, so the only request suspend points that I 
can see right now are

* lengthy, non-GC-graph-mutating computations;
* file i/o that's "blocking, but fast", yet not fast enough for us to 
wish to stay in a request;
* required deadlock-with-the-GC avoidance not handled by the request 
model itself.

> There are necko classes which will want to be accessible via MMGC/XPCOM and
> also be internally threadsafe, but I believe that they can root themself on
> the main thread and perform all of their multi-thread networking using an
> internal threadsafe reference-counting and proxying scheme that is "not XPCOM".

More than Necko classes are at stake. We know of AllPeers and Songbird 
MT XPCOM usage, and I believe Joost too uses XPCOM with shared memory 
threads. I do not propose to make all such consumers of XPCOM rewrite 
their code for Mozilla 2, even though it could turn out that everyone 
agrees on doing that, for good wins in reasonable timeframe.

Proceeding incrementally, removing ref-counting from XPCOM and moving it 
to GC, seems a much better approach, since we don't know all the costs 
and benefits, and how they trade off for different platform clients.

>> I'll be working on the following bug quite soon, so now is the time to
>> speak up.
>>
>>   Bug 393034 - Allocate DOM objects using MMgc
>>   https://bugzilla.mozilla.org/show_bug.cgi?id=393034
> 
> Are we sure that we want to get into the "revamp XPCOM" game just to get
> going with fast-path DOM? The approach you mention in comment 14 approach C
> makes a lot of sense, without rewriting all of XPCOM:
> 
> 1) use MMGC for internal DOM references
> 2) keep using XPCOM for "external" references - XPCOM refs mean the object
> is rooted
> 3) teach XPConnect to use MMGC references for DOM objects

This is more work because it bridges two memory managers. It requires 
the cycle collector still. We should try to cut to the chase and move 
XPCOM to MMgc.

/be
Brendan
8/29/2007 7:26:38 PM
Benjamin Smedberg wrote:
> Brendan Eich wrote:
> 
>> Or an unstable API. It's what everyone else does. Stability arises
>> through a "conversation" between producers and consumers, and over time
>> increases, until there's a "big shift". This came up in face-to-face
>> meetings, and we should hash it out.
>>
>> New thread in .embedding, cross-posted here with followup-to: set?
> 
> Sure. The important point was not the stability or instability of the
> embedding API, but that it is entirely decoupled from "XPCOMGC".

Indeed, and I'm with you on that point. It's non-trivial with exact GC, 
since you end up requiring fat handles (roots or scannable thread-local 
helpers such as SpiderMonkey's JSTempValueRooters), and these can't be 
hidden even with C++ auto-storage-class automation.

Think of the JNI with its global and local (per-activation) roots, and 
the need to manage the latter when you create thousands of newborns and 
connect each as you go to the live object graph.

We don't want a JNI-like embedding API just to future-proof for exact 
GC. We might evolve MMgc toward a more exact mode of operation, but 
there's little motivation for that now. So we are probably committing to 
at least conservative stack scanning in our GC, by using simple 
embedding APIs.

I'm assuming the embedding APIs will involve pointers to GC-allocated 
things. Copying strings in and out can get expensive depending on the 
embedding. But this is fodder for the new thread.

/be
Brendan
8/29/2007 7:32:36 PM
Brendan Eich wrote:
> Benjamin Smedberg wrote:
>> Brendan may kill me for this, but I think that we can and should assert
>> single-threaded behavior for all of JS and "XPCOM/MMGC": that is, MMGC
>> should only be on the main thread.
> 
> Jason addressed the request model fear. Main thread code can't block 
> indefinitely for i/o already, so the only request suspend points that I 
> can see right now are
> 
> * lengthy, non-GC-graph-mutating computations;

These are UI starvation bugs to fix already, btw.

> * file i/o that's "blocking, but fast", yet not fast enough for us to 
> wish to stay in a request;

I'm thinking of local file i/o, but we do that non-blocking too, don't we?

> * required deadlock-with-the-GC avoidance not handled by the request 
> model itself.

This would be something like the cycle collector, for Java and C-Python 
if we care to avoid cross-heap leaks with those runtimes.

/be
Brendan
8/29/2007 7:37:05 PM
Brendan Eich wrote:
> Jonas Sicking wrote:
>> Related to the above, should we attempt to use incremental GC. From what
>> I understand this should be entirely possible with MMgc. However it 
>> requires that all pointers use special smart-pointers. Including 
>> pointers that are currently raw-pointers. This seems a little bit 
>> scary and easy to forget, but might be very nice for performance.
> 
> Raw pointers should be banned in incremental GC settings. We can use
> static analysis to enforce this.

Sorry to be unclear: the context here is heap-allocated data structures. 
You need a write barrier for any pointer in another GC-allocated struct.

The thread stack can be full of raw pointers, no problem. Conservative 
scanning means they (along with the odd float ;-) will be taken for 
strong refs, and mutation does not need to update card marks or colors 
since we don't GC the stack.
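To make "conservative" concrete, here is a minimal sketch, assuming a toy heap that occupies one contiguous address range; `ToyHeap` and `conservativeRoots` are hypothetical names, not MMgc's API. Every word in the scanned region is treated as a possible pointer, so any value that happens to fall inside the heap (an integer, or the bit pattern of a float) keeps that memory alive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy heap: a contiguous range of addresses [base, base + size).
struct ToyHeap {
    std::uintptr_t base;
    std::size_t size;
    bool contains(std::uintptr_t word) const {
        return word >= base && word < base + size;
    }
};

// Conservative scan: any word in the region whose value lies inside the heap
// is treated as a strong reference. There is no type information, so an
// integer or float bit pattern that happens to land in range is retained too.
std::vector<std::uintptr_t> conservativeRoots(const std::uintptr_t* region,
                                              std::size_t count,
                                              const ToyHeap& heap) {
    std::vector<std::uintptr_t> roots;
    for (std::size_t i = 0; i < count; ++i) {
        if (heap.contains(region[i])) {
            roots.push_back(region[i]);
        }
    }
    return roots;
}
```

Because the scan only reads the stack, storing a pointer into a stack slot never needs to notify the collector; only stores into GC-allocated objects need a write barrier.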

/be
Brendan
8/29/2007 7:39:36 PM
On Aug 28, 7:46 pm, "L. David Baron" <dba...@dbaron.org> wrote:
> Jason Orendorff wrote:
> >   * Add thread-safety to MMgc using the Spidermonkey request model.
>
> We currently don't pay the cost of thread-safety for most XPCOM objects,
> since most objects are main-thread-only.  Would doing this impose
> significant performance penalties for locking or atomic operations?  Or
> would most operations still be cheap?

The request model helps prevent two kinds of thread-unsafety:
  1) GC colliding with other threads doing stuff
  2) two threads touching an object at the same time

We only need it for item 1, which is cheap.

Individual classes may opt in for item 2.  JSObject does.  But most
XPCOM classes won't-- and so they will incur no cost.
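A rough sketch of the part of the request model that matters here (keeping the GC from colliding with other threads), assuming a single collector that calls in from outside any request. `RequestModel` and its methods are hypothetical, not the JSAPI or MMgc interface. The only synchronization is at request boundaries; ordinary reads and writes of objects inside a request take no lock.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical request model: threads touching GC-managed memory must be
// inside a request, and a collection waits until no requests are active.
class RequestModel {
public:
    void beginRequest() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Do not start new work while a collection is pending or running.
        cv_.wait(lock, [this] { return !gcPending_; });
        ++activeRequests_;
    }

    void endRequest() {
        std::lock_guard<std::mutex> lock(mutex_);
        --activeRequests_;
        cv_.notify_all();  // A waiting collector may now proceed.
    }

    // Must be called from a thread that is NOT in a request (otherwise it
    // would wait on itself). Blocks new requests, waits for active ones to
    // drain, runs `collect`, then lets mutator threads resume.
    template <typename F>
    void collectGarbage(F collect) {
        std::unique_lock<std::mutex> lock(mutex_);
        gcPending_ = true;
        cv_.wait(lock, [this] { return activeRequests_ == 0; });
        collect();  // No mutator can be touching the heap here.
        gcPending_ = false;
        cv_.notify_all();
    }

    int activeRequests() {
        std::lock_guard<std::mutex> lock(mutex_);
        return activeRequests_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int activeRequests_ = 0;
    bool gcPending_ = false;
};
```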

Those following along at home can read up on the request model here:
  SpiderMonkey Internals: Thread Safety
  http://tinyurl.com/yt5rtr

-j

Jason
8/29/2007 9:35:21 PM
On Aug 28, 7:46 pm, "L. David Baron" <dba...@dbaron.org> wrote:
> Jason Orendorff wrote:
> >   * Delete the cycle collector.
>
> Are we dropping the multi-language aspects of the platform (introduced
> in this milestone, at considerable effort in
> https://bugzilla.mozilla.org/show_bug.cgi?id=255942)?  Or is there a
> good way for python to use MMgc as well?

You're right, we must decide whether to keep this.  If so, I see
several options, none easy:

  - Create the opposite of cycle collector: code to walk CPython
    (refcounted) object graphs so that MMgc can see through them.
    CPython has a cycle collector API that would probably help:

     http://docs.python.org/api/supporting-cycle-detection.html

  - Patch CPython to use MMgc.  Somewhat scary due to the likelihood
    of Python code depending on destructors being called in order.

  - Create a library to facilitate interop among multiple
    language runtimes in a single process, with distributed garbage
    collection, etc.  Like SWIG, only much better.  Implement it for
    Tamarin, XPCOM, CPython, and Java.

If the last option sounds crazy, it should, but I'll go ahead and point
out that we've done interop at least 3 times already (LiveConnect,
XPConnect, PyXPCOM), and we're about to do it 2 more times
(ScreamingMonkey needs Tamarin/MSCOM interop; ActionMonkey
needs Tamarin/XPCOM interop).  Maybe it's time to do it in a generic
form that other open source projects can use.

-j

Jason
8/29/2007 9:44:10 PM
Jason Orendorff wrote:
> On Aug 28, 7:46 pm, "L. David Baron" <dba...@dbaron.org> wrote:
>> Jason Orendorff wrote:
>>>   * Delete the cycle collector.
>> Are we dropping the multi-language aspects of the platform (introduced
>> in this milestone, at considerable effort in
>> https://bugzilla.mozilla.org/show_bug.cgi?id=255942)?  Or is there a
>> good way for python to use MMgc as well?
> 
> You're right, we must decide whether to keep this.  If so, I see
> several options, none easy:
> 
>   - Create the opposite of cycle collector: code to walk CPython
>     (refcounted) object graphs so that MMgc can see through them.
>     CPython has a cycle collector API that would probably help:

The opposite would also work, make the CPython cycle collector walk 
through the MMgc graph, like we currently make our own cycle collector 
walk through the JS graph.

/ Jonas
Jonas
8/29/2007 10:23:03 PM
Brendan Eich wrote:
> Jonas Sicking wrote:
>> Related to the above, should we attempt to use incremental GC? From what
>> I understand this should be entirely possible with MMgc. However it 
>> requires that all pointers use special smart-pointers. Including 
>> pointers that are currently raw-pointers. This seems a little bit 
>> scary and easy to forget, but might be very nice for performance.
> 
> Raw pointers should be banned in incremental GC settings. We can use
> static analysis to enforce this.
> 
> And anyway, we really do need to understand ownership at every edge in
> the graph. Right now, a raw pointer is a giant question mark that should
> raise alarms about either manual-over-refcounted leak bugs, or else
> manually-dropped-early or just plain-old-raw-weak-pointer and therefore
> dangling-pointer, exploitable bugs.

Many of our raw pointers exist solely to avoid cycles, like the 
nsNodeInfoManager::mDocument <-> nsDocument::mNodeInfoManager cycle 
where the first is a raw pointer but is nulled out when the nsDocument 
is deleted. In this case both pointers should use normal write barriered 
pointers.

In other cases we use raw pointers in order to store extra bits. For 
example nsINode::mParentPtrBits where we use the two lower bits to store 
data. Here I suspect we could probably create some sort of wrapper class 
that creates a write barrier, but still allows the two lower bits to be 
used.
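A minimal sketch of the wrapper described above, assuming pointees are at least 4-byte aligned and assuming the collector's scanner can cope with the tag bits (for example by masking them off or by recognizing interior pointers). `TaggedBarrieredPtr` and the `writeBarrier` hook are hypothetical; the hook here only counts calls instead of doing real GC work.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical write-barrier hook. A real collector would check object
// colors here; this stand-in only records that the barrier ran.
static int gBarrierHits = 0;
inline void writeBarrier(const void* /*target*/) { ++gBarrierHits; }

// Pointer field that keeps two flag bits in the low bits of an aligned
// pointer (like nsINode::mParentPtrBits) but routes every pointer store
// through the write barrier.
template <typename T>
class TaggedBarrieredPtr {
public:
    static constexpr std::uintptr_t kFlagMask = 0x3;

    T* get() const { return reinterpret_cast<T*>(bits_ & ~kFlagMask); }
    unsigned flags() const { return static_cast<unsigned>(bits_ & kFlagMask); }

    void set(T* ptr) {
        auto raw = reinterpret_cast<std::uintptr_t>(ptr);
        assert((raw & kFlagMask) == 0 && "pointer must be 4-byte aligned");
        writeBarrier(ptr);                  // Pointer stores are barriered.
        bits_ = raw | (bits_ & kFlagMask);  // Preserve the existing flags.
    }

    void setFlags(unsigned f) {
        // Changing only the flag bits creates no new edge, so no barrier.
        bits_ = (bits_ & ~kFlagMask) | (f & kFlagMask);
    }

private:
    std::uintptr_t bits_ = 0;
};
```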

Yet a third example is nsINode::mFlagsOrSlots which sometimes stores a 
bitfield and sometimes stores a pointer. This situation is approximately 
the same as the previous one, possibly with the exception that the wrapper 
class needs to be able to return a bitfield in addition to a pointer.
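A similar sketch for the flags-or-pointer case, assuming the low bit is free to act as a discriminator (1 for inline flags, 0 for an aligned pointer). `FlagsOrSlots`, `Slots`, and `writeBarrier` are hypothetical stand-ins, not the real nsINode members; only the store that creates a pointer edge goes through the barrier.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only.
struct Slots { std::uint32_t flags = 0; };
static int gBarrierHits = 0;
inline void writeBarrier(const void*) { ++gBarrierHits; }

// Field holding either inline flags (low bit 1) or a pointer to an
// out-of-line Slots object (low bit 0, since Slots is 4-byte aligned).
class FlagsOrSlots {
public:
    bool hasSlots() const { return (bits_ & 1) == 0; }

    Slots* slots() const {
        return hasSlots() ? reinterpret_cast<Slots*>(bits_) : nullptr;
    }

    std::uint32_t flags() const {
        return hasSlots() ? slots()->flags : static_cast<std::uint32_t>(bits_ >> 1);
    }

    void setFlags(std::uint32_t f) {
        if (hasSlots()) {
            slots()->flags = f;
        } else {
            bits_ = (static_cast<std::uintptr_t>(f) << 1) | 1;  // inline form
        }
    }

    // Move the flags out of line; this pointer store is barriered.
    void attachSlots(Slots* s) {
        s->flags = flags();
        writeBarrier(s);
        bits_ = reinterpret_cast<std::uintptr_t>(s);
    }

private:
    std::uintptr_t bits_ = 1;  // Start as inline flags with value 0.
};
```

The inline form gives up one bit of the word, so on a 32-bit build only 31 flag bits fit before the flags must move to the Slots object.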

/ Jonas
Jonas
8/29/2007 11:10:29 PM
On Aug 29, 11:26 am, "bren...@mozilla.org" <bren...@mozilla.org>
wrote:
> On Aug 28, 6:23 pm, "bren...@mozilla.org" <bren...@mozilla.org> wrote:
>
> > On Aug 28, 6:07 pm, mhammond <mhamm...@skippinet.com.au> wrote:
>
> > > The XPCOM objects exposed
> > > by Python can be made a GCObject - but I'm not sure how we would
> > > integrate the rest of the Python universe - eg, assuming we have an
> > > arbitrary number of Python objects holding pointers to xpcom objects,
>
> > MMgc is conservative. So long as Python allocates memory to be scanned
> > for pointers to MMgc allocations using MMgc's malloc wrapper, MMgc
> > will find these pointers.
>
> I should have written "using an appropriate malloc wrapper". Ideally
> only memory that might contain pointers to XPGC (heh) objects would be
> allocated with the kContainsPointer MMgc flag.
>
> that may be hard to hook into C-Python. Is it?

I believe it is very hard to hook it into a built Python.  It would be
much easier to hook it in at build time but IIUC, it would also
require that *all* Python extensions you wish to use are also rebuilt;
any prebuilt Python extensions you can find on the web would be
unusable.  My gut tells me that this would be unacceptable to people
using this platform with Python, but hopefully there are some lurkers
here who can throw their 2c in.

Another alternative I'm yet to investigate is that we hack on Python
to offer the ability to hook a memory allocator in at runtime before
Python is initialized.  The downside of this approach is that in the
short term, we will not be able to work with a released version - it
would need Python 2.6 or later.

But even then, I have a concern regarding other languages - do we
really want to raise the bar for entry into the xpcom world to being
able to integrate with a garbage collection system?  It seems our long-term
goal is to get rid of xpcom in favour of the "one VM for all
languages" approach, so while xpcom remains alive it should keep doing
all it can to be inclusive of the languages able to be supported.

Mark

mhammond
8/30/2007 1:01:53 AM
Jason Orendorff wrote:

> All threads will be in a request all the time, except when doing
> blocking I/O or CPU-bound, non-GC-touching stuff.  You'll have to
> suspend the request before doing that kind of thing, and resume it
> afterwards.  I imagine we'll have a C++ object that knows how to do
> this.  (Like nsAutoLock. "nsAutoSuspendRequest", maybe.)
> 
> Finding those places would be the only hard part. But if you miss one,
> it should be *real* easy to spot and debug. Firefox will seem to hang.
> You'll attach a debugger, and all threads will be sitting in
> MMgc::waitForGC except for one, which will be blocked on DNS or
> compositing video buffers.
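A minimal sketch of the RAII helper suggested above, assuming a per-thread request depth counter (`ThreadRequestState` is a hypothetical stand-in, and `nsAutoSuspendRequest` is only the name floated in the quote). Like nsAutoLock, the guard undoes itself on scope exit: it leaves every nested request on construction and restores the same depth on destruction, similar in spirit to SpiderMonkey's JS_SuspendRequest/JS_ResumeRequest pair.

```cpp
#include <cassert>

// Hypothetical per-thread request state; a real version would coordinate
// with the collector when entering and leaving requests.
struct ThreadRequestState {
    int depth = 0;  // > 0 while the thread is inside a request
    void beginRequest() { ++depth; }
    void endRequest() { --depth; }
};

// Suspends the current (possibly nested) request for the guard's lifetime,
// so a GC on another thread need not wait while this thread blocks on I/O.
class nsAutoSuspendRequest {
public:
    explicit nsAutoSuspendRequest(ThreadRequestState& state)
        : state_(state), savedDepth_(state.depth) {
        while (state_.depth > 0) state_.endRequest();
    }
    ~nsAutoSuspendRequest() {
        while (state_.depth < savedDepth_) state_.beginRequest();
    }
    nsAutoSuspendRequest(const nsAutoSuspendRequest&) = delete;
    nsAutoSuspendRequest& operator=(const nsAutoSuspendRequest&) = delete;

private:
    ThreadRequestState& state_;
    int savedDepth_;
};
```

A blocking DNS lookup or file read would sit inside such a scope; forgetting the guard produces exactly the easy-to-diagnose hang described above.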

ok, you have me mostly convinced... let's proceed under the general
assumption that this is what we want to do. Accomplishing it will involve
a lot of different work; the list of tasks is at least:

* Add the request model threadsafety to MMGc
* Give MMGc the ability to recognize "inner" pointers to objects
* Identify request start/end points in the codebase (blocking activity)
* Ensure (how?) that existing locking mechanisms for threadsafe code won't
deadlock with GC
* Rewrite XPCOM addref/release handling
** Remove or stub out getter_AddRefs, already_AddRefed, and other helper classes
** Make member-comptrs call/be DWB
** Make stack-comptrs raw pointers
** Fix some COM-holding utility classes
*** nsCOMArray
*** nsInterfaceHashKey
*** nsInterfaceHashtable
** Identify XPCOM weakrefs that can be GCRefs
** Rewrite the other XPCOM weak-references into GCWeakRefs

Other random notes/questions:

What are the rules for objects with finalizers? Is the finalize method
allowed to touch other objects? Presumably these objects may have already
been finalized, right (or else you'd end up with finalization cycles)?

Right now many objects are going to have to be finalized, because they
contain string members or do real work in their constructor. We should
discuss the pros/cons of making strings GCthings, or even sharing the
tamarin string type with XPCOM. We should also automatically identify
destructors that do "real" work to see if we can remove that work, or if the
work is even safe to do when the target object may have already been finalized.

Because the main thread is always non-blocking by design, it would naturally
never exit its request. This is probably ok as long as we force GC to always
take place on the main thread. If GC gets triggered on a worker thread, that
thread would block forever. Alternately we could exit/reenter the request
every time we process the main event queue.

--BDS
Benjamin
8/30/2007 1:41:58 PM
Benjamin Smedberg wrote:
> 
> The list of tasks to accomplish this is at least:

Commenting here on things that concern automation.
> 
> * Add the request model threadsafety to MMGc
> * Give MMGc the ability to recognize "inner" pointers to objects
> * Identify request start/end points in the codebase (blocking activity)
> * Ensure (how?) that existing locking mechanisms for threadsafe code won't
> deadlock with GC

I don't see a better way to do this than experimentation. Once we 
discover bugs and unsafe usage patterns we can start thinking about 
hunting those down using static analysis tools.

> * Rewrite XPCOM addref/release handling
> ** Remove or stub out getter_AddRefs, already_AddRefed, and other helper classes
> ** Make member-comptrs call/be DWB
Can rewrite these. Only question is, do these have to be macros? They 
make rewriting things later more painful than needed.

> ** Make stack-comptrs raw pointers
Can detect those easily enough with static analysis.

> ** Fix some COM-holding utility classes
> *** nsCOMArray

We should probably be switching away from moz-specific containers to STL 
ones. I could probably rewrite these.

> *** nsInterfaceHashKey
> *** nsInterfaceHashtable
> ** Identify XPCOM weakrefs that can be GCRefs
This can't be done completely automatically. Once we identify a common 
pattern to look for, then automation can be considered.

> ** Rewrite the other XPCOM weak-references into GCWeakRefs
I'm not sure if the concepts map directly. Shouldn't most weak 
references become gc-managed pointers?

Taras
Taras
8/30/2007 7:02:06 PM
Jason Orendorff wrote:

>   - Create a library to facilitate interop among multiple
>     language runtimes in a single process, with distributed garbage
>     collection, etc.  Like SWIG, only much better.  Implement it for
>     Tamarin, XPCOM, CPython, and Java.
> 
> If the last option sounds crazy, it should, but I'll go ahead and point
> out that we've done interop at least 3 times already (LiveConnect,
> XPConnect, PyXPCOM), and we're about to do it 2 more times
> (ScreamingMonkey needs Tamarin/MSCOM interop; ActionMonkey
> needs Tamarin/XPCOM interop).  Maybe it's time to do it in a generic
> form that other open source projects can use.

Sorry, just catching up on this thread (and trying to keep it concrete).

Let's untangle "interop" into 3 categories: memory management, 
concurrency and calling. And then untangle each of those categories into 
two concrete sub-headings: semantics and runtime support library.

Here's my current picture of "multiple language interop" with some 
values filled in. Feel free to disagree:

- Memory management
   - semantics: conservative GC + refcount/finalize API
   - runtime support library: mmGC

- Concurrency
   - semantics: ?? something like request model ??
   - runtime support library: ?? JSAPI + moz event queue ??

- Calling
   - semantics: XPIDL type system (~ subset related to MSCOM)
   - runtime support library: tamarin JIT + typelibs

So ... I'm considering what we're doing here to be about "dynamic-izing 
the XPCOM/MSCOM/C++ side" not "static-izing the existing dynamic 
language runtimes".

Dynamic language runtimes can already synthesize their own object 
proxies by inspecting typelibs, and can already walk their own object 
graphs. Many -- perhaps most? -- do user-level "threading" via a central 
event queue. So we may need to ask them for these services, but every 
dynamic language runtime has an API to them. That's assuming many dynamic 
language runtimes *need* long-term integration with us. Some dynamic 
languages may just give up and write compilers to ABC bytecode. 
*Cough* JS *cough*. All the better.

C++ does need help, but we *have* the libraries we're intending to use. 
The tamarin JIT and mmGC let us perform very dynamic calling and memory 
management tasks on C++, quite generally, at a sub-language, 
machine-code and memory-address level. That's the whole point. IMO 
there's no need to invent other low-level libraries or additional 
inter-language semantics here.

(To clarify: I'm assuming the "replacement" for XPConnect will drive the 
Tamarin JIT via reflection on typelibs or XPIDL type representations 
bundled in ABC, and xptcall will vanish. Correct me if this is not the 
plan.)

-Graydon
Graydon
8/30/2007 8:19:09 PM
On Aug 30, 9:44 am, Jason Orendorff <jason.orendo...@gmail.com> wrote:
>   - Create a library to facilitate interop among multiple
>     language runtimes in a single process, with distributed garbage
>     collection, etc.  Like SWIG, only much better.  Implement it for
>     Tamarin, XPCOM, CPython, and Java.
>
> If the last option sounds crazy, it should, but I'll go ahead and point
> out that we've done interop at least 3 times already (LiveConnect,
> XPConnect, PyXPCOM), and we're about to do it 2 more times
> (ScreamingMonkey needs Tamarin/MSCOM interop; ActionMonkey
> needs Tamarin/XPCOM interop).  Maybe it's time to do it in a generic
> form that other open source projects can use.

It sounds slightly crazy but I think it's doable. We had a project
related to this at IBM --- Parley, that Brendan mentioned --- although
it didn't get anywhere, partly because I left. The basic idea was to
add a distributed mark and sweep phase that multiple runtimes can plug
into. Even generational and copying collectors can participate; VMs
that do generational collection need to temporarily root references
that escape the VM, and copying collectors need to add wrappers that
don't move, or support pinning (which I believe most real copying
collectors do).
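A minimal sketch of a shared mark phase of this kind, assuming each runtime can report its roots and the outgoing edges of any of its objects, including edges into other runtimes. `RuntimeAdapter`, `ObjRef`, and `globalMark` are hypothetical names. Marking the union of all heaps is what lets a cycle that crosses a runtime boundary be found; each runtime would then sweep its own unmarked objects.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <utility>
#include <vector>

// (runtime index, object id within that runtime)
using ObjRef = std::pair<int, int>;

// Hypothetical per-runtime plug-in: how to enumerate roots and edges.
struct RuntimeAdapter {
    std::function<std::vector<ObjRef>()> roots;
    std::function<std::vector<ObjRef>(int objectId)> edges;
};

// Marks everything reachable from any runtime's roots, following edges
// across runtime boundaries. Unmarked objects are garbage in every heap.
std::set<ObjRef> globalMark(const std::vector<RuntimeAdapter>& runtimes) {
    std::set<ObjRef> marked;
    std::vector<ObjRef> work;
    for (const auto& rt : runtimes)
        for (const auto& r : rt.roots()) work.push_back(r);
    while (!work.empty()) {
        ObjRef ref = work.back();
        work.pop_back();
        if (!marked.insert(ref).second) continue;  // already marked
        for (const auto& next : runtimes[ref.first].edges(ref.second))
            work.push_back(next);
    }
    return marked;
}
```

For instance, if object 1 in one runtime and object 1 in another point only at each other and no root reaches them, neither is marked, a conclusion neither runtime could reach from its own graph alone.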

Rob

Robert
9/4/2007 4:34:32 AM
On Aug 29, 8:32 am, Jason Orendorff <jason.orendo...@gmail.com> wrote:
>   * Add thread-safety to MMgc using the Spidermonkey request model.

Is this compatible with incremental marking? I don't know the details
of MMgc incremental marking but I fear the complexity, and overhead,
of safe concurrent incremental marking. I would hate to eat a
complexity or performance hit for thread-safe memory management that
is very little used. We've already been down that road with thread-
safe XPConnect.

I suppose it might be possible to perform incremental marking on the
main thread only, avoiding the overhead of concurrent marking.

Also, how about MMgc's reference counted objects, would you make those
thread-safe too? That sounds like another performance hit.

Rob

Robert
9/4/2007 4:43:32 AM
On Aug 29, 8:32 am, Jason Orendorff <jason.orendo...@gmail.com> wrote:
>   * Drop AddRef and Release from nsISupports.
>
>   * Require all XPCOM objects to be MMgc GCObjects or
> GCFinalizedObjects,
>     allocated from the same GC allocator as all JavaScript objects.

There are some objects, notably nsIFrame and subclasses, that inherit
from nsISupports but aren't actually refcounted. As a pre-step we
would want to stop them inheriting from nsISupports, and have them
inherit from something with QueryInterface only (or better, just get
rid of all uses of QueryInterface on frames).
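A sketch of that pre-step, assuming a base class that keeps QueryInterface-style lookup but drops AddRef and Release, for objects whose lifetime is owned by something else (for frames, the frame tree). `QueryOnly`, `IScrollable`, and `ScrollFrame` are hypothetical, and IIDs are simplified to strings.

```cpp
#include <cassert>
#include <cstring>

using IID = const char*;  // Simplified stand-in for a real IID.

// Lookup only: no AddRef/Release, and not deletable through the base,
// because ownership lives elsewhere (e.g. the frame tree).
class QueryOnly {
public:
    virtual void* QueryInterface(IID iid) = 0;
protected:
    ~QueryOnly() = default;
};

struct IScrollable {
    static constexpr const char* kIID = "IScrollable";
    virtual int scrollTop() const = 0;
protected:
    ~IScrollable() = default;
};

class ScrollFrame final : public QueryOnly, public IScrollable {
public:
    void* QueryInterface(IID iid) override {
        if (std::strcmp(iid, IScrollable::kIID) == 0)
            return static_cast<IScrollable*>(this);
        return nullptr;
    }
    int scrollTop() const override { return 10; }
};
```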

Rob

Robert
9/4/2007 4:46:56 AM
On Sep 4, 12:43 am, Robert O'Callahan <rocalla...@gmail.com> wrote:
> On Aug 29, 8:32 am, Jason Orendorff <jason.orendo...@gmail.com> wrote:
> >   * Add thread-safety to MMgc using the Spidermonkey request model.
>
> Is this compatible with incremental marking? I don't know the details
> of MMgc incremental marking but I fear the complexity, and overhead,
> of safe concurrent incremental marking. I would hate to eat a
> complexity or performance hit for thread-safe memory management that
> is very little used.

Here's the post to read:

  https://mail.mozilla.org/pipermail/tamarin-devel/2007-August/000017.html
  (under "Maybe incremental is not so bad")

Here are the costs:

* Incremental marking must happen under a global lock.  No other code
can be touching GC-managed objects while this happens, just the same
as for non-incremental GC.

* There's a synchronization cost per call to GC::IncrementalMark,
probably negligible in the scheme of things.

* There's an additional cost per "write boundary hit".  This happens
when you assign a pointer to a "white" (unmarked, unqueued) object to
a field of a "black" (already marked) object.  The white object has to
be queued.  This should be relatively rare.  The cost, when it
happens, is that you have to atomically test-and-set a bit
(OSAtomicTestAndSet on Mac), which shouldn't be so horrible.
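A single-threaded sketch of the write barrier described here, assuming three colors (white = unmarked, gray = queued, black = scanned). `Obj` and `IncrementalMarker` are hypothetical, not MMgc's WB/DWB machinery. In a thread-safe version the white-to-gray transition would be the atomic test-and-set mentioned above.

```cpp
#include <cassert>
#include <vector>

enum class Color { White, Gray, Black };

// Hypothetical GC object with a single pointer field.
struct Obj {
    Color color = Color::White;
    Obj* field = nullptr;
};

struct IncrementalMarker {
    std::vector<Obj*> markQueue;
    int barrierHits = 0;

    // Store `value` into `holder->field`. If that would hide a white object
    // behind an already-scanned (black) one, queue it so marking still
    // finds it. Single-threaded: a concurrent version needs an atomic
    // test-and-set on the color bit.
    void writeBarrier(Obj* holder, Obj* value) {
        if (value && holder->color == Color::Black && value->color == Color::White) {
            value->color = Color::Gray;
            markQueue.push_back(value);
            ++barrierHits;
        }
        holder->field = value;
    }
};
```

Stores into white or gray holders, or of non-white values, take the cheap path, which is why the barrier hit should be relatively rare.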

> Also, how about MMgc's reference counted objects, would you make those
> thread-safe too? That sounds like another performance hit.

How about this: split MMgc::RCObject into two classes,
MMgc::ThreadSafeRCObject and MMgc::SingleThreadRCObject.  Choose one
or the other on a per-class basis.  This is like what XPCOM
programmers already do.  I haven't thought this through thoroughly,
though.  I see (theoretical) performance costs even for SingleThread
objects, but not per-refcount and probably acceptable.
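One way to express the proposed ThreadSafeRCObject/SingleThreadRCObject split, assuming the counter is chosen by a template policy. The two class names come from the proposal; `RCObjectBase` and the policy structs are hypothetical and unrelated to MMgc's actual RCObject. Single-threaded classes pay for plain increments; only thread-safe classes pay for atomics.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct SingleThreadCount {
    std::uint32_t n = 0;
    std::uint32_t inc() { return ++n; }
    std::uint32_t dec() { return --n; }
    std::uint32_t get() const { return n; }
};

struct ThreadSafeCount {
    std::atomic<std::uint32_t> n{0};
    std::uint32_t inc() { return n.fetch_add(1, std::memory_order_relaxed) + 1; }
    std::uint32_t dec() { return n.fetch_sub(1, std::memory_order_acq_rel) - 1; }
    std::uint32_t get() const { return n.load(); }
};

// Hypothetical base: the count policy is picked per class.
template <typename CountPolicy>
class RCObjectBase {
public:
    void incRef() { count_.inc(); }
    // True when the count reaches zero; a DRC collector would then put the
    // object in its zero-count table rather than freeing it immediately.
    bool decRef() { return count_.dec() == 0; }
    std::uint32_t refCount() const { return count_.get(); }

private:
    CountPolicy count_;
};

using SingleThreadRCObject = RCObjectBase<SingleThreadCount>;
using ThreadSafeRCObject = RCObjectBase<ThreadSafeCount>;
```

Relaxed ordering is enough for increments; the decrement uses acquire-release so that whichever thread observes zero also observes earlier writes to the object.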

The bigger problem with MMgc deferred reference counting (DRC) is how
to expose it to users.  COM's AddRef/Release contract may be annoying,
but at least it's simple.  Supporting both DRC and straight-up GC
means supporting at least 2 totally different memory-management
contracts, on a per-interface or per-object basis.  It reminds me of
the proliferation of open source licenses.  How to do this without
burdening users is an open question.  Suggestions welcome-- DRC is
good stuff (and as Tamarin uses it for everything, it's probably
unavoidable).

-j

Jason
9/4/2007 3:43:55 PM
On Sep 5, 3:43 am, Jason Orendorff <jason.orendo...@gmail.com> wrote:
> The bigger problem with MMgc deferred reference counting (DRC) is how
> to expose it to users.  COM's AddRef/Release contract may be annoying,
> but at least it's simple.  Supporting both DRC and straight-up GC
> means supporting at least 2 totally different memory-management
> contracts, on a per-interface or per-object basis.  It reminds me of
> the proliferation of open source licenses.  How to do this without
> burdening users is an open question.  Suggestions welcome-- DRC is
> good stuff (and as Tamarin uses it for everything, it's probably
> unavoidable).

What's the impact of using DRC for everything in our own code?

BTW is there a document somewhere that summarizes the run-time costs
of inheriting from GCObject etc, or should I just look at the source?

Rob

Robert
9/4/2007 11:47:18 PM
On Aug 30, 4:19 pm, Graydon Hoare <gray...@mozilla.com> wrote:
> Dynamic language runtimes can already synthesize their own object
> proxies by inspecting typelibs, and can already walk their own object
> graphs. Many -- perhaps most? -- do user-level "threading" via a central
> event queue. So we may need to ask them for these services, but every
> dynamic language runtime has an API to them. Assuming many dynamic
> language runtimes *need* long term integration with us. [...]

I don't think this recognizes the amount of work involved in something
like PyXPCOM.

It seems to me our choices are:
  1) don't try to support multiple language bindings
  2) support them in a common way, with common
     library code, in a form that might solve
     problems for other people too (and thus attract
     contributors)
  3) support them as we do PyXPCOM and
     LiveConnect now, i.e. by ourselves, separately,
     and not especially well

I consider #1 a strong option.  #2 is pretty nutty.  I brought it up
for two reasons.  First, #3 is a lot like #2, only without pooling any
effort or designing for pluggability.  Under #3, lesser-used languages
(like Python) are unavoidably second-class citizens.  They'll never
reflect XPCOM with high fidelity unless someone spends an unlikely
amount of effort on it.  My impression is that PyXPCOM is already
lagging and unlikely to catch up, much less keep pace with coming
changes.

Second, I was writing under a hypothetical that assumed PyXPCOM is
something we want to keep and maintain.

> Some dynamic
> languages may just give up and just write compilers to ABC bytecode.
> *Cough* JS *cough*. All the better.

This also seems to underestimate effort.  A compiler alone doesn't get
you halfway to, say, Jython.

I'm not convinced we want multi-language support.  It's real expensive
and not very useful.

-j

Jason
9/5/2007 2:10:10 PM
Robert O'Callahan wrote:

> What's the impact of using DRC for everything in our own code?

I'll summarize what I learned from Jason on IRC:

* RCObject is an early-collection optimization: when an RCObject refcount
goes to zero, it is placed in the ZCT (zero-count table)... at some frequent
interval, MMGc scans the stack to make sure there are no stack pointers to
ZCT objects and then collects them. This collection is shallow and fast.

* Tamarin uses RCObject for all JS objects, including strings.
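A greatly simplified sketch of the zero-count table, assuming only heap references are counted and that a list of stack pointers stands in for the conservative stack scan; `RCObj` and `ZeroCountTable` are hypothetical names, not MMgc's classes. An object whose count reaches zero is only a candidate until the stack check confirms it is unreferenced.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

struct RCObj {
    int refCount = 0;   // counts heap references only
    bool freed = false;
};

class ZeroCountTable {
public:
    void decRef(RCObj* o) {
        if (--o->refCount == 0) zct_.insert(o);  // maybe garbage
    }
    void incRef(RCObj* o) {
        if (o->refCount++ == 0) zct_.erase(o);   // rescued before reaping
    }
    // `stackRoots` stands in for a conservative stack scan.
    void reap(const std::vector<RCObj*>& stackRoots) {
        for (auto it = zct_.begin(); it != zct_.end();) {
            RCObj* o = *it;
            if (std::find(stackRoots.begin(), stackRoots.end(), o) != stackRoots.end()) {
                ++it;  // still referenced from the stack: keep it for now
            } else {
                o->freed = true;  // a real collector would finalize and free
                it = zct_.erase(it);
            }
        }
    }
    std::size_t size() const { return zct_.size(); }

private:
    std::unordered_set<RCObj*> zct_;
};
```

The reap is shallow: freeing an object would also decrement its children's counts, possibly adding them to the table for a later pass, which this sketch omits.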

The costs of RCObject:

* the objects keep an extra int member
* if threadsafety is needed, we need atomic increment/decrement
* cycles between RCObjects or any references to RCObjects from GCObjects
will not be collected until a "standard" GC
* Any object that holds a reference to an RCObject (a DRCWB) has to have a
finalizer

From reading code, I've also gleaned that the current DRCWB system assumes
that you have a toplevel pointer, not an internal pointer like we normally
keep in XPCOM. To work around this you either have to
dynamic_cast<RCObject*>, which requires RTTI and may not be cheap (needs
measurement), or keep virtual functions, which means AddRef/Release.
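A small illustration of the internal-pointer problem, with hypothetical interfaces `nsIFoo` and `nsIBar`: under multiple inheritance an interface pointer generally points into the middle of the object, and `dynamic_cast<void*>` recovers the start of the most-derived object, at the cost of requiring RTTI and a polymorphic type.

```cpp
#include <cassert>

// Hypothetical interfaces; each has a vtable pointer, so the second base
// subobject sits at a nonzero offset inside FooBar on common ABIs.
struct nsIFoo { virtual ~nsIFoo() = default; virtual int Foo() = 0; };
struct nsIBar { virtual ~nsIBar() = default; virtual int Bar() = 0; };

struct FooBar : nsIFoo, nsIBar {
    int Foo() override { return 1; }
    int Bar() override { return 2; }
};

// Start of the complete object behind an interface pointer (needs RTTI).
inline void* objectStart(nsIBar* iface) { return dynamic_cast<void*>(iface); }
```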

I tend to think that the costs of RCObject for general "XPCOMGC" use are too
high: in particular, I think we want to have pervasive cycles between DOM
objects: parent<->children DOM nodes as well as node<->document references.

--BDS
Benjamin
9/5/2007 3:53:28 PM
On Sep 4, 7:47 pm, Robert O'Callahan <rocalla...@gmail.com> wrote:
> What's the impact of using DRC for everything in our own code?

I think we would have to keep AddRef and Release as virtual functions,
and we would have to keep the hack in DOM where child nodes don't hold
real references to their siblings or parents.  We would keep all the
reference-counting scaffolding we have now; it would just be
backstopped by MMgc instead of the cycle collector.

A lot of this pain is because multiple inheritance and DRC don't mix
very well, as Benjamin pointed out.

Benjamin also thinks DRC is, in fact, avoidable in XPCOM-- we'll use a
GCObject wrapper when passing DRC script objects to XPCOM code.

> BTW is there a document somewhere that summarizes the run-time costs
> of inheriting from GCObject etc, or should I just look at the source?

Look at the source, probably.

-j

Jason
9/5/2007 8:33:00 PM
> I tend to think that the costs of RCObject for general "XPCOMGC" use are too
> high: in particular, I think we want to have pervasive cycles between DOM
> objects: parent<->children DOM nodes as well as node<->document references.

I agree. I don't see that much added benefit in using RCObjects rather 
than just GCObjects. The only win is earlier destruction of objects, at 
the cost of performance overhead and complexity.

It does worry me a little though that if we make all XPCOM objects 
GCObjects, we won't destroy any XPCOM objects until the first GC. It 
would be good to create a testbuild that doesn't destroy any XPCOM 
objects and see how much memory such a build uses just to start up the 
browser. During startup I don't think we currently do a GC, and we 
probably don't want to for performance reasons.

/ Jonas
Jonas
9/5/2007 10:45:34 PM
On Sep 6, 12:10 am, Jason Orendorff <jason.orendo...@gmail.com> wrote:

> It seems to me our choices are:
>   1) don't try to support multiple language bindings
>   2) support them in a common way, with common
>      library code, in a form that might solve
>      problems for other people too (and thus attract
>      contributors)
>   3) support them as we do PyXPCOM and
>      LiveConnect now, i.e. by ourselves, separately,
>      and not especially well
>
> I consider #1 a strong option.  #2 is pretty nutty.  I brought it up
> for two reasons.  First, #3 is a lot like #2, only without pooling any
> effort or designing for pluggability.  Under #3, lesser-used languages
> (like Python) are unavoidably second-class citizens.  They'll never
> reflect XPCOM with high fidelity unless someone spends an unlikely
> amount of effort on it.  My impression is that PyXPCOM is already
> lagging and unlikely to catch up, much less keep pace with coming
> changes.

I think it would be a step backwards for the platform to drop external
languages.  Many people who adopt the platform choose to use Python
for valid reasons - such projects include ActiveState's Komodo and the
OLPC project.  In both cases, the ability to use Python was crucial to
the choice to use the platform - indeed, in ActiveState's case, they
felt so strongly about using Python that they funded the creation of
PyXPCOM.  Still today, for any non-trivial code they still use Python,
even if that means writing a little "shim" in Javascript to enable
that.

Also, pyxpcom is not lagging from an xpcom POV.  xpcom itself is not
undergoing many changes, so it is keeping up fine.  What is *not*
happening is decent integration into the non-XPCOM world.  A lot of the
DOM, for example, is exposed in a way that is JS-specific.  Work to
integrate JS and other languages so that, for example, 'expando' objects
can be accessed from different languages is lagging.  Python does *not*
have access to parts of the platform that have been deCOMtaminated,
or implemented using anything other than xpcom.  So I would argue that
it is not pyxpcom that is lagging, but instead the platform itself is
trying to steam away from xpcom, and in the process also steaming away
from the integration opportunities xpcom has already demonstrated.  I
understand xpcom has a number of issues, but I fear that some of the
proposed solutions risk throwing out the baby with the bath water.

As you are, I'm also slightly skeptical that "insisting" that
languages which want to play in our new playground be reimplemented on
a new virtual machine will be fruitful.  Even if such an
implementation of Python was trivial to put together, I don't believe
it would keep those existing Python based projects happy.  People
choose to use Python inside the mozilla architecture both for the
language, and for the library.  In the same way that any non-trivial
Python program can't run the same on CPython and IronPython (ie,
the .NET port), it will not be a simple matter of swapping out the
language implementation and still expecting existing Python code to
run.

> I'm not convinced we want multi-language support.  It's real expensive
> and not very useful.

I think that we should try and remember why we wanted to open up the
existing architecture to external languages in the first place, and
see if those reasons are still valid.  I can't see why they are not,
but if we really do want to scale back the scope of this as a general
purpose "application platform", then I agree it would make our lives
much easier.  But is this really all about making our lives easy? ;)

Cheers,

Mark

mhammond
9/5/2007 11:35:49 PM
> BTW is there a document somewhere that summarizes the run-time costs
> of inheriting from GCObject etc, or should I just look at the source?

GCObject has an inlined empty constructor, and no destructor. 
GCFinalizeableObject does have a virtual empty destructor. So the cost 
is nothing. What does cost, though, is that allocation now goes through 
MMgc functions that are probably slightly slower than plain malloc/free.

Additionally, these functions zero out the memory before returning it. 
I'm not entirely sure why, but my guess is that it makes the conservative 
GC less likely to find bogus edges.

/ Jonas
Jonas
9/6/2007 12:36:31 AM
Jonas Sicking wrote:
> GCObject has an inlined empty constructor, and no destructor. 
> GCFinalizeableObject does have a virtual empty destructor. So the cost 
> is nothing. What does cost though is that allocation is now done through 
> MMgc functions that are probably slightly slower than simply malloc/free 
> is.

I believe this probably depends on how the threadsafety discussion works 
itself out. As I understand things now, malloc is very expensive because 
it's the system malloc and must be threadsafe. If MMgc doesn't have to 
lock around each malloc call, then I think it's very possible that it'll 
be as fast or faster than the system malloc.
-- 
Blake Kaplan
Blake
9/6/2007 4:30:15 AM
On Sep 6, 4:30 pm, Blake Kaplan <mrb...@gmail.com> wrote:
> Jonas Sicking wrote:
> I believe this probably depends on how the threadsafety discussion works
> itself out. As I understand things now, malloc is very expensive because
> it's the system malloc and must be threadsafe.

The system malloc might suck, but there are plenty of malloc
implementations that use per-thread allocation pools.

Rob


Robert
9/6/2007 9:11:55 AM
On Sep 6, 11:35 am, mhammond <mhamm...@skippinet.com.au> wrote:
> As you are, I'm also slightly skeptical that "insisting" that
> languages which want to play in our new playground be reimplemented on
> a new virtual machine will be fruitful.

Me too. But there's a less intrusive option, which is to ask their VM
to participate in a distributed mark and sweep algorithm using a
common interface. This can be done without constraining the
representation of the VM's objects.

I understand that's still a major requirement, especially since the
interface doesn't exist yet and when it does exist VMs will have to be
retrofitted with it in sensitive areas of their code. But I don't see
any possibility of collecting cycles across VM boundaries unless the
VMs participate in some kind of global tracing algorithm.

Rob

Robert
9/6/2007 9:17:40 AM
Jonas Sicking wrote:

> It does worry me a little though that if we make all XPCOM objects
> GCObjects, we won't destroy any XPCOM objects until the first GC. It
> would be good to create a test build that doesn't destroy any XPCOM
> objects and see how much memory such a build uses just to start up the
> browser. During startup I don't think we currently do a GC, and we
> probably don't want to for performance reasons.

Or at least instrument how many and what kind of objects are *deleted* during
a startup run up to some arbitrary point (the beginning of the main event
loop, perhaps).
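One cheap way to do that instrumentation, sketched in C++ under the assumption that the interesting classes can opt in through a base-class mixin: count destructor calls per class name, and stop counting at the chosen point. DeletionStats, CountDeletions and HypotheticalTemp are made-up names, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical startup instrumentation: counts destructions per class until
// stopCounting() is called (for example, just before the main event loop).
// Not thread-safe; a real build would need a lock or per-thread counters.
class DeletionStats {
 public:
  static std::map<std::string, long>& counts() {
    static std::map<std::string, long> c;
    return c;
  }
  static void stopCounting() { active() = false; }
  static void record(const char* typeName) {
    assert(typeName != nullptr);
    if (active()) ++counts()[typeName];
  }

 private:
  static bool& active() {
    static bool a = true;
    return a;
  }
};

// CRTP mixin: a class opts in by deriving from CountDeletions<Self> and
// providing a static typeName(). The name is explicit because the output of
// typeid(...).name() is compiler-specific.
template <typename Derived>
struct CountDeletions {
  ~CountDeletions() { DeletionStats::record(Derived::typeName()); }
};

// Example of an instrumented class (made up for this sketch).
struct HypotheticalTemp : CountDeletions<HypotheticalTemp> {
  static const char* typeName() { return "HypotheticalTemp"; }
};
```

Dumping DeletionStats::counts() once counting stops gives a per-class tally of how many objects were created and thrown away before that point.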

Of course if we're deleting lots of objects during startup, we should
probably examine why we were allocating those objects in the first place.

--BDS
Benjamin
9/6/2007 12:57:48 PM
On Sep 6, 7:17 pm, Robert O'Callahan <rocalla...@gmail.com> wrote:
> On Sep 6, 11:35 am, mhammond <mhamm...@skippinet.com.au> wrote:
>
> > As you are, I'm also slightly skeptical that "insisting" that
> > languages which want to play in our new playground be reimplemented on
> > a new virtual machine will be fruitful.
>
> Me too. But there's a less intrusive option, which is to ask their VM
> to participate in a distributed mark and sweep algorithm using a
> common interface. This can be done without constraining the
> representation of the VM's objects.
>
> I understand that's still a major requirement, especially since the
> interface doesn't exist yet and when it does exist VMs will have to be
> retrofitted with it in sensitive areas of their code. But I don't see
> any possibility of collecting cycles across VM boundaries unless the
> VMs participate in some kind of global tracing algorithm.

That would be reasonable assuming the *only* problem we see with cross-
language xpcom is collecting cycles - but it seems to me that this
thread has identified a number of other issues too - for example,
there was discussion of dropping AddRef and Release and moving to
assuming MMgc or similar is the memory manager.  Such issues go beyond
simply integrating with a cycle collection detector (and bring us
right back to the start of this thread :)

Cheers,

Mark

mhammond
9/7/2007 12:17:25 AM
On Sep 7, 12:17 pm, mhammond <mhamm...@skippinet.com.au> wrote:
> That would be reasonable assuming the *only* problem we see with cross-
> language xpcom is collecting cycles - but it seems to me that this
> thread has identified a number of other issues too - for example,
> there was discussion of dropping AddRef and Release and moving to
> assuming MMgc or similar is the memory manager.  Such issues go beyond
> simply integrating with a cycle collection detector (and bring us
> right back to the start of this thread :)

The scheme I'm suggesting would eliminate the need for reference
counting, without forcing everyone to use a common memory manager.

The basic idea is to have all VMs participate in a global mark-and-
sweep collection, by plugging them into a common API so that one VM
can mark objects managed by another VM. Objects that might be
referenced by foreign VMs can only be collected during this global GC.

This approach can collect cycles but it's not just a cycle detector.
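A minimal C++ sketch of the rule that objects which might be referenced by foreign VMs can only be collected during the global GC, seen from one VM. LocalHeap and its methods are hypothetical, and references going out of this heap into other VMs are left out for brevity. Local collections treat every exported object as a root. Only a global collection, in which the foreign VMs report which exported objects they can still reach, can reclaim them.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical single-VM heap that can run cheap local collections, but must
// treat objects it has handed to a foreign VM as roots during them.
class LocalHeap {
  struct Obj {
    bool live = true;
    bool exported = false;  // has ever been handed to a foreign VM
    std::vector<std::size_t> refs;
  };
  std::vector<Obj> objs_;
  std::set<std::size_t> roots_;

  void markFrom(std::size_t id, std::vector<bool>& marked) const {
    std::vector<std::size_t> work{id};
    while (!work.empty()) {
      std::size_t cur = work.back();
      work.pop_back();
      if (marked[cur]) continue;
      marked[cur] = true;
      for (std::size_t r : objs_[cur].refs) work.push_back(r);
    }
  }

  std::size_t sweep(const std::vector<bool>& marked) {
    std::size_t freed = 0;
    for (std::size_t i = 0; i < objs_.size(); ++i) {
      if (objs_[i].live && !marked[i]) {
        objs_[i].live = false;
        objs_[i].refs.clear();
        ++freed;
      }
    }
    return freed;
  }

 public:
  std::size_t alloc() {
    objs_.emplace_back();
    return objs_.size() - 1;
  }
  void link(std::size_t from, std::size_t to) { objs_[from].refs.push_back(to); }
  void addRoot(std::size_t id) { roots_.insert(id); }
  void exportToForeignVM(std::size_t id) { objs_[id].exported = true; }
  bool alive(std::size_t id) const { return objs_[id].live; }

  // Local GC: only this VM participates, so exported objects must survive.
  std::size_t localCollect() {
    std::vector<bool> marked(objs_.size(), false);
    for (std::size_t r : roots_) markFrom(r, marked);
    for (std::size_t i = 0; i < objs_.size(); ++i)
      if (objs_[i].live && objs_[i].exported) markFrom(i, marked);
    return sweep(marked);
  }

  // Global GC: exported objects survive only if a foreign VM still reaches them.
  std::size_t globalCollect(const std::set<std::size_t>& reachedByForeignVMs) {
    std::vector<bool> marked(objs_.size(), false);
    for (std::size_t r : roots_) markFrom(r, marked);
    for (std::size_t i : reachedByForeignVMs) {
      assert(objs_[i].exported);
      markFrom(i, marked);
    }
    return sweep(marked);
  }
};
```

The design choice is that local collections stay cheap and need no coordination, at the price that anything which ever crossed a VM boundary lingers until the next global collection.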

Rob

Robert
9/7/2007 9:32:27 AM
[Reposted after failing to post to m.d.t.xpcom because some small group 
decided to remove that group(!) while there were live threads ongoing in 
it. /be]

Benjamin Smedberg wrote:
> Robert O'Callahan wrote:
> 
>> What's the impact of using DRC for everything in our own code?
> 
> I'll summarize what I learned from Jason on IRC:
> 
> * RCObject is an early-collection optimization: when an RCObject refcount
> goes to zero, it is placed in the ZCT (zero-count table)...

This is not quite right -- see e.g.

http://hg.mozilla.org/tamarin-central/?file/6caba57aa429/MMgc/GCObject.h

line 150. Newborn RCObjects go in the ZCT, and stay there until either
the RCObject is explicitly destroyed or the ZCT fills up and the by-then
unreachable RCObject is reaped.

A newborn RCObject's refcount goes above "zero" (see RefCount method
inline in RCObject) and it goes out of the ZCT only if a reference to
that object is stored in another object or a heap-located root.

Thus DRC optimizes for LIFO-allocated objects referenced only from the
stack (e.g., AS3 strings and other temporaries).

It is true (see RCObject::DecrementRef) that an RCObject, once it has
gone from Deferred to Prompt Reference Counting, goes back into the ZCT
when its ref-count goes to 0.

> at some frequent
> interval, MMgc scans the stack to make sure there are no stack pointers to
> ZCT objects and then collects them.

Stack pointers are ok -- RCObjects so referenced are temporarily pinned.
Any RCObjects not referenced by the stack and not Promptly RC'ed (i.e.,
possibly referenced from the heap) are collected.
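To make the ZCT behaviour described above concrete, here is a toy C++ model of deferred reference counting. ToyRC and ToyZCT are hypothetical and far simpler than MMgc's RCObject and ZCT: there is no count saturation, and instead of scanning the real stack the caller passes in the stack words. The count tracks only heap references stored through a write barrier. Newborns, and objects whose count drops back to zero, sit in the ZCT, and reaping frees ZCT entries unless a stack word pins them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Toy model of deferred reference counting (DRC) with a zero-count table.
// Hypothetical and heavily simplified compared with MMgc's RCObject/ZCT.
struct ToyRC {
  std::uint32_t heapRefs = 0;  // references from heap slots only; stack refs are not counted
  bool inZCT = false;
};

class ToyZCT {
  std::vector<ToyRC*> table_;

  void removeFromTable(ToyRC* obj) {
    obj->inZCT = false;
    table_.erase(std::find(table_.begin(), table_.end(), obj));
  }

 public:
  ToyZCT() = default;
  ToyZCT(const ToyZCT&) = delete;
  ToyZCT& operator=(const ToyZCT&) = delete;
  ~ToyZCT() {
    for (ToyRC* obj : table_) delete obj;  // objects still counted from the heap are not owned here
  }

  // Newborns start in the ZCT.
  ToyRC* allocate() {
    ToyRC* obj = new ToyRC();
    obj->inZCT = true;
    table_.push_back(obj);
    return obj;
  }

  // Write barrier: a pointer to obj was stored into a heap object or heap root.
  void incrementRef(ToyRC* obj) {
    if (obj->heapRefs++ == 0) {
      assert(obj->inZCT);
      removeFromTable(obj);
    }
  }

  // A heap slot that pointed to obj was overwritten or cleared. Per the post,
  // RCObject::DecrementRef likewise puts an object back into the ZCT at zero.
  void decrementRef(ToyRC* obj) {
    assert(obj->heapRefs > 0);
    if (--obj->heapRefs == 0) {
      obj->inZCT = true;
      table_.push_back(obj);
    }
  }

  // Reap: free every ZCT entry that no stack word points at.
  std::size_t reap(const std::vector<const void*>& stackWords) {
    std::unordered_set<const void*> pinned(stackWords.begin(), stackWords.end());
    std::vector<ToyRC*> survivors;
    std::size_t freed = 0;
    for (ToyRC* obj : table_) {
      if (pinned.count(obj) != 0) {
        survivors.push_back(obj);  // temporarily pinned by a stack reference
      } else {
        delete obj;
        ++freed;
      }
    }
    table_.swap(survivors);
    return freed;
  }

  std::size_t size() const { return table_.size(); }
};
```

Stack-only temporaries never touch the count at all, which is why this pays off for LIFO-allocated objects; an object stored into the heap leaves the fast path until its heap count returns to zero.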

> This collection is shallow and fast.

ZCT::Reap in GC.cpp.

> * Tamarin uses RCObject for all JS objects, including strings.
> 
> The costs of RCObject:
> 
> * the objects keep an extra int member
> * if threadsafety is needed, we need atomic increment/decrement

More, you need atomic operations on the |composite| member of RCObject,
not just the ++ and -- ops.
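Why the whole composite word needs atomic updates, sketched in C++ with a made-up layout: the low 24 bits hold the count and the top bit is an "in ZCT" flag. MMgc's actual layout differs; CompositeWord is a hypothetical name. Because a count transition and a flag change live in the same word, each update is a compare-and-swap loop over the whole word rather than a bare atomic increment.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical composite word: low 24 bits hold the count, the top bit says
// whether the object is in the ZCT. (MMgc's real layout differs.)
class CompositeWord {
  static constexpr std::uint32_t kCountMask = 0x00FFFFFFu;
  static constexpr std::uint32_t kInZCT = 0x80000000u;
  std::atomic<std::uint32_t> bits_{kInZCT};  // newborn: count 0, in the ZCT

 public:
  // 0 -> 1 must clear the ZCT flag in the same atomic step as the increment,
  // so a compare-and-swap loop over the whole word is used.
  void incrementRef() {
    std::uint32_t old = bits_.load(std::memory_order_relaxed);
    std::uint32_t desired;
    do {
      assert((old & kCountMask) != kCountMask);  // saturation is not handled here
      desired = old + 1;
      if ((old & kCountMask) == 0) desired &= ~kInZCT;
    } while (!bits_.compare_exchange_weak(old, desired, std::memory_order_acq_rel,
                                          std::memory_order_relaxed));
  }

  // Returns true if this decrement took the count to 0, i.e. the caller must
  // now put the object back into the ZCT (the flag is already set).
  bool decrementRef() {
    std::uint32_t old = bits_.load(std::memory_order_relaxed);
    std::uint32_t desired;
    do {
      assert((old & kCountMask) != 0);
      desired = old - 1;
      if ((desired & kCountMask) == 0) desired |= kInZCT;
    } while (!bits_.compare_exchange_weak(old, desired, std::memory_order_acq_rel,
                                          std::memory_order_relaxed));
    return (desired & kCountMask) == 0;
  }

  std::uint32_t count() const { return bits_.load() & kCountMask; }
  bool inZCT() const { return (bits_.load() & kInZCT) != 0; }
};
```

Two threads racing on separate fetch_add and flag updates could otherwise leave a count of 1 with the ZCT flag still set; doing both in one CAS rules that out, at the cost of a retry loop under contention.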

> * cycles between RCObjects or any references to RCObjects from GCObjects
> will not be collected until a "standard" GC

That's not a cost of RCObject, since cycles won't be collected among any
GCObjects (RCObject is a subtype of GCObject) until a "standard" GC.
Rather, you lose the Deferred benefit of RCObject once a newborn
RCObject, or an older one that until now was referenced only from the
stack, goes through a write barrier and is referenced from the heap.

> * Any object that holds a reference to an RCObject (a DRCWB) has to have a
> finalizer

Right, RCObject <: GCFinalizedObject.

> From reading code, I've also gleaned that the current DRCWB system assumes
> that you have a toplevel pointer, not an internal pointer like we normally
> keep in XPCOM.

MMgc does not deal with interior pointers in general. All it does is
ignore the low three bits.

We will need to address this lack of conservatism. It should not make
for more false positives if we classify memory well.
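A small C++ sketch contrasting the two policies for a word found during a conservative scan. AllocationMap is hypothetical, and an ordered std::map stands in for whatever page-level metadata a real collector would consult. Ignoring the low three bits recognizes only a pointer to the start of an object (possibly with tag bits set); interior-pointer support maps any address inside an allocation back to its start.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical allocation map used to classify words found during a scan.
class AllocationMap {
  std::map<std::uintptr_t, std::size_t> objects_;  // start address -> size

 public:
  void add(const void* start, std::size_t size) {
    assert(size > 0);
    objects_[reinterpret_cast<std::uintptr_t>(start)] = size;
  }

  // "Ignore the low three bits" policy: only a pointer to the start of an
  // object, possibly carrying tag bits, is recognised.
  const void* findIgnoringLowBits(std::uintptr_t word) const {
    std::uintptr_t base = word & ~static_cast<std::uintptr_t>(7);
    return objects_.count(base) != 0 ? reinterpret_cast<const void*>(base) : nullptr;
  }

  // Interior-pointer policy: any address in [start, start + size) maps to start.
  const void* findContaining(std::uintptr_t word) const {
    auto it = objects_.upper_bound(word);  // first object starting after word
    if (it == objects_.begin()) return nullptr;
    --it;
    return word < it->first + it->second ? reinterpret_cast<const void*>(it->first)
                                         : nullptr;
  }
};
```

With interior-pointer lookup, an interface pointer into the middle of an object still pins the whole allocation; the price is a range lookup per scanned word instead of a mask and an exact match.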

> To work around this you either have to
> dynamic_cast<RCObject*>, which requires RTTI and may not be cheap (needs
> measurement), or keep virtual functions, which means AddRef/Release.
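Why XPCOM pointers are interior pointers, shown in a C++ sketch with hypothetical interfaces IFoo and IBar: with multiple inheritance, a pointer to the second interface points into the middle of the object. The sketch uses dynamic_cast<void*> rather than the dynamic_cast<RCObject*> mentioned above, but it relies on the same RTTI information to recover the address of the most-derived object, which is where the allocation starts.

```cpp
#include <cassert>

// Hypothetical interfaces standing in for XPCOM-style interfaces.
struct IFoo {
  virtual ~IFoo() = default;
  virtual int foo() = 0;
};
struct IBar {
  virtual ~IBar() = default;
  virtual int bar() = 0;
};

// One object implementing both interfaces through multiple inheritance.
// A caller holding an IBar* holds a pointer into the middle of the object.
struct FooBar : IFoo, IBar {
  int foo() override { return 1; }
  int bar() override { return 2; }
};

// dynamic_cast<void*> on a pointer to a polymorphic type yields the address
// of the most-derived object, i.e. where the allocation starts. It depends on
// RTTI; its cost relative to a virtual call is what would need measuring.
inline const void* objectStart(IBar* p) {
  assert(p != nullptr);
  return dynamic_cast<const void*>(p);
}
```

On mainstream ABIs the IBar subobject sits at a nonzero offset inside FooBar, so a collector that only accepts object-start pointers would not recognize an IBar* without this adjustment.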

Let's talk to Tom Reilly about teaching MMgc about interior pointer
scanning, instead.

> I tend to think that the costs of RCObject for general "XPCOMGC" use are too
> high: in particular, I think we want to have pervasive cycles between DOM
> objects: parent<->children DOM nodes as well as node<->document references.
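A C++ sketch of what such cycles do under any scheme that frees an object only when its count reaches zero, using std::shared_ptr as a stand-in; Document, Node, buildAndDrop and breakCyclesByHand are hypothetical, not Gecko's classes. Once document, parent and child point at each other, dropping every outside reference leaves all counts above zero, so nothing is freed until the cycle is broken by hand or a tracing collector finds it.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical DOM-like types; std::shared_ptr plays the role of a count.
struct Document;
struct Node {
  std::shared_ptr<Document> ownerDocument;      // node -> document
  std::shared_ptr<Node> parent;                 // child -> parent
  std::vector<std::shared_ptr<Node>> children;  // parent -> child
};
struct Document {
  std::shared_ptr<Node> root;  // document -> node
};

// Builds document <-> root <-> child, drops every external reference, and
// returns a weak handle so the caller can check whether anything was freed.
inline std::weak_ptr<Document> buildAndDrop() {
  auto doc = std::make_shared<Document>();
  auto root = std::make_shared<Node>();
  auto child = std::make_shared<Node>();
  root->ownerDocument = doc;
  child->ownerDocument = doc;
  doc->root = root;
  root->children.push_back(child);
  child->parent = root;
  return doc;  // the locals die here; only the cyclic references remain
}

// Manual cycle breaking: the bookkeeping a tracing collector makes unnecessary.
inline void breakCyclesByHand(const std::shared_ptr<Document>& doc) {
  assert(doc != nullptr);
  std::shared_ptr<Node> root = doc->root;
  if (!root) return;
  for (auto& child : root->children) {
    child->parent.reset();
    child->ownerDocument.reset();
  }
  root->children.clear();
  root->ownerDocument.reset();
  doc->root.reset();
}
```

With RCObject the leak would not be permanent, since a standard GC still traces such objects, but none of them would ever benefit from prompt reclamation.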

Yeah, although I wonder why Tamarin uses RCObject as the base of
AvmPlusScriptableObject. Cc'ing tamarin-devel.

/be
Brendan
9/11/2007 12:16:05 AM