Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

Welcome to the first edition of the Fission MemShrink newsletter.[1]

In this edition, I'll sum up what the project is, and why it matters to you. 
In subsequent editions, I'll give updates on progress that we've made, and 
areas that we'll need to focus on next.[2]


The Fission MemShrink project is one of the most easily overlooked aspects of 
Project Fission (also known as Site Isolation), but it is absolutely critical 
to the project's success, and it will require a company- and community-wide 
effort to meet its goals.

The problem is thus: In order for site isolation to work, we need to be able 
to run *at least* 100 content processes in an average Firefox session. Each of 
those processes has its own base memory overhead—memory we use just for 
creating the process, regardless of what's running in it. In the post-Fission 
world, that overhead needs to be less than 10MB per process in order to keep the 
extra overhead from Fission below 1GB. Right now, on our best-case platform, 
Windows 10, that overhead is somewhere between 17 and 21MB. Linux and OS X 
hover between 25 and 35MB. In other words, between roughly 1.7 and 3.5GB for 
an ordinary session with 100 content processes.

That means that, in the best case, we need to reduce the memory we use in 
content processes by *at least* 7MB. The problem, of course, is that there are 
only so many places we can cut memory without losing functionality, and even 
fewer places where we can make big wins. But, there are lots of places we can 
make small and medium-sized wins.

So, to put the task into perspective: for each size of win we might find, 
here's how many fixes of that size we'd need in order to save 1MB of 
per-process overhead:

 250KB:    4
 100KB:   10
  75KB:   13
  50KB:   20
  20KB:   50
  10KB:  100
   5KB:  200
Now remember: we need to do *all* of these in order to reach our goal. It's 
not a matter of 4 250KB improvements *or* 200 5KB improvements. It's 4 250KB 
*and* 200 5KB improvements. There just aren't enough places we can cut 250KB. 
If we
fall short in any of those areas, Project Fission will fail, and Firefox will be 
the only major browser without site isolation.

But it won't fail, because all of you are awesome, and this is a totally 
achievable goal if we all throw our effort behind it.

Essentially what this means, though, is that if we identify an area of 
overhead that's 50KB[3] or larger that can be eliminated, it *has* to be 
eliminated. There just aren't that many large chunks to remove. They all need 
to go. And if an area of code has a dozen 5KB chunks that can be eliminated, 
maybe they don't all have to go, but at least half of them do. The more the 
better.


To help us triage these issues, we have a tracking bug (https://bugzil.la/memshrink-content), 
and a per-bug whiteboard tag ([overhead:...]) which gives an estimate of how 
much per-process overhead we believe fixing that bug would eliminate. Please 
feel free to add blockers to the tracking bug if you think they're relevant, and 
to add or update [overhead] tags if you have reasonable estimates.


With all of that said, here's a brief update on the progress we've made so far:

In the past month, unique memory per process[4] has dropped 3-4MB[5], and JS 
memory usage in particular has dropped 1.1-1.9MB.

Particular credit goes to:

 * Eric Rahm added an AWSY test suite to track base content process memory
   (https://bugzil.la/1442361). Results:

    Resident unique: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1684862,1,4&series=mozilla-central,1684846,1,4&series=mozilla-central,1685133,1,4&series=mozilla-central,1685127,1,4
    Explicit allocations: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1706218,1,4&series=mozilla-inbound,1706220,1,4&series=mozilla-inbound,1706216,1,4
    JS: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1684866,1,4&series=mozilla-central,1685137,1,4&series=mozilla-central,1685131,1,4

 * Andrew McCreight created a tool for tracking JS memory usage, and figuring
   out which scripts and objects are responsible for how much of it
   (https://bugzil.la/1463569).

 * Andrew and Nika Layzell also completely rewrote the way we handle XPIDL type
   info so that it's statically compiled into the executable and shared between
   all processes (https://bugzil.la/1438688, https://bugzil.la/1444745).

 * Felipe Gomes split a bunch of code out of frame scripts so that it could be
   lazily loaded only when needed (https://bugzil.la/1467278, ...) and added a
   whitelist of JSMs that are allowed to be loaded at content process startup
   (https://bugzil.la/1471066).

 * I did a bit of this too, and also prevented us from loading some other JSMs
   before we need them (https://bugzil.la/1470333, https://bugzil.la/1469719,
   ...)

 * Nick Nethercote made dynamic nsAtoms allocate their string storage inline
   rather than use a refcounted StringBuffer (https://bugzil.la/1447951).
   (A sketch of this inline-storage pattern follows this list.)

 * Emilio Cobos Álvarez reduced the amount of memory the Gecko Profiler
   uses in content processes.

 * Nathan Froyd fixed our static nsAtom code so it didn't generate static
   initializers (https://bugzil.la/1455178) and reduced the stack size of our
   image decoder threads (https://bugzil.la/1443932).

 * Doug Thayer reduced the number of hang monitor threads we start in each
   process (https://bugzil.la/1448040)

 * Boris Zbarsky removed a bunch of useless QueryInterface implementations
   (https://bugzil.la/1452862), made our static isInstance methods use less
   memory (https://bugzil.la/1452786), and generally deleted a bunch of
   useless, legacy nsI* interfaces that required us to add extra vtable
   pointers to a lot of DOM object instances. (A toy demonstration of that
   per-instance cost also follows this list.)
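
Nick's atom change above is an instance of a general inline-storage pattern 
that's worth spelling out. Here's a minimal, hypothetical sketch of the idea 
(this is *not* the real nsAtom layout): the characters live in the same 
allocation as the object, saving a second malloc and the refcounted buffer's 
bookkeeping in every process that creates the atom.

    #include <cstdlib>
    #include <cstring>
    #include <new>

    // Hypothetical sketch of inline string storage (not the real nsAtom
    // code): one allocation holds both the object and its characters,
    // instead of the object pointing at a separate refcounted buffer.
    class InlineAtom {
     public:
      static InlineAtom* Create(const char* aStr, size_t aLen) {
        // A single malloc covers the header, the characters, and a null
        // terminator.
        void* mem = malloc(sizeof(InlineAtom) + aLen);
        return new (mem) InlineAtom(aStr, aLen);
      }
      const char* String() const { return mStorage; }
      size_t Length() const { return mLength; }

     private:
      InlineAtom(const char* aStr, size_t aLen) : mLength(aLen) {
        memcpy(mStorage, aStr, aLen);
        mStorage[aLen] = '\0';
      }
      size_t mLength;
      char mStorage[1];  // Real data extends past the end of the object.
    };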

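To make the vtable-pointer cost in Boris's last item concrete, here's a toy, 
self-contained demonstration (nsIFoo and nsIBar are made-up stand-ins, not 
real interfaces):

    #include <cstdio>

    // Each polymorphic base class adds one vtable pointer to every
    // instance of a class that inherits from it.
    struct nsIFoo { virtual ~nsIFoo() = default; };
    struct nsIBar { virtual ~nsIBar() = default; };

    struct OneInterface  : nsIFoo {};
    struct TwoInterfaces : nsIFoo, nsIBar {};

    int main() {
      // Typically prints "8 16" on a 64-bit build. Multiplied across the
      // enormous number of DOM objects a session creates, one removed
      // interface can be a real per-process win.
      printf("%zu %zu\n", sizeof(OneInterface), sizeof(TwoInterfaces));
      return 0;
    }
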
And your humble author contributed the following:

 * Changed our localization string bundles to use shared memory for bundles
   which are loaded into content processes (https://bugzil.la/1470365).
   This bug also adds some helpers which should make it easier to use shared
   memory for more things in the future. (A generic sketch of the pattern
   follows this list.)

 * Made some changes to the script preloader to avoid keeping an unnecessary
   encoded copy of scripts in the content process (https://bugzil.la/1470793),
   to drop cached single-use scripts (https://bugzil.la/1471091), and to improve
   the set of scripts we load in content processes (https://bugzil.la/1471089).

 * Made some smaller optimizations to avoid making copies of strings in
   preference callbacks (https://bugzil.la/1472523), and to remove the XPC
   compilation scope (https://bugzil.la/1442737).
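
The string bundle work follows a shared-memory pattern that will probably pay 
off elsewhere too, so here's a generic POSIX sketch of it (this is not the 
helper API the bug adds; the "/l10n-bundle" name is invented and error 
handling is omitted): the parent builds a read-only block once, and every 
content process maps the same physical pages instead of holding its own copy.

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstring>

    // Parent process: create and fill the block once.
    int CreateSharedBundle(const void* aData, size_t aSize) {
      int fd = shm_open("/l10n-bundle", O_CREAT | O_RDWR, 0600);
      ftruncate(fd, aSize);
      void* mem =
          mmap(nullptr, aSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      memcpy(mem, aData, aSize);
      munmap(mem, aSize);
      return fd;  // Handed to content processes over IPC.
    }

    // Content process: map the same pages read-only. However many
    // processes do this, the data is only resident once, so it adds
    // almost nothing to each process's unique (USS) memory.
    const void* MapSharedBundle(int aFd, size_t aSize) {
      return mmap(nullptr, aSize, PROT_READ, MAP_SHARED, aFd, 0);
    }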

Apologies to anyone I missed.


[1]: Please feel free to read the '.' as a '!' if you're so inclined. I
     generally shy away from exclamation marks.
[2]: If this seems like a massive rip-off of Ehsan's Quantum Flow newsletter
     format, that's because it is. Thanks, Ehsan :)
[3]: 50KB per process, which is to say 5MB across 100 content processes.
[4]: The total memory mapped by each content process which is not shared by
     other processes. Approximately equal to USS.
[5]: It's hard to be precise, since the numbers can be noisy, and are often
     bi-modal.
Kris
7/10/2018 6:19:03 PM

>Welcome to the first edition of the Fission MemShrink newsletter.[1]

This is awesome and critical.

I'll note (and many of you know this well) that in addition to getting
rid of allocations (or making them lazy), another primary solution is to
move data out of the Content processes, and into the master process (or
some other shared process, if that's advisable for security or other
reasons), and access the data over IPC.  Or you can move it to a shared
memory block (with appropriate locking if not static).  For example, on
linux one of our worst offenders is fontconfig; Chrome for example
remotes much of that to the master process.

-- 
Randell Jesup, Mozilla Corp
remove "news" for personal email
Randell
7/10/2018 7:38:31 PM
Is there a guideline that should be used to evaluate what can
acceptably run in the same process for different sites?

I assume the primary goal is to prevent one site from reading
information that should only be available to another site?

There would also be defense-in-depth value from having each site
sandboxed separately because a security breach from one site could
not compromise another.

I guess a single compositor process is acceptable because there is
essentially no information returning from the compositor?

A font server may be acceptable, because information returned is
of limited power?

Use of system font, graphics, or audio servers is in a similar
bucket I guess.

Would using a single process for network be acceptable, not
because information returned is limited, but because we're willing
to have some compromise because there is a small API surface?  Or
would that be acceptable because content JS does not run in that
process?

Would it be acceptable to perform layout in a single process for
multiple sites (if that were practical)?

Would it be easier to answer the opposite question?  What should
not run in a shared process?  JS is a given.  Others?
Karl
7/11/2018 11:25:50 PM
On Thu, Jul 12, 2018 at 11:25 AM, Karl Tomlinson <moznews@karlt.net> wrote:

> Would it be easier to answer the opposite question?  What should
> not run in a shared process?  JS is a given.  Others?
>

Currently when an exploitable bug is found in content process code,
attackers use JS to weaponize it with an arsenal of known techniques (e.g.
heap spraying and shaping). An important question is whether, assuming a
similar bug were found in a shared non-content process, how difficult would
it be for content JS to apply those techniques remotely across the process
boundary? That would be a pretty interesting problem for security
researchers to work on.

> Use of system font, graphics, or audio servers is in a similar bucket I
> guess.
>

Taking control of an audio server would let you listen into phone calls,
which seems interesting.

Another question is whether you can exfiltrate cross-origin data by
performing side-channel attacks against those shared processes. You
probably need to assume that Spectre-ish attacks will be blocked at process
boundaries by hardware/OS mitigations, but there could be
browser-implementation-specific timing attacks etc. E.g. do IPDL IDs
exposed to content processes leak useful information about the activities
of other processes? Of course there are cross-origin timing-based
information leaks that are already known and somewhat unfixable :-(.

Rob
-- 
Su ot deraeppa sah dna Rehtaf eht htiw saw hcihw, efil lanrete eht uoy ot
mialcorp ew dna, ti ot yfitset dna ti nees evah ew; deraeppa efil eht. Efil
fo Drow eht gninrecnoc mialcorp ew siht - dehcuot evah sdnah ruo dna ta
dekool evah ew hcihw, seye ruo htiw nees evah ew hcihw, draeh evah ew
hcihw, gninnigeb eht morf saw hcihw taht.
Robert
7/11/2018 11:56:04 PM
On Wed, Jul 11, 2018 at 6:25 PM, Karl Tomlinson <moznews@karlt.net> wrote:

> Is there a guideline that should be used to evaluate what can
> acceptably run in the same process for different sites?
>


This is on me to write. I have been slow at doing so mainly because there's
a lot of "What does X look like and where do its parts run" investigation I
feel I need to do to write it. (For X in at least { WebExtensions, WebRTC,
Compositing, Filters, ... })



> I assume the primary goal is to prevent one site from reading
> information that should only be available to another site?
>

Yep.



On Wed, Jul 11, 2018 at 6:56 PM, Robert O'Callahan <robert@ocallahan.org>
wrote:

> On Thu, Jul 12, 2018 at 11:25 AM, Karl Tomlinson <moznews@karlt.net>
> wrote:
>
> > Would it be easier to answer the opposite question?  What should
> > not run in a shared process?  JS is a given.  Others?
> >
>
> Currently when an exploitable bug is found in content process code,
> attackers use JS to weaponize it with an arsenal of known techniques (e.g.
> heap spraying and shaping). An important question is whether, assuming a
> similar bug were found in a shared non-content process, how difficult would
> it be for content JS to apply those techniques remotely across the process
> boundary?


You're completely correct.


> That would be a pretty interesting problem for security
> researchers to work on.
>

It's always illustrative to have exploits that demonstrate this goal in the
target of interest - they may have created generic techniques that we can
address fundamentally (like with Memory Partitioning or Allocator
Hardening).  But people have been writing exploits for targets that don't
have a scripting environment for two decades or more, so all of those are
prior art for this sort of exploitation.  This isn't a reason not to pursue
this work, and it's not saying this work isn't a net security win though!

I have been pondering (and brainstormed with a few people) about creating
something Google native-client-like to enforce process-like state
separation between threads in a single process. That might make it safer to
share utility processes between content processes. But it's considerably
less straightforward than I was hoping. Big open research question.


> > Use of system font, graphics, or audio servers is in a similar bucket I
> > guess.
> >
>
> Taking control of an audio server would let you listen into phone calls,
> which seems interesting.
>
> Another question is whether you can exfiltrate cross-origin data by
> performing side-channel attacks against those shared processes. You
> probably need to assume that Spectre-ish attacks will be blocked at process
> boundaries by hardware/OS mitigations, but there could be
> browser-implementation-specific timing attacks etc. E.g. do IPDL IDs
> exposed to content processes leak useful information about the activities
> of other processes? Of course there are cross-origin timing-based
> information leaks that are already known and somewhat unfixable :-(.


Yup!

-tom
Tom
7/12/2018 3:10:27 PM
On Wednesday, July 11, 2018 at 4:19:15 AM UTC+10, Kris Maglione wrote:
> [...]
> Essentially what this means, though, is that if we identify an area of 
> overhead that's 50KB[3] or larger that can be eliminated, it *has* to be 
> eliminated. There just aren't that many large chunks to remove. They all need 
> to go. And if an area of code has a dozen 5KB chunks that can be eliminated, 
> maybe they don't all have to go, but at least half of them do. The more the 
> better.

Some questions (sorry if some of this is already common knowledge or has been discussed):

Are there tools available that could easily track memory usage of specific things?
E.g., could I instrument one class, so that every allocation would be tracked automatically, and I'd get nice stats at the end?
Including wasted space because of larger allocation blocks?

Could I even run what-if scenarios, where I could instrument a class and extract its current size but also provide an alternate size (based on what I think I could make it shrink), and in the end I'll know how much I could save overall?

Do we have Try tests that simulate real-world usage, so we could collect memory-usage data that's relevant to our users, but also reproducible?

Should there be some kind of Talos-like CI tests that focus on memory usage, so we'd get some warning if a particular patch suddenly eats too much memory?
gsquelart
7/14/2018 12:22:48 AM
On 7/13/18 5:22 PM, gsquelart@mozilla.com wrote:
> E.g., could I instrument one class, so that every allocation would be tracked automatically, and I'd get nice stats at the end?

You mean apart from just having a memory reporter for it?

> Including wasted space because of larger allocation blocks?

Memory reporters using mallocSizeOf include that space, yes.

> Could I even run what-if scenarios, where I could instrument a class and extract its current size but also provide an alternate size (based on what I think I could make it shrink), and in the end I'll know how much I could save overall?

You could hack the relevant memory reporter, sure.
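
For anyone who hasn't written one, a rough sketch of what such a memory 
reporter looks like (MyClass and gMyClass are placeholders for whatever you 
want to instrument):

    #include "nsIMemoryReporter.h"

    // Placeholder for the class being measured; a real one would also add
    // the sizes of everything it owns.
    class MyClass {
     public:
      size_t SizeOfIncludingThis(mozilla::MallocSizeOf aMallocSizeOf) const {
        return aMallocSizeOf(this);
      }
    };
    extern MyClass* gMyClass;

    class MyClassReporter final : public nsIMemoryReporter {
      ~MyClassReporter() = default;

     public:
      NS_DECL_ISUPPORTS

      // Defines a measuring function backed by moz_malloc_size_of(), which
      // asks the allocator how big each block actually is -- so slop from
      // rounded-up allocation sizes is included in the numbers.
      MOZ_DEFINE_MALLOC_SIZE_OF(MallocSizeOf)

      NS_IMETHOD CollectReports(nsIHandleReportCallback* aHandleReport,
                                nsISupports* aData, bool aAnonymize) override {
        MOZ_COLLECT_REPORT("explicit/my-class", KIND_HEAP, UNITS_BYTES,
                           gMyClass->SizeOfIncludingThis(MallocSizeOf),
                           "Memory used by MyClass.");
        return NS_OK;
      }
    };

    NS_IMPL_ISUPPORTS(MyClassReporter, nsIMemoryReporter)

    // Registered once, e.g. at startup, after which the numbers show up
    // in about:memory:
    //   RegisterStrongMemoryReporter(new MyClassReporter());

For the what-if scenarios, you could temporarily adjust the amount expression 
in a reporter like this and diff about:memory runs before and after.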

> Do we have Try tests that simulate real-world usage, so we could collect memory-usage data that's relevant to our users, but also reproducible?

See the "awsy-10s" test suite, which sort of aims to do that.

> Should there be some kind of Talos-like CI tests that focus on memory usage, so we'd get some warning if a particular patch suddenly eats too much memory?

This is what awsy-e10s aims to do, yes.

-Boris
Boris
7/14/2018 12:43:04 AM