[wildfly-dev] Speeding up WildFly boot time

David M. Lloyd david.lloyd at redhat.com
Thu May 18 15:18:19 EDT 2017


On 05/18/2017 09:04 AM, Andrig Miller wrote:
> On Thu, May 18, 2017 at 7:50 AM, David M. Lloyd <david.lloyd at redhat.com>
> wrote:
> 
>> On 05/15/2017 05:21 PM, Stuart Douglas wrote:
>>> On Tue, May 16, 2017 at 1:34 AM, David M. Lloyd <david.lloyd at redhat.com>
>> wrote:
>>>> Exploding the files out of the JarFile could expose this contention and
>>>> therefore might be useful as a test - but it would also skew the results
>>>> a little because you have no decompression overhead, and creating the
>>>> separate file streams hypothetically might be somewhat more (or less)
>>>> expensive.  I joked about resurrecting jzipfile (which I killed off
>>>> because it was something like 20% slower at decompressing entries than
>>>> Jar/ZipFile) but it might be worth considering having our own JAR
>>>> extractor at some point with a view towards concurrency gains.  If we go
>>>> this route, we could go even further and create an optimized module
>>>> format, which is an idea I think we've looked at a little bit in the
>>>> past; there are a few avenues of exploration here which could be
>>>> interesting.
>>>
>>> This could be worth investigating.
>>
>> Tomaž did a prototype of using the JDK JAR filesystem to back the
>> resource loader if it is available; contention did go down but memory
>> footprint went up, and overall the additional indexing and allocation
>> ended up slowing down boot a little, unfortunately (though large numbers
>> of deployments seemed to be faster).  Tomaž can elaborate on his
>> findings if he wishes.
>>
>> I had a look in the JAR FS implementation (and its parent class, the ZIP
>> FS implementation, which does most of the hard work), and there are a
>> few things which add overhead and contention we don't need, like
>> read/write locks to manage access and modification (we never modify)
>> and synchronization-based indexing structures that might be somewhat
>> larger than necessary.  They use NIO channels to access the zip data,
>> which is probably OK, but maybe mapped buffers could be better... or
>> worse?  They use a synchronized list per JAR file to pool Inflaters;
>> pooling is a hard thing to do right so maybe there isn't really any
>> better option in this case.
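
(Purely to illustrate the pooling point: a lock-free pool could look
something like the sketch below.  The names are invented and I'm not
claiming it would actually beat the JDK's synchronized list.)

    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.zip.Inflater;

    // Illustrative lock-free Inflater pool; purely a sketch.
    final class InflaterPool {
        private final ConcurrentLinkedQueue<Inflater> pool =
            new ConcurrentLinkedQueue<>();

        Inflater acquire() {
            final Inflater inflater = pool.poll();
            // true = "nowrap" (raw deflate), which is what ZIP entries use
            return inflater != null ? inflater : new Inflater(true);
        }

        void release(final Inflater inflater) {
            inflater.reset();      // clear state so it can be reused
            pool.offer(inflater);  // unbounded; a real pool would cap this
        }
    }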
>>
>> But in any event, I think a custom extractor still might be a reasonable
>> thing to experiment with.  We could resurrect jzipfile or try a
>> different approach (maybe see how well mapped buffers work?).  Since
>> we're read-only, any indexes we use can be immutable and thus
>> unsynchronized, and maybe more compact as a result. We can use an
>> unordered hash table because we generally don't care about file order
>> the way that JarFile historically needs to, thus making indexing faster.
>>    We could save object allocation overhead by using a specialized
>> object->int hash table that just records offsets into the index for each
>> entry.
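
(To make that concrete: the lookup side of such a table could be as
small as the sketch below - names invented, construction of the probe
table omitted.)

    // Illustrative read-only name -> offset table; not real WildFly code.
    final class EntryIndex {
        private final String[] names;   // open-addressed, power-of-two sized
        private final int[] offsets;    // offset into the index per entry

        EntryIndex(final String[] names, final int[] offsets) {
            this.names = names;
            this.offsets = offsets;
        }

        // Returns the offset for the given entry name, or -1 if absent.
        int offsetOf(final String name) {
            final int mask = names.length - 1;
            int i = name.hashCode() & mask;
            while (names[i] != null) {
                if (names[i].equals(name)) return offsets[i];
                i = (i + 1) & mask;   // linear probing; the table never changes
            }
            return -1;
        }
    }

Since the table is immutable after construction, nothing above needs
synchronization, and each entry costs one String plus two array slots
rather than a Map.Entry-style object.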
>>
>> If we try mapped buffers, we could share one buffer concurrently by
>> using only methods that accept an offset, and track offsets
>> independently.  This would let the OS page cache work for us, especially
>> for heavily used JARs.  We would be limited to 2GB JAR files, but I
>> don't think that's likely to be a practical problem for us; if it ever
>> is, we can create a specialized alternative implementation for huge JARs.
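
(Roughly what I have in mind, as a sketch only - the offsets below are
made up, and the calls are plain NIO, nothing WildFly-specific.)

    import java.io.IOException;
    import java.nio.ByteOrder;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // Illustrative only.
    final class MappedJarSketch {
        static MappedByteBuffer mapJar(final Path jarPath) throws IOException {
            try (FileChannel channel =
                     FileChannel.open(jarPath, StandardOpenOption.READ)) {
                final MappedByteBuffer map =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                map.order(ByteOrder.LITTLE_ENDIAN); // ZIP structures are little-endian
                return map; // the mapping stays valid after the channel is closed
            }
        }

        static int localHeaderSignature(final MappedByteBuffer map) {
            // Absolute gets never touch the buffer's position, so one buffer
            // can be shared by many threads, each tracking its own offsets.
            return map.getInt(0);   // 0x04034b50 for a local file header
        }
    }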
>>
> 
> I'm not so sure that the OS page cache will do anything here.  I actually
> think it would be better if we could open the JAR files using direct I/O,
> but of course Java doesn't support that, and that would require native
> code, so not the greatest option.

What the page cache would theoretically do for us is keep "hot" areas 
of commonly-used JAR files (e.g. the index) in RAM while letting "cold" 
JARs be paged out, all without consuming Java heap or committed memory 
(and thus without adding GC pressure), while still allowing fully random 
access with no special buffer management.  Because we are only reading 
and not writing, direct 
I/O won't likely help: either way you block to read from disk, but with 
memory mapping, you can reread an area many times and the OS will keep 
it handy for you.

On Linux, the page cache works very similarly whether you're mapping in 
a file or allocating memory from the OS: recently-used pages stay in 
physical RAM, and old pages get flushed to disk (BUT only if they're 
dirty) and dropped from physical RAM.  So it's effectively similar to 
allocating several hundred MB, copying all the JAR contents into that 
memory, and then referencing that, except that in this case you'd have 
to ensure that there is enough RAM+swap to accommodate it; behaviorally 
the primary difference is that the mmapped file is "paged out" by default 
and loaded on demand, whereas the eagerly allocated memory is "paged in" 
by default as you populate it and the pages have to age out.  Since we 
are generally not reading entire JAR files though, the lazy behavior 
should theoretically be a bit better for us.

On the other hand, this is a far worse option for 32-bit platforms for 
the same reason that it's useful on 64-bit: address space.  If we map in 
all the JARs that *we* ship, that could be as much as 25% or more of the 
available address space gone instantly.  So if we did explore this 
route, we'd need it to be switchable, with sensible defaults based on 
the available logical address size (and, as I said before, size of the 
target object).
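
(A sketch of what the "sensible default" could look like; the property
names below are HotSpot-specific or simply invented, so treat it as
illustration rather than a design.)

    // "example.jar.mmap" is an invented override switch; "sun.arch.data.model"
    // is a HotSpot-specific property that reports 32 or 64.
    static boolean mapJarsByDefault() {
        final String override = System.getProperty("example.jar.mmap");
        if (override != null) {
            return Boolean.parseBoolean(override);   // explicit user choice wins
        }
        final String model = System.getProperty("sun.arch.data.model");
        if (model != null) {
            return "64".equals(model);               // only map by default on 64-bit
        }
        return System.getProperty("os.arch", "").contains("64");  // crude fallback
    }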

The primary resource cost here (other than address space) is page table 
entries.  We'd be talking about probably hundreds of thousands, once 
every module has been referenced, on a CPU with 4k pages.  In terms of 
RAM, that's not too much; each one is only a few bytes plus (I believe) 
a few more bytes for bookkeeping in the kernel, and the kernel is pretty 
damned good at managing them at this point.  But it's not nothing.
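
(Back-of-the-envelope, with invented numbers: ~500 MB of mapped JARs at
4 KB per page is on the order of 128,000 pages; at 8 bytes per x86-64
page table entry that's roughly 1 MB of page tables, plus the kernel's
per-mapping bookkeeping.)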

Of course all this is just educated (?) speculation unless we test & 
measure it.  I suspect that in the end, it'll be subtle tradeoffs, just 
like everything else ends up being.

> 
> Andy
>> 
>>
>> In Java 9, jimages become an option by way of jlink, which will also be
>> worth experimenting with (as soon as we're booting on Java 9).
>>
>> Brainstorm other ideas here!
>> --
>> - DML
>> _______________________________________________
>> wildfly-dev mailing list
>> wildfly-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/wildfly-dev
>>
> 
> 
> 


-- 
- DML

