[infinispan-dev] Cachestores performance

Radim Vansa rvansa at redhat.com
Tue Jul 2 05:09:33 EDT 2013


Hi,

I've written down a proposal for the implementation of a new cache store.

https://community.jboss.org/wiki/BrnoCacheStoreDesignProposal

WDYT?

Radim

----- Original Message -----
| From: "Radim Vansa" <rvansa at redhat.com>
| To: "infinispan -Dev List" <infinispan-dev at lists.jboss.org>
| Sent: Thursday, June 27, 2013 2:37:43 PM
| Subject: Re: [infinispan-dev] Cachestores performance
| 
| 
| 
| ----- Original Message -----
| | From: "Galder Zamarreño" <galder at redhat.com>
| | To: "infinispan -Dev List" <infinispan-dev at lists.jboss.org>
| | Sent: Thursday, June 27, 2013 1:52:11 PM
| | Subject: Re: [infinispan-dev] Cachestores performance
| | 
| | > As for Karsten's FCS implementation, I too have issues with the key set
| | > and value offsets being solely in memory.  However I think that could be
| | > improved by storing only a certain number of keys/offsets in memory, and
| | > flushing the rest to disk again into an index file.
| | 
| | ^ Karsten's implementation makes this relatively easy to achieve because it
| | already keeps this mapping in a LinkedHashMap (with a given max-entries
| | limit [1]), assuming removeEldestEntry() is overridden to flush older
| | entries to disk. Some extra logic would be needed to bring data back from
| | disk too… but your suggestion below is also quite interesting...
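| | 
| | A minimal sketch of that spill-over; flushToDisk() is a hypothetical
| | helper that would append the evicted key/offset pair to an index file:
| | 
| |     import java.util.LinkedHashMap;
| |     import java.util.Map;
| | 
| |     // Sketch: bounded key -> file-offset map that spills the eldest
| |     // entries to an on-disk index instead of just dropping them.
| |     class BoundedOffsetIndex extends LinkedHashMap<Object, Long> {
| |         private final int maxEntries;
| | 
| |         BoundedOffsetIndex(int maxEntries) {
| |             super(16, 0.75f, true); // access order: eldest == coldest
| |             this.maxEntries = maxEntries;
| |         }
| | 
| |         @Override
| |         protected boolean removeEldestEntry(Map.Entry<Object, Long> eldest) {
| |             if (size() > maxEntries) {
| |                 flushToDisk(eldest.getKey(), eldest.getValue());
| |                 return true; // evicted from memory, lives on in the index file
| |             }
| |             return false;
| |         }
| | 
| |         // Hypothetical: append (key, offset) to the on-disk index file.
| |         private void flushToDisk(Object key, long offset) {
| |         }
| |     }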
| 
| I certainly wouldn't call this an easy task, because the most problematic
| part is what we do when the whole entry (both key and value) is gone from
| memory and we want to read it - that requires keeping some searchable
| structure on disk. And that's the hard part.
| 
| | 
| | > I believe LevelDB follows a similar design, but I think Karsten's FCS
| | > will perform better than LevelDB since it doesn't attempt to maintain a
| | > sorted structure on disk.
| | 
| | ^ In-memory, the structure can optionally be ordered if it's bounded [1],
| | otherwise it's just a normal map. How would we store it at the disk level?
| | A B+ tree with hashes of keys and then linked lists?
| 
| Before choosing "I love B#@& trees, let's use B#@& trees!", I'd find out
| what requirements we have for the structure. I believe that the index itself
| should not be considered persistent, as it can be rebuilt when preloading
| the data (sequentially reading the data is fast, therefore we can afford to
| do this indexing during preload); the reason for the index being on disk is
| that we don't have enough memory to store all keys, or even key hashes.
| Therefore it does not have to be updated synchronously with the writes. It
| should be mostly read-optimized, because reads are where we need synchronous
| access to this structure.
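| 
| To illustrate why the index can stay volatile, a rough sketch of rebuilding
| the in-memory offsets in one sequential pass over the data file. The record
| layout (key length, value length, key bytes, value bytes) is an assumption
| for illustration only, not the actual store format:
| 
|     import java.io.BufferedInputStream;
|     import java.io.DataInputStream;
|     import java.io.FileInputStream;
|     import java.io.IOException;
|     import java.util.HashMap;
|     import java.util.Map;
| 
|     class IndexRebuilder {
|         // Rebuild the key -> offset index by scanning the data file once.
|         static Map<String, Long> rebuild(String dataFile) throws IOException {
|             Map<String, Long> index = new HashMap<>();
|             try (DataInputStream in = new DataInputStream(
|                     new BufferedInputStream(new FileInputStream(dataFile)))) {
|                 long offset = 0;
|                 while (in.available() > 0) { // good enough for a local file
|                     int keyLen = in.readInt();
|                     int valueLen = in.readInt();
|                     byte[] key = new byte[keyLen];
|                     in.readFully(key);
|                     in.skipBytes(valueLen); // only offsets are needed here
|                     index.put(new String(key), offset);
|                     offset += 8 + keyLen + valueLen; // 2 ints + key + value
|                 }
|             }
|             return index;
|         }
|     }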
| 
| | 
| | > One approach to maintaining keys and offsets in memory could be a
| | > WeakReference that points to the key stored in the in-memory
| | > DataContainer.  Once evicted from the DC, the CacheStore impl would
| | > need to fetch the key again from the index file before looking up the
| | > value in the actual store.
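| | > 
| | > Roughly, a WeakHashMap keyed by the same key instances the DataContainer
| | > holds would give that behaviour - a sketch only, not necessarily the
| | > right final structure (and not thread-safe as written):
| | > 
| | >     import java.util.Map;
| | >     import java.util.WeakHashMap;
| | > 
| | >     class WeakOffsetIndex {
| | >         // Offsets keyed by the very key objects the DataContainer holds.
| | >         // When the DC evicts a key and nothing else references it, GC
| | >         // clears the mapping here too; the store then re-resolves the
| | >         // key and offset from the on-disk index file.
| | >         private final Map<Object, Long> offsets = new WeakHashMap<>();
| | > 
| | >         Long offsetOf(Object key) {
| | >             return offsets.get(key); // null => cold, go to the index file
| | >         }
| | > 
| | >         void record(Object key, long offset) {
| | >             offsets.put(key, offset);
| | >         }
| | >     }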
| | 
| | ^ Hmmm, interesting idea… it has the potential to save memory by not
| | having to keep that extra data structure in the cache store.
| 
| You mean to mix the DataContainer and xCacheEntry implementations with the
| cache store implementation? Is that possible from a design perspective?
| Speaking about different kinds of references, we might even soften the cost
| of not-well-tuned eviction with SoftReferences, so that even if the entry
| was evicted from the main DataContainer, we'd keep the value referenced from
| the cache store (and it would not have to be loaded from disk if referenced
| again before garbage collection). But such a thought may be premature
| optimization. For having eviction managed in relation to GC we should rather
| combine this with PhantomReferences, where entries would be written to the
| cache store upon finalization.
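| 
| A sketch of the SoftReference idea - the names are hypothetical and
| loadFromDisk() stands in for the real store read:
| 
|     import java.lang.ref.SoftReference;
|     import java.util.Map;
|     import java.util.concurrent.ConcurrentHashMap;
| 
|     class SoftValueCache {
|         // Values evicted from the DataContainer stay softly referenced
|         // here, so they can still be served from memory until the GC
|         // actually needs the space.
|         private final Map<Object, SoftReference<Object>> values =
|                 new ConcurrentHashMap<>();
| 
|         Object get(Object key) {
|             SoftReference<Object> ref = values.get(key);
|             Object value = (ref != null) ? ref.get() : null;
|             if (value == null) {            // GC claimed it, or never cached
|                 value = loadFromDisk(key);
|                 if (value != null) {
|                     values.put(key, new SoftReference<>(value));
|                 }
|             }
|             return value;
|         }
| 
|         // Hypothetical: read the value from the data file.
|         private Object loadFromDisk(Object key) {
|             return null;
|         }
|     }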
| 
| | 
| | > This way we have hot items always in memory, semi-hot items with offsets
| | > in memory and values on disk, and cold items needing to be read off disk
| | > entirely (both offset and value).  Also for write-through and
| | > write-behind, as long as the item is hot or warm (key and offset in
| | > memory), writing will be pretty fast.
| | 
| | My worry about Karsten's impl is actually writing. If you look at the last
| | performance numbers in [2], where we see the performance difference between
| | force=true and force=false in Karsten's cache store compared with LevelDB
| | JNI, you see that force=false is fastest, then JNI LevelDB, and then
| | force=true. Makes me wonder what kind of write guarantees LevelDB JNI
| | provides (and the Java version)...
| 
| Just for clarification: the fast implementation is without force at all,
| the slower one is with force(false). force(true) additionally syncs file
| metadata (such as access times?), which is not required for a cache store.
| But the numbers suggest that random access with syncing is really not a
| good option, and that we should rather use a temporary append-only log,
| which would be persisted into a structured DB by a different thread (as
| LevelDB does, I suppose).
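| 
| Something along these lines - a sketch only, with the background thread
| that compacts the log into the structured store elided:
| 
|     import java.io.IOException;
|     import java.nio.ByteBuffer;
|     import java.nio.channels.FileChannel;
|     import java.nio.file.Paths;
|     import java.nio.file.StandardOpenOption;
| 
|     class AppendLog {
|         // Synchronous append-only log; a background thread (not shown)
|         // later moves records into the read-optimized structure.
|         private final FileChannel log;
| 
|         AppendLog(String path) throws IOException {
|             log = FileChannel.open(Paths.get(path),
|                     StandardOpenOption.CREATE, StandardOpenOption.WRITE,
|                     StandardOpenOption.APPEND);
|         }
| 
|         synchronized void append(ByteBuffer record) throws IOException {
|             log.write(record);
|             // force(false) syncs only the data; force(true) would also
|             // sync file metadata, which a cache store doesn't need.
|             log.force(false);
|         }
|     }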
| 
| Thinking about all the levels and cache structures optimizing read access,
| I can see four levels of search structures: key + value (the usual
| DataContainer), key + offset, hash + offset, and everything on disk. The
| "hash + offset" level may seem superfluous, but for some use cases with big
| keys it may save a few disk look-ups.
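| 
| In pseudo-Java the read path would cascade like this - all member names and
| the OnDiskIndex type are hypothetical:
| 
|     import java.util.Map;
| 
|     class TieredReader {
|         Map<Object, Object> dataContainer;  // 1. key + value in memory
|         Map<Object, Long> keyOffsetIndex;   // 2. key + offset in memory
|         Map<Integer, Long> hashOffsetIndex; // 3. hash + offset in memory
|         OnDiskIndex onDiskIndex;            // 4. everything on disk
| 
|         interface OnDiskIndex { Long lookup(Object key); }
| 
|         Object read(Object key) {
|             Object value = dataContainer.get(key);
|             if (value != null) return value;
|             Long offset = keyOffsetIndex.get(key);
|             if (offset == null) offset = hashOffsetIndex.get(key.hashCode());
|             if (offset == null) offset = onDiskIndex.lookup(key);
|             return (offset == null) ? null : readValueAt(offset);
|         }
| 
|         // Hypothetical: read the value from the data file at the offset.
|         Object readValueAt(long offset) {
|             return null;
|         }
|     }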
| 
| Radim
| 
| | > 
| | > On 27 Jun 2013, at 10:33, Radim Vansa <rvansa at redhat.com> wrote:
| | > 
| | >> Oops, by the cache store I mean the previously-superfast
| | >> KarstenFileCacheStore implementation.
| | >> 
| | >> ----- Original Message -----
| | >> | From: "Radim Vansa" <rvansa at redhat.com>
| | >> | To: "infinispan -Dev List" <infinispan-dev at lists.jboss.org>
| | >> | Sent: Thursday, June 27, 2013 11:30:53 AM
| | >> | Subject: Re: [infinispan-dev] Cachestores performance
| | >> | 
| | >> | I have added FileChannel.force(false) flushes after all write
| | >> | operations in the cache store, and the comparison is now updated
| | >> | with these values as well.
| | >> | 
| | >> | Radim
| | >> | 
| | >> | ----- Original Message -----
| | >> | | From: "Radim Vansa" <rvansa at redhat.com>
| | >> | | To: "infinispan -Dev List" <infinispan-dev at lists.jboss.org>
| | >> | | Sent: Thursday, June 27, 2013 8:54:25 AM
| | >> | | Subject: Re: [infinispan-dev] Cachestores performance
| | >> | | 
| | >> | | Yep, write-through. LevelDB JAVA used the FileChannelTable
| | >> | | implementation (-Dleveldb.mmap), because mmapping is not implemented
| | >> | | very well and causes JVM crashes (I believe it's because of calling
| | >> | | non-public API via reflection - I've found a post from the Oracle
| | >> | | JVM guys discouraging the particular trick it uses). After writing
| | >> | | the record to the log, it calls FileChannel.force(true); therefore,
| | >> | | it should really be on the disk by that moment.
| | >> | | I have not looked into the JNI implementation, but I expect the same.
| | >> | | 
| | >> | | By the way, I have updated [1] with numbers from running on more
| | >> | | data (2 GB instead of 100 MB). I won't retype it here, so look
| | >> | | there. The performance is much lower.
| | >> | | I may also try increasing the JVM heap size and running with a bit
| | >> | | more data.
| | >> | | 
| | >> | | Radim
| | >> | | 
| | >> | | [1] https://community.jboss.org/wiki/FileCacheStoreRedesign
| | >> | | 
| | >> | | ----- Original Message -----
| | >> | | | From: "Erik Salter" <an1310 at hotmail.com>
| | >> | | | To: "infinispan -Dev List" <infinispan-dev at lists.jboss.org>
| | >> | | | Sent: Wednesday, June 26, 2013 7:40:19 PM
| | >> | | | Subject: Re: [infinispan-dev] Cachestores performance
| | >> | | | 
| | >> | | | These were write-through cache stores, right?  And with LevelDB,
| | >> | | | this was
| | >> | | | through to the database file itself?
| | >> | | | 
| | >> | | | Erik
| | >> | | | 
| | >> | | | -----Original Message-----
| | >> | | | From: infinispan-dev-bounces at lists.jboss.org
| | >> | | | [mailto:infinispan-dev-bounces at lists.jboss.org] On Behalf Of Radim
| | >> | | | Vansa
| | >> | | | Sent: Wednesday, June 26, 2013 11:24 AM
| | >> | | | To: infinispan -Dev List
| | >> | | | Subject: [infinispan-dev] Cachestores performance
| | >> | | | 
| | >> | | | Hi all,
| | >> | | | 
| | >> | | | according to [1], I've created a comparison of performance in
| | >> | | | stress tests.
| | >> | | | 
| | >> | | | All setups used a local cache; the benchmark was executed via
| | >> | | | Radargun (actually a version not merged into master yet [2]).
| | >> | | | I've used 4 nodes just to get more data - each slave was
| | >> | | | absolutely independent of the others.
| | >> | | | 
| | >> | | | The first test was preloading performance - the cache started
| | >> | | | and tried to load 1 GB of data from the hard drive. Without a
| | >> | | | cache store the startup takes about 2 - 4 seconds; average
| | >> | | | numbers for the cache stores are below:
| | >> | | | 
| | >> | | | FileCacheStore:        9.8 s
| | >> | | | KarstenFileCacheStore:  14 s
| | >> | | | LevelDB-JAVA impl.:   12.3 s
| | >> | | | LevelDB-JNI impl.:    12.9 s
| | >> | | | 
| | >> | | | IMO nothing special, all times seem affordable. We don't exactly
| | >> | | | benchmark storing the data into the cache store, but here
| | >> | | | FileCacheStore took about 44 minutes, while Karsten took about
| | >> | | | 38 seconds, LevelDB-JAVA 4 minutes and LevelDB-JNI 96 seconds.
| | >> | | | The units are right: it's minutes compared to seconds. But we
| | >> | | | all know that FileCacheStore is bloody slow.
| | >> | | | 
| | >> | | | The second test is a stress test (5 minutes, preceded by a
| | >> | | | 2-minute warmup) where each of 10 threads works on 10k entries
| | >> | | | with 1kB values (~100 MB in total). 20 % writes, 80 % reads, as
| | >> | | | usual. No eviction is configured, therefore the cache store
| | >> | | | works as persistent storage only for the case of a crash.
| | >> | | | 
| | >> | | | FileCacheStore:         3.1M reads/s   112 writes/s  // on one node the performance was only 2.96M reads/s, 75 writes/s
| | >> | | | KarstenFileCacheStore:  9.2M reads/s  226k writes/s  // yikes!
| | >> | | | LevelDB-JAVA impl.:     3.9M reads/s  5100 writes/s
| | >> | | | LevelDB-JNI impl.:      6.6M reads/s   14k writes/s  // on one node the performance was 3.9M/8.3k - about half of the others
| | >> | | | Without cache store:   15.5M reads/s  4.4M writes/s
| | >> | | | 
| | >> | | | Karsten's implementation pretty much rules here, for two reasons.
| | >> | | | First of all, it does not flush the data (it calls only
| | >> | | | RandomAccessFile.write()). The other cheat is that it stores the
| | >> | | | keys and the offsets of the data values in the database file in
| | >> | | | memory. Therefore, it's definitely the best choice for this
| | >> | | | scenario, but it does not allow the cache store to scale,
| | >> | | | especially in cases where the keys are big and the values small.
| | >> | | | However, this performance boost is definitely worth checking - I
| | >> | | | could imagine caching the disk offsets in memory and querying the
| | >> | | | persistent index only in case of a missing record, with part of
| | >> | | | the persistent index flushed asynchronously (the index can always
| | >> | | | be rebuilt during preloading in case of a crash).
| | >> | | | 
| | >> | | | The third test should have tested the scenario with more data to
| | >> | | | be stored than fits in memory - therefore, the stressors operated
| | >> | | | on 100k entries (~100 MB of data) but eviction was set to 10k
| | >> | | | entries (9216 entries ended up in memory after the test ended).
| | >> | | | 
| | >> | | | FileCacheStore:            750 reads/s         285 writes/s  // one node had only 524 reads and 213 writes per second
| | >> | | | KarstenFileCacheStore:    458k reads/s        137k writes/s
| | >> | | | LevelDB-JAVA impl.:        21k reads/s          9k writes/s  // a bit varying performance
| | >> | | | LevelDB-JNI impl.:     13k-46k reads/s  6.6k-15.2k writes/s  // the performance varied a lot!
| | >> | | | 
| | >> | | | 100 MB of data is not much, but it takes so long to push it into
| | >> | | | FileCacheStore that I won't use more unless we exclude this loser
| | >> | | | from the comparison :)
| | >> | | | 
| | >> | | | Radim
| | >> | | | 
| | >> | | | [1] https://community.jboss.org/wiki/FileCacheStoreRedesign
| | >> | | | [2] https://github.com/rvansa/radargun/tree/t_keygen
| | >> | | | 
| | >> | | | -----------------------------------------------------------
| | >> | | | Radim Vansa
| | >> | | | Quality Assurance Engineer
| | >> | | | JBoss Datagrid
| | >> | | | tel. +420532294559 ext. 62559
| | >> | | | 
| | >> | | | Red Hat Czech, s.r.o.
| | >> | | | Brno, Purkyňova 99/71, PSČ 612 45
| | >> | | | Czech Republic
| | >> | | | 
| | >> | | | 
| | >> | | 
| | >> | 
| | >> 
| | > 
| | > --
| | > Manik Surtani
| | > manik at jboss.org
| | > twitter.com/maniksurtani
| | > 
| | > Platform Architect, JBoss Data Grid
| | > http://red.ht/data-grid
| | > 
| | > 
| | 
| | 
| | --
| | Galder Zamarreño
| | galder at redhat.com
| | twitter.com/galderz
| | 
| | Project Lead, Escalante
| | http://escalante.io
| | 
| | Engineer, Infinispan
| | http://infinispan.org
| | 
| | 
| 


