Hello JBossCache gurus
I've been toying with the idea of using in-memory replication across a cluster to
preserve transaction log information for JBossTS, as an alternative to writing it to disk.
What follows is my current thinking on the matter, which you can poke holes in at your
leisure.
'Memory is the new disk', so let's use it as such...
Transaction log entries are typically short-lived (just the interval between the prepare
and commit phases if all goes well) but must survive a node failure. Or, depending on the
degree of user paranoia, perhaps multiple node failures. Each entry is small - a few
hundred bytes per transaction. Writing the tx log to RAID is a major bottleneck for
transaction systems and hence for app servers.
JBossTS already has a pluggable interface for transaction log ('ObjectStore')
implementations, so writing one based on JBossCache is not too difficult. The relative
performance of this approach compared to the existing file system or database stores
remains to be seen. Of course it largely depends on the disk and network hardware and
utilization. I should be able to get some preliminary numbers without too much work, but
first I need to decide what configurations to test...
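To make that concrete, here's roughly the shape of store I have in mind. This is a minimal
sketch only, assuming the 2.x Cache API; the real ObjectStore interface has many more
methods, and the Fqn layout and method names here are just my guesses:

    import org.jboss.cache.Cache;
    import org.jboss.cache.Fqn;

    // Sketch of a cache-backed store: each log record lives under a
    // per-transaction Fqn so that commit can remove the whole node.
    public class CacheObjectStore {
        private final Cache<String, byte[]> cache; // replicated, REPL_SYNC

        public CacheObjectStore(Cache<String, byte[]> cache) {
            this.cache = cache;
        }

        public void write(String txId, byte[] logRecord) {
            // synchronous replicated put: returns once the cluster has the entry
            cache.put(Fqn.fromString("/txlog/" + txId), "record", logRecord);
        }

        public byte[] read(String txId) {
            return cache.get(Fqn.fromString("/txlog/" + txId), "record");
        }

        public void remove(String txId) {
            // a candidate for an asynchronous override, as discussed below
            cache.removeNode(Fqn.fromString("/txlog/" + txId));
        }
    }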
Clearly the number of replicas is critical - it must be high enough to ensure at least one
node will survive any outage, but low enough to perform well.
Writes must be synchronous for obvious reasons, but ideally a node that is up should not
halt just because another member of the cluster is down. Halting in that case would
preserve the information but reduce availability, which is undesirable.
So my first question is: does the cache support a mode somewhere between async and sync,
say 'return when at least M of N nodes have acked'? I can get something similar
with buddy replication, but it's not quite the model I want - if more than M nodes are
available they should all be used. Similarly, the crash of one buddy should not halt the
system if there is an additional node available such that the total number of live nodes
remains above M. Perhaps I can do this only with the raw JGroups API, not the cache?
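At the raw JGroups level I imagine it would look something like a response filter. A
sketch, assuming the RspFilter hook is available in the JGroups release in use
(QuorumFilter is my own name):

    import org.jgroups.Address;
    import org.jgroups.blocks.RspFilter;

    // Counts acks and tells the dispatcher to stop blocking after M of them.
    public class QuorumFilter implements RspFilter {
        private final int quorum;
        private int acks = 0;

        public QuorumFilter(int quorum) { this.quorum = quorum; }

        public synchronized boolean isAcceptable(Object response, Address sender) {
            acks++;
            return true; // every ack counts toward the quorum
        }

        public synchronized boolean needMoreResponses() {
            return acks < quorum; // unblock the caller once M acks are in
        }
    }

The open question is whether the cache's replication layer can be persuaded to pass such
a filter down, or whether I'd have to drive a MessageDispatcher myself.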
Also, are there any numbers on performance as a function of group size, particularly when
mixing nodes on the same and different network segments? I'm thinking that getting
independent failure characteristics for the nodes will probably require a distributed
cluster, such that the nodes are on different power supplies etc. Having all the nodes in
the same rack probably provides a false sense of security...
On a similar note, whilst cache puts must be synchronous, my design can tolerate
asynchronous removes. Is such a hybrid configuration possible?
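If not globally, perhaps per invocation; something like this sketch, assuming the Option
API really does allow forcing a single call asynchronous:

    import org.jboss.cache.Cache;
    import org.jboss.cache.Fqn;
    import org.jboss.cache.config.Option;

    public class AsyncRemove {
        // Cache runs REPL_SYNC, so puts block; this remove is fired asynchronously.
        static void removeLog(Cache<String, byte[]> cache, String txId) {
            Option opt = new Option();
            opt.setForceAsynchronous(true); // assumption: per-call override exists
            cache.getInvocationContext().setOptionOverrides(opt);
            cache.removeNode(Fqn.fromString("/txlog/" + txId));
        }
    }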
Transaction log entries fall into two groups: the ones for transactions that complete
cleanly and the ones for transactions that go wrong. The former set is much larger, and
its members have a lifetime of at most a few seconds. The failure set is much smaller
(hopefully empty!) but its entries may persist indefinitely.
I'm thinking of configuring eviction and the cache loader such that the eviction timeout
is longer than the expected lifetime of entries in the first group. What I want to achieve
is this:
- A synchronous write of each entry to at least N in-memory replicas.
- If the transaction completes cleanly, removal (possibly asynchronous) of that entry from
the cluster.
- If the transaction fails, a write of the entry to disk for longer-term storage.
Critically, this is not the same as having all writes go through to disk. Is it possible
to configure the cache loaders to write only on eviction?
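If the loader itself can't be configured that way, maybe a listener could approximate it.
A sketch, assuming the 2.x eviction notifications fire before the data is discarded
(persistToDisk is just a placeholder):

    import org.jboss.cache.Fqn;
    import org.jboss.cache.notifications.annotation.CacheListener;
    import org.jboss.cache.notifications.annotation.NodeEvicted;
    import org.jboss.cache.notifications.event.NodeEvictedEvent;

    // Push an entry to disk only when eviction fires, rather than on every put.
    @CacheListener
    public class EvictionWriter {
        @NodeEvicted
        public void onEvict(NodeEvictedEvent event) {
            if (event.isPre()) {
                // the node should still hold its data at this point; persist it
                Fqn fqn = event.getFqn();
                persistToDisk(fqn); // placeholder for the real disk write
            }
        }

        private void persistToDisk(Fqn fqn) {
            // ... file system or database write goes here ...
        }
    }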
Or I guess there is another possibility: since the loader's writes are asynchronous
with respect to cache puts, is it possible to have it try to write everything, but
intelligently remove queued writes from its work list if the corresponding node is removed
before the write for its 'put' has been made? That would effectively let the disk
operate at its maximum throughput without throttling the in-memory replication (subject to
the size limit of the work queue). It thus provides an extra layer of assurance compared
to in-memory-only copies, but without the performance hit of synchronous disk writes.
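In other words, something like this standalone sketch (all names are mine; a real version
would live inside the cache loader):

    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.LinkedBlockingQueue;

    // Write-behind queue: puts are queued for disk, but a remove arriving
    // before the disk write happens simply cancels the queued work item.
    public class WriteBehindQueue {
        private final BlockingQueue<String> pending = new LinkedBlockingQueue<String>(10000);
        private final Map<String, byte[]> records = new ConcurrentHashMap<String, byte[]>();

        public void put(String txId, byte[] record) throws InterruptedException {
            records.put(txId, record);
            pending.put(txId); // blocks only if the queue is full
        }

        public void remove(String txId) {
            records.remove(txId); // the drainer will skip it: no disk write needed
        }

        // run on a background thread; drives the disk at its own pace
        public void drain() throws InterruptedException {
            while (true) {
                String txId = pending.take();
                byte[] record = records.remove(txId);
                if (record != null) {
                    writeToDisk(txId, record);
                }
            }
        }

        private void writeToDisk(String txId, byte[] record) {
            // ... synchronous file write, off the critical path ...
        }
    }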
Also, it is vital to ensure there is no circular dependency between the cache and the
transaction manager. I'm assuming this can be achieved simply by ensuring there is no
transaction context on the thread at the time the cache API is called. Or does it use
JTA transactions anywhere internally?
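By 'no transaction context' I mean plain JTA suspend/resume around the cache call, along
these lines:

    import javax.transaction.Transaction;
    import javax.transaction.TransactionManager;
    import org.jboss.cache.Cache;
    import org.jboss.cache.Fqn;

    public class SuspendedPut {
        // Detach the caller's transaction so the cache call cannot enlist in it.
        static void put(TransactionManager tm, Cache<String, byte[]> cache,
                        Fqn fqn, byte[] record) throws Exception {
            Transaction tx = tm.suspend();
            try {
                cache.put(fqn, "record", record);
            } finally {
                if (tx != null) tm.resume(tx);
            }
        }
    }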
One final question: Am I totally mad, or only mildly demented?
Thanks
Jonathan.