[jboss-user] [JBoss Cache: Core Edition] - advice on cache (mis)use for transaction log store

Thu Oct 30 10:37:11 EDT 2008

Hello JBossCache gurus

I've been toying with the idea of using in-memory replication across a cluster to preserve transaction log information for JBossTS, as an alternative to writing it to disk. What follows is my current thinking on the matter, which you can poke holes in at your leisure.

'Memory is the new disk', so let's use it as such...

Transaction log entries are typically short lived (just the interval between the prepare and commit phases if all goes well) but must survive a node failure or, depending on the degree of user paranoia, multiple node failures. Each entry is small: a few hundred bytes per transaction. Writing the tx log to RAID is a major bottleneck for transaction systems and hence app servers.

JBossTS already has a pluggable interface for transaction log ('ObjectStore') implementations, so writing one based on JBossCache is not too difficult. The relative performance of this approach compared to the existing file system or database stores remains to be seen. Of course it largely depends on the disk and network hardware and utilization. I should be able to get some preliminary numbers without too much work, but first I need to decide what configurations to test...

Clearly the number of replicas is critical - it must be high enough to ensure at least one node will survive any outage, but low enough to perform well.

Writes must be synchronous for obvious reasons, but ideally a node that is up should not halt just because another member of the cluster is down. That model would preserve information but reduce availability, which is undesirable.

So my first question is: does the cache support a mode somewhere between async and sync, say 'return when at least M of N nodes have acked'? I can get something similar with buddy replication, but it's not quite the model I want: if more than M nodes are available, they should all be used. Similarly, the crash of one buddy should not halt the system if an additional node is available, such that the total number of live nodes remains at least M. Perhaps I can do this only with the raw JGroups API, not the cache?
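To make the 'M of N' semantics concrete, here is a minimal Java sketch of what I mean. It is not a JBossCache or JGroups API: `QuorumWrite` and `awaitQuorum` are hypothetical names, and each replica's ack is modelled as a plain `CompletableFuture` that completes normally on an ack and exceptionally if that replica is down.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper, for illustration only: wait for M acks out of N replica writes. */
final class QuorumWrite {

    /**
     * @param acks one future per replica the write was sent to
     * @param m    minimum number of acks required before the caller may proceed
     * @return a future that completes with true once m acks have arrived, or with
     *         false once so many replicas have failed that m acks are unreachable
     */
    static CompletableFuture<Boolean> awaitQuorum(List<CompletableFuture<Void>> acks, int m) {
        CompletableFuture<Boolean> result = new CompletableFuture<>();
        if (m <= 0) {
            result.complete(true);
            return result;
        }
        if (acks.size() < m) {
            result.complete(false);
            return result;
        }
        AtomicInteger succeeded = new AtomicInteger();
        AtomicInteger failed = new AtomicInteger();
        int tolerableFailures = acks.size() - m;
        for (CompletableFuture<Void> ack : acks) {
            ack.whenComplete((ignored, error) -> {
                if (error == null) {
                    // complete() is a no-op if the result is already set
                    if (succeeded.incrementAndGet() >= m) {
                        result.complete(true);
                    }
                } else if (failed.incrementAndGet() > tolerableFailures) {
                    result.complete(false);
                }
            });
        }
        return result;
    }
}
```

The point is that the write goes to every available node, but the caller only blocks until M of them have answered, and a single dead node does not stall the writer as long as M acks are still reachable.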

Also, are there any numbers on performance as a function of group size, particularly when mixing nodes on the same or different network segments? I'm thinking that getting independent failure characteristics for the nodes will probably require a distributed cluster, such that the nodes are on different power supplies etc. Having all the nodes in the same rack probably provides a false sense of security...

On a similar note, whilst cache puts must be synchronous, my design can tolerate asynchronous removes. Is such a hybrid configuration possible?

Transaction log entries fall into two groups: the ones for transactions that complete cleanly and the ones for transactions that go wrong. The former set is much larger and its members have a lifetime of at most a few seconds. The failure set is much smaller (hopefully empty!) but entries may persist indefinitely.

I'm thinking of setting up the cache loaders such that the eviction time is longer than the expected lifetime of members of the first group. What I want to achieve is this:

Synchronous write of an entry to at least N in-memory replicas.

If the transaction works, removal, possibly asynchronous, of that information from the cluster.

If the transaction fails, writing of the entry to disk for longer term storage.

Critically this is not the same as having all writes go through to disk. Is it possible to configure the cache loaders to write only on eviction?
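In case the 'write only on eviction' idea is unclear, here is a rough, single-JVM Java sketch of the behaviour I'm after. It does not use any JBossCache classes: `SpillOnEviction` and its methods are hypothetical names, the 'disk' is just a map standing in for a real persistent store, and timestamps are passed in explicitly to keep the example deterministic.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration: entries reach the disk store only if they outlive maxAgeMillis. */
final class SpillOnEviction {

    private static final class LogEntry {
        final byte[] data;
        final long createdMillis;

        LogEntry(byte[] data, long createdMillis) {
            this.data = data;
            this.createdMillis = createdMillis;
        }
    }

    private final Map<String, LogEntry> memory = new LinkedHashMap<>();
    private final Map<String, byte[]> disk = new LinkedHashMap<>(); // stand-in for a real store
    private final long maxAgeMillis;

    SpillOnEviction(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    /** A put touches memory only; nothing is written through to disk. */
    synchronized void put(String txId, byte[] data, long nowMillis) {
        memory.put(txId, new LogEntry(data, nowMillis));
    }

    /** Clean completion: the entry normally disappears before it ever reaches disk. */
    synchronized void remove(String txId) {
        memory.remove(txId);
        disk.remove(txId);
    }

    /** Moves entries older than maxAgeMillis from memory to disk; returns how many were moved. */
    synchronized int evict(long nowMillis) {
        int spilled = 0;
        Iterator<Map.Entry<String, LogEntry>> it = memory.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, LogEntry> e = it.next();
            if (nowMillis - e.getValue().createdMillis > maxAgeMillis) {
                disk.put(e.getKey(), e.getValue().data);
                it.remove();
                spilled++;
            }
        }
        return spilled;
    }

    synchronized boolean isOnDisk(String txId) {
        return disk.containsKey(txId);
    }

    synchronized boolean isInMemory(String txId) {
        return memory.containsKey(txId);
    }
}
```

With the eviction age set above the normal prepare-to-commit interval, only the entries for stuck transactions should ever cost a disk write.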

Or I guess there is another possibility: since the loader's writes are asynchronous with respect to cache puts, is it possible to have it try to write everything, but intelligently remove queued writes from its work list if the corresponding node is removed before the write for its 'put' is made? That would effectively let the disk operate at its maximum throughput without throttling the in-memory replication (subject to the size limit of the work queue). It thus provides an extra layer of assurance compared to in-memory-only copies, but without the performance hit of synchronous disk writes.
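Roughly, the queue behaviour I have in mind looks like the Java sketch below. Again this is not how any existing JBossCache loader works as far as I know: `CancellableWriteBehind` is a hypothetical name, the 'disk' is a map standing in for the real store, and the coarse `synchronized` locking is only there to keep the example short and correct.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical write-behind buffer: a remove cancels a disk write that has not happened yet. */
final class CancellableWriteBehind {

    private final Map<String, byte[]> pending = new LinkedHashMap<>(); // queued, not yet written
    private final Map<String, byte[]> disk = new LinkedHashMap<>();    // stand-in for a real store

    /** Called once the in-memory replication of a put has succeeded. */
    synchronized void put(String txId, byte[] logEntry) {
        pending.put(txId, logEntry);
    }

    /** Called when the transaction completes cleanly. */
    synchronized void remove(String txId) {
        pending.remove(txId); // cancels the queued write, if it has not been flushed yet
        disk.remove(txId);    // deletes the entry if it had already been flushed
    }

    /** Called by a background writer; persists whatever is still queued and returns the count. */
    synchronized int flush() {
        int written = pending.size();
        disk.putAll(pending);
        pending.clear();
        return written;
    }

    synchronized boolean isOnDisk(String txId) {
        return disk.containsKey(txId);
    }
}
```

If the background writer keeps up, everything gets a disk copy; if it falls behind, the short-lived entries simply drop out of the queue and only the survivors are written.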

Also, it is vital to ensure there is no circular dependency between the cache and the transaction manager. I'm assuming this can be achieved simply by ensuring there is no transaction context on the thread at the time the cache API is called. Or does it use JTA transactions anywhere internally?

One final question: Am I totally mad, or only mildly demented?

Thanks

Jonathan.

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4185748#4185748
