Currently, JBM 2.0 uses server replication between a live and a backup server to keep
the backup server in a (quasi-)identical state, so that on failover of a client from the
live to the backup server, the client's session(s) can be found in exactly the state
they were in on the live server and can be reattached, letting the client continue
its operation 100% transparently.
The current implementation in TRUNK uses a single thread for replication, which makes it
easier to guarantee that any state changes on the backup are applied in the same order as
on live, but has the downside that forcing everything to be single-threaded destroys
concurrency on the server, effectively pushing everything onto a single core.
Recently I have been working on multi-threaded replication. This allows state changes to
be applied on backup by many different threads, which solves the concurrency problem.
However, we still have to ensure that state changes are applied globally in the same order
on backup as on live. This is tricky with multiple threads. The technique used is to record
the order in which mutexes around shared data are acquired on the live node, and to
replicate this list of acquires along with the state changes. On the backup node we create
a special mutex which forces locks to be obtained in the same order as the list.
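
To make the idea concrete, here is a rough sketch of what such an order-enforcing mutex could look like on the backup. This is not the code in TRUNK or in my branch; the names OrderedLock, OrderedLockDemo and opId are made up for illustration, and it assumes each recorded acquire is identified by a simple operation id replicated from live.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical backup-side mutex: grants the lock only in the order recorded on live. */
final class OrderedLock {
    private final ReentrantLock guard = new ReentrantLock();
    private final Condition turnChanged = guard.newCondition();
    private final long[] recordedOrder; // operation ids, in live acquisition order
    private int next = 0;               // index of the acquire allowed to proceed
    private boolean held = false;

    OrderedLock(long[] recordedOrder) {
        this.recordedOrder = recordedOrder.clone();
    }

    /** Blocks until the lock is free and it is opId's turn in the replicated list. */
    void lock(long opId) throws InterruptedException {
        guard.lock();
        try {
            while (held || next >= recordedOrder.length || recordedOrder[next] != opId) {
                if (!held && next >= recordedOrder.length) {
                    throw new IllegalStateException("no recorded acquire left for " + opId);
                }
                turnChanged.await();
            }
            held = true;
        } finally {
            guard.unlock();
        }
    }

    /** Releases the lock and lets the next recorded acquire proceed. */
    void unlock() {
        guard.lock();
        try {
            held = false;
            next++;
            turnChanged.signalAll();
        } finally {
            guard.unlock();
        }
    }
}

/** Replays a recorded order with one thread per acquire, started in reverse order. */
final class OrderedLockDemo {
    static List<Long> replay(long[] recordedOrder) {
        OrderedLock lock = new OrderedLock(recordedOrder);
        List<Long> applied = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        // Reverse start order shows that the lock, not thread scheduling, decides the order.
        for (int i = recordedOrder.length - 1; i >= 0; i--) {
            final long opId = recordedOrder[i];
            Thread t = new Thread(() -> {
                try {
                    lock.lock(opId);
                    try {
                        applied.add(opId); // stands in for applying a replicated state change
                    } finally {
                        lock.unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return applied;
    }
}
```

Even this toy version hints at the difficulty: a thread asking for an id that is not next in the list blocks until its turn comes, so a missing or misordered entry in the replicated list stalls the backup.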
This is a complex problem to solve and implement. The current status is that it's "more or
less working" but not ready yet, and it would probably take a significant amount of time
to complete and fully debug.
The replication code significantly complicates the server code, and all replication comes
at a cost of latency - since you need to make sure each packet is replicated and received
on the backup before returning results to the user.
Let's take a look at what other messaging systems do:
1) Weblogic JMS - they don't use server replication
2) Websphere MQ - they don't use server replication
3) Tibco EMS - they don't use server replication
4) ActiveMQ - has slow synchronous single threaded replication
5) SonicMQ - *does* have full server replication.
Really, only one of our significant competitors (SonicMQ) actually does server
replication.
Most of them do failover via a shared store on a shared filesystem, and any session state is
lost.
Since most users use one of the above systems, which typically don't have server
replication, it seems to me it can't be a critically important feature that's
worth the cost (latency, performance). A non-replicating server is likely to be faster
than a replicating server.
Therefore, what I am proposing is that we remove full server replication from the JBM 2.0
server, since the small benefit of having 100% transparent failover is not worth the cost
in terms of:
a) Performance overhead
b) Maintainability difficulties
c) Hard work in implementing and debugging it.
If we remove full server replication, then when a client detects server failure it can still
automatically fail over to the backup server and reconnect. The only
difference will be that the session state won't be there, so in a non-transacted session
any messages or acks sent might not have actually reached the server, which could result in
sent messages being lost or duplicates being delivered.
For a transacted session that has already sent messages or acks before failover occurs,
we just need to flag the session as rollback only, and on commit, the commit will fail
with a TransactionRolledBackException. The user would need to catch this and restart the
transaction. In this way we can maintain the once-and-only-once delivery guarantee and
never lose messages or get duplicates with transacted sessions.
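
From the user's side, the catch-and-restart pattern might look roughly like the sketch below. To keep it compilable without a JMS jar, TransactionRolledBackException here is a stand-in for javax.jms.TransactionRolledBackException, and TxSession, FakeSession and FailoverRetry are hypothetical names invented for illustration; FakeSession just simulates a session that gets flagged rollback only by a failover.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for javax.jms.TransactionRolledBackException (assumption, for self-containment). */
final class TransactionRolledBackException extends Exception {
    TransactionRolledBackException(String message) {
        super(message);
    }
}

/** Hypothetical, minimal view of a transacted session. */
interface TxSession {
    void send(String message);
    void commit() throws TransactionRolledBackException;
}

/** Fake session that simulates a given number of failovers happening mid-transaction. */
final class FakeSession implements TxSession {
    private final List<String> pending = new ArrayList<>();
    final List<String> committed = new ArrayList<>();
    private int failoversLeft;
    private boolean rollbackOnly = false;

    FakeSession(int failoversToSimulate) {
        this.failoversLeft = failoversToSimulate;
    }

    @Override
    public void send(String message) {
        pending.add(message);
        if (failoversLeft > 0) {
            // Failover after work was sent in this transaction: flag rollback only.
            failoversLeft--;
            rollbackOnly = true;
        }
    }

    @Override
    public void commit() throws TransactionRolledBackException {
        if (rollbackOnly) {
            pending.clear();
            rollbackOnly = false;
            throw new TransactionRolledBackException("session failed over, transaction rolled back");
        }
        committed.addAll(pending);
        pending.clear();
    }
}

final class FailoverRetry {
    /** Sends the whole batch in one transaction, redoing it if the commit is rolled back. */
    static int sendInTransaction(TxSession session, List<String> batch, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            for (String message : batch) {
                session.send(message);
            }
            try {
                session.commit();
                return attempt; // number of attempts it took
            } catch (TransactionRolledBackException e) {
                // Nothing from this attempt was committed, so it is safe to redo all of it.
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }
}
```

The point of the sketch is that the retry redoes the whole transaction, not just the commit: because nothing from the rolled-back attempt was applied, resending the batch cannot create duplicates.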
AIUI this is pretty much how the majority of the other messaging systems handle failover.
With no full server replication the user can choose two modes of HA:
1) Failover via a shared store residing on a shared file system. When the live fails the
backup loads the journal, and clients can connect to it.
2) Replicated data store. We can replicate the data store from the live to the backup node
so there's no need for a shared file system. Replicating the data store is a lot
easier than replicating the entire server.
I'll park the MT replication code in a branch in case we want to revisit it in the
future.
Thoughts?