[Design of Messaging on JBoss (Messaging/JBoss)] - Re: Server side HA and failover
by timfox
Off the top of my head, this is more or less what you want to do:
Let's say the failed node is node 1, and the failover node is node 2.
The post offices at node 2 will detect that a server has failed (via JGroups).
They then look in their node->failover node mapping to determine whether they must take over responsibility for the failed node.
Node 2 is marked as a failover node for node 1 in the mapping, so the post offices at node 2 now realise that they must take over responsibility for the failed node's queues.
They then do the following:
Take the name map for node 1 and move all the entries to the name map for node 2. Change the names of the queues by suffixing them with the node id to ensure they are unique.
Replace the remote queue stubs with local clustered queues and load them. Keep note of the new local clustered queues for the next step.
Now iterate through the condition map.
For each cluster router on each condition map entry, iterate through the entries in the router. Replace each one from node 1 with the relevant local clustered queue from the previous step.
Server side fail over should now be complete.
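The steps above can be sketched roughly as follows. This is only a toy model, not the real post office code: the `Binding` class, the `nameMaps` and `conditionMap` fields, the `failOver` method and the "." separator in the new queue name are all made up for illustration, and loading the queue from storage is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a post office on the failover node.
final class FailoverSketch {

    /** A queue binding: a stub for a queue on another node, or a local queue. */
    static final class Binding {
        final String queueName;
        final int nodeId;
        final boolean local;

        Binding(String queueName, int nodeId, boolean local) {
            this.queueName = queueName;
            this.nodeId = nodeId;
            this.local = local;
        }
    }

    final int localNodeId;
    // node id -> (queue name -> binding)
    final Map<Integer, Map<String, Binding>> nameMaps = new HashMap<>();
    // condition -> entries of the cluster router for that condition
    final Map<String, List<Binding>> conditionMap = new HashMap<>();

    FailoverSketch(int localNodeId) {
        this.localNodeId = localNodeId;
    }

    void bind(String condition, Binding b) {
        nameMaps.computeIfAbsent(b.nodeId, k -> new HashMap<>()).put(b.queueName, b);
        conditionMap.computeIfAbsent(condition, k -> new ArrayList<>()).add(b);
    }

    /** Old name suffixed with the failed node id; the "." separator is an assumption. */
    static String failedOverName(String oldName, int failedNodeId) {
        return oldName + "." + failedNodeId;
    }

    void failOver(int failedNodeId) {
        // Move the failed node's name map entries into the local name map.
        Map<String, Binding> failed = nameMaps.remove(failedNodeId);
        if (failed == null) {
            return; // nothing was bound on the failed node
        }
        Map<String, Binding> local = nameMaps.computeIfAbsent(localNodeId, k -> new HashMap<>());

        // Replace each remote stub with a local queue and remember the pairing.
        Map<Binding, Binding> replacements = new IdentityHashMap<>();
        for (Binding stub : failed.values()) {
            Binding queue = new Binding(failedOverName(stub.queueName, failedNodeId), localNodeId, true);
            local.put(queue.queueName, queue);
            replacements.put(stub, queue);
        }

        // Walk the condition map and swap the stubs inside every router.
        for (List<Binding> router : conditionMap.values()) {
            router.replaceAll(b -> replacements.getOrDefault(b, b));
        }
    }

    public static void main(String[] args) {
        FailoverSketch postOffice = new FailoverSketch(2);
        postOffice.bind("orders", new Binding("queueA", 1, false));
        postOffice.bind("orders", new Binding("queueA", 2, true));
        postOffice.failOver(1);
        System.out.println(postOffice.nameMaps.get(2).keySet());
    }
}
```

In this model a reconnecting consumer would find its old queue under `failedOverName("queueA", 1)` in the name map of node 2.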
The next step will be client reconnection from the client side.
When a consumer is reconnected to the queue, it just needs to look the queue up with the new name (the old name suffixed with the node id) and hey presto.
There should be no need for any post office signature changes AFAIK.
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=3981039#3981039
19 years, 5 months
[Design of Messaging on JBoss (Messaging/JBoss)] - Re: Client failover redeliveries discussion
by clebert.suconic@jboss.com
Ovidiu wrote : To spell this out for you, so you won't go over it again: The current code, that is in the SVN, does not behave this way. This is understandable, it's just a prototype. I was going over the use cases for which we need test cases, so we can make sure the correct behavior is preserved in the future. The correct behavior is "transparently copying the transactional state (the corresponding TxState instance) into the new ResourceManager and send the messages over the new connection when transaction commits." Please note the use of the word "transparent".
|
The Transaction still has any pending transaction in its buffer when a connection is failed over.
The code is not a prototype IMO; it's one of the layers of the failover, and currently I'm not considering the first layer yet. I started from the second layer because, when we started, we were not sure what was going to happen with Remoting. I want to finish this layer now before starting on the first layer.
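To make the behaviour under discussion concrete, here is a minimal sketch of copying buffered transactional state into a new ResourceManager on failover. The names TxState and ResourceManager come from the quote above, but their fields and methods here, plus the `Connection` class and the `failOver` helper, are hypothetical and not taken from the code in SVN.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: messages sent inside a transaction are buffered on the
// client and only go over the wire when the transaction commits.
final class TxFailoverSketch {

    /** Buffered work of one transaction (fields are assumptions). */
    static final class TxState {
        final List<String> messages = new ArrayList<>();
    }

    /** Stand-in for a connection to one server node. */
    static final class Connection {
        final String node;
        final List<String> sent = new ArrayList<>();

        Connection(String node) {
            this.node = node;
        }
    }

    static final class ResourceManager {
        final Connection connection;
        final Map<String, TxState> transactions = new HashMap<>();

        ResourceManager(Connection connection) {
            this.connection = connection;
        }

        void send(String txId, String message) {
            transactions.computeIfAbsent(txId, k -> new TxState()).messages.add(message);
        }

        void commit(String txId) {
            TxState state = transactions.remove(txId);
            if (state != null) {
                connection.sent.addAll(state.messages); // flush the buffer on commit
            }
        }
    }

    /** Copies every pending TxState into a ResourceManager bound to the new connection. */
    static ResourceManager failOver(ResourceManager old, Connection newConnection) {
        ResourceManager replacement = new ResourceManager(newConnection);
        replacement.transactions.putAll(old.transactions);
        old.transactions.clear();
        return replacement;
    }

    public static void main(String[] args) {
        Connection node1 = new Connection("node1");
        Connection node2 = new Connection("node2");
        ResourceManager rm = new ResourceManager(node1);
        rm.send("tx1", "m1");
        rm.send("tx1", "m2");
        rm = failOver(rm, node2); // connection to node1 failed mid-transaction
        rm.commit("tx1");
        System.out.println(node2.sent); // prints [m1, m2]
    }
}
```

The caller keeps using the same transaction id before and after the failover, which is the "transparent" part in this toy version.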
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=3981021#3981021
19 years, 5 months
[Design of JBossCache] - Re: Eviction thread behaviour
by manik.surtani@jboss.com
anonymous wrote :
| What I don't understand is still the cause of the problem. I understand you can't reproduce it reading from the Jira, right?
|
I couldn't reproduce it because of a timing problem, but I do completely understand the cause of the problem. Consider (LIFO):
1) Eviction queue is close to full, cache region is full.
2) Start a tx
3) Add stuff to the cache
4) Causes older items in the region to be queued for eviction
5) tx reads item in cache, which was queued for eviction
6) Node visited event in 5) not yet received; Eviction Thread attempts to process queue and waits on the read lock (RL) held in 5)
7) tx attempts to write more stuff, but blocks because this triggers more evictions and the eviction queue is now full.
The tx doesn't get a chance to commit and release the RL in 5), because 7) blocks. The eviction thread cannot empty the queue because it is waiting on 5). Deadlock, until lock timeout!
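The sequence above can be imitated with plain java.util.concurrent classes. This is a sketch under assumptions, not JBossCache code: the eviction queue is modelled as an `ArrayBlockingQueue` of capacity 1, the node lock as a `ReentrantReadWriteLock`, and the eviction thread is assumed to need the write lock before it can remove a node. Timeouts stand in for the lock timeout so the program terminates.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the scenario; class and method names are made up.
final class EvictionDeadlockSketch {

    /** Returns {txQueuedEvent, evictorEvicted}. */
    static boolean[] run(long timeoutMillis) {
        BlockingQueue<String> evictionQueue = new ArrayBlockingQueue<>(1);
        ReentrantReadWriteLock nodeLock = new ReentrantReadWriteLock();
        AtomicBoolean evicted = new AtomicBoolean(false);

        // Steps 1-4: the queue is full and node /a is queued for eviction.
        evictionQueue.add("/a");
        // Step 5: the tx (this thread) reads /a and keeps the read lock until commit.
        nodeLock.readLock().lock();

        // Step 6: the eviction thread processes the queue but must wait for the lock.
        Thread evictor = new Thread(() -> {
            try {
                String node = evictionQueue.peek();
                if (node != null
                        && nodeLock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    try {
                        evictionQueue.remove(node);
                        evicted.set(true);
                    } finally {
                        nodeLock.writeLock().unlock();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        evictor.start();

        boolean txQueued = false;
        try {
            // Step 7: the tx writes more, which needs a free slot in the full queue.
            txQueued = evictionQueue.offer("/b", timeoutMillis, TimeUnit.MILLISECONDS);
            evictor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // "Commit": only now is the read lock from step 5 released.
            nodeLock.readLock().unlock();
        }
        return new boolean[] { txQueued, evicted.get() };
    }

    public static void main(String[] args) {
        boolean[] result = run(200);
        System.out.println("tx queued event: " + result[0] + ", evictor evicted: " + result[1]);
    }
}
```

In this model both values come back false: each side gives up only when its timeout expires, which mirrors the "deadlock, until lock timeout" described above.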
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=3981000#3981000
19 years, 5 months