[jboss-user] [JBossCache] - Replication Problem When Nodes Have Gone Away

jbirkenmaier do-not-reply at jboss.com
Fri Sep 22 16:56:15 EDT 2006


Hi. Here's the problem in a nutshell: I have a 3-node cluster with a shared tree cache. Nodes 1 and 2 go away at around the same time (I unplugged the network cable they share). Within 10-12 seconds, Node 3 is notified that Node 1 is gone, and it makes a few changes to the cache within a transaction. The cache tries to replicate to Node 2 (not knowing it has gone away) and fails with a ReplicationException. Node 3 thinks its local cache has been updated, but it hasn't been, because of the replication failure. After ~50 seconds, Node 3 is notified that Node 2 has gone away and updates its cache again. This time it works, because there is no one left to replicate to.
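The sequence above can be modeled with a small standalone Java sketch. This is not JBossCache code: the class and method names (`ReplicatedCacheModel`, `commit`, `viewChange`, `ReplicationFailure`) are hypothetical, and the model simply assumes, as the post reports, that a failed synchronous replication aborts the whole transaction, so the local cache is not updated either.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of synchronous replication inside a
// transaction. This is NOT the JBossCache API; all names are illustrative.
public class ReplicatedCacheModel {

    // Stand-in for a replication failure (hypothetical; not JBossCache's own class).
    public static class ReplicationFailure extends RuntimeException {
        public ReplicationFailure(String message) { super(message); }
    }

    private final Map<String, String> committed = new HashMap<>();
    private final Set<String> view = new LinkedHashSet<>();      // peers this node believes are alive
    private final Set<String> reachable = new LinkedHashSet<>(); // peers that actually respond

    public ReplicatedCacheModel(Set<String> view, Set<String> reachable) {
        this.view.addAll(view);
        this.reachable.addAll(reachable);
    }

    // Drops a peer from the view, as a JGroups view-change notification would.
    public void viewChange(String departedPeer) {
        view.remove(departedPeer);
    }

    // Applies all writes as one transaction: stage locally, replicate
    // synchronously to every peer in the current view, then commit.
    // Any unreachable peer aborts the transaction before the staged
    // writes reach the local store, so the local cache is unchanged.
    public void commit(Map<String, String> writes) {
        Map<String, String> staged = new HashMap<>(writes);
        for (String peer : view) {
            if (!reachable.contains(peer)) {
                throw new ReplicationFailure("no response from " + peer);
            }
        }
        committed.putAll(staged);
    }

    public String get(String key) {
        return committed.get(key);
    }

    public static void main(String[] args) {
        // Node 3 has already dropped node1, but still lists node2 (unplugged).
        ReplicatedCacheModel node3 =
                new ReplicatedCacheModel(Set.of("node2"), Set.of());
        try {
            node3.commit(Map.of("k", "v1"));
        } catch (ReplicationFailure e) {
            System.out.println("rolled back: " + e.getMessage());
        }
        System.out.println("after failure: " + node3.get("k"));

        node3.viewChange("node2");        // the late view change for node2
        node3.commit(Map.of("k", "v2"));  // no peers left, so it succeeds
        System.out.println("after view change: " + node3.get("k"));
    }
}
```

Running `main` prints the rollback message for node2, then `after failure: null`, then `after view change: v2`, mirroring the two update attempts described in the post.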

There are two things I need help with:
1. I need my local cache to be updated even when replication fails.
2. Why does it take so long to be notified that the second node has gone away, when both nodes were on the same network cable that I unplugged? My JGroups timeout is set to 12 seconds max (counting retries), yet the two JGroups viewChange notifications are sometimes more than 60 seconds apart.

Thanks for the help!
Jim

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=3973649#3973649
