If you do a put with the local option, it won't replicate to anyone, so the server that
did the put will be out of sync with its buddies.
As to multiple servers simultaneously doing a put() on the same cache node, here's what
happens. I'm assuming the cache node already exists.
Assume no tx running. The data in question is stored on server0 and its buddy
group.
1) You do a put() on server1. Simultaneously a put() on server2.
2) DataGravitatorInterceptor.1 and DataGravitatorInterceptor.2 both see that the node
doesn't exist locally; each fetches the node's data from across the cluster.
3) DataGravitatorInterceptor.1 and .2 take the data and each does a put (not local),
which replicates the data to its buddies. No tx, so no lock is held on the node. At this
point there are three copies of the data -- the server0 group's, the server1 group's and
the server2 group's.
4) DataGravitatorInterceptor.1 and .2 send a cleanup call to the cluster. Any copy of the
data not associated with the sending server's buddy group is removed.
5) The original puts go through.
The end result here will very much depend on how things get interleaved. With REPL_SYNC
you could end up with a TimeoutException in step 4 as server1 and server2 tell each other
to remove the data and deadlock. Or server1 completes steps 3-5 and then server2 executes
steps 3-5, in which case server2's change wins. Or both complete step 3, then server1
completes step 4 (so the server0 and server2 copies are gone), then server2 completes
step 4 (so the server1 copy is gone). Then they both complete step 5, resulting in two
sets of data, each of which only has the key/value pair included in the put.
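That last interleaving can be sketched with a toy in-memory model (stdlib-only, not the
real JBoss Cache API; the group names and key/value pairs are made up):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the non-tx race: each map entry is one buddy group's copy
// of the cache node. Group names and keys are hypothetical.
public class GravitationRace {

    static Map<String, Map<String, String>> run() {
        Map<String, Map<String, String>> cluster = new HashMap<>();

        // The node originally lives in server0's buddy group.
        Map<String, String> original = new HashMap<>();
        original.put("existing", "data");
        cluster.put("server0-group", original);

        // Steps 2-3: server1 and server2 both gravitate the data and do a
        // non-local put, so their buddy groups now hold copies too.
        cluster.put("server1-group", new HashMap<>(original));
        cluster.put("server2-group", new HashMap<>(original));

        // Step 4, interleaved: server1's cleanup removes every copy outside
        // its own group, then server2's cleanup does the same. Net effect:
        // all three copies are gone.
        cluster.keySet().removeIf(g -> !g.equals("server1-group"));
        cluster.keySet().removeIf(g -> !g.equals("server2-group"));

        // Step 5: the original puts go through, each re-creating the node
        // with only the newly written key/value pair.
        Map<String, String> s1 = new HashMap<>();
        s1.put("k1", "v1");
        cluster.put("server1-group", s1);
        Map<String, String> s2 = new HashMap<>();
        s2.put("k2", "v2");
        cluster.put("server2-group", s2);

        return cluster;
    }

    public static void main(String[] args) {
        // Two divergent copies survive; the gravitated "existing" data is lost.
        System.out.println(run());
    }
}
```

Running it shows each surviving copy holds only its own put, with the originally
gravitated data gone from both.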
Now, if there is a tx in place:
The put() in step 3 is done in a tx, so a write lock will be held on the node on each
server until the tx commits. The put will not replicate until the tx commits.
The removes in step 4 will also not be broadcast until the tx commits.
The put in step 5 will not be replicated until the tx commits.
The fact that the WL from step 3 is held should make steps 3-5 atomic. If it's
REPL_SYNC, you have two servers trying to write to the same node, so it's possible
that when the tx tries to commit you'll get a TimeoutException due to a lock conflict.
With REPL_ASYNC, the later tx will win; the step 5 put from the earlier tx will be lost.
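A minimal sketch of the tx case (again a stdlib-only model, not the real cache API): one
write lock stands in for the WL held from the step 3 put until commit, and with
REPL_ASYNC the tx that commits last simply overwrites.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Toy model of the tx case: a single per-node write lock is acquired at the
// step 3 gravitation put and held until commit, making steps 3-5 atomic.
public class TxGravitation {

    static final ReentrantLock nodeWriteLock = new ReentrantLock();
    static final Map<String, String> node = new HashMap<>();

    // One transaction: the WL is taken at the gravitation put, the cleanup
    // and the original put happen under it, and nothing is visible until
    // "commit" (lock release). Key/value names are hypothetical.
    static void txPut(String key, String value) {
        nodeWriteLock.lock();
        try {
            node.clear();           // step 4 cleanup, tx-scoped
            node.put(key, value);   // step 5, the original put
        } finally {
            nodeWriteLock.unlock(); // commit: changes replicate, lock released
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread earlier = new Thread(() -> txPut("k1", "v1"));
        Thread later = new Thread(() -> txPut("k2", "v2"));
        earlier.start();
        earlier.join();             // earlier tx commits first
        later.start();
        later.join();               // later tx commits second and wins
        System.out.println(node);   // the earlier tx's step 5 put is lost
    }
}
```

The lock forces the two transactions to serialize, so there is no interleaving inside
steps 3-5; the only question left is which tx commits last.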
But... while writing this I'm pretty sure I've spotted a bug in the tx case: the
step 4 cleanup call gets bundled together with the other tx changes and therefore only
gets replicated to the server's buddies, not to the whole cluster.