[jboss-jira] [JBoss JIRA] Commented: (JBCACHE-1349) Buddy replication state transfer fails if a marshalling region is empty

Mon May 19 06:40:07 EDT 2008

    [ http://jira.jboss.com/jira/browse/JBCACHE-1349?page=comments#action_12413180 ] 

Manik Surtani commented on JBCACHE-1349:
----------------------------------------

Ok, here is what I think:

1.  Agreed re: specialized exceptions.  We already have an InactiveRegionException which is a subclass of CacheException.
2.  I don't think the state transfer manager should throw an exception if the region is empty.  It should just provide empty state, as you suggested.
3.  BR code should deal with inactive regions but NOT propagate this to the recipient, since with BR, the data owner is the only instance capable of activating the region in any meaningful way.  It should just exclude the region from the state that is transferred.  When the region is later activated, state will be pushed.
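
To make the three points concrete, here is a rough, self-contained sketch. It is not the actual JBoss Cache code: the Region class, the simplified getState() and generateBuddyState() are hypothetical stand-ins, and CacheException and InactiveRegionException are redefined locally so the example compiles on its own.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BuddyStateSketch {

    // Local stand-in for org.jboss.cache.CacheException.
    static class CacheException extends RuntimeException {
        CacheException(String msg) { super(msg); }
    }

    // Point 1: a specialized subclass used as the "region inactive" signal.
    static class InactiveRegionException extends CacheException {
        InactiveRegionException(String msg) { super(msg); }
    }

    // Hypothetical region model: an empty data map means "no root node yet".
    static class Region {
        final String fqn;
        final boolean active;
        final Map<String, String> data;

        Region(String fqn, boolean active, Map<String, String> data) {
            this.fqn = fqn;
            this.active = active;
            this.data = data;
        }
    }

    // Point 2: an active but empty region yields empty state; only an
    // inactive region raises the specialized exception.
    static Map<String, String> getState(Region region) {
        if (!region.active) {
            throw new InactiveRegionException("Region " + region.fqn + " is inactive");
        }
        return new LinkedHashMap<>(region.data);
    }

    // Point 3: the buddy replication side excludes inactive regions from the
    // state it pushes, instead of failing the whole transfer.
    static Map<String, Map<String, String>> generateBuddyState(List<Region> regions) {
        Map<String, Map<String, String>> state = new LinkedHashMap<>();
        for (Region region : regions) {
            try {
                state.put(region.fqn, getState(region));
            } catch (InactiveRegionException e) {
                // Skipped: state is pushed later, once the owner activates the region.
            }
        }
        return state;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region("/sfsb/empty", true, Collections.<String, String>emptyMap()),
                new Region("/sfsb/inactive", false, Collections.singletonMap("k", "v")),
                new Region("/sfsb/full", true, Collections.singletonMap("k", "v")));
        // Prints [/sfsb/empty, /sfsb/full]
        System.out.println(generateBuddyState(regions).keySet());
    }
}
```

The empty region still appears in the pushed state (with no data), while the inactive one is simply left out.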

> Buddy replication state transfer fails if a marshalling region is empty
> -----------------------------------------------------------------------
>
>                 Key: JBCACHE-1349
>                 URL: http://jira.jboss.com/jira/browse/JBCACHE-1349
>             Project: JBoss Cache
>          Issue Type: Bug
>      Security Level: Public(Everyone can see) 
>          Components: Buddy Replication
>    Affects Versions: 2.1.1.GA
>            Reporter: Brian Stansberry
>         Assigned To: Manik Surtani
>             Fix For: 2.2.0.GA, 2.1.X
>
>
> Scenario: buddy replication is enabled, along with region-based marshalling. On one peer a region has been activated, but no root node for the region created.  Then a new peer joins the cluster, triggering a state transfer push from the existing peer.  Fails with the following exception:
> 2008-05-16 17:50:52,986 ERROR [org.jboss.cache.buddyreplication.BuddyManager] (AsyncViewChangeHandlerThread,127.0.0.1:35801) Caught exception handling view change
> org.jboss.cache.CacheException: Error acquiring state
> 	at org.jboss.cache.buddyreplication.BuddyManager.acquireState(BuddyManager.java:914)
> 	at org.jboss.cache.buddyreplication.BuddyManager.addBuddies(BuddyManager.java:802)
> 	at org.jboss.cache.buddyreplication.BuddyManager.reassignBuddies(BuddyManager.java:409)
> 	at org.jboss.cache.buddyreplication.BuddyManager.access$800(BuddyManager.java:56)
> 	at org.jboss.cache.buddyreplication.BuddyManager$AsyncViewChangeHandlerThread.handleEnqueuedViewChange(BuddyManager.java:1162)
> 	at org.jboss.cache.buddyreplication.BuddyManager$AsyncViewChangeHandlerThread.run(BuddyManager.java:1106)
> 	at java.lang.Thread.run(Thread.java:595)
> Caused by: org.jboss.cache.CacheException: Cache instance at 127.0.0.1:35801 cannot provide state for fqn /sfsb/ear=clusteredsession-local.jar,jar=clusteredsession-local.jar,name=ClusteredStateful,service=EJB3. There is no cache node at fqn /sfsb/ear=clusteredsession-local.jar,jar=clusteredsession-local.jar,name=ClusteredStateful,service=EJB3
> 	at org.jboss.cache.statetransfer.StateTransferManager.getState(StateTransferManager.java:121)
> 	at org.jboss.cache.buddyreplication.BuddyManager.generateState(BuddyManager.java:966)
> 	at org.jboss.cache.buddyreplication.BuddyManager.acquireState(BuddyManager.java:897)
> 	... 6 more
> This is because StateTransferManager.getState() is designed to throw CacheException if the region is inactive (not the case) or has no data (the case here).  This exception was really designed as a signal to propagate to a *total replication* state transfer *requestor* that there is no state on this peer (so the requestor can ask another peer).  But the buddy replication code isn't handling it and a single region like this breaks the whole state transfer.
> Some thoughts:
> 1) A specialized CacheException subclass should be created for this "signal"; plain CacheException is too generic.
> 2) It seems the case of "region inactive" is different from "no data". I don't really think "no data" is an exception; it's just a specialized type of state.  That is, in the total replication case, the code is designed to catch this special "exception due to an inactive region" and go on to ask another peer (who may be active).  I see no reason to ask another peer for the state if one peer has an active region but no data. The requestor should just initialize an empty region.
> 3) The BR code should catch this exception.
> 4) Possibly, the BR code should send the exception to the new peer as part of the state transfer data. That is, don't just swallow it, as the new peer may have old, stale, persistent data in its buddy backup tree; need to tell the peer to discard that data.
> I found this working with AS 5 EJB3 SFSB code, but it's not a critical issue for me due to the simple workaround of just making sure the root node for the region exists.
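
The requestor-side behaviour described in the quoted report (move on to another peer when a region is inactive, but accept empty state from an active peer) could be sketched roughly as follows. The Peer interface and requestState() are hypothetical names invented for illustration, and InactiveRegionException is a local stand-in rather than the JBoss Cache class.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class StateRequestorSketch {

    // Local stand-in for the "region inactive" signal.
    static class InactiveRegionException extends RuntimeException {
        InactiveRegionException(String msg) { super(msg); }
    }

    // Hypothetical view of a remote peer that can provide state for a region.
    interface Peer {
        Map<String, String> getState(String fqn);
    }

    // Ask peers in order. An inactive region means "try the next peer";
    // any returned state, including an empty map, is accepted as-is.
    static Optional<Map<String, String>> requestState(String fqn, List<Peer> peers) {
        for (Peer peer : peers) {
            try {
                return Optional.of(peer.getState(fqn));
            } catch (InactiveRegionException e) {
                // This peer cannot help; another one may have the region active.
            }
        }
        return Optional.empty(); // no peer had the region active
    }

    public static void main(String[] args) {
        Peer inactive = fqn -> { throw new InactiveRegionException(fqn + " is inactive"); };
        Peer activeButEmpty = fqn -> Collections.emptyMap();
        Peer withData = fqn -> Collections.singletonMap("k", "v");

        Optional<Map<String, String>> state =
                requestState("/region", Arrays.asList(inactive, activeButEmpty, withData));
        // Prints true: the empty state from the second peer is used and the
        // third peer is never asked.
        System.out.println(state.get().isEmpty());
    }
}
```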
