[jboss-jira] [JBoss JIRA] Updated: (JBAS-7186) SuspectExceptions during data gravitation lead to DataGravitationCleanup command not executing

Tue Mar 9 16:43:58 EST 2010

     [ https://jira.jboss.org/jira/browse/JBAS-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Stansberry updated JBAS-7186:
-----------------------------------

    Fix Version/s: JBossAS-6.0.0.M4
                       (was: JBossAS-6.0.0.M3)


> SuspectExceptions during data gravitation lead to DataGravitationCleanup command not executing
> ----------------------------------------------------------------------------------------------
>
>                 Key: JBAS-7186
>                 URL: https://jira.jboss.org/jira/browse/JBAS-7186
>             Project: JBoss Application Server
>          Issue Type: Bug
>      Security Level: Public(Everyone can see) 
>          Components: Clustering
>    Affects Versions: JBossAS-5.1.0.GA
>            Reporter: Brian Stansberry
>            Assignee: Brian Stansberry
>             Fix For: JBossAS-6.0.0.M4
>
>
> JBC 3 is different from JBC 1.4 in how it handles suspected nodes during data gravitation. JBC 1.4 would ignore the resulting SuspectExceptions; w/ JBC 3 they propagate.
> These can happen easily with a cluster under load and a node failing: the LB detects the failure before the view changes, the node that has the failing node as its backup starts gravitating, and replication of the gravitated data to the (failed) backup throws a SuspectException.
> The clustering integration needs to handle this better. Right now gravitation attempts are wrapped in txs, so the SuspectException fails the tx commit. That's pretty non-recoverable unless we catch the commit failure and retry. A possibility is to not wrap the gravitation in a tx (not really needed except for FIELD) and use JBossCacheWrapper's get() retry logic to redo the gravitation.
> We already catch the exception and allow the request to continue if the data was actually retrieved.  Actually, that only works because we wrap w/ the tx; JBC wouldn't return from the gravitation read if the tx weren't causing the replication write to wait for tx commit.  Hmm...
> The problems this causes now are: 1) gravitated data doesn't replicate to the buddy until a request changes it and causes a normal write; 2) DataGravitationCleanupCommand is not issued, so stale data is left in the cache.  Some of the changes made for JBCACHE-1530 reduce the likelihood of that stale data being used; it's only used if a request fails over to the node where it's stored, leading to gravitation from the (stale) local backup tree.
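
The retry idea mentioned above (redoing the gravitation read when a SuspectException propagates, rather than failing a wrapping tx) might look roughly like the sketch below. It is only an illustration under stated assumptions: getWithRetry and the nested SuspectException are hypothetical stand-ins defined here, not the real JBossCacheWrapper or JBoss Cache classes, and the read is modelled as a plain Callable rather than an actual cache call.

```java
import java.util.concurrent.Callable;

public class GravitationRetrySketch {

    /** Local stand-in for the cache's SuspectException; NOT the real JBC class. */
    public static class SuspectException extends RuntimeException {
        public SuspectException(String msg) {
            super(msg);
        }
    }

    /**
     * Hypothetical helper: runs the read and retries it when a
     * SuspectException propagates, up to maxAttempts tries in total.
     * Any other exception is rethrown immediately.
     */
    public static <T> T getWithRetry(Callable<T> read, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SuspectException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return read.call();
            } catch (SuspectException e) {
                // A peer (e.g. the failed backup) was suspected; try again.
                last = e;
            }
        }
        // All attempts hit a SuspectException; surface the last one.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated gravitation read that fails twice before succeeding.
        String session = getWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new SuspectException("backup node suspected");
            }
            return "session-data";
        }, 5);
        System.out.println(session + " after " + calls[0] + " attempts");
        // prints "session-data after 3 attempts"
    }
}
```

As the description notes, a retry like this only helps if the gravitation is not wrapped in a tx; with the tx in place the exception surfaces at commit time, after the read has already returned.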
