Brian Stansberry updated JBAS-7186:
-----------------------------------
Fix Version/s: Unscheduled
(was: 6.0.0.M4)
SuspectExceptions during data gravitation lead to
DataGravitationCleanup command not executing
----------------------------------------------------------------------------------------------
Key: JBAS-7186
URL:
https://jira.jboss.org/browse/JBAS-7186
Project: JBoss Application Server
Issue Type: Bug
Security Level: Public(Everyone can see)
Components: Clustering
Affects Versions: JBossAS-5.1.0.GA
Reporter: Brian Stansberry
Assignee: Brian Stansberry
Fix For: Unscheduled
JBC 3 is different from JBC 1.4 in how it handles suspected nodes during data gravitation.
JBC 1.4 would ignore them; w/ JBC 3 the resulting SuspectExceptions propagate.
These exceptions can happen easily when a cluster is under load and a node fails: the LB
detects the failure before the view changes, the node that has the failing node as its
backup starts gravitating, and replication of the gravitated data to the (failed) backup
throws a SuspectException.
The clustering integration needs to handle this better. Right now gravitation attempts
are wrapped in txs, so the SuspectException fails the tx commit. That's pretty
non-recoverable unless we catch the commit failure and retry. A possibility is to not wrap
the gravitation in a tx (not really needed except for FIELD) and use
JBossCacheWrapper's get() retry logic to redo the gravitation.
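
A minimal sketch of the "retry the gravitation" idea, for illustration only. It does not
reproduce JBossCacheWrapper's actual code: the class GravitationRetry, the method
getWithRetry and the exception type SuspectedMemberException are all hypothetical
stand-ins, and the gravitating read is modelled as a plain Supplier.

```java
import java.util.function.Supplier;

public final class GravitationRetry {

    /** Hypothetical stand-in for the exception seen when a suspected member cannot be reached. */
    public static final class SuspectedMemberException extends RuntimeException {
        public SuspectedMemberException(String message) {
            super(message);
        }
    }

    private GravitationRetry() {
    }

    /**
     * Runs the read up to maxAttempts times, retrying only when the
     * hypothetical SuspectedMemberException is thrown. Any other exception
     * propagates immediately; the last suspect failure is rethrown once the
     * attempts are exhausted.
     */
    public static <T> T getWithRetry(Supplier<T> read, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SuspectedMemberException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return read.get();
            } catch (SuspectedMemberException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

In this sketch the read is not wrapped in a tx, so a failed attempt leaves nothing to roll
back before the next one; whether that holds for the real gravitation path is exactly the
open question above.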
We already catch the exception and allow the request to continue if the data was actually
retrieved. Actually, that only works because we wrap w/ the tx; JBC wouldn't return
from the gravitation read without the tx causing the replication write to wait for tx
commit. Hmm...
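
The "catch the exception and continue if the data was retrieved" behaviour could be
sketched roughly as follows. This is an illustration under assumptions, not the
integration code: TolerantGravitation, Tx, CommitFailedException and readInTx are
hypothetical names, and the tx is reduced to a single commit() call.

```java
import java.util.function.Supplier;

public final class TolerantGravitation {

    /** Hypothetical stand-in for a commit failure caused by a suspected member. */
    public static final class CommitFailedException extends RuntimeException {
        public CommitFailedException(String message) {
            super(message);
        }
    }

    /** Hypothetical, heavily simplified transaction handle. */
    public interface Tx {
        void commit();
    }

    private TolerantGravitation() {
    }

    /**
     * Performs the gravitating read, then commits. If the commit fails but
     * the read already produced data, the failure is swallowed and the data
     * is returned so the request can continue. If nothing was retrieved,
     * the commit failure is rethrown.
     */
    public static <T> T readInTx(Tx tx, Supplier<T> gravitatingRead) {
        T data = gravitatingRead.get();
        try {
            tx.commit();
        } catch (CommitFailedException e) {
            if (data == null) {
                throw e;
            }
            // Data was retrieved; the replication done at commit time failed.
        }
        return data;
    }
}
```

Note that swallowing the commit failure is what leaves the gravitated data unreplicated
and the cleanup unissued, which is the problem described next.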
The problems this causes now are: 1) gravitated data doesn't replicate to the buddy until
a request changes it and causes a normal write; 2) DataGravitationCleanupCommand is not
issued, so stale data is left in the cache. Some of the changes made for JBCACHE-1530
reduce the likelihood of that stale data being used; it's only used if a request fails
over to the node where it's stored, leading to gravitation from the (stale) local backup
tree.