Manik Surtani commented on ISPN-902:
------------------------------------
I'm going to close this as a duplicate of ISPN-493. It's related; however, there
is valuable information on this specific JIRA, including the test.
Data consistency across rehashing
---------------------------------
Key: ISPN-902
URL: https://issues.jboss.org/browse/ISPN-902
Project: Infinispan
Issue Type: Bug
Components: Distributed Cache, Transactions
Affects Versions: 4.2.0.Final
Reporter: Erik Salter
Assignee: Manik Surtani
Priority: Blocker
Fix For: 4.2.1.CR2, 4.2.1.Final, 5.0.0.ALPHA3
Attachments: cacheTest.zip
After much testing and analysis (and reopening and fixing ISPN-865), the final issue here
is that certain transactions throw an IllegalStateException in commit() - and this
cascades into a series of problems.
See http://lists.jboss.org/pipermail/infinispan-dev/2011-January/007320.html
for a more detailed discussion.
Original request:
{quote}
There are two scenarios we're seeing on rehashing, both of which are critical.
1. On a node leaving a running cluster, we're seeing an inordinate number of timeout
errors, such as the one below. The end result of this is that the cluster ends up losing
data.
org.infinispan.util.concurrent.TimeoutException: Timed out waiting for valid responses!
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:417)
    at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:101)
    at org.infinispan.distribution.DistributionManagerImpl.retrieveFromRemoteSource(DistributionManagerImpl.java:341)
    at org.infinispan.interceptors.DistributionInterceptor.realRemoteGet(DistributionInterceptor.java:143)
    at org.infinispan.interceptors.DistributionInterceptor.remoteGetAndStoreInL1(DistributionInterceptor.java:131)
06:07:44,097 WARN [GMS] cms-node-20192: merge leader did not get data from all partition coordinators [cms-node-20192, mydht1-18445], merge is cancelled
    at org.infinispan.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:59)
2. Joining a node into a running cluster causes transactional failures on the other
nodes. Most of the time, depending on the load, a node can take upwards of 8 minutes to
join.
I've attached a unit test that can reproduce these issues.
{quote}