[infinispan-dev] JGroups - "One or more nodes have left" exception while querying (get, replaceWithVersion) on the cache

mohammedisaa.khan at subex.com
Thu Dec 11 07:17:54 EST 2014


Hi,

We are using Infinispan 6.0.2.Final with the Hot Rod client in our
application. We have 3 nodes and are running a test with about 30 million
entries in the cache and about 300 million requests being processed.

After a few hours of execution, we get the following errors:

1) Failed to recover cluster state after the current node became the
   coordinator
2) org.infinispan.remoting.transport.jgroups.SuspectException: One or more
   nodes have left the cluster while replicating command PrepareCommand
3) Message send failed due to timeout
4) Suspect messages, although the nodes were active

There were no crashes, and all the nodes are still active. However, it seems
that some node appeared to leave the cluster (deduced from error 2), and
after that the cluster misbehaves: most cache queries return null even
though the data is present on the nodes and the nodes are up and active.
We have written a debug script that queries each node's cache individually,
and each cache responds. But when we run the Hot Rod client configured with
the IPs/ports of all the nodes, only one node seems to respond; the other 2
nodes do not.
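A minimal per-node check in the spirit of that debug script could look like the sketch below. It is hypothetical: `HotRodEndpointProbe` and `probeAll` are illustrative names, the "host:port;host:port" string mirrors the format of the Hot Rod client's server list, and 11222 is assumed as the default Hot Rod port. It only tests whether each endpoint accepts a TCP connection from the client machine; it sends no Hot Rod requests, so it can rule out basic network reachability but says nothing about the cluster view or cache state.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical diagnostic: checks whether each endpoint in a Hot Rod style
 * server list ("host1:11222;host2:11222") accepts TCP connections.
 * It does NOT speak the Hot Rod protocol.
 */
public class HotRodEndpointProbe {

    // Assumed default Hot Rod port, used when an entry has no ":port" suffix.
    static final int DEFAULT_HOTROD_PORT = 11222;

    /** Parses the semicolon-separated list and probes each entry in order. */
    public static Map<String, Boolean> probeAll(String serverList, int timeoutMillis) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String entry : serverList.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray separators
            }
            int colon = trimmed.lastIndexOf(':');
            String host = colon < 0 ? trimmed : trimmed.substring(0, colon);
            int port = colon < 0
                    ? DEFAULT_HOTROD_PORT
                    : Integer.parseInt(trimmed.substring(colon + 1));
            results.put(trimmed, isReachable(host, port, timeoutMillis));
        }
        return results;
    }

    /** True if a TCP connection to host:port succeeds within the timeout. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unknown host: treat as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java HotRodEndpointProbe \"host1:11222;host2:11222;host3:11222\"");
            return;
        }
        probeAll(args[0], 2000).forEach((endpoint, ok) ->
                System.out.println(endpoint + " -> " + (ok ? "reachable" : "NOT reachable")));
    }
}
```

Running it from the client host with the same server list the Hot Rod client uses would at least show whether the two silent nodes are reachable at the TCP level.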

Could you tell me why errors 2 and 3 occur? Are these known issues? Have
they been fixed in 7.x?

This appears to break the system quite often. Please let us know of any
solutions.

Regards,
Isaa




--
View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/Jgroups-One-or-more-nodes-have-left-exception-while-querying-get-replaceWithVersion-on-the-cache-tp4030028.html
Sent from the Infinispan Developer List mailing list archive at Nabble.com.

