[infinispan-dev] Jgroups - One or more nodes have left exception while querying(get, replaceWithVersion) on the cache

Dan Berindei dan.berindei at gmail.com
Thu Dec 11 09:20:02 EST 2014


Hi Isaa,

We definitely recommend that you try upgrading to 7.0.2.Final, since
we don't support older versions.

That being said, the suspect exceptions and communication timeouts are
a sign of a flaky network, or more likely of excessive garbage
collections. Have you tried enabling GC logging to see how big the
pauses are?
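
On the HotSpot JVMs current at the time (Java 7/8), GC logging can be
turned on with -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-Xloggc:<file>. If restarting the servers with new flags is not
convenient, a rough in-process check is also possible. The sketch below
is only an illustration: the class and method names are made up, and it
reports accumulated collection time from the standard
GarbageCollectorMXBean API, not individual pause lengths, so the GC log
remains the better source for pause sizes.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Infinispan or JGroups.
public class GcTimeProbe {

    /** Accumulated GC time in ms, summed over all collectors of this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** GC time in ms accumulated while sleeping for the given window. */
    static long gcMillisDuring(long windowMillis) throws InterruptedException {
        long before = totalGcMillis();
        Thread.sleep(windowMillis);
        return totalGcMillis() - before;
    }

    public static void main(String[] args) throws InterruptedException {
        long windowMillis = 10_000; // assumed sampling window
        long spent = gcMillisDuring(windowMillis);
        System.out.println("GC time in last " + windowMillis + " ms: " + spent + " ms");
    }
}
```

If the reported GC time is a large fraction of the window, or close to
the JGroups failure detection timeouts, GC is the likely cause of the
suspicions.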

7.0.x has some fixes in this area, e.g. suspect exceptions are no
longer propagated to the application and instead the client retries
the operation. But it won't help much if the application is really
running out of memory.
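
If an upgrade has to wait, a similar retry can be approximated on the
application side. This is a minimal sketch under assumptions, not what
7.0.x does internally: the helper name, the attempt count and the
backoff are hypothetical, and it retries on any exception, which a real
application would want to narrow down to the failures it knows are
transient.

```java
import java.util.concurrent.Callable;

// Hypothetical application-side helper, not an Infinispan API.
public class RetryingCall {

    /**
     * Runs op, retrying up to maxAttempts times with a fixed pause
     * between attempts. Rethrows the last failure if all attempts fail.
     */
    static <T> T withRetries(Callable<T> op, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis);
                }
            }
        }
        throw last;
    }
}
```

A cache read would then be wrapped as something like
withRetries(() -> cache.get(key), 3, 200), where cache and key stand
for whatever the application already uses.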

Cheers
Dan


On Thu, Dec 11, 2014 at 2:17 PM, mohammedisaa.khan at subex.com
<mohammedisaa.khan at subex.com> wrote:
> Hi,
>
> We are using the Infinispan 6.0.2 Final with hotrod client in our
> application. We have 3 nodes and are running test with about 30 million
> entries in the cache and about 300 million requests being processed.
>
> During the Execution after a few hours, we get the following error -
>
> 1)Failed to recover cluster state after the current node became the
> coordinator
> 2)org.infinispan.remoting.transport.jgroups.SuspectException: One or more
> nodes have left the cluster while replicating command PrepareCommand
> 3) Message Send failed due to time out
> 4)Suspect Messages -  although the nodes were active.
>
> There were no crashes and all the nodes are active! But it seems like some
> node appeared to leave the cluster(Deduced from error #2) and post that the
> cluster misbehaves. Most requests return null for cache query although the
> data is present in the nodes and the nodes are up and active. We have
> written a debug script which individually queries the cache and the caches
> respond, but when we run the hotrod client with all node Ip/ports. Only one
> node seems to respond and other 2 nodes do not respond.
>
> Could you tell me why errors 2,3 occur? Are these identified ? Have they
> been fixed in 7.x?
>
> This appears to break the system quite often. Kindly reach out with
> solutions.
>
> Regards,
> Isaa

