[infinispan-issues] [JBoss JIRA] Commented: (ISPN-654) Inconsistent state during nodes leaving with async DIST cache

Fri Jan 7 10:29:17 EST 2011

    [ https://issues.jboss.org/browse/ISPN-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574131#comment-12574131 ] 

Erik Salter commented on ISPN-654:
----------------------------------

Test 2:  

The rehashing takes a long time, and can exhaust memory.

We used the following steps to record the time and memory consumed by each rehash:

2. Ingest that "BigObject" 
3. Shut down one node.  Record the time and memory used by the LEAVE rehash.
4. Restart the node.  Record the time and memory used by the JOIN rehash.
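
How the timings and memory figures were captured is not stated above. As a minimal sketch, assuming the rehash can be wrapped in a blocking call, the measurement could look like the following. RehashProbe, Sample and measure are hypothetical names, not part of Infinispan, and the sleeping action in main is only a stand-in for "wait until the rehash completes".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not an Infinispan class): times an action and samples heap usage around it. */
final class RehashProbe {

    /** One measurement: wall-clock time plus heap in use before and after the action. */
    static final class Sample {
        final long elapsedMillis;
        final long heapBeforeBytes;
        final long heapAfterBytes;

        Sample(long elapsedMillis, long heapBeforeBytes, long heapAfterBytes) {
            this.elapsedMillis = elapsedMillis;
            this.heapBeforeBytes = heapBeforeBytes;
            this.heapAfterBytes = heapAfterBytes;
        }

        @Override
        public String toString() {
            return String.format("elapsed=%d ms, heap before=%d bytes, heap after=%d bytes",
                    elapsedMillis, heapBeforeBytes, heapAfterBytes);
        }
    }

    /** Runs the action once and records elapsed time and used heap on this JVM. */
    static Sample measure(Runnable action) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long heapBefore = memory.getHeapMemoryUsage().getUsed();
        long start = System.nanoTime();
        action.run();
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        long heapAfter = memory.getHeapMemoryUsage().getUsed();
        return new Sample(elapsed, heapBefore, heapAfter);
    }

    public static void main(String[] args) {
        // Assumption: the real action would block until the LEAVE or JOIN rehash finishes.
        Sample sample = measure(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println(sample);
    }
}
```

Used heap sampled this way includes garbage that has not been collected yet, so it is only a rough indicator; peak usage during the rehash would need periodic sampling or GC logs.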

For 60K objects, a LEAVE rehash takes about 200s and a JOIN rehash takes about 100s.
For 90K objects, a LEAVE rehash takes about 270s and a JOIN rehash takes about 320s.
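
To make the scaling easier to see, here is a small sketch that turns the four figures above into time per object and growth factors. It assumes "60K" and "90K" mean 60,000 and 90,000 objects; RehashScaling and its methods are illustrative names only.

```java
/** Illustrative arithmetic on the reported rehash timings; not part of the test suite. */
final class RehashScaling {

    /** Average rehash time per object, in milliseconds. */
    static double millisPerObject(int objects, int seconds) {
        return seconds * 1000.0 / objects;
    }

    /** Ratio between two measurements, e.g. 1.5 for a 50% increase. */
    static double growthFactor(int before, int after) {
        return (double) after / before;
    }

    public static void main(String[] args) {
        // Assumption: 60K = 60,000 objects and 90K = 90,000 objects.
        System.out.printf("LEAVE: %.2f ms/object at 60K, %.2f ms/object at 90K%n",
                millisPerObject(60_000, 200), millisPerObject(90_000, 270));
        System.out.printf("JOIN:  %.2f ms/object at 60K, %.2f ms/object at 90K%n",
                millisPerObject(60_000, 100), millisPerObject(90_000, 320));
        System.out.printf("Growth from 60K to 90K: objects x%.2f, LEAVE x%.2f, JOIN x%.2f%n",
                growthFactor(60_000, 90_000), growthFactor(200, 270), growthFactor(100, 320));
    }
}
```

With these numbers the object count grows by 1.5x while LEAVE time grows by 1.35x and JOIN time by 3.2x, so the JOIN rehash scales worse than linearly between the two runs.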

With the 90K objects case, we're seeing OOMs on a JOIN rehash:

07:48:55,993 INFO  [InfinispanCacheDeployer] Initialize cache cache_nojta_sync
07:48:56,005 INFO  [TransactionLoggerImpl] Starting transaction logging
07:53:59,521 ERROR [JoinTask] Caught exception!
org.infinispan.CacheException: Remote (iptv-srm-26284) failed unexpectedly
	at org.infinispan.remoting.transport.AbstractTransport.parseResponseAndAddToResponseList(AbstractTransport.java:74)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:414)
	at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:101)
	at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:125)
	at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:113)
	at org.infinispan.distribution.RehashTask.call(RehashTask.java:53)
	at org.infinispan.distribution.RehashTask.call(RehashTask.java:33)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.infinispan.io.ExposedByteArrayOutputStream.write(ExposedByteArrayOutputStream.java:90)
	at org.jboss.marshalling.Marshalling$6.write(Marshalling.java:378)
	at org.jboss.marshalling.UTFUtils.writeUTFBytes(UTFUtils.java:129)
	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:328)
	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1140)
	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1096)
	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:966)
	at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:423)
	at org.infinispan.container.entries.ImmortalCacheValue$Externalizer.writeObject(ImmortalCacheValue.java:99)
	at org.infinispan.marshall.jboss.ConstantObjectTable$ExternalizerAdapter.writeObject(ConstantObjectTable.java:322)
	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:147)
	at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:423)
	at org.infinispan.marshall.MarshallUtil.marshallMap(MarshallUtil.java:59)
	at org.infinispan.marshall.exts.MapExternalizer.writeObject(MapExternalizer.java:61)
	at org.infinispan.marshall.jboss.ConstantObjectTable$ExternalizerAdapter.writeObject(ConstantObjectTable.java:322)
	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:147)
	at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:423)
	at org.infinispan.remoting.responses.SuccessfulResponse$Externalizer.writeObject(SuccessfulResponse.java:59)
	at org.infinispan.marshall.jboss.ConstantObjectTable$ExternalizerAdapter.writeObject(ConstantObjectTable.java:322)
	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:147)
	at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:423)
	at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectToObjectStream(GenericJBossMarshaller.java:98)
	at org.infinispan.marshall.VersionAwareMarshaller.objectToBuffer(VersionAwareMarshaller.java:93)
	at org.infinispan.marshall.AbstractMarshaller.objectToBuffer(AbstractMarshaller.java:31)
	at org.infinispan.remoting.transport.jgroups.MarshallerAdapter.objectToBuffer(MarshallerAdapter.java:22)
	at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:595)
	at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:489)
	at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:365)
	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:771)
	at org.jgroups.JChannel.up(JChannel.java:1465)
	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:954)
	at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:430)
07:53:59,525 INFO  [TransactionLoggerImpl] Stopping transaction logging
07:53:59,525 INFO  [JoinTask] iptv-srm-3-39091 completed join rehash!
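
Reading the trace above, the OutOfMemoryError is raised on the remote node while a response containing a map of entries is being written into an in-memory byte array (MarshallUtil.marshallMap down to ExposedByteArrayOutputStream.write). The sketch below is not Infinispan's marshaller; it uses plain JDK serialization, with hypothetical names (SingleBufferDemo, buildState, serializedSize), only to illustrate why a single buffer holding the whole map needs heap proportional to the total state being transferred.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Illustration only: serializes a whole map into one in-memory buffer using JDK serialization. */
final class SingleBufferDemo {

    /** Builds a map of distinct payloads, standing in for cache entries. */
    static Map<String, byte[]> buildState(int entries, int payloadBytes) {
        Map<String, byte[]> state = new HashMap<>();
        for (int i = 0; i < entries; i++) {
            // A fresh array per entry, so serialization cannot shrink the output via back-references.
            state.put("key-" + i, new byte[payloadBytes]);
        }
        return state;
    }

    /** Writes the entire map into a single byte array and returns that array's size. */
    static int serializedSize(Map<String, byte[]> state) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        // The buffer must hold every entry at once, on top of the live objects themselves.
        System.out.println("100 entries: " + serializedSize(buildState(100, 10_000)) + " bytes");
        System.out.println("200 entries: " + serializedSize(buildState(200, 10_000)) + " bytes");
    }
}
```

Doubling the number of entries roughly doubles the buffer, which matches the observation that the 90K case fails where the 60K case does not.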

NOTE: In our environment, every time a node is restarted, it gets an available IP address, which may not be the one it previously had.  I don't know how this affects the hash code of the underlying JGroupAddress's UUID -- if at all.

> Inconsistent state during nodes leaving with async DIST cache
> -------------------------------------------------------------
>
>                 Key: ISPN-654
>                 URL: https://issues.jboss.org/browse/ISPN-654
>             Project: Infinispan
>          Issue Type: Bug
>    Affects Versions: 4.1.0.Final
>            Reporter: Erik Salter
>            Assignee: Mircea Markus
>            Priority: Critical
>             Fix For: 4.2.0.CR1, 4.2.0.Final
>
>         Attachments: ispn-654.xml, ISPN.txt, PerformanceTest.zip
>
>
> We have a performance test that is putting 10K objects into a 4-node cluster, with each of the nodes on separate physical machines.  We are seeing loss of data after an insert and loss of a data node.
> The test directions and code are attached to this JIRA.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira