[infinispan-issues] [JBoss JIRA] Commented: (ISPN-826) RemoteLockCleanupStressTest throws EOFException when reading state

Galder Zamarreño (JIRA) jira-events at lists.jboss.org
Thu Dec 9 05:02:31 EST 2010


    [ https://jira.jboss.org/browse/ISPN-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568894#comment-12568894 ] 

Galder Zamarreño commented on ISPN-826:
---------------------------------------

Right, the reason why the EOFException is not propagated back to the caller is cos if RpcManagerImpl encounters any issue retrieving the state, it backs off and retries and succeeds. In theory, when an EOFException is received, this should be propagated to the caller immediately so that it's looked at cos so far, every time I've seen an EOFException, there was some kind of bug hiding behind.

For example, already back in the 4.0.x days: ISPN-335 and ISPN-176

Even in the 4.2.x days, Sanne found one but could not replicate: ISPN-661.

Now, having debugged this further, I've found the cause of the exception, let's explain:

// NodeI-38030 starts up
2010-12-09 10:26:11,510 10244 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (pool-30-thread-1:) Cache local address is NodeI-38030, physical addresses are [127.0.0.1:7900]
...
// Requests the state from NodeJ-2368
2010-12-09 10:26:11,517 10251 INFO  [org.infinispan.remoting.rpc.RpcManagerImpl] (pool-30-thread-1:) Trying to fetch state from NodeJ-2368
...
// NodeJ-2368 tries to generate state
2010-12-09 10:26:11,526 10260 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (STREAMING_STATE_TRANSFER-sender-1,Infinispan-Cluster,NodeJ-2368:) Received request to generate state for cache named '___defaultcache'.  Attempting to generate state.
...
// NodeJ-2368 fails to generate the transaction log cos it doesn't own the lock and so stops the marshaller
2010-12-09 10:26:21,532 20266 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsDistSync] (STREAMING_STATE_TRANSFER-sender-1,Infinispan-Cluster,NodeJ-2368:) Did not own lock!
2010-12-09 10:26:21,533 20267 TRACE [org.infinispan.marshall.jboss.JBossMarshaller] (STREAMING_STATE_TRANSFER-sender-1,Infinispan-Cluster,NodeJ-2368:) Stop marshaller at 7f0ab78a
2010-12-09 10:26:21,533 20267 ERROR [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (STREAMING_STATE_TRANSFER-sender-1,Infinispan-Cluster,NodeJ-2368:) Caught while responding to state transfer request
org.infinispan.statetransfer.StateTransferException: java.util.concurrent.TimeoutException: Could not obtain exclusive processing lock
	at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:175)
	at org.infinispan.remoting.InboundInvocationHandlerImpl.generateState(InboundInvocationHandlerImpl.java:119)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.getState(JGroupsTransport.java:586)
	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:691)
	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:772)
	at org.jgroups.JChannel.up(JChannel.java:1465)
	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:954)
	at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:478)
	at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderHandler.process(STREAMING_STATE_TRANSFER.java:653)
	at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderThreadSpawner$1.run(STREAMING_STATE_TRANSFER.java:582)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:680)
Caused by: java.util.concurrent.TimeoutException: Could not obtain exclusive processing lock
	at org.infinispan.remoting.transport.jgroups.JGroupsDistSync.acquireProcessingLock(JGroupsDistSync.java:71)
	at org.infinispan.statetransfer.StateTransferManagerImpl.generateTransactionLog(StateTransferManagerImpl.java:202)
	at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:165)
	... 12 more

// This means that when NodeI-38030 is going to try to apply the transaction log, the data is not there and rightfully throws an EOFException
2010-12-09 10:26:21,535 20269 TRACE [org.infinispan.statetransfer.StateTransferManagerImpl] (Incoming-2,Infinispan-Cluster,NodeI-38030:) Integrating transaction log
2010-12-09 10:26:21,535 20269 TRACE [org.infinispan.statetransfer.StateTransferManagerImpl] (Incoming-2,Infinispan-Cluster,NodeI-38030:) Applying commit log
2010-12-09 10:26:21,535 20269 TRACE [org.infinispan.marshall.VersionAwareMarshaller] (Incoming-2,Infinispan-Cluster,NodeI-38030:) Log exception reported
java.io.EOFException: Read past end of file
	at org.jboss.marshalling.AbstractUnmarshaller.eofOnRead(AbstractUnmarshaller.java:184)
	at org.jboss.marshalling.AbstractUnmarshaller.readUnsignedByteDirect(AbstractUnmarshaller.java:319)
	at org.jboss.marshalling.AbstractUnmarshaller.readUnsignedByte(AbstractUnmarshaller.java:280)
	at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:207)
	at org.jboss.marshalling.AbstractUnmarshaller.readObject(AbstractUnmarshaller.java:85)
	at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:175)
	at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:184)
	at org.infinispan.statetransfer.StateTransferManagerImpl.processCommitLog(StateTransferManagerImpl.java:228)
	at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:250)
	at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:320)
	at org.infinispan.remoting.InboundInvocationHandlerImpl.applyState(InboundInvocationHandlerImpl.java:102)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.setState(JGroupsTransport.java:603)
	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:712)
	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:772)
	at org.jgroups.JChannel.up(JChannel.java:1422)
	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:954)
	at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:478)
	at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:525)
	at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:464)
	at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:225)
	at org.jgroups.protocols.FRAG2.up(FRAG2.java:190)
	at org.jgroups.protocols.FC.up(FC.java:483)
	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:888)
	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
	at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:615)
	at org.jgroups.protocols.UNICAST.up(UNICAST.java:295)
	at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:707)
	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
	at org.jgroups.protocols.FD.up(FD.java:266)
	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269)
	at org.jgroups.protocols.MERGE2.up(MERGE2.java:210)
	at org.jgroups.protocols.Discovery.up(Discovery.java:292)
	at org.jgroups.protocols.TP.passMessageUp(TP.java:1093)
	at org.jgroups.protocols.TP.access$100(TP.java:56)
	at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1633)
	at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1615)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:680)

Bottom line: The EOFException in this case is justified due to a failure on the sender side to acquire the lock to send the transaction log. The code, as is, attempts a retry which is the right thing to do. I'll simply add some extra logging in the JBossMarshaller side so that in the future these type of exceptions are easier to trace.

I'll also add an FAQ entry to explain that sometimes, an EOFException is justified when reading state.

> RemoteLockCleanupStressTest throws EOFException when reading state
> ------------------------------------------------------------------
>
>                 Key: ISPN-826
>                 URL: https://jira.jboss.org/browse/ISPN-826
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Marshalling, State transfer
>    Affects Versions: 4.1.0.Final, 4.2.0.CR3
>            Reporter: Galder Zamarreño
>            Assignee: Galder Zamarreño
>            Priority: Blocker
>             Fix For: 4.2.0.CR4, 4.2.0.Final
>
>         Attachments: eof-ispn826-infinispan.log.zip
>
>
> While doing some work to verify ISPN-244, I've spotted that RemoteLockCleanupStressTest throws. Note that this is not related to the ISPN-244 cos the issue is present in 4.2.x as well where ISPN-244 has no bearing: 
> 2010-12-08 18:16:25,945 13951 ERROR [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,Infinispan-Cluster,NodeC-2057:) Caught while requesting or applying state 
> org.infinispan.statetransfer.StateTransferException: java.io.EOFException: Read past end of file 
> at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:331) 
> at org.infinispan.remoting.InboundInvocationHandlerImpl.applyState(InboundInvocationHandlerImpl.java:102) 
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.setState(JGroupsTransport.java:598) 
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:712) 
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:772) 
> at org.jgroups.JChannel.up(JChannel.java:1422) 
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:954) 
> at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:478) 
> at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:525) 
> at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:464) 
> at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:225) 
> at org.jgroups.protocols.FRAG2.up(FRAG2.java:190) 
> at org.jgroups.protocols.FC.up(FC.java:483) 
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:888) 
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234) 
> at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:615) 
> at org.jgroups.protocols.UNICAST.up(UNICAST.java:295) 
> at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:707) 
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132) 
> at org.jgroups.protocols.FD.up(FD.java:266) 
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269) 
> at org.jgroups.protocols.MERGE2.up(MERGE2.java:210) 
> at org.jgroups.protocols.Discovery.up(Discovery.java:292) 
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1093) 
> at org.jgroups.protocols.TP.access$100(TP.java:56) 
> at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1633) 
> at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1615) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
> at java.lang.Thread.run(Thread.java:680) 
> Caused by: java.io.EOFException: Read past end of file 
> at org.jboss.marshalling.AbstractUnmarshaller.eofOnRead(AbstractUnmarshaller.java:184) 
> at org.jboss.marshalling.AbstractUnmarshaller.readUnsignedByteDirect(AbstractUnmarshaller.java:319) 
> at org.jboss.marshalling.AbstractUnmarshaller.readUnsignedByte(AbstractUnmarshaller.java:280) 
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:207) 
> at org.jboss.marshalling.AbstractUnmarshaller.readObject(AbstractUnmarshaller.java:85) 
> at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:162) 
> at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:184) 
> at org.infinispan.statetransfer.StateTransferManagerImpl.processCommitLog(StateTransferManagerImpl.java:228) 
> at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:250) 
> at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:320) 
> ... 29 more 
> Now, in spite of this issue the test passes!!! (WTF?). 
> So, first things first, the test needs improving to send these issues back up. That probably means changing to a Callable rather than Runnable, and will see which exceptions we wanna swallow and which ones we wanna propagate. Then I'll get going to figure out the cause.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       



More information about the infinispan-issues mailing list