[
https://jira.jboss.org/browse/ISPN-826?page=com.atlassian.jira.plugin.sys...
]
Galder Zamarreño commented on ISPN-826:
---------------------------------------
Right, the reason the EOFException is not propagated back to the caller is that if
RpcManagerImpl encounters any issue retrieving the state, it backs off, retries, and
succeeds. In theory, when an EOFException is received, it should be propagated to the
caller immediately so that it gets looked at, because so far, every time I've seen an
EOFException, there was some kind of bug hiding behind it.
For example, back in the 4.0.x days: ISPN-335 and ISPN-176.
Even in the 4.2.x days, Sanne found one but could not replicate it: ISPN-661.
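To make that swallowing behaviour concrete, here's a minimal sketch of the kind of
back-off-and-retry loop involved; the class, method, and parameter names are assumptions
for illustration only, not the actual RpcManagerImpl API:

import java.util.concurrent.Callable;

// Minimal sketch of the back-off-and-retry behaviour described above; names and
// parameters are illustrative, not the actual RpcManagerImpl API.
public final class RetryingStateFetch {

    public static <T> T withRetries(Callable<T> fetch, int maxRetries, long backoffMillis) throws Exception {
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                // A successful attempt hides any earlier failure (e.g. an EOFException
                // wrapped in a StateTransferException) from the caller.
                return fetch.call();
            } catch (Exception e) {
                lastFailure = e;
                Thread.sleep(backoffMillis * attempt); // simple linear back-off before retrying
            }
        }
        // Only if every attempt failed does the exception reach the caller.
        throw lastFailure != null ? lastFailure : new IllegalArgumentException("maxRetries must be >= 1");
    }
}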
Now, having debugged this further, I've found the cause of the exception. Let me explain:
// NodeI-38030 starts up
2010-12-09 10:26:11,510 10244 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (pool-30-thread-1:) Cache local address is NodeI-38030, physical addresses are [127.0.0.1:7900]
...
// Requests the state from NodeJ-2368
2010-12-09 10:26:11,517 10251 INFO [org.infinispan.remoting.rpc.RpcManagerImpl] (pool-30-thread-1:) Trying to fetch state from NodeJ-2368
...
// NodeJ-2368 tries to generate state
2010-12-09 10:26:11,526 10260 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (STREAMING_STATE_TRANSFER-sender-1,Infinispan-Cluster,NodeJ-2368:) Received request to generate state for cache named '___defaultcache'. Attempting to generate state.
...
// NodeJ-2368 fails to generate the transaction log because it doesn't own the lock, and so stops the marshaller
2010-12-09 10:26:21,532 20266 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsDistSync] (STREAMING_STATE_TRANSFER-sender-1,Infinispan-Cluster,NodeJ-2368:) Did not own lock!
2010-12-09 10:26:21,533 20267 TRACE [org.infinispan.marshall.jboss.JBossMarshaller] (STREAMING_STATE_TRANSFER-sender-1,Infinispan-Cluster,NodeJ-2368:) Stop marshaller@7f0ab78a
2010-12-09 10:26:21,533 20267 ERROR [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (STREAMING_STATE_TRANSFER-sender-1,Infinispan-Cluster,NodeJ-2368:) Caught while responding to state transfer request
org.infinispan.statetransfer.StateTransferException: java.util.concurrent.TimeoutException: Could not obtain exclusive processing lock
at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:175)
at org.infinispan.remoting.InboundInvocationHandlerImpl.generateState(InboundInvocationHandlerImpl.java:119)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.getState(JGroupsTransport.java:586)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:691)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:772)
at org.jgroups.JChannel.up(JChannel.java:1465)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:954)
at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:478)
at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderHandler.process(STREAMING_STATE_TRANSFER.java:653)
at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderThreadSpawner$1.run(STREAMING_STATE_TRANSFER.java:582)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.util.concurrent.TimeoutException: Could not obtain exclusive processing lock
at org.infinispan.remoting.transport.jgroups.JGroupsDistSync.acquireProcessingLock(JGroupsDistSync.java:71)
at org.infinispan.statetransfer.StateTransferManagerImpl.generateTransactionLog(StateTransferManagerImpl.java:202)
at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:165)
... 12 more
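For context, the "Could not obtain exclusive processing lock" failure above comes from a
timed lock acquisition. A rough sketch of that pattern, illustrative only and not the
actual JGroupsDistSync code, looks like this:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Rough sketch of a timed exclusive-lock acquisition: if the write lock cannot be
// obtained within the timeout, a TimeoutException like the one above is thrown.
public final class ProcessingLock {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public void acquireExclusive(long timeout, TimeUnit unit) throws TimeoutException, InterruptedException {
        if (!lock.writeLock().tryLock(timeout, unit)) {
            throw new TimeoutException("Could not obtain exclusive processing lock");
        }
    }

    public void releaseExclusive() {
        lock.writeLock().unlock();
    }
}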
// This means that when NodeI-38030 tries to apply the transaction log, the data is not there, and an EOFException is rightfully thrown
2010-12-09 10:26:21,535 20269 TRACE [org.infinispan.statetransfer.StateTransferManagerImpl] (Incoming-2,Infinispan-Cluster,NodeI-38030:) Integrating transaction log
2010-12-09 10:26:21,535 20269 TRACE [org.infinispan.statetransfer.StateTransferManagerImpl] (Incoming-2,Infinispan-Cluster,NodeI-38030:) Applying commit log
2010-12-09 10:26:21,535 20269 TRACE [org.infinispan.marshall.VersionAwareMarshaller] (Incoming-2,Infinispan-Cluster,NodeI-38030:) Log exception reported
java.io.EOFException: Read past end of file
at org.jboss.marshalling.AbstractUnmarshaller.eofOnRead(AbstractUnmarshaller.java:184)
at org.jboss.marshalling.AbstractUnmarshaller.readUnsignedByteDirect(AbstractUnmarshaller.java:319)
at org.jboss.marshalling.AbstractUnmarshaller.readUnsignedByte(AbstractUnmarshaller.java:280)
at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:207)
at org.jboss.marshalling.AbstractUnmarshaller.readObject(AbstractUnmarshaller.java:85)
at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:175)
at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:184)
at org.infinispan.statetransfer.StateTransferManagerImpl.processCommitLog(StateTransferManagerImpl.java:228)
at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:250)
at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:320)
at org.infinispan.remoting.InboundInvocationHandlerImpl.applyState(InboundInvocationHandlerImpl.java:102)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.setState(JGroupsTransport.java:603)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:712)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:772)
at org.jgroups.JChannel.up(JChannel.java:1422)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:954)
at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:478)
at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:525)
at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:464)
at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:225)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:190)
at org.jgroups.protocols.FC.up(FC.java:483)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:888)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:615)
at org.jgroups.protocols.UNICAST.up(UNICAST.java:295)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:707)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
at org.jgroups.protocols.FD.up(FD.java:266)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:210)
at org.jgroups.protocols.Discovery.up(Discovery.java:292)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1093)
at org.jgroups.protocols.TP.access$100(TP.java:56)
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1633)
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1615)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
Bottom line: the EOFException in this case is justified, due to a failure on the sender
side to acquire the lock needed to send the transaction log. The code, as is, attempts a
retry, which is the right thing to do. I'll simply add some extra logging on the
JBossMarshaller side (something like the sketch below) so that in the future these types
of exceptions are easier to trace. I'll also add an FAQ entry to explain that sometimes
an EOFException is justified when reading state.
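This is an illustrative sketch only; the class and method names merely mirror the ones in
the stack traces above, and the logger setup is assumed:

import java.io.EOFException;
import java.io.ObjectInput;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of the extra logging described above: when unmarshalling hits an EOFException,
// log a hint that the stream may have ended legitimately (e.g. the state provider failed
// and stopped the marshaller). Not the actual JBossMarshaller code.
public final class LoggingUnmarshaller {
    private static final Logger log = Logger.getLogger(LoggingUnmarshaller.class.getName());

    public Object objectFromObjectStream(ObjectInput in) throws Exception {
        try {
            return in.readObject();
        } catch (EOFException e) {
            log.log(Level.FINE, "Stream ended while reading state; this can be legitimate "
                    + "if the provider failed to generate state and stopped the marshaller", e);
            throw e; // still propagate so the caller can decide whether to retry
        }
    }
}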
RemoteLockCleanupStressTest throws EOFException when reading state
------------------------------------------------------------------
Key: ISPN-826
URL: https://jira.jboss.org/browse/ISPN-826
Project: Infinispan
Issue Type: Bug
Components: Marshalling, State transfer
Affects Versions: 4.1.0.Final, 4.2.0.CR3
Reporter: Galder Zamarreño
Assignee: Galder Zamarreño
Priority: Blocker
Fix For: 4.2.0.CR4, 4.2.0.Final
Attachments: eof-ispn826-infinispan.log.zip
While doing some work to verify ISPN-244, I spotted that RemoteLockCleanupStressTest
throws an EOFException when reading state. Note that this is not related to ISPN-244,
because the issue is present in 4.2.x as well, where ISPN-244 has no bearing:
2010-12-08 18:16:25,945 13951 ERROR [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,Infinispan-Cluster,NodeC-2057:) Caught while requesting or applying state
org.infinispan.statetransfer.StateTransferException: java.io.EOFException: Read past end of file
at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:331)
at org.infinispan.remoting.InboundInvocationHandlerImpl.applyState(InboundInvocationHandlerImpl.java:102)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.setState(JGroupsTransport.java:598)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:712)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:772)
at org.jgroups.JChannel.up(JChannel.java:1422)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:954)
at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:478)
at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:525)
at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:464)
at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:225)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:190)
at org.jgroups.protocols.FC.up(FC.java:483)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:888)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:615)
at org.jgroups.protocols.UNICAST.up(UNICAST.java:295)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:707)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
at org.jgroups.protocols.FD.up(FD.java:266)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:210)
at org.jgroups.protocols.Discovery.up(Discovery.java:292)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1093)
at org.jgroups.protocols.TP.access$100(TP.java:56)
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1633)
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1615)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.io.EOFException: Read past end of file
at org.jboss.marshalling.AbstractUnmarshaller.eofOnRead(AbstractUnmarshaller.java:184)
at org.jboss.marshalling.AbstractUnmarshaller.readUnsignedByteDirect(AbstractUnmarshaller.java:319)
at org.jboss.marshalling.AbstractUnmarshaller.readUnsignedByte(AbstractUnmarshaller.java:280)
at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:207)
at org.jboss.marshalling.AbstractUnmarshaller.readObject(AbstractUnmarshaller.java:85)
at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:162)
at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:184)
at org.infinispan.statetransfer.StateTransferManagerImpl.processCommitLog(StateTransferManagerImpl.java:228)
at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:250)
at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:320)
... 29 more
Now, in spite of this issue, the test passes!!! (WTF?)
So, first things first, the test needs improving to send these issues back up. That
probably means switching to a Callable rather than a Runnable (see the sketch below), and
then seeing which exceptions we want to swallow and which ones we want to propagate.
Then I'll get going on figuring out the cause.
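Roughly the change I have in mind, sketched with a placeholder task body: submitting a
Callable and calling Future.get() makes worker-thread exceptions surface in the test
thread instead of being silently swallowed as they are with a fire-and-forget Runnable:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the suggested test change; the task body is a placeholder for whatever
// the stress test actually does against the cache.
public final class PropagatingTestTask {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            Callable<Void> task = () -> {
                // ... exercise the cache here; any exception escapes to the Future
                return null;
            };
            Future<Void> result = executor.submit(task);
            try {
                result.get(); // re-throws worker-thread failures as ExecutionException
            } catch (ExecutionException e) {
                // decide here which causes to swallow and which ones should fail the test
                throw e;
            }
        } finally {
            executor.shutdown();
        }
    }
}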