We may have a deadlock because we hold the processing lock (for
reading) while invoking commands remotely (in JGroupsTransport). The
remote commands might block waiting for the remote cache to start, and
the remote cache won't start because it is waiting for this cache to
acquire the processing lock (for writing) and send the state.
Do we really need to hold the processing lock while invoking remote commands?
Yes, since this processing lock is what keeps further commands from being handled while a
rehash is in progress or state is being generated. Otherwise a node's in-memory state
becomes a moving target, and generating state to send to a neighbour node becomes an
issue.
Some of this is duplicated by the TransactionLogger we use in DIST, so maybe for DIST this
is no longer necessary, but it does need to be considered carefully before removing.
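
To make the lock interaction concrete, here is a minimal sketch in plain Java. It
assumes the processing lock behaves like a bare ReentrantReadWriteLock; the class and
method names are hypothetical and are not the JGroupsDistSync API. It only shows that a
timed write-lock acquire fails for as long as another thread holds the read lock, which
is the "Read locks = 1 ... Unlocked" state reported in the trace quoted below.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the processing lock; NOT the real JGroupsDistSync.
class ProcessingLockSketch {
    private final ReentrantReadWriteLock processingLock = new ReentrantReadWriteLock();

    // Runs a "state sender" thread that tries to take the write lock with a timeout.
    // If holdReadLock is true, the calling thread holds the read lock meanwhile,
    // the way a command thread would while blocked in a remote invocation.
    boolean stateSenderGetsWriteLock(boolean holdReadLock, long timeoutMillis) {
        final boolean[] acquired = new boolean[1];
        Thread stateSender = new Thread(() -> {
            try {
                if (processingLock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    acquired[0] = true;
                    processingLock.writeLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        if (holdReadLock) {
            processingLock.readLock().lock();
        }
        try {
            stateSender.start();
            stateSender.join(); // the read lock stays held for the whole attempt
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            if (holdReadLock) {
                processingLock.readLock().unlock();
            }
        }
        return acquired[0];
    }
}
```

With the read lock released, the same timed acquire succeeds, so what matters is how
long the read lock is held across the remote call.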
Cheers
Manik
Dan
On Fri, Jun 10, 2011 at 12:42 PM, Sanne Grinovero <sanne(a)infinispan.org> wrote:
> 2011/6/10 Galder Zamarreño <galder(a)redhat.com>:
>>
>> On Jun 9, 2011, at 5:47 PM, Manik Surtani wrote:
>>
>>> +1 to writing the error marker to the stream. At least prevent false alarms.
>>>
>>> Re: unit testing our externalizers, Galder, any thoughts there?
>>
>> The debugging done by Sanne/Dan seems to be correct.
>>
>> EOFException is simply saying: "hey, i'm expecting all these bytes but
>> the stream finished before I could read them all"
>>
>> This generally means that the side generating the stream encountered an
>> issue, and that's precisely what happens on the generation side.
>>
>> The receiver side cannot do much here other than say: "hey, i don't have
>> all the bytes" - and that's precisely what the EOFException is doing.
>>
>> I think the error marker could be complicated to implement (i.e. imagine
>> expecting to read a byte and instead getting an ERROR marker). What would
>> be much easier to do is for VersionAwareMarshaller or
>> GenericJBossMarshaller to provide more hints about what's wrong. So, they
>> could hide the inner details of the EOFException and launder it into
>> something that's clearer to the user:
>>
>> "The stream ended unexpectedly, please check for any errors where the
>> stream was generated"
>>
>> The right exception here is still an EOFException.
>>
>> That's all the receiver side can do.
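
On the laundering idea, a minimal sketch of what that could look like, assuming a plain
DataInputStream rather than the real marshaller classes. EofLaunderingSketch and
readLongWithHint are hypothetical names, not VersionAwareMarshaller or
GenericJBossMarshaller API. The point is only that the exception stays an EOFException
while its message points the user at the sending side:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical helper for illustration; not part of any Infinispan marshaller.
class EofLaunderingSketch {
    static final String HINT =
        "The stream ended unexpectedly, please check for any errors where the stream was generated";

    // Reads a long and, if the stream ends early, rethrows an EOFException
    // with a clearer message while keeping the original one as the cause.
    static long readLongWithHint(DataInputStream in) throws IOException {
        try {
            return in.readLong();
        } catch (EOFException e) {
            EOFException clearer = new EOFException(HINT);
            clearer.initCause(e);
            throw clearer;
        }
    }

    // Feeds a truncated stream (3 bytes instead of 8) and returns the message seen.
    static String messageForTruncatedStream() {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(new byte[3]));
        try {
            readLongWithHint(in);
            return null; // not reached for a 3-byte stream
        } catch (IOException e) {
            return e.getMessage();
        }
    }
}
```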
>
> Agreed that's reasonable on the receiver side, and of course we can't
> control the network so while we can't always prevent it, can we still
> try to not send incomplete streams from the sending side?
>
> Sanne
>
>>
>>>
>>> Sent from my mobile phone
>>>
>>> On 9 Jun 2011, at 16:24, Dan Berindei <dan.berindei(a)gmail.com> wrote:
>>>
>>>> I don't think it's an externalizer issue, as I also see some
>>>> exceptions on the node that generates state:
>>>>
>>>> 2011-06-09 18:16:18,250 ERROR
>>>> [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
>>>> (STREAMING_STATE_TRANSFER-sender-1,Infinispan-Cluster,NodeA-57902)
>>>> ISPN00095: Caught while responding to state transfer request
>>>> org.infinispan.statetransfer.StateTransferException:
>>>> java.util.concurrent.TimeoutException:
>>>> STREAMING_STATE_TRANSFER-sender-1,Infinispan-Cluster,NodeA-57902 could
>>>> not obtain exclusive processing lock after 10 seconds. Locks in
>>>> question are
>>>> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock@a35c90[Read
>>>> locks = 1] and
>>>> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock@111fb7f[Unlocked]
>>>> at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:177)
>>>> at org.infinispan.remoting.InboundInvocationHandlerImpl.generateState(InboundInvocationHandlerImpl.java:248)
>>>> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.getState(JGroupsTransport.java:585)
>>>> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:690)
>>>> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:771)
>>>> at org.jgroups.JChannel.up(JChannel.java:1484)
>>>> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1074)
>>>> at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderHandler.process(STREAMING_STATE_TRANSFER.java:651)
>>>> at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderThreadSpawner$1.run(STREAMING_STATE_TRANSFER.java:580)
>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>> at java.lang.Thread.run(Thread.java:636)
>>>> Caused by: java.util.concurrent.TimeoutException:
>>>> STREAMING_STATE_TRANSFER-sender-1,Infinispan-Cluster,NodeA-57902 could
>>>> not obtain exclusive processing lock after 10 seconds. Locks in
>>>> question are
>>>> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock@a35c90[Read
>>>> locks = 1] and
>>>> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock@111fb7f[Unlocked]
>>>> at org.infinispan.remoting.transport.jgroups.JGroupsDistSync.acquireProcessingLock(JGroupsDistSync.java:100)
>>>> at org.infinispan.statetransfer.StateTransferManagerImpl.generateTransactionLog(StateTransferManagerImpl.java:204)
>>>> at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:167)
>>>> ... 11 more
>>>>
>>>> I guess we could write an error marker in the stream to prevent the
>>>> EOFException on the receiving side, but the end result would be the
>>>> same.
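
For reference, a rough sketch of the error-marker idea, under the assumption that the
state is written as a sequence of entries, each preceded by a one-byte marker. The
marker values and the ErrorMarkerSketch class are made up for illustration and are not
Infinispan's wire format. The sender writes ERROR instead of just ending the stream, so
the receiver can tell a sender-side failure apart from a truncated stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical framing for illustration only; not the real state transfer format.
class ErrorMarkerSketch {
    static final byte END = 0, ENTRY = 1, ERROR = 2;

    // Writes each entry behind an ENTRY marker. If failAtIndex is a valid index,
    // simulates a sender failure there by writing ERROR and stopping.
    static byte[] writeState(int[] entries, int failAtIndex) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            for (int i = 0; i < entries.length; i++) {
                if (i == failAtIndex) {
                    out.writeByte(ERROR);
                    out.flush();
                    return bytes.toByteArray();
                }
                out.writeByte(ENTRY);
                out.writeInt(entries[i]);
            }
            out.writeByte(END);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads entries until END or ERROR and reports how the stream finished.
    static String readState(byte[] data) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int count = 0;
        try {
            while (true) {
                byte marker = in.readByte();
                if (marker == END) return "complete:" + count;
                if (marker == ERROR) return "sender-error:" + count;
                in.readInt();
                count++;
            }
        } catch (EOFException e) {
            return "truncated:" + count; // stream ended without END or ERROR
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This only covers failures at entry boundaries; a failure in the middle of writing an
entry is not handled by this sketch.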
>>>>
>>>> Dan
>>>>
>>>>
>>>> On Thu, Jun 9, 2011 at 5:58 PM, Sanne Grinovero <sanne(a)infinispan.org> wrote:
>>>>> Hello all,
>>>>> if I happen to look at the console while the tests are running, I see
>>>>> this exception pop up very often:
>>>>>
>>>>> 2011-06-09 15:32:18,092 ERROR [JGroupsTransport]
>>>>> (Incoming-1,Infinispan-Cluster,NodeB-32230) ISPN00096: Caught while
>>>>> requesting or applying state
>>>>> org.infinispan.statetransfer.StateTransferException:
>>>>> java.io.EOFException: Read past end of file
>>>>> at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:333)
>>>>> at org.infinispan.remoting.InboundInvocationHandlerImpl.applyState(InboundInvocationHandlerImpl.java:230)
>>>>> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.setState(JGroupsTransport.java:602)
>>>>> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:711)
>>>>> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:771)
>>>>> at org.jgroups.JChannel.up(JChannel.java:1441)
>>>>> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1074)
>>>>> at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:523)
>>>>> at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:462)
>>>>> at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:223)
>>>>> at org.jgroups.protocols.FRAG2.up(FRAG2.java:189)
>>>>> at org.jgroups.protocols.FC.up(FC.java:479)
>>>>> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:891)
>>>>> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:246)
>>>>> at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:613)
>>>>> at org.jgroups.protocols.UNICAST.up(UNICAST.java:294)
>>>>> at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:703)
>>>>> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:133)
>>>>> at org.jgroups.protocols.FD.up(FD.java:275)
>>>>> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:275)
>>>>> at org.jgroups.protocols.MERGE2.up(MERGE2.java:209)
>>>>> at org.jgroups.protocols.Discovery.up(Discovery.java:291)
>>>>> at org.jgroups.protocols.TP.passMessageUp(TP.java:1102)
>>>>> at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1658)
>>>>> at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1640)
>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>> at java.lang.Thread.run(Thread.java:662)
>>>>> Caused by: java.io.EOFException: Read past end of file
>>>>> at org.jboss.marshalling.SimpleDataInput.eofOnRead(SimpleDataInput.java:126)
>>>>> at org.jboss.marshalling.SimpleDataInput.readUnsignedByteDirect(SimpleDataInput.java:263)
>>>>> at org.jboss.marshalling.SimpleDataInput.readUnsignedByte(SimpleDataInput.java:224)
>>>>> at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
>>>>> at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
>>>>> at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:192)
>>>>> at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:190)
>>>>> at org.infinispan.statetransfer.StateTransferManagerImpl.processCommitLog(StateTransferManagerImpl.java:230)
>>>>> at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:252)
>>>>> at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:322)
>>>>> ... 27 more
>>>>>
>>>>> But I'm not sure if it's an issue, as it seems tests are not failing.
>>>>> I consider a "Read past end of file" quite suspicious-looking; would
>>>>> it be possible that some internal Externalizer is writing
>>>>> fewer bytes than what it's attempting to read?
>>>>> Is there something clever I could do to understand which object the
>>>>> marshaller is trying to read when something like this is happening?
>>>>> I've found debugging this quite hard.
>>>>>
>>>>> Also, it doesn't look like our externalizers have good test
>>>>> coverage. They are likely implicitly tested, as I assume that nothing
>>>>> would work if they weren't, but still it looks like we have no explicit
>>>>> tests for them?
>>>>>
>>>>> Cheers,
>>>>> Sanne
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev(a)lists.jboss.org
>>>>>
https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>
>>
>> --
>> Galder Zamarreño
>> Sr. Software Engineer
>> Infinispan, JBoss Cache
>>