[infinispan-dev] Alert from a failing test

Dan Berindei dan.berindei at gmail.com
Tue Jun 28 12:58:58 EDT 2011


On Tue, Jun 28, 2011 at 2:05 PM, Galder Zamarreño <galder at redhat.com> wrote:
> Some comments below:
>
> On Jun 21, 2011, at 10:26 AM, Dan Berindei wrote:
>
>> On Mon, Jun 20, 2011 at 11:42 PM, Sanne Grinovero <sanne at infinispan.org> wrote:
>>> 2011/6/20 Manik Surtani <manik at jboss.org>:
>>>> Oddly enough, I don't see any other tests exhibiting this behaviour.  Let me know if you see it in more recent CI runs, and we'll investigate in detail.
>>>
>>> In fact there aren't many tests in core that verify a full stream is
>>> received; but, as I mentioned in another thread, I was seeing the
>>> following exception relatively often (it never caught my attention
>>> some months ago):
>>>
>>> Caused by: java.io.EOFException: The stream ended unexpectedly.
>>> Please check whether the source of the stream encountered any issues
>>> generating the stream.
>>>        at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:193)
>>>        at org.infinispan.statetransfer.StateTransferManagerImpl.processCommitLog(StateTransferManagerImpl.java:218)
>>>        at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:245)
>>>        at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:284)
>>>        ... 27 more
>>> Caused by: java.io.EOFException: Read past end of file
>>>        at org.jboss.marshalling.SimpleDataInput.eofOnRead(SimpleDataInput.java:126)
>>>        at org.jboss.marshalling.SimpleDataInput.readUnsignedByteDirect(SimpleDataInput.java:263)
>>>        at org.jboss.marshalling.SimpleDataInput.readUnsignedByte(SimpleDataInput.java:224)
>>>        at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
>>>        at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
>>>        at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:191)
>>>        at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:191)
>>>        ... 30 more
>>>
>>
>> The line "at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)"
>> suggests that EOF was reached while reading the lead byte of the next
>> object, not in the middle of a partial object. This is consistent with
>> StateTransferManagerImpl.generateTransactionLog() getting a timeout
>> while trying to acquire the processing lock (at
>> StateTransferManagerImpl.java:192) and closing the stream in the
>> middle of the transaction log, rather than with a transmission error.
>> We could probably get rid of the exception in the logs by inserting
>> another delimiter here.
>
> You mean inserting a particular delimiter in case of a timeout acquiring the processing lock? That sounds like a good idea. I'm just back from holidays and can't remember well, but were the previous EOFs related to processing lock acquisition? If so, your idea makes even more sense, because it would fall into a known problem and could give the receiver side a bit more information.
>

I'm pretty sure all the EOFs we got are related to processing lock
acquisition, yes.
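
Just to make the delimiter idea concrete, roughly something like the
sketch below is what I had in mind. None of this is the actual
StateTransferManagerImpl code; the names (TX_LOG_TIMEOUT,
writeTransactionLog, readTransactionLog) are made up for illustration:

import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TxLogDelimiterSketch {
   private static final byte ENTRY = 1;          // a logged modification follows
   private static final byte END_OF_LOG = 2;     // normal end-of-log delimiter
   private static final byte TX_LOG_TIMEOUT = 3; // new: sender gave up on the lock

   private final ReentrantReadWriteLock processingLock = new ReentrantReadWriteLock();

   // Sender side: instead of just closing the stream on a lock timeout,
   // write an explicit marker so the receiver can stop cleanly.
   void writeTransactionLog(ObjectOutputStream out, long timeoutMillis) throws Exception {
      if (!processingLock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
         out.writeByte(TX_LOG_TIMEOUT);
         out.flush();
         return;
      }
      try {
         // write any remaining modifications, each preceded by ENTRY...
         out.writeByte(END_OF_LOG);
         out.flush();
      } finally {
         processingLock.writeLock().unlock();
      }
   }

   // Receiver side: a TX_LOG_TIMEOUT marker becomes a meaningful error
   // instead of an EOFException from the marshaller.
   void readTransactionLog(ObjectInputStream in) throws Exception {
      while (true) {
         byte marker = in.readByte();
         if (marker == END_OF_LOG)
            return;
         if (marker == TX_LOG_TIMEOUT)
            throw new IllegalStateException("Sender timed out acquiring the processing lock");
         apply(in.readObject()); // marker == ENTRY
      }
   }

   private void apply(Object entry) { /* apply a logged modification */ }
}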

>>
>> Back to the original problem: if this were a stream corruption issue
>> I'd expect many more deserialization errors, because the length prefix
>> of a buffer would be smaller or larger than the number of bytes
>> following it, and the next object deserialized from the stream would
>> find garbage instead.
>>
>> This looks to me more like an index segment was created with size x
>> on node A and on node B, then updated with size y > x on node A, but
>> only the metadata reached node B; the segment's byte array remained
>> the same.
>>
>> I don't know anything about the Lucene directory implementation yet,
>> so I have no idea if/how this could happen and I haven't been able to
>> reproduce it on my machine. Is there a way to see the Jenkins test
>> logs?
>
> There's the console log in http://goo.gl/JFh5R but if this happens relatively often in the Lucene dir impl, we could create a small Jenkins job and pass a -Dlog4j.configuration that enables TRACE logging on the console. The testsuite is small and would not generate a lot of logging. There's always a chance of not encountering the issue when TRACE is enabled, particularly if it's a race condition, but I think it's worth doing. IOW, I can set it up.
>

It would be great if you could set it up. It's not happening on every
run though, so is it possible to configure Jenkins to repeat the job
until it gets a failure?
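
For what it's worth, something along the lines of the config below is
what I'd start with (assuming the testsuite still picks up a plain
log4j 1.2 properties file; the TRACE categories are only my guess at the
relevant packages):

log4j.rootLogger=INFO, CONSOLE
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p [%c{1}] (%t) %m%n
# TRACE only where the failure seems to originate, to keep the console log manageable
log4j.logger.org.infinispan.statetransfer=TRACE
log4j.logger.org.infinispan.marshall=TRACE
log4j.logger.org.infinispan.lucene=TRACE

and then run the suite with something like
mvn test -Dlog4j.configuration=file:/path/to/log4j-trace.properties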

>>
>> Dan
>>
>>
>>> This looks like a suspicious correlation to me, as I think the
>>> reported errors are similar in nature.
>>>
>>> Cheers,
>>> Sanne
>>>
>>>
>>>
>>>>
>>>> On 18 Jun 2011, at 20:18, Sanne Grinovero wrote:
>>>>
>>>>> Hello all,
>>>>> I'm not in a position to fully debug the issue this week, but even
>>>>> though this failure happens in the Lucene Directory, it looks like
>>>>> it's reporting an issue with Infinispan core:
>>>>>
>>>>> https://infinispan.ci.cloudbees.com/job/Infinispan-master-JDK6-tcp/90/org.infinispan$infinispan-lucene-directory/testReport/junit/org.infinispan.lucene/SimpleLuceneTest/org_infinispan_lucene_SimpleLuceneTest_testIndexWritingAndFinding/
>>>>>
>>>>> In this test we're writing to the index, and then asserting on the
>>>>> expected state on both nodes, but while it is successful on the same
>>>>> node as the writer, it fails with
>>>>> "java.io.IOException: Read past EOF" on the second node.
>>>>>
>>>>> This exception can mean only one thing: the value, a byte[] buffer,
>>>>> was not completely transferred to the second node, which seems quite
>>>>> critical as the caches are using sync.
>>>>> I can't reproduce the error locally, but it's not the first time it
>>>>> has been reported by CI: builds 60, 62 and 65, for example (and
>>>>> more), show the same testcase failing in the same manner.
>>>>>
>>>>> Cheers,
>>>>> Sanne
>>>>
>>>> --
>>>> Manik Surtani
>>>> manik at jboss.org
>>>> twitter.com/maniksurtani
>>>>
>>>> Lead, Infinispan
>>>> http://www.infinispan.org
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
> --
> Galder Zamarreño
> Sr. Software Engineer
> Infinispan, JBoss Cache
>
>
>


