[infinispan-dev] Reproduce repl sync cache locking issue

Bela Ban bban at redhat.com
Thu Jul 18 09:57:59 EDT 2013



On 7/18/13 3:08 PM, Ray Tsang wrote:
> This is pretty weird.  I'm pretty certain the issue occurs with
> storeAsBinary set to false as well.


That's an unrelated issue I raised. Actually, setting storeAsBinary to 
false raises hell... see my comment on 
https://issues.jboss.org/browse/JGRP-1659!
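
For reference, the switch we're talking about, as a minimal programmatic 
sketch (Infinispan 5.x fluent API; the class name and repl-sync cache mode 
are just for illustration):

   import org.infinispan.configuration.cache.CacheMode;
   import org.infinispan.configuration.cache.Configuration;
   import org.infinispan.configuration.cache.ConfigurationBuilder;

   public class StoreAsBinaryConfig {
       public static void main(String[] args) {
           // Build a repl-sync cache config with storeAsBinary enabled;
           // flipping enable() to disable() is the setting that triggers
           // the JGRP-1659 behavior for me.
           Configuration cfg = new ConfigurationBuilder()
               .clustering().cacheMode(CacheMode.REPL_SYNC)
               .storeAsBinary().enable()
               .build();
           System.out.println(cfg);
       }
   }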


> On the real cluster it always deadlocks with thread dump pointing to
> MFC as well.

Which real cluster? I tried cluster02.mv.lab.eng.bos.redhat.com, and it 
worked like a charm... can you contact me on IRC (#jgroups @ 
irc.freenode.net) so we can look into this interactively?

> Bela, could I trouble you to increase the payload size to a max of 3MB?
> E.g. multiply randomSize by 60k.


And reduce the number of inserts? 100'000 entries generate ~470MB of 
data; 60 times more would blow up my heap.
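
Quick back-of-the-envelope: 470MB * 60 ≈ 28GB. Keeping the total at the 
current ~470MB would mean cutting the inserts by the same factor, from 
100'000 down to ca. 1'700 entries.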


> Thanks,
>
> On Jul 18, 2013, at 8:49, Mircea Markus <mmarkus at redhat.com> wrote:
>
>>
>> Thanks for raising the two issues, Bela; both are caused by the way storeAsBinary works. I think storeAsBinary is flawed as it currently stands - I've sent an email on -dev to ask for feedback on that.
>>
>> RE: MFC, I'm pretty sure there's a problem somewhere, as the system always deadlocks in that protocol on my machine. I'll try to narrow it down.
>>
>> On 18 Jul 2013, at 08:53, Bela Ban <bban at redhat.com> wrote:
>>
>>> I could not reproduce this: I ran the test *many times* with both
>>> Infinispan 5.3.0 (JGroups 3.3.1 and 3.3.3) and 5.2.4 (JGroups 3.2.7),
>>> both on my mac and my linux box, and was *never able to reproduce the
>>> blocking*.
>>>
>>> To be honest, this is what I expected, as MFC has run in production for
>>> a few years now and I have yet to receive a report on it locking up...
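>>>
>>> (For context: MFC is JGroups' multicast flow control protocol. Senders
>>> block when they run out of credits and resume when receivers replenish
>>> them, which is why a flow-control bug shows up as threads parked in
>>> MFC. The relevant knobs in the stack config are e.g.
>>>
>>>    <MFC max_credits="2M" min_threshold="0.4"/>
>>>
>>> in the JGroups XML; the values above are just the usual defaults.)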
>>>
>>> However, I did run into 2 Infinispan problems (ISPN 5.3.0 / JG 3.3.3),
>>> probably related:
>>>
>>> #1
>>> - Start nodes 1 and 2
>>> - Hit enter in node 1 to populate the cache, these modifications are
>>> replicated to node 2
>>> - 100'000 elements with a total of ca. 470MB of data are added; in a
>>> single node we use ca. 520MB of heap, which is fine considering there's
>>> some overhead
>>> - However, node 1 has 1.4 *GB* of data, and using jvisualvm we can see
>>> that we have *200'000* byte[] arrays instead of 100'000 !
>>> - Node 2 is fine, with ca. 520MB of heap used and 100'000 byte arrays
>>>
>>> #2
>>> - Start node 1
>>> - Populate node 1 with data, 100'000 elements with a total of ca. 520MB
>>> of heap
>>> - Start node 2
>>> - After the state transfer, node 2 has ca. 520MB of data, which is fine
>>> - However, node 1 has *1.4 GB of heap* !
>>> - We can see that node 1 holds *200'000* byte[] arrays instead of 100'000
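>>>
>>> (To double-check the counts outside jvisualvm, a heap histogram works
>>> as well: "jmap -histo <pid>" lists byte[] as the "[B" row, with the
>>> instance count and total bytes.)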
>>>
>>>
>>> Mircea and I looked at this yesterday; a possible culprit is
>>> MarshalledValue, and Mircea's looking into it. We believe the root
>>> cause for #1 and #2 is the same.
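>>>
>>> To illustrate the suspicion: if the value wrapper keeps both the
>>> serialized form and the deserialized instance alive at the same time,
>>> every byte[] value shows up twice in the heap histogram - exactly
>>> 200'000 arrays for 100'000 entries. A minimal sketch of that pattern
>>> (my own illustration, *not* the actual MarshalledValue code):
>>>
>>>    import java.io.ByteArrayOutputStream;
>>>    import java.io.IOException;
>>>    import java.io.ObjectOutputStream;
>>>
>>>    // Caches both representations; if 'instance' is itself a byte[],
>>>    // each cache entry pins two byte[] objects instead of one.
>>>    class DoubleBufferedValue {
>>>        private byte[] raw;       // serialized form
>>>        private Object instance;  // deserialized form
>>>
>>>        DoubleBufferedValue(Object instance) { this.instance = instance; }
>>>
>>>        byte[] serialized() throws IOException {
>>>            if (raw == null) {
>>>                ByteArrayOutputStream bos = new ByteArrayOutputStream();
>>>                ObjectOutputStream oos = new ObjectOutputStream(bos);
>>>                oos.writeObject(instance);
>>>                oos.close();
>>>                raw = bos.toByteArray(); // kept alongside 'instance' => ~2x memory
>>>            }
>>>            return raw;
>>>        }
>>>    }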
>>>
>>>
>>> On 7/17/13 12:46 PM, Mircea Markus wrote:
>>>> Thanks Ray!
>>>>
>>>> I think the issue is: https://issues.jboss.org/browse/JGRP-1659
>>>> Bela the test is attached to the JIRA.
>>>>
>>>> Ray, I think your harness can be pretty useful as a general-purpose tool for reporting issues; it's worth cleaning it up a bit, adding some docs and adding it to the infinispan repo. Wdyt?
>>>>
>>>> On 16 Jul 2013, at 20:11, Ray Tsang <rtsang at redhat.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Please see attached test.
>>>>>
>>>>> It generates random payloads of different sizes according to a distribution.
>>>>> At the client, the actual payload size ranges from 10k to 2MB.  However, this test only simulates payloads between 10 bytes and 50k bytes - and locking still occurs.
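>>>>>
>>>>> Roughly, the population step looks like this (illustrative sketch
>>>>> only, not the exact harness code; it uses a uniform size here where
>>>>> the harness draws from a distribution, and the method/key names are
>>>>> simplified):
>>>>>
>>>>>    import java.util.Random;
>>>>>    import org.infinispan.Cache;
>>>>>
>>>>>    // Insert numEntries random payloads of 10 bytes to 50k bytes each;
>>>>>    // every put is replicated synchronously to the other nodes.
>>>>>    static void populate(Cache<String, byte[]> cache, int numEntries) {
>>>>>        Random rnd = new Random();
>>>>>        for (int i = 0; i < numEntries; i++) {
>>>>>            int size = 10 + rnd.nextInt(50000 - 10 + 1);
>>>>>            byte[] payload = new byte[size];
>>>>>            rnd.nextBytes(payload);
>>>>>            cache.put("key-" + i, payload);
>>>>>        }
>>>>>    }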
>>>>>
>>>>> Do not run unit tests - those tests are for other things ;)  To run the actual test, do:
>>>>>
>>>>> mvn -e exec:exec -Dnode=n  // where n is the node number 1 to 4 etc.
>>>>>
>>>>> What I do is open 3 terminals/tabs/screens (whichever you prefer) and in each run:
>>>>> mvn -e exec:exec -Dnode=1
>>>>> mvn -e exec:exec -Dnode=2
>>>>> mvn -e exec:exec -Dnode=3
>>>>> ...
>>>>>
>>>>> It'll prompt you to press a key when ready.  Once you confirm the cluster has formed, press any key on all the nodes to continue.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> <replication-test.zip>
>>>>
>>>> Cheers,
>>>
>>> --
>>> Bela Ban, JGroups lead (http://www.jgroups.org)
>>
>> Cheers,
>> --
>> Mircea Markus
>> Infinispan lead (www.infinispan.org)
>>

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

