[infinispan-dev] Reproduce repl sync cache locking issue

Thu Jul 18 03:53:07 EDT 2013

I could not reproduce this: I ran the test *many times* with both 
Infinispan 5.3.0 (JGroups 3.3.1 and 3.3.3) and 5.2.4 (JGroups 3.2.7), 
on both my Mac and my Linux box, and was *never able to reproduce the 
blocking*.

To be honest, this is what I expected: MFC has been running in production 
for a few years now, and I have yet to receive a report of it locking up.

However, I did run into two Infinispan problems (ISPN 5.3.0 / JG 3.3.3), 
which are probably related:

#1
- Start nodes 1 and 2
- Hit enter in node 1 to populate the cache; these modifications are 
replicated to node 2
- 100'000 elements with a total of ca. 470MB of data are added; on a 
single node this uses ca. 520MB of heap, which is fine considering 
there's some overhead
- However, node 1 uses 1.4 *GB* of heap, and jvisualvm shows that it 
holds *200'000* byte[] arrays instead of 100'000!
- Node 2 is fine, with ca. 520MB of heap used and 100'000 byte[] arrays

#2
- Start node 1
- Populate node 1 with data: 100'000 elements using a total of ca. 520MB 
of heap
- Start node 2
- After the state transfer, node 2 uses ca. 520MB of heap, which is fine
- However, node 1 uses *1.4 GB of heap*!
- Node 1 holds *200'000* byte[] arrays instead of 100'000

Mircea and I looked at this yesterday; a possible culprit is 
MarshalledValue, and Mircea is looking into it. We believe #1 and #2 
have the same root cause.

On 7/17/13 12:46 PM, Mircea Markus wrote:
> Thanks Ray!
>
> I think the issue is: https://issues.jboss.org/browse/JGRP-1659
> Bela, the test is attached to the JIRA.
>
> Ray, I think your harness could be pretty useful as a general-purpose tool for reporting issues. It's worth cleaning it up a bit, documenting it and adding it to the infinispan repo. Wdyt?
>
> On 16 Jul 2013, at 20:11, Ray Tsang <rtsang at redhat.com> wrote:
>
>> Hi All,
>>
>> Please see attached test.
>>
>> It generates random payloads of different sizes according to a distribution.
>> On the client, the actual payload size ranges from 10k to 2mb. However, this test only simulates between 10 bytes and 50k bytes, and locking still occurs.
>>
>> Do not run unit tests - those tests are for other things ;)  To run the actual test, do:
>>
>> mvn -e exec:exec -Dnode=n  // where n is the node number 1 to 4 etc.
>>
>> What I do is open 3 terminals/tabs/screens (whichever you prefer) and run one of these in each:
>> mvn -e exec:exec -Dnode=1
>> mvn -e exec:exec -Dnode=2
>> mvn -e exec:exec -Dnode=3
>> ...
>>
>> It'll prompt you to press a key when ready. Once you've confirmed the cluster has formed, press any key on all the nodes to continue.
>>
>> Thanks,
>>
>> <replication-test.zip>
>
> Cheers,
>
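Ray's test, as described in the quoted message, generates random payloads whose sizes follow a distribution, ranging from 10 bytes to 50k bytes. Below is a minimal Java sketch of such a generator. It assumes a uniform size distribution and reads "50k" as 50 KiB. The distribution actually used in replication-test.zip isn't stated here, and PayloadGenerator is a hypothetical name, not a class from the attached test.

```java
import java.util.Random;

// Hypothetical sketch of a random payload generator, NOT the code from
// replication-test.zip. Sizes are drawn uniformly (an assumption).
class PayloadGenerator {
    static final int MIN_SIZE = 10;          // bytes, lower bound from the test description
    static final int MAX_SIZE = 50 * 1024;   // "50k bytes", assumed to mean 50 KiB

    private final Random random;

    PayloadGenerator(long seed) {
        // A fixed seed makes runs repeatable across nodes and attempts
        this.random = new Random(seed);
    }

    /** Returns a payload whose length is uniformly distributed in [MIN_SIZE, MAX_SIZE]. */
    byte[] next() {
        int size = MIN_SIZE + random.nextInt(MAX_SIZE - MIN_SIZE + 1);
        byte[] payload = new byte[size];
        random.nextBytes(payload);           // fill with random content
        return payload;
    }

    public static void main(String[] args) {
        PayloadGenerator gen = new PayloadGenerator(42L);
        int count = 1000;
        long total = 0;
        for (int i = 0; i < count; i++) {
            total += gen.next().length;
        }
        System.out.printf("%d payloads, %d bytes total, avg %.1f bytes%n",
                count, total, (double) total / count);
    }
}
```

Using a seeded generator means every run produces the same sequence of payload sizes, which makes heap comparisons between runs and nodes easier to interpret.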

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)