[infinispan-issues] [JBoss JIRA] (ISPN-3357) Insufficient owners with putIfAbsent during node join rebalance

Takayoshi Kimura (JIRA) jira-events at lists.jboss.org
Tue Jul 23 11:21:26 EDT 2013


     [ https://issues.jboss.org/browse/ISPN-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takayoshi Kimura updated ISPN-3357:
-----------------------------------

    Description: 
Here is the test scenario:

* DIST mode with numOwners=2; start with a 3-node cluster, then join 1 node during load
* HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total

After the test run, the numberOfEntries on each node are:

* node1: 20074
* node2: 19888
* node3: 20114
* node4: 18885

The total is 78961, so 1039 entries are missing. The HotRod client reported no errors, so 80000 entries (40000 entries x numOwners=2) should be there.
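The arithmetic behind these figures can be checked with a short sketch. It assumes, as the scenario implies, that each of the 40000 entries should end up on exactly numOwners=2 nodes; the variable names are illustrative only.

```python
# Per-node numberOfEntries reported after the test run.
entries_per_node = {"node1": 20074, "node2": 19888, "node3": 20114, "node4": 18885}

num_keys = 40000   # distinct entries written through HotRod
num_owners = 2     # DIST numOwners

# Assumption: with no client-side errors, every key has num_owners copies.
expected_total = num_keys * num_owners
actual_total = sum(entries_per_node.values())
missing = expected_total - actual_total

print(expected_total, actual_total, missing)  # 80000 78961 1039
```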

Let's take a look at an example of a missing entry, hash(thread01key151) = 7c29bccb.

Current CH: owners(7c29bccb) are [node1, node2]
Pending CH: owners(7c29bccb) are [node1, node2, node4]
Balanced CH: owners(7c29bccb) are [node1, node4]
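One way to read these three lists: in this example the pending owners are exactly the union of the current and balanced owners, so node2 is on its way out and node4 is on its way in. A minimal sketch, using plain Python lists as a stand-in for the consistent hash (this is not the Infinispan API, and the variable names are hypothetical):

```python
current_owners = ["node1", "node2"]
balanced_owners = ["node1", "node4"]

# Union that keeps the current owners first, then any newly added owners.
pending_owners = current_owners + [n for n in balanced_owners if n not in current_owners]

# Nodes that stop owning the key once the balanced CH is in effect.
losing = [n for n in pending_owners if n not in balanced_owners]
# Nodes that must receive the key during the rebalance.
gaining = [n for n in pending_owners if n not in current_owners]

print(pending_owners, losing, gaining)
# ['node1', 'node2', 'node4'] ['node2'] ['node4']
```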

The sequence of events is:

* hotrod -> node1
* node1 -> node2, node4
* node2 committed entry
* node4 performed a clustered get before the write, got a value from node2, and will not commit the entry because it thinks the entry was not changed/created
* node1 committed entry
* node2 invalidates the entry because it's no longer an owner

As a result, the entry for 7c29bccb is held only by node1; node4 is missing it. This entry may be completely lost by further rebalances when node4 is the donor for this segment.
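The sequence can be replayed with a toy model. This is only a sketch of the reported behaviour, not Infinispan code: the dictionaries stand in for each node's data container, and put_if_absent is a hypothetical helper that skips the commit whenever the key is already visible locally or through a clustered get.

```python
key, value = "thread01key151", "v"
stores = {"node1": {}, "node2": {}, "node4": {}}
balanced_owners = ["node1", "node4"]

def put_if_absent(node, seen_remotely=None):
    # Commit only if neither the local store nor a clustered get shows the key.
    if key in stores[node] or seen_remotely is not None:
        return False
    stores[node][key] = value
    return True

put_if_absent("node2")                         # node2 committed entry
remote = stores["node2"].get(key)              # node4's clustered get hits node2
put_if_absent("node4", seen_remotely=remote)   # node4 skips the commit
put_if_absent("node1")                         # node1 committed entry

# Rebalance finishes: nodes outside the balanced CH invalidate the entry.
for node in stores:
    if node not in balanced_owners:
        stores[node].pop(key, None)

holders = [n for n, s in stores.items() if key in s]
missing_owners = [n for n in balanced_owners if n not in holders]
print(holders, missing_owners)  # ['node1'] ['node4']
```

In this model the entry survives on a single node even though numOwners=2, which matches the per-node counts reported above.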


  was:
Here is test scenario:

* DIST numOwners=2, start with 3 nodes cluster then join 1 node during load
* HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entrties total

After the test run, the numberOfEntries on each node are:

* node1: 20074
* node2: 19888
* node3: 20114
* node4: 18885

Total is 78961, 1039 entries are missing. No error on HotRod client side so 80000 entries should be there.

Let's take a look at example missing entry, hash(thread01key151) = 7c29bccb.

Current CH: owners(7c29bccb) are [node1, node2]
Pending CH: owners(7c29bccb) are [node1, node2, node4]
Balanced CH: owners(7c29bccb) are [node1, node4]

The events sequence is:

* hotrod -> node1
* node1 -> node2, node4
* node2 committed entry
* node4 performed clustered get before write, got a value from node2 and will not commit the entry because this node thinks it's not changed/created
* node1 committed entry
* node2 invalidates the entry because it's no longer an owner

Result owners(7c29bccb) are only node1 and node4 is missing. This entry may be completely lost by further rebalances when node4 is donor for this segment.



    
> Insufficient owners with putIfAbsent during node join rebalance
> ---------------------------------------------------------------
>
>                 Key: ISPN-3357
>                 URL: https://issues.jboss.org/browse/ISPN-3357
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Distributed Cache
>    Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
>            Reporter: Takayoshi Kimura
>            Assignee: Dan Berindei
