[infinispan-issues] [JBoss JIRA] (ISPN-3357) Insufficient owners with putIfAbsent during node join rebalance

Monday, 29 July 2013

    [ https://issues.jboss.org/browse/ISPN-3357?page=com.atlassian.jira.plugin.... ]

Dan Berindei commented on ISPN-3357:
------------------------------------

[~nekop] You didn't mention the transaction mode in the description, but from the logs
this looks like a non-tx cache.

[~mircea.markus] Right, a backup owner should not check the previous value for
conditional operations; only the primary owner should do it. The primary should also set
the {{ignorePreviousValue}} flag on the command before forwarding it to the backup owners.
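
A minimal sketch of that rule, assuming a toy model rather than Infinispan's real command and interceptor classes: {{ConditionalWriteSketch}}, {{PutIfAbsent}}, {{Node}}, {{applyAsPrimary}} and {{applyAsBackup}} are all hypothetical names, and each node's local map stands in for whatever previous-value lookup a node would perform.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Infinispan's real classes.
final class ConditionalWriteSketch {

    /** A putIfAbsent command; ignorePreviousValue mirrors the flag named in the comment. */
    static final class PutIfAbsent {
        final String key;
        final String value;
        boolean ignorePreviousValue;

        PutIfAbsent(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** One cache node with a local data container. */
    static final class Node {
        final Map<String, String> data = new HashMap<>();

        /** Backup path: skip the condition when the primary has already evaluated it. */
        boolean applyAsBackup(PutIfAbsent cmd) {
            if (!cmd.ignorePreviousValue && data.containsKey(cmd.key)) {
                return false; // the backup re-checked the condition and refused the write
            }
            data.put(cmd.key, cmd.value);
            return true;
        }

        /** Primary path: evaluate the condition once, then forward with the flag set. */
        boolean applyAsPrimary(PutIfAbsent cmd, List<Node> backups) {
            if (data.containsKey(cmd.key)) {
                return false; // condition failed on the primary, nothing is forwarded
            }
            data.put(cmd.key, cmd.value);
            cmd.ignorePreviousValue = true;
            for (Node backup : backups) {
                backup.applyAsBackup(cmd);
            }
            return true;
        }
    }

    public static void main(String[] args) {
        Node primary = new Node();
        Node backup = new Node();
        backup.data.put("k", "value-visible-to-backup");
        boolean applied = primary.applyAsPrimary(new PutIfAbsent("k", "v"), Arrays.asList(backup));
        System.out.println("applied=" + applied + ", backup holds " + backup.data.get("k"));
    }
}
```

In this model the backup writes the entry even though it can already see a value for the key, because the condition was decided on the primary.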

For non-tx caches, state transfer-related forwarding could also be done only from the
primary owner. The forwarding would still be done from the StateTransferInterceptor, so
without holding the primary lock and without guaranteeing consistency, but I don't
think we could lose updates either.

For tx caches, we already set the {{ignorePreviousValue}} flag before replicating the
command to the owners (via a PrepareCommand). But we reset it when we process the command
on the owners, and I don't think that's ok because the PrepareCommand could be
forwarded by StateTransferInterceptor from one of the owners, with
{{ignorePreviousValue == false}}.
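
A rough sketch of why the reset matters, using a toy model and not the real PrepareCommand or StateTransferInterceptor code. All names ({{ForwardedPrepareSketch}}, {{Write}}, {{Owner}}, {{process}}, {{newOwnerHasEntry}}) are hypothetical, and it assumes the reset happens while the old owner processes the command, before the same command is forwarded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a write processed by one owner and then forwarded to another.
final class ForwardedPrepareSketch {

    static final class Write {
        final String key;
        final String value;
        boolean ignorePreviousValue;

        Write(String key, String value, boolean ignorePreviousValue) {
            this.key = key;
            this.value = value;
            this.ignorePreviousValue = ignorePreviousValue;
        }
    }

    static final class Owner {
        final Map<String, String> data = new HashMap<>();

        /** Applies the write unless the condition is re-checked and a previous value is visible. */
        void process(Write w, boolean previousValueVisible, boolean resetFlagOnProcess) {
            if (w.ignorePreviousValue || !previousValueVisible) {
                data.put(w.key, w.value);
            }
            if (resetFlagOnProcess) {
                w.ignorePreviousValue = false; // assumed ordering: reset as part of processing
            }
        }
    }

    /** Returns true if the new owner ends up holding the key after a forwarded command. */
    static boolean newOwnerHasEntry(boolean resetFlagOnProcess) {
        Owner oldOwner = new Owner();
        Owner newOwner = new Owner();
        Write w = new Write("k", "v", true); // originator set the flag before replicating

        oldOwner.process(w, false, resetFlagOnProcess);
        // The old owner forwards the same command; the new owner can see the old owner's copy.
        newOwner.process(w, oldOwner.data.containsKey(w.key), resetFlagOnProcess);
        return newOwner.data.containsKey(w.key);
    }

    public static void main(String[] args) {
        System.out.println("flag reset on process: " + newOwnerHasEntry(true));
        System.out.println("flag preserved:        " + newOwnerHasEntry(false));
    }
}
```

In this model the new owner only stores the entry when the forwarded command still carries the flag.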

...
 Insufficient owners with putIfAbsent during node join rebalance
 ---------------------------------------------------------------

                 Key: ISPN-3357
                 URL: https://issues.jboss.org/browse/ISPN-3357
             Project: Infinispan
          Issue Type: Bug
          Components: Distributed Cache
    Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
            Reporter: Takayoshi Kimura
            Assignee: Dan Berindei
            Priority: Critical
         Attachments: 7c29bccb.log

 Here is the test scenario:
 * DIST numOwners=2; start with a 3-node cluster, then join 1 node during load
 * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000
entries total
 After the test run, the numberOfEntries values on each node are:
 * node1: 20074
 * node2: 19888
 * node3: 20114
 * node4: 18885
 The total is 78961, so 1039 entries are missing. There were no errors on the HotRod client
side, so 80000 entries (40000 entries x 2 owners) should be there.
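
 The arithmetic can be checked with a few lines of Java. This is only a sanity check of the figures reported above; the class and method names are made up for illustration.

```java
// Sanity check of the reported entry counts; names are illustrative only.
final class EntryCountCheck {

    /** Sum of the per-node numberOfEntries values. */
    static int total(int... perNode) {
        int sum = 0;
        for (int n : perNode) {
            sum += n;
        }
        return sum;
    }

    /** With numOwners copies of every entry, the cluster should hold entries * numOwners. */
    static int expected(int entries, int numOwners) {
        return entries * numOwners;
    }

    public static void main(String[] args) {
        int total = total(20074, 19888, 20114, 18885);
        int expected = expected(40000, 2);
        // prints: total=78961 expected=80000 missing=1039
        System.out.println("total=" + total + " expected=" + expected
                + " missing=" + (expected - total));
    }
}
```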
 Let's take a look at an example missing entry, hash(thread01key151) = 7c29bccb.
 Current CH: owners(7c29bccb) are [node1, node2]
 Pending CH: owners(7c29bccb) are [node1, node2, node4]
 Balanced CH: owners(7c29bccb) are [node1, node4]
 The sequence of events is:
 * hotrod -> node1
 * node1 -> node2, node4
 * node2 committed entry
 * node4 performed a clustered get before the write, got a value from node2, and will not
commit the entry because it thinks the entry was not changed/created
 * node1 committed entry
 * node2 invalidates the entry because it's no longer an owner
 As a result, the only owner of 7c29bccb is node1; node4 is missing. This entry may be lost
completely by further rebalances when node4 is the donor for this segment.
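
 The sequence above can be replayed in a small simulation. This is a sketch under simplifying assumptions: each node is just a map, the clustered get is modelled as a direct look at node2's map, and {{RebalanceRaceSketch}} and {{run}} are hypothetical names, not Infinispan code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Replays the reported event sequence on plain maps; not Infinispan code.
final class RebalanceRaceSketch {

    /** Returns the nodes still holding the key once the rebalance has finished. */
    static Set<String> run(boolean backupChecksPreviousValue) {
        Map<String, Map<String, String>> nodes = new LinkedHashMap<>();
        for (String name : Arrays.asList("node1", "node2", "node4")) {
            nodes.put(name, new HashMap<>());
        }
        String key = "thread01key151";
        String value = "v";

        // hotrod -> node1, node1 -> node2, node4: node2 commits first.
        nodes.get("node2").put(key, value);

        // node4 performs a clustered get before the write and sees node2's copy.
        boolean seenRemotely = nodes.get("node2").containsKey(key);
        if (!(backupChecksPreviousValue && seenRemotely)) {
            nodes.get("node4").put(key, value);
        }

        // node1 commits.
        nodes.get("node1").put(key, value);

        // Balanced CH is [node1, node4]: node2 is no longer an owner and invalidates.
        nodes.get("node2").remove(key);

        Set<String> holders = new TreeSet<>();
        for (Map.Entry<String, Map<String, String>> e : nodes.entrySet()) {
            if (e.getValue().containsKey(key)) {
                holders.add(e.getKey());
            }
        }
        return holders;
    }

    public static void main(String[] args) {
        System.out.println("backup re-checks condition: " + run(true));
        System.out.println("backup skips the check:     " + run(false));
    }
}
```

 With the check on node4 the key survives only on node1; without it, node1 and node4 both hold it, matching the balanced CH.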
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
