[infinispan-issues] [JBoss JIRA] (ISPN-3366) Data loss when entry forwarding to primary owner and primary owner shutdown

Monday, 5 August 2013

    [ https://issues.jboss.org/browse/ISPN-3366?page=com.atlassian.jira.plugin.... ]

Dan Berindei commented on ISPN-3366:
------------------------------------

Looks like the remaining data loss has a slightly different cause: a node can become
primary owner while a command is executing. 

So we could have {{isPrimaryOwner(k) == false}} in EntryWrappingInterceptor, and
{{isPrimaryOwner(k) == true}} by the time we enter
NonTx[Concurrent]DistributionInterceptor:

{noformat}
11:23:12,876 TRACE [NonTransactionalLockingInterceptor] (HotRodServerWorker-9) Are (node2/clustered(s1)) we the lock owners for key 'ByteArrayKey{data=ByteArray{size=16, hashCode=4d12e1a9, array=0x033e0d74687265616431306b65793539}}'? false
11:23:12,876 TRACE [EntryWrappingInterceptor] (HotRodServerWorker-9) Wrapping entry 'ByteArrayKey{data=ByteArray{size=16, hashCode=4d12e1a9, array=0x033e0d74687265616431306b65793539}}'? false
11:23:12,882 TRACE [NonTxDistributionInterceptor] (HotRodServerWorker-9) Not doing a remote get for key ByteArrayKey{data=ByteArray{size=16, hashCode=4d12e1a9, array=0x033e0d74687265616431306b65793539}} since entry is not affected by rehash or is already in data container. We are node2/clustered(s1), owners are [node4/clustered(s1), node2/clustered(s1)]
11:23:12,886 TRACE [NonTxConcurrentDistributionInterceptor] (HotRodServerWorker-9) I'm the primary owner, sending the command to all ([node2/clustered(s1)]) the recipients in order to be applied.
{noformat}

Because the key was not wrapped on the originator, it is never written to the originator's
data container; and because the originator now acts as the primary owner itself, the
command isn't looped back to it from a primary owner either.
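A minimal sketch of the race described above, under a drastically simplified model: one key, one node (node2) with a mutable owner list whose first element is the primary owner, and two steps that stand in for EntryWrappingInterceptor and NonTx[Concurrent]DistributionInterceptor. The class PrimaryOwnerRace and its methods are hypothetical names, not Infinispan APIs; the loop-back from a remote primary is modeled as a direct local write.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified model of the race in ISPN-3366.
// None of these names are Infinispan APIs.
public class PrimaryOwnerRace {
    private final String self;
    private final Map<String, String> dataContainer = new HashMap<>();
    // Owners of the (single) key in the current topology; index 0 is the primary owner.
    private List<String> owners;

    public PrimaryOwnerRace(String self, List<String> initialOwners) {
        this.self = self;
        this.owners = initialOwners;
    }

    private boolean isPrimaryOwner() {
        return owners.get(0).equals(self);
    }

    // Executes a put on this node. The topology switches to ownersAfterChange
    // between the wrapping step and the distribution step, as in the log above.
    public void put(String key, String value, List<String> ownersAfterChange) {
        // Step 1 (like EntryWrappingInterceptor in this model): only the
        // primary owner wraps the entry up front.
        String wrappedValue = isPrimaryOwner() ? value : null;

        // The old primary owner (node4) leaves while the command is executing.
        owners = ownersAfterChange;

        // Step 2 (like NonTx[Concurrent]DistributionInterceptor): ownership is checked again.
        if (isPrimaryOwner()) {
            // "I'm the primary owner": commit whatever was wrapped locally.
            // If nothing was wrapped in step 1, nothing is written.
            if (wrappedValue != null) {
                dataContainer.put(key, wrappedValue);
            }
        } else {
            // Forwarding path: a remote primary would apply the write and loop it
            // back to us; modeled here as a direct local write.
            dataContainer.put(key, value);
        }
    }

    public String get(String key) {
        return dataContainer.get(key);
    }

    public static void main(String[] args) {
        PrimaryOwnerRace node2 = new PrimaryOwnerRace("node2", List.of("node4", "node2"));
        node2.put("k", "v", List.of("node2"));
        System.out.println("node2 stores: " + node2.get("k")); // prints "node2 stores: null"
    }
}
```

When the topology does not change, node2 takes the forwarding path and the value is stored; the value is lost only when the two ownership checks disagree.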

A possible solution would be to check the topology id of the command against the current
topology id before sending it, and throw an OutdatedTopologyException to cause
StateTransferInterceptor to retry the command.
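The proposed fix can be sketched as follows, assuming a toy model: each command records the topology id it was started in, the distribution step compares it with the current id before sending, and a retry loop plays the role of StateTransferInterceptor by re-running the whole command (wrapping included) under the new id. OutdatedTopologyException here is a local stand-in named after Infinispan's exception; TopologyCheckSketch, WriteCommand, sendToOwners, invokeWithRetry and the retry cap are hypothetical, not Infinispan APIs.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the proposed fix; not Infinispan code.
public class TopologyCheckSketch {

    // Local stand-in named after Infinispan's OutdatedTopologyException.
    public static class OutdatedTopologyException extends RuntimeException {
        public OutdatedTopologyException(String message) {
            super(message);
        }
    }

    // Hypothetical command that records the topology id it was (re)started in.
    public static class WriteCommand {
        private int topologyId;

        public WriteCommand(int topologyId) {
            this.topologyId = topologyId;
        }
    }

    private static final int MAX_ATTEMPTS = 10; // assumed cap, for the sketch only
    private final AtomicInteger currentTopologyId;

    public TopologyCheckSketch(AtomicInteger currentTopologyId) {
        this.currentTopologyId = currentTopologyId;
    }

    // Distribution step: refuse to send a command whose ownership decisions
    // (e.g. whether the entry was wrapped) were made in an older topology.
    private void sendToOwners(WriteCommand cmd) {
        int current = currentTopologyId.get();
        if (cmd.topologyId != current) {
            throw new OutdatedTopologyException(
                    "command topology " + cmd.topologyId + " != current topology " + current);
        }
        // Replication to the owners would happen here; omitted in this sketch.
    }

    // Retry loop in the spirit of StateTransferInterceptor: on an outdated
    // topology, re-run the whole command with the new topology id.
    // Returns the number of attempts needed.
    public int invokeWithRetry(WriteCommand cmd, Runnable wrapStep) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                wrapStep.run(); // the topology may change during this step
                sendToOwners(cmd);
                return attempt;
            } catch (OutdatedTopologyException e) {
                cmd.topologyId = currentTopologyId.get();
            }
        }
        throw new IllegalStateException("topology kept changing; gave up after " + MAX_ATTEMPTS + " attempts");
    }

    public static void main(String[] args) {
        AtomicInteger topologyId = new AtomicInteger(5);
        TopologyCheckSketch sketch = new TopologyCheckSketch(topologyId);
        boolean[] nodeLeft = {false};
        int attempts = sketch.invokeWithRetry(new WriteCommand(5), () -> {
            // Simulate node4 leaving during the first attempt only.
            if (!nodeLeft[0]) {
                nodeLeft[0] = true;
                topologyId.incrementAndGet();
            }
        });
        System.out.println("attempts = " + attempts); // prints "attempts = 2"
    }
}
```

The check is placed just before sending because that is where a stale ownership decision would otherwise turn into a silent local no-op; retrying the whole command lets the wrapping step see the new topology.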

...
 Data loss when entry forwarding to primary owner and primary owner shutdown
 ---------------------------------------------------------------------------

                 Key: ISPN-3366
                 URL: https://issues.jboss.org/browse/ISPN-3366
             Project: Infinispan
          Issue Type: Bug
          Components: Distributed Cache
    Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
            Reporter: Takayoshi Kimura
            Assignee: Dan Berindei
            Priority: Critical
             Fix For: 5.2.8.Final, 6.0.0.Alpha3, 6.0.0.Final

         Attachments: ISPN-3366-full-logs-3rd.zip, ISPN-3366-logs.zip

   Looks like a problem in entry forwarding.
 Here is the test scenario:
 * DIST numOwners=2; start with a 4-node cluster, then shut down 1 node normally during the load
 * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
 After the test run, the numberOfEntries values on the nodes are:
 * node1: 26608
 * node2: 26622
 * node3: 26746
 * node4: 0
 Total is 79976 and the HotRod client received 11 errors, so 79976 + (11 * 2) = 79998. With numOwners=2, the 40000 entries should yield 40000 * 2 = 80000 copies, so 2 copies, i.e. 1 entry, are completely missing.
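 The arithmetic above can be checked with a few lines of Java, assuming, as the reasoning in the report does, that each failed putIfAbsent accounts for numOwners missing copies. EntryCountCheck and missingEntries are illustrative names only.

```java
// Illustrative check of the entry-count arithmetic from the report.
public class EntryCountCheck {

    // Entries missing from the cluster: expected copies minus stored copies
    // minus copies accounted for by client errors, divided by numOwners.
    // Assumes every failed operation accounts for numOwners missing copies.
    public static int missingEntries(int totalEntries, int numOwners, int clientErrors, int... perNodeCounts) {
        int stored = 0;
        for (int count : perNodeCounts) {
            stored += count;
        }
        int expectedCopies = totalEntries * numOwners;           // 80000 in this test
        int accountedCopies = stored + clientErrors * numOwners; // 79998 in this test
        return (expectedCopies - accountedCopies) / numOwners;
    }

    public static void main(String[] args) {
        // node1..node4 counts from the report.
        System.out.println(missingEntries(40000, 2, 11, 26608, 26622, 26746, 0)); // prints 1
    }
}
```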
 Let's take a look at the missing entry, hash(thread16key59) = 574ff563.
 Current CH: owners(574ff563) are [node4, node1]
 The sequence of events is:
 * hotrod -> node1
 * node1 forwards it to the primary owner, node4
 * node4 shuts down without processing the forwarded entry
 Resulting owners(7c29bccb): [] (empty). The entry is completely lost, without any errors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
