[infinispan-issues] [JBoss JIRA] (ISPN-3366) Data loss when entry forwarding to primary owner and primary owner shutdown

Dan Berindei (JIRA) jira-events at lists.jboss.org
Fri Aug 9 12:21:26 EDT 2013


    [ https://issues.jboss.org/browse/ISPN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796065#comment-12796065 ] 

Dan Berindei commented on ISPN-3366:
------------------------------------

The missing backup entry is probably related to my fix: if state transfer starts just before we issue a put command, and we send the entries to the new owner just before we commit the entry in the put command, that new owner won't have a backup copy of the key.

State transfer used to automatically forward commands to all the new owners if the topology changed during the execution of the command, but I tried to remove it because the forwarded commands were executed without holding a lock on the primary, so they could lead to inconsistencies. 

I'm still trying to come up with a better solution - perhaps sending the command to the backup owners only after the entries are committed on the primary owner will work. But in the meantime I'll just re-add the forwarding in the state transfer interceptor, as locking is not reliable in non-tx caches during state transfer anyway (two nodes could both think they are the primary owner for a key at the same time).

                
> Data loss when entry forwarding to primary owner and primary owner shutdown
> ---------------------------------------------------------------------------
>
>                 Key: ISPN-3366
>                 URL: https://issues.jboss.org/browse/ISPN-3366
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Distributed Cache
>    Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
>            Reporter: Takayoshi Kimura
>            Assignee: Dan Berindei
>            Priority: Critical
>             Fix For: 5.2.8.Final, 6.0.0.Alpha3, 6.0.0.Final
>
>         Attachments: ISPN-3366-full-logs-3rd.zip, ISPN-3366-full-logs-4th.zip, ISPN-3366-logs.zip
>
>
>   Looks like a problem in entry forwarding.
> Here is test scenario:
> * DIST numOwners=2, start with 4 nodes cluster then normal shutdown 1 node during load
> * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
> After the test run, the numberOfEntries on each node are:
> * node1: 26608
> * node2: 26622
> * node3: 26746
> * node4: 0
> Total is 79976 and HotRod client received 11 errors, so 79976 + (11 * 2) = 79998. It means 1 entry is completely missing.
> Let's take a look at the missing entry, hash(thread16key59) = 574ff563.
> Current CH: owners(574ff563) are [node4, node1]
> The events sequence is:
> * hotrod -> node1
> * node1 forwarding it to primary owner node4
> * node4 doesn't process the forwarded entry, shutdown
> Result owners(7c29bccb) is [] empty. This entry is completely lost without any errors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


More information about the infinispan-issues mailing list