[infinispan-issues] [JBoss JIRA] (ISPN-3366) Data loss when entry forwarding to primary owner and primary owner shutdown
Dan Berindei (JIRA)
jira-events at lists.jboss.org
Tue Aug 6 13:47:26 EDT 2013
[ https://issues.jboss.org/browse/ISPN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795440#comment-12795440 ]
Dan Berindei commented on ISPN-3366:
------------------------------------
Checking the topology id before invoking the command on the primary owner wouldn't be enough: by the time the command arrives, the primary owner may have already installed a new topology in which it is no longer the primary owner. It may then not forward the command to the other owners, or, if it's no longer an owner at all, it may not do anything. This could very well be the cause of issue ISPN-3357...
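A minimal sketch of that check-then-act race, written as a deterministic model rather than real Infinispan code. The classes, the `forward` method and the `newTopologyInFlight` flag are all hypothetical. The flag stands in for the interleaving in which the primary installs a new topology after the originator's check has already passed. For simplicity the originator reads the primary's topology id directly, which a real cluster cannot do.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model (not Infinispan code) of why checking the topology id
 * before invoking the command on the primary owner is not enough.
 */
public class TopologyCheckRace {

    /** Hypothetical node: applies a forwarded write only if it still owns the key. */
    static final class Node {
        int topologyId;
        boolean ownsKey; // simplified: ownership of the single key under test
        final Map<String, String> data = new HashMap<>();

        Node(int topologyId, boolean ownsKey) {
            this.topologyId = topologyId;
            this.ownsKey = ownsKey;
        }

        void installTopology(int newTopologyId, boolean stillOwner) {
            topologyId = newTopologyId;
            ownsKey = stillOwner;
        }

        boolean handleForwardedPut(String key, String value) {
            if (!ownsKey) {
                return false; // no longer an owner: the write is silently dropped
            }
            data.put(key, value);
            return true;
        }
    }

    /** Check-then-act forwarding; returns true if the primary stored the value. */
    static boolean forward(Node originator, Node primary, String key, String value,
                           boolean newTopologyInFlight) {
        // The check passes: both sides agree on the topology id at this instant.
        if (originator.topologyId != primary.topologyId) {
            throw new IllegalStateException("topology mismatch: retry");
        }
        // The primary installs a new topology while the command is in flight.
        if (newTopologyInFlight) {
            primary.installTopology(primary.topologyId + 1, false);
        }
        return primary.handleForwardedPut(key, value);
    }

    public static void main(String[] args) {
        System.out.println(forward(new Node(5, true), new Node(5, true), "k", "v", false)); // true
        System.out.println(forward(new Node(5, true), new Node(5, true), "k", "v", true));  // false
    }
}
```

The earlier check cannot rule out the second case, because nothing stops the primary's topology from changing after the check succeeds.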
I had another idea: let the command execute, then check in StateTransferInterceptor whether the topology id changed, and if it did, retry the command on the new primary owner. The problem is that the check yields too many false positives (the topology id can change even when the command itself was not affected), and if we retry a put command after it was already executed successfully, we won't get the correct return value. This is even more important for conditional commands.
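A minimal sketch of the return-value problem, assuming a hypothetical "execute, then retry if the topology id changed" wrapper. This is not Infinispan's actual StateTransferInterceptor. A single map stands in for both the old and the new primary, on the assumption that the new primary already holds whatever the first attempt wrote. The `rebalanceDuringCommand` flag is hypothetical and simulates a topology change that completes while the command runs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical model (not Infinispan's StateTransferInterceptor) of
 * "execute the command, then retry it if the topology id changed".
 */
public class RetryOnTopologyChange {
    // Assumption: after the rebalance the new primary already holds the value
    // written by the first attempt, so one map models both primaries.
    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();
    private int topologyId = 1;

    /** putIfAbsent semantics: returns null if the key was absent and was inserted. */
    private String executePutIfAbsent(String key, String value) {
        return store.putIfAbsent(key, value);
    }

    String invokeWithRetry(String key, String value, boolean rebalanceDuringCommand) {
        int topologyBefore = topologyId;
        String result = executePutIfAbsent(key, value);
        if (rebalanceDuringCommand) {
            topologyId++; // simulated topology change while the command was running
        }
        if (topologyId != topologyBefore) {
            // The id check cannot tell whether the first attempt took effect,
            // so it retries; the retry sees the value the first attempt wrote.
            result = executePutIfAbsent(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        // No topology change: the caller correctly learns that it inserted the value.
        System.out.println(new RetryOnTopologyChange().invokeWithRetry("thread16key59", "v1", false)); // null
        // Topology change: the value was inserted, but the caller is told it already existed.
        System.out.println(new RetryOnTopologyChange().invokeWithRetry("thread16key59", "v1", true));  // v1
    }
}
```

In the second call the retry is a false positive, because the first attempt had already succeeded. The client of a conditional command like putIfAbsent then receives a non-null return value and concludes that its write did not happen.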
> Data loss when entry forwarding to primary owner and primary owner shutdown
> ---------------------------------------------------------------------------
>
> Key: ISPN-3366
> URL: https://issues.jboss.org/browse/ISPN-3366
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
> Reporter: Takayoshi Kimura
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 5.2.8.Final, 6.0.0.Alpha3, 6.0.0.Final
>
> Attachments: ISPN-3366-full-logs-3rd.zip, ISPN-3366-logs.zip
>
>
> Looks like a problem in entry forwarding.
> Here is the test scenario:
> * DIST mode, numOwners=2: start with a 4-node cluster, then shut down 1 node normally during the load
> * HotRod putIfAbsent calls from 40 threads (1 process, 1 remote cache instance), 40000 entries in total
> After the test run, numberOfEntries on each node was:
> * node1: 26608
> * node2: 26622
> * node3: 26746
> * node4: 0
> The total is 79976 and the HotRod client received 11 errors, so 79976 + (11 * 2) = 79998 copies are accounted for, against the 40000 * 2 = 80000 expected with numOwners=2. The 2 missing copies mean that 1 entry is completely missing.
> Let's look at the missing entry: hash(thread16key59) = 574ff563.
> Current CH: owners(574ff563) are [node4, node1]
> The event sequence is:
> * hotrod -> node1
> * node1 forwards it to the primary owner, node4
> * node4 shuts down without processing the forwarded entry
> Result: owners(7c29bccb) is [] (empty). The entry is completely lost, without any errors.
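The event sequence in the quoted report can be sketched as a small model of the write path. The receiving node forwards the write to the primary owner, and only the primary owner copies it to the backup owners, which matches the comment above about the primary forwarding the command to the other owners. Everything here is hypothetical: `Node`, `forwardPut`, `copies` and the key `keyA` are illustrative names, not Infinispan API. The model also does not try to explain why the client saw no error. It only shows that, once the primary drops the command, no owner ends up holding the entry.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model (not Infinispan code) of the write path in the report:
 * the node that receives the HotRod request forwards the write to the primary
 * owner, and only the primary owner copies it to the backup owners.
 */
public class ForwardToPrimaryLoss {

    static final class Node {
        final String name;
        final Map<String, String> data = new HashMap<>();
        boolean running = true;

        Node(String name) {
            this.name = name;
        }
    }

    /** owners.get(0) is the primary; the remaining owners are backups. */
    static void forwardPut(List<Node> owners, String key, String value) {
        Node primary = owners.get(0);
        if (!primary.running) {
            // The primary left before handling the command: nobody stores the
            // entry, and this model does not report any error to the caller.
            return;
        }
        primary.data.put(key, value);
        for (Node backup : owners.subList(1, owners.size())) {
            backup.data.put(key, value);
        }
    }

    /** Counts how many of the given nodes hold a copy of the key. */
    static int copies(List<Node> nodes, String key) {
        int count = 0;
        for (Node node : nodes) {
            if (node.data.containsKey(key)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Node node1 = new Node("node1");
        Node node4 = new Node("node4");
        List<Node> owners = List.of(node4, node1); // [primary, backup], as in owners(574ff563)

        forwardPut(owners, "keyA", "v");
        System.out.println(copies(owners, "keyA")); // 2

        node4.running = false; // node4 shuts down before processing the forwarded entry
        forwardPut(owners, "thread16key59", "v");
        System.out.println(copies(owners, "thread16key59")); // 0
    }
}
```

Because node1, the backup owner, only receives the entry through node4, node1's role as an owner does not protect the write once node4 drops it.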