[JBoss JIRA] (ISPN-3366) Data loss when entry forwarding to primary owner and primary owner shutdown
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-3366?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-3366:
------------------------------------
[~nekop] You're right, searching for "Invoked with command .*0x033e0.746872656164" I got ~ 120k occurrences, so it's clearly the common prefix: the JBoss Marshalling info (033e), the length (0.), and then "thread" (746872656164).
I searched in the new logs for the first missing key: {{thread02key72}}, aka {{ByteArray\{size=16, hashCode=1b9657dd, array=0x033e0d74687265616430326b65793732\}}} and I could find it in node1's log, but not in node4's log. There is also a very similar bug for tx caches: ISPN-2510, so I think your analysis is spot on.
Ideally we should retry on the originator if the primary owner is no longer running the cache, so I'll try to implement that. If it proves too difficult, I'll only throw an exception in the first phase.
> Data loss when entry forwarding to primary owner and primary owner shutdown
> ---------------------------------------------------------------------------
>
> Key: ISPN-3366
> URL: https://issues.jboss.org/browse/ISPN-3366
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
> Reporter: Takayoshi Kimura
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: ISPN-3366-logs.zip
>
>
> Looks like a problem in entry forwarding.
> Here is test scenario:
> * DIST numOwners=2, start with 4 nodes cluster then normal shutdown 1 node during load
> * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
> After the test run, the numberOfEntries on each node are:
> * node1: 26608
> * node2: 26622
> * node3: 26746
> * node4: 0
> Total is 79976 and HotRod client received 11 errors, so 79976 + (11 * 2) = 79998. It means 1 entry is completely missing.
> Let's take a look at the missing entry, hash(thread16key59) = 574ff563.
> Current CH: owners(574ff563) are [node4, node1]
> The events sequence is:
> * hotrod -> node1
> * node1 forwarding it to primary owner node4
> * node4 doesn't process the forwarded entry, shutdown
> Result owners(7c29bccb) is [] empty. This entry is completely lost without any errors.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 4 months
[JBoss JIRA] (ISPN-3366) Data loss when entry forwarding to primary owner and primary owner shutdown
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-3366?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration updated ISPN-3366:
------------------------------------------
Bugzilla References: https://bugzilla.redhat.com/show_bug.cgi?id=989807, https://bugzilla.redhat.com/show_bug.cgi?id=989809 (was: https://bugzilla.redhat.com/show_bug.cgi?id=989807)
> Data loss when entry forwarding to primary owner and primary owner shutdown
> ---------------------------------------------------------------------------
>
> Key: ISPN-3366
> URL: https://issues.jboss.org/browse/ISPN-3366
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
> Reporter: Takayoshi Kimura
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: ISPN-3366-logs.zip
>
>
> Looks like a problem in entry forwarding.
> Here is test scenario:
> * DIST numOwners=2, start with 4 nodes cluster then normal shutdown 1 node during load
> * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
> After the test run, the numberOfEntries on each node are:
> * node1: 26608
> * node2: 26622
> * node3: 26746
> * node4: 0
> Total is 79976 and HotRod client received 11 errors, so 79976 + (11 * 2) = 79998. It means 1 entry is completely missing.
> Let's take a look at the missing entry, hash(thread16key59) = 574ff563.
> Current CH: owners(574ff563) are [node4, node1]
> The events sequence is:
> * hotrod -> node1
> * node1 forwarding it to primary owner node4
> * node4 doesn't process the forwarded entry, shutdown
> Result owners(7c29bccb) is [] empty. This entry is completely lost without any errors.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 4 months