[JBoss JIRA] (ISPN-3366) Data loss when entry forwarding to primary owner and primary owner shutdown
by Takayoshi Kimura (JIRA)
[ https://issues.jboss.org/browse/ISPN-3366?page=com.atlassian.jira.plugin.... ]
Takayoshi Kimura commented on ISPN-3366:
----------------------------------------
Thanks Dan, I'll try the workaround.
> Data loss when entry forwarding to primary owner and primary owner shutdown
> ---------------------------------------------------------------------------
>
> Key: ISPN-3366
> URL: https://issues.jboss.org/browse/ISPN-3366
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
> Reporter: Takayoshi Kimura
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: ISPN-3366-logs.zip
>
>
> Looks like a problem in entry forwarding.
> Here is test scenario:
> * DIST numOwners=2, start with 4 nodes cluster then normal shutdown 1 node during load
> * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
> After the test run, the numberOfEntries on each node are:
> * node1: 26608
> * node2: 26622
> * node3: 26746
> * node4: 0
> Total is 79976 and HotRod client received 11 errors, so 79976 + (11 * 2) = 79998. It means 1 entry is completely missing.
> Let's take a look at the missing entry, hash(thread16key59) = 574ff563.
> Current CH: owners(574ff563) are [node4, node1]
> The events sequence is:
> * hotrod -> node1
> * node1 forwarding it to primary owner node4
> * node4 doesn't process the forwarded entry, shutdown
> Result owners(7c29bccb) is [] empty. This entry is completely lost without any errors.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 1 month
[JBoss JIRA] (ISPN-2903) Manual eviction should not delete entry from cache store
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-2903?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-2903:
-----------------------------------------------
Paul Gier <pgier(a)redhat.com> changed the Status of [bug 900549|https://bugzilla.redhat.com/show_bug.cgi?id=900549] from MODIFIED to ON_QA
> Manual eviction should not delete entry from cache store
> --------------------------------------------------------
>
> Key: ISPN-2903
> URL: https://issues.jboss.org/browse/ISPN-2903
> Project: Infinispan
> Issue Type: Bug
> Components: Eviction
> Affects Versions: 5.2.3.Final
> Reporter: Paul Ferraro
> Assignee: Galder Zamarreño
> Priority: Critical
> Labels: 5.2.x, jdg6
> Fix For: 5.2.5.Final, 5.3.0.Alpha1, 5.3.0.Final
>
> Attachments: AtomicMapServlet.java, AtomicMapTestCase.java, server.log, server.log
>
>
> Here's the scenario:
> Given 2 nodes with REPL_SYNC cache with passivating cache store (e.g. default web cache in AS7).
> 1. Create cache entry containing atomic map with 2 map entries on node1.
> 2. Passivate that cache entry on node2 via manual evict.
> 3. Modify 1 of the atomic map entries within the cache entry on node1.
> 4. Lookup atomic map on node2. It only contains 1 map entry - the map entry modified in step 3. The other map entry is lost.
> It's a side effect of ISPN-2384, where some changes were made to tighten the passivation/activation scenarios, but it did not cover manual eviction calls.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 1 month
[JBoss JIRA] (ISPN-2903) Manual eviction should not delete entry from cache store
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-2903?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-2903:
-----------------------------------------------
Aleksandar Kostadinov <akostadi(a)redhat.com> made a comment on [bug 900549|https://bugzilla.redhat.com/show_bug.cgi?id=900549]
I just ran a cluster failover test with EAP 6.1.1 ER3 in EC2 and hit the exceptions again. Attaching server log.
> Manual eviction should not delete entry from cache store
> --------------------------------------------------------
>
> Key: ISPN-2903
> URL: https://issues.jboss.org/browse/ISPN-2903
> Project: Infinispan
> Issue Type: Bug
> Components: Eviction
> Affects Versions: 5.2.3.Final
> Reporter: Paul Ferraro
> Assignee: Galder Zamarreño
> Priority: Critical
> Labels: 5.2.x, jdg6
> Fix For: 5.2.5.Final, 5.3.0.Alpha1, 5.3.0.Final
>
> Attachments: AtomicMapServlet.java, AtomicMapTestCase.java, server.log, server.log
>
>
> Here's the scenario:
> Given 2 nodes with REPL_SYNC cache with passivating cache store (e.g. default web cache in AS7).
> 1. Create cache entry containing atomic map with 2 map entries on node1.
> 2. Passivate that cache entry on node2 via manual evict.
> 3. Modify 1 of the atomic map entries within the cache entry on node1.
> 4. Lookup atomic map on node2. It only contains 1 map entry - the map entry modified in step 3. The other map entry is lost.
> It's a side effect of ISPN-2384, where some changes were made to tighten the passivation/activation scenarios, but it did not cover manual eviction calls.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 1 month
[JBoss JIRA] (ISPN-3366) Data loss when entry forwarding to primary owner and primary owner shutdown
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-3366?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-3366:
------------------------------------
A possible workaround would be to set {{transport.strictPeerToPeer = true}} in the global configuration, which will cause node4 to return a {{NamedCacheNotFoundException}} instead of {{null}} during shutdown. The HotRod client should then also receive an exception.
But this would only work if the cluster is symmetric - i.e. all the nodes have exactly the same caches running. Otherwise, for replicated caches, any operation would result in a {{NamedCacheNotFoundException}}.
> Data loss when entry forwarding to primary owner and primary owner shutdown
> ---------------------------------------------------------------------------
>
> Key: ISPN-3366
> URL: https://issues.jboss.org/browse/ISPN-3366
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
> Reporter: Takayoshi Kimura
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: ISPN-3366-logs.zip
>
>
> Looks like a problem in entry forwarding.
> Here is test scenario:
> * DIST numOwners=2, start with 4 nodes cluster then normal shutdown 1 node during load
> * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
> After the test run, the numberOfEntries on each node are:
> * node1: 26608
> * node2: 26622
> * node3: 26746
> * node4: 0
> Total is 79976 and HotRod client received 11 errors, so 79976 + (11 * 2) = 79998. It means 1 entry is completely missing.
> Let's take a look at the missing entry, hash(thread16key59) = 574ff563.
> Current CH: owners(574ff563) are [node4, node1]
> The events sequence is:
> * hotrod -> node1
> * node1 forwarding it to primary owner node4
> * node4 doesn't process the forwarded entry, shutdown
> Result owners(7c29bccb) is [] empty. This entry is completely lost without any errors.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 1 month
[JBoss JIRA] (ISPN-3386) Cancel Map/Reduce tasks before they cause OOM errors
by Alan Field (JIRA)
Alan Field created ISPN-3386:
--------------------------------
Summary: Cancel Map/Reduce tasks before they cause OOM errors
Key: ISPN-3386
URL: https://issues.jboss.org/browse/ISPN-3386
Project: Infinispan
Issue Type: Enhancement
Components: Distributed Execution and Map/Reduce
Affects Versions: 5.3.0.Final
Reporter: Alan Field
Assignee: Mircea Markus
Currently, a MapReduceTask that generates an OutOfMemory error will break a clustered cache. The node where the OOM occurs will generate SuspectExceptions on the other nodes in the cluster.
In order to avoid this situation, Infinispan should monitor free memory and cancel the MapReduceTask if free memory is below a user definable threshold. [~pruivo] has also suggested allowing the API user to set the priority of tasks, so that the Map Reduce Manager can cancel lower priority tasks first. This functional could also be included with ISPN-2991.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 1 month