[jboss-jira] [JBoss JIRA] (WFLY-5437) Stale data after cluster wide rebalance with remote EJB invocations
Richard Janík (JIRA)
issues at jboss.org
Fri Oct 9 03:00:00 EDT 2015
[ https://issues.jboss.org/browse/WFLY-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Janík reopened WFLY-5437:
---------------------------------
{quote}
Stale data is always a possibility on failover when using an async cache mode.
To guarantee stale reads never happen, you need to use sync mode.
{quote}
Stale data shows up even with SYNC caches; that's what the last line of the description says. Though I agree I made the description confusing by mentioning ASYNC caches. Will fix.
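For reference, the client-side check that produces the "Response serial does not match" error in the example below is roughly of this shape (a hypothetical sketch; the class and method names are mine, not the actual org.jboss.smartfrog.loaddriver code). Each client tracks the serial it expects the remote counter to return next; a response one lower than expected means the invocation landed on a node holding a stale replica of the session state:

```java
// Hypothetical sketch of the client-side stale-data check. Each runner keeps
// the serial it expects the remote EJB counter to return next; a received
// serial one lower than expected indicates a stale read (e.g. after failover
// to a node whose replica missed the last increment).
public class SerialChecker {

    private int expectedSerial;

    public SerialChecker(int initialSerial) {
        this.expectedSerial = initialSerial;
    }

    /**
     * Validates the serial returned by a remote invocation.
     * Returns null on a match (and advances the expected serial),
     * or an error message describing the mismatch.
     */
    public String checkResponse(int receivedSerial, int runnerId) {
        if (receivedSerial != expectedSerial) {
            // Mismatch: report it and keep the expectation unchanged.
            return "Response serial does not match. Expected: " + expectedSerial
                    + ", received: " + receivedSerial + ", runner: " + runnerId;
        }
        expectedSerial++; // next invocation should return the next serial
        return null;
    }
}
```

With this shape, the logged failure (expected 87, received 86, runner 553) corresponds to a single invocation observing the counter value from before the last successful increment.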
> Stale data after cluster wide rebalance with remote EJB invocations
> -------------------------------------------------------------------
>
> Key: WFLY-5437
> URL: https://issues.jboss.org/browse/WFLY-5437
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Reporter: Richard Janík
> Assignee: Paul Ferraro
>
> Hi,
> we're occasionally getting stale data on remote EJB invocations (a number counter returns a value 1 lower than expected; see the example below). This is usually preceded (~6 seconds earlier) by a cluster-wide rebalance after a node is brought back from the dead. Some details:
> - 2000 clients; stale data is uncommon
> - requests from a single client are separated by a 4-second window
> An example of stale data:
> {code}
> 2015/08/28 12:45:11:868 EDT [WARN ][Runner - 553] HOST perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Error sampling data: <org.jboss.smartfrog.loaddriver.RequestProcessingException: Response serial does not match. Expected: 87, received: 86, runner: 553.>
> {code}
> Server side log excerpt about rebalance:
> {code}
> [JBossINF] 12:45:02,780 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-1,ee,perf21) ISPN000094: Received new cluster view for channel web: [perf21|7] (4) [perf21, perf20, perf18, perf19]
> [JBossINF] 12:45:02,781 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-1,ee,perf21) ISPN000094: Received new cluster view for channel ejb: [perf21|7] (4) [perf21, perf20, perf18, perf19]
> [JBossINF] 12:45:03,660 INFO [org.infinispan.CLUSTER] (remote-thread--p3-t6) ISPN000310: Starting cluster-wide rebalance for cache repl, topology CacheTopology{id=12, rebalanceId=5, currentCH=ReplicatedConsistentHash{ns = 60, owners = (3)[perf21: 20, perf20: 20, perf18: 20]}, pendingCH=ReplicatedConsistentHash{ns = 60, owners = (4)[perf21: 15, perf20: 15, perf18: 15, perf19: 15]}, unionCH=null, actualMembers=[perf21, perf20, perf18, perf19]}
> [JBossINF] 12:45:03,660 INFO [org.infinispan.CLUSTER] (remote-thread--p4-t16) ISPN000310: Starting cluster-wide rebalance for cache dist, topology CacheTopology{id=16, rebalanceId=7, currentCH=DefaultConsistentHash{ns=80, owners = (3)[perf21: 27+26, perf20: 26+28, perf18: 27+26]}, pendingCH=DefaultConsistentHash{ns=80, owners = (4)[perf21: 20+20, perf20: 20+20, perf18: 20+20, perf19: 20+20]}, unionCH=null, actualMembers=[perf21, perf20, perf18, perf19]}
> [JBossINF] 12:45:03,663 INFO [org.infinispan.CLUSTER] (remote-thread--p4-t19) ISPN000310: Starting cluster-wide rebalance for cache clusterbench-ee7.ear.clusterbench-ee7-web-default.war, topology CacheTopology{id=16, rebalanceId=7, currentCH=DefaultConsistentHash{ns=80, owners = (3)[perf21: 27+26, perf20: 26+28, perf18: 27+26]}, pendingCH=DefaultConsistentHash{ns=80, owners = (4)[perf21: 20+20, perf20: 20+20, perf18: 20+20, perf19: 20+20]}, unionCH=null, actualMembers=[perf21, perf20, perf18, perf19]}
> [JBossINF] 12:45:03,664 INFO [org.infinispan.CLUSTER] (remote-thread--p4-t18) ISPN000310: Starting cluster-wide rebalance for cache clusterbench-ee7.ear.clusterbench-ee7-web-passivating.war, topology CacheTopology{id=16, rebalanceId=7, currentCH=DefaultConsistentHash{ns=80, owners = (3)[perf21: 27+26, perf20: 26+28, perf18: 27+26]}, pendingCH=DefaultConsistentHash{ns=80, owners = (4)[perf21: 20+20, perf20: 20+20, perf18: 20+20, perf19: 20+20]}, unionCH=null, actualMembers=[perf21, perf20, perf18, perf19]}
> [JBossINF] 12:45:03,759 INFO [org.infinispan.CLUSTER] (remote-thread--p4-t18) ISPN000336: Finished cluster-wide rebalance for cache clusterbench-ee7.ear.clusterbench-ee7-web-passivating.war, topology id = 16
> [JBossINF] 12:45:03,820 INFO [org.infinispan.CLUSTER] (remote-thread--p3-t7) ISPN000336: Finished cluster-wide rebalance for cache repl, topology id = 12
> [JBossINF] 12:45:03,832 INFO [org.infinispan.CLUSTER] (remote-thread--p4-t18) ISPN000336: Finished cluster-wide rebalance for cache dist, topology id = 16
> [JBossINF] 12:45:03,958 INFO [org.infinispan.CLUSTER] (remote-thread--p3-t3) ISPN000310: Starting cluster-wide rebalance for cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar, topology CacheTopology{id=12, rebalanceId=5, currentCH=ReplicatedConsistentHash{ns = 60, owners = (3)[perf21: 20, perf20: 20, perf18: 20]}, pendingCH=ReplicatedConsistentHash{ns = 60, owners = (4)[perf21: 15, perf20: 15, perf18: 15, perf19: 15]}, unionCH=null, actualMembers=[perf21, perf20, perf18, perf19]}
> [JBossINF] 12:45:04,760 INFO [org.infinispan.CLUSTER] (remote-thread--p4-t18) ISPN000336: Finished cluster-wide rebalance for cache clusterbench-ee7.ear.clusterbench-ee7-web-default.war, topology id = 16
> [JBossINF] 12:45:06,331 INFO [org.infinispan.CLUSTER] (remote-thread--p3-t9) ISPN000336: Finished cluster-wide rebalance for cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar, topology id = 12
> {code}
> And a link to our jobs if you're interested:
> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-7x-failover-ejb-ejbremote-jvmkill-repl-async/6
> This behavior has been observed with the jvmkill and undeploy scenarios, on REPL-SYNC, REPL-ASYNC, and DIST-SYNC caches.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)