Richard Janík created WFLY-5437:
-----------------------------------
Summary: Stale data after cluster wide rebalance with remote EJB invocations
Key: WFLY-5437
URL: https://issues.jboss.org/browse/WFLY-5437
Project: WildFly
Issue Type: Bug
Components: Clustering
Reporter: Richard Janík
Assignee: Paul Ferraro
Hi,
we're occasionally getting stale data on remote EJB invocations: a number counter
returns a value exactly 1 lower than expected (see the example and the sketch below).
This is usually preceded (by roughly 6 seconds) by a cluster-wide rebalance after a
node is brought back from the dead. Some observations:
- with 2000 clients, stale data is uncommon
- requests from a single client are separated by a 4-second window
An example of stale data:
{code}
2015/08/28 12:45:11:868 EDT [WARN ][Runner - 553] HOST
perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Error sampling data:
<org.jboss.smartfrog.loaddriver.RequestProcessingException: Response serial does
not match. Expected: 87, received: 86, runner: 553.>
{code}
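For context, the check the load driver performs is conceptually equivalent to the sketch
below. The bean interface, JNDI name, and class names are hypothetical stand-ins (this is
not the actual clusterbench/smartfrog code); only the expected/received serial comparison
mirrors our runner:
{code}
// Minimal sketch of the client-side serial check that produces the failure
// above. CounterBean, RemoteCounter and the JNDI name are hypothetical
// stand-ins, not the actual clusterbench/smartfrog code.
import java.util.Properties;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class StaleReadCheck {

    // Remote view of a stateful counter bean: each invocation increments
    // its replicated state and returns the new serial number.
    public interface RemoteCounter {
        int incrementAndGet();
    }

    public static void main(String[] args) throws NamingException, InterruptedException {
        Properties props = new Properties();
        // Enable the ejb: JNDI namespace of the JBoss EJB client.
        props.put(Context.URL_PKG_PREFIXES, "org.jboss.ejb.client.naming");
        Context ctx = new InitialContext(props);

        RemoteCounter counter = (RemoteCounter) ctx.lookup(
                "ejb:clusterbench-ee7/clusterbench-ee7-ejb/CounterBean!"
                        + RemoteCounter.class.getName() + "?stateful");

        int expected = 0;
        for (int i = 0; i < 100; i++) {
            expected++;
            int received = counter.incrementAndGet();
            // A stale read after failover returns the pre-failover state,
            // i.e. a serial exactly one lower than expected.
            if (received != expected) {
                throw new IllegalStateException("Response serial does not match."
                        + " Expected: " + expected + ", received: " + received);
            }
            Thread.sleep(4000); // the 4-second per-client request window from above
        }
    }
}
{code}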
Server-side log excerpt showing the rebalance:
{code}
[JBossINF] 12:45:02,780 INFO
[org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-1,ee,perf21)
ISPN000094: Received new cluster view for channel web: [perf21|7] (4) [perf21, perf20,
perf18, perf19]
[JBossINF] 12:45:02,781 INFO
[org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-1,ee,perf21)
ISPN000094: Received new cluster view for channel ejb: [perf21|7] (4) [perf21, perf20,
perf18, perf19]
[JBossINF] 12:45:03,660 INFO [org.infinispan.CLUSTER] (remote-thread--p3-t6)
ISPN000310: Starting cluster-wide rebalance for cache repl, topology CacheTopology{id=12,
rebalanceId=5, currentCH=ReplicatedConsistentHash{ns = 60, owners = (3)[perf21: 20,
perf20: 20, perf18: 20]}, pendingCH=ReplicatedConsistentHash{ns = 60, owners = (4)[perf21:
15, perf20: 15, perf18: 15, perf19: 15]}, unionCH=null, actualMembers=[perf21, perf20,
perf18, perf19]}
[JBossINF] 12:45:03,660 INFO [org.infinispan.CLUSTER] (remote-thread--p4-t16)
ISPN000310: Starting cluster-wide rebalance for cache dist, topology CacheTopology{id=16,
rebalanceId=7, currentCH=DefaultConsistentHash{ns=80, owners = (3)[perf21: 27+26, perf20:
26+28, perf18: 27+26]}, pendingCH=DefaultConsistentHash{ns=80, owners = (4)[perf21: 20+20,
perf20: 20+20, perf18: 20+20, perf19: 20+20]}, unionCH=null, actualMembers=[perf21,
perf20, perf18, perf19]}
[JBossINF] 12:45:03,663 INFO [org.infinispan.CLUSTER] (remote-thread--p4-t19)
ISPN000310: Starting cluster-wide rebalance for cache
clusterbench-ee7.ear.clusterbench-ee7-web-default.war, topology CacheTopology{id=16,
rebalanceId=7, currentCH=DefaultConsistentHash{ns=80, owners = (3)[perf21: 27+26, perf20:
26+28, perf18: 27+26]}, pendingCH=DefaultConsistentHash{ns=80, owners = (4)[perf21: 20+20,
perf20: 20+20, perf18: 20+20, perf19: 20+20]}, unionCH=null, actualMembers=[perf21,
perf20, perf18, perf19]}
[JBossINF] 12:45:03,664 INFO [org.infinispan.CLUSTER] (remote-thread--p4-t18)
ISPN000310: Starting cluster-wide rebalance for cache
clusterbench-ee7.ear.clusterbench-ee7-web-passivating.war, topology CacheTopology{id=16,
rebalanceId=7, currentCH=DefaultConsistentHash{ns=80, owners = (3)[perf21: 27+26, perf20:
26+28, perf18: 27+26]}, pendingCH=DefaultConsistentHash{ns=80, owners = (4)[perf21: 20+20,
perf20: 20+20, perf18: 20+20, perf19: 20+20]}, unionCH=null, actualMembers=[perf21,
perf20, perf18, perf19]}
[JBossINF] 12:45:03,759 INFO [org.infinispan.CLUSTER] (remote-thread--p4-t18)
ISPN000336: Finished cluster-wide rebalance for cache
clusterbench-ee7.ear.clusterbench-ee7-web-passivating.war, topology id = 16
[JBossINF] 12:45:03,820 INFO [org.infinispan.CLUSTER] (remote-thread--p3-t7)
ISPN000336: Finished cluster-wide rebalance for cache repl, topology id = 12
[JBossINF] 12:45:03,832 INFO [org.infinispan.CLUSTER] (remote-thread--p4-t18)
ISPN000336: Finished cluster-wide rebalance for cache dist, topology id = 16
[JBossINF] 12:45:03,958 INFO [org.infinispan.CLUSTER] (remote-thread--p3-t3)
ISPN000310: Starting cluster-wide rebalance for cache
clusterbench-ee7.ear/clusterbench-ee7-ejb.jar, topology CacheTopology{id=12,
rebalanceId=5, currentCH=ReplicatedConsistentHash{ns = 60, owners = (3)[perf21: 20,
perf20: 20, perf18: 20]}, pendingCH=ReplicatedConsistentHash{ns = 60, owners = (4)[perf21:
15, perf20: 15, perf18: 15, perf19: 15]}, unionCH=null, actualMembers=[perf21, perf20,
perf18, perf19]}
[JBossINF] 12:45:04,760 INFO [org.infinispan.CLUSTER] (remote-thread--p4-t18)
ISPN000336: Finished cluster-wide rebalance for cache
clusterbench-ee7.ear.clusterbench-ee7-web-default.war, topology id = 16
[JBossINF] 12:45:06,331 INFO [org.infinispan.CLUSTER] (remote-thread--p3-t9)
ISPN000336: Finished cluster-wide rebalance for cache
clusterbench-ee7.ear/clusterbench-ee7-ejb.jar, topology id = 12
{code}
And a link to our jobs, if you're interested:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-7x-failover-ejb-...
This behavior has been observed in both the jvmkill and undeploy scenarios, on REPL-SYNC,
REPL-ASYNC, and DIST-SYNC caches.
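For reference, those cache modes map to Infinispan's CacheMode constants. Below is a
minimal programmatic sketch for illustration only; in WildFly these caches are configured
through the infinispan subsystem in standalone-ha.xml, not in application code:
{code}
// Illustration of the three cache modes mentioned above, expressed as
// programmatic Infinispan configuration (sketch only; WildFly configures
// its ejb/web cache containers via the infinispan subsystem).
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

public class CacheModes {

    static Configuration mode(CacheMode mode) {
        return new ConfigurationBuilder()
                .clustering().cacheMode(mode)
                .build();
    }

    public static void main(String[] args) {
        Configuration replSync  = mode(CacheMode.REPL_SYNC);  // replicated, synchronous
        Configuration replAsync = mode(CacheMode.REPL_ASYNC); // replicated, asynchronous
        Configuration distSync  = mode(CacheMode.DIST_SYNC);  // distributed, synchronous
        System.out.println(replSync + "\n" + replAsync + "\n" + distSync);
    }
}
{code}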