[ https://issues.jboss.org/browse/WFLY-5822?page=com.atlassian.jira.plugin.... ]
Richard Achmatowicz edited comment on WFLY-5822 at 1/11/16 8:24 PM:
--------------------------------------------------------------------
With regard to remote accesses of keys:
7.0.0 jobs:
- run with the Byteman rule RetrieveFromRemoteSource, which checks for invocations of the
method BaseDistributionInterceptor.retrieveFromRemoteSource (see the rule sketch after
this list)
- output found for this rule in the job output for varying cluster nodes (in other words,
on one run perf18 has to remotely retrieve a key; on another run, perf21 does)
- because this is a stress test and no failures are triggered, the node with the remote
key has to make that extra invocation on every access to that key, unless an ISPN
rebalance occurs for some reason other than failure
6.4.0 jobs:
- run with the Byteman rule RetrieveFromRemoteSource which checks for invocations of the
method BaseDistributionInterceptor.retrieveFromRemoteSource
- no output found for this rule in the job output for any cluster node
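For reference, a rule along these lines is enough to flag the remote fetches (a minimal
sketch - the package name org.infinispan.interceptors.distribution and the assumption
that the key is the first method parameter are mine, not copied from the actual rule used
in these jobs):
{code}
# Sketch of a RetrieveFromRemoteSource-style rule. The class's package and
# the position of the key parameter ($1) are assumptions.
RULE RetrieveFromRemoteSource
CLASS org.infinispan.interceptors.distribution.BaseDistributionInterceptor
METHOD retrieveFromRemoteSource
AT ENTRY
IF TRUE
DO traceln("*** retrieveFromRemoteSource invoked for key " + $1)
ENDRULE
{code}
Any node whose job output contains the traceln lines had to fetch at least one key from a
remote owner.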
These "extra" remote accesses are triggered during the resetting of the
weakAffinity, which occurs after a successful invocation and before the invocation result
is returned to the client. The server sets the weak affinity associated with the session
id by calling MethodInvocationMessageHandler.getWeakAffinity(sessionID). This invocation
passes in turn through DistributableCache.getWeakAffinity(..),
InfinispanBeanManager.getWeakAffinity(...) until it looks up the primary owner of the
session via InfinispanBeanManager.locatePrimaryOwner(sessionID). This method returns a
Node object identifying the Node in the Group which should be used for weak affinity.
However, the Node needs to be translated into an Address / string representation. So the
code makes a further call to the Registry (a distributed cache used to hold
ClientMappings) to lookup the String name of the host. Unfortunately, there is no
guarantee that the cache entry in the Registry for a Node is actually resident on that
Node - and so in some cases, for some servers, a call to retrieveFromRemote has to be made
to get the cache entry from the other node(s).
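In outline, the tail of that chain looks something like the following (a paraphrase of
the path described above, not verbatim WildFly source - the registry field,
Registry.getEntry() signature and NodeAffinity usage here are assumptions):
{code:java}
import java.util.List;
import java.util.Map;

// Types are from the WildFly clustering SPI / JBoss EJB client API; exact
// packages assumed: org.wildfly.clustering.group.Node,
// org.jboss.ejb.client.{Affinity, NodeAffinity, SessionID}.
public Affinity getWeakAffinity(SessionID sessionID) {
    // Consistent-hash lookup: which cluster node owns this session's entry?
    Node primary = this.locatePrimaryOwner(sessionID);
    // Translate the Node into the String name the client knows. This reads
    // the Registry, itself a *distributed* cache of ClientMappings, so the
    // entry for 'primary' may be resident on a different node - that read
    // is what triggers BaseDistributionInterceptor.retrieveFromRemoteSource.
    Map.Entry<String, List<ClientMapping>> entry = this.registry.getEntry(primary);
    return (entry != null) ? new NodeAffinity(entry.getKey()) : Affinity.NONE;
}
{code}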
NOTE: I'm not certain that this is the cause of the performance problem, but it seems
to be the reason why the extra remote call is made. It is possible to hack the name
generation in InfinispanBeanManager.getWeakAffinity() to avoid the call to the
Registry and just pick up the name from the Node instance - this would confirm whether
these extra calls to the Registry are the cause of the performance issue.
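Concretely, that hack would look something like this (hypothetical - it assumes Node
exposes, via getName(), the same name that the Registry stores; if the two can differ,
this shortcut changes the affinity the client sees):
{code:java}
// Hypothetical shortcut: derive the affinity name from the Node itself,
// skipping the Registry read (and therefore any remote fetch).
public Affinity getWeakAffinity(SessionID sessionID) {
    Node primary = this.locatePrimaryOwner(sessionID);
    // Assumes Node.getName() matches the name the Registry would return.
    return new NodeAffinity(primary.getName());
}
{code}
If the 7.0.0 throughput recovers with this in place, the Registry reads are confirmed as
the culprit.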
Clustering performance regression in ejbremote-dist-sync scenario
------------------------------------------------------------------
Key: WFLY-5822
URL: https://issues.jboss.org/browse/WFLY-5822
Project: WildFly
Issue Type: Bug
Components: Clustering, EJB
Affects Versions: 10.0.0.CR5
Reporter: Michal Vinkler
Assignee: Richard Achmatowicz
Priority: Critical
Compared to EAP 6, all SYNC scenarios have the same or better performance except for this
one; I wonder why.
Compare these results:
stress-ejbremote-dist-sync
7.0.0.ER2: [throughput|http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-7x-str...]
6.4.0.GA: [throughput|http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-str...]
---------------------------------------
Just for comparison: the ejbremote REPL_SYNC scenario, on the other hand, *performs well*:
stress-ejbremote-repl-sync
7.0.0.ER2: [throughput|http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-7x-str...]
6.4.0.GA: [throughput|http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-str...]