[jboss-jira] [JBoss JIRA] (WFLY-5822) Clustering performance regression in ejbremote-dist-sync scenario

Mon Jan 11 20:17:00 EST 2016

    [ https://issues.jboss.org/browse/WFLY-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146710#comment-13146710 ] 

Richard Achmatowicz edited comment on WFLY-5822 at 1/11/16 8:16 PM:
--------------------------------------------------------------------

With regard to remote accesses of keys:

7.0.0 jobs:
- run with the Byteman rule RetrieveFromRemoteSource which checks for invocations of the method BaseDistributionInterceptor.retrieveFromRemoteSource
- output found for this rule in the job output for random cluster nodes (in other words, on one run, perf18 has to remotely retrieve a key, on another run, perf21 has to retrieve a remote key)
- because this is a stress test and failures are not triggered, unless there is an ISPN rebalance for reasons other than failure, the node with the remote key will have to make that extra invocation on every access to that key

6.4.0 jobs:
- run with the Byteman rule RetrieveFromRemoteSource which checks for invocations of the method BaseDistributionInterceptor.retrieveFromRemoteSource
- no output found for this rule in the job output for any cluster node

These "extra" remote accesses are triggered when the weak affinity is reset, after a successful invocation and before the invocation result is returned to the client. The server sets the weak affinity of the session by calling MethodInvocationMessageHandler.getWeakAffinity(sessionID). This call passes in turn through DistributableCache.getWeakAffinity(..) and InfinispanBeanManager.getWeakAffinity(...), which looks up the primary owner of the session via InfinispanBeanManager.locatePrimaryOwner(sessionID). That method returns a Node object identifying the Node in the Group which should be used for weak affinity.

However, the Node needs to be translated into an Address / string representation, so the code makes a call to the Registry (a distributed cache) to look up the String name of the host. Unfortunately, there is no guarantee that the cache entry in the Registry for a Node perf18 is resident on perf18, and so a call to retrieveFromRemoteSource has to be made to get the cache entry from the other node(s).

I'm not certain that this is the cause of the performance problem, but it seems to be the reason why the extra remote call is made.      
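
To make the suspected mechanism concrete, here is a minimal toy model in plain Java. It is only a sketch under stated assumptions: RegistrySketch, weakAffinity and the "-host" names are hypothetical stand-ins invented for illustration, not the WildFly or Infinispan classes named above, and entry ownership is assigned by hand rather than by a consistent hash. It shows only that when the Registry entry for perf18 is owned by another node, every weak affinity lookup on perf18 costs one remote retrieval.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types are real WildFly or Infinispan classes.
public class WeakAffinitySketch {

    // Hypothetical stand-in for the Registry: maps a node to its host name,
    // and records which cluster member holds each entry.
    public static final class RegistrySketch {
        private final Map<String, String> hostNames = new HashMap<>();
        private final Map<String, String> entryOwners = new HashMap<>();
        private int remoteRetrievals;

        // Ownership is assigned explicitly here (an assumption of this sketch).
        public void put(String node, String hostName, String entryOwner) {
            hostNames.put(node, hostName);
            entryOwners.put(node, entryOwner);
        }

        // Reads an entry from the point of view of localNode and counts the
        // read as remote when the entry is held by a different member.
        public String get(String localNode, String node) {
            if (!localNode.equals(entryOwners.get(node))) {
                remoteRetrievals++; // models the extra network round trip
            }
            return hostNames.get(node);
        }

        public int remoteRetrievals() {
            return remoteRetrievals;
        }
    }

    // Translates the already-located primary owner into its host name,
    // which is the step that goes through the Registry.
    public static String weakAffinity(RegistrySketch registry, String localNode, String primaryOwner) {
        return registry.get(localNode, primaryOwner);
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        registry.put("perf18", "perf18-host", "perf21"); // entry for perf18 lives on perf21
        registry.put("perf21", "perf21-host", "perf21"); // entry for perf21 is local to perf21

        // perf18 resolves its own name after each of 1000 invocations.
        for (int i = 0; i < 1000; i++) {
            weakAffinity(registry, "perf18", "perf18");
        }
        System.out.println("after perf18 lookups: " + registry.remoteRetrievals());

        // perf21 does the same, but its entry is local, so the count is unchanged.
        for (int i = 0; i < 1000; i++) {
            weakAffinity(registry, "perf21", "perf21");
        }
        System.out.println("after perf21 lookups: " + registry.remoteRetrievals());
    }
}
```

In this model the cost scales with the number of invocations rather than with the number of sessions, which would be consistent with a throughput drop that only shows up in the DIST scenario, though the sketch itself proves nothing about the real code path.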

was (Author: rachmato):
With regard to remote accesses of keys:

7.0.0 jobs:
- run with the Byteman rule RetrieveFromRemoteSource which checks for invocations of the method BaseDistributionInterceptor.retrieveFromRemoteSource
- output found for this rule in the job output for random cluster nodes (in other words, on one run, perf18 has to remotely retrieve a key, on another run, perf21 has to retrieve a remote key)
- because this is a stress test and failures are not triggered, unless there is an ISPN rebalance for reasons other than failure, the node with the remote key will have to make that extra invocation on every access to that key

6.4.0 jobs:
- run with the Byteman rule RetrieveFromRemoteSource which checks for invocations of the method BaseDistributionInterceptor.retrieveFromRemoteSource
- no output found for this rule in the job output for any cluster node

More investigation to follow.

> Clustering performance regression in ejbremote-dist-sync scenario 
> ------------------------------------------------------------------
>
>                 Key: WFLY-5822
>                 URL: https://issues.jboss.org/browse/WFLY-5822
>             Project: WildFly
>          Issue Type: Bug
>          Components: Clustering, EJB
>    Affects Versions: 10.0.0.CR5
>            Reporter: Michal Vinkler
>            Assignee: Richard Achmatowicz
>            Priority: Critical
>
> Compared to EAP 6, all SYNC scenarios have the same or better performance except for this one. I wonder why?
> Compare these results:
> stress-ejbremote-dist-sync
> 7.0.0.ER2: [throughput|http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-7x-stress-ejbremote-dist-sync/4/artifact/report/graph-throughput.png]
> 6.4.0.GA: [throughput|http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-stress-ejbremote-dist-sync_noperf21/1/artifact/report/graph-throughput.png]
> ---------------------------------------
> Just for comparison: the ejbremote REPL_SYNC scenario, on the other hand, *performs well*:
> stress-ejbremote-repl-sync
> 7.0.0.ER2: [throughput|http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-7x-stress-ejbremote-repl-sync/3/artifact/report/graph-throughput.png]
> 6.4.0.GA: [throughput|http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-stress-ejbremote-repl-sync_noperf21/2/artifact/report/graph-throughput.png]

--
This message was sent by Atlassian JIRA
(v6.4.11#64026)