[jboss-jira] [JBoss JIRA] (WFLY-5158) Execution error: org.infinispan.util.concurrent.TimeoutException: Replication timeout for node_name

Michal Karm Babacek (JIRA) issues at jboss.org
Thu Oct 15 08:58:00 EDT 2015


    [ https://issues.jboss.org/browse/WFLY-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118622#comment-13118622 ] 

Michal Karm Babacek commented on WFLY-5158:
-------------------------------------------

[~pferraro] The pull request doesn't fix it for arbitrary caches. I have my own cache, defined programmatically as:
{code}
GlobalConfiguration glob = new GlobalConfigurationBuilder().clusteredDefault()
        .transport().addProperty("configurationFile", "jgroups-udp.xml")
        .clusterName("mytestcluster")
        .globalJmxStatistics().allowDuplicateDomains(true).enable()
        .build();
Configuration loc = new ConfigurationBuilder().jmxStatistics().enable()
        .clustering().cacheMode(CacheMode.DIST_ASYNC)
        .stateTransfer().awaitInitialTransfer(false)
        .timeout(6, TimeUnit.MINUTES)
        .expiration().lifespan(ENTRY_LIFESPAN_NEVER)
        .expiration().disableReaper()
        .indexing().index(Index.ALL)
        .eviction().strategy(EvictionStrategy.LRU)
        .maxEntries(MAX_ENTRIES)
        .transaction().transactionMode(TransactionMode.NON_TRANSACTIONAL)
        .build();
{code}
and I keep hitting these:
{noformat}
23:17:17,771 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (default task-50) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Replication timeout for corenode-1-5865
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:755)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$80(JGroupsTransport.java:589)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
    at org.infinispan.remoting.transport.jgroups.SingleResponseFuture.call(SingleResponseFuture.java:46)
    at org.infinispan.remoting.transport.jgroups.SingleResponseFuture.call(SingleResponseFuture.java:17)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

Having replication instead of distribution doesn't change anything:
{code}
GlobalConfiguration glob = new GlobalConfigurationBuilder().clusteredDefault()
        .transport().addProperty("configurationFile", "jgroups-udp.xml")
        .distributedSyncTimeout(6, TimeUnit.MINUTES)
        .clusterName("mytestcluster")
        .globalJmxStatistics().allowDuplicateDomains(true).enable()
        .build(); 
Configuration loc = new ConfigurationBuilder().jmxStatistics().enable()
        .clustering().cacheMode(CacheMode.REPL_ASYNC)
        .async()
        .useReplQueue(true)
        .stateTransfer().awaitInitialTransfer(false)
        .timeout(6, TimeUnit.MINUTES)
        .expiration().lifespan(ENTRY_LIFESPAN_NEVER)
        .expiration().disableReaper()
        .indexing().index(Index.ALL)
        .eviction().strategy(EvictionStrategy.LRU)
        .maxEntries(MAX_ENTRIES)
        .transaction().transactionMode(TransactionMode.NON_TRANSACTIONAL)
        .build();
{code}
results:
{noformat}
12:39:22,976 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (default task-73) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Replication timeout for corenode-1-7623
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:755)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$80(JGroupsTransport.java:589)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
    at org.infinispan.remoting.transport.jgroups.SingleResponseFuture.call(SingleResponseFuture.java:46)
    at org.infinispan.remoting.transport.jgroups.SingleResponseFuture.call(SingleResponseFuture.java:17)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

WildFly build: [pferraro/wildfly/tree/web|https://github.com/pferraro/wildfly/tree/web]

WDYT?

> Execution error: org.infinispan.util.concurrent.TimeoutException: Replication timeout for node_name
> ---------------------------------------------------------------------------------------------------
>
>                 Key: WFLY-5158
>                 URL: https://issues.jboss.org/browse/WFLY-5158
>             Project: WildFly
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 10.0.0.Beta1, 10.0.0.CR2
>            Reporter: Michal Vinkler
>            Assignee: Paul Ferraro
>
> Seen intermittently in ejb-ejbservlet and http-session scenarios (no matter what failover type or cache is used).
> When node perf18 is restarted after failover, the other servers log this error several times:
> {code}
> [JBossINF] 16:11:43,595 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (default task-107) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Replication timeout for perf18
> [JBossINF] 	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:752)
> [JBossINF] 	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$6(JGroupsTransport.java:599)
> [JBossINF] 	at org.infinispan.remoting.transport.jgroups.JGroupsTransport$$Lambda$34/238012590.apply(Unknown Source)
> [JBossINF] 	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
> [JBossINF] 	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> [JBossINF] 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> [JBossINF] 	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1954)
> [JBossINF] 	at org.infinispan.remoting.transport.jgroups.RspListFuture.timeout(RspListFuture.java:40)
> [JBossINF] 	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$$Lambda$32/2073718099.run(Unknown Source)
> [JBossINF] 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [JBossINF] 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [JBossINF] 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> [JBossINF] 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> [JBossINF] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [JBossINF] 	at java.lang.Thread.run(Thread.java:745)
> {code}
> Server log:
> http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-7x-failover-ejb-ejbservlet-jvmkill-repl-async/4/console-perf19/
> In this particular test run, after perf18 restarted, perf19 logged the first error in 2 seconds, perf20 in 30 seconds, perf21 in 10 seconds.
> timeline:
> {code}
> perf18: [JBossINF] 16:11:42,361 INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: EAP 7.0.0.Alpha1 (WildFly Core 2.0.0.Beta1) started in 20244ms - Started 747 of 993 services (424 services are lazy, passive or on-demand)
> perf19: [JBossINF] 16:11:43,595 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (default task-107) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Replication timeout for perf18
> perf20: [JBossINF] 16:12:12,836 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (default task-51) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Replication timeout for perf18
> perf21: [JBossINF] 16:11:52,826 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (default task-22) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Replication timeout for perf18
> {code}
> This error also appears intermittently after a server is shut down.
> Total number of errors for this particular test run: 1183
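
For reference, the delays in the quoted timeline can be recomputed from the log timestamps. The following is only a sketch: the class name TimelineDelays and the helper delayMillis are hypothetical, the timestamps are copied from the timeline in the description, and measuring from perf18's "started" line is an assumption about what the reporter used as the starting point. The figures given in the description appear to be rounded.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

public class TimelineDelays {
    // Timestamp layout used in the server logs quoted above, e.g. 16:11:42,361
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    /** Milliseconds elapsed between two log timestamps taken on the same day. */
    static long delayMillis(String from, String to) {
        LocalTime start = LocalTime.parse(from, LOG_TIME);
        LocalTime end = LocalTime.parse(to, LOG_TIME);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // Assumption: the "started" line on perf18 marks the end of its restart.
        String perf18Started = "16:11:42,361";

        // First "Replication timeout for perf18" error on each remaining node.
        Map<String, String> firstError = new LinkedHashMap<>();
        firstError.put("perf19", "16:11:43,595");
        firstError.put("perf20", "16:12:12,836");
        firstError.put("perf21", "16:11:52,826");

        firstError.forEach((node, timestamp) ->
                System.out.printf("%s: %d ms after perf18 started%n",
                        node, delayMillis(perf18Started, timestamp)));
    }
}
```

With the timestamps above this gives 1234 ms for perf19, 30475 ms for perf20 and 10465 ms for perf21.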



--
This message was sent by Atlassian JIRA
(v6.4.11#64026)