[infinispan-issues] [JBoss JIRA] (ISPN-9351) nodes did not join to cluster because of timeoutException
Dan Berindei (Jira)
issues at jboss.org
Thu Apr 2 05:14:11 EDT 2020
[ https://issues.redhat.com/browse/ISPN-9351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dan Berindei resolved ISPN-9351.
--------------------------------
Resolution: Explained
[~staho], sorry we couldn't be more helpful at the time. I checked the configuration file and noticed that you have {{UDP.max_bundle_size="64K"}} and don't set a bundler, so you're using the default {{transfer-queue}} bundler. Setting {{FRAG3.frag_size="1200"}} doesn't help, because messages are fragmented in {{FRAG3}} and then the fragments are batched back together in the bundler, into UDP datagrams of up to {{max_bundle_size}} (64KB).
In general, we recommend using the default JGroups configuration files shipped with Infinispan and modifying them as little as possible. In our {{default-jgroups-udp.xml}}, we have {{UDP.max_bundle_size="8500"}}, {{UDP.bundler="no-bundler"}}, and {{FRAG3.frag_size="8000"}}. We chose those defaults precisely because some networks have problems with UDP packets bigger than 9000 bytes (the size of a jumbo Ethernet frame).
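To make the size argument concrete, here is a rough back-of-the-envelope sketch in Java. It is a simplification, not JGroups code: the class {{BundleSizeSketch}}, the method {{datagramSize}} and the 200KB message size are hypothetical, JGroups header overhead is ignored, and the {{transfer-queue}} bundler is modelled as simply packing fragments of one large message into a datagram until {{max_bundle_size}} is reached (the real bundler also flushes whenever its queue drains, so real datagram sizes vary).

```java
public class BundleSizeSketch {
    // Size above which, per the comment above, some networks have problems
    // (a jumbo Ethernet frame). Assumption: used here only as a threshold.
    static final int JUMBO_FRAME = 9000;

    /**
     * Upper bound on the datagram size carrying one large message, assuming
     * FRAG3 cuts it into fragSize-byte fragments and, when bundling is on,
     * the bundler packs as many whole fragments as fit into maxBundleSize.
     */
    public static int datagramSize(int messageSize, int fragSize, int maxBundleSize, boolean bundling) {
        if (!bundling) {
            // no-bundler: every FRAG3 fragment goes out as its own datagram
            return Math.min(messageSize, fragSize);
        }
        // transfer-queue (simplified): fragments are batched back together
        int fragmentsPerBundle = Math.max(1, maxBundleSize / fragSize);
        return Math.min(messageSize, fragmentsPerBundle * fragSize);
    }

    private static String verdict(int size) {
        return size > JUMBO_FRAME ? "above the 9000-byte threshold" : "within the 9000-byte threshold";
    }

    public static void main(String[] args) {
        int message = 200 * 1024; // hypothetical size of a big cluster status response
        int reporter = datagramSize(message, 1200, 64 * 1024, true);
        int defaults = datagramSize(message, 8000, 8500, false);
        System.out.printf("frag_size=1200, max_bundle_size=64K, transfer-queue: %d bytes, %s%n",
                reporter, verdict(reporter));
        System.out.printf("frag_size=8000, max_bundle_size=8500, no-bundler: %d bytes, %s%n",
                defaults, verdict(defaults));
    }
}
```

Under these assumptions the reporter's settings give datagrams of 64800 bytes (54 fragments of 1200 bytes), well above 9000 bytes, while the shipped defaults keep each datagram at 8000 bytes: a small {{frag_size}} only helps if the bundler doesn't undo it.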
The logs seem to confirm this: the cluster status responses, which are big messages, were sent by all 14 nodes but were not received by the coordinator, which timed out waiting for them.
{noformat}
2018-01-15 06:48:45,072 DEBUG [transport-thread-0045f36a-2860-4107-810a-d087224c9105-p4-t19] (DelegatingBasicLogger.java:384) - Recovering cluster status for view 20
2018-01-15 06:48:45,101 DEBUG [transport-thread-0045f36a-2860-4107-810a-d087224c9105-p4-t19] (DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,126 DEBUG [remote-thread-0105f36a-2860-4107-810a-d087224c9105-p2-t22] (DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,127 DEBUG [remote-thread-0035f36a-2860-4107-810a-d087224c9105-p2-t17] (DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,128 DEBUG [remote-thread-0055f36a-2860-4107-810a-d087224c9105-p2-t19] (DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,128 DEBUG [remote-thread-0155f36a-2860-4107-810a-d087224c9105-p2-t21] (DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,129 DEBUG [remote-thread-0025f36a-2860-4107-810a-d087224c9105-p2-t32] (DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,129 DEBUG [remote-thread-0085f36a-2860-4107-810a-d087224c9105-p2-t29] (DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,129 DEBUG [remote-thread-0135f36a-2860-4107-810a-d087224c9105-p2-t16] (DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,129 DEBUG [remote-thread-0145f36a-2860-4107-810a-d087224c9105-p2-t17] (DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,130 DEBUG [remote-thread-0095f36a-2860-4107-810a-d087224c9105-p2-t10] (DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,130 DEBUG [remote-thread-0125f36a-2860-4107-810a-d087224c9105-p2-t15] (DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,131 DEBUG [remote-thread-0065f36a-2860-4107-810a-d087224c9105-p2-t10] (DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,131 DEBUG [remote-thread-0075f36a-2860-4107-810a-d087224c9105-p2-t2] (DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,131 DEBUG [remote-thread-0115f36a-2860-4107-810a-d087224c9105-p2-t25] (DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:49:33,107 DEBUG [transport-thread-0045f36a-2860-4107-810a-d087224c9105-p4-t19] (ClusterTopologyManagerImpl.java:456) - Timed out waiting for cluster status responses, trying again
{noformat}
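The "sent by all 14 nodes" count can be double-checked mechanically. Below is a minimal sketch in Java that reads a merged log from stdin and collects the distinct senders of "Sending cluster status response for view N". The class {{StatusResponseSenders}} is hypothetical, and it assumes (inferred from the lines above and from node addresses such as {{0125f36a-2860-4107-810a-d087224c9105-21637}} in the quoted stack trace) that each thread name embeds the UUID part of its node's name. It only shows who sent a response; the coordinator's "Timed out waiting for cluster status responses" line is what shows they never arrived.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StatusResponseSenders {
    // Assumption: thread names look like
    // remote-thread-0105f36a-2860-4107-810a-d087224c9105-p2-t22,
    // i.e. "<kind>-thread-<node UUID>-p<pool>-t<thread>"
    private static final Pattern SENDER = Pattern.compile(
            "\\[(?:remote|transport)-thread-([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})-p\\d+-t\\d+\\]");

    /** Distinct node UUIDs that logged a cluster status response for the given view. */
    public static Set<String> senders(List<String> lines, int viewId) {
        String marker = "Sending cluster status response for view " + viewId;
        Set<String> nodes = new TreeSet<>();
        for (String line : lines) {
            if (!line.trim().endsWith(marker)) {
                continue; // not a response for this view
            }
            Matcher m = SENDER.matcher(line);
            if (m.find()) {
                nodes.add(m.group(1));
            }
        }
        return nodes;
    }

    public static void main(String[] args) throws IOException {
        int viewId = args.length > 0 ? Integer.parseInt(args[0]) : 20;
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = in.lines().collect(Collectors.toList());
        Set<String> nodes = senders(lines, viewId);
        System.out.println(nodes.size() + " nodes sent a cluster status response for view " + viewId);
        nodes.forEach(System.out::println);
    }
}
```

Fed the 14 "Sending cluster status response for view 20" lines above (for example, {{java StatusResponseSenders 20 < merged.log}}, where {{merged.log}} is a placeholder name), it should report 14 distinct senders, including the coordinator {{0045f36a-...}} itself.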
> nodes did not join to cluster because of timeoutException
> ---------------------------------------------------------
>
> Key: ISPN-9351
> URL: https://issues.redhat.com/browse/ISPN-9351
> Project: Infinispan
> Issue Type: Bug
> Reporter: Robert Cernak
> Priority: Major
> Attachments: 15nodesTryingToJoin1ClusterHowever5DidNotJoin .zip, Rebooting14ControllrsOf15InCloud.zip
>
>
> I was trying to connect 15 nodes to one cluster; however, the nodes did not join.
> In the logs, all nodes mostly had two kinds of exceptions, both caused by a TimeoutException:
> 1:
> {noformat}
> 2018-07-03 07:35:31,670 DEBUG [Camel (camel-1) thread #0 - seda://systemInitializer] (LocalTopologyManagerImpl.java:169) - Error sending join request for cache org.infinispan.CONFIG to coordinator
> org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 2 from 0125f36a-2860-4107-810a-d087224c9105-21637
> followed by the next exception about 1 second later...
> 2018-07-03 07:35:31,968 INFO [Camel (camel-1) thread #0 - seda://systemInitializer] (JGroupsTransport.java:702) - ISPN000080: Disconnecting JGroups channel cloud11-15
> 2018-07-03 07:35:32,262 DEBUG [Camel (camel-1) thread #0 - seda://systemInitializer] (DefaultCacheManager.java:709) - Stopping cache manager cloud11-15 on null
> 2018-07-03 07:35:32,273 WARN [Camel (camel-1) thread #0 - seda://systemInitializer] (DefaultCacheManager.java:736) - ISPN000189: While stopping a cache or cache manager, one of its components failed to stop
> java.util.concurrent.CompletionException: org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.start() throws java.lang.Exception on object of type StateTransferManagerImpl
> at java.util.concurrent.CompletableFuture.reportJoin(Unknown Source) ~[?:1.8.0_131]
> at java.util.concurrent.CompletableFuture.join(Unknown Source) ~[?:1.8.0_131]
> at org.infinispan.manager.DefaultCacheManager.terminate(DefaultCacheManager.java:688) ~[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.manager.DefaultCacheManager.stopCaches(DefaultCacheManager.java:734) [infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.manager.DefaultCacheManager.stop(DefaultCacheManager.java:711) [infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 2 from 0125f36a-2860-4107-810a-d087224c9105-21637
> {noformat}
> 2:
> {noformat}
> 2018-07-03 09:12:55,788 WARN [Camel (camel-1) thread #2 - seda://northboundProvider] (TransactionImpl.java:429) - ISPN000927: exception while committing
> javax.transaction.xa.XAException: null
> at org.infinispan.transaction.impl.TransactionCoordinator.rollback(TransactionCoordinator.java:180) ~[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.transaction.xa.XaTransactionTable.rollback(XaTransactionTable.java:137) ~[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.transaction.xa.TransactionXaAdapter.rollback(TransactionXaAdapter.java:76) ~[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.commons.tx.TransactionImpl.finishResource(TransactionImpl.java:424) [infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.commons.tx.TransactionImpl.rollbackResources(TransactionImpl.java:477) [infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.commons.tx.TransactionImpl.runCommit(TransactionImpl.java:332) [infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.commons.tx.TransactionImpl.rollback(TransactionImpl.java:132) [infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.commons.tx.TransactionManagerImpl.rollback(TransactionManagerImpl.java:80) [infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.cache.impl.CacheImpl.tryRollback(CacheImpl.java:1801) [infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.cache.impl.CacheImpl.executeCommandWithInjectedTx(CacheImpl.java:1731) [infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1707) [infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1370) [infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.cache.impl.DecoratedCache.put(DecoratedCache.java:655) [infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.cache.impl.DecoratedCache.put(DecoratedCache.java:544) [infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.cache.impl.AbstractDelegatingCache.put(AbstractDelegatingCache.java:358) [infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> at org.infinispan.cache.impl.EncoderCache.put(EncoderCache.java:674) [infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
> .....
> Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 1043 from 0075f36a-2860-4107-810a-d087224c9105-32070
> {noformat}
> The attached zip files include:
> - Infinispan logs from all nodes
> - the cluster config
> - JGroups health status CSV files from the nodes (comma-separated; the timestamps in the CSV files are 2 hours behind the timestamps in the logs)
--
This message was sent by Atlassian Jira
(v7.13.8#713008)