Dan Berindei resolved ISPN-9351.
--------------------------------
Resolution: Explained
[~staho] sorry we couldn't be more helpful at the time. I checked the configuration
file and noticed that you have {{UDP.max_bundle_size="64K"}} and you don't
set a bundler, so you're using the default {{transfer-queue}} bundler.
{{FRAG3.frag_size="1200"}} doesn't help, because messages are fragmented in
{{FRAG3}} and then the fragments are batched back together in the bundler, so a
single UDP packet can still grow up to {{max_bundle_size}}.
In general we recommend using the default JGroups configuration files shipped with
Infinispan and modifying them as little as possible. In our {{default-jgroups-udp.xml}},
we have {{UDP.max_bundle_size="8500"}}, {{UDP.bundler="no-bundler"}},
and {{FRAG3.frag_size="8000"}}. We chose those defaults precisely because some
networks have problems with UDP packets bigger than 9000 bytes (the size of a jumbo
Ethernet frame).
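To make the size argument concrete, here is a rough sketch of the fragment-then-bundle arithmetic. It is a simplified model and not JGroups code: it ignores protocol headers, assumes a single large message sitting in the bundler queue, and treats {{64K}} as 64,000 bytes. The class name, the method names, and the 100,000-byte message size are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of "fragment, then bundle". Not the JGroups implementation.
public class BundlingSketch {

    /** Splits a message of messageSize bytes into fragments of at most fragSize bytes. */
    public static List<Integer> fragment(int messageSize, int fragSize) {
        List<Integer> fragments = new ArrayList<>();
        for (int remaining = messageSize; remaining > 0; remaining -= fragSize) {
            fragments.add(Math.min(remaining, fragSize));
        }
        return fragments;
    }

    /** Greedily packs queued fragments into datagrams of at most maxBundleSize bytes. */
    public static List<Integer> bundle(List<Integer> fragments, int maxBundleSize) {
        List<Integer> datagrams = new ArrayList<>();
        int current = 0;
        for (int size : fragments) {
            // Flush the current datagram if the next fragment would not fit.
            if (current > 0 && current + size > maxBundleSize) {
                datagrams.add(current);
                current = 0;
            }
            current += size;
        }
        if (current > 0) {
            datagrams.add(current);
        }
        return datagrams;
    }

    /** Largest datagram on the wire; without bundling each fragment is sent on its own. */
    public static int largestDatagram(int messageSize, int fragSize,
                                      int maxBundleSize, boolean bundling) {
        List<Integer> fragments = fragment(messageSize, fragSize);
        List<Integer> datagrams = bundling ? bundle(fragments, maxBundleSize) : fragments;
        return datagrams.stream().mapToInt(Integer::intValue).max().orElse(0);
    }

    public static void main(String[] args) {
        int messageSize = 100_000; // hypothetical size of a big cluster status response

        // Reported setup: frag_size=1200, max_bundle_size=64K (assumed 64,000), bundling on.
        System.out.println("reported config: "
                + largestDatagram(messageSize, 1200, 64_000, true) + " bytes");

        // Shipped defaults: frag_size=8000, max_bundle_size=8500, no bundler.
        System.out.println("default config:  "
                + largestDatagram(messageSize, 8000, 8500, false) + " bytes");
    }
}
```

Under these assumptions the reported settings still produce datagrams far larger than 9000 bytes, while the shipped defaults keep every datagram at or below {{frag_size}}.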
The logs seem to confirm that the cluster status responses, which are big messages, are
sent by all 14 nodes but are not received by the coordinator.
{noformat}
2018-01-15 06:48:45,072 DEBUG
[transport-thread-0045f36a-2860-4107-810a-d087224c9105-p4-t19]
(DelegatingBasicLogger.java:384) - Recovering cluster status for view 20
2018-01-15 06:48:45,101 DEBUG
[transport-thread-0045f36a-2860-4107-810a-d087224c9105-p4-t19]
(DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,126 DEBUG [remote-thread-0105f36a-2860-4107-810a-d087224c9105-p2-t22]
(DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,127 DEBUG [remote-thread-0035f36a-2860-4107-810a-d087224c9105-p2-t17]
(DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,128 DEBUG [remote-thread-0055f36a-2860-4107-810a-d087224c9105-p2-t19]
(DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,128 DEBUG [remote-thread-0155f36a-2860-4107-810a-d087224c9105-p2-t21]
(DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,129 DEBUG [remote-thread-0025f36a-2860-4107-810a-d087224c9105-p2-t32]
(DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,129 DEBUG [remote-thread-0085f36a-2860-4107-810a-d087224c9105-p2-t29]
(DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,129 DEBUG [remote-thread-0135f36a-2860-4107-810a-d087224c9105-p2-t16]
(DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,129 DEBUG [remote-thread-0145f36a-2860-4107-810a-d087224c9105-p2-t17]
(DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,130 DEBUG [remote-thread-0095f36a-2860-4107-810a-d087224c9105-p2-t10]
(DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,130 DEBUG [remote-thread-0125f36a-2860-4107-810a-d087224c9105-p2-t15]
(DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,131 DEBUG [remote-thread-0065f36a-2860-4107-810a-d087224c9105-p2-t10]
(DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,131 DEBUG [remote-thread-0075f36a-2860-4107-810a-d087224c9105-p2-t2]
(DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:48:45,131 DEBUG [remote-thread-0115f36a-2860-4107-810a-d087224c9105-p2-t25]
(DelegatingBasicLogger.java:384) - Sending cluster status response for view 20
2018-01-15 06:49:33,107 DEBUG
[transport-thread-0045f36a-2860-4107-810a-d087224c9105-p4-t19]
(ClusterTopologyManagerImpl.java:456) - Timed out waiting for cluster status responses,
trying again
{noformat}
nodes did not join to cluster because of timeoutException
---------------------------------------------------------
Key: ISPN-9351
URL:
https://issues.redhat.com/browse/ISPN-9351
Project: Infinispan
Issue Type: Bug
Reporter: Robert Cernak
Priority: Major
Attachments: 15nodesTryingToJoin1ClusterHowever5DidNotJoin .zip,
Rebooting14ControllrsOf15InCloud.zip
I was trying to connect 15 nodes to one cluster, but the nodes did not join.
In the logs, most nodes showed two kinds of exceptions, both caused by TimeoutException:
1:
{noformat}
2018-07-03 07:35:31,670 DEBUG [Camel (camel-1) thread #0 - seda://systemInitializer]
(LocalTopologyManagerImpl.java:169) - Error sending join request for cache
org.infinispan.CONFIG to coordinator
org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for
responses for request 2 from 0125f36a-2860-4107-810a-d087224c9105-21637
followed by the next exception 1 second later...
2018-07-03 07:35:31,968 INFO [Camel (camel-1) thread #0 - seda://systemInitializer]
(JGroupsTransport.java:702) - ISPN000080: Disconnecting JGroups channel cloud11-15
2018-07-03 07:35:32,262 DEBUG [Camel (camel-1) thread #0 - seda://systemInitializer]
(DefaultCacheManager.java:709) - Stopping cache manager cloud11-15 on null
2018-07-03 07:35:32,273 WARN [Camel (camel-1) thread #0 - seda://systemInitializer]
(DefaultCacheManager.java:736) - ISPN000189: While stopping a cache or cache manager, one
of its components failed to stop
java.util.concurrent.CompletionException: org.infinispan.commons.CacheException: Unable
to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.start()
throws java.lang.Exception on object of type StateTransferManagerImpl
at java.util.concurrent.CompletableFuture.reportJoin(Unknown Source) ~[?:1.8.0_131]
at java.util.concurrent.CompletableFuture.join(Unknown Source) ~[?:1.8.0_131]
at org.infinispan.manager.DefaultCacheManager.terminate(DefaultCacheManager.java:688)
~[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at org.infinispan.manager.DefaultCacheManager.stopCaches(DefaultCacheManager.java:734)
[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at org.infinispan.manager.DefaultCacheManager.stop(DefaultCacheManager.java:711)
[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out
waiting for responses for request 2 from 0125f36a-2860-4107-810a-d087224c9105-21637
{noformat}
2:
{noformat}
2018-07-03 09:12:55,788 WARN [Camel (camel-1) thread #2 - seda://northboundProvider]
(TransactionImpl.java:429) - ISPN000927: exception while committing
javax.transaction.xa.XAException: null
at
org.infinispan.transaction.impl.TransactionCoordinator.rollback(TransactionCoordinator.java:180)
~[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at
org.infinispan.transaction.xa.XaTransactionTable.rollback(XaTransactionTable.java:137)
~[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at
org.infinispan.transaction.xa.TransactionXaAdapter.rollback(TransactionXaAdapter.java:76)
~[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at org.infinispan.commons.tx.TransactionImpl.finishResource(TransactionImpl.java:424)
[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at org.infinispan.commons.tx.TransactionImpl.rollbackResources(TransactionImpl.java:477)
[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at org.infinispan.commons.tx.TransactionImpl.runCommit(TransactionImpl.java:332)
[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at org.infinispan.commons.tx.TransactionImpl.rollback(TransactionImpl.java:132)
[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at
org.infinispan.commons.tx.TransactionManagerImpl.rollback(TransactionManagerImpl.java:80)
[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at org.infinispan.cache.impl.CacheImpl.tryRollback(CacheImpl.java:1801)
[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at org.infinispan.cache.impl.CacheImpl.executeCommandWithInjectedTx(CacheImpl.java:1731)
[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at
org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1707)
[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1370)
[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at org.infinispan.cache.impl.DecoratedCache.put(DecoratedCache.java:655)
[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at org.infinispan.cache.impl.DecoratedCache.put(DecoratedCache.java:544)
[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at
org.infinispan.cache.impl.AbstractDelegatingCache.put(AbstractDelegatingCache.java:358)
[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
at org.infinispan.cache.impl.EncoderCache.put(EncoderCache.java:674)
[infinispan-embedded-9.3.0.Final.jar:9.3.0.Final]
.....
Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out
waiting for responses for request 1043 from 0075f36a-2860-4107-810a-d087224c9105-32070
{noformat}
The attached zip files include:
- Infinispan logs from all nodes
- the cluster config
- JGroups health status CSV files from the nodes (comma-separated; timestamps in the
CSV files are 2 hours behind those in the logs)
--
This message was sent by Atlassian Jira
(v7.13.8#713008)