[
https://issues.jboss.org/browse/ISPN-2373?page=com.atlassian.jira.plugin....
]
Adrian Nistor commented on ISPN-2373:
-------------------------------------
After adding more logging to the code and analysing the logs extensively, I can confirm
that initial state transfer does not happen for about 4 minutes, and this causes the
tests to fail. This explains the InterruptedException thrown from
StateTransferManagerImpl.waitForInitialStateTransferToComplete(). However, the root cause
is not that segments were not received, or were not acknowledged as received, as was
initially thought and described in the title. Instead, rebalance does not even start
during the first 4 minutes. The rebalance message is sent by a task submitted to the
async executor service, and during tests this pool happens to be configured with a
maximum of 4 threads. Such a small thread pool often leads to tasks being discarded.
Unfortunately, the exception thrown in this case was not logged, so the problem stayed
hidden until now. To fix this, I added logging that highlights the issue and increased
the pool to 6 threads. With this change the suite always runs successfully. Before it,
the suite usually failed randomly with this exception after just 2-3 runs.
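The discarded-task behaviour described above can be reproduced in isolation. The following is only a minimal sketch, not the Infinispan code: it assumes the pool behaves like a plain java.util.concurrent.ThreadPoolExecutor with a SynchronousQueue hand-off and the default AbortPolicy, which may differ from the real async executor configuration. The class name RejectedTaskDemo, the method countRejected and the task count of 6 are hypothetical and chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectedTaskDemo {

    // Submits taskCount blocking tasks to a pool capped at maxThreads and
    // returns how many submissions were rejected. Hypothetical helper.
    static int countRejected(int maxThreads, int taskCount) {
        CountDownLatch release = new CountDownLatch(1);
        // Assumption: no task queueing (SynchronousQueue) and the default
        // AbortPolicy, which throws RejectedExecutionException when saturated.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, maxThreads, 60L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
        int rejected = 0;
        try {
            for (int i = 0; i < taskCount; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            // Keep the worker thread busy until released.
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    // Without this log line the discarded task goes unnoticed.
                    rejected++;
                    System.err.println("Task " + i + " discarded: " + e);
                }
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("max 4 threads: " + countRejected(4, 6) + " rejected");
        System.out.println("max 6 threads: " + countRejected(6, 6) + " rejected");
    }
}
```

Under these assumptions, six concurrent blocking tasks exceed a 4-thread pool by two, so two submissions are rejected, while a 6-thread pool accepts all of them. If the caller swallowed the RejectedExecutionException instead of logging it, the lost task would leave no trace.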
State transfer does not end because some segments are erroneously reported as unreceived
----------------------------------------------------------------------------------------
Key: ISPN-2373
URL: https://issues.jboss.org/browse/ISPN-2373
Project: Infinispan
Issue Type: Feature Request
Components: State transfer
Affects Versions: 5.2.0.Beta1
Reporter: Adrian Nistor
Assignee: Adrian Nistor
Priority: Critical
Fix For: 5.2.0.CR1
Hard to reproduce. I lost the last log where this was visible but still have a stack
trace:
org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
    at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
    at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:879)
    at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:650)
    at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:639)
    at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:542)
    at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:197)
    at org.infinispan.CacheImpl.start(CacheImpl.java:517)
    at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:689)
    at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:652)
    at org.infinispan.manager.DefaultCacheManager.access$100(DefaultCacheManager.java:126)
    at org.infinispan.manager.DefaultCacheManager$1.run(DefaultCacheManager.java:574)
Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache LuceneIndexesMetadata on PersistentStateTransferQueryDistributedIndexTest-NodeC-6067
    at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:199)
    at sun.reflect.GeneratedMethodAccessor139.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
    ... 10 more