[JBoss JIRA] (ISPN-6341) StateTransferManager should be the first component to stop
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-6341?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-6341:
-----------------------------------------------
wfink(a)redhat.com changed the Status of [bug 1316132|https://bugzilla.redhat.com/show_bug.cgi?id=1316132] from ASSIGNED to MODIFIED
> StateTransferManager should be the first component to stop
> ----------------------------------------------------------
>
> Key: ISPN-6341
> URL: https://issues.jboss.org/browse/ISPN-6341
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 8.2.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 9.0.0.Alpha1, 8.2.1.Final
>
>
> When a cache stops, it first removes the component registry from the {{GlobalComponentRegistry}}'s {{namedComponents}} map, which means the node (let's call it {{A}}) will reply with a {{CacheNotFoundResponse}} to any remote command.
> Another node {{B}} trying to execute a write/transactional command will receive the {{CacheNotFoundResponse}}, assume that a new cache topology with id {{current topology id + 1}} is coming soon, and wait for that new topology before retrying.
> Normally this is not a problem, because {{StateTransferManagerImpl.stop()}} sends a {{CacheTopologyControlCommand(LEAVE)}} to the coordinator quickly enough, then {{B}} receives the {{current topology id + 1}} topology and retries the command.
> But in some cases, the cache components that stop before {{StateTransferManagerImpl}} can take a long time to do so. In particular, because of {{ISPN-5507}}, {{TransactionTable}} can block for {{cacheStopTimeout}} if there are remote transactions in progress, even though the cache can no longer process remote commands.
> We should give {{StateTransferManagerImpl.stop()}} a priority of {{0}}, so that the {{CacheTopologyControlCommand(LEAVE)}} command is sent as soon as possible.
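
The proposed ordering can be illustrated with a minimal, self-contained sketch. It is not Infinispan code: the `Component` class, the `stopOrder` helper and the priority value 10 for the other components are hypothetical, chosen only to show how a stop priority of 0 moves one component to the front of the stop sequence.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of priority-ordered component shutdown (not Infinispan's implementation).
final class StopOrderSketch {

    // Stand-in for a cache component that declares a stop priority.
    static final class Component {
        final String name;
        final int stopPriority;

        Component(String name, int stopPriority) {
            this.name = name;
            this.stopPriority = stopPriority;
        }
    }

    // Returns component names in the order they would be stopped:
    // lower priority value first, ties keep registration order (List.sort is stable).
    static List<String> stopOrder(List<Component> components) {
        List<Component> sorted = new ArrayList<>(components);
        sorted.sort(Comparator.comparingInt(c -> c.stopPriority));
        List<String> names = new ArrayList<>();
        for (Component c : sorted) {
            names.add(c.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Component> components = new ArrayList<>();
        // Priority 10 is an assumed value for illustration only.
        components.add(new Component("TransactionTable", 10));
        // Priority 0 is the value proposed in the issue description.
        components.add(new Component("StateTransferManager", 0));
        components.add(new Component("PersistenceManager", 10));
        System.out.println(stopOrder(components));
    }
}
```

With this ordering, the component that announces the leave is stopped before any component that might block, so a slow stop elsewhere no longer delays the topology update other nodes are waiting for.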
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (ISPN-5507) Transactions committed immediately before cache stop can block shutdown
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-5507?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-5507:
-----------------------------------------------
wfink(a)redhat.com changed the Status of [bug 1316132|https://bugzilla.redhat.com/show_bug.cgi?id=1316132] from ASSIGNED to MODIFIED
> Transactions committed immediately before cache stop can block shutdown
> -----------------------------------------------------------------------
>
> Key: ISPN-5507
> URL: https://issues.jboss.org/browse/ISPN-5507
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 7.2.1.Final, 8.0.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 9.0.0.Alpha1, 8.2.1.Final
>
>
> This is causing random failures in {{DistributedEntryRetrieverTxTest.verifyNodeLeavesBeforeGettingData}}.
> The test inserts some values into the cache, starts an iteration, and then kills one of the nodes. In rare instances, the killed cache receives the {{TxCompletionNotificationCommand}} for one of the writes only after it has started shutting down, and ignores it. That leaves the remote tx in progress, and {{TransactionTable.shutDownGracefully()}} blocks for 30 seconds, causing a {{TimeoutException}} elsewhere in the test.
> {noformat}
> 10:52:18,129 TRACE (remote-thread-NodeAM-p12133-t6:) [CommandAwareRpcDispatcher] About to send back response SuccessfulResponse{responseValue=null} for command CommitCommand {gtx=GlobalTransaction:<NodeAL-45757>:22325:remote, cacheName='org.infinispan.iteration.DistributedEntryRetrieverTxTest', topologyId=4}
> 10:52:18,129 TRACE (testng-DistributedEntryRetrieverTxTest:) [JGroupsTransport] dests=[NodeAM-45518, NodeAL-45757], command=TxCompletionNotificationCommand{ xid=null, internalId=0, topologyId=4, gtx=GlobalTransaction:<NodeAL-45757>:22325:local, cacheName=org.infinispan.iteration.DistributedEntryRetrieverTxTest} , mode=ASYNCHRONOUS, timeout=15000
> 10:52:18,133 DEBUG (testng-DistributedEntryRetrieverTxTest:) [CacheImpl] Stopping cache org.infinispan.iteration.DistributedEntryRetrieverTxTest on NodeAM-45518
> 10:52:18,133 TRACE (OOB-2,NodeAM-45518:) [GlobalInboundInvocationHandler] Attempting to execute CacheRpcCommand: TxCompletionNotificationCommand{ xid=null, internalId=0, topologyId=4, gtx=GlobalTransaction:<NodeAL-45757>:22325:local, cacheName=org.infinispan.iteration.DistributedEntryRetrieverTxTest} [sender=NodeAL-45757]
> 10:52:18,133 TRACE (OOB-2,NodeAM-45518:) [GlobalInboundInvocationHandler] Silently ignoring that org.infinispan.iteration.DistributedEntryRetrieverTxTest cache is not defined
> 10:52:18,133 DEBUG (testng-DistributedEntryRetrieverTxTest:) [TransactionTable] Wait for on-going transactions to finish for 30 seconds.
> 10:52:48,139 WARN (testng-DistributedEntryRetrieverTxTest:) [TransactionTable] ISPN000100: Stopping, but there are 0 local transactions and 1 remote transactions that did not finish in time.
> 10:52:48,386 ERROR (testng-DistributedEntryRetrieverTxTest:) [UnitTestTestNGListener] Test verifyNodeLeavesBeforeGettingData(org.infinispan.iteration.DistributedEntryRetrieverTxTest) failed.
> java.lang.IllegalStateException: Thread already timed out waiting for event pre_send_response_released
> at org.infinispan.test.fwk.CheckPoint.trigger(CheckPoint.java:131)
> at org.infinispan.test.fwk.CheckPoint.trigger(CheckPoint.java:116)
> at org.infinispan.iteration.DistributedEntryRetrieverTest.verifyNodeLeavesBeforeGettingData(DistributedEntryRetrieverTest.java:105)
> {noformat}
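
The failure mode in the log above can be reproduced in miniature. The sketch below is a hypothetical model, not the real {{TransactionTable}}: the class `GracefulStopSketch`, its method names and the polling interval are all assumptions. It shows how a completion notification that is dropped because the cache is no longer registered leaves a remote transaction behind, so the graceful wait runs to its full timeout.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a graceful stop that waits for remote transactions.
final class GracefulStopSketch {

    private final Set<String> remoteTxs = ConcurrentHashMap.newKeySet();

    void remoteTxStarted(String gtx) {
        remoteTxs.add(gtx);
    }

    // Returns true if the notification removed the transaction.
    // When the cache is no longer registered the notification is ignored,
    // mirroring the "Silently ignoring" line in the log.
    boolean completionReceived(String gtx, boolean cacheStillRegistered) {
        if (!cacheStillRegistered) {
            return false;
        }
        return remoteTxs.remove(gtx);
    }

    // Polls until no remote transactions remain or the timeout elapses.
    // Returns true if all transactions finished in time.
    boolean awaitRemoteTxs(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!remoteTxs.isEmpty()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        GracefulStopSketch table = new GracefulStopSketch();
        table.remoteTxStarted("tx-1");
        // The notification arrives after the cache was unregistered and is dropped.
        table.completionReceived("tx-1", false);
        boolean drained = table.awaitRemoteTxs(100, TimeUnit.MILLISECONDS);
        System.out.println("drained in time: " + drained);
    }
}
```

In the real test the wait is 30 seconds rather than 100 milliseconds, which is long enough for an unrelated checkpoint in the test to time out first.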