[infinispan-issues] [JBoss JIRA] (ISPN-9701) TransactionTable does not shutdown gracefully

Dan Berindei (Jira) issues at jboss.org
Mon Nov 19 13:20:00 EST 2018


    [ https://issues.jboss.org/browse/ISPN-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663595#comment-13663595 ] 

Dan Berindei commented on ISPN-9701:
------------------------------------

I think your "naive fix" is just fine [~pferraro]. Marshalling is always synchronous, so it will definitely fix the exception.

True, by the time this runs we have already left the cache, so the other nodes might have removed the transaction before we send the commit and the subsequent {{TxCompletionNotificationCommand}}. To fix that, we'd need another component that waits for local transactions to finish before leaving the cache, or perhaps to split the leave into two phases.
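
To make the ordering argument concrete, here is a minimal single-threaded sketch of the race. It is not Infinispan code: {{FakeTransport}}, {{FakeTransactionTable}}, {{tryShutdown()}} and {{release()}} are hypothetical stand-ins, and the shutdown check is forced to run between the two steps of the release to simulate the bad interleaving. It only assumes, as described in the issue, that shutdown proceeds once the local transactions map is empty and that sending fails once the cache manager is stopping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ShutdownOrderingSketch {

    /** Hypothetical stand-in for the marshaller/transport: rejects sends once stopping. */
    static final class FakeTransport {
        volatile boolean stopping;
        final List<String> sent = new ArrayList<>();

        void send(String command) {
            if (stopping) {
                throw new IllegalStateException("Cache manager is stopping");
            }
            sent.add(command);
        }
    }

    /** Hypothetical stand-in for the transaction table; not the real class. */
    static final class FakeTransactionTable {
        final Map<String, Boolean> localTransactions = new ConcurrentHashMap<>();
        final FakeTransport transport;
        final boolean notifyBeforeRemove;

        FakeTransactionTable(FakeTransport transport, boolean notifyBeforeRemove) {
            this.transport = transport;
            this.notifyBeforeRemove = notifyBeforeRemove;
        }

        void begin(String tx) {
            localTransactions.put(tx, Boolean.TRUE);
        }

        /** Shutdown only proceeds when no local transaction is registered. */
        boolean tryShutdown() {
            if (!localTransactions.isEmpty()) {
                return false;
            }
            transport.stopping = true;
            return true;
        }

        /** Releases a completed tx; betweenSteps simulates a concurrent shutdown attempt. */
        void release(String tx, Runnable betweenSteps) {
            if (notifyBeforeRemove) {
                // Reordered variant: notify remote nodes first, then deregister locally.
                transport.send("TxCompletionNotification:" + tx);
                betweenSteps.run();
                localTransactions.remove(tx);
            } else {
                // Ordering described in the issue: deregister locally, then notify.
                localTransactions.remove(tx);
                betweenSteps.run();
                transport.send("TxCompletionNotification:" + tx);
            }
        }
    }

    /** Returns true if the completion notification went out despite the shutdown attempt. */
    static boolean notificationSent(boolean notifyBeforeRemove) {
        FakeTransport transport = new FakeTransport();
        FakeTransactionTable table = new FakeTransactionTable(transport, notifyBeforeRemove);
        table.begin("tx1");
        try {
            table.release("tx1", table::tryShutdown);
        } catch (IllegalStateException e) {
            return false;
        }
        return transport.sent.size() == 1;
    }

    public static void main(String[] args) {
        System.out.println("remove first, then notify: sent=" + notificationSent(false));
        System.out.println("notify first, then remove: sent=" + notificationSent(true));
    }
}
```

With the reordered variant the shutdown attempt still sees the transaction in the map and backs off, so the send happens before the transport stops. The sketch does not model the second problem above (other nodes dropping the transaction because we already left the cache), which the reordering alone does not address.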

> TransactionTable does not shutdown gracefully
> ---------------------------------------------
>
>                 Key: ISPN-9701
>                 URL: https://issues.jboss.org/browse/ISPN-9701
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 9.4.1.Final
>            Reporter: Paul Ferraro
>            Assignee: Dan Berindei
>            Priority: Critical
>
> Here's a sample stacktrace during shutdown:
> {noformat}
> 16:54:15,033 WARN  [org.wildfly.clustering.web.undertow] (default task-1) ISPN000472: Cache manager is stopping: org.infinispan.IllegalLifecycleStateException: ISPN000472: Cache manager is stopping
> 	at org.infinispan.marshall.core.GlobalMarshaller.getExternalizer(GlobalMarshaller.java:420)
> 	at org.infinispan.marshall.core.GlobalMarshaller.writeNonNullableObject(GlobalMarshaller.java:400)
> 	at org.infinispan.marshall.core.GlobalMarshaller.writeNullableObject(GlobalMarshaller.java:355)
> 	at org.infinispan.marshall.core.GlobalMarshaller.writeObjectOutput(GlobalMarshaller.java:183)
> 	at org.infinispan.marshall.core.GlobalMarshaller.writeObjectOutput(GlobalMarshaller.java:176)
> 	at org.infinispan.marshall.core.GlobalMarshaller.objectToBuffer(GlobalMarshaller.java:305)
> 	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.marshallRequest(JGroupsTransport.java:1009)
> 	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.sendCommand(JGroupsTransport.java:1209)
> 	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.performAsyncRemoteInvocation(JGroupsTransport.java:1105)
> 	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotelyAsync(JGroupsTransport.java:246)
> 	at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotelyAsync(RpcManagerImpl.java:291)
> 	at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:323)
> 	at org.infinispan.transaction.impl.TransactionTable.removeTransactionInfoRemotely(TransactionTable.java:900)
> 	at org.infinispan.transaction.impl.TransactionTable.releaseLocksForCompletedTransaction(TransactionTable.java:886)
> 	at org.infinispan.transaction.xa.XaTransactionTable.forgetSuccessfullyCompletedTransaction(XaTransactionTable.java:195)
> 	at org.infinispan.transaction.xa.XaTransactionTable.commit(XaTransactionTable.java:128)
> 	at org.infinispan.transaction.xa.TransactionXaAdapter.commit(TransactionXaAdapter.java:68)
> 	at org.infinispan.commons.tx.TransactionImpl.finishResource(TransactionImpl.java:419)
> 	at org.infinispan.commons.tx.TransactionImpl.commitResources(TransactionImpl.java:466)
> 	at org.infinispan.commons.tx.TransactionImpl.runCommit(TransactionImpl.java:335)
> 	at org.infinispan.commons.tx.TransactionImpl.commit(TransactionImpl.java:110)
> {noformat}
> The problem seems to be that shutDownGracefully() first waits for the localTransactions map to be empty.  However, when the cache is clustered, releaseLocksForCompletedTransaction(...) removes the transaction from the localTransactions map *before* invoking removeTransactionInfoRemotely(...), which means that the subsequent TxCompletionNotificationCommand can fail to marshal (see above), or the transport might close before this command is sent.
> A naive fix would simply reorder the removeLocalTransaction(...) to happen after the call to removeTransactionInfoRemotely(...) within the releaseLocksForCompletedTransaction(...) method, but I'm sure there's more to it.


