[ https://issues.jboss.org/browse/ISPN-9701?page=com.atlassian.jira.plugin.... ]
Dan Berindei edited comment on ISPN-9701 at 12/3/18 8:00 AM:
-------------------------------------------------------------
I think your "naive fix" is just fine [~pferraro]. Marshalling is always
synchronous, so it will definitely fix the exception.
--True, by this point we have already left the cache, so the other nodes might have already
removed the transaction before we send the commit and the subsequent
{{TxCompletionNotificationCommand}}. To fix that, we'd need another component
that waits for local txs to finish before leaving the cache, or perhaps to split the
leave into two phases.--
{{TransactionTable}} doesn't remove a transaction just because its originator is no longer
a member of the cache topology; it only looks at the cluster view, so we don't need anything
else.
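To make the distinction between cache topology and cluster view concrete, here is a minimal model, assuming (as the comment states) that remote transactions are only dropped in bulk when the originator leaves the cluster view. The class and method names ({{RemoteTxRegistry}}, {{onClusterViewChanged}}, {{onCacheTopologyChanged}}, {{onTxCompletionNotification}}) are hypothetical and do not correspond to Infinispan's actual {{TransactionTable}} API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified model of remote-transaction cleanup.
// NOT Infinispan's TransactionTable: it only illustrates that cleanup is
// driven by the cluster view rather than by the cache topology.
class RemoteTxRegistry {
    // transaction id -> originator node name
    private final Map<String, String> remoteTxs = new HashMap<>();

    void register(String txId, String originator) {
        remoteTxs.put(txId, originator);
    }

    boolean contains(String txId) {
        return remoteTxs.containsKey(txId);
    }

    // Cluster view change: drop transactions whose originator left the cluster.
    void onClusterViewChanged(Set<String> clusterMembers) {
        remoteTxs.values().removeIf(originator -> !clusterMembers.contains(originator));
    }

    // Cache topology change: deliberately leaves remoteTxs untouched, so a node
    // that left the cache but is still in the cluster view keeps its entries.
    void onCacheTopologyChanged(Set<String> cacheMembers) {
        // no-op by design in this model
    }

    // The originator's completion notification removes the entry.
    void onTxCompletionNotification(String txId) {
        remoteTxs.remove(txId);
    }
}
```

Under this model, a node that has left the cache but is still in the cluster view can still deliver its completion notification; its transactions are only discarded wholesale once it leaves the cluster.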
TransactionTable does not shutdown gracefully
---------------------------------------------
Key: ISPN-9701
URL: https://issues.jboss.org/browse/ISPN-9701
Project: Infinispan
Issue Type: Bug
Components: Transactions
Affects Versions: 9.2.4.Final, 9.3.5.Final, 9.4.1.Final
Reporter: Paul Ferraro
Assignee: Paul Ferraro
Priority: Critical
Fix For: 9.4.3.Final, 10.0.0.Alpha2
Here's a sample stack trace logged during shutdown:
{noformat}
16:54:15,033 WARN [org.wildfly.clustering.web.undertow] (default task-1) ISPN000472: Cache manager is stopping: org.infinispan.IllegalLifecycleStateException: ISPN000472: Cache manager is stopping
	at org.infinispan.marshall.core.GlobalMarshaller.getExternalizer(GlobalMarshaller.java:420)
	at org.infinispan.marshall.core.GlobalMarshaller.writeNonNullableObject(GlobalMarshaller.java:400)
	at org.infinispan.marshall.core.GlobalMarshaller.writeNullableObject(GlobalMarshaller.java:355)
	at org.infinispan.marshall.core.GlobalMarshaller.writeObjectOutput(GlobalMarshaller.java:183)
	at org.infinispan.marshall.core.GlobalMarshaller.writeObjectOutput(GlobalMarshaller.java:176)
	at org.infinispan.marshall.core.GlobalMarshaller.objectToBuffer(GlobalMarshaller.java:305)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.marshallRequest(JGroupsTransport.java:1009)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.sendCommand(JGroupsTransport.java:1209)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.performAsyncRemoteInvocation(JGroupsTransport.java:1105)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotelyAsync(JGroupsTransport.java:246)
	at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotelyAsync(RpcManagerImpl.java:291)
	at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:323)
	at org.infinispan.transaction.impl.TransactionTable.removeTransactionInfoRemotely(TransactionTable.java:900)
	at org.infinispan.transaction.impl.TransactionTable.releaseLocksForCompletedTransaction(TransactionTable.java:886)
	at org.infinispan.transaction.xa.XaTransactionTable.forgetSuccessfullyCompletedTransaction(XaTransactionTable.java:195)
	at org.infinispan.transaction.xa.XaTransactionTable.commit(XaTransactionTable.java:128)
	at org.infinispan.transaction.xa.TransactionXaAdapter.commit(TransactionXaAdapter.java:68)
	at org.infinispan.commons.tx.TransactionImpl.finishResource(TransactionImpl.java:419)
	at org.infinispan.commons.tx.TransactionImpl.commitResources(TransactionImpl.java:466)
	at org.infinispan.commons.tx.TransactionImpl.runCommit(TransactionImpl.java:335)
	at org.infinispan.commons.tx.TransactionImpl.commit(TransactionImpl.java:110)
{noformat}
The problem seems to be that shutDownGracefully() first waits for the localTransactions
map to be empty. However, when the cache is clustered,
releaseLocksForCompletedTransaction(...) removes the transaction from the
localTransactions map *before* invoking removeTransactionInfoRemotely(...). Shutdown can
therefore proceed while the TxCompletionNotificationCommand that follows is still unsent,
so that command can fail to marshal (see above), or the transport might close before it
is sent.
A naive fix would simply move the removeLocalTransaction(...) call after the call to
removeTransactionInfoRemotely(...) within releaseLocksForCompletedTransaction(...), but
I'm sure there's more to it.
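The reordering suggested above can be sketched with the same kind of simplified model. Again, this is not Infinispan's implementation: {{ReorderedTxTable}}, the boolean transport flag and the {{interleaving}} callback are hypothetical, and the model assumes, as the comment at the top states, that marshalling and sending happen synchronously inside {{removeTransactionInfoRemotely(...)}}.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the reordered releaseLocksForCompletedTransaction(...).
// Not Infinispan code: the transport is a flag and shutdown is polled, not blocking.
class ReorderedTxTable {
    final Set<String> localTransactions = ConcurrentHashMap.newKeySet();
    volatile boolean transportOpen = true;
    volatile boolean notificationSent = false;

    // Mirrors shutDownGracefully(): only closes the transport once
    // localTransactions is empty; returns false instead of blocking.
    boolean tryShutDownGracefully() {
        if (!localTransactions.isEmpty()) {
            return false;
        }
        transportOpen = false;
        return true;
    }

    // Reordered: the remote notification is sent while the transaction is
    // still registered locally, and only then is the local entry removed.
    void releaseLocksForCompletedTransaction(String txId, Runnable interleaving) {
        // A concurrent shutdown attempt here still sees txId in the map.
        interleaving.run();
        removeTransactionInfoRemotely(txId);
        localTransactions.remove(txId);
    }

    // Synchronous "marshal and send": fails if the transport is already closed.
    private void removeTransactionInfoRemotely(String txId) {
        if (!transportOpen) {
            throw new IllegalStateException("Cache manager is stopping");
        }
        notificationSent = true;
    }
}
```

Because the transaction stays in {{localTransactions}} until the notification has been handed to the transport, a shutdown that waits for the map to drain cannot close the transport first; it succeeds only after the send has completed.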
--
This message was sent by Atlassian Jira
(v7.12.1#712002)