[JBoss JIRA] (ISPN-8232) Transaction inconsistency during network partitions
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-8232?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-8232:
------------------------------------
True, the stable topology doesn't fix the problem by itself. But having a stable topology does allow us to do some extra processing before updating the stable topology in the majority partition, like I suggested in ISPN-3421/ISPN-5046.
OTOH I don't see how an "expected members" list different from the list of members in the stable topology could be maintained without the administrator having to manually intervene every time a node crashes.
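The maintenance concern can be made concrete with a minimal Java sketch of the "expected members" check proposed in the issue below. It is hypothetical: ExpectedMembers, decide() and the Decision values are illustrative names, not Infinispan APIs. The issue leaves the commit-vs-rollback choice to the admin or an automatic mechanism, so the sketch simply assumes rollback once a node is removed. In the sketch, an originator that is missing from the topology but still "expected" keeps its transactions pending until someone performs the admin removal. That removal is exactly the manual step in question.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the "expected members" idea from ISPN-8232.
 * ExpectedMembers and decide() are illustrative names, not Infinispan APIs.
 */
public class ExpectedMembersDemo {

    enum Decision { KEEP, KEEP_PENDING, ROLLBACK }

    static final class ExpectedMembers {
        // Kept apart from the cache topology so it need not be re-sent on every topology update.
        private final Set<String> members = new HashSet<>();

        ExpectedMembers(Set<String> initial) { members.addAll(initial); }

        /** Admin operation: the node crashed and is not expected back. */
        void removeByAdmin(String node) { members.remove(node); }

        boolean contains(String node) { return members.contains(node); }
    }

    /** What a transaction table would do with a pending tx started by {@code originator}. */
    static Decision decide(String originator, Set<String> currentTopology, ExpectedMembers expected) {
        if (currentTopology.contains(originator)) {
            return Decision.KEEP;          // originator still visible: nothing to do
        }
        if (expected.contains(originator)) {
            return Decision.KEEP_PENDING;  // maybe just partitioned away: wait for a merge
        }
        return Decision.ROLLBACK;          // assumed policy once the admin removed the node
    }

    public static void main(String[] args) {
        ExpectedMembers expected = new ExpectedMembers(Set.of("A", "B", "O"));
        Set<String> majority = Set.of("A", "B");
        System.out.println(decide("O", majority, expected)); // prints KEEP_PENDING
        expected.removeByAdmin("O");
        System.out.println(decide("O", majority, expected)); // prints ROLLBACK
    }
}
```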
> Transaction inconsistency during network partitions
> ---------------------------------------------------
>
> Key: ISPN-8232
> URL: https://issues.jboss.org/browse/ISPN-8232
> Project: Infinispan
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 9.1.0.Final
> Reporter: Pedro Ruivo
> Assignee: Pedro Ruivo
> Priority: Critical
>
> In the scenario where the originator stays in the minority partition (in our test suite, the originator isolated tests), it is possible for a transaction to be both committed and rolled back in the majority partition.
> In {{Pessimistic Locking}}, the transaction is committed in one phase using the {{PrepareCommand}}. If the partition happens while the originator is sending the {{PrepareCommand}}, the nodes in the majority partition may or may not receive it. Some nodes may receive and apply the {{PrepareCommand}} while others never receive it.
> When the topology is updated in the majority partition, the {{TransactionTable}} rolls back all transactions whose originator isn't present. So, on the nodes where the {{PrepareCommand}} wasn't received, the transaction is rolled back.
> The originator in the minority partition detects the partition and marks the transaction as partially completed. When the merge occurs, it tries to commit the transaction again. On the nodes where the transaction was rolled back, the transaction is marked as completed, so when the {{PrepareCommand}} is received, it throws an {{IllegalStateException}} ({{TransactionTable:386, getOrCreateRemoteTransaction()}}). In this case, the transaction isn't removed from the {{PartitionHandlingManager}} and our test suite fails with {{"there are pending tx"}}.
> Another theoretical scenario is the {{PrepareCommand}} being executed when no locks have been acquired.
> The same issue can happen with {{Optimistic Locking}} for the {{CommitCommand}}.
> The problem is that the transaction table can't tell whether the node left gracefully or not. A solution would be to have an {{"expected members"}} list, ideally kept separate from the {{CacheTopology}} to avoid sending it every time. It would also need some sysadmin tools for the case where a node crashes and won't be back online for a while (or, for some reason, doesn't need to come back online).
> A sysadmin could remove the node from this list ({{CacheTopology}} is updated and there is no need to increase it) and decide what to do with the pending transactions (or an automatic mechanism to auto-commit/rollback the transaction).
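The pessimistic-locking race described in the issue can be reproduced with a small, self-contained Java model. This is a hypothetical sketch, not Infinispan code. Node, onePhasePrepare() and onTopologyUpdate() are illustrative names, and the model assumes that a duplicate prepare on a node that already committed is simply ignored. The scenario runs as follows:

- Node A receives the one-phase prepare before the partition, and node B does not.
- The majority topology update therefore rolls the transaction back only on B.
- After the merge, the originator's retry is rejected on B with an IllegalStateException.

The two majority nodes end up with different outcomes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, heavily simplified model of the ISPN-8232 race.
 * None of these classes or methods are Infinispan APIs.
 */
public class OnePhasePrepareDemo {

    enum TxState { COMMITTED, ROLLED_BACK }

    /** A node in the majority partition with its own record of completed transactions. */
    static final class Node {
        final String name;
        final Map<String, Integer> data = new HashMap<>();
        final Map<String, TxState> completedTxs = new HashMap<>();

        Node(String name) { this.name = name; }

        /** One-phase prepare (pessimistic locking): apply the write and commit at once. */
        void onePhasePrepare(String txId, String key, int value) {
            TxState previous = completedTxs.get(txId);
            if (previous == TxState.ROLLED_BACK) {
                // Mirrors the IllegalStateException from the issue: the tx is already completed here.
                throw new IllegalStateException("tx " + txId + " already completed on " + name);
            }
            if (previous == TxState.COMMITTED) {
                return; // assumption: a duplicate prepare after a commit is ignored
            }
            data.put(key, value);
            completedTxs.put(txId, TxState.COMMITTED);
        }

        /** Topology update: roll back a still-pending tx whose originator is no longer a member. */
        void onTopologyUpdate(String txId, String originator, Set<String> members) {
            if (!members.contains(originator) && !completedTxs.containsKey(txId)) {
                completedTxs.put(txId, TxState.ROLLED_BACK);
            }
        }
    }

    /** Runs partition, majority topology update and merge; returns each node's outcome. */
    static Map<String, String> simulate() {
        Node a = new Node("A");
        Node b = new Node("B");
        String tx = "tx1";
        // The partition hits while originator "O" is sending: only A gets the prepare.
        a.onePhasePrepare(tx, "k", 42);
        // The majority partition installs a topology without "O".
        Set<String> majority = Set.of("A", "B");
        a.onTopologyUpdate(tx, "O", majority);
        b.onTopologyUpdate(tx, "O", majority);
        // After the merge, "O" retries the prepare on both nodes.
        Map<String, String> outcome = new HashMap<>();
        for (Node n : List.of(a, b)) {
            try {
                n.onePhasePrepare(tx, "k", 42);
                outcome.put(n.name, "committed, k=" + n.data.get("k"));
            } catch (IllegalStateException e) {
                outcome.put(n.name, "rejected, k=" + n.data.get("k"));
            }
        }
        return outcome;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

In this model, A reports the transaction as committed with k=42. B rejects the retry and never stores k, which is the committed-and-rolled-back inconsistency the issue describes.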
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
8 years, 7 months
[JBoss JIRA] (ISPN-8237) State transfer doesn't replace L1 entry with regular entry
by Dan Berindei (JIRA)
Dan Berindei created ISPN-8237:
----------------------------------
Summary: State transfer doesn't replace L1 entry with regular entry
Key: ISPN-8237
URL: https://issues.jboss.org/browse/ISPN-8237
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 9.1.0.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
An L1 entry is preserved when the local node becomes an owner; it is only removed if the application removes the key explicitly. Otherwise, state transfer overwrites the L1 entry with the entry received from a previous owner, making it immortal (assuming the original entry didn't have a lifespan/maxIdle).
A transaction writing to a key while the local node is only a write owner will not remove the L1 metadata from the entry either, but it will set the {{PUT_FOR_STATE_TRANSFER}} flag in {{CommitManager}}. Since state transfer doesn't overwrite an entry tracked in {{CommitManager}}, that means the L1 entry is never replaced with an immortal entry.
This started appearing in {{ConcurrentNonOverlappingLeaveTest}} once I changed {{TxDistributionInterceptor}} to cause a retry when the topology changes during commit (for ISPN-8195). There, the entry is first wrapped on the originator before it becomes the primary owner, and the commit happens multiple times, but neither seems to be required.
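The sequence above can be illustrated with a minimal Java model. Under the stated assumptions, Entry, NodeStore, commitAsWriteOwner() and applyStateTransfer() are hypothetical stand-ins for the data container and the CommitManager tracking, not Infinispan's real classes. Without a transaction, state transfer replaces the L1 entry with a regular entry. When a transaction commits while the node is only a write owner, the key is tracked, state transfer skips it, and the L1 metadata survives.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of the ISPN-8237 sequence. Entry, NodeStore and the method
 * names are illustrative stand-ins, not Infinispan's real classes.
 */
public class L1StateTransferDemo {

    /** A stored value; l1 == true means it still carries L1 metadata (finite lifespan). */
    static final class Entry {
        final String value;
        final boolean l1;

        Entry(String value, boolean l1) {
            this.value = value;
            this.l1 = l1;
        }

        @Override
        public String toString() {
            return value + (l1 ? " (L1)" : " (regular)");
        }
    }

    static final class NodeStore {
        final Map<String, Entry> container = new HashMap<>();
        // Stands in for CommitManager: keys committed while state transfer is in progress.
        final Set<String> trackedByCommit = new HashSet<>();

        /** Tx commit while only a write owner: the value changes but L1 metadata is kept. */
        void commitAsWriteOwner(String key, String value) {
            Entry old = container.get(key);
            container.put(key, new Entry(value, old != null && old.l1));
            trackedByCommit.add(key);
        }

        /** State transfer never overwrites a key tracked by the commit tracker. */
        void applyStateTransfer(String key, String value) {
            if (!trackedByCommit.contains(key)) {
                container.put(key, new Entry(value, false)); // regular entry, immortal here
            }
        }
    }

    /** The local node holds an L1 copy of "k", then becomes an owner. */
    static Entry scenario(boolean txDuringRebalance) {
        NodeStore store = new NodeStore();
        store.container.put("k", new Entry("v1", true));
        if (txDuringRebalance) {
            store.commitAsWriteOwner("k", "v2");
        }
        store.applyStateTransfer("k", "v2"); // value sent by a previous owner
        return store.container.get("k");
    }

    public static void main(String[] args) {
        System.out.println(scenario(false)); // prints v2 (regular)
        System.out.println(scenario(true));  // prints v2 (L1)
    }
}
```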