[JBoss JIRA] (ISPN-4569) Inserting into cache with indexing fails for XA transactions
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4569?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4569:
-------------------------------
Assignee: William Burns (was: Dan Berindei)
> Inserting into cache with indexing fails for XA transactions
> ------------------------------------------------------------
>
> Key: ISPN-4569
> URL: https://issues.jboss.org/browse/ISPN-4569
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public(Everyone can see)
> Components: Embedded Querying, Transactions
> Affects Versions: 7.0.0.Alpha5
> Reporter: Radim Vansa
> Assignee: William Burns
>
> If I set up XA transactions in {{ClusteredQueryDslConditionsTest}} using
> {code}
> cacheCfg.transaction().transactionMode(TransactionMode.TRANSACTIONAL).useSynchronization(false);
> {code}
> the test fails. The cause is a deadlock while updating the {{ScopedKey}} in __cluster_registry_cache__: it seems that on the originator we create a transaction with the modified inserted key and the {{ScopedKey}} for the inserted class, and send it in two prepare commands to the other node. In the {{ScopedKey}} prepare, the lock is acquired, but the regular prepare on the other node does not see the update (it is not committed yet) and also tries to update this {{ScopedKey}} in __cluster_registry_cache__. This fails with a lock timeout, as the commit is waiting for the regular prepare to finish.
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
10 years, 5 months
[JBoss JIRA] (ISPN-4569) Inserting into cache with indexing fails for XA transactions
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4569?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-4569:
------------------------------------
[~sannegrinovero] you are right, the cluster registry is an implementation detail, and we never describe the cluster registry as joining an ongoing transaction. However, the cluster registry cache registers itself as a synchronization, so it doesn't force the tx to become XA.
I see the indexing is done in {{QueryInterceptor.visitPrepareCommand()}}, so indexing is done during the Infinispan cache's prepare, regardless of whether it's an XA resource or a synchronization. This is probably too early, as the TM can very well roll back the tx after a successful prepare.
However, it would be impossible to access the cluster registry during commit unless we start an independent tx in cluster registry operations, so that should be the first priority.
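To illustrate why indexing during prepare is too early, here is a minimal sketch. It is not Infinispan or Hibernate Search code: {{MiniTx}}, {{onPrepare}} and {{onAfterCompletion}} are hypothetical names that only mimic the two-phase shape of a JTA transaction. It compares an index updated during prepare with one updated in an after-completion callback when the TM rolls back after a successful prepare.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical miniature transaction; it only mimics prepare/commit/rollback.
class DeferredIndexingSketch {

    static class MiniTx {
        private final List<Runnable> prepareActions = new ArrayList<>();
        private final List<Consumer<Boolean>> afterCompletion = new ArrayList<>();

        void onPrepare(Runnable action) { prepareActions.add(action); }

        // The callback receives true if the transaction committed, false if it rolled back.
        void onAfterCompletion(Consumer<Boolean> callback) { afterCompletion.add(callback); }

        void prepare() { prepareActions.forEach(Runnable::run); }
        void commit() { afterCompletion.forEach(c -> c.accept(true)); }
        void rollback() { afterCompletion.forEach(c -> c.accept(false)); }
    }

    /** Runs a prepare followed by a rollback and returns [eagerIndex, deferredIndex]. */
    static List<List<String>> prepareThenRollback() {
        List<String> eagerIndex = new ArrayList<>();
        List<String> deferredIndex = new ArrayList<>();

        MiniTx tx = new MiniTx();
        // Indexing during prepare: the entry is written before the outcome is known.
        tx.onPrepare(() -> eagerIndex.add("doc-1"));
        // Indexing after completion: the entry is written only if the tx committed.
        tx.onAfterCompletion(committed -> {
            if (committed) {
                deferredIndex.add("doc-1");
            }
        });

        tx.prepare();
        tx.rollback(); // the TM may still roll back after a successful prepare
        return Arrays.asList(eagerIndex, deferredIndex);
    }

    public static void main(String[] args) {
        List<List<String>> result = prepareThenRollback();
        System.out.println("indexed during prepare:   " + result.get(0));
        System.out.println("indexed after completion: " + result.get(1));
    }
}
```

After the rollback, the index written during prepare still holds the entry, while the deferred one stays empty.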
[JBoss JIRA] (ISPN-4569) Inserting into cache with indexing fails for XA transactions
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-4569?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on ISPN-4569:
-----------------------------------
{quote}
Why is the change to the ServiceRegistry included in the user's transaction? They should be independent, or a child transaction at best.
{quote}
It is not - from the JTA point of view, the transaction is local to the originating node. ClusterRegistry starts another transaction, which modifies the same data as the first one.
[JBoss JIRA] (ISPN-4569) Inserting into cache with indexing fails for XA transactions
by Sanne Grinovero (JIRA)
[ https://issues.jboss.org/browse/ISPN-4569?page=com.atlassian.jira.plugin.... ]
Sanne Grinovero commented on ISPN-4569:
---------------------------------------
Yes, they should add value. But Infinispan is hijacking the user TX to do other things in it, apparently even at odd points in its lifecycle, including forcing it to become XA (which implies a need for transaction logs) when the user does not expect this to happen.
{quote}Where is it registered, as a synchronization? (just asking){quote}
I only meant what Hibernate Search does when it's used in the context of Hibernate ORM / JPA. Apparently Manik and Navin coded it differently in Infinispan.
Why is the change to the ServiceRegistry included in the user's transaction? They should be independent, or a child transaction at best.
[JBoss JIRA] (ISPN-4569) Inserting into cache with indexing fails for XA transactions
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-4569?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on ISPN-4569:
-----------------------------------
{quote}
* we shouldn't start a transaction during a prepare -> indexing in Hibernate is registered as a post Transaction synchronization
{quote}
Where is it registered, as a synchronization? (just asking)
{quote}
* the ServiceRegistry access should be unrelated to the main TX (many more services might need this)
{quote}
If executed when any locks are held, it must be part of the main TX, because you can't guarantee that you won't try to write another entry with the same lock.
{quote}
* why is the ServiceRegistry transactional? Never asked for such trouble
{quote}
There is this comment: 'use a transactional cache for high consistency as writes are expected to be rare in this cache'
I don't think people generally consider TXs "trouble" :) I think it's rather considered something with added value.
[JBoss JIRA] (ISPN-4569) Inserting into cache with indexing fails for XA transactions
by Sanne Grinovero (JIRA)
[ https://issues.jboss.org/browse/ISPN-4569?page=com.atlassian.jira.plugin.... ]
Sanne Grinovero commented on ISPN-4569:
---------------------------------------
Looks like we have several issues here:
- we shouldn't start a transaction during a prepare -> indexing in Hibernate is registered as a *post* Transaction synchronization
- the ServiceRegistry access should be unrelated to the main TX (many more services might need this)
- why is the ServiceRegistry transactional? Never asked for such trouble
- it's a great example of what I meant in a recent discussion: we shouldn't be _using_ transactions for internal operations, but rather expose them exclusively as a user feature
[JBoss JIRA] (ISPN-4569) Inserting into cache with indexing fails for XA transactions
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-4569?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on ISPN-4569:
-----------------------------------
The problem is that ServiceRegistry uses a transactional cache, and therefore you start a transaction while executing the prepare for another transaction. If those two transactions modify the same key, you have a deadlock. I don't blame you for not realizing that, though :)
The correct solution would be (IMO) to let the code executing the prepare join the transaction if it needs to execute other distributed code. However, that would mean sending another PrepareCommand to other nodes, sharing the state modified so far (if we expect repeatable reads across the cluster), and possibly a cascade of further operations... And the consequences terrify me a bit :) The other option is to prohibit code performing the prepare from starting new transactions - I am not sure how the current implementation would cope with that.
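The deadlock described above can be sketched with plain {{java.util.concurrent}} locks. This is a simplified illustration under stated assumptions, not Infinispan's locking code: the {{ReentrantLock}} stands in for the per-key lock on the shared key, the extra thread stands in for the second transaction started during prepare, and the class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for two transactions contending on one key lock.
class NestedTxDeadlockSketch {

    /**
     * The outer "prepare" holds the key lock and then waits for a nested
     * transaction that needs the same lock. Returns true if the nested
     * transaction obtained the lock before its timeout expired.
     */
    static boolean prepareWithNestedTx(long lockTimeoutMillis) {
        ReentrantLock keyLock = new ReentrantLock();
        boolean[] acquired = new boolean[1];

        keyLock.lock(); // the outer transaction locks the key
        try {
            Thread nestedTx = new Thread(() -> {
                try {
                    // Different owner, so the lock is not reentrant for the nested tx.
                    acquired[0] = keyLock.tryLock(lockTimeoutMillis, TimeUnit.MILLISECONDS);
                    if (acquired[0]) {
                        keyLock.unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            nestedTx.start();
            // The outer transaction cannot release the lock until the nested work
            // returns, and the nested work cannot proceed until the lock is released.
            nestedTx.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for nested tx", e);
        } finally {
            keyLock.unlock();
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        System.out.println("nested tx got the lock: " + prepareWithNestedTx(200));
    }
}
```

Only the lock timeout breaks the cycle here, which matches the lock timeout reported in the issue description.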
[JBoss JIRA] (ISPN-4581) Memory leak when H2 is used
by Pedro Ruivo (JIRA)
Pedro Ruivo created ISPN-4581:
---------------------------------
Summary: Memory leak when H2 is used
Key: ISPN-4581
URL: https://issues.jboss.org/browse/ISPN-4581
Project: Infinispan
Issue Type: Bug
Security Level: Public (Everyone can see)
Affects Versions: 7.0.0.Alpha5
Reporter: Pedro Ruivo
Assignee: Pedro Ruivo
Fix For: 7.0.0.Beta1
When H2 is used in-memory (the default Maven profile), it keeps the data until the JVM is shut down. This may leak test data between the JDBC tests and the JPA tests.
Change the tests to clear the data before closing the cache manager involved.
[JBoss JIRA] (ISPN-4581) Memory leak when H2 is used
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4581?page=com.atlassian.jira.plugin.... ]
Work on ISPN-4581 started by Pedro Ruivo.
> Memory leak when H2 is used
> ---------------------------
>
> Key: ISPN-4581
> URL: https://issues.jboss.org/browse/ISPN-4581
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public(Everyone can see)
> Affects Versions: 7.0.0.Alpha5
> Reporter: Pedro Ruivo
> Assignee: Pedro Ruivo
> Labels: testsuite_stability
> Fix For: 7.0.0.Beta1
>
>
> When H2 is used in-memory (the default Maven profile), it keeps the data until the JVM is shut down. This may leak test data between the JDBC tests and the JPA tests.
> Change the tests to clear the data before closing the cache manager involved.
[JBoss JIRA] (ISPN-4575) Map/Reduce incorrect results with a non-shared non-tx intermediate cache
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4575?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4575:
-------------------------------
Priority: Blocker (was: Major)
> Map/Reduce incorrect results with a non-shared non-tx intermediate cache
> ------------------------------------------------------------------------
>
> Key: ISPN-4575
> URL: https://issues.jboss.org/browse/ISPN-4575
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public(Everyone can see)
> Components: Core, Distributed Execution and Map/Reduce
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Assignee: Vladimir Blagojevic
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.0.0.Beta1
>
>
> In a non-tx cache, if a command is started with topology id {{T}}, and when it is replicated on another node the distribution interceptor sees topology {{T+1}}, it throws an {{OutdatedTopologyException}}. The originator of the command will then retry the command, setting topology {{T+1}}.
> When this happens with a {{PutKeyValueCommand(k, MapReduceManagerImpl.DeltaAwareList)}}, it can lead to duplicate intermediate values.
> Say _A_ is the primary owner of {{k}} in {{T}}, _B_ is a backup owner both in {{T}} and {{T+1}}, and _C_ is the backup owner in {{T}} and the primary owner in {{T+1}} (i.e. _C_ just joined and a rebalance is in progress during {{T}} - see {{NonTxBackupOwnerBecomingPrimaryOwnerTest}}).
> _A_ starts the {{PutKeyValueCommand}} and replicates it to _B_ and _C_. _C_ applies the command, but _B_ already has topology {{T+1}} and throws an {{OutdatedTopologyException}}. _A_ installs topology {{T+1}}, sends the command to _C_ (as the new primary owner), which replicates it to _B_ and then applies it locally a second time.
> This scenario can happen during a M/R task even without nodes joining or leaving. That's because {{CreateCacheCommand}} only calls {{getCache()}} on each member, it doesn't wait for the cache to have a certain number of members or for state transfer to be complete for all the members. The last member to join the intermediate cache is guaranteed to have topology {{T+1}}, but the others may have topology {{T}} by the time the combine phase starts inserting values in the intermediate cache.
> I have seen the {{OutdatedTopologyException}} happen pretty often during the test suite, especially after I removed the duplicate {{invokeRemotely}} call in {{MapReduceTask.executeTaskInit()}}. Most of them were harmless, but there was one failure in CI: http://ci.infinispan.org/viewLog.html?buildId=9811&tab=buildResultsDiv&bu...
> A short-term fix would be to wait for all the members to finish joining in {{CreateCacheCommand}}. Long-term, M/R tasks should be resilient to topology changes, so we should investigate making {{PutKeyValue(k, DeltaAwareList)}} handle {{OutdatedTopologyException}}s.
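The A/B/C scenario above can be replayed in a small simulation. This is a sketch under stated assumptions, not Infinispan code: {{Node}}, {{applyAppend}} and the command id are hypothetical, and a {{false}} return value stands in for {{OutdatedTopologyException}}. The {{dedupe}} flag shows one possible guard (remembering applied command ids); it is an illustration only and is not proposed anywhere in the issue.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of a retried append to a list-valued entry.
class RetryDuplicateSketch {

    static class Node {
        int topologyId;
        final List<String> values = new ArrayList<>();
        final Set<Long> appliedCommandIds = new HashSet<>();

        Node(int topologyId) { this.topologyId = topologyId; }

        /** Returns false (standing in for OutdatedTopologyException) if this node is ahead. */
        boolean applyAppend(long commandId, int commandTopology, String delta, boolean dedupe) {
            if (topologyId > commandTopology) {
                return false;
            }
            // Assumed guard: skip a command id that was already applied on this node.
            if (dedupe && !appliedCommandIds.add(commandId)) {
                return true;
            }
            values.add(delta);
            return true;
        }
    }

    /** Replays the scenario and returns how many copies of the delta end up on C. */
    static int copiesOnC(boolean dedupe) {
        int t = 1;
        Node b = new Node(t + 1); // B already has topology T+1
        Node c = new Node(t);     // C still has topology T
        long commandId = 42L;

        // First attempt with topology T: A replicates to C and B.
        c.applyAppend(commandId, t, "v", dedupe);
        boolean bAccepted = b.applyAppend(commandId, t, "v", dedupe);

        if (!bAccepted) {
            // A retries with T+1: C is now the primary owner, replicates to B
            // and then applies the command locally a second time.
            c.topologyId = t + 1;
            b.applyAppend(commandId, t + 1, "v", dedupe);
            c.applyAppend(commandId, t + 1, "v", dedupe);
        }
        return c.values.size();
    }

    public static void main(String[] args) {
        System.out.println("copies on C without guard: " + copiesOnC(false));
        System.out.println("copies on C with guard:    " + copiesOnC(true));
    }
}
```

Without the guard, C holds the delta twice, which is the duplicate intermediate value described above.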