[JBoss JIRA] (ISPN-5309) Model data consistency for get()/put() operations in Infinispan
by Richard Achmatowicz (JIRA)
[ https://issues.jboss.org/browse/ISPN-5309?page=com.atlassian.jira.plugin.... ]
Richard Achmatowicz commented on ISPN-5309:
-------------------------------------------
Just an update. I made some not insignificant progress on this in October/November of 2015 but got bogged down with .. um .. work in the meantime. I am planning to return to this modelling exercise after the current EAP7 clustering issues are cleared up.
> Model data consistency for get()/put() operations in Infinispan
> ---------------------------------------------------------------
>
> Key: ISPN-5309
> URL: https://issues.jboss.org/browse/ISPN-5309
> Project: Infinispan
> Issue Type: Task
> Reporter: Richard Achmatowicz
> Assignee: Richard Achmatowicz
>
> This will be the first in a series of modelling/validation exercises for some of the more critical Infinispan protocols.
> We shall use TLA+ / PlusCal / TLA+ Tools to do the following:
> - model the design of processing of get()/put() operations in an Infinispan cluster
> - model client interactions with that cluster
> - describe the data consistency requirements of get()/put() operations
> - verify that the data consistency semantics of Infinispan are preserved in the face of concurrent client interactions
> TLA+ / PlusCal can be thought of as a pseudo-code language which has well-defined semantics and is testable using the tools in the TLA+ Toolkit.
> The benefits of such an exercise are that we end up with:
> - a specification of data consistency guarantees that Infinispan provides
> - a semantically precise pseudo-code description of the design of get()/put() processing
> - a verification that the protocol design is correct
> We start here with the simple case of modelling data consistency in the absence of failures. In later exercises, we aim to tackle rebalancing and non-blocking state transfer in the face of membership changes and partitions.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
8 years, 2 months
[JBoss JIRA] (ISPN-6047) Deadlock when a prepare command is retried
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6047?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-6047:
-------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
> Deadlock when a prepare command is retried
> ------------------------------------------
>
> Key: ISPN-6047
> URL: https://issues.jboss.org/browse/ISPN-6047
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 8.1.0.Final
> Reporter: Dan Berindei
> Assignee: Pedro Ruivo
> Fix For: 8.2.0.CR1
>
>
> Looks like the ISPN-5623 fix went too far, and now I found a test failure with the opposite behaviour:
> 1. Remote prepare for {{txA}} acquires lock {{K}}
> 2. Remote prepare for {{txB}} blocks waiting for lock {{K}}
> 3. The topology changes, and the {{txA}} prepare is retried
> 4. The {{txA}} prepare times out, because it waits for pending transaction {{txB}} to finish.
> So we have to make {{txA}} somehow know that it already has the lock after it received an {{UnsureResponse}} for the prepare command, and skip waiting for pending transactions.
> I found the problem in a random failure of {{DistributedFourNodesMapReduceTest}} on a local branch, but I'm not sure if my local changes (making SyncCHF the default CH factory) made it more likely.
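The four steps above, and the proposed direction of letting {{txA}} recognise that it already owns the lock, can be sketched roughly as follows. This is only an illustration under stated assumptions: {{RetriedPrepareSketch}}, {{Outcome}} and {{prepare}} are hypothetical names, not Infinispan classes or APIs, and the real lock manager and pending-transaction handling are considerably more involved.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a per-key lock table; not Infinispan's lock manager. */
final class RetriedPrepareSketch {
    enum Outcome { ACQUIRED, ALREADY_OWNED, MUST_WAIT }

    // key -> transaction currently holding the lock (assumed representation)
    private final Map<String, String> lockOwners = new HashMap<>();
    // key -> transactions blocked behind the current owner
    private final Map<String, Set<String>> waiters = new HashMap<>();

    synchronized Outcome prepare(String tx, String key) {
        String owner = lockOwners.get(key);
        if (tx.equals(owner)) {
            // Retried prepare: tx already holds the lock, so it must not
            // wait for pending transactions that are queued behind it.
            return Outcome.ALREADY_OWNED;
        }
        if (owner == null) {
            lockOwners.put(key, tx);
            return Outcome.ACQUIRED;
        }
        // Another transaction owns the lock: record tx as a waiter.
        waiters.computeIfAbsent(key, k -> new HashSet<>()).add(tx);
        return Outcome.MUST_WAIT;
    }
}
```

In this sketch, the first {{prepare("txA", "K")}} acquires the lock, {{prepare("txB", "K")}} has to wait, and the retried {{prepare("txA", "K")}} returns {{ALREADY_OWNED}} instead of queuing behind {{txB}}, which is the cycle described in step 4.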
--
8 years, 2 months
[JBoss JIRA] (ISPN-6235) ClusterTopologyManagerImpl join during cluster status recovery
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-6235?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-6235:
------------------------------
Status: Resolved (was: Pull Request Sent)
Fix Version/s: 8.2.0.CR1
Resolution: Done
> ClusterTopologyManagerImpl join during cluster status recovery
> --------------------------------------------------------------
>
> Key: ISPN-6235
> URL: https://issues.jboss.org/browse/ISPN-6235
> Project: Infinispan
> Issue Type: Bug
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
> Fix For: 8.2.0.CR1
>
>
> If the joiner has the correct view id, but the current status is
> RECOVERING_CLUSTER, we should wait for the cluster status recovery to
> finish before adding the new member.
> We are currently not doing that, so the new member could be erased by the status recovery process that's in progress. This can happen if the coordinator joiner had already been a member of the JGroups cluster for some time, and there's no view change when they actually start their caches (exactly the scenario in {{ConcurrentStartTest}}).
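The intended behaviour described above (hold the join until status recovery has finished, so the recovery cannot overwrite the new member) might look roughly like the sketch below. Everything here is an assumption made for illustration: {{JoinDuringRecoverySketch}}, the {{RECOVERED}} state, {{join}} and {{finishRecovery}} are hypothetical names and do not reflect the actual {{ClusterTopologyManagerImpl}} code, and the view id check is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the real ClusterTopologyManagerImpl. */
final class JoinDuringRecoverySketch {
    // RECOVERING_CLUSTER comes from the issue text; RECOVERED is an assumed name.
    enum Status { RECOVERING_CLUSTER, RECOVERED }

    private Status status = Status.RECOVERING_CLUSTER;
    private final List<String> members = new ArrayList<>();

    /** Blocks while recovery is in progress, then adds the joiner. */
    synchronized void join(String joiner) throws InterruptedException {
        while (status == Status.RECOVERING_CLUSTER) {
            wait(); // released by finishRecovery()
        }
        members.add(joiner);
    }

    /** Recovery replaces the member list, which would erase an early joiner. */
    synchronized void finishRecovery(List<String> recoveredMembers) {
        members.clear();
        members.addAll(recoveredMembers);
        status = Status.RECOVERED;
        notifyAll();
    }

    synchronized List<String> members() {
        return new ArrayList<>(members);
    }
}
```

Because {{join}} only touches the member list once the status has left {{RECOVERING_CLUSTER}}, a joiner that arrives early ends up appended after the recovered members instead of being cleared along with the stale list.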
--
8 years, 2 months
[JBoss JIRA] (ISPN-5495) ConcurrentStartTest.testConcurrentStart random failures
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-5495?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-5495:
------------------------------
Status: Resolved (was: Pull Request Sent)
Assignee: Dan Berindei
Resolution: Done
> ConcurrentStartTest.testConcurrentStart random failures
> -------------------------------------------------------
>
> Key: ISPN-5495
> URL: https://issues.jboss.org/browse/ISPN-5495
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 7.2.1.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 8.2.0.CR1
>
>
> {noformat}
> org.testng.internal.thread.ThreadTimeoutException: Method org.testng.internal.TestNGMethod.testConcurrentStart() didn't finish within the time-out 60000
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
> at org.infinispan.test.TestingUtil.waitForRehashToComplete(TestingUtil.java:253)
> {noformat}
--
8 years, 2 months