[JBoss JIRA] (ISPN-4546) Possible stale lock when the primary owner leaves during rebalance
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4546?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-4546:
------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
> Possible stale lock when the primary owner leaves during rebalance
> ------------------------------------------------------------------
>
> Key: ISPN-4546
> URL: https://issues.jboss.org/browse/ISPN-4546
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Alpha5, 7.1.1.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 7.2.0.Final
>
>
> Topology T: coordinator = A, owners(k) = [C, D], pending_owners(k) = null
> B sends prepareCommand(tx1, put(k, v)) to C, D
> D adds backup locks and replies
> C acquires lock, ready to send reply to B
> A starts installing topology T+1: owners(k) = [C, D], pending_owners(k) = [C, E]
> A, C and E install topology T+1, B and D do not
> E requests and receives tx data from C, including tx1
> C leaves
> B sees a SuspectException, sends rollbackCommand(tx1) to C, D
> D removes tx1
> C has left, but is ignored
> B reports to the user that the tx has been rolled back
> B and D install topology T+1 (optional)
> A starts installing topology T+2: owners(k) = [D], pending_owners(k) = [E]
> A, B, D, E all install topology T+2
> E requests and receives state from D, but it does not remove tx1
> A starts installing topology T+3: owners(k) = [E], pending_owners(k) = null
> E now has a stale backup lock on k
> It seems very hard to reproduce in production: C would have to leave early enough that B and D have not yet received the T+1 topology, but late enough to have already sent its transaction data to E.
> A possible solution would be to catch any SuspectException during prepare/commit/rollback (without ignoring leavers), wait for a new topology, and replicate the command again on the new owners. Obviously, this wouldn't work with asynchronous prepare/commit/rollback.
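The retry idea proposed above can be sketched roughly as follows. This is only an illustration of "catch the suspicion, wait for a newer topology, resend to the new owners"; it is not the fix that was merged. All names here (`NodeSuspectedException`, `TopologySource`, `Rpc`, `sendWithRetry`) are hypothetical stand-ins and not Infinispan APIs. The sketch also assumes that commands are idempotent on the receivers, since D would see the rollback twice.

```java
import java.util.List;
import java.util.Map;

public class RetryOnSuspectSketch {

    /** Hypothetical stand-in for a "target node left" failure. */
    static class NodeSuspectedException extends RuntimeException {
        NodeSuspectedException(String node) { super("Node suspected: " + node); }
    }

    /** Hypothetical view of the cluster topology as seen by the originator. */
    interface TopologySource {
        int currentTopologyId();
        /** Nodes that must receive writes for the key (owners plus pending owners). */
        List<String> writeOwners(String key, int topologyId);
        /** Blocks until a topology newer than the given one is installed, returns its id. */
        int awaitNewerThan(int topologyId);
    }

    /** Hypothetical synchronous RPC; throws if any target has left. */
    interface Rpc {
        void invoke(String command, List<String> targets);
    }

    /** Sends the command, retrying on the owners of each newer topology after a suspicion. */
    static List<String> sendWithRetry(String command, String key,
                                      TopologySource topology, Rpc rpc, int maxAttempts) {
        int topologyId = topology.currentTopologyId();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            List<String> targets = topology.writeOwners(key, topologyId);
            try {
                rpc.invoke(command, targets);
                return targets; // every current owner acknowledged the command
            } catch (NodeSuspectedException e) {
                // Do not ignore the leaver: wait for the next topology and resend.
                topologyId = topology.awaitNewerThan(topologyId);
            }
        }
        throw new IllegalStateException(
                "Gave up on " + command + " after " + maxAttempts + " attempts");
    }

    /** Replays the scenario: B starts on topology T (id 0), C leaves, T+2 (id 2) follows. */
    static List<String> demo() {
        Map<Integer, List<String>> writeOwners =
                Map.of(0, List.of("C", "D"), 2, List.of("D", "E"));
        TopologySource topology = new TopologySource() {
            public int currentTopologyId() { return 0; }
            public List<String> writeOwners(String key, int id) { return writeOwners.get(id); }
            public int awaitNewerThan(int id) { return 2; }
        };
        Rpc rpc = (command, targets) -> {
            if (targets.contains("C")) throw new NodeSuspectedException("C");
        };
        return sendWithRetry("rollbackCommand(tx1)", "k", topology, rpc, 3);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [D, E]
    }
}
```

In this replay the second attempt reaches E, which is the node left holding the stale backup lock in the scenario. As the description notes, an approach like this depends on synchronous replies, so it would not help with asynchronous prepare/commit/rollback.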
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
8 years, 12 months
[JBoss JIRA] (ISPN-5274) Inconsistent data after transaction rollback (with success on originator)
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-5274?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-5274:
------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
> Inconsistent data after transaction rollback (with success on originator)
> -------------------------------------------------------------------------
>
> Key: ISPN-5274
> URL: https://issues.jboss.org/browse/ISPN-5274
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 7.1.1.Final
> Reporter: Matej Čimbora
> Assignee: Dan Berindei
> Fix For: 7.2.0.Final
>
>
> Scenario
> Nodes edg-perf[10-13], partition handling on
> 1. Transaction is started on edg-perf10
> {code}
> 10:24:48,228 TRACE [org.infinispan.transaction.xa.TransactionXaAdapter] (DefaultStressor-6) start called on tx GlobalTransaction:<edg-perf10-20667>:38910:local
> {code}
> 2. Value of key_000000000000065B is updated within the transaction
> {code}
> 10:24:48,405 TRACE [org.radargun.service.InfinispanOperations$Cache] (DefaultStressor-6) PUT cache=testCache key=key_000000000000065B value=[6 #15: 296, 501, 1109, 1119, 1459, 1544, 1999, 2083, 2130, 2257, 2298, 2784, 2941, 3009, 3147, ]
> {code}
> 3. Transaction is successfully prepared on edg-perf11 & edg-perf12
> {code}
> 10:24:48,559 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (DefaultStressor-6) Responses: [sender=edg-perf11-61837, received=true, suspected=false]
> [sender=edg-perf12-7305, received=true, suspected=false]
> {code}
> 4. Transaction commit is issued
> {code}
> 10:24:48,562 TRACE [org.infinispan.transaction.TransactionCoordinator] (DefaultStressor-6) Committing transaction GlobalTransaction:<edg-perf10-20667>:38910:local
> {code}
> 5. Other participating nodes (edg-perf11 & edg-perf12) are suspected...
> {code}
> 10:24:52,705 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (DefaultStressor-6) Target node edg-perf11-61837 left during remote call, ignoring
> 10:24:52,716 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (DefaultStressor-6) Target node edg-perf12-7305 left during remote call, ignoring
> {code}
> ... as they received a new view (without edg-perf10) meanwhile.
> {code}
> 10:24:48,547 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-1,edg-perf12-7305) ISPN000093: Received new, MERGED cluster view: MergeView::[edg-perf12-7305|20] (3) [edg-perf12-7305, edg-perf11-61837, edg-perf13-25187], 2 subgroups: [edg-perf13-25187|8] (1) [edg-perf13-25187], [edg-perf13-25187|19] (1) [edg-perf13-25187]
> {code}
> Still, the transaction is committed on edg-perf10 and the updated entry is stored locally
> {code}
> 10:24:52,894 TRACE [org.infinispan.statetransfer.CommitManager] (DefaultStressor-6) Trying to commit. Key=key_000000000000065B. Operation Flag=null, L1 invalidation=false
> 10:24:52,896 TRACE [org.infinispan.statetransfer.CommitManager] (DefaultStressor-6) Committing key=key_000000000000065B. It is a L1 invalidation or a normal put and no tracking is enabled!
> 10:24:52,908 TRACE [org.infinispan.container.DefaultDataContainer] (DefaultStressor-6) Creating new ICE for writing. Existing=ImmortalCacheEntry{key=key_000000000000065B, value=[6 #14: 296, 501, 1109, 1119, 1459, 1544, 1999, 2083, 2130, 2257, 2298, 2784, 2941, 3009, ]}, metadata=EmbeddedMetadata{version=null}, new value=[6 #15: 296, 501, 1109, 1119, 1459, 1544, 1999, 2083, 2130, 2257, 2298, 2784, 2941, 3009, 3147, ]
> {code}
> 6. Other nodes roll back the transaction
> {code}
> 10:24:50,376 DEBUG [org.infinispan.transaction.TransactionTable] (transport-thread-10) Rolling back transaction GlobalTransaction:<edg-perf10-20667>:38910:remote because originator edg-perf10-20667 left the cluster
> {code}
> 7. edg-perf10 receives a new view, containing nodes edg-perf[10,11,13]. Incoming state transfer overwrites the updated value
> {code}
> 10:25:09,614 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,edg-perf10-20667) ISPN000093: Received new, MERGED cluster view: MergeView::[edg-perf10-20667|22] (3) [edg-perf10-20667, edg-perf11-61837, edg-perf13-25187], 4 subgroups: [edg-perf10-20667|15] (2) [edg-perf10-20667, edg-perf11-61837], [edg-perf10-20667|19] (3) [edg-perf10-20667, edg-perf12-7305, edg-perf11-61837], [edg-perf10-20667|18] (1) [edg-perf10-20667], [edg-perf10-20667|21] (2) [edg-perf10-20667, edg-perf13-25187]
> {code}
> 8. A get operation returns the outdated value
> {code}
> 10:26:21,020 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (DefaultStressor-6) Response(s) to ClusteredGetCommand{key=key_000000000000065B, flags=null} is {edg-perf12-7305=SuccessfulResponse{responseValue=ImmortalCacheValue {value=[6 #14: 296, 501, 1109, 1119, 1459, 1544, 1999, 2083, 2130, 2257, 2298, 2784, 2941, 3009, ]}} }
> {code}
> From the client's perspective, this behavior is not transparent. Since the transaction completed successfully on the originator, the client can reasonably assume that the updated entry is present.
[JBoss JIRA] (ISPN-5424) Make SemaphoreCompletionService an executor service
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5424?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-5424:
-------------------------------
Description: Turns out the {{CompletionService}} features aren't that necessary, and it makes the use of {{Runnable}} more cumbersome. (was: Turns out the {{CompletionService}} features aren't that necessary, and it makes the use of {{Runnable}}s more cumbersome.)
> Make SemaphoreCompletionService an executor service
> ---------------------------------------------------
>
> Key: ISPN-5424
> URL: https://issues.jboss.org/browse/ISPN-5424
> Project: Infinispan
> Issue Type: Task
> Components: Core
> Affects Versions: 7.2.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 7.2.0.Final
>
>
> Turns out the {{CompletionService}} features aren't that necessary, and it makes the use of {{Runnable}} more cumbersome.
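For readers unfamiliar with the idea, here is a minimal sketch of an executor service that limits concurrency with a semaphore and runs tasks on a delegate executor. It is not the Infinispan `SemaphoreCompletionService` code; the class name `BoundedExecutorSketch` and its structure are assumptions made for illustration. Extending `AbstractExecutorService` is what makes plain `Runnable` tasks convenient, because `submit(Runnable)` and `invokeAll` come for free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedExecutorSketch extends AbstractExecutorService {
    private final Executor delegate;
    private final int maxConcurrent;
    private final Semaphore permits;
    private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
    private volatile boolean shutdown;

    public BoundedExecutorSketch(Executor delegate, int maxConcurrent) {
        this.delegate = delegate;
        this.maxConcurrent = maxConcurrent;
        this.permits = new Semaphore(maxConcurrent);
    }

    @Override
    public void execute(Runnable task) {
        if (shutdown) throw new RejectedExecutionException("executor is shut down");
        queue.add(task);
        drain();
    }

    /** Starts queued tasks on the delegate while permits are available. */
    private void drain() {
        while (!queue.isEmpty() && permits.tryAcquire()) {
            Runnable next = queue.poll();
            if (next == null) {      // another thread took the task first
                permits.release();
                continue;
            }
            delegate.execute(() -> {
                try {
                    next.run();
                } finally {
                    permits.release();
                    drain();         // hand the freed permit to a waiting task
                }
            });
        }
    }

    @Override public void shutdown() { shutdown = true; }

    @Override public List<Runnable> shutdownNow() {
        shutdown = true;
        List<Runnable> pending = new ArrayList<>();
        for (Runnable r; (r = queue.poll()) != null; ) pending.add(r);
        return pending;
    }

    @Override public boolean isShutdown() { return shutdown; }

    @Override public boolean isTerminated() {
        return shutdown && queue.isEmpty() && permits.availablePermits() == maxConcurrent;
    }

    @Override public boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!isTerminated()) {
            if (System.nanoTime() >= deadline) return false;
            Thread.sleep(1);
        }
        return true;
    }

    /** Runs 20 short tasks with a limit of 2 and returns the highest concurrency observed. */
    static int demo() {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        BoundedExecutorSketch bounded = new BoundedExecutorSketch(pool, 2);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < 20; i++) {
                futures.add(bounded.submit(() -> {
                    maxSeen.accumulateAndGet(running.incrementAndGet(), Math::max);
                    try { Thread.sleep(5); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                    running.decrementAndGet();
                }));
            }
            for (Future<?> f : futures) f.get();
            return maxSeen.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The sketch does not handle a delegate that rejects tasks (the permit would leak), and `awaitTermination` simply polls; a production version would need both addressed.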
[JBoss JIRA] (ISPN-5424) Make SemaphoreCompletionService an executor service
by Dan Berindei (JIRA)
Dan Berindei created ISPN-5424:
----------------------------------
Summary: Make SemaphoreCompletionService an executor service
Key: ISPN-5424
URL: https://issues.jboss.org/browse/ISPN-5424
Project: Infinispan
Issue Type: Task
Components: Core
Affects Versions: 7.2.0.CR1
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 7.2.0.Final
Turns out the {{CompletionService}} features aren't that necessary, and it makes the use of {{Runnable}}s more cumbersome.
[JBoss JIRA] (ISPN-3691) Make client side Connection refused error TRACE
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-3691?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-3691:
-----------------------------------------------
Vojtech Juranek <vjuranek(a)redhat.com> changed the Status of [bug 1028411|https://bugzilla.redhat.com/show_bug.cgi?id=1028411] from ON_QA to VERIFIED
> Make client side Connection refused error TRACE
> -----------------------------------------------
>
> Key: ISPN-3691
> URL: https://issues.jboss.org/browse/ISPN-3691
> Project: Infinispan
> Issue Type: Feature Request
> Components: Remote Protocols
> Affects Versions: 6.0.0.CR1, 6.0.0.Final
> Reporter: Michal Linhard
> Assignee: Galder Zamarreño
> Priority: Minor
> Fix For: 7.0.0.Final
>
>
> After solving ISPN-3454, it seems that the only remaining client-side error during node crashes is "Connection refused":
> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/JDG/view/RESILIENCE...
> This has been reported before as ISPN-1794 and ISPN-1119, but it seems to have reappeared in a different place.
> Sorry for not reporting this sooner. I got so used to ignoring some of the long-open, cosmetic, low-priority log message issues that I forgot about this one...
> The issue here is that these "Connection refused" problems are retryable, so the client log shouldn't contain an error.
> At most there could be an INFO-level message about failing over to a different node, but that is already covered by the INFO-level messages about topology changes.
[JBoss JIRA] (ISPN-4991) Implement clustered cache statistics
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4991?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4991:
-----------------------------------------------
Martin Gencur <mgencur(a)redhat.com> changed the Status of [bug 1180047|https://bugzilla.redhat.com/show_bug.cgi?id=1180047] from ON_QA to VERIFIED
> Implement clustered cache statistics
> ------------------------------------
>
> Key: ISPN-4991
> URL: https://issues.jboss.org/browse/ISPN-4991
> Project: Infinispan
> Issue Type: Sub-task
> Components: JMX, reporting and management
> Reporter: Vladimir Blagojevic
> Assignee: Vladimir Blagojevic
> Fix For: 7.1.0.Beta1, 7.1.0.Final
>
>
> As of the 7.0.0 release, cache statistics are implemented at a per-node, per-cache level. For the Infinispan admin console, we need aggregate statistics for each cache across all nodes in the cluster. The implementing class should be a registered MBean and should expose statistics similar to those currently provided by org.infinispan.interceptors.CacheMgmtInterceptor.
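As a rough illustration of the aggregation step only, the sketch below sums per-node counters into one cluster-wide view. The types `NodeStats` and `ClusterStats` and the chosen counters (hits, misses, stores) are assumptions for the example, not the classes introduced by this sub-task; collecting the per-node values over the network and registering the result as an MBean are left out.

```java
import java.util.List;

public class ClusterStatsSketch {

    /** Hypothetical snapshot of one node's counters for a single cache. */
    static final class NodeStats {
        final String node;
        final long hits;
        final long misses;
        final long stores;

        NodeStats(String node, long hits, long misses, long stores) {
            this.node = node;
            this.hits = hits;
            this.misses = misses;
            this.stores = stores;
        }
    }

    /** Hypothetical cluster-wide totals for the same cache. */
    static final class ClusterStats {
        final long hits;
        final long misses;
        final long stores;

        ClusterStats(long hits, long misses, long stores) {
            this.hits = hits;
            this.misses = misses;
            this.stores = stores;
        }

        /** Ratio over summed counters, so busy nodes weigh more than idle ones. */
        double hitRatio() {
            long reads = hits + misses;
            return reads == 0 ? 0.0 : (double) hits / reads;
        }
    }

    /** Sums the counters of every node snapshot. */
    static ClusterStats aggregate(List<NodeStats> perNode) {
        long hits = 0, misses = 0, stores = 0;
        for (NodeStats s : perNode) {
            hits += s.hits;
            misses += s.misses;
            stores += s.stores;
        }
        return new ClusterStats(hits, misses, stores);
    }

    /** Two nodes with opposite hit rates and equal read volume. */
    static ClusterStats demo() {
        return aggregate(List.of(
                new NodeStats("node-a", 90, 10, 20),
                new NodeStats("node-b", 10, 90, 10)));
    }

    public static void main(String[] args) {
        ClusterStats total = demo();
        System.out.println(total.hits + " hits, " + total.misses + " misses, ratio " + total.hitRatio());
    }
}
```

Derived values such as the hit ratio are computed from the summed counters rather than by averaging per-node ratios, which would give an idle node the same weight as a busy one.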