[JBoss JIRA] (ISPN-2713) REBALANCE_START and REBALANCE_CONFIRM commands deadlock when RSVP.ack_on_delivery=true
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-2713?page=com.atlassian.jira.plugin.... ]
Dan Berindei resolved ISPN-2713.
--------------------------------
Fix Version/s: 5.3.0.Beta1
Resolution: Done
The ISPN-2825 fix eliminated the deadlock.
> REBALANCE_START and REBALANCE_CONFIRM commands deadlock when RSVP.ack_on_delivery=true
> --------------------------------------------------------------------------------------
>
> Key: ISPN-2713
> URL: https://issues.jboss.org/browse/ISPN-2713
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.2.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 5.3.0.Beta1, 5.3.0.Final
>
>
> When the coordinator sends a REBALANCE_START command, it holds a lock on the ClusterCacheStatus until it receives the responses from all the other members.
> If a node doesn't need to request any new state, it sends the rebalance confirmation to the coordinator on the same thread that received the REBALANCE_START command. The REBALANCE_CONFIRM command also wants to acquire a lock on the ClusterCacheStatus on the coordinator, but because the REBALANCE_CONFIRM command is sent asynchronously, it doesn't deadlock with the thread waiting for REBALANCE_START responses on the coordinator.
> At least, that's what happens when {{RSVP.ack_on_delivery=false}} (the Infinispan default). When {{RSVP.ack_on_delivery=true}} (the JGroups default), the "asynchronous" REBALANCE_CONFIRM command becomes synchronous, and it generates a deadlock. The rebalance then fails after the RSVP timeout expires (10 seconds by default).
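The locking pattern described above can be illustrated with a JDK-only sketch. It does not use Infinispan or JGroups code: the class name, the thread pools, and the methods handleStart, handleConfirm and rebalance are hypothetical stand-ins, a ReentrantLock plays the role of the lock on ClusterCacheStatus, and the syncConfirm flag is assumed to model the effect of RSVP.ack_on_delivery=true (the sender blocks until the coordinator has processed the command).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

final class RebalanceDeadlockSketch {
    // Stand-in for the lock the coordinator holds on ClusterCacheStatus.
    private final ReentrantLock statusLock = new ReentrantLock();
    // Threads that process incoming commands on the coordinator (hypothetical).
    private final ExecutorService coordinatorPool = Executors.newCachedThreadPool();
    // The thread that receives REBALANCE_START on the other member (hypothetical).
    private final ExecutorService memberPool = Executors.newSingleThreadExecutor();

    // Coordinator side of REBALANCE_CONFIRM: needs the status lock.
    private void handleConfirm() {
        statusLock.lock();
        try {
            // The confirmation would be recorded here.
        } finally {
            statusLock.unlock();
        }
    }

    // Member side of REBALANCE_START: no state to request, so confirm right away.
    private void handleStart(boolean syncConfirm, long rsvpTimeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        CompletableFuture<Void> confirm =
                CompletableFuture.runAsync(this::handleConfirm, coordinatorPool);
        if (syncConfirm) {
            // Assumed analogue of ack_on_delivery=true: wait until the coordinator
            // has processed the confirmation, up to the RSVP timeout.
            confirm.get(rsvpTimeoutMillis, TimeUnit.MILLISECONDS);
        }
    }

    // Coordinator: holds the status lock until the REBALANCE_START response arrives.
    boolean rebalance(boolean syncConfirm, long rsvpTimeoutMillis) throws InterruptedException {
        statusLock.lock();
        try {
            Future<?> startResponse = memberPool.submit(() -> {
                handleStart(syncConfirm, rsvpTimeoutMillis);
                return null;
            });
            startResponse.get();
            return true;
        } catch (ExecutionException e) {
            // The member's handler failed, here because the simulated RSVP wait timed out.
            return false;
        } finally {
            statusLock.unlock();
        }
    }

    // Runs one simulated rebalance and shuts the pools down afterwards.
    static boolean run(boolean syncConfirm, long rsvpTimeoutMillis) {
        RebalanceDeadlockSketch sketch = new RebalanceDeadlockSketch();
        try {
            return sketch.rebalance(syncConfirm, rsvpTimeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            sketch.coordinatorPool.shutdown();
            sketch.memberPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("asynchronous confirm succeeded: " + run(false, 200));
        System.out.println("synchronous confirm succeeded: " + run(true, 200));
    }
}
```

With syncConfirm=false the member's handler returns immediately, so the coordinator gets its response and releases the lock before the confirmation is processed. With syncConfirm=true each side waits for the other until the simulated RSVP timeout expires (200 ms here instead of the 10-second default mentioned above).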
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 8 months
[JBoss JIRA] (ISPN-777) Race conditions in cleaning up stale locks held by dead members in a cluster
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-777?page=com.atlassian.jira.plugin.s... ]
Mircea Markus resolved ISPN-777.
--------------------------------
Resolution: Out of Date
This should be fixed in 5.2.
> Race conditions in cleaning up stale locks held by dead members in a cluster
> ----------------------------------------------------------------------------
>
> Key: ISPN-777
> URL: https://issues.jboss.org/browse/ISPN-777
> Project: Infinispan
> Issue Type: Bug
> Components: Locking and Concurrency
> Affects Versions: 4.2.0.BETA1
> Reporter: Vladimir Blagojevic
> Assignee: Mircea Markus
> Priority: Critical
> Fix For: 5.3.0.Final
>
> Attachments: CacheScheduledCounter.java, ISPN-777_output.txt
>
>
> It seems that rollback sometimes does not release acquired eager locks. See the attached test program and run two JVM instances on the same machine. The program schedules a task to run every 5 seconds. Each task simply locks a key, gets the value, increments it, and puts it back, all within begin/commit/rollback transaction boundaries.
> Steps to reproduce (keep repeating steps until problem is encountered):
> 1) Kill one running instance.
> 2) Restart it.
> See attached example output of a run.
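For readers without the attachment, the per-task pattern described above (lock a key, get, increment, put, then commit or roll back) can be approximated with a JDK-only sketch. This is not the attached CacheScheduledCounter.java and it does not use the Infinispan API: the class and method names are hypothetical, a ReentrantLock stands in for the eager lock on the key, a ConcurrentHashMap stands in for the cache, and the 5-second scheduling and the second JVM are left out. It only shows the expected invariant that the lock is released after a rollback as well as after a commit.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

final class LockedCounterSketch {
    // Stand-in for the cache (hypothetical).
    private final Map<String, Integer> store = new ConcurrentHashMap<>();
    // Stand-in for the eager lock on the single counter key (hypothetical).
    private final ReentrantLock keyLock = new ReentrantLock();

    // One task run: lock, get, increment, put. Returns true on "commit",
    // false when the simulated failure forces a "rollback".
    boolean incrementOnce(String key, boolean failBeforeCommit) {
        keyLock.lock();
        Integer before = store.get(key);
        try {
            int next = (before == null ? 0 : before) + 1;
            store.put(key, next);
            if (failBeforeCommit) {
                throw new IllegalStateException("simulated failure before commit");
            }
            return true;
        } catch (IllegalStateException e) {
            // Rollback: restore the value seen before the increment.
            if (before == null) {
                store.remove(key);
            } else {
                store.put(key, before);
            }
            return false;
        } finally {
            // The lock must be released on both the commit and the rollback path.
            keyLock.unlock();
        }
    }

    int value(String key) {
        return store.getOrDefault(key, 0);
    }

    boolean isLocked() {
        return keyLock.isLocked();
    }
}
```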