[JBoss JIRA] (ISPN-3878) Unhandled failing ST cancel leads to deadlock
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-3878?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-3878:
------------------------------------
I think the cancel command can't be sent asynchronously, because we want to know that nobody is sending state by the time the new rebalance starts. (The cancelling of the transfer tasks should happen during the handling of the CH_UPDATE that's sent by the new coordinator, not during the REBALANCE_START that follows.)
On the other hand, perhaps we don't need the CANCEL_STATE_TRANSFER commands at all, and we could just cancel all outbound transfer tasks when we install a new cache topology without a pending CH in StateProviderImpl.
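For illustration, a minimal sketch of that second idea against a much-simplified model of the state provider (OutboundTask, addTransfer, onTopologyUpdate and the other names below are hypothetical, not the actual StateProviderImpl API):
{code}
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the idea above: instead of relying on explicit
// CANCEL_STATE_TRANSFER commands from the receivers, the state provider drops
// all of its outbound transfer tasks as soon as it installs a cache topology
// that has no pending CH (i.e. the previous rebalance is over or superseded).
public class OutboundTransferCancellation {

   static final class OutboundTask {
      final int topologyId;
      volatile boolean cancelled;
      OutboundTask(int topologyId) { this.topologyId = topologyId; }
      void cancel() { cancelled = true; }
   }

   private final Map<String, OutboundTask> tasksByDestination = new ConcurrentHashMap<>();

   void addTransfer(String destination, int topologyId) {
      tasksByDestination.put(destination, new OutboundTask(topologyId));
   }

   // Called whenever a new cache topology is installed.
   void onTopologyUpdate(int newTopologyId, boolean hasPendingCh) {
      if (hasPendingCh) {
         return; // a rebalance is (still) in progress, keep sending state
      }
      // No pending CH: nobody should be receiving state for older topologies.
      for (Iterator<OutboundTask> it = tasksByDestination.values().iterator(); it.hasNext(); ) {
         OutboundTask task = it.next();
         if (task.topologyId < newTopologyId) {
            task.cancel();
            it.remove();
         }
      }
   }
}
{code}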
[JBoss JIRA] (ISPN-3878) Unhandled failing ST cancel leads to deadlock
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-3878?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration updated ISPN-3878:
------------------------------------------
Bugzilla Update: Perform
Bugzilla References: https://bugzilla.redhat.com/show_bug.cgi?id=1049846
[JBoss JIRA] (ISPN-3878) Unhandled failing ST cancel leads to deadlock
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-3878?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on ISPN-3878:
-----------------------------------
Could the CANCEL command be sent asynchronously?
[JBoss JIRA] (ISPN-3878) Unhandled failing ST cancel leads to deadlock
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-3878?page=com.atlassian.jira.plugin.... ]
Radim Vansa edited comment on ISPN-3878 at 1/8/14 5:57 AM:
-----------------------------------------------------------
Could the StateRequestCommand.Type.CANCEL_STATE_TRANSFER command be sent asynchronously? Is it just an optimization, or is it required?
was (Author: rvansa):
Could the CANCEL command be sent asynchronously?
[JBoss JIRA] (ISPN-3878) Unhandled failing ST cancel leads to deadlock
by Radim Vansa (JIRA)
Radim Vansa created ISPN-3878:
---------------------------------
Summary: Unhandled failing ST cancel leads to deadlock
Key: ISPN-3878
URL: https://issues.jboss.org/browse/ISPN-3878
Project: Infinispan
Issue Type: Bug
Components: State transfer
Affects Versions: 6.0.1.Final
Reporter: Radim Vansa
Assignee: Dan Berindei
Priority: Critical
Two concurrent rebalances can lead to a deadlock. An example of a situation where two rebalances run in parallel is when the coordinator is leaving the cluster: it sends REBALANCE_START and then dies, and the new coordinator recovers the cluster status and sends REBALANCE_START as well.
1. A node requests segments for the old topology; StateConsumerImpl.isTransferThreadRunning is set to true
2. The node waits for a StateResponseCommand in StateConsumerImpl (SCI): InboundTransferTask.awaitCompletion()
3. A new rebalance starts and changes the CH; the requested segment is not in the new CH
4. Some state transfers are cancelled; the cancel command is sent and takes a long time
5. A StateResponseCommand is received, but SCI.applyState finds that the segment is no longer owned, so the task is neither completed nor cancelled
6. Later, InboundTransferTask.sendCancelCommand throws a TimeoutException and no further cancellations are executed (see the sketch below)
Result: the inbound transfer thread is stuck and the rebalance never completes.
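Step 6 is where the cancellation pass breaks down: a single failing cancel RPC aborts the remaining cancellations and leaves awaitCompletion() blocked. Below is a minimal sketch of a more defensive loop, assuming hypothetical InboundTransfer/terminateLocally names rather than the real InboundTransferTask API:
{code}
import java.util.List;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: cancel every obsolete inbound transfer even if some
// individual cancel RPCs time out. Types and method names are illustrative.
public class CancelAllTransfers {

   interface InboundTransfer {
      void sendCancelCommand() throws TimeoutException; // remote call, may block and time out
      void terminateLocally();                          // local only, always safe
   }

   void cancelObsoleteTransfers(List<InboundTransfer> obsolete) {
      for (InboundTransfer transfer : obsolete) {
         try {
            transfer.sendCancelCommand();
         } catch (TimeoutException e) {
            // Don't let one slow or failed cancel block the rest; log and move on.
            System.err.println("Cancel command failed, terminating transfer locally: " + e);
         } finally {
            // Unblock awaitCompletion() regardless of whether the remote cancel
            // succeeded, so the inbound transfer thread can finish.
            transfer.terminateLocally();
         }
      }
   }
}
{code}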
[JBoss JIRA] (ISPN-3829) Null value read with RR can be replaced by cache loader value
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-3829?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-3829:
-----------------------------------------------
Vojtech Juranek <vjuranek@redhat.com> changed the Status of [bug 1045579|https://bugzilla.redhat.com/show_bug.cgi?id=1045579] from ON_QA to VERIFIED
> Null value read with RR can be replaced by cache loader value
> -------------------------------------------------------------
>
> Key: ISPN-3829
> URL: https://issues.jboss.org/browse/ISPN-3829
> Project: Infinispan
> Issue Type: Bug
> Components: Loaders and Stores
> Affects Versions: 6.0.0.Final
> Reporter: William Burns
> Assignee: William Burns
> Labels: 620
> Fix For: 7.0.0.Final
>
>
> Currently the CacheLoaderInterceptor uses the following check to determine whether it should consult the loader for a value:
> {code}
> if (e == null || e.isNull() || e.getValue() == null) {
> {code}
> Unfortunately, this means it consults the loader even when a null value is already present in the context entry under repeatable read (RR). This can cause an issue if another transaction commits that key with a value that ends up in the loader: the null value read earlier is then replaced by the loader value.
> This is also a performance issue for RR, since the loader is checked over and over for a given key even if it was found to be null the first time.
> The initial thought is to do something like setSkipRemoteGet, which could possibly serve a dual purpose.
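As an illustration of that flag-based idea, here is a minimal sketch against a much-simplified entry and store model (ContextEntry, skipLookup and Store are hypothetical names, not the real CacheEntry/CacheLoaderInterceptor API):
{code}
// Hypothetical sketch: once an entry has been looked up in a transaction
// (even if the result was null) under repeatable read, mark it so that later
// reads in the same transaction skip the loader.
public class SkipLoaderExample {

   static final class ContextEntry {
      Object value;
      boolean skipLookup; // set after the first load attempt in this transaction
   }

   interface Store {
      Object load(Object key);
   }

   Object get(Object key, ContextEntry e, Store store) {
      if (e.value == null && !e.skipLookup) {
         e.value = store.load(key);   // first miss: consult the loader once
         e.skipLookup = true;         // remember the outcome for this transaction
      }
      return e.value;                 // repeatable: later reads see the same value
   }
}
{code}
The point of the flag is that, under repeatable read, the first lookup (even a miss) fixes what the transaction sees, so subsequent reads must not consult the loader again.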
[JBoss JIRA] (ISPN-3737) L1 requestor registered after value read
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-3737?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-3737:
-----------------------------------------------
Radim Vansa <rvansa@redhat.com> changed the Status of [bug 1032545|https://bugzilla.redhat.com/show_bug.cgi?id=1032545] from ON_QA to VERIFIED
> L1 requestor registered after value read
> ----------------------------------------
>
> Key: ISPN-3737
> URL: https://issues.jboss.org/browse/ISPN-3737
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 6.0.0.Final
> Reporter: Radim Vansa
> Assignee: William Burns
> Priority: Critical
> Labels: 620
> Fix For: 6.0.1.Final, 7.0.0.Alpha1, 7.0.0.Final
>
>
> Because the L1 requestor is registered only after the value is retrieved from the data container, a (transactional) update of the value may not invalidate the entry after the write, leaving the cache inconsistent (a sketch of the reversed registration order follows the interleaving below).
> Consider this interleaving of operations (R = get request from another node, C = committing transaction):
> R: read value -> old value
> C: update old -> new
> C: notify requestors for key
> R: add requestor for key
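A minimal sketch of the reversed order, against a much-simplified requestor map and data container (handleRemoteGet, commitWrite and the other names are hypothetical, not the real L1Manager API):
{code}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the fix direction: register the remote node as an L1
// requestor *before* reading the value, so a concurrent commit's invalidation
// cannot slip in between the read and the registration.
public class L1RegistrationOrder {

   private final Map<Object, Object> dataContainer = new ConcurrentHashMap<>();
   private final Map<Object, Set<String>> requestors = new ConcurrentHashMap<>();

   Object handleRemoteGet(Object key, String requestingNode) {
      // 1. Register first: any commit from now on will see this node in the
      //    requestor set and send it an invalidation.
      requestors.computeIfAbsent(key, k -> ConcurrentHashMap.<String>newKeySet())
                .add(requestingNode);
      // 2. Only then read the value to return.
      return dataContainer.get(key);
   }

   void commitWrite(Object key, Object newValue) {
      dataContainer.put(key, newValue);
      Set<String> toInvalidate = requestors.remove(key);
      if (toInvalidate != null) {
         toInvalidate.forEach(node -> System.out.println("invalidate " + key + " on " + node));
      }
   }
}
{code}
With this ordering the worst case is a spurious invalidation sent to a node that does not end up caching the value, which is harmless compared to a missed one.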
[JBoss JIRA] (ISPN-3738) Entry version gets lost during topology change -> NPE
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-3738?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-3738:
-----------------------------------------------
Radim Vansa <rvansa@redhat.com> changed the Status of [bug 1032693|https://bugzilla.redhat.com/show_bug.cgi?id=1032693] from ON_QA to VERIFIED
> Entry version gets lost during topology change -> NPE
> -----------------------------------------------------
>
> Key: ISPN-3738
> URL: https://issues.jboss.org/browse/ISPN-3738
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 6.0.0.Final
> Reporter: Radim Vansa
> Assignee: Pedro Ruivo
> Priority: Critical
> Labels: 620
> Fix For: 6.0.1.Final, 7.0.0.Alpha1, 7.0.0.Final
>
>
> Replicated TX cache with write skew check (WSC); A and B are in the cluster, C is joining
> 0. The current CH already contains A and B as owners, C is joining (is not primary owner of anything yet). B is primary owner of K=V.
> 1. A sends PrepareCommand to B and C with put(K, V) (V is null on all nodes)
> 2. C receives PrepareCommand and responds with no versions (it is not primary owner)
> 3. topology changes on B - primary ownership of K is transfered to C
> 4. B receives PrepareCommand, responds without K's version (it is not primary)
> 5. B forwards the Prepare to C as it sees that the command has lower topology ID
> 6. C responds to B's prepare with version of K
> 7. K version is *not* added to B's response, B responds to A
> 8. A finds out that topology has changed, forwards prepare to C
> 9. C responds to A's forwarded prepare with the version of K
> 10. A receives C's response, but the versions are not added to the transaction (see the sketch below)
> 11. A sends out CommitCommand missing version of K
> 12. all nodes record K=V without version as usual ImmortalCacheEntry
> 13. the next time we try to increment the version of K=V, we fail with an NPE in SimpleClusteredVersionGenerator (actually while it tries to throw an IllegalArgumentException, because the null version is an unexpected version class)
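Steps 7 and 10 are where the version is dropped: the responses to the forwarded prepares are never merged back into the transaction, so the CommitCommand goes out without K's version. Here is a minimal sketch of what that merge step could look like, with hypothetical Tx/PrepareResponse types rather than the real Infinispan classes:
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the missing step in 7/10: when a prepare is forwarded
// because of a topology change, the versions returned by the new primary owner
// must be merged into the transaction before the commit is sent; otherwise the
// entry is committed without a version and later version increments fail.
public class MergeForwardedVersions {

   static final class Tx {
      final Map<Object, Long> updatedVersions = new HashMap<>();
   }

   // Response to a (possibly forwarded) PrepareCommand.
   static final class PrepareResponse {
      final Map<Object, Long> versions = new HashMap<>();
   }

   void onPrepareResponse(Tx tx, PrepareResponse response) {
      // Merge instead of ignoring: versions seen by the new primary owner
      // must survive into the CommitCommand.
      tx.updatedVersions.putAll(response.versions);
   }
}
{code}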