[JBoss JIRA] (ISPN-3878) Unhandled failing ST cancel leads to deadlock
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-3878?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration updated ISPN-3878:
------------------------------------------
Bugzilla Update: Perform
Bugzilla References: https://bugzilla.redhat.com/show_bug.cgi?id=1049846
> Unhandled failing ST cancel leads to deadlock
> ---------------------------------------------
>
> Key: ISPN-3878
> URL: https://issues.jboss.org/browse/ISPN-3878
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 6.0.1.Final
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
>
> Two concurrent rebalances can lead to deadlock. Example situation when two rebalances can be executed in parallel is when the coordinator is leaving a cluster; it sends REBALANCE_START and passes away. Then, the new coordinator recovers cluster status and sends REBALANCE_START as well.
> 1. Node is requesting segments for the old topology, StateConsumerImpl.isTransferThreadRunning is set to true
> 2. Node waits for StateResponseCommand in SCI: InboundTransferTask.awaitCompletion()
> 3. New rebalance is started, changing the CH - requested segment is not in the new CH
> 4. Some ST are canceled, the cancel command is sent and taking a long time
> 5. StateReponseCommand is received, but in SCI.applyState it is found out that this segment is no longer owned so the task is not completed/cancelled
> 6. Later, we get TimeoutException from InboundTransferTask.sendCancelCommand, and no more cancellations are executed
> Result: the inbound transfer thread is stuck and rebalance is never completed.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years
[JBoss JIRA] (ISPN-3878) Unhandled failing ST cancel leads to deadlock
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-3878?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on ISPN-3878:
-----------------------------------
Could the CANCEL command be sent asynchronously?
> Unhandled failing ST cancel leads to deadlock
> ---------------------------------------------
>
> Key: ISPN-3878
> URL: https://issues.jboss.org/browse/ISPN-3878
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 6.0.1.Final
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
>
> Two concurrent rebalances can lead to deadlock. Example situation when two rebalances can be executed in parallel is when the coordinator is leaving a cluster; it sends REBALANCE_START and passes away. Then, the new coordinator recovers cluster status and sends REBALANCE_START as well.
> 1. Node is requesting segments for the old topology, StateConsumerImpl.isTransferThreadRunning is set to true
> 2. Node waits for StateResponseCommand in SCI: InboundTransferTask.awaitCompletion()
> 3. New rebalance is started, changing the CH - requested segment is not in the new CH
> 4. Some ST are canceled, the cancel command is sent and taking a long time
> 5. StateReponseCommand is received, but in SCI.applyState it is found out that this segment is no longer owned so the task is not completed/cancelled
> 6. Later, we get TimeoutException from InboundTransferTask.sendCancelCommand, and no more cancellations are executed
> Result: the inbound transfer thread is stuck and rebalance is never completed.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years
[JBoss JIRA] (ISPN-3878) Unhandled failing ST cancel leads to deadlock
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-3878?page=com.atlassian.jira.plugin.... ]
Radim Vansa edited comment on ISPN-3878 at 1/8/14 5:57 AM:
-----------------------------------------------------------
Could the StateRequestCommand.Type.CANCEL_STATE_TRANSFER command be sent asynchronously? Is it rather an optimization, or is it required?
was (Author: rvansa):
Could the CANCEL command be sent asynchronously?
> Unhandled failing ST cancel leads to deadlock
> ---------------------------------------------
>
> Key: ISPN-3878
> URL: https://issues.jboss.org/browse/ISPN-3878
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 6.0.1.Final
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
>
> Two concurrent rebalances can lead to deadlock. Example situation when two rebalances can be executed in parallel is when the coordinator is leaving a cluster; it sends REBALANCE_START and passes away. Then, the new coordinator recovers cluster status and sends REBALANCE_START as well.
> 1. Node is requesting segments for the old topology, StateConsumerImpl.isTransferThreadRunning is set to true
> 2. Node waits for StateResponseCommand in SCI: InboundTransferTask.awaitCompletion()
> 3. New rebalance is started, changing the CH - requested segment is not in the new CH
> 4. Some ST are canceled, the cancel command is sent and taking a long time
> 5. StateReponseCommand is received, but in SCI.applyState it is found out that this segment is no longer owned so the task is not completed/cancelled
> 6. Later, we get TimeoutException from InboundTransferTask.sendCancelCommand, and no more cancellations are executed
> Result: the inbound transfer thread is stuck and rebalance is never completed.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years
[JBoss JIRA] (ISPN-3878) Unhandled failing ST cancel leads to deadlock
by Radim Vansa (JIRA)
Radim Vansa created ISPN-3878:
---------------------------------
Summary: Unhandled failing ST cancel leads to deadlock
Key: ISPN-3878
URL: https://issues.jboss.org/browse/ISPN-3878
Project: Infinispan
Issue Type: Bug
Components: State transfer
Affects Versions: 6.0.1.Final
Reporter: Radim Vansa
Assignee: Dan Berindei
Priority: Critical
Two concurrent rebalances can lead to deadlock. Example situation when two rebalances can be executed in parallel is when the coordinator is leaving a cluster; it sends REBALANCE_START and passes away. Then, the new coordinator recovers cluster status and sends REBALANCE_START as well.
1. Node is requesting segments for the old topology, StateConsumerImpl.isTransferThreadRunning is set to true
2. Node waits for StateResponseCommand in SCI: InboundTransferTask.awaitCompletion()
3. New rebalance is started, changing the CH - requested segment is not in the new CH
4. Some ST are canceled, the cancel command is sent and taking a long time
5. StateReponseCommand is received, but in SCI.applyState it is found out that this segment is no longer owned so the task is not completed/cancelled
6. Later, we get TimeoutException from InboundTransferTask.sendCancelCommand, and no more cancellations are executed
Result: the inbound transfer thread is stuck and rebalance is never completed.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years
[JBoss JIRA] (ISPN-3829) Null value read with RR can be replaced by cache loader value
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-3829?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-3829:
-----------------------------------------------
Vojtech Juranek <vjuranek(a)redhat.com> changed the Status of [bug 1045579|https://bugzilla.redhat.com/show_bug.cgi?id=1045579] from ON_QA to VERIFIED
> Null value read with RR can be replaced by cache loader value
> -------------------------------------------------------------
>
> Key: ISPN-3829
> URL: https://issues.jboss.org/browse/ISPN-3829
> Project: Infinispan
> Issue Type: Bug
> Components: Loaders and Stores
> Affects Versions: 6.0.0.Final
> Reporter: William Burns
> Assignee: William Burns
> Labels: 620
> Fix For: 7.0.0.Final
>
>
> Currently the CacheLoaderInterceptor does the following check to determine if it should check the loader for a value
> {code}
> if (e == null || e.isNull() || e.getValue() == null) {
> {code}
> Unfortunately this means it checks the loader when a null value is in the entry when using RR. This can cause an issue if another transaction commits that key and puts a value that results in that value being inserted into the loader.
> This also is a performance issue for RR, since it has to check the loader over and over for a given key even if it was found null the first time.
> Initial thought is to do something like setSkipRemoteGet and that could actually be used for a dual purpose possibly.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years
[JBoss JIRA] (ISPN-3737) L1 requestor registered after value read
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-3737?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-3737:
-----------------------------------------------
Radim Vansa <rvansa(a)redhat.com> changed the Status of [bug 1032545|https://bugzilla.redhat.com/show_bug.cgi?id=1032545] from ON_QA to VERIFIED
> L1 requestor registered after value read
> ----------------------------------------
>
> Key: ISPN-3737
> URL: https://issues.jboss.org/browse/ISPN-3737
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 6.0.0.Final
> Reporter: Radim Vansa
> Assignee: William Burns
> Priority: Critical
> Labels: 620
> Fix For: 6.0.1.Final, 7.0.0.Alpha1, 7.0.0.Final
>
>
> As the L1 requestor is registered only after the value is retrieved from data container, the (transactional) update of the value may not invalide the entry after write and the cache gets inconsistent.
> Consider this interleaving of operations (G=get request from other node, C=commit)
> R: read value -> old value
> C: update old -> new
> C: notify requestors for key
> R: add requestor for key
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years
[JBoss JIRA] (ISPN-3738) Entry version gets lost during topology change -> NPE
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-3738?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-3738:
-----------------------------------------------
Radim Vansa <rvansa(a)redhat.com> changed the Status of [bug 1032693|https://bugzilla.redhat.com/show_bug.cgi?id=1032693] from ON_QA to VERIFIED
> Entry version gets lost during topology change -> NPE
> -----------------------------------------------------
>
> Key: ISPN-3738
> URL: https://issues.jboss.org/browse/ISPN-3738
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 6.0.0.Final
> Reporter: Radim Vansa
> Assignee: Pedro Ruivo
> Priority: Critical
> Labels: 620
> Fix For: 6.0.1.Final, 7.0.0.Alpha1, 7.0.0.Final
>
>
> Replicated TX cache with WSC, A, B are in cluster, C is joining
> 0. The current CH already contains A and B as owners, C is joining (is not primary owner of anything yet). B is primary owner of K=V.
> 1. A sends PrepareCommand to B and C with put(K, V) (V is null on all nodes)
> 2. C receives PrepareCommand and responds with no versions (it is not primary owner)
> 3. topology changes on B - primary ownership of K is transfered to C
> 4. B receives PrepareCommand, responds without K's version (it is not primary)
> 5. B forwards the Prepare to C as it sees that the command has lower topology ID
> 6. C responds to B's prepare with version of K
> 7. K version is *not* added to B's response, B responds to A
> 8. A finds out that topology has changed, forwards prepare to C
> 9. C responds to C's prepare with version of K
> 10. A receives C's response, but the versions are not added to transaction
> 11. A sends out CommitCommand missing version of K
> 12. all nodes record K=V without version as usual ImmortalCacheEntry
> 13. the next time we try to increase version of K=V, we fail with NPE in SimpleClusteredVersionGenerator (actually when it tries to throw IllegalArgumentException because the null version is unexpected version class)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years
[JBoss JIRA] (ISPN-800) Infinispan inside OSGI
by Brett Meyer (JIRA)
[ https://issues.jboss.org/browse/ISPN-800?page=com.atlassian.jira.plugin.s... ]
Brett Meyer commented on ISPN-800:
----------------------------------
[~bibryam], I'm pulling in your changes. Thanks! Here are a couple of questions:
1.) Why was it necessary to make commons a fragment attached to core, rather than ensuring the imports and exports are correct and running commons as a separate bundle? Just curious.
2.) Was allowing BND to automatically define imports causing issues, or do you simply prefer to explicitly list them?
> Infinispan inside OSGI
> ----------------------
>
> Key: ISPN-800
> URL: https://issues.jboss.org/browse/ISPN-800
> Project: Infinispan
> Issue Type: Feature Request
> Components: Core API
> Reporter: Luca Stancapiano
> Assignee: Galder Zamarreño
>
> We need to import infinispan inside a OSGI repository. Tests are made with Felix.
> I added the configuration to use infinispan inside a osgi repository. We need to ignore all listed dependencies. With this configuration we can install infinispan-core.jar inside OSGI. Its achievement will be as a base installation here: https://github.com/flashboss/infinispan
> I added the Import-Package because you are forced to put manually in Felix all dependencies as jgroups, jboss marshalling, jcip, all apache commons. I've seen infinispan core working by default without all those libraries, so I think the same achievement should be replicated in OSGI.
> Inside the Import-Package tag I excluded those libraries so Infinispan core can be started in default mode without errors. If we want use the replication in OSGI, it is enough add manually the other packages (jgroups.jar etc etc)
> Actually the core bundle can be installed. But to be used it needs theese projects be installed as osgi bundles:
> jboss transaction api 1.0.1.GA
> We patched it. There is a new OSGI version here: https://repository.jboss.org/nexus/content/groups/public/org/jboss/spec/j... )
> jgroups 2.10.1.GA
> (it's a osgi bundle since the 3.x version)
> river 1.2.3.GA
> (opened an issue for marshalling 1.4.0 in JBMAR-118 and https://github.com/flashboss/jboss-marshalling/blob/master/river/pom.xml )
> marshalling-api 1.2.3.GA
> (opened an issue for marshalling 1.4.0 in JBMAR-118 and https://github.com/flashboss/jboss-marshalling/blob/master/api/pom.xml )
> jboss logging spi 2.0.5.GA
> (added a jira issue in JBLOGGING-51 . It could be fixed in the 2.2.0.CR2 version. Fixed in the 3.x version)
> rhq plugin annotations 1.4.0.B01
> (opened a feature request in https://bugzilla.redhat.com/show_bug.cgi?id=657754 )
> i18nlog 1.0.9
> (sent a patch in https://sourceforge.net/projects/i18nlog . It could become a OSGI bundle in the 1.0.10 version. Waiting for a response. Fixed in 1.15)
> log4j 1.2.16
> (that's ok...it is a osgi bundle ;))
> jcip-annotations 1.0
> (I sent a patch via email to brian(a)briangoetz.com and a post in http://tembrel.blogspot.com. Sent the patch in concurrency-interest(a)cs.oswego.edu too. They responded to me. There is a OSGI version with a different artifact name. I changed the dependency in the pom.xml of the parent project)
> We should make sure proper 'Import-Package' property is specified in the MANIFEST.MF so that:
> 1- it fails to load obviously when there's any missing bundles that are essential in using the very core functionality of Infinispan.
> 2 - it does not fail due to the dependency that is not really essential.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years