[JBoss JIRA] (ISPN-2578) Two PrepareCommands in parallel cause ConcurrentModificationException
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2578?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2578:
--------------------------------
Fix Version/s: 5.2.0.CR1
(was: 5.2.0.Beta6)
> Two PrepareCommands in parallel cause ConcurrentModificationException
> ---------------------------------------------------------------------
>
> Key: ISPN-2578
> URL: https://issues.jboss.org/browse/ISPN-2578
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 5.2.0.Beta5
> Reporter: Radim Vansa
> Assignee: Adrian Nistor
> Priority: Critical
> Fix For: 5.2.0.CR1
>
>
> Situation:
> 1) Node A broadcasts PrepareCommand to nodes B, C
> 2) Node A leaves cluster, causing new topology to be installed
> 3) The command arrives at B and C with a lower topology id than the current one
> 4) Both B and C forward the command to node D
> 5) D executes the two commands in parallel, finds out that A has left, and therefore executes a RollbackCommand for each
> In {{AbstractTxLockingInterceptor.visitRollbackCommand}} we call {{LockManagerImpl.unlockAll}}, which iterates over the keys and unlocks them. Because these two prepares aren't synchronized on the {{lockedKeys}} set, one may unlock and remove the keys while the other is iterating over them, causing a {{ConcurrentModificationException}}.
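A minimal sketch of one way to make concurrent rollbacks of the same transaction safe. It assumes the per-transaction set of locked keys can be a concurrent set. `TxLockTable`, `newLockedKeys`, `lock` and `unlockAll` are hypothetical names, not Infinispan's `LockManagerImpl` API, and this is not necessarily how the issue was actually fixed. Iterators of `ConcurrentHashMap.newKeySet()` are weakly consistent, so one thread can remove keys while another iterates without a `ConcurrentModificationException`. `ConcurrentMap.remove(key, value)` only releases a lock that is still held by the given owner.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified lock table; not Infinispan's LockManagerImpl.
final class TxLockTable {
    // key -> owner (e.g. a transaction id) currently holding the lock
    private final ConcurrentMap<Object, Object> locks = new ConcurrentHashMap<>();

    // Per-transaction set of locked keys backed by a ConcurrentHashMap:
    // its iterators are weakly consistent and never throw ConcurrentModificationException.
    static Set<Object> newLockedKeys() {
        return ConcurrentHashMap.newKeySet();
    }

    // Acquires the lock on key for owner (re-entrant for the same owner) and records it.
    boolean lock(Object key, Object owner, Set<Object> lockedKeys) {
        Object holder = locks.putIfAbsent(key, owner);
        if (holder == null || holder.equals(owner)) {
            lockedKeys.add(key);
            return true;
        }
        return false;
    }

    // Releases every key recorded for owner; safe to run from two threads at once,
    // e.g. two RollbackCommands for the same transaction.
    void unlockAll(Set<Object> lockedKeys, Object owner) {
        for (Object key : lockedKeys) {
            locks.remove(key, owner); // no-op if the other thread already released it
            lockedKeys.remove(key);
        }
    }

    boolean isLocked(Object key) {
        return locks.containsKey(key);
    }
}
```

The loop depends on the concurrent set. With a plain `HashSet`, removing elements inside the for-each would throw `ConcurrentModificationException` even from a single thread.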
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] (ISPN-2550) NoSuchElementException in Hot Rod Encoder
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2550?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2550:
--------------------------------
Fix Version/s: 5.2.0.CR1
(was: 5.2.0.Beta6)
> NoSuchElementException in Hot Rod Encoder
> -----------------------------------------
>
> Key: ISPN-2550
> URL: https://issues.jboss.org/browse/ISPN-2550
> Project: Infinispan
> Issue Type: Bug
> Components: Remote protocols
> Affects Versions: 5.2.0.Beta4
> Reporter: Michal Linhard
> Assignee: Dan Berindei
> Priority: Blocker
> Fix For: 5.2.0.CR1
>
>
> Tomas noticed this a while ago in a specific functional test:
> https://bugzilla.redhat.com/show_bug.cgi?id=875151
> I'm creating a more general JIRA because I'm also hitting this in a resilience test.
> What I found with a quick debug session is that here:
> https://github.com/infinispan/infinispan/blob/master/server/hotrod/src/ma...
> {code}
> for (segmentIdx <- 0 until numSegments) {
>   val denormalizedSegmentHashIds = allDenormalizedHashIds(segmentIdx)
>   val segmentOwners = ch.locateOwnersForSegment(segmentIdx)
>   for (ownerIdx <- 0 until segmentOwners.length) {
>     val address = segmentOwners(ownerIdx % segmentOwners.size)
>     val serverAddress = members(address)
>     val hashId = denormalizedSegmentHashIds(ownerIdx)
>     log.tracef("Writing hash id %d for %s:%s", hashId, serverAddress.host, serverAddress.port)
>     writeString(serverAddress.host, buf)
>     writeUnsignedShort(serverAddress.port, buf)
>     buf.writeInt(hashId)
>   }
> }
> {code}
> we're trying to obtain the serverAddress for a nonexistent address, and the resulting NoSuchElementException is not handled properly.
> It happens after I kill a node in a resilience test; the exception is thrown when the killed node is looked up in the members cache.
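A minimal Java sketch of one defensive option (the quoted snippet is Scala). It looks each segment owner up with the null-returning `Map.get` and skips owners that are no longer in the members map, instead of letting the lookup throw. `HashTopologySketch`, `ServerAddress`, `Entry` and `entriesForSegment` are hypothetical. Whether a real encoder should skip, substitute or retry is a design decision. If the response declares entry counts before the entries, any skipping would also have to be reflected in those counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, loosely modelled on the Scala loop above.
final class HashTopologySketch {
    record ServerAddress(String host, int port) {}
    record Entry(String host, int port, int hashId) {}

    // Builds the (host, port, hashId) entries for one segment, skipping owners
    // that have already left the members map (e.g. a node that was just killed).
    static List<Entry> entriesForSegment(List<String> segmentOwners,
                                         int[] denormalizedSegmentHashIds,
                                         Map<String, ServerAddress> members) {
        List<Entry> entries = new ArrayList<>();
        for (int ownerIdx = 0; ownerIdx < segmentOwners.size(); ownerIdx++) {
            ServerAddress server = members.get(segmentOwners.get(ownerIdx));
            if (server == null) {
                // The Scala members(address) lookup threw NoSuchElementException here
                continue;
            }
            entries.add(new Entry(server.host(), server.port(), denormalizedSegmentHashIds[ownerIdx]));
        }
        return entries;
    }
}
```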
[JBoss JIRA] (ISPN-2483) State transfer issue with the transactions for which the originator has crashed
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2483?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2483:
--------------------------------
Fix Version/s: 5.2.0.CR1
(was: 5.2.0.Beta6)
> State transfer issue with the transactions for which the originator has crashed
> -------------------------------------------------------------------------------
>
> Key: ISPN-2483
> URL: https://issues.jboss.org/browse/ISPN-2483
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer, Transactions
> Affects Versions: 5.1.8.Final, 5.2.0.Beta3
> Reporter: Mircea Markus
> Assignee: Dan Berindei
> Priority: Blocker
> Fix For: 5.2.0.CR1, 5.2.0.Final
>
>
> State transfer migrates and prepares the transactions whose originator has left. On the receiving node, this results in the transaction being prepared and acquiring backup locks that are never released (short of manual intervention).
> This should behave as follows:
> - if recovery is not enabled, the state producer should drop such transactions instead of sending them
> - if recovery is enabled, these transactions should be sent across. They shouldn't be prepared or acquire backup locks; instead they should be placed in the recovery cache (see RecoveryManagerImpl.inDoubtTransactions)
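The intended behaviour can be summarised as a small decision function plus a receiving side that only takes backup locks for live transactions. This is a minimal sketch under stated assumptions. `TxTransferAction`, `OrphanTxPolicy` and `StateConsumerSketch` are hypothetical names, and the `inDoubt` map only stands in for the recovery cache mentioned above (`RecoveryManagerImpl.inDoubtTransactions`). It is not the actual state transfer code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// What the state producer should do with a transaction it is about to transfer.
enum TxTransferAction { SEND_AND_PREPARE, DROP, SEND_AS_IN_DOUBT }

final class OrphanTxPolicy {
    static TxTransferAction decide(boolean originatorInCluster, boolean recoveryEnabled) {
        if (originatorInCluster) {
            return TxTransferAction.SEND_AND_PREPARE; // normal case: the receiver prepares it
        }
        // Originator has left: never prepare, so no backup locks can leak
        return recoveryEnabled ? TxTransferAction.SEND_AS_IN_DOUBT : TxTransferAction.DROP;
    }
}

// Receiving side: only SEND_AND_PREPARE acquires backup locks.
final class StateConsumerSketch {
    final Set<String> backupLocks = new HashSet<>();
    final Map<String, Set<String>> inDoubt = new HashMap<>(); // stand-in for the recovery cache

    void receive(String txId, Set<String> keys, TxTransferAction action) {
        switch (action) {
            case SEND_AND_PREPARE -> backupLocks.addAll(keys);
            case SEND_AS_IN_DOUBT -> inDoubt.put(txId, keys);
            case DROP -> { } // the producer should not have sent it; ignore
        }
    }
}
```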
[JBoss JIRA] (ISPN-2473) Forwarding for non-transactional write commands should be asynchronous
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2473?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2473:
--------------------------------
Fix Version/s: 5.2.0.CR1
(was: 5.2.0.Beta6)
> Forwarding for non-transactional write commands should be asynchronous
> ----------------------------------------------------------------------
>
> Key: ISPN-2473
> URL: https://issues.jboss.org/browse/ISPN-2473
> Project: Infinispan
> Issue Type: Sub-task
> Components: State transfer
> Affects Versions: 5.2.0.Beta3
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 5.2.0.CR1
>
>
> A write command keeps a lock on the originator while doing the RPC on the actual owners, so forwarding the command back to the originator synchronously leads to a deadlock.
> Keeping track of the real originator for forwarded commands could allow us to avoid the deadlock, but it would be too complicated. OTOH, non-tx caches don't guarantee consistency, so we could just do the forwarding asynchronously.
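The deadlock and the asynchronous workaround can be modelled in a single JVM. This is a minimal sketch under simplifying assumptions: a `ReentrantLock` stands in for the originator's key lock, and a single-thread executor stands in for the originator's handler for the forwarded command (the round trip through the owners is collapsed). `ForwardingSketch` and its methods are hypothetical. If the write path waited on the forwarded command while still holding the lock, the handler thread would block on that lock forever. Returning the future lets the lock be released first.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-JVM model of forwarding a write back to its originator.
final class ForwardingSketch {
    private final ReentrantLock keyLock = new ReentrantLock(); // lock held by the originator
    private final ExecutorService originatorHandler = Executors.newSingleThreadExecutor();

    // Executing the forwarded command on the originator needs the same key lock.
    private String applyForwarded(String command) {
        keyLock.lock();
        try {
            return "applied " + command;
        } finally {
            keyLock.unlock();
        }
    }

    // Calling join() on the future before unlocking would deadlock: applyForwarded
    // runs on another thread and blocks on keyLock. Forwarding asynchronously avoids it.
    CompletableFuture<String> writeAndForwardAsync(String command) {
        keyLock.lock();
        try {
            return CompletableFuture.supplyAsync(() -> applyForwarded(command), originatorHandler);
        } finally {
            keyLock.unlock();
        }
    }

    void shutdown() {
        originatorHandler.shutdown();
    }
}
```

The trade-off matches the reasoning above: the writer no longer waits for the forwarded command's outcome, which is acceptable only because non-tx caches don't guarantee consistency anyway.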
[JBoss JIRA] (ISPN-2588) Lock leak during state transfer (causing StaleLocksTransactionTest to fail)
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2588?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2588:
--------------------------------
Fix Version/s: 5.2.0.CR1
(was: 5.2.0.Beta6)
> Lock leak during state transfer (causing StaleLocksTransactionTest to fail)
> ---------------------------------------------------------------------------
>
> Key: ISPN-2588
> URL: https://issues.jboss.org/browse/ISPN-2588
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.2.0.Beta5
> Reporter: Mircea Markus
> Assignee: Dan Berindei
> Fix For: 5.2.0.CR1
>
> Attachments: StaleLocksTransactionTest.zip
>
>
> numOwners=1, pessimistic cache (the same applies if A is the only node in the cluster)
> 1. tx1 runs on A with writes on k, lockOwner(k) == {A}
> 2. A.tx1.lock(k) doesn't go remotely, and control returns to the interceptor stack
> 3. at this point B is started and lockOwner(k) == {B}
> 4. the StateTransferInterceptor forwards the command to B, which acquires the lock locally
> 5. this is followed by a tx.commit/rollback that is not sent to B, so the lock on B is never released.
> The logic that decides whether the message is sent remotely is in DistributionInterceptor.visitCommitCommand, which invokes:
> {code:java}
> protected boolean shouldInvokeRemoteTxCommand(TxInvocationContext ctx) {
>    return ctx.isOriginLocal() && (ctx.hasModifications() ||
>          !((LocalTxInvocationContext) ctx).getRemoteLocksAcquired().isEmpty());
> }
> {code}
> The problem here is that, when forwarding, we don't register the remote node as having acquired a lock. I think a more generic solution would also work: e.g. if the viewId of the tx differs from the viewId of the cluster at commit time, always go remotely.
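Both remedies suggested above can be sketched against a simplified transaction state. Option 1 records the forwarding target as a node holding a lock. Option 2 goes remote whenever the topology (view) id changed since the transaction started. `TxState`, `CommitRouting` and `onLockForwarded` are hypothetical stand-ins for `LocalTxInvocationContext` and the interceptor logic, not Infinispan's API, and either option could be used alone.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified transaction state; not Infinispan's LocalTxInvocationContext.
final class TxState {
    final boolean originLocal;
    boolean hasModifications;
    final Set<String> remoteLocksAcquired = new HashSet<>();
    final int topologyId; // topology (view) id seen when the transaction started

    TxState(boolean originLocal, int topologyId) {
        this.originLocal = originLocal;
        this.topologyId = topologyId;
    }
}

final class CommitRouting {
    // Option 1: when the StateTransferInterceptor forwards a lock command,
    // record the target as a node holding a lock for this transaction.
    static void onLockForwarded(TxState tx, String targetNode) {
        tx.remoteLocksAcquired.add(targetNode);
    }

    // Option 2 (more generic): also go remote whenever the topology changed
    // between the start of the transaction and commit/rollback.
    static boolean shouldInvokeRemoteTxCommand(TxState tx, int currentTopologyId) {
        return tx.originLocal
                && (tx.hasModifications
                    || !tx.remoteLocksAcquired.isEmpty()
                    || tx.topologyId != currentTopologyId);
    }
}
```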
[JBoss JIRA] (ISPN-2620) A segment can end up in two different InboundTransferTasks
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2620?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2620:
--------------------------------
Fix Version/s: 5.2.0.CR1
(was: 5.2.0.Beta6)
> A segment can end up in two different InboundTransferTasks
> ----------------------------------------------------------
>
> Key: ISPN-2620
> URL: https://issues.jboss.org/browse/ISPN-2620
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.2.0.Beta5
> Reporter: Dan Berindei
> Assignee: Adrian Nistor
> Priority: Critical
> Fix For: 5.2.0.CR1
>
> Attachments: cnolt.log.gz
>
>
> ConcurrentNonOverlappingLeaveTest is de facto disabled because RehashTestBase doesn't have the @Test annotation (see ISPN-2534).
> I tried to enable it, but rehash wouldn't finish because NodeA wouldn't confirm receiving segment 38 from NodeC (the initial cluster was [NodeA, NodeB, NodeC, NodeD]). I think this is because segment 38 appears in two InboundTransferTasks:
> {noformat}
> 17:12:37,211 TRACE (OOB-5,ISPN,NodeA-49710:dist) [StateConsumerImpl] Segments not received yet for cache dist: {NodeC-40582=[
> InboundTransferTask{segments=[38], finishedSegments=[], unfinishedSegments=[38], source=NodeC-40582, isCancelled=false, topologyId=7, timeout=240000, cacheName=dist},
> InboundTransferTask{segments=[38, 42, 43, 40, 41, 51, 50, 49, 55, 54, 53, 52, 59, 58, 57, 56], finishedSegments=[40], unfinishedSegments=[38, 42, 43, 41, 51, 50, 49, 55, 54, 53, 52, 59, 58, 57, 56], source=NodeC-40582, isCancelled=false, topologyId=9, timeout=240000, cacheName=dist}]}
> 17:12:37,211 DEBUG (OOB-5,ISPN,NodeA-49710:dist) [StateConsumerImpl] Applying new state for segment 38 of cache dist from node NodeC-40582: received 0 cache entries
> 17:12:37,211 TRACE (OOB-5,ISPN,NodeA-49710:dist) [StateConsumerImpl] Segments not received yet for cache dist: {NodeC-40582=[
> InboundTransferTask{segments=[38], finishedSegments=[], unfinishedSegments=[38], source=NodeC-40582, isCancelled=false, topologyId=7, timeout=240000, cacheName=dist},
> InboundTransferTask{segments=[38, 42, 43, 40, 41, 51, 50, 49, 55, 54, 53, 52, 59, 58, 57, 56], finishedSegments=[40, 57, 38], unfinishedSegments=[42, 43, 41, 51, 50, 49, 55, 54, 53, 52, 59, 58, 56], source=NodeC-40582, isCancelled=false, topologyId=9, timeout=240000, cacheName=dist}]}
> {noformat}
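In the log, segment 38 is pending in both the topologyId=7 and the topologyId=9 task, but only the latter records it as finished. One way to rule this out is a single segment-to-task index. The sketch assumes that re-requesting a segment should move it to the newest task. `InboundTasksSketch` and its methods are hypothetical, not `StateConsumerImpl`. The actual fix might instead cancel the older task or avoid re-requesting the segment.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping for inbound segment transfers; not StateConsumerImpl.
final class InboundTasksSketch {
    static final class Task {
        final int topologyId;
        final Set<Integer> unfinishedSegments = new HashSet<>();

        Task(int topologyId, Collection<Integer> segments) {
            this.topologyId = topologyId;
            unfinishedSegments.addAll(segments);
        }

        boolean isDone() {
            return unfinishedSegments.isEmpty();
        }
    }

    // Invariant: each pending segment is owned by exactly one task.
    private final Map<Integer, Task> taskBySegment = new HashMap<>();

    // Registers a new task; a segment still pending in an older task is moved
    // to the new task instead of being tracked twice.
    synchronized Task addTask(int topologyId, Collection<Integer> segments) {
        Task task = new Task(topologyId, segments);
        for (Integer segment : segments) {
            Task previous = taskBySegment.put(segment, task);
            if (previous != null && previous != task) {
                previous.unfinishedSegments.remove(segment);
            }
        }
        return task;
    }

    // Marks a segment as received in the single task that owns it.
    synchronized void segmentReceived(int segment) {
        Task task = taskBySegment.remove(segment);
        if (task != null) {
            task.unfinishedSegments.remove(segment);
        }
    }

    synchronized boolean allSegmentsReceived() {
        return taskBySegment.isEmpty();
    }
}
```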