[JBoss JIRA] (ISPN-4778) PessimisticLockingInterceptor throws when handling remote clear command
by Arjan t (JIRA)
Arjan t created ISPN-4778:
-----------------------------
Summary: PessimisticLockingInterceptor throws when handling remote clear command
Key: ISPN-4778
URL: https://issues.jboss.org/browse/ISPN-4778
Project: Infinispan
Issue Type: Bug
Affects Versions: 6.0.2.Final
Environment: JBoss WildFly 8.1.0.FINAL
Reporter: Arjan t
Assignee: Mircea Markus
Using Infinispan as it's shipped with JBoss WildFly 8.1.0.Final as a distributed cache for Hibernate, it appears that the ClearCommand does not work in a cluster when *pessimistic locking* is used. Pessimistic locking seems to be the default in WildFly, even when theoretically it shouldn't be.
This results in the following exception:
{noformat}
java.lang.ClassCastException: org.infinispan.context.impl.NonTxInvocationContext cannot be cast to org.infinispan.context.impl.TxInvocationContext
at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.visitClearCommand(PessimisticLockingInterceptor.java:194)
at org.infinispan.commands.write.ClearCommand.acceptVisitor(ClearCommand.java:38)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)
at org.infinispan.commands.AbstractVisitor.visitClearCommand(AbstractVisitor.java:47)
at org.infinispan.commands.write.ClearCommand.acceptVisitor(ClearCommand.java:38)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
at org.infinispan.interceptors.TxInterceptor.enlistWriteAndInvokeNext(TxInterceptor.java:255)
at org.infinispan.interceptors.TxInterceptor.visitClearCommand(TxInterceptor.java:206)
at org.infinispan.commands.write.ClearCommand.acceptVisitor(ClearCommand.java:38)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)
at org.infinispan.commands.AbstractVisitor.visitClearCommand(AbstractVisitor.java:47)
at org.infinispan.commands.write.ClearCommand.acceptVisitor(ClearCommand.java:38)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:110)
at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:73)
at org.infinispan.commands.AbstractVisitor.visitClearCommand(AbstractVisitor.java:47)
at org.infinispan.commands.write.ClearCommand.acceptVisitor(ClearCommand.java:38)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)
at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommand(BaseRpcInvokingCommand.java:39)
at org.infinispan.commands.remote.SingleRpcCommand.perform(SingleRpcCommand.java:48)
at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:95)
at org.infinispan.remoting.InboundInvocationHandlerImpl.access$000(InboundInvocationHandlerImpl.java:50)
at org.infinispan.remoting.InboundInvocationHandlerImpl$2.run(InboundInvocationHandlerImpl.java:172)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
{noformat}
The incoming command looks as follows:
{noformat}
CacheRpcCommand cmd:
command:
ClearCommand{flags=null}
icf:
org.infinispan.context.TransactionalInvocationContextFactory@3ef1861e
Interceptor chain:
>> org.infinispan.interceptors.InvocationContextInterceptor -- checks if stopping, otherwise continues
>> org.infinispan.interceptors.CacheMgmtInterceptor -- does nothing
>> org.infinispan.interceptors.TxInterceptor -- checks "shouldEnlist", if false does nothing
>> org.infinispan.interceptors.NotificationInterceptor -- does nothing
>> org.infinispan.interceptors.locking.PessimisticLockingInterceptor -- Throws exception if something in cache
>> org.infinispan.interceptors.EntryWrappingInterceptor
>> org.infinispan.interceptors.InvalidationInterceptor
>> org.infinispan.interceptors.CallInterceptor
{noformat}
The problem seems to be that {{org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommand}} always creates a {{NonTxInvocationContext}}, as per the following line of code:
{code}
final InvocationContext ctx = icf.createRemoteInvocationContextForCommand(vc, getOrigin());
{code}
When handling the ClearCommand, the {{PessimisticLockingInterceptor}} always casts this to a {{TxInvocationContext}} whenever {{dataContainer}} is not empty, i.e. when there is cached data on the node where the clear command arrives. This happens in the following code:
{code}
public Object visitClearCommand(InvocationContext ctx, ClearCommand command) throws Throwable {
   try {
      boolean skipLocking = hasSkipLocking(command);
      long lockTimeout = getLockAcquisitionTimeout(command, skipLocking);
      for (InternalCacheEntry entry : dataContainer.entrySet())
         lockAndRegisterBackupLock((TxInvocationContext) ctx, entry.getKey(), lockTimeout, skipLocking);
      return invokeNextInterceptor(ctx, command);
   } catch (Throwable te) {
      releaseLocksOnFailureBeforePrepare(ctx);
      throw te;
   }
}
{code}
So it seems this can never work.
Either the {{PessimisticLockingInterceptor}} can't be in the interceptor chain when handling commands from a remote origin, or something has to be done about the {{InvocationContext}} when handling remote commands.
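The failure mode described above can be reproduced in isolation. The following is a minimal sketch with hypothetical stand-in types ({{SketchContext}}, {{SketchNonTxContext}}, {{SketchTxContext}}, {{ClearCastSketch}}), none of which are Infinispan classes. It shows why the cast only fails when the container holds entries, and what an {{instanceof}} guard would look like. The guard is only one conceivable approach and is not presented as the project's fix.

```java
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the Infinispan classes.
interface SketchContext {}
class SketchNonTxContext implements SketchContext {}
class SketchTxContext implements SketchContext {}

final class ClearCastSketch {

    // Mirrors the reported pattern: the cast happens once per entry,
    // so an empty container never reaches the cast and never throws.
    static int lockAllUnchecked(SketchContext ctx, List<String> keys) {
        int locked = 0;
        for (String key : keys) {
            SketchTxContext tx = (SketchTxContext) ctx; // ClassCastException for a non-tx context
            locked++;
        }
        return locked;
    }

    // Assumed alternative: only take per-key locks when the context is transactional.
    static int lockAllGuarded(SketchContext ctx, List<String> keys) {
        if (!(ctx instanceof SketchTxContext)) {
            return 0; // nothing locked for a non-transactional context
        }
        return keys.size();
    }
}
```

With a non-transactional context and a non-empty key list, {{lockAllUnchecked}} throws {{ClassCastException}}, while the same call with an empty list returns normally. That matches the observation that the exception only appears when there is cached data on the receiving node.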
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)
[JBoss JIRA] (ISPN-4777) Replace command not atomic in REPL_SYNC cache mode
by Anuj Shah (JIRA)
[ https://issues.jboss.org/browse/ISPN-4777?page=com.atlassian.jira.plugin.... ]
Anuj Shah updated ISPN-4777:
----------------------------
Affects Version/s: 5.2.6.Final
> Replace command not atomic in REPL_SYNC cache mode
> --------------------------------------------------
>
> Key: ISPN-4777
> URL: https://issues.jboss.org/browse/ISPN-4777
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 5.2.6.Final
> Reporter: Anuj Shah
> Assignee: Mircea Markus
> Attachments: ReaderLockerTest.java
>
>
> This problem was discovered using the Lucene InfinispanDirectory with DistributedSegmentReadLocker. We found after a while of production usage that some Lucene files were randomly removed from the caches, but remained in the file listing entry, which resulted in an unusable index.
> We managed to replicate the problem in a test that acquires and releases read locks concurrently and checks for file deletion. We found this fails quickly when using REPL_SYNC mode, but runs for a while with DIST_SYNC.
> Some extra logging indicated that when the replace command used to increment the lock counter is called concurrently on multiple cluster members, the counter is incremented only once even though both calls report success. This eventually causes the file deletion, because the number of readers is now miscounted. We also observed the opposite effect: the counter decrements by only one when releasing.
> Our conclusion is that the replace command is not atomic in REPL_SYNC mode but works in the other modes we tried: DIST_SYNC, DIST_ASYNC and REPL_ASYNC.
[JBoss JIRA] (ISPN-4777) Replace command not atomic in REPL_SYNC cache mode
by Anuj Shah (JIRA)
[ https://issues.jboss.org/browse/ISPN-4777?page=com.atlassian.jira.plugin.... ]
Anuj Shah updated ISPN-4777:
----------------------------
Fix Version/s: (was: 5.2.6.Final)
> Replace command not atomic in REPL_SYNC cache mode
> --------------------------------------------------
>
> Key: ISPN-4777
> URL: https://issues.jboss.org/browse/ISPN-4777
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 5.2.6.Final
> Reporter: Anuj Shah
> Assignee: Mircea Markus
> Attachments: ReaderLockerTest.java
>
>
> This problem was discovered using the Lucene InfinispanDirectory with DistributedSegmentReadLocker. We found after a while of production usage that some Lucene files were randomly removed from the caches, but remained in the file listing entry, which resulted in an unusable index.
> We managed to replicate the problem in a test that acquires and releases read locks concurrently and checks for file deletion. We found this fails quickly when using REPL_SYNC mode, but runs for a while with DIST_SYNC.
> Some extra logging indicated that when the replace command used to increment the lock counter is called concurrently on multiple cluster members, the counter is incremented only once even though both calls report success. This eventually causes the file deletion, because the number of readers is now miscounted. We also observed the opposite effect: the counter decrements by only one when releasing.
> Our conclusion is that the replace command is not atomic in REPL_SYNC mode but works in the other modes we tried: DIST_SYNC, DIST_ASYNC and REPL_ASYNC.
[JBoss JIRA] (ISPN-4777) Replace command not atomic in REPL_SYNC cache mode
by Anuj Shah (JIRA)
[ https://issues.jboss.org/browse/ISPN-4777?page=com.atlassian.jira.plugin.... ]
Anuj Shah updated ISPN-4777:
----------------------------
Attachment: ReaderLockerTest.java
Attached the test we used to determine the problem.
> Replace command not atomic in REPL_SYNC cache mode
> --------------------------------------------------
>
> Key: ISPN-4777
> URL: https://issues.jboss.org/browse/ISPN-4777
> Project: Infinispan
> Issue Type: Bug
> Reporter: Anuj Shah
> Assignee: Mircea Markus
> Fix For: 5.2.6.Final
>
> Attachments: ReaderLockerTest.java
>
>
> This problem was discovered using the Lucene InfinispanDirectory with DistributedSegmentReadLocker. We found after a while of production usage that some Lucene files were randomly removed from the caches, but remained in the file listing entry, which resulted in an unusable index.
> We managed to replicate the problem in a test that acquires and releases read locks concurrently and checks for file deletion. We found this fails quickly when using REPL_SYNC mode, but runs for a while with DIST_SYNC.
> Some extra logging indicated that when the replace command used to increment the lock counter is called concurrently on multiple cluster members, the counter is incremented only once even though both calls report success. This eventually causes the file deletion, because the number of readers is now miscounted. We also observed the opposite effect: the counter decrements by only one when releasing.
> Our conclusion is that the replace command is not atomic in REPL_SYNC mode but works in the other modes we tried: DIST_SYNC, DIST_ASYNC and REPL_ASYNC.
[JBoss JIRA] (ISPN-4777) Replace command not atomic in REPL_SYNC cache mode
by Anuj Shah (JIRA)
Anuj Shah created ISPN-4777:
-------------------------------
Summary: Replace command not atomic in REPL_SYNC cache mode
Key: ISPN-4777
URL: https://issues.jboss.org/browse/ISPN-4777
Project: Infinispan
Issue Type: Bug
Reporter: Anuj Shah
Assignee: Mircea Markus
Fix For: 5.2.6.Final
This problem was discovered using the Lucene InfinispanDirectory with DistributedSegmentReadLocker. We found after a while of production usage that some Lucene files were randomly removed from the caches, but remained in the file listing entry, which resulted in an unusable index.
We managed to replicate the problem in a test that acquires and releases read locks concurrently and checks for file deletion. We found this fails quickly when using REPL_SYNC mode, but runs for a while with DIST_SYNC.
Some extra logging indicated that when the replace command used to increment the lock counter is called concurrently on multiple cluster members, the counter is incremented only once even though both calls report success. This eventually causes the file deletion, because the number of readers is now miscounted. We also observed the opposite effect: the counter decrements by only one when releasing.
Our conclusion is that the replace command is not atomic in REPL_SYNC mode but works in the other modes we tried: DIST_SYNC, DIST_ASYNC and REPL_ASYNC.
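For reference, the counter pattern that depends on an atomic replace can be sketched as a compare-and-swap retry loop. This is a minimal illustration, assuming the counter is updated through the three-argument {{ConcurrentMap.replace(key, oldValue, newValue)}}. A local {{ConcurrentHashMap}} stands in for the clustered cache, and the class name {{ReplaceCounterSketch}} and the key {{"readers"}} are hypothetical. It is not the attached ReaderLockerTest.java and not the DistributedSegmentReadLocker code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ReplaceCounterSketch {

    // Increment a counter with a compare-and-swap loop. If replace() is atomic,
    // exactly one of two racing callers succeeds per observed value and the other retries.
    static int increment(ConcurrentMap<String, Integer> cache, String key) {
        while (true) {
            Integer current = cache.get(key);
            if (current == null) {
                if (cache.putIfAbsent(key, 1) == null) {
                    return 1;
                }
            } else {
                int next = current + 1;
                if (cache.replace(key, current, next)) {
                    return next;
                }
            }
        }
    }

    // Run the increments from several threads and return the final counter value.
    static int runConcurrently(int threads, int perThread) {
        ConcurrentMap<String, Integer> cache = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    increment(cache, "readers");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cache.get("readers");
    }
}
```

With an atomic replace, the final value equals threads × perThread. The behaviour reported here corresponds to two racing {{replace}} calls both returning {{true}} for the same observed value, which would leave the counter lower than the number of successful increments.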
[JBoss JIRA] (ISPN-4776) The topology id for the merged cache topology is not always bigger than all the partition topology ids
by Dan Berindei (JIRA)
Dan Berindei created ISPN-4776:
----------------------------------
Summary: The topology id for the merged cache topology is not always bigger than all the partition topology ids
Key: ISPN-4776
URL: https://issues.jboss.org/browse/ISPN-4776
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 7.0.0.Beta2
Reporter: Dan Berindei
Assignee: Dan Berindei
Priority: Blocker
Fix For: 7.0.0.CR1
With the ISPN-4574 fix, I changed the merge algorithm to pick the partition with the most members (both in the _stable_ topology and in the _current_ topology) instead of the partition with the highest topology id.
However, the partition with the most members does not necessarily have the highest topology id, so it's possible that some nodes will ignore the merged topology because they already have a higher topology installed. This happened once in ClusterTopologyManagerTest.testClusterRecoveryAfterThreeWaySplit:
{noformat}
00:24:59,286 DEBUG (transport-thread-NodeL-p33097-t6:) [ClusterCacheStatus] Recovered 3 partition(s) for cache cache: [CacheTopology{id=8, rebalanceId=3, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeL-25322: 60+0]}, pendingCH=null, unionCH=null}, CacheTopology{id=6, rebalanceId=3, currentCH=DefaultConsistentHash{ns = 60, owners = (2)[, NodeL-25322: 30+10, NodeN-6727: 30+10]}, pendingCH=DefaultConsistentHash{ns = 60, owners = (2)[, NodeL-25322: 30+30, NodeN-6727: 30+30]}, unionCH=null}, CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeM-12972: 60+0]}, pendingCH=null, unionCH=null}]
00:24:59,287 DEBUG (transport-thread-NodeL-p33097-t6:) [ClusterCacheStatus] Updating topologies after merge for cache cache, current topology = CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeM-12972: 60+0]}, pendingCH=null, unionCH=null}, stable topology = CacheTopology{id=4, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (3)[, NodeL-25322: 20+20, NodeM-12972: 20+20, NodeN-6727: 20+20]}, pendingCH=null, unionCH=null}, availability mode = null
00:24:59,287 DEBUG (transport-thread-NodeL-p33097-t6:) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache cache, topology = CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeM-12972: 60+0]}, pendingCH=null, unionCH=null}, availability mode = null
00:24:59,288 TRACE (transport-thread-NodeL-p33097-t3:) [LocalTopologyManagerImpl] Ignoring consistent hash update for cache cache, current topology is 8: CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeM-12972: 60+0]}, pendingCH=null, unionCH=null}
{noformat}
Failure logs here: http://ci.infinispan.org/viewLog.html?buildId=12364&buildTypeId=Infinispa...
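The invariant named in the summary can be illustrated with a small sketch. It assumes a simplified model in which each partition is reduced to a topology id and a member count. The names {{PartitionSketch}} and {{MergeTopologySketch}} are hypothetical, the stable-topology criterion is left out, and deriving the merged id as "highest partition id plus one" is only one way to satisfy the invariant, not necessarily the fix that was applied.

```java
import java.util.List;

// Hypothetical, simplified model of a recovered partition; not an Infinispan class.
final class PartitionSketch {
    final int topologyId;
    final int memberCount;

    PartitionSketch(int topologyId, int memberCount) {
        this.topologyId = topologyId;
        this.memberCount = memberCount;
    }
}

final class MergeTopologySketch {

    // Pick the partition with the most members (first one wins on ties).
    static PartitionSketch pickLargest(List<PartitionSketch> partitions) {
        if (partitions.isEmpty()) {
            throw new IllegalArgumentException("no partitions to merge");
        }
        PartitionSketch best = partitions.get(0);
        for (PartitionSketch p : partitions) {
            if (p.memberCount > best.memberCount) {
                best = p;
            }
        }
        return best;
    }

    // Assumed approach: derive the merged id from the highest id seen in any
    // partition, so that no node already has a higher topology installed.
    static int mergedTopologyId(List<PartitionSketch> partitions) {
        if (partitions.isEmpty()) {
            throw new IllegalArgumentException("no partitions to merge");
        }
        int max = Integer.MIN_VALUE;
        for (PartitionSketch p : partitions) {
            max = Math.max(max, p.topologyId);
        }
        return max + 1;
    }
}
```

Taking the ids and currentCH owner counts from the log above (id 8 with one owner, id 6 with two, id 5 with one), this model picks the partition with id 6, which a node already at topology 8 would ignore. Deriving the merged id from the maximum yields 9, which is higher than every partition's id.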
[JBoss JIRA] (ISPN-4773) Clean up distribution packaging
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4773?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4773:
----------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/2907
> Clean up distribution packaging
> -------------------------------
>
> Key: ISPN-4773
> URL: https://issues.jboss.org/browse/ISPN-4773
> Project: Infinispan
> Issue Type: Enhancement
> Components: Build process
> Affects Versions: 7.0.0.Beta2
> Reporter: Tristan Tarrant
> Assignee: Tristan Tarrant
> Fix For: 7.0.0.CR1
>
>
> The distribution build process is currently very messy:
> - it is based on overly complex assembly files
> - the layout of the distribution packages is not ideal
> - some of the documentation is not obviously accessible (jmx, xsd)
> - since the javadoc plugin is configured to use aggregate builds, it cannot be run without building the entire tree
> The solution:
> - move the distribution logic to its own module
> - use ant instead of assembly descriptors to build the distribution packages
> - improve the layout, include the missing docs and remove the aging cruft
[JBoss JIRA] (ISPN-4773) Clean up distribution packaging
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4773?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4773:
----------------------------------
Status: Open (was: New)
> Clean up distribution packaging
> -------------------------------
>
> Key: ISPN-4773
> URL: https://issues.jboss.org/browse/ISPN-4773
> Project: Infinispan
> Issue Type: Enhancement
> Components: Build process
> Affects Versions: 7.0.0.Beta2
> Reporter: Tristan Tarrant
> Assignee: Tristan Tarrant
> Fix For: 7.0.0.CR1
>
>
> The distribution build process is currently very messy:
> - it is based on overly complex assembly files
> - the layout of the distribution packages is not ideal
> - some of the documentation is not obviously accessible (jmx, xsd)
> - since the javadoc plugin is configured to use aggregate builds, it cannot be run without building the entire tree
> The solution:
> - move the distribution logic to its own module
> - use ant instead of assembly descriptors to build the distribution packages
> - improve the layout, include the missing docs and remove the aging cruft