Stale locks: stress test for the ReplaceCommand
by Sanne Grinovero
Hello,
I just pushed branch ReplaceOperationStressTest to my github
repository; this started initially as a way to verify the correctness
of the Cache#replace(Object, Object, Object) operation,
but it wouldn't work because of lock timeouts under high load.
I initially assumed I was hitting the same issue Manik was
working on; still, I refactored the test to keep the stress point on
the concurrent writes, but use a CyclicBarrier to give each thread
some fairness and enough time between each test trigger.
So now this test does, in very simplified pseudo code:

for ( each CacheMode ) {
   for ( many thousands of iterations ) {
      1# all threads wait for each other
      2# each thread picks a cache instance (from different
         CacheManagers connected to each other, unless it's LOCAL)
      3# each thread attempts a valid replace() operation on the chosen cache
      4# each thread waits until every other thread is done with the
         replace, then we run the state checks
   }
}
With this pattern we "shoot" all threads at the replace()
operation at the same time and then wait, so I know for sure that
contention on the key won't last longer than the time needed
to perform the single operation, and each thread then gets plenty of
fair time to acquire the lock.
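In Java terms the barrier scheduling boils down to roughly the
following (a minimal sketch only: the class name, the iteration count
and the doReplace() placeholder are mine, not the actual test code):

import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ReplaceBarrierSketch {

   public static void main(String[] args) throws Exception {
      final int threads = 9;
      final CyclicBarrier barrier = new CyclicBarrier(threads);
      ExecutorService pool = Executors.newFixedThreadPool(threads);

      for (int t = 0; t < threads; t++) {
         pool.submit(new Callable<Void>() {
            public Void call() throws Exception {
               for (int i = 0; i < 1000; i++) {
                  barrier.await();  // 1# all threads start the replace() together
                  doReplace();      // 2#/3# pick a cache, attempt a valid replace()
                  barrier.await();  // 4# everyone is done before the state checks run
               }
               return null;
            }
         });
      }
      pool.shutdown();
   }

   // Placeholder for cache.replace(key, expectedValue, newValue) on the chosen cache
   static void doReplace() { }
}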
Now the bad news: not only is this proving that the replace()
operation is equally broken in every Cache Mode, but it also often
fails because some of the threads throw:
org.infinispan.util.concurrent.TimeoutException: Unable to acquire
lock after [10 seconds] on key [thisIsTheKeyForConcurrentAccess] for
requestor [Thread[OOB-4,ISPN,ReplaceOperationStressTest-NodeO-1000,5,Thread
Pools]]! Lock held by [Thread[pool-35-thread-2,5,main]]
at org.infinispan.util.concurrent.locks.LockManagerImpl.lock(LockManagerImpl.java:217)
at org.infinispan.util.concurrent.locks.LockManagerImpl.acquireLockNoCheck(LockManagerImpl.java:200)
at org.infinispan.interceptors.locking.AbstractLockingInterceptor.lockKey(AbstractLockingInterceptor.java:115)
at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.visitReplaceCommand(NonTransactionalLockingInterceptor.java:118)
at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)
and I have no other explanation than that locks aren't always released.
I'm not running too many threads: I'm currently using 9 threads
picking among 5 clustered CacheManagers, but it fails with 2 as well; it
doesn't take many cycles to fail either: in some cluster
modes it often fails at the first loop iteration (which initially
misled me into thinking some modes worked fine, when in fact my test
just wasn't safe enough).
Funnily enough, while writing this it just failed a run even in
single-thread mode: in one iteration it was detected that the lock wasn't
cleaned up; this was REPL_SYNC+TX. I don't think the CacheMode was
relevant, rather that this failure is quite unlikely and the number of
iterations isn't high enough to certify the correctness of all the other
modes; still, it's annoying that apparently it's not even deterministic
with a single thread.
Anyone available to help me out? And please have a look at my test, I
might be making some mistake.
Cheers,
Sanne
Transaction table cleanup
by Vladimir Blagojevic
Hey guys,
I'm investigating why EmbeddedCacheManager#cacheRemove hiccups
DistributedTwoNodesMapReduceTest. As you might recall, at the end of a
MapReduceTask there is an EmbeddedCacheManager#cacheRemove call to remove
the intermediate caches across the cluster. Very often, almost every test
run, the cache remove is blocked by ongoing transactions that
have not completed - more specifically, the TransactionTable shows a
pending remote transaction. This prevents the cache stop call, which in
turn causes a timeout on the cache remove and thus fails the test.
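To confirm it I'm checking the transaction table directly, roughly
like this (just a sketch; the API names are from memory, so treat them
as an assumption):

import org.infinispan.Cache;
import org.infinispan.test.TestingUtil;
import org.infinispan.transaction.TransactionTable;

public class PendingTxCheck {

   // After the MapReduceTask finishes, the remote tx count on the node
   // holding the intermediate cache should be back at zero; if it isn't,
   // the subsequent cache stop/remove will time out.
   static void assertNoPendingRemoteTxs(Cache<?, ?> cache) {
      TransactionTable txTable = TestingUtil.extractComponent(cache, TransactionTable.class);
      assert txTable.getRemoteTxCount() == 0 : "pending remote transactions";
   }
}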
MapReduce uses txs in MapReduceManagerImpl#combine - it might well be
that there is something wrong there, or it could be that somehow
TxCompletionNotificationCommand is not cleaning up the remote txs. Either
way I would appreciate some help here - Mircea?
Regards,
Vladimir
Testsuite: hanging TestNG, CDI broken
by Sanne Grinovero
Hello all,
besides the usual failures, I also experienced occasional hangs
while running the testsuite; in some cases I found the following stack
trace, which suggests a TestNG bug:
"pool-3-thread-14" prio=10 tid=0x00007f0d84632000 nid=0x1ce5 runnable
[0x00007f0d58a36000]
java.lang.Thread.State: RUNNABLE
at java.util.HashMap.put(HashMap.java:374)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:320)
at org.testng.SuiteRunner.access$000(SuiteRunner.java:34)
at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:351)
at org.testng.internal.thread.ThreadUtil$CountDownLatchedRunnable.run(ThreadUtil.java:147)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Even across multiple dumps that thread is still in the same loop,
and with a single CPU stuck at 100% I'm guessing the HashMap was
being used in a way that isn't safe under concurrency; we're using the
first minor version of TestNG that ever supported parallel testsuite
invocation, so it might not be very solid.
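For what it's worth, a plain HashMap mutated from multiple threads can
corrupt its bucket chains, and on Java 6/7 a resize race can leave a
cycle that makes put() spin forever at 100% CPU; a minimal sketch of
that general hazard (nothing TestNG-specific, and being a race it only
hangs sometimes):

import java.util.HashMap;
import java.util.Map;

public class HashMapRaceSketch {

   // Deliberately not thread-safe: concurrent put() calls race on resize.
   static final Map<Integer, Integer> map = new HashMap<Integer, Integer>();

   public static void main(String[] args) throws InterruptedException {
      Runnable writer = new Runnable() {
         public void run() {
            for (int i = 0; i < 1000000; i++) {
               map.put(i, i);   // unsynchronized concurrent mutation
            }
         }
      };
      Thread t1 = new Thread(writer);
      Thread t2 = new Thread(writer);
      t1.start();
      t2.start();
      t1.join();
      t2.join();
      // If it does hang, a thread dump shows a writer RUNNABLE inside
      // HashMap.put(), just like the TestNG stack above.
      System.out.println("size = " + map.size());
   }
}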
Not sure why, but upgrading TestNG from 5.14.10 to 6.7 seems to
resolve the problem.
Now I wish I could send a pull request, but even when skipping the
core testsuite (which always fails for me, even in non-parallel mode)
many other modules are broken both with and without my patches, so I'm
dropping my experiments: I won't send any pull requests if the tests
can't back my changes up.
As an example, the CDI integration reports:
Tests run: 247, Failures: 102, Errors: 0, Skipped: 143
...which means 2 tests are fine.
Cheers,
Sanne
cce on invocation context
by Ales Justin
I'm constantly seeing this CCE while running CapeDwarf cluster tests:
(running 5.2.Beta2 with my iterator offset patch)
17:43:10,175 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (OOB-18,null) ISPN000136: Execution error: java.lang.ClassCastException: org.infinispan.context.impl.NonTxInvocationContext cannot be cast to org.infinispan.context.impl.TxInvocationContext
at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.visitPutKeyValueCommand(PessimisticLockingInterceptor.java:114)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:63)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.TxInterceptor.enlistWriteAndInvokeNext(TxInterceptor.java:212)
at org.infinispan.interceptors.TxInterceptor.visitPutKeyValueCommand(TxInterceptor.java:150)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.statetransfer.StateTransferInterceptor.handleTopologyAffectedCommand(StateTransferInterceptor.java:207)
at org.infinispan.statetransfer.StateTransferInterceptor.handleWriteCommand(StateTransferInterceptor.java:191)
at org.infinispan.statetransfer.StateTransferInterceptor.visitPutKeyValueCommand(StateTransferInterceptor.java:136)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:127)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:129)
at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:93)
at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:63)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:347)
at org.infinispan.statetransfer.StateConsumerImpl.doApplyState(StateConsumerImpl.java:306)
at org.infinispan.statetransfer.StateConsumerImpl.applyState(StateConsumerImpl.java:264)
at org.infinispan.statetransfer.StateResponseCommand.perform(StateResponseCommand.java:86)
at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:95)
at org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithWaitForBlocks(InboundInvocationHandlerImpl.java:110)
at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:82)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:244)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:217)
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:483)
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:604)
at org.jgroups.blocks.mux.MuxUpHandler.up(MuxUpHandler.java:130)
at org.jgroups.JChannel.up(JChannel.java:670)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
at org.jgroups.protocols.RSVP.up(RSVP.java:172)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:736)
at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:414)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:645)
at org.jgroups.protocols.BARRIER.up(BARRIER.java:102)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)
at org.jgroups.protocols.FD.up(FD.java:273)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)
at org.jgroups.protocols.Discovery.up(Discovery.java:359)
at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2646)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1293)
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1856)
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1829)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [classes.jar:1.6.0_37]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [classes.jar:1.6.0_37]
at java.lang.Thread.run(Thread.java:680) [classes.jar:1.6.0_37]
broken lazy query iteration
by Ales Justin
After searching for the needle in the haystack, I finally found the problem.
(not to mention the complete lack of tests for this *basic* feature ...)
The problem is with queries that have an offset when you iterate over them -- the offset is never taken into account.
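To make the symptom concrete, this is roughly the usage that goes
wrong (a sketch; I'm quoting the infinispan-query API from memory, so
take the exact calls as an assumption):

import java.util.Iterator;

import org.apache.lucene.search.MatchAllDocsQuery;
import org.infinispan.Cache;
import org.infinispan.query.CacheQuery;
import org.infinispan.query.FetchOptions;
import org.infinispan.query.Search;
import org.infinispan.query.SearchManager;

public class OffsetIterationSketch {

   // Book stands in for any indexed entity; annotations omitted.
   static class Book { }

   static void iterateWithOffset(Cache<?, ?> cache) {
      SearchManager sm = Search.getSearchManager(cache);
      CacheQuery query = sm.getQuery(new MatchAllDocsQuery(), Book.class);
      query.firstResult(2);   // ask to skip the first two hits
      query.maxResults(5);

      // list() honours firstResult(), but lazy iteration starts at hit 0,
      // because neither HS's extract() nor our LazyIterator adds getFirstIndex().
      Iterator<Object> it = query.iterator(new FetchOptions().fetchMode(FetchOptions.FetchMode.LAZY));
      while (it.hasNext()) {
         Object hit = it.next();   // before the fix: hits 0..4 instead of 2..6
      }
   }
}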
There are two possible fixes -- as I see them.
1) In HS:
DocumentExtractorImpl::extract takes "firstIndex" into account:

   public EntityInfo extract(int scoreDocIndex) throws IOException {
      int docId = queryHits.docId( firstIndex + scoreDocIndex );
      Document document = extractDocument( firstIndex + scoreDocIndex );
2) LazyIterator in Infinispan-Query applies the offset:
   protected EntityInfo loadEntityInfo(int index) {
      try {
         return extractor.extract(extractor.getFirstIndex() + index);
---
Since those methods are exposed in DocumentExtractor,
I would guess they were meant to be used by external code,
rather than having this logic inside the extractor itself.
So, I'll go ahead and provide a patch for (2).
-Ales
infinispan 5.2.0.Beta3
by Mircea Markus
Hi,
We have a lot of pull requests pending. Until Beta3 is released, can we please focus on these and slow down new development for now?
I've grouped them as follows. Please feel free to shuffle them around or ask for more feedback if you think it appropriate, but please take ownership of them and make sure they get integrated.
Dan:
ISPN-2373 State transfer does not end because some segments are erroneously reported as unreceived
Lookup optimisation in TransactionTable.getLocalTransaction and cleanup in BaseRpcInterceptor hierarchy
ISPN-2381 Locks are removed even if not successfully unlocked
Adrian:
ISPN-2318 Reimplement a Topology-Aware Consistent Hash
Galder:
ISPN-2429 Cache restart still doesn't work properly for query-enabled caches
JBQA-6819 - Added the ant script which merges and generates jacoco code coverage report file.
ISPN-2412 Allow specifying container and cache when connecting via CLI
Tristan:
[5.1.x] ISPN-2414 Fixes to reduce memory consumption of local caches
ISPN-2414 Fixes to reduce memory consumption of local caches
Mircea:
ISPN-2440 JGroupsTransport.invokeRemotely throws SuspectExceptions even ...
Fix DummyInMemoryCacheStoreConfigurationBuilder#read()
ISPN-2371 The global component registry fails to start components
ISPN-2443 - tests are added for reproducing/verifying the issue.
ISPN-2386 - Test reproducing/verifying the issue with ClassCastingException in case of CacheLoader usage (with storeAsBinary conf).
ISPN-1042 - Enable distributed and Map/Reduce task interruption/cancellation
Vladimir:
ISPN-2409 - Reproduction/verification case for NotSerializableException occurence.
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
Unit test PR for master
by Vladimir Blagojevic
Hi,
I noticed that we have recently had many PRs integrating unit tests
into master. Would it not make more sense to integrate a unit test with
the actual fix, rather than having it merged directly to master as its
own PR? I'd say leave the unit test attached/referenced in the JIRA, and
the developer/contributor fixing the issue will integrate both the fix
and the unit test in a single PR.
WDYT?
Regards,
Vladimir
Cluster scaling
by Matej Lazar
Hi,
to simplify the scenario, let's say I have a distributed cache with no
copies on a cluster of two nodes.
When one node is stopped (a normal stop, not a failure), is the stop
operation blocked, waiting for the cache data to be transferred to the
other node?
My use case is Infinispan inside JBoss AS (CapeDwarf).
How does stopping a node behave in standalone Infinispan, and how in AS?
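For reference, this is roughly the setup I mean (a sketch, with
numOwners=1 standing in for "no copies"):

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfiguration;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

public class ScalingScenario {

   public static void main(String[] args) {
      // Clustered cache manager with default transport settings
      GlobalConfiguration global = new GlobalConfigurationBuilder().clusteredDefault().build();

      // Distributed cache with a single owner per entry: stopping a node
      // means its entries exist nowhere else unless they are pushed out first
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.clustering().cacheMode(CacheMode.DIST_SYNC).hash().numOwners(1);
      Configuration distOneOwner = builder.build();

      EmbeddedCacheManager manager = new DefaultCacheManager(global, distOneOwner);
      manager.getCache("data").put("key", "value");

      // The question: does this block until the entries owned by this node
      // have been transferred to the remaining node, or are they lost?
      manager.stop();
   }
}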
Thanks,
Matej.