[ https://jira.jboss.org/jira/browse/ISPN-277?page=com.atlassian.jira.plugi... ]
Manik Surtani commented on ISPN-277:
------------------------------------
Corrected this. Essentially,
* FIFODataContainer and LRUDataContainer are implementations of a concurrent linked
hashmap, similar to the JDK's LinkedHashMap, utilising the Sundell/Tsigas lock-free
linked list algorithm [1].
* There is a race in the CorrectPrev() function, which leads to an infinite loop under
very specific conditions. It is easy to recreate in a stress test I have.
* Fixing this is very hard. I now have two alternative implementations (FIFODataContainer
and FIFOAMRDataContainer; see the Javadocs on both to understand the differences).
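For readers unfamiliar with the JDK class these containers are compared to, here is a minimal sketch of the FIFO and LRU orderings that java.util.LinkedHashMap provides. It is not the Infinispan code: LinkedHashMap is not thread-safe, and the class and method names below (LinkedHashMapOrdering, bounded, keysAfterWorkload) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LinkedHashMapOrdering {

    /**
     * Builds a size-bounded map. accessOrder=false keeps insertion (FIFO) order,
     * accessOrder=true moves an entry to the end on every access (LRU order).
     */
    static <K, V> Map<K, V> bounded(final int maxEntries, boolean accessOrder) {
        return new LinkedHashMap<K, V>(16, 0.75f, accessOrder) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after each insertion; returning true evicts the head of the list.
                return size() > maxEntries;
            }
        };
    }

    /** Runs the same small workload and returns the surviving keys in iteration order. */
    static List<String> keysAfterWorkload(boolean accessOrder) {
        Map<String, Integer> map = bounded(2, accessOrder);
        map.put("a", 1);
        map.put("b", 2);
        map.get("a");    // only matters in access order: "a" becomes most recently used
        map.put("c", 3); // exceeds the bound, so the eldest entry is evicted
        return new ArrayList<String>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println("FIFO: " + keysAfterWorkload(false)); // FIFO: [b, c]
        System.out.println("LRU:  " + keysAfterWorkload(true));  // LRU:  [a, c]
    }
}
```

Note that in access order even a get modifies the linked list, which is the kind of structural update the concurrent containers have to perform without locks.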
It appears that the problem lies in the concurrent update of a link on get as well as on
put, something that the algorithm may not support. I am in conversation with the algorithm
designers to work through the details.
In the meantime, I have implemented FIFO and LRU data containers using standard JDK
collections, but they will perform much worse: iteration is O(N log N), although put, get
and remove keep the same O(1) performance. There is a memory overhead too, since last-used
or creation timestamps are collected for every entry (an extra 8 bytes per entry for
immortal entries, no extra overhead for others).
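The trade-off described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed implementation: the class name TimestampOrderedContainer and its methods are hypothetical, and a logical counter stands in for the real timestamps so that the ordering is deterministic. Each entry carries one extra long (8 bytes); put, get and remove are plain hash-map operations; ordering is only computed when iterating, by sorting a snapshot.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: ordering is derived from per-entry stamps instead of a linked list.
public class TimestampOrderedContainer<K, V> {

    private static final class Stamped<V> {
        final V value;
        volatile long stamp; // creation stamp (FIFO) or last-used stamp (LRU)

        Stamped(V value, long stamp) {
            this.value = value;
            this.stamp = stamp;
        }
    }

    private final ConcurrentMap<K, Stamped<V>> map = new ConcurrentHashMap<K, Stamped<V>>();
    private final AtomicLong clock = new AtomicLong(); // assumption: logical clock, not wall time
    private final boolean lru;

    public TimestampOrderedContainer(boolean lru) {
        this.lru = lru;
    }

    public void put(K key, V value) { // O(1)
        map.put(key, new Stamped<V>(value, clock.incrementAndGet()));
    }

    public V get(K key) { // O(1); in LRU mode a read only refreshes the stamp
        Stamped<V> entry = map.get(key);
        if (entry == null) {
            return null;
        }
        if (lru) {
            entry.stamp = clock.incrementAndGet();
        }
        return entry.value;
    }

    public V remove(K key) { // O(1)
        Stamped<V> entry = map.remove(key);
        return entry == null ? null : entry.value;
    }

    /** Oldest first. Copies the stamps, then sorts the copy: O(N log N). */
    public List<K> keysInEvictionOrder() {
        List<Map.Entry<K, Long>> snapshot = new ArrayList<Map.Entry<K, Long>>();
        for (Map.Entry<K, Stamped<V>> e : map.entrySet()) {
            snapshot.add(new AbstractMap.SimpleImmutableEntry<K, Long>(e.getKey(), e.getValue().stamp));
        }
        snapshot.sort(Map.Entry.comparingByValue());
        List<K> keys = new ArrayList<K>(snapshot.size());
        for (Map.Entry<K, Long> e : snapshot) {
            keys.add(e.getKey());
        }
        return keys;
    }
}
```

With keys a, b and c inserted in that order and a single get("a") afterwards, the FIFO variant iterates a, b, c while the LRU variant iterates b, c, a. The stamps are copied before sorting so that concurrent reads cannot change the sort keys mid-sort.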
My plan is to go with these implementations for now. Once we have sorted out the
concurrent linked list issues, we can look at reverting to the better-performing
implementations.
[1] http://www.md.chalmers.se/~tsigas/papers/Lock-Free-Deques-Doubly-Lists-JP...
LRU data container endlessly looping or exhibiting heavy contention
-------------------------------------------------------------------
Key: ISPN-277
URL: https://jira.jboss.org/jira/browse/ISPN-277
Project: Infinispan
Issue Type: Bug
Components: Eviction
Affects Versions: 4.0.0.CR2
Reporter: Galder Zamarreno
Assignee: Manik Surtani
Priority: Critical
Fix For: 4.0.0.CR3
Attachments: 3dumps-fail.txt, td2.txt
Something around the LRU container is not working properly. The attached log from a
concurrency test in the 2nd level cache shows that, in 3 thread dumps taken over 30 seconds
apart, UserRunnerThread-5 is stuck in:
"UserRunnerThread-5" prio=10 tid=0x6f65bc00 nid=0xdea runnable [0x05efb000]
java.lang.Thread.State: RUNNABLE
at org.infinispan.container.FIFODataContainer$LinkedEntry.casNext(FIFODataContainer.java:180)
at org.infinispan.container.FIFODataContainer.correctPrev(FIFODataContainer.java:329)
at org.infinispan.container.FIFODataContainer.linkAtEnd(FIFODataContainer.java:252)
at org.infinispan.container.LRUDataContainer.put(LRUDataContainer.java:70)
at org.infinispan.container.entries.ReadCommittedEntry.commit(ReadCommittedEntry.java:161)
at org.infinispan.interceptors.LockingInterceptor.commitEntry(LockingInterceptor.java:298)
at org.infinispan.interceptors.LockingInterceptor.cleanupLocks(LockingInterceptor.java:281)
at org.infinispan.interceptors.LockingInterceptor.doAfterCall(LockingInterceptor.java:243)
at org.infinispan.interceptors.LockingInterceptor.visitPutKeyValueCommand(LockingInterceptor.java:200)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:76)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:57)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:76)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.MarshalledValueInterceptor.visitPutKeyValueCommand(MarshalledValueInterceptor.java:93)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:76)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.TxInterceptor.enlistWriteAndInvokeNext(TxInterceptor.java:185)
at org.infinispan.interceptors.TxInterceptor.visitPutKeyValueCommand(TxInterceptor.java:132)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:76)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:48)
at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:34)
at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:57)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:76)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:269)
at org.infinispan.CacheDelegate.putIfAbsent(CacheDelegate.java:422)
at org.infinispan.CacheDelegate.putIfAbsent(CacheDelegate.java:153)
at org.infinispan.CacheDelegate.putForExternalRead(CacheDelegate.java:243)
at org.hibernate.cache.infinispan.util.CacheAdapterImpl.putForExternalRead(CacheAdapterImpl.java:115)
at org.hibernate.cache.infinispan.access.TransactionalAccessDelegate.putFromLoad(TransactionalAccessDelegate.java:91)
at org.hibernate.cache.infinispan.collection.TransactionalAccess.putFromLoad(TransactionalAccess.java:44)
at org.hibernate.engine.loading.CollectionLoadContext.addCollectionToCache(CollectionLoadContext.java:333)
at org.hibernate.engine.loading.CollectionLoadContext.endLoadingCollection(CollectionLoadContext.java:279)
at org.hibernate.engine.loading.CollectionLoadContext.endLoadingCollections(CollectionLoadContext.java:245)
at org.hibernate.engine.loading.CollectionLoadContext.endLoadingCollections(CollectionLoadContext.java:218)
at org.hibernate.loader.Loader.endCollectionLoad(Loader.java:901)
at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:886)
at org.hibernate.loader.Loader.doQuery(Loader.java:750)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:257)
at org.hibernate.loader.Loader.loadCollection(Loader.java:2019)
at org.hibernate.loader.collection.CollectionLoader.initialize(CollectionLoader.java:62)
at org.hibernate.persister.collection.AbstractCollectionPersister.initialize(AbstractCollectionPersister.java:628)
at org.hibernate.event.def.DefaultInitializeCollectionEventListener.onInitializeCollection(DefaultInitializeCollectionEventListener.java:83)
at org.hibernate.impl.SessionImpl.initializeCollection(SessionImpl.java:1817)
at org.hibernate.collection.AbstractPersistentCollection.initialize(AbstractPersistentCollection.java:366)
at org.hibernate.collection.AbstractPersistentCollection.read(AbstractPersistentCollection.java:108)
at org.hibernate.collection.AbstractPersistentCollection.readSize(AbstractPersistentCollection.java:131)
at org.hibernate.collection.PersistentSet.isEmpty(PersistentSet.java:169)
at org.hibernate.test.cache.infinispan.functional.ConcurrentWriteTest.getFirstContact(ConcurrentWriteTest.java:405)
at org.hibernate.test.cache.infinispan.functional.ConcurrentWriteTest.access$0(ConcurrentWriteTest.java:397)
at org.hibernate.test.cache.infinispan.functional.ConcurrentWriteTest$UserRunner.contactExists(ConcurrentWriteTest.java:519)
at org.hibernate.test.cache.infinispan.functional.ConcurrentWriteTest$UserRunner.call(ConcurrentWriteTest.java:540)
at org.hibernate.test.cache.infinispan.functional.ConcurrentWriteTest$UserRunner.call(ConcurrentWriteTest.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
With LRU, this concurrency test does not finish even after more than a minute. However,
once the eviction strategy is switched from LRU to NONE or FIFO, the test runs in 5 seconds.
So, something looks fishy with LRU.