[infinispan-dev] Need help

Pedro Ruivo pedro at infinispan.org
Sat Oct 5 19:01:12 EDT 2013


Hi Sanne.

Thanks for your comments. Please see inline...

Cheers,
Pedro

On 10/05/2013 09:15 PM, Sanne Grinovero wrote:
> Hi Pedro,
> looks like you're diving in some good fun :-)
> BTW please keep the dev discussions on the mailing list, adding it.
>
> inline :
>
> On 4 October 2013 22:01, Pedro Ruivo <pedro at infinispan.org> wrote:
>> Hi,
>>
>> Sanne I need your expertise in here. I'm afraid that the problem is in
>> FileListOperations :(
>> I think the FileListOperations implementation needs a transactional cache
>> with strong consistency...
>>
>> I'm 99% sure that it is originating the java.lang.AssertionError: file XPTO
>> does not exist. I found out that we have multiple threads adding and removing
>> files from the list. In the scenario in [1] we see 2 threads loading the key
>> from the cache loader; one thread adds a file and the other removes one. The
>> thread that removes is the last one to commit and the file list is updated
>> to an old state. When it tries to update an index, I get the assertion error.
>
> Nice, looks like you're on something.
> I've never seen specifically an AssertionError, looks like you have a
> new test. Could you share it?

Yes, of course:
https://github.com/pruivo/infinispan/blob/a4483d08b92d301350823c7fd42725c339a65c7b/query/src/test/java/org/infinispan/query/cacheloaders/CacheStoreTest.java

So far, only the tests with eviction are failing...

>
> Let's step back a second and consider the Cache usage from the point
> of view of FileListOperations.
> Note that even if you have two threads writing at the same time, as
> long as they are on the same node they will be adding/removing
> elements from the same instance of a ConcurrentHashMap.
> Since it's the same instance, it doesn't matter which thread will do
> the put operation as last: it will push the correct state.
> (there is an assumptions here, but we can forget about those for the
> sake of this debugging: same node -> fine as there is an external
> lock, no other node is allowed to write at the same time)
>

100% agreed, but with a cache store we can no longer guarantee that 2 
(or more) threads are pointing to the same instance of ConcurrentHashSet.

With eviction, the entries are removed from the in-memory container and 
persisted in the cache store. In the scenario I've described, 2 threads 
are trying to add/remove a file and the file list does not exist in 
memory. So, each thread will read from the cache store and deserialize 
the byte array. In the end, each thread will have a pointer to a 
different instance of ConcurrentHashSet, but with the same elements. And 
when this happens, we lose one of the operations.
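
To make the lost update concrete, here is a minimal sketch in plain Java. 
It is not Infinispan code: the "cache store" is just a byte array, the 
class name and file names are made up, and the two threads are replayed 
sequentially so the result is deterministic.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in, NOT Infinispan code: the "store" is a byte array.
class LostUpdateDemo {
    static byte[] stored; // serialized file list, as a cache store would hold it

    // Persist the file list (what a commit would do with an evicted entry).
    static void write(Set<String> files) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new HashSet<>(files));
            }
            stored = bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Every read deserializes the bytes into a brand new set instance.
    @SuppressWarnings("unchecked")
    static Set<String> read() {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(stored))) {
            Set<String> fresh = ConcurrentHashMap.newKeySet();
            fresh.addAll((Set<String>) in.readObject());
            return fresh;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static Set<String> run() {
        write(new HashSet<>(Arrays.asList("old.cfs", "segments_1")));

        // Both threads miss in memory and load their own copy.
        Set<String> seenByAdder = read();
        Set<String> seenByRemover = read();

        seenByAdder.add("new.cfs");
        write(seenByAdder);              // adder commits first

        seenByRemover.remove("old.cfs");
        write(seenByRemover);            // remover commits last and wins

        return new HashSet<>(read());    // "new.cfs" is gone: only segments_1 is left
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The add of new.cfs is lost because the remover writes back a copy that 
never saw it, which matches the "file does not exist" assertion.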

Also, the problem is easily reproduced when you enable storeAsBinary 
for values, because each cache.get will deserialize the byte array and 
create a different instance.

That's why I think we would need a transaction.

> But you are focusing on CacheStore operations, I can see how that
> might be different in terms of implementation but it is not acceptable
> that the CacheStore is storing a different state than what we have in
> memory.

Nothing to comment here. So far, I haven't seen anything suggesting that.

> I don't expect to need a Transaction for that ? Writes need *always*
> to be applied in the right order so that the CacheStore content
> matches the in-memory content.
>
> So -1 for the problem being in FileListOperations, it's in the
> CacheStore. Also, I've run plenty of stress tests on in-memory Caches
> and never hit problems: if Infinispan changes the semantics by
> enabling a CacheStore, that's a critical issue.

It depends on the semantics you have in mind. If you are assuming that a 
cache.get must always return a reference to the same instance then yes, 
the cache store changes that semantic.

This is because we have a window where multiple threads can be reading 
from the cache store to put the value in the data container. These are 
the steps when accessing a key:

1) check the in-memory data container.
2) if the data container has returned null, then read from the cache store.
3) put the deserialized value in the data container.
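
The three steps above can be sketched as follows. This is only an 
illustration in plain Java, not the real Infinispan code path: the map, 
the readFromStore helper and the key name are assumptions, and the two 
threads are interleaved by hand so that both pass step 1 before either 
reaches step 3.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of the check / load / put window.
class LoadPathDemo {
    static final Map<String, Set<String>> dataContainer = new ConcurrentHashMap<>();

    // Stand-in for the cache store: each call builds a new instance,
    // just like deserializing a byte array would.
    static Set<String> readFromStore(String key) {
        Set<String> fresh = ConcurrentHashMap.newKeySet();
        fresh.add("segments_1");
        return fresh;
    }

    // Replays two threads, A and B, accessing the same evicted key.
    static boolean sameInstanceAfterRace() {
        dataContainer.clear();
        String key = "fileList";

        // Step 1: both threads check the data container and both miss.
        boolean missA = dataContainer.get(key) == null;
        boolean missB = dataContainer.get(key) == null;

        // Step 2: both read from the store, each getting its own instance.
        Set<String> a = missA ? readFromStore(key) : dataContainer.get(key);
        Set<String> b = missB ? readFromStore(key) : dataContainer.get(key);

        // Step 3: both put their instance; the second put replaces the first.
        dataContainer.put(key, a);
        dataContainer.put(key, b);

        // Equal content, but thread A now mutates a set nobody else can see.
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println("same instance? " + sameInstanceAfterRace());
    }
}
```

Nothing between step 1 and step 3 stops a second reader, so the "same 
instance" assumption only holds while the entry stays in memory.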

>
> Also, this needs to work correctly with async cachestores.
>

Hmm, yep... I totally forgot the async cache store :(

>
>> Also, I was able to "reproduce" the EOF. This was the first problem I found
>> and it is related to DefaultCacheManager.startCaches(String... cacheName),
>> which starts the caches in separate threads. The SingleFileStore is
>> failing to start but the exception is "swallowed" by the thread. So, Hibernate
>> Search is not notified and it uses the cache anyway. To make it worse, the
>> cache accepts the requests but it is not persisting the data. This creates
>> the EOF on restart... I will open a JIRA about it to discuss it (maybe
>> throw an exception in startCaches? and throw an exception if any operation is
>> invoked on a cache that did not start successfully?)
>
> +1 on the exception on startCaches, should not be swallowed!

JIRA already created :)
https://issues.jboss.org/browse/ISPN-3588
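
Just to illustrate the idea of not swallowing the failure, here is a 
rough sketch in plain Java. It is not the DefaultCacheManager code and 
not necessarily what the JIRA will end up doing: startAll and the 
Runnable "starters" are made-up names. Each starter runs in its own 
thread, the first failure is remembered, and it is rethrown to the 
caller after all threads have been joined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: propagate a start failure from worker threads.
class StartInThreadsDemo {

    static void startAll(List<Runnable> starters) {
        AtomicReference<Throwable> firstFailure = new AtomicReference<>();
        List<Thread> threads = new ArrayList<>();

        for (Runnable starter : starters) {
            Thread t = new Thread(() -> {
                try {
                    starter.run();
                } catch (Throwable e) {
                    // Remember the failure instead of letting the thread die silently.
                    firstFailure.compareAndSet(null, e);
                }
            });
            threads.add(t);
            t.start();
        }

        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while starting", e);
            }
        }

        // Surface the problem to whoever asked for the caches.
        Throwable failure = firstFailure.get();
        if (failure != null) {
            throw new IllegalStateException("a cache failed to start", failure);
        }
    }

    // One starter succeeds, one fails; returns true if the caller sees the failure.
    static boolean failurePropagates() {
        List<Runnable> starters = new ArrayList<>();
        starters.add(() -> { });
        starters.add(() -> { throw new RuntimeException("store failed to start"); });
        try {
            startAll(starters);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("failure propagated? " + failurePropagates());
    }
}
```

With something like this, the caller (Hibernate Search in my case) 
would get an exception instead of a cache that silently does not persist.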

>
> But why is SingleFileStore throwing an exception?

File.mkdirs() is returning false. This is performed at the beginning to 
create the directories to store the data. Java does not give any hint 
about why it is failing (like I said, I'm storing the data in /tmp, I 
have permission to write and enough disk space).

I'm suggesting the following to make it more reliable:
https://issues.jboss.org/browse/ISPN-3590
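
Independently of what ends up in that issue, one way to get at least a 
reason out of the JDK is java.nio.file.Files.createDirectories, which 
throws an IOException instead of returning false. A small sketch (the 
class and method names are mine, and the failing case is forced by 
trying to create a directory underneath a regular file):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical comparison of File.mkdirs() and Files.createDirectories().
class CreateDirsDemo {

    // Returns "ok", or the exception type and message explaining the failure.
    static String createOrExplain(Path dir) {
        try {
            Files.createDirectories(dir);
            return "ok";
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    // Forces a failure and returns { result of mkdirs(), result of createOrExplain() }.
    static String[] failingCase() {
        try {
            Path regularFile = Files.createTempFile("not-a-dir", ".tmp");
            Path impossible = regularFile.resolve("data");

            // Legacy API: only a boolean, no reason given.
            boolean legacy = impossible.toFile().mkdirs();
            // NIO API: the exception carries the cause reported by the OS.
            String nio = createOrExplain(impossible);

            Files.delete(regularFile);
            return new String[] { String.valueOf(legacy), nio };
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = failingCase();
        System.out.println("mkdirs() returned: " + result[0]);
        System.out.println("createDirectories: " + result[1]);
    }
}
```

The exact exception and message depend on the platform, so I'm not 
quoting any output here, but it is more to go on than a bare false.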

>
> Thanks a lot, very interesting.
>
> Sanne
>
>>
>> Any comments?
>>
>> If I was not clear let me know :)
>>
>> Thanks!
>> Pedro
>>
>> [1]https://gist.github.com/pruivo/93edeb82a21e9827d2c9

