Hi Sanne.
Thanks for your comments; please see my replies inline...
Cheers,
Pedro
On 10/05/2013 09:15 PM, Sanne Grinovero wrote:
Hi Pedro,
looks like you're diving into some good fun :-)
BTW please keep the dev discussions on the mailing list, adding it.
inline :
On 4 October 2013 22:01, Pedro Ruivo <pedro(a)infinispan.org> wrote:
> Hi,
>
> Sanne I need your expertise in here. I'm afraid that the problem is in
> FileListOperations :(
> I think the FileListOperations implementation needs a transactional cache
> with strong consistency...
>
> I'm 99% sure that it is the origin of the java.lang.AssertionError: file XPTO
> does not exist. I found out that we have multiple threads adding and removing
> files from the list. In the scenario in [1], we see two threads loading the key
> from the cache loader; one thread adds a file and the other removes one. The
> thread that removes is the last one to commit, so the file list is updated
> to an old state. When it tries to update an index, I get the assertion error.
Nice, looks like you're onto something.
I've never seen specifically an AssertionError, looks like you have a
new test. Could you share it?
Yes, of course:
https://github.com/pruivo/infinispan/blob/a4483d08b92d301350823c7fd42725c...
so far, only the tests with eviction are failing...
Let's step back a second and consider the Cache usage from the point
of view of FileListOperations.
Note that even if you have two threads writing at the same time, as
long as they are on the same node they will be adding/removing
elements from the same instance of a ConcurrentHashMap.
Since it's the same instance, it doesn't matter which thread will do
the put operation as last: it will push the correct state.
(there are assumptions here, but we can forget about those for the
sake of this debugging: same node -> fine, as there is an external
lock, no other node is allowed to write at the same time)
100% agreed with you, but with a cache store we no longer ensure that 2
(or more) threads are pointing to the same instance of ConcurrentHashSet.
With eviction, the entries are removed from the in-memory container and
persisted in the cache store. In the scenario I've described, 2 threads are
trying to add/remove a file while the file list does not exist in memory.
So, each thread will read from the cache store and deserialize the byte
array. In the end, each thread will hold a reference to a different
instance of ConcurrentHashSet, but with the same elements. And when this
happens, we lose one of the operations.
Also, the problem is easily reproduced when you enable storeAsBinary
for values, because each cache.get will deserialize the byte array and
create different instances.
That's why I think we would need a transaction.
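To make the race concrete, here is a minimal sketch (illustration only, not Infinispan code; the class and file names are made up) of what happens when two threads each deserialize their own copy of the file list and the last writer wins:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Two threads miss the in-memory container, each "deserializes" its own
// copy of the file list from the store, and the last write-back wins,
// silently losing the other thread's update.
public class LostUpdateSketch {
    // Simulated persisted state; readFromStore() returns a fresh instance
    // each time, as deserializing from a cache store would.
    static Set<String> stored = ConcurrentHashMap.newKeySet();

    static Set<String> readFromStore() {
        Set<String> copy = ConcurrentHashMap.newKeySet();
        copy.addAll(stored);
        return copy;
    }

    public static void main(String[] args) {
        stored.add("segments_1");
        stored.add("_0.cfs");

        // Thread A: adds a new file to its private copy.
        Set<String> copyA = readFromStore();
        copyA.add("_1.cfs");

        // Thread B: removes a file from its own private copy.
        Set<String> copyB = readFromStore();
        copyB.remove("_0.cfs");

        // A commits first, B commits last: A's addition is lost.
        stored = copyA;
        stored = copyB;

        System.out.println(stored.contains("_1.cfs")); // false: the add was lost
    }
}
```

With a single shared ConcurrentHashSet instance (the in-memory case Sanne describes) both mutations land in the same set and nothing is lost; the divergence only appears once each thread works on its own deserialized copy.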
But you are focusing on CacheStore operations, I can see how that
might be different in terms of implementation but it is not acceptable
that the CacheStore is storing a different state than what we have in
memory.
Nothing to comment here; so far, I haven't seen anything suggesting that.
I don't expect to need a Transaction for that? Writes *always* need
to be applied in the right order so that the CacheStore content
matches the in-memory content.
So -1 for the problem being in FileListOperations, it's in the
CacheStore. Also, I've run plenty of stress tests on in-memory Caches
and never hit problems: if Infinispan changes the semantics by
enabling a CacheStore, that's a critical issue.
It depends on the semantics you have in mind. If you are assuming that a
cache.get must always return a reference to the same instance, then yes,
the cache store changes that semantic.
This is because we have a window where multiple threads can be
reading from the cache store to put into the data container. These are the
steps when accessing a key:
1) check the in-memory data container.
2) if the data container returned null, read from the cache store.
3) put the deserialized value in the data container.
Also, this needs to work correctly with async cachestores.
hmm, yeah... I totally forgot about the async cache store :(
> Also, I was able to "reproduce" the EOF. This was the first problem I found,
> and it is related to DefaultCacheManager.startCaches(String... cacheName),
> which starts the caches in separate threads. The SingleFileStore is
> failing to start, but the exception is swallowed by the thread. So, Hibernate
> Search is not notified and uses the cache anyway. To make it worse, the
> cache accepts the requests but does not persist the data. This creates
> the EOF on restart... I will open a JIRA about it to discuss it (maybe
> throw an exception in startCaches? and throw an exception if any operation is
> invoked on a cache that did not start successfully?)
+1 on the exception on startCaches, should not be swallowed!
JIRA already created :)
https://issues.jboss.org/browse/ISPN-3588
But why is SingleFileStore throwing an exception?
File.mkdirs() is returning false. This is performed at startup to
create the directories that store the data. Java does not give any hint
about why it is failing (as I said, I'm storing the data in /tmp; I have
permission to write and enough disk space).
I'm suggesting the following to make it more reliable:
https://issues.jboss.org/browse/ISPN-3590
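One way to make this more diagnosable (a sketch along those lines, not the actual ISPN-3590 patch; the class name and directory are made up): File.mkdirs() only returns a boolean, whereas java.nio.file.Files.createDirectories() throws an IOException that says why creation failed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Files.createDirectories() is a no-op if the directory already exists,
// and on failure throws an exception whose type/message explains the
// cause (e.g. FileAlreadyExistsException when a regular file is in the
// way), instead of a bare "false" from File.mkdirs().
public class MkdirsSketch {
    public static void main(String[] args) {
        Path location = Paths.get(System.getProperty("java.io.tmpdir"), "ispn-store");
        try {
            Files.createDirectories(location);
            System.out.println("store directory ready: " + Files.isDirectory(location));
        } catch (IOException e) {
            // Fail the store startup loudly, carrying the real cause along.
            throw new IllegalStateException("Cannot create store directory " + location, e);
        }
    }
}
```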
Thanks a lot, very interesting.
Sanne
>
> Any comments?
>
> If I was not clear let me know :)
>
> Thanks!
> Pedro
>
> [1] https://gist.github.com/pruivo/93edeb82a21e9827d2c9