Hi Pedro,
looks like you're diving into some good fun :-)
BTW please keep the dev discussions on the mailing list, adding it.
Comments inline:
On 4 October 2013 22:01, Pedro Ruivo <pedro(a)infinispan.org> wrote:
Hi,
Sanne, I need your expertise here. I'm afraid that the problem is in
FileListOperations :(
I think the FileListOperations implementation needs a transactional cache
with strong consistency...
I'm 99% sure that it is causing the java.lang.AssertionError: file XPTO
does not exist. I found out that we have multiple threads adding and removing
files from the list. In the scenario in [1], we see 2 threads loading the key
from the cache loader; one thread adds a file and the other removes one. The
thread that removes is the last one to commit, so the file list is updated
to an old state. When it tries to update an index, I get the assertion error.
Nice, looks like you're onto something.
I've never specifically seen an AssertionError; it looks like you have a
new test. Could you share it?
Let's step back a second and consider the Cache usage from the point
of view of FileListOperations.
Note that even if you have two threads writing at the same time, as
long as they are on the same node they will be adding/removing
elements from the same instance of a ConcurrentHashMap.
Since it's the same instance, it doesn't matter which thread does
the put operation last: it will push the correct state.
(there are some assumptions here, but we can forget about those for the
sake of this debugging: same node -> fine as there is an external
lock, no other node is allowed to write at the same time)
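To illustrate the difference, here is a minimal sketch in plain Java. It
uses a bare ConcurrentHashMap rather than Infinispan, the class and method
names are made up, and the two writers are replayed sequentially so the
interleaving is deterministic. It is only meant to show why sharing one
instance is safe while working on separately loaded copies is not:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the file list handling; not the real FileListOperations.
class FileListDemo {
    static final String KEY = "fileList";

    // Builds a cache whose file list initially contains only "a".
    static ConcurrentMap<String, Set<String>> newCache() {
        ConcurrentMap<String, Set<String>> cache = new ConcurrentHashMap<>();
        Set<String> list = ConcurrentHashMap.newKeySet();
        list.add("a");
        cache.put(KEY, list);
        return cache;
    }

    // Same node, in memory: both writers mutate the one shared Set instance.
    static Set<String> sharedInstance() {
        ConcurrentMap<String, Set<String>> cache = newCache();
        Set<String> seenByAdder = cache.get(KEY);   // same instance
        Set<String> seenByRemover = cache.get(KEY); // same instance
        seenByAdder.add("b");
        seenByRemover.remove("a");
        cache.put(KEY, seenByAdder);
        cache.put(KEY, seenByRemover); // the last put still carries both changes
        return cache.get(KEY);
    }

    // Each writer works on its own copy, as if loaded separately from a cache loader.
    static Set<String> separateCopies() {
        ConcurrentMap<String, Set<String>> cache = newCache();
        Set<String> adderCopy = ConcurrentHashMap.newKeySet();
        adderCopy.addAll(cache.get(KEY));
        Set<String> removerCopy = ConcurrentHashMap.newKeySet();
        removerCopy.addAll(cache.get(KEY));
        adderCopy.add("b");
        removerCopy.remove("a");
        cache.put(KEY, adderCopy);
        cache.put(KEY, removerCopy); // the last commit wins and the added "b" is lost
        return cache.get(KEY);
    }

    public static void main(String[] args) {
        System.out.println("shared instance: " + sharedInstance());
        System.out.println("separate copies: " + separateCopies());
    }
}
```

With the shared instance the list ends up holding only "b", which is the
correct state. With separate copies the remover's put overwrites the
adder's, so "b" disappears; that would be consistent with a "file does not
exist" failure later on.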
But you are focusing on CacheStore operations. I can see how that
might be different in terms of implementation, but it is not acceptable
that the CacheStore is storing a different state than what we have in
memory.
I don't expect to need a Transaction for that. Writes *always* need
to be applied in the right order so that the CacheStore content
matches the in-memory content.
So -1 for the problem being in FileListOperations, it's in the
CacheStore. Also, I've run plenty of stress tests on in-memory Caches
and never hit problems: if Infinispan changes the semantics by
enabling a CacheStore, that's a critical issue.
Also, this needs to work correctly with async cachestores.
Also, I was able to "reproduce" the EOF. This was the first
problem I found
and it is related to DefaultCacheManager.startCaches(String... cacheName),
which starts the caches in separate threads. The SingleFileStore is
failing to start but the exception is "swallowed" by the thread. So, Hibernate
Search is not notified and it uses the cache anyway. To make it worse, the
cache accepts the requests but it is not persisting the data. This creates
the EOF on restart... I will open a JIRA about it to discuss it (maybe
throw an exception in startCaches? and throw an exception if any operation is
invoked on a cache that did not start successfully?)
+1 on the exception on startCaches, should not be swallowed!
But why is SingleFileStore throwing an exception?
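Something along these lines is what I have in mind for startCaches. This is
only a sketch with made-up names (StartCachesDemo, startCache, the "broken"
prefix used to simulate a failing store), not the actual DefaultCacheManager
code: each cache starts in its own task, and the caller waits on the Futures
so the first failure is rethrown instead of being lost in the worker thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; this is not the DefaultCacheManager implementation.
class StartCachesDemo {

    // Stand-in for starting one cache; fails for names beginning with "broken".
    static void startCache(String name) {
        if (name.startsWith("broken")) {
            throw new IllegalStateException("store failed to start for " + name);
        }
    }

    // Starts every cache in its own task, then waits and rethrows the first failure.
    static void startCaches(String... names) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, names.length));
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String name : names) {
                pending.add(pool.submit(() -> startCache(name)));
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // surfaces any exception thrown inside the worker
                } catch (ExecutionException e) {
                    throw new IllegalStateException("cache start failed", e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while starting caches", e);
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        startCaches("cacheA", "cacheB"); // both start, nothing is thrown
        try {
            startCaches("cacheA", "broken-store");
        } catch (IllegalStateException e) {
            System.out.println("caller notified: " + e.getCause().getMessage());
        }
    }
}
```

The point is just that the thread calling startCaches sees the failure, so
the user of the cache (Hibernate Search here) can refuse to go on with a
half-started cache.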
Thanks a lot, very interesting.
Sanne
Any comments?
If I was not clear let me know :)
Thanks!
Pedro
[1] https://gist.github.com/pruivo/93edeb82a21e9827d2c9