[infinispan-dev] CR3: async stores, again

Mircea Markus mircea.markus at jboss.com
Tue Aug 31 07:05:20 EDT 2010


On 31 Aug 2010, at 11:57, Sanne Grinovero wrote:

> 2010/8/31 Mircea Markus <mircea.markus at jboss.com>:
>> 
>> On 27 Aug 2010, at 19:37, Sanne Grinovero wrote:
>> 
>>> Some more news about this:
>>> Mircea attached a patch he was working on to JIRA, but he hadn't time
>>> to fix all tests, so I'm continuing.
>>> 
>>> There are two more issues:
>>> 
>>> 1) keeping an ordered list of transactions works fine for transactions
>>> only, but modifications done inside a transaction are wrongly reordered
>>> relative to operations performed outside of any transaction.
>> ah I see. The operations within a transaction do not get reordered though, right?
> 
> That's right: if I only use transactions it works like a charm,
> ignoring the corner case of the clear() operation as explained in the
> next paragraph.
> The problem with Lucene (to show a use case) is that it sometimes
> starts a batch for performance reasons; other one-shot operations
> happen outside of any batch.
> 
>> 
>>> 
>>> 2) Even in the currently committed code, I see that a clear() operation
>>> is sent async together with all other operations. Then, when the state
>>> changes are performed, the clear() causes the previously found
>>> operations to be cancelled, but "previous" here refers to the iteration
>>> order of the state map, not to the order of application of those
>>> Modifications, so the clear() is removing random Modifications.
>> yes
>>> 
>>> 3) Also my attempt to reuse some code from AbstractCacheStore through
>>> inheritance is not going to work, as that implementation does too much
>>> for the needs of AsyncStore: it cares for async Purge, and all the
>>> methods I wanted to reuse are slightly different because of that.
>>> 
>>> Proposal:
>>> 1- Out-of-transaction modifications are pushed into a BlockingDeque
>>> instead of a Map.
>> do we need a deque for this? Also, do we want it blocking? Perhaps we can make this configurable and operate on an interface directly?
> 
> any kind of order-maintaining structure will do; I did it with an
> unbounded blocking deque as that's convenient to wait on for new work
> to appear, to avoid starvation and provide a clean road to shutdown.
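The idea above can be sketched as follows. This is a minimal, hypothetical illustration (the `ModQueue` name and the use of plain `Object` for modifications are stand-ins, not the actual Infinispan types): out-of-transaction modifications go into an unbounded `LinkedBlockingDeque`, which preserves arrival order and lets the single consumer block until work appears instead of busy-waiting.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch: out-of-transaction modifications kept in FIFO order.
// "Object" stands in for Infinispan's Modification type.
class ModQueue {
    final BlockingDeque<Object> queue = new LinkedBlockingDeque<>();

    // Producers (user threads) append at the tail; arrival order is kept.
    void enqueue(Object modification) {
        queue.addLast(modification);
    }

    // The consumer blocks until work appears, avoiding starvation/busy-wait.
    Object takeNext() throws InterruptedException {
        return queue.takeFirst();
    }
}
```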
> 
>>> 
>>> 2- A prepare(tx, list) sets the list aside in the "transactions" Map.
>>> 
>>> 3- A commit(tx) takes the list of operations from the "transactions"
>>> and pushes the list on the same BlockingDeque as all other operations,
>>> guaranteeing same order of application.
>> looks good
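Points 2 and 3 could look roughly like this. A hypothetical sketch (class and field names are invented for illustration): prepare parks a transaction's modification list in a concurrent map keyed by the transaction, and commit moves that list onto the same deque used for non-transactional operations, so both kinds of work share one global order.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch of points 2 and 3: prepare sets a transaction's
// modifications aside; commit pushes them onto the shared deque.
class TxBuffer {
    final Map<Object, List<Object>> transactions = new ConcurrentHashMap<>();
    final BlockingDeque<Object> queue = new LinkedBlockingDeque<>();

    // prepare(tx, list): park the modification list, keyed by tx.
    void prepare(Object tx, List<Object> mods) {
        transactions.put(tx, mods);
    }

    // commit(tx): move the list onto the same deque as non-tx operations,
    // guaranteeing the same order of application.
    void commit(Object tx) {
        List<Object> mods = transactions.remove(tx);
        if (mods != null) {
            for (Object m : mods) queue.addLast(m);
        }
    }
}
```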
>>> 
>>> 4- a clear operation clears the deque and pushes a Clear() operation on it
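Point 4 might be sketched like this (hypothetical names; a `CLEAR` sentinel stands in for the Clear() modification). Note the clear-then-push pair is not atomic against concurrent writers here; that race is exactly what the "clearop" lock discussed further down would cover.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch of point 4: clear() drops all pending work, then
// enqueues a marker so the consumer also clears the decorated store.
class ClearableQueue {
    static final Object CLEAR = new Object(); // sentinel for the Clear() op
    final BlockingDeque<Object> queue = new LinkedBlockingDeque<>();

    void write(Object modification) {
        queue.addLast(modification);
    }

    void clear() {
        queue.clear();        // pending modifications are now irrelevant
        queue.addLast(CLEAR); // consumer will wipe the decorated store
    }
}
```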
>>> 
>>> 5- a purge operation is sent through the same deque: the only risk I
>>> see is being removed by a clear, but I bet there won't be much to
>>> purge after that :) Also we don't care if the purge is going to be
>>> handled async in the other store, that's a store implementation
>>> detail from what I see in AbstractCacheStore.
> Not sure we need to process purge in an async way: it is already
> called asynchronously by the eviction thread, so we can just delegate
> the call to the underlying store.
> Right, I got inspired by the previous version. I have it done now by
> the AsyncStoreCoordinator, not spawning additional AsyncProcessors, as
> purge is assumed to be async and return quickly; this way this kind of
> operation gets a sort of "priority" over other modifications, but is
> still async from the point of view of the caller.
The caller of purge is the eviction thread, which is itself an async thread (not a user thread), so there would be two levels of async calls.
The only way a user can call purge is through AdvancedCache.getEvictionManager().processEviction() - but I think someone doing that would want purging executed synchronously.
>>> 
>>> 6- A single processor of AsyncStore drains blocks of a fixed
>>> configurable amount, puts them in a local map, applies the coalescing
>>> algorithm, and puts the results on a new state map.
>>> It seems to me that there needs to be exactly one such worker, or
>>> reordering problems arise again in the order of the blocks taken
>>> from the deque.
>> +1
> 
> Actually that was the initial idea, but then I had to remove the
> "blocks of a fixed configurable amount" to keep this one as simple as
> possible; we can add that later on: it seems to make sense with the
> write-behind delay of Galder's wiki link.
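The coalescing step from point 6 could be sketched as below. This is a hypothetical illustration (the `Coalescer`/`Mod` names are invented): within one drained block, later writes to a key supersede earlier ones, so only the last modification per key needs to hit the underlying store.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the coalescing step in point 6: within one drained
// block, only the last modification per key survives.
class Coalescer {
    record Mod(Object key, Object value) {}

    // LinkedHashMap keeps first-seen key order while the mapped value is
    // the most recent modification for that key.
    static Map<Object, Mod> coalesce(List<Mod> block) {
        Map<Object, Mod> state = new LinkedHashMap<>();
        for (Mod m : block) state.put(m.key(), m);
        return state;
    }
}
```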
> 
>>> 
>>> 6/a - When receiving a clear() - exceptional case - we clear the map
>>> of operations, and issue the clear() synchronously: not adding
>>> anything to the map, and acquiring a write lock "clearop" to avoid
>>> race conditions with the workers, which might already be busy but
>>> happen to finish after the clear().
>> aren't we using only one worker as per the previous paragraph?
> One AsyncStoreCoordinator, N AsyncProcessors. The one manages the
> queue, guarantees ordering and performs the clear() and purge(); when
> it finds Store or Remove operations it pushes these to the
> AsyncProcessors to deal with the (slow?) Store in an async way.
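The coordinator/processor split described above might look roughly like this. A hypothetical sketch (names and the `CLEAR` sentinel are invented; the "store write" is simulated by a counter): the single coordinator owns the queue and handles special operations like clear() inline, while ordinary work is fanned out to a processor pool.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one coordinator owning the queue and the ordering,
// with slow store writes fanned out to a pool of processors.
class Coordinator {
    static final Object CLEAR = new Object();
    final BlockingDeque<Object> queue = new LinkedBlockingDeque<>();
    final ExecutorService processors = Executors.newFixedThreadPool(4);
    final AtomicInteger stored = new AtomicInteger(); // simulated store writes
    volatile boolean cleared;

    // One iteration of the coordinator loop: special ops run inline on the
    // coordinator itself, ordinary Store/Remove work goes to the pool.
    void processNext() throws InterruptedException {
        Object op = queue.takeFirst();
        if (op == CLEAR) {
            cleared = true; // would clear the decorated store synchronously
        } else {
            processors.execute(stored::incrementAndGet); // simulated slow write
        }
    }
}
```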
> 
>>> 
>>> 6/b - When receiving a purge() this bypasses other logic and is
>>> directly passed to the decorated Store.
>> +1
>>> 
>>> 7- Now here it becomes tricky: we removed the transactions problem,
>>> the clear operation and the purge from the table. But there still is
>>> a map of Modifications to be applied by N threads, and to which
>>> another thread might write. This looks like the original problem in
>>> simplified form, but reorderings are still possible, as worker A
>>> might get some work on key K1, then worker B kicks in, gets new work
>>> about K1 and happens to perform it before A gets priority again.
>>> Clearly this must be performed either by one thread or using very
>>> fine-grained locking: as we're only interested in a per-key lock, we
>>> don't care about the order of operations on other keys. We could use
>>> a second map to store the keys which "are being processed": when a
>>> worker takes some work it uses atomic operations to check whether
>>> somebody is working on the same key; if so, it could either wait or
>>> put the work back (if it wasn't changed, otherwise discard it) and
>>> pick other work to do.
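The per-key guard from point 7 could be built on a concurrent set, sketched below (the `KeyGuard` name is invented for illustration): a worker atomically claims a key before applying its modification, and a second worker that loses the race can defer or requeue the work instead of racing on ordering.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the "keys being processed" map from point 7:
// workers claim a key atomically before touching it.
class KeyGuard {
    private final Set<Object> inFlight = ConcurrentHashMap.newKeySet();

    // Returns true if the caller now owns the key; atomic, so only one
    // worker wins even under contention.
    boolean tryClaim(Object key) {
        return inFlight.add(key);
    }

    void release(Object key) {
        inFlight.remove(key);
    }
}
```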
>>> 
>> It might be good to use the same approach as with async marshalling: use one thread by default, which would guarantee ordering. Allow a user to specify more than one thread, and clearly explain that reordering might happen: there might be situations where one would not care about ordering after all.
> 
> I think we solved it nicely, guaranteeing correct ordering and still
> doing most of the work in parallel.
> If problems arise, we'll advise to use a single thread.
Good stuff!
> 
>> One more thing I've just noticed: AsyncStore processes Commit but doesn't do the same with Rollback. On rollback it should remove the tx from the transaction(as described at 2) ) map to avoid memory leaks.
> 
> Right, doing that.
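The rollback fix amounts to dropping the parked list, roughly as below (hypothetical names, paralleling the prepare/commit description at point 2): the modifications of an aborted transaction are never applied, just discarded, so the transactions map cannot leak.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the rollback fix: discard the parked modification
// list so an aborted transaction does not leak memory in the tx map.
class TxRollback {
    final Map<Object, List<Object>> transactions = new ConcurrentHashMap<>();

    void prepare(Object tx, List<Object> mods) {
        transactions.put(tx, mods);
    }

    // rollback(tx): the modifications are never applied, just dropped.
    void rollback(Object tx) {
        transactions.remove(tx);
    }
}
```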
> 
> [...more pages were here...]
> 
> Cheers,
> Sanne
> 
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
