[infinispan-dev] Starting caches in parallel

Thu Aug 4 08:47:45 EDT 2011

On Thu, Aug 4, 2011 at 2:01 PM, Sanne Grinovero <sanne at infinispan.org> wrote:
> 2011/8/4 Dan Berindei <dan.berindei at gmail.com>:
>> Hi guys
>>
>> I've found a deadlock with transactions spanning multiple caches
>> during rehashing if the joiner's caches are started sequentially (for
>> more details see https://gist.github.com/1124740)
>> After discussing a bit on IRC with Manik and Galderz it appears the
>> only solution for 5.0.0.FINAL would be to have a mechanism to start
>> all the caches in parallel.
>>
>> There are several options to implement that. All of them require the
>> users to know that they should create the caches in parallel at
>> application startup to avoid problems.
>>
>> 1. Advise the users about the problem, but let them create their own
>> threads and call EmbeddedCacheManager.getCache() on those threads.
>>
>>
>> 2. Add a method EmbeddedCacheManager.getCaches() to start multiple
>> caches in parallel.
>>
>> This is not as straightforward as it may seem. First, there is the
>> question of whether to use type parameters or not:
>> 2.a. Set<Cache> getCaches(String... cacheNames);
>> vs
>> 2.b. Set<Cache<K, V>> getCaches(String... cacheNames);
>>

It turns out the proper generic version should be
Map<String, Cache<? extends Object, ? extends Object>> createCaches(String...),
but even this version produces unchecked warnings.
We'd also have to make the return type a map, otherwise the user has no
way of getting a specific cache.
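
To illustrate where the unchecked warning comes from, here is a minimal
sketch. Cache, MapCache, createCaches and typed below are hypothetical
stand-ins written for this example, not the Infinispan API. Cache<?, ?>
is shorthand for Cache<? extends Object, ? extends Object>.

```java
import java.util.HashMap;
import java.util.Map;

final class WildcardCacheMap {
    /** Hypothetical stand-in for a cache interface, just enough to show the typing issue. */
    interface Cache<K, V> {
        void put(K key, V value);
        V get(K key);
    }

    /** Trivial in-memory implementation, for illustration only. */
    static final class MapCache<K, V> implements Cache<K, V> {
        private final Map<K, V> data = new HashMap<>();
        public void put(K key, V value) { data.put(key, value); }
        public V get(K key) { return data.get(key); }
    }

    /** Hypothetical createCaches: the caches have unrelated key/value types, hence the wildcards. */
    static Map<String, Cache<?, ?>> createCaches(String... names) {
        Map<String, Cache<?, ?>> caches = new HashMap<>();
        for (String name : names) {
            caches.put(name, new MapCache<Object, Object>());
        }
        return caches;
    }

    /** The caller has to cast to get usable type parameters; javac flags this cast as unchecked. */
    @SuppressWarnings("unchecked")
    static <K, V> Cache<K, V> typed(Map<String, Cache<?, ?>> caches, String name) {
        return (Cache<K, V>) caches.get(name);
    }
}
```

The cast in typed() cannot be verified at runtime because of erasure, so
the warning only moves from the library to the caller.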

>> I don't think having the same K and V for all the caches is going to
>> be very common, so I'd go with 2.a.
>>
>> Then there is the problem of how to request the default cache. I think
>> "" should be fine, but we'd need to document it and also change the
>> regular getCache() methods to accept it.
>
> You could use org.infinispan.manager.CacheContainer.DEFAULT_CACHE_NAME.
>

I thought it was hidden from the users; I wasn't aware it's in the
public API. This is perfect.

>> 3. Add an async version of EmbeddedCacheManager.getCache():
>>
>> Future<Cache<K, V>> getCacheAsync(String cacheName);
>> Future<Cache<K, V>> getCacheAsync();
>>
>> This nicely sidesteps the generics issue in solution 2 and it's also
>> easier for the users than solution 1, so it's currently my favourite.
>>
>>
>> What do you think?
>
> I think it's very bad, but it doesn't surprise me, since starting more
> than a single cache in sequence puts you in the scenario of
> ISPN-658,
> unless you're quick enough. So yes, it's not nice, but I think we
> highlighted the problem long ago?
>

The workaround for ISPN-658 is to start all caches on application
startup. This scenario appeared specifically because I was trying to
avoid an asymmetric cluster and moved the creation of all caches to a
central location.

> Are you proposing a temporary API to make things work before ISPN-658
> is solved? I don't like the Future approach, it's still unclear that I
> have to send all requests before blocking on any get.
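
To make the ordering concern concrete, here is a toy model, not Infinispan
code. AsyncStartOrdering and startAll are hypothetical names, and the
latch only simulates starts that depend on each other: no simulated start
can finish until every start has been requested. Requesting all the
futures first and blocking afterwards completes. Calling get() right
after the first submit would instead wait until the timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class AsyncStartOrdering {
    /** Requests every start first, then blocks on the results; returns names in request order. */
    static List<String> startAll(String... names) {
        ExecutorService pool = Executors.newCachedThreadPool();
        // Simulated dependency: a start completes only once all starts were requested.
        CountDownLatch allRequested = new CountDownLatch(names.length);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String name : names) {
                futures.add(pool.submit(() -> {
                    allRequested.countDown();
                    if (!allRequested.await(5, TimeUnit.SECONDS)) {
                        throw new IllegalStateException("other caches were never requested");
                    }
                    return name;
                }));
            }
            // Only now is it safe to block: every start is already in flight.
            List<String> started = new ArrayList<>();
            for (Future<String> future : futures) {
                started.add(future.get(10, TimeUnit.SECONDS));
            }
            return started;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while starting caches", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("caches failed to start", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Nothing in a Future-returning signature tells the caller about this
ordering requirement, which is the objection raised above.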

The fix for ISPN-658 will most likely prevent existing nodes from
sending requests to a joiner until that joiner has finished the initial
rehash, so I presume it will make my scenario impossible.
There is another JIRA planned for 5.1 that is intended more
specifically as a workaround for ISPN-658: allow
InboundInvocationHandler to create caches automatically when it has an
incoming command for that cache (can't find the id now).

> I'd make a
>
> void startCaches(String... names);
>
> which implicitly includes the default cache too, and you throw an
> exception if getCache() is used on an unstarted cache, and
> unfortunately you should also throw an exception if startCaches() is
> invoked more than once.
>

I've had a long chat on IRC about this with Sanne and Pete, and our
conclusion was that adding a startCaches method is the clearest option
for the users.
The intention of this API is only to start the caches, so the name
should clearly state that. It will also eliminate an uncertainty for
users: at the end of the startCaches call they will know that the caches
have started and that further getCache calls won't fail.

We can't afford to break getCache() for users though (including
ourselves), so getCache() will still start the cache. It should still
print a warning when doing so, directing people to call startCaches
instead.
I don't think we should throw an exception if startCaches is invoked
more than once either, to match getCache() and all the other
components.
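
A rough sketch of the behaviour described above, assuming a plain thread
pool. This is not the actual implementation: ParallelCacheStarter,
SimpleCache and isStarted are hypothetical names for this example, and
the real start logic is replaced by creating a placeholder object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelCacheStarter {
    /** Hypothetical placeholder for a started cache. */
    static final class SimpleCache {
        final String name;
        SimpleCache(String name) { this.name = name; }
    }

    private final Map<String, SimpleCache> started = new ConcurrentHashMap<>();

    /** Starts all named caches in parallel and returns once every one has started. */
    void startCaches(String... names) {
        if (names.length == 0) {
            return;
        }
        ExecutorService pool = Executors.newFixedThreadPool(names.length);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String name : names) {
                Runnable task = () -> startIfAbsent(name);
                pending.add(pool.submit(task));
            }
            for (Future<?> future : pending) {
                future.get(); // block only after every start has been submitted
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while starting caches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("cache failed to start", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Still starts the cache on demand, but warns so callers move to startCaches(). */
    SimpleCache getCache(String name) {
        SimpleCache existing = started.get(name);
        if (existing != null) {
            return existing;
        }
        System.err.println("WARN: cache '" + name + "' started by getCache(); call startCaches() instead");
        return startIfAbsent(name);
    }

    boolean isStarted(String name) {
        return started.containsKey(name);
    }

    /** Idempotent: a second start request for the same name is a no-op. */
    private SimpleCache startIfAbsent(String name) {
        return started.computeIfAbsent(name, SimpleCache::new);
    }
}
```

Calling startCaches("a", "b") twice is harmless here, and getCache("c")
on an unstarted cache still works but logs the warning.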

Cheers
Dan