On 10 May 2010, at 12:41, galder(a)redhat.com wrote:
Hi,
Re:
https://jira.jboss.org/jira/browse/ISPN-425
We've been discussing solutions to the fundamental problem in this issue, which is that
operations are allowed on the cache before rehashing has finished during startup.
I've been experimenting with a solution based on waiting for rehashing to
complete, but this is causing issues with the Hot Rod distribution tests. In Hot Rod, this is
what happens:
1. Start Hot Rod server 1 which starts a replicated topology cache.
2. Start Hot Rod server 2 which starts a replicated topology cache.
3. Send a request for a distributed cache called 'hotRodDistSync' to Hot Rod
server 2.
4. As a result of this request, the 'hotRodDistSync' cache should be started, but
startup does not succeed. It stays in a rehash join loop, logging:
4595 INFO [org.infinispan.remoting.InboundInvocationHandlerImpl] (OOB-2,Infinispan-Cluster,eq-52045:) Cache named hotRodDistSync does not exist on this cache manager!
4595 TRACE [org.infinispan.marshall.VersionAwareMarshaller] (OOB-2,Infinispan-Cluster,eq-52045:) Wrote version 410
4596 TRACE [org.infinispan.marshall.VersionAwareMarshaller] (OOB-2,Infinispan-Cluster,eq-64501:) Read version 410
4596 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (Rehasher-eq-64501:) responses: [sender=eq-52045, retval=null, received=true, suspected=false]
4597 DEBUG [org.infinispan.distribution.JoinTask] (Rehasher-eq-64501:) Retrieved old consistent hash address list null
4597 TRACE [org.infinispan.distribution.JoinTask] (Rehasher-eq-64501:) Sleeping for 1.54 seconds
The problem here is that Hot Rod server 1 has not yet started the 'hotRodDistSync'
cache, since no requests were sent to it. This is different from the cache not allowing
invocations yet because it is in the middle of startup. So I wondered whether
InboundInvocationHandlerImpl.handle() could return a custom response rather than null, and
whether JoinTask could handle it so that, if all the responses received say that the
cache does not exist, it considers the rehash completed and finishes the process.
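To make the proposal concrete, here is a minimal sketch of the decision JoinTask would have to make. It is not taken from the Infinispan code base: JoinReply, JoinDecision and JoinDecisionSketch are hypothetical names, and treating an empty reply list as "retry" is an assumption.

```java
import java.util.List;

// Hypothetical reply kinds for a join request; none of these names exist in Infinispan.
enum JoinReply {
    CACHE_DOES_NOT_EXIST, // proposed custom response: cache not started on the remote node
    NO_ANSWER,            // today's null, e.g. the coordinator is busy with another join
    HASH_AVAILABLE        // the remote node returned its consistent hash address list
}

// What the joiner does next (hypothetical).
enum JoinDecision { FINISH_WITHOUT_REHASH, RETRY_AFTER_SLEEP, PROCEED_WITH_REHASH }

final class JoinDecisionSketch {
    private JoinDecisionSketch() {}

    static JoinDecision decide(List<JoinReply> replies) {
        // Assumption: with no replies at all we cannot conclude anything, so retry.
        if (replies.isEmpty()) {
            return JoinDecision.RETRY_AFTER_SLEEP;
        }
        // Any node that already runs the cache means a real rehash is needed.
        if (replies.contains(JoinReply.HASH_AVAILABLE)) {
            return JoinDecision.PROCEED_WITH_REHASH;
        }
        // Every responder said the cache does not exist: there is nothing to rehash from.
        if (replies.stream().allMatch(r -> r == JoinReply.CACHE_DOES_NOT_EXIST)) {
            return JoinDecision.FINISH_WITHOUT_REHASH;
        }
        // At least one node gave the ambiguous answer, so keep the current sleep/retry loop.
        return JoinDecision.RETRY_AFTER_SLEEP;
    }
}
```

In the scenario from the log, server 2 would receive a single CACHE_DOES_NOT_EXIST reply from eq-52045 and decide() would return FINISH_WITHOUT_REHASH instead of sleeping and retrying.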
I can see how this causes a problem, but I cannot see how returning something other than
null solves your problem. OK, you won't get into the rehash loop, but since you are
creating caches lazily on the Hot Rod endpoint, how does this node then create more caches
on remote nodes?
Now, the reason I'm suggesting a custom response is that a JOIN_REQ returning null
can also mean that the coordinator is in the middle of another join
(DMI.requestPermissionToJoin). These two situations are not the same, which is why I
suggest treating them differently.
I agree, different treatment is important. Perhaps what you need, just like
RequestIgnoredResponse, is a CacheDoesntExistResponse. And on seeing such a response
type, the CommandAwareRpcDispatcher could throw a CacheDoesntExistException.
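A rough sketch of what I mean, using stand-in types rather than the real Infinispan classes. SketchResponse and DispatcherSketch are hypothetical names for illustration, and CacheDoesntExistResponse and CacheDoesntExistException do not exist yet; they are the types being proposed here.

```java
// Stand-in for a response marker type; NOT the real Infinispan Response interface.
interface SketchResponse {}

// Proposed response: the remote cache manager has no cache with this name.
final class CacheDoesntExistResponse implements SketchResponse {
    final String cacheName;

    CacheDoesntExistResponse(String cacheName) {
        this.cacheName = cacheName;
    }
}

// Proposed exception raised on the caller's side when that response is seen.
final class CacheDoesntExistException extends RuntimeException {
    CacheDoesntExistException(String cacheName) {
        super("Cache named " + cacheName + " does not exist on the remote cache manager");
    }
}

// Hypothetical hook standing in for the response handling in the dispatcher.
final class DispatcherSketch {
    private DispatcherSketch() {}

    static SketchResponse unwrap(SketchResponse response) {
        if (response instanceof CacheDoesntExistResponse) {
            // Surface "cache missing" as its own failure instead of a bare null.
            throw new CacheDoesntExistException(((CacheDoesntExistResponse) response).cacheName);
        }
        // Everything else, including null ("busy, ask again"), passes through unchanged.
        return response;
    }
}
```

With this shape, a caller such as JoinTask could catch CacheDoesntExistException separately from the null case, so a coordinator that is busy with another join is still retried as it is today.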
Cheers,
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org