Hi all,
There are a couple of reasons:
- I may not want a task failover policy at all, and the current API kind of obscures this. At the very least, this is different from the 5.1 behavior.
- I specify keys to a task to represent the key set that will be pessimistically locked, which eliminates RPCs (there’s some Group API magic, plus taking advantage of the new SyncConsistentHash). A random policy defeats this purpose. So if I do want a task failover policy, I want one that retries the task on the node that owns the representative keys (see the sketch below).
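To make the second point concrete, this is roughly the pattern I'm relying on (just a sketch; the class shapes and helper names are illustrative, not our actual code):

   import java.io.Serializable;
   import java.util.concurrent.Callable;
   import java.util.concurrent.Future;

   import org.infinispan.Cache;
   import org.infinispan.distexec.DefaultExecutorService;
   import org.infinispan.distexec.DistributedExecutorService;
   import org.infinispan.distribution.group.Group;

   // All keys for a service group resolve to the same group, so (with grouping
   // enabled on the cache) they hash to the same owner under SyncConsistentHash.
   public class ServiceGroupKey implements Serializable {
      private final int edgeDeviceId;
      private final int serviceGroupNo;

      public ServiceGroupKey(int edgeDeviceId, int serviceGroupNo) {
         this.edgeDeviceId = edgeDeviceId;
         this.serviceGroupNo = serviceGroupNo;
      }

      @Group
      public String group() {
         return edgeDeviceId + "|" + serviceGroupNo;
      }
   }

   class ServiceGroupTaskSubmitter {
      // Submitting the task with its representative keys routes it to the node
      // that owns them, so it can lock them locally instead of paying an RPC per key.
      <T> Future<T> submitForGroup(Cache<ServiceGroupKey, ?> cache,
                                   Callable<T> task,
                                   ServiceGroupKey... keys) {
         DistributedExecutorService des = new DefaultExecutorService(cache);
         return des.submit(task, keys);
      }
   }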
Incidentally, the failover policy obscures the original reason the task failed. The exception thrown to the calling node is a FailoverException. The original reason is about 4 levels deep. Example:
java.util.concurrent.ExecutionException: Failover execution failed
at org.infinispan.distexec.DefaultExecutorService$DistributedTaskPart.failoverExecution(DefaultExecutorService.java:855)
…
Caused by: java.lang.Exception: Failover execution failed
... 45 more
Caused by: java.util.concurrent.ExecutionException: Failover execution failed
at org.infinispan.distexec.DefaultExecutorService$DistributedTaskPart.failoverExecution(DefaultExecutorService.java:852)
... 44 more
Caused by: java.lang.Exception: Failover execution failed
... 45 more
Caused by: java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [5 seconds] on key [ServiceGroupKey[edgeDeviceId=1,serviceGroupNo=101]] for requestor [GlobalTransaction:<east-dg02-61087(east)>:1203:remote]! Lock held by [GlobalTransaction:<east-dg02-61087(east)>:1198:remote]
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
... 43 more
Caused by: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [5 seconds] on key [ServiceGroupKey[edgeDeviceId=1,serviceGroupNo=101]] for requestor [GlobalTransaction:<east-dg02-61087(east)>:1203:remote]! Lock held by [GlobalTransaction:<east-dg02-61087(east)>:1198:remote]
at org.infinispan.util.concurrent.locks.LockManagerImpl.lock(LockManagerImpl.java:217)
at org.infinispan.util.concurrent.locks.LockManagerImpl.acquireLock(LockManagerImpl.java:190)
I really need the original exception for processing/reporting to my analytics engine.
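For what it's worth, this is roughly the unwrapping I have to do today to recover it (a minimal sketch; the helper name is just illustrative):

   import java.util.concurrent.ExecutionException;
   import java.util.concurrent.Future;

   final class Causes {
      private Causes() {
      }

      // Walk the cause chain to the innermost throwable, i.e. the TimeoutException
      // buried under the layered "Failover execution failed" wrappers above.
      static Throwable rootCause(Throwable t) {
         Throwable root = t;
         while (root.getCause() != null && root.getCause() != root) {
            root = root.getCause();
         }
         return root;
      }

      // Usage around the distributed task's future:
      static void report(Future<?> future) throws InterruptedException {
         try {
            future.get();
         } catch (ExecutionException e) {
            Throwable original = rootCause(e);
            // hand 'original' (the lock timeout) to the analytics engine
         }
      }
   }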
Thanks,
Erik
From: Mircea Markus [mailto:mircea.markus@jboss.com]
Sent: Tuesday, October 23, 2012 11:40 AM
To: infinispan -Dev List
Cc: Erik Salter
Subject: Re: [infinispan-dev] Dist.exec failover
On 22 Oct 2012, at 19:48, Vladimir Blagojevic wrote:
Hey guys,
Erik noted that we should have no failover policy installed by default,
rather than the default random policy we currently have.
The random policy tries to re-run the task in case of a cluster failure. I imagine the user would do the same, so I'm not sure why we wouldn't add it.
Erik, would you mind commenting on the cons of having this failover in place?
He also noted that
keys are never supplied to the failover policy, and they might be important
when it comes to deciding where to dispatch the failed-over task.
The main reason for passing the keys would be to calculate the locality of the task based on the consistent hash. We don't have a reference to the consistent hash in the DistributedTaskFailoverPolicy, though, so I'm not sure this would be useful as it is.
Our current interface is:

   public interface DistributedTaskFailoverPolicy {
      Address failover(Address failedExecution, List<Address> executionCandidates, Exception cause);
   }
Rather than adding yet another parameter here, maybe we should make a
simple container class:
   public class FailoverContext {
      Address failedExecution;
      List<Address> executionCandidates;
      Exception cause;
      List<Object> inputKeys;
   }
and have:

   public interface DistributedTaskFailoverPolicy {
      Address failover(FailoverContext context);
   }
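For instance, with the keys in the context a user could write something like this (just a sketch; KeyOwnerLookup is hypothetical and stands in for the consistent hash / key-to-owner reference we'd still need to expose to the policy somehow):

   import org.infinispan.remoting.transport.Address;

   public class KeyOwnerFailoverPolicy implements DistributedTaskFailoverPolicy {

      // Hypothetical key -> primary owner lookup; in reality this information
      // lives in the consistent hash, which the policy cannot reach today.
      public interface KeyOwnerLookup {
         Address primaryOwner(Object key);
      }

      private final KeyOwnerLookup ownerLookup;

      public KeyOwnerFailoverPolicy(KeyOwnerLookup ownerLookup) {
         this.ownerLookup = ownerLookup;
      }

      public Address failover(FailoverContext context) {
         // Prefer the node that now owns the task's representative keys,
         // as long as it is still a valid execution candidate.
         for (Object key : context.inputKeys) {
            Address owner = ownerLookup.primaryOwner(key);
            if (owner != null && context.executionCandidates.contains(owner)) {
               return owner;
            }
         }
         // No owner among the candidates: fall back to the first candidate
         // rather than a random node.
         return context.executionCandidates.isEmpty() ? null
               : context.executionCandidates.get(0);
      }
   }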
WDYT?
Regards,
Vladimir
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev