On 26 Sep 2012, at 08:28, Bela Ban wrote:

Hi Mircea,

I think we need a 3rd option in addition to a retry interval and a
number of attempts, to take a site offline: a min-time (or whatever we
want to call it).

Say we have retryInterval=1000 (ms) and maxRetries=5. This means that if we
get a SITE-UNREACHABLE 5 times for a given site, we declare that site
offline and cease sending requests to it.

However, if we have 5 different threads sending requests to the site,
then each of them will increment the counter and thus we take the site
offline after 1 second!
+1, well spotted!

That's where min-time comes in: we should wait at least min-time until
we take any site offline, even if maxRetries has been exceeded.

Example: min-time=60000 (ms), maxRetries=10, retryInterval=1000 (ms)
We don't have a built-in retry mechanism; are you referring to the one in JGroups?
Or do you mean adding a retry mechanism for xsite operations as well?

If we have 20 threads sending requests to site SFO (which is down), then
we might have numRetries=20 after 10 seconds, and perhaps numRetries=60
after 50 seconds. But only once 60 seconds have elapsed do we take SFO
offline.
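
A minimal sketch of how the rule above could look in code, in case it helps
the discussion. Everything here is an assumption for illustration: the class
OfflinePolicy and its methods recordFailure() and reset() are hypothetical
names, not part of Infinispan or JGroups, and the sketch assumes min-time is
measured from the first SITE-UNREACHABLE of a failure streak.

```java
/** Hypothetical sketch of the proposed take-offline rule; not an existing API. */
final class OfflinePolicy {
    private final int maxRetries;
    private final long minTimeMs;
    private int numRetries;            // failures seen in the current streak
    private long firstFailureMs = -1;  // -1 means no failure recorded yet

    OfflinePolicy(int maxRetries, long minTimeMs) {
        this.maxRetries = maxRetries;
        this.minTimeMs = minTimeMs;
    }

    /**
     * Called by any sender thread that gets SITE-UNREACHABLE.
     * Returns true only when BOTH conditions hold: enough failures
     * were counted and at least minTimeMs elapsed since the first one.
     */
    synchronized boolean recordFailure(long nowMs) {
        if (firstFailureMs < 0) {
            firstFailureMs = nowMs;
        }
        numRetries++;
        long elapsedMs = nowMs - firstFailureMs;
        return numRetries >= maxRetries && elapsedMs >= minTimeMs;
    }

    /** Called when a request to the site succeeds; clears the streak. */
    synchronized void reset() {
        numRetries = 0;
        firstFailureMs = -1;
    }
}
```

With minTimeMs=60000 and maxRetries=10, twenty failures reported within the
first 10 seconds leave the site online; the first failure reported at or
after the 60 second mark takes it offline.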

The main reason for min-time would be to prevent taking a site offline
during a short period of time when the site master changes and multiple
threads increment numRetries in short order.

+1
--
Bela Ban, JGroups lead (http://www.jgroups.org)

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)