[infinispan-dev] x-site: taking a site offline automatically

Bela Ban bban at redhat.com
Wed Sep 26 03:28:02 EDT 2012


Hi Mircea,

I think we need a third option for taking a site offline, in addition
to the retry interval and the number of attempts: a min-time (or
whatever we want to call it).

Say we have retryInterval=1000 (ms) and maxRetries=5. This means that
if we get a SITE-UNREACHABLE 5 times for a given site, we declare that
site offline and stop sending requests to it.

However, if we have 5 different threads sending requests to the site,
then each of them will increment the counter, and thus we take the site
offline after about 1 second rather than 5!

That's where min-time comes in: we should wait at least min-time before
we take any site offline, even if maxRetries has been exceeded.

Example: min-time=60000 (ms), maxRetries=10, retryInterval=1000 (ms)

If we have 20 threads sending requests to site SFO (which is down), then 
we might have numRetries=20 after 10 seconds, and perhaps numRetries=60 
after 50 seconds. But only once 60 seconds have elapsed do we take SFO 
offline.
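
A minimal Java sketch of that rule, to make the example concrete. The
class and method names (SiteOfflinePolicy, recordFailure,
shouldTakeOffline) are hypothetical and not existing Infinispan or
JGroups API. It also assumes that min-time is measured from the first
failure, and that a successful request resets the state. The clock is
passed in so the behaviour can be checked without sleeping.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch only; not Infinispan's actual implementation. */
final class SiteOfflinePolicy {
    private final int maxRetries;        // failures needed before going offline
    private final long minTimeMillis;    // minimum time since the first failure
    private final LongSupplier clock;    // millisecond clock, injectable for tests

    private final AtomicInteger numRetries = new AtomicInteger();
    // Timestamp of the first failure in the current streak; -1 means none yet.
    private final AtomicLong firstFailureMillis = new AtomicLong(-1);

    SiteOfflinePolicy(int maxRetries, long minTimeMillis, LongSupplier clock) {
        this.maxRetries = maxRetries;
        this.minTimeMillis = minTimeMillis;
        this.clock = clock;
    }

    /** Called by any sender thread that got SITE-UNREACHABLE. */
    void recordFailure() {
        // Only the first failure of a streak sets the start time.
        firstFailureMillis.compareAndSet(-1, clock.getAsLong());
        numRetries.incrementAndGet();
    }

    /** Assumption: a successful request clears the failure streak. */
    void recordSuccess() {
        numRetries.set(0);
        firstFailureMillis.set(-1);
    }

    /** True only if both the retry count and min-time have been reached. */
    boolean shouldTakeOffline() {
        long first = firstFailureMillis.get();
        if (first < 0) {
            return false; // no failures recorded
        }
        boolean enoughRetries = numRetries.get() >= maxRetries;
        boolean enoughTime = clock.getAsLong() - first >= minTimeMillis;
        return enoughRetries && enoughTime;
    }
}
```

With new SiteOfflinePolicy(10, 60000, System::currentTimeMillis), 20
threads can push numRetries well past 10 within seconds, but
shouldTakeOffline() stays false until 60 seconds after the first
failure. The reset in recordSuccess() is not atomic with respect to
concurrent failures; a real implementation would need to decide how
strict that has to be.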

The main reason for min-time would be to prevent taking a site offline
during the short period of time when the site master changes and
multiple threads increment numRetries in short order.

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

