[infinispan-issues] [JBoss JIRA] (ISPN-11176) XSite Max Idle

Mon Aug 10 07:02:00 EDT 2020

    [ https://issues.redhat.com/browse/ISPN-11176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375488#comment-14375488 ] 

Dan Berindei commented on ISPN-11176:
-------------------------------------

I have an alternative proposal that I've been mulling over the last few days.

*TLDR;* Pre-expire entries at time {{last access + max-idle timeout}}, actually expire them at time {{last access + max-idle timeout + max-idle delay}}, or as soon as we know all other sites have also pre-expired the entry. Reads of pre-expired entries do not keep the entries alive.
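
To make the timeline concrete, here is a minimal sketch of the three states an entry would go through under this proposal. All names ({{MaxIdleTimeline}}, {{stateAt}}, {{allOtherSitesPreExpired}}) are hypothetical and not part of any Infinispan API; the sketch only encodes the two deadlines described above.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration only; not Infinispan code.
final class MaxIdleTimeline {
    enum State { LIVE, PRE_EXPIRED, EXPIRED }

    static State stateAt(Instant now,
                         Instant lastAccess,
                         Duration maxIdleTimeout,
                         Duration maxIdleDelay,
                         boolean allOtherSitesPreExpired) {
        // Pre-expiration deadline: last access + max-idle timeout.
        Instant preExpireAt = lastAccess.plus(maxIdleTimeout);
        // Hard deadline: pre-expiration deadline + max-idle delay.
        Instant expireAt = preExpireAt.plus(maxIdleDelay);

        if (now.isBefore(preExpireAt)) {
            return State.LIVE;
        }
        // Expire early once every other site is known to have pre-expired
        // the entry, otherwise wait for the hard deadline.
        if (allOtherSitesPreExpired || !now.isBefore(expireAt)) {
            return State.EXPIRED;
        }
        // Still readable, but reads no longer keep the entry alive.
        return State.PRE_EXPIRED;
    }
}
```

With a 10 minute timeout and a 1 minute delay, an entry last read at {{t0}} would be pre-expired from {{t0 + 10 min}} and removed at {{t0 + 11 min}} at the latest.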

How about, instead of sending touch commands based on the time the entry was last read, we send them based on the time the entry is supposed to expire?

The purge expired entries task would scan not just for entries that are already expired, but also for entries that are expired based on the last positive-touch command sent to the remote sites. Then it's going to send batches of touch+ commands to the remote sites for all the entries in the 2nd set (positive-touch for the ones that have newer accesses locally and negative-touch for the ones that don't).
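
As a rough sketch of how the purge task could pick the touch type for one entry: the class and method names below are made up for illustration, and I am assuming that "has newer accesses locally" means the entry is still within its max-idle timeout according to the local last-access time.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical classification step of the purge task; not Infinispan code.
final class TouchClassifier {
    enum Touch { NONE, POSITIVE, NEGATIVE }

    static Touch classify(Instant now,
                          Instant lastAccess,
                          Instant lastSentPositiveTouch,
                          Duration maxIdleTimeout) {
        // Remote sites still consider the entry alive based on the last
        // positive-touch we sent them, so there is nothing to send yet.
        if (now.isBefore(lastSentPositiveTouch.plus(maxIdleTimeout))) {
            return Touch.NONE;
        }
        // Assumption: a "newer local access" is one that keeps the entry
        // within its max-idle timeout right now.
        boolean aliveLocally = now.isBefore(lastAccess.plus(maxIdleTimeout));
        return aliveLocally ? Touch.POSITIVE : Touch.NEGATIVE;
    }
}
```

The task would then group the non-{{NONE}} results into batches per remote site.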

When a read wants to remove an expired entry, it will first check whether the other sites sent a touch+ command for that entry. If there is a negative touch from all backups, or if the entry should have expired more than a {{max-idle delay}} (e.g. 1 min) ago, then the entry is considered truly expired and removed in the local cluster.
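
The removal check on the read path could look roughly like the sketch below. The site names, the {{negativeTouchFrom}} set and the {{trulyExpired}} method are assumptions made for the example; the proposal does not say how the per-entry touch metadata would be stored.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Set;

// Hypothetical read-path check; not Infinispan code.
final class ExpirationCheck {
    static boolean trulyExpired(Instant now,
                                Instant shouldHaveExpiredAt,
                                Duration maxIdleDelay,
                                Set<String> backupSites,
                                Set<String> negativeTouchFrom) {
        // Not even pre-expired yet: nothing to remove.
        if (now.isBefore(shouldHaveExpiredAt)) {
            return false;
        }
        // Every backup site told us it has no newer access either.
        boolean allNegative = negativeTouchFrom.containsAll(backupSites);
        // Or we waited more than max-idle delay past the expiration time.
        boolean delayElapsed = now.isAfter(shouldHaveExpiredAt.plus(maxIdleDelay));
        return allNegative || delayElapsed;
    }
}
```

When this returns {{false}} for a pre-expired entry, the read still sees the value but does not update the last-access time.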

Since there is no synchronous communication with the remote sites, reads that happen after the entry should have expired (or maybe just after the local site sent the negative-touch command) must not extend the lifespan of the entry.

If site 1's positive-touch command takes more than {{max-idle delay}} to reach site 2, and site 2 has already removed the entry, site 2 will send back (through IRAC) a remove-expired command to force site 1 to remove the entry.

In order to reduce the likelihood of this happening, we can send the positive-touch {{max-idle delay}} before the time the entry would expire based on the last sent positive-touch command. This requires the max-idle timeout to be at least twice as big as {{max-idle delay}}, or it wouldn't be very efficient, but it seems like a reasonable limitation.
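
A small sketch of the resulting schedule, with hypothetical names ({{TouchSchedule}}, {{nextPositiveTouchAt}}, {{validate}}). Rejecting the configuration outright when the timeout is less than twice the delay is my assumption for the example; it could just as well be a warning.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical scheduling helper; not Infinispan code.
final class TouchSchedule {
    // Assumption: enforce max-idle timeout >= 2 * max-idle delay at
    // configuration time instead of only logging a warning.
    static void validate(Duration maxIdleTimeout, Duration maxIdleDelay) {
        if (maxIdleTimeout.compareTo(maxIdleDelay.multipliedBy(2)) < 0) {
            throw new IllegalArgumentException(
                "max-idle timeout must be at least twice max-idle delay");
        }
    }

    // Send the next positive-touch max-idle delay before the entry would
    // expire according to the last positive-touch sent to the remote sites.
    static Instant nextPositiveTouchAt(Instant lastSentPositiveTouch,
                                       Duration maxIdleTimeout,
                                       Duration maxIdleDelay) {
        return lastSentPositiveTouch.plus(maxIdleTimeout).minus(maxIdleDelay);
    }
}
```

For an entry that is read continuously, this yields one remote touch every {{max-idle timeout - max-idle delay}}.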

The advantage over option 2 is that for entries that are read often, we only send a remote touch command once per {{max-idle timeout - max-idle delay}}.

The advantage over option 1 is that all the x-site RPCs are asynchronous, so all read operations are fast.

The main disadvantage over option 1 is that we can have reads that see a value but don't keep it alive. But IMO it matches how IRAC would handle the application updating a value in site 1 (in order to "touch" it) and removing the same value in site 2 (because the application considers it expired).

Another disadvantage over option 1 is that we would need to send *-touch commands from all owners: if we rely on the backup owners sending positive-touch commands only once they become primary owner, that might be too late.

The disadvantage over option 2 is that, just like option 1, we can't have a purely active-backup relationship between sites any more. Here, each site has to know which other sites are active and keep per-entry metadata about which sites sent *-touch commands, so the take-offline behaviour may need tweaking.

Just like option 2, when a remote site is inaccessible but not yet offline, *-touch commands may be lost and max-idle entries may expire prematurely. Option 1 would make reads on entries that should expire time out instead.

When a remote site is brought back online, x-site state transfer would need to include the timestamp of the last access and the timestamp of the last sent *-touch command, so that the lifetime of the transferred entries is kept in sync. This also requires clock skew between all machines in all sites to be smaller than {{max-idle delay}}, but I think it's a reasonable assumption, and we could warn when it is not satisfied. I think this is a middle ground between option 1 (more tolerant, at most delaying the expiration by the clock skew) and option 2 (less tolerant, possibly expiring entries on just one site).

> XSite Max Idle
> --------------
>
>                 Key: ISPN-11176
>                 URL: https://issues.redhat.com/browse/ISPN-11176
>             Project: Infinispan
>          Issue Type: Enhancement
>          Components: Cross-Site Replication, Expiration
>            Reporter: Will Burns
>            Assignee: Will Burns
>            Priority: Major
>             Fix For: 12.0.0.Final
>
>
> Max idle expiration currently doesn't work with xsite. That is, if an entry is written and replicated to both sites, and one site reads the value while the other never does, the entry expires on the site that never reads it. If that value then needs to be read from that site, it will be expired (assuming the max idle time has elapsed).
> There are a few ways we can do this.
> 1. Keep access times local to every site. When a site finds an entry is expired it asks the other site(s) if it has a more recent access. If a site is known to have gone down we should touch all entries, since they may not have updated access times. Requires very little additional xsite communication.
> 2. Batch touch commands and only send every so often. Has window of loss, but should be small. Requires more site usage. Wouldn't work for really low max idle times as an entry could expire before the touch command is replicated.
