[
https://issues.redhat.com/browse/ISPN-11176?page=com.atlassian.jira.plugi...
]
Dan Berindei commented on ISPN-11176:
-------------------------------------
I have an alternative proposal that I've been mulling over the last few days.
*TLDR;* Pre-expire entries at time {{last access + max-idle timeout}}, actually expire
them at time {{last access + max-idle timeout + max-idle delay}}, or as soon as we know
all other sites have also pre-expired the entry. Reads of pre-expired entries do not keep
the entries alive.
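The timeline above can be sketched as a small state function. This is only an illustration: the class, enum and parameter names are invented here and are not Infinispan API, and treating the boundary instants as inclusive is an assumption.

```java
// Hypothetical sketch of the pre-expire / expire timeline; not Infinispan API.
final class MaxIdleState {
    enum State { LIVE, PRE_EXPIRED, EXPIRED }

    /**
     * Classifies an entry at time {@code now} (all times in milliseconds).
     * allSitesPreExpired stands for "we know all other sites have also
     * pre-expired the entry"; how that is learned is out of scope here.
     */
    static State classify(long now, long lastAccess, long maxIdleMs,
                          long maxIdleDelayMs, boolean allSitesPreExpired) {
        long preExpireAt = lastAccess + maxIdleMs;
        if (now < preExpireAt) {
            return State.LIVE;
        }
        long expireAt = preExpireAt + maxIdleDelayMs;
        // Assumption: the entry counts as expired exactly at expireAt.
        if (allSitesPreExpired || now >= expireAt) {
            return State.EXPIRED;
        }
        return State.PRE_EXPIRED;
    }

    /** A read refreshes the access time only while the entry is still live. */
    static long lastAccessAfterRead(long now, long lastAccess, long maxIdleMs,
                                    long maxIdleDelayMs, boolean allSitesPreExpired) {
        State state = classify(now, lastAccess, maxIdleMs, maxIdleDelayMs, allSitesPreExpired);
        return state == State.LIVE ? now : lastAccess;
    }
}
```

With a max-idle timeout of 1000 ms and a max-idle delay of 100 ms, an entry last accessed at time 0 would be live until 999, pre-expired from 1000, and expired from 1100 (or earlier, once all other sites are known to have pre-expired it).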
How about, instead of sending touch commands based on the time the entry was last read, we
send them based on the time the entry is supposed to expire?
The purge expired entries task would scan not just for entries that are already expired,
but also for entries that are expired based on the last positive-touch command sent to the
remote sites. Then it's going to send batches of *-touch commands to the remote sites
for all the entries in the 2nd set (positive-touch for the ones that have newer accesses
locally and negative-touch for the ones that don't).
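A rough sketch of how the purge task could pick the second set and choose between the two touch types. All types and field names are hypothetical, and the sketch follows the description literally: an entry gets a positive-touch whenever its local access is newer than the last one sent, without checking whether it has also pre-expired locally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the purge-task scan; none of these types exist in Infinispan.
final class TouchBatcher {
    enum TouchType { POSITIVE, NEGATIVE }

    /** Per-entry timestamps (ms) tracked by the local site. */
    static final class EntryTimes {
        final String key;
        final long lastAccess;    // last local access
        final long lastSentTouch; // access time carried by the last positive-touch sent

        EntryTimes(String key, long lastAccess, long lastSentTouch) {
            this.key = key;
            this.lastAccess = lastAccess;
            this.lastSentTouch = lastSentTouch;
        }
    }

    static final class Touch {
        final String key;
        final TouchType type;

        Touch(String key, TouchType type) {
            this.key = key;
            this.type = type;
        }
    }

    /** Builds one batch for the entries the remote sites would consider expired. */
    static List<Touch> buildBatch(List<EntryTimes> entries, long now, long maxIdleMs) {
        List<Touch> batch = new ArrayList<>();
        for (EntryTimes e : entries) {
            boolean expiredForRemotes = now >= e.lastSentTouch + maxIdleMs;
            if (!expiredForRemotes) {
                continue; // remote sites still consider this entry live
            }
            boolean newerLocalAccess = e.lastAccess > e.lastSentTouch;
            batch.add(new Touch(e.key, newerLocalAccess ? TouchType.POSITIVE : TouchType.NEGATIVE));
        }
        return batch;
    }
}
```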
When a read wants to remove an expired entry, it will first check whether the other sites
sent a *-touch command for that entry. If there is a negative touch from all backups, or if
the entry should have expired more than a {{max-idle delay}} (e.g. 1 min) ago, then the
entry is considered truly expired and removed in the local cluster.
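The read-time check could look roughly like this. The names are hypothetical, sites are identified by plain strings for simplicity, and the sketch assumes that a received positive-touch has already been folded into {{lastAccess}}.

```java
import java.util.Set;

// Hypothetical sketch of the read-time expiration check; not Infinispan API.
final class ExpirationCheck {
    /**
     * Returns true when a read may remove the entry from the local cluster.
     * sitesWithNegativeTouch holds the backup sites whose latest touch for
     * this entry was negative (all times in milliseconds).
     */
    static boolean isTrulyExpired(long now, long lastAccess, long maxIdleMs,
                                  long maxIdleDelayMs, Set<String> backupSites,
                                  Set<String> sitesWithNegativeTouch) {
        long preExpireAt = lastAccess + maxIdleMs;
        if (now < preExpireAt) {
            return false; // not even pre-expired yet
        }
        // Assumption: the boundary instant itself counts as "delay elapsed".
        if (now >= preExpireAt + maxIdleDelayMs) {
            return true; // waited long enough, no confirmation needed
        }
        // With no backup sites this is vacuously true, so the entry
        // expires as soon as it pre-expires.
        return sitesWithNegativeTouch.containsAll(backupSites);
    }
}
```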
Since there is no synchronous communication with the remote sites, reads that happen after
the entry should have expired (or maybe just after the local site sent the negative-touch
command) must not extend the lifespan of the entry.
In case site 1's positive-touch command did take more than {{max-idle delay}} to reach
site 2, and site 2 has already removed the entry, site 2 will send back (through IRAC) a
remove-expired command to force site 1 to remove the entry.
In order to remove the likelihood of this happening, we can send the positive-touch
{{max-idle delay}} before the time the entry would expire based on the last sent
positive-touch command. This requires the max-idle timeout to be at least twice as big as
{{max-idle delay}}, or it wouldn't be very efficient, but it seems like a reasonable
limitation.
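As a sketch of the arithmetic, with hypothetical names and all times in milliseconds: the next positive-touch goes out {{max-idle delay}} before the remote sites would pre-expire the entry, and the "at least twice as big" rule becomes a simple configuration check.

```java
// Hypothetical helper for scheduling positive-touch commands; not Infinispan API.
final class TouchSchedule {
    /** Time at which the next positive-touch should be sent. */
    static long nextPositiveTouchAt(long lastSentTouch, long maxIdleMs, long maxIdleDelayMs) {
        return lastSentTouch + maxIdleMs - maxIdleDelayMs;
    }

    /** The proposed limitation: max-idle timeout at least twice the max-idle delay. */
    static boolean isDelayReasonable(long maxIdleMs, long maxIdleDelayMs) {
        return maxIdleMs >= 2 * maxIdleDelayMs;
    }
}
```

For example, with a 10 min max-idle timeout and a 1 min max-idle delay, an entry whose last positive-touch carried time 0 would be touched again at 9 min.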
The advantage over option 2 is that for entries that are read often, we only send a remote
touch command once per {{max-idle timeout - max-idle delay}}.
The advantage over option 1 is that all the x-site RPCs are asynchronous, so all
read operations are fast.
The main disadvantage over option 1 is that we can have reads that see a value but
don't keep it alive. But IMO it matches how IRAC would handle the application updating
a value in site 1 (in order to "touch" it) and removing the same value in site 2
(because the application considers it expired).
Another disadvantage over option 1 is that we would need to send *-touch commands from all
owners. If we rely on the backup owners sending positive-touch commands only when they become
primary owner, that might be too late.
The disadvantage over option 2 is that, just like option 1, we can't have a purely
active-backup relationship between sites any more. Here, each site has to know which other
sites are active and keep per-entry metadata about which sites sent *-touch commands, so
the take-offline behaviour may need tweaking.
Just like option 2, when a remote site is inaccessible but not yet offline, *-touch
commands may be lost and max-idle entries may expire prematurely. Option 1 would make
reads on entries that should expire time out instead.
When a remote site is brought back online, it would need x-site state transfer to include
the timestamp of the last access and the timestamp of the last sent *-touch command,
so that the lifetime of the transferred entries is kept in sync. This also
requires clock skew between all machines in all sites to be smaller than {{max-idle
delay}}, but I think it's a reasonable assumption, and we could log a warning when it is
not satisfied. I think this is a middle ground between option 1 (more tolerant, at most
delaying the expiration with the clock skew) and option 2 (less tolerant, possibly
expiring entries on just one site).
XSite Max Idle
--------------
Key: ISPN-11176
URL:
https://issues.redhat.com/browse/ISPN-11176
Project: Infinispan
Issue Type: Enhancement
Components: Cross-Site Replication, Expiration
Reporter: Will Burns
Assignee: Will Burns
Priority: Major
Fix For: 12.0.0.Final
Max idle expiration currently doesn't work with xsite. For example, if an entry is
written and replicated to both sites, and only one site ever reads the value, then a
later read from the other site will find the entry expired (assuming the max idle time
has elapsed).
There are a few ways we can do this.
1. Keep access times local to every site. When a site finds that an entry is expired, it asks
the other site(s) whether they have a more recent access. If a site is known to have gone
down we should touch all entries, since they may not have updated access times. Requires very
little additional xsite communication.
2. Batch touch commands and only send them every so often. Has a window of loss, but it
should be small. Requires more site usage. Wouldn't work for really low max idle times, as an
entry could expire before the touch command is replicated.