[
https://issues.redhat.com/browse/ISPN-11176?page=com.atlassian.jira.plugi...
]
Will Burns commented on ISPN-11176:
-----------------------------------
To further expand upon choice #1, here is some more detailed info.
XSite Max Idle is a feature that builds upon the max idle feature introduced in
https://issues.redhat.com/browse/ISPN-11020 but extends this to also provide max idle
across multiple sites.
We assume that clustered max idle will invoke the remove expired command when a key is
found to be expired on its local node and that any such command is valid when invoked.
When a node finds that an entry has expired via max idle, it does not immediately
replicate the remove expired command in the local site. Instead it first sends a
synchronous xsite message to all backups, asking whether they have a more recent access
for the given key.
If any site has a more recent access it will respond that the entry still has a valid
access. The remote node will also send a touch command to any other node in its local site
to update its access time (this can be done asynchronously). The originating node will
receive the response, send a touch command updating all of its local nodes, and return
the value to the user.
If all sites respond that the key hasn't been accessed recently,
the original node will add the remove expired command to the IRAC pending queue and let
the command remove the entry from the local site as normal, returning null to the user. The
remove expired command will not increment the version and will instead use the current
version, allowing any conflicting write to overwrite it.
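The decision the originating node makes once all backup sites have responded can be
sketched roughly as below. This is only an illustration: the class and method names are
hypothetical, and it assumes each site answers with a last access time > 0 when it has a
valid access and with a non-positive value (-1, 0 etc.) when it also considers the key
expired.

```java
import java.util.List;

// Hypothetical sketch, not Infinispan code: what the originating node does
// after collecting one "check last access" response per backup site.
final class XSiteMaxIdleDecision {
    enum Outcome { TOUCH_LOCAL_OWNERS, REMOVE_EXPIRED }

    /**
     * @param remoteResponses one value per backup site; assumed to be > 0 when
     *                        that site has a valid access, and <= 0 when the
     *                        site also considers the key expired
     */
    static Outcome decide(List<Long> remoteResponses) {
        for (long response : remoteResponses) {
            if (response > 0) {
                // At least one site accessed the key recently: keep the entry
                // and touch the local owners.
                return Outcome.TOUCH_LOCAL_OWNERS;
            }
        }
        // No site has a recent access: queue the remove expired with IRAC and
        // remove the entry locally.
        return Outcome.REMOVE_EXPIRED;
    }
}
```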
For the following use cases we assume this site topology for a given key k:
Site 1:
Node A (primary owner)
Node B (backup owner)
Node C
Site 2:
Node D (primary owner)
Node E (backup owner)
Node F
Use Case 1: Read of non expired entry
NodeB reads the entry and finds it is not expired. NodeB synchronously sends a touch
command to Node A (any other owner of the same key in the same site). Once the touch
command is complete the non-null value is returned to the user.
Use Case 2: Read of expired entry w/ no concurrent access and expired remote site
NodeB reads the entry and finds it is expired. It then invokes a remove expired command.
Before replicating or removing the entry locally, NodeB sends an xsite "check last
access" command to Site 2, which then ensures the command is run on any owner. In this
case an owner (Node D or Node E) says the entry is expired and returns a value to
symbolize this (-1, 0 etc.), which is received by NodeB. NodeB then registers the remove
expired with the IRAC replication queue. NodeB then processes the remove expired as
normal in the local site, removing the value and returning null to the user once
complete. IRAC then removes the entry from the remote site asynchronously.
Use Case 3: Read of expired entry w/ no concurrent access and not expired remote site
NodeB reads the entry and finds it is expired. It then invokes a remove expired command.
Before replicating or removing the entry locally, NodeB sends an xsite "check last
access" command to Site 2, which then ensures the command is run on any owner. In this
case an owner, Node E, says the entry is not expired. It sends a touch command to the
other owner in its site, Node D, and returns a value > 0, which is received by NodeB.
NodeB then broadcasts a touch command to all owners in its local cluster (Node A) with the
updated access time and, once completed, returns the non-null value to the caller.
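The remote side of Use Cases 2 and 3 might look something like the following. It is a
minimal sketch under stated assumptions: the names are hypothetical, -1 is picked as the
"expired" marker from the candidates mentioned above, the valid response is the last
access timestamp itself (assumed to be a positive wall-clock value), and an entry is
treated as expired once a full max idle period has elapsed. The touch command to the
other owner in the remote site is left out.

```java
// Hypothetical sketch, not Infinispan code: how an owner in the remote site
// could answer the xsite "check last access" command for one key.
final class CheckLastAccessHandler {
    /** Assumed marker meaning "this site also considers the key expired". */
    static final long EXPIRED = -1L;

    /**
     * @param lastAccessMillis last access time of the entry on this owner (> 0)
     * @param maxIdleMillis    configured max idle for the entry
     * @param nowMillis        current time on this owner
     * @return the last access time if the entry is still valid, else EXPIRED
     */
    static long checkLastAccess(long lastAccessMillis, long maxIdleMillis, long nowMillis) {
        // Assumption: the entry counts as expired once maxIdle has fully elapsed.
        boolean expired = nowMillis - lastAccessMillis >= maxIdleMillis;
        return expired ? EXPIRED : lastAccessMillis;
    }
}
```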
Use Case 4: Concurrent reads from same site that are both expired
NodeB and NodeA both read and find the entry is expired. They both send xsite checks and
operate the same as above, just with the work duplicated. If the concurrent read is done
after the touch command is replicated then the work is not duplicated. Note that if the
reads are on the same node only one remove expired command is done, as the later read
waits on the prior one to complete first.
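The same-node behavior in Use Case 4 (a later read waiting on the in-flight remove
expired rather than starting its own) could be sketched as follows. This is an assumed
shape for illustration only; the class, the Boolean result and the supplier are all
hypothetical and do not describe how Infinispan actually tracks pending expirations.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical sketch: one in-flight remove expired per key on a given node.
final class PendingExpirations<K> {
    private final ConcurrentMap<K, CompletableFuture<Boolean>> pending = new ConcurrentHashMap<>();

    /**
     * @param xsiteCheckAndRemove starts the xsite check plus removal and completes
     *                            with true if the entry was removed
     */
    CompletableFuture<Boolean> removeExpired(K key, Supplier<CompletableFuture<Boolean>> xsiteCheckAndRemove) {
        CompletableFuture<Boolean> mine = new CompletableFuture<>();
        CompletableFuture<Boolean> existing = pending.putIfAbsent(key, mine);
        if (existing != null) {
            // A prior remove expired for this key is in flight: wait on it.
            return existing;
        }
        CompletableFuture<Boolean> stage;
        try {
            stage = xsiteCheckAndRemove.get();
        } catch (RuntimeException e) {
            stage = CompletableFuture.failedFuture(e);
        }
        stage.whenComplete((removed, error) -> {
            // Clear the slot first so a later read can start a fresh check.
            pending.remove(key, mine);
            if (error != null) {
                mine.completeExceptionally(error);
            } else {
                mine.complete(removed);
            }
        });
        return mine;
    }
}
```

Reads on different nodes (NodeA and NodeB) do not share this map, which is why their
xsite checks are still duplicated.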
Use Case 5: Concurrent reads from different sites that are both expired
If NodeB and NodeD both read an expired entry they will each ask the other site whether
the entry is expired, causing duplicated xsite messages (assuming the IRAC replication is
not done before the second access).
Use Case 6: Concurrent expired read with write in same site and expired in remote site
In this case the behavior depends on whether the read of the expired value is done on the
primary or not.
If the read is on the primary then the ordering is handled by locking, as the remove
expired and the other write operation are not performed concurrently.
If the read is not on the primary then we may have an issue with the xsite check being
performed concurrently with the write and possibly losing the write. We may have to
promote the xsite check to be done while holding the lock on the primary.
Use Case 7: Concurrent expired read with write in same site and not expired in remote
site
Same as Case 6 above, in that we will most likely need to prevent the xsite max idle check
and the write from running concurrently.
Use Case 8: Concurrent expired read with write in different site and expired in remote
site
The read may see the entry as not expired (if the write was already applied), in which
case it behaves the same as Use Case 3.
However, if it does see it as expired it will add an entry to the IRAC replication queue,
which may conflict with the write. The write should win, though, as it will have a newer
version, since we don't use an incremented version for the remove.
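Why the write wins can be illustrated with a deliberately simplified model. The sketch
below assumes a single numeric version per entry and a "higher version wins" rule; the
real IRAC versioning and conflict handling are more involved, and every name here is
hypothetical.

```java
// Hypothetical, simplified model of the version comparison; not the IRAC implementation.
final class VersionConflict {
    static final class Update {
        final String description;
        final long version;

        Update(String description, long version) {
            this.description = description;
            this.version = version;
        }
    }

    /** Returns the update with the higher version; on a tie the first argument is kept. */
    static Update resolve(Update first, Update second) {
        return second.version > first.version ? second : first;
    }

    public static void main(String[] args) {
        long currentVersion = 7;
        // The remove expired reuses the current version instead of incrementing it.
        Update removeExpired = new Update("remove expired", currentVersion);
        // A conflicting write increments the version.
        Update write = new Update("write", currentVersion + 1);
        System.out.println(resolve(removeExpired, write).description);
    }
}
```

Because the remove expired never gets a version newer than the entry it targets, any
concurrent write compares as newer regardless of the order in which the two are seen.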
Use Case 9: Concurrent expired read with write in different site and not expired in remote
site
Same as Case 8; however, the xsite response will always say it is valid. Do we care that
the value is different though?
Consistency Issues
1. Notification of previous value during a write conflict with IRAC may not be updated for
all nodes in the same fashion.
2. Write operations will not return the non-expired previous value if the entry is expired
in the local site (but not in the other site). Should we do this?
3. Case 6 above shows that we may require the xsite check to be done while holding the
lock on primary. Need to confirm.
4. Case 9 shows that a read that says it is valid may actually be a read for a new value
that isn't yet replicated from the other site. Is that okay?
XSite Max Idle
--------------
Key: ISPN-11176
URL:
https://issues.redhat.com/browse/ISPN-11176
Project: Infinispan
Issue Type: Enhancement
Components: Cross-Site Replication, Expiration
Reporter: Will Burns
Assignee: Will Burns
Priority: Major
Fix For: 12.0.0.Final
Max idle expiration currently doesn't work with xsite. For example, suppose an entry is
written and replicated to both sites, and one site reads the value while the other never
does. If the value then needs to be read from the site that never read it, it will be
expired there (assuming the max idle time has elapsed).
There are a few ways we can do this.
1. Keep access times local to every site. When a site finds an entry is expired it asks
the other site(s) if it has a more recent access. If a site is known to have gone down we
should touch all entries, since they may not have updated access times. Requires very
little additional xsite communication.
2. Batch touch commands and only send every so often. Has window of loss, but should be
small. Requires more site usage. Wouldn't work for really low max idle times as an
entry could expire before the touch command is replicated.
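For comparison, option 2 could be sketched as a small batcher that remembers the most
recent access per key and hands over a batch each time it is drained. The names are
hypothetical and the scheduling and xsite send are left out. Any access recorded after a
drain is not visible to the other site until the next one, which is the window of loss
mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of option 2: collect touches locally, ship them in batches.
final class TouchBatcher<K> {
    private final Map<K, Long> latestAccess = new HashMap<>();

    /** Records an access, keeping only the most recent time per key. */
    synchronized void recordAccess(K key, long accessMillis) {
        latestAccess.merge(key, accessMillis, Long::max);
    }

    /** Returns the pending touches and clears them; the caller would send this batch xsite. */
    synchronized Map<K, Long> drain() {
        Map<K, Long> batch = new HashMap<>(latestAccess);
        latestAccess.clear();
        return batch;
    }
}
```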