[
https://issues.redhat.com/browse/ISPN-11176?page=com.atlassian.jira.plugi...
]
Will Burns commented on ISPN-11176:
-----------------------------------
{quote}
Does the consistency loss mean a write could be undone, or is it something else that's
maybe more palatable to users?
{quote}
The problem arises if the node that served the access goes down before replicating the
touch. In that case the entry can expire early.
{quote}
I agree sending a touch command for all non-expired entries would be very expensive,
especially for x-site where the user would make some estimations before deployment of how
much bandwidth Infinispan would require, and having to send+process lots of max-idle
commands would increase the latency of all the other x-site commands going through the
same site masters.
One thing that's not clear to me is when you say "an entire site is lost",
are you talking about a site being taken offline? Because a site can easily disappear from
the bridge cluster's view for a while just because its site master crashed.
OTOH the take offline policies are maybe too lenient (although I haven't checked if
Pedro made any recent changes w/ IRAC), so a sync x-site RPC is likely to time out before
the site is taken offline.
Edit: Forgot to mention one more way a site can become unavailable: if it splits and the
cache is configured with when-split="DENY_READ_WRITES". Unless IRAC doesn't
allow that configuration anyway?
{quote}
I thought I remembered hearing that all nodes can be site masters now? I really don't
know much about the xsite/IRAC details, though.
But if a single node failure can cause a site to go offline, that would not be good.
XSite Max Idle
--------------
Key: ISPN-11176
URL: https://issues.redhat.com/browse/ISPN-11176
Project: Infinispan
Issue Type: Enhancement
Components: Cross-Site Replication, Expiration
Reporter: Will Burns
Assignee: Will Burns
Priority: Major
Fix For: 12.0.0.Final
Max idle expiration currently doesn't work with xsite. Consider an entry that was
written and replicated to both sites, where one site keeps reading the value and the
other never does. If the site that never read it then needs the value, it will find the
entry expired locally (assuming the max idle time has elapsed), even though the entry is
still being accessed at the other site.
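To make the scenario concrete, here is a minimal sketch of two sites that each track access times locally. The SiteModel class and its methods are hypothetical and only model the behaviour described above; they are not Infinispan APIs.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of one site that tracks max-idle access times locally. */
class SiteModel {
    private final long maxIdleMillis;
    // key -> time of the last local read or write, in milliseconds
    private final Map<String, Long> lastAccess = new HashMap<>();

    SiteModel(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /** A write is replicated to every site, so each site records it. */
    void write(String key, long now) {
        lastAccess.put(key, now);
    }

    /** A read only refreshes the access time on the site that served it. */
    boolean read(String key, long now) {
        if (isExpired(key, now)) {
            lastAccess.remove(key);
            return false;
        }
        lastAccess.put(key, now);
        return true;
    }

    boolean isExpired(String key, long now) {
        Long last = lastAccess.get(key);
        return last == null || now - last >= maxIdleMillis;
    }
}
```

With a max idle of 1000 ms, write "k" to two SiteModel instances at time 0, then read it on the first at 600 and 1200. Both reads succeed there, but a read on the second instance at 1200 fails because that site never saw the reads.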
There are a few ways we can implement this.
1. Keep access times local to every site. When a site finds that an entry is expired, it
asks the other site(s) whether they have a more recent access. If a site is known to have
gone down, we should touch all entries, since they may not have up-to-date access times.
Requires very little additional xsite communication.
2. Batch touch commands and only send them every so often. Has a window of loss, but it
should be small. Requires more xsite communication than option 1. Wouldn't work for
really low max idle times, as an entry could expire before the touch command is
replicated.
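Option 1 could look roughly like the sketch below. RemoteSite, LocalMaxIdleSite and all of their methods are hypothetical names made up for illustration; this is not how Infinispan implements it, and a real version would issue an xsite RPC where the sketch calls lastAccess directly.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical view of another site; not an Infinispan interface. */
interface RemoteSite {
    /** Last access time the site has recorded for the key, if it holds the key. */
    OptionalLong lastAccess(String key);
}

/** Hypothetical site that keeps access times local (option 1). */
class LocalMaxIdleSite implements RemoteSite {
    private final long maxIdleMillis;
    private final Map<String, Long> accessTimes = new ConcurrentHashMap<>();

    LocalMaxIdleSite(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Records a local read or write; nothing is sent to other sites. */
    void touch(String key, long now) {
        accessTimes.put(key, now);
    }

    @Override
    public OptionalLong lastAccess(String key) {
        Long t = accessTimes.get(key);
        return t == null ? OptionalLong.empty() : OptionalLong.of(t);
    }

    /** Returns true if the entry is still live, asking other sites only when it looks expired locally. */
    boolean isLive(String key, long now, List<? extends RemoteSite> others) {
        Long local = accessTimes.get(key);
        if (local == null) {
            return false;
        }
        if (now - local < maxIdleMillis) {
            return true; // common case: no cross-site traffic at all
        }
        // Locally expired: ask every other site for a more recent access.
        long newest = local;
        for (RemoteSite site : others) {
            OptionalLong remote = site.lastAccess(key);
            if (remote.isPresent()) {
                newest = Math.max(newest, remote.getAsLong());
            }
        }
        if (now - newest < maxIdleMillis) {
            accessTimes.put(key, newest); // adopt the more recent remote access time
            return true;
        }
        accessTimes.remove(key);
        return false;
    }

    /** When another site is known to have gone down, refresh every entry. */
    void touchAll(long now) {
        accessTimes.replaceAll((key, old) -> now);
    }
}
```

The cross-site lookup only happens on the expiration path, which is why this option needs very little additional xsite communication.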