[infinispan-issues] [JBoss JIRA] (ISPN-11176) XSite Max Idle

Will Burns (Jira) issues at jboss.org
Mon Aug 10 13:57:00 EDT 2020


    [ https://issues.redhat.com/browse/ISPN-11176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377693#comment-14377693 ] 

Will Burns commented on ISPN-11176:
-----------------------------------

{quote}
Does the consistency loss mean a write could be undone, or is it something else that's maybe more palatable to users?
{quote}
The problem arises if the node that handled the access goes down before replicating the touch. In that case the entry can expire early.


{quote}
I agree sending a touch command for all non-expired entries would be very expensive, especially for x-site where the user would make some estimations before deployment of how much bandwidth Infinispan would require, and having to send+process lots of max-idle commands would increase the latency of all the other x-site commands going through the same site masters.

One thing that's not clear to me is when you say "an entire site is lost", are you talking about a site being taken offline? Because a site can easily disappear from the bridge cluster's view for a while just because its site master crashed.

OTOH the take offline policies are maybe too lenient (although I haven't checked if Pedro made any recent changes w/ IRAC), so a sync x-site RPC is likely to time out before the site is taken offline.

Edit: Forgot to mention one more way a site can become unavailable: if it splits and the cache is configured with when-split="DENY_READ_WRITES". Unless IRAC doesn't allow that configuration anyway?
{quote}

I thought I remembered hearing that all nodes can be site masters now, or something like that? I really don't know much about the xsite/IRAC details, though.

But if a single node failure can cause a site to go offline, that would not be good.

> XSite Max Idle
> --------------
>
>                 Key: ISPN-11176
>                 URL: https://issues.redhat.com/browse/ISPN-11176
>             Project: Infinispan
>          Issue Type: Enhancement
>          Components: Cross-Site Replication, Expiration
>            Reporter: Will Burns
>            Assignee: Will Burns
>            Priority: Major
>             Fix For: 12.0.0.Final
>
>
> Max idle expiration currently doesn't work with xsite. That is, suppose an entry is written and replicated to both sites, and one site keeps reading the value while the other never does. If the value then has to be read from the site that never read it, it will have expired there (assuming the max idle time has elapsed).
> There are a few ways we can fix this.
> 1. Keep access times local to every site. When a site finds an entry is expired, it asks the other site(s) whether they have a more recent access. If a site is known to have gone down, we should touch all entries, since they may not have up-to-date access times. Requires very little additional xsite communication.
> 2. Batch touch commands and only send them every so often. Has a window of loss, but it should be small. Requires more xsite communication. Wouldn't work for really low max idle times, as an entry could expire before the touch command is replicated.
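
The two options above can be sketched roughly as follows. This is only an illustrative in-memory model, not Infinispan code: the `Site` class, its fields, and the methods `expireIfIdleEverywhere`, `touchAll`, and `flushTouches` are hypothetical names invented for this sketch, and direct method calls stand in for xsite RPCs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class XSiteMaxIdleSketch {

    /** Hypothetical stand-in for one site; not an Infinispan class. */
    static final class Site {
        final String name;
        final long maxIdleMillis;
        final Map<String, Long> lastAccess = new HashMap<>();     // key -> last access known locally
        final Map<String, Long> pendingTouches = new HashMap<>(); // option 2: touches not yet sent
        final List<Site> remotes = new ArrayList<>();

        Site(String name, long maxIdleMillis) {
            this.name = name;
            this.maxIdleMillis = maxIdleMillis;
        }

        /** Models a write that is replicated to every site. */
        void write(String key, long now) {
            lastAccess.put(key, now);
            for (Site r : remotes) r.lastAccess.put(key, now);
        }

        /** A read only updates the local access time (and queues a touch for option 2). */
        void read(String key, long now) {
            if (lastAccess.containsKey(key)) {
                lastAccess.put(key, now);
                pendingTouches.put(key, now);
            }
        }

        boolean isExpiredLocally(String key, long now) {
            Long t = lastAccess.get(key);
            return t == null || now - t >= maxIdleMillis;
        }

        /** Option 1: before expiring, ask the other sites for a more recent access. */
        boolean expireIfIdleEverywhere(String key, long now) {
            Long newest = lastAccess.get(key);
            if (newest == null || now - newest < maxIdleMillis) return false;
            for (Site r : remotes) {
                Long t = r.lastAccess.get(key);
                if (t != null && t > newest) newest = t;
            }
            if (now - newest < maxIdleMillis) {
                lastAccess.put(key, newest); // another site read it recently; keep the entry
                return false;
            }
            lastAccess.remove(key);
            return true;
        }

        /** Option 1: when another site is known to be down, touch every entry. */
        void touchAll(long now) {
            lastAccess.replaceAll((k, v) -> now);
        }

        /** Option 2: periodically send the queued touches to the other sites. */
        void flushTouches() {
            for (Map.Entry<String, Long> e : pendingTouches.entrySet()) {
                long t = e.getValue();
                for (Site r : remotes) {
                    r.lastAccess.computeIfPresent(e.getKey(), (k, old) -> Math.max(old, t));
                }
            }
            pendingTouches.clear();
        }
    }

    public static void main(String[] args) {
        Site lon = new Site("LON", 1000);
        Site nyc = new Site("NYC", 1000);
        lon.remotes.add(nyc);
        nyc.remotes.add(lon);

        lon.write("k", 0);
        lon.read("k", 900); // only LON sees this access
        System.out.println(nyc.isExpiredLocally("k", 1200));       // true
        System.out.println(nyc.expireIfIdleEverywhere("k", 1200)); // false
    }
}
```

In this model, option 2's window of loss is whatever sits in `pendingTouches` when a node dies before `flushTouches` runs, which is the early-expiration case discussed in the comment above.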



--
This message was sent by Atlassian Jira
(v7.13.8#713008)