[JBoss JIRA] (ISPN-12179) Allow cache to start in degraded mode
by Ryan Emerson (Jira)
[ https://issues.redhat.com/browse/ISPN-12179?page=com.atlassian.jira.plugi... ]
Ryan Emerson resolved ISPN-12179.
---------------------------------
Fix Version/s: (was: 12.0.0.Final)
Resolution: Done
> Allow cache to start in degraded mode
> -------------------------------------
>
> Key: ISPN-12179
> URL: https://issues.redhat.com/browse/ISPN-12179
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 11.0.1.Final, 12.0.0.Dev01
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 12.0.0.Dev02
>
>
> Caches expect to receive initial state from the other nodes during startup ({{await-initial-transfer="true"}} by default).
> There is an exception to this rule when rebalancing is suspended: the cache is able to start on a joiner without receiving any state.
> However, this exception does not apply when a cache is in DEGRADED mode, where rebalancing is implicitly suspended. Instead, a joiner waits until the cache returns to AVAILABLE mode before starting the initial state transfer, or fails to start with the generic {{Initial state transfer timed out for cache %s on %s}} error message.
> We should handle caches in DEGRADED mode the same way we handle caches with rebalancing suspended, and allow joiners to start without waiting.
>
>
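A minimal sketch of the proposed join rule, assuming a simplified model: `Availability`, `JoinPolicy` and `shouldWaitForInitialState` are hypothetical names, not Infinispan classes, and the `awaitInitialTransfer` flag stands in for the `await-initial-transfer` attribute. The only point it illustrates is that DEGRADED mode is treated like explicitly suspended rebalancing.

```java
// Hypothetical model of the join decision discussed in ISPN-12179.
// Availability and JoinPolicy are illustrative names, not Infinispan classes.
enum Availability { AVAILABLE, DEGRADED }

final class JoinPolicy {
    private JoinPolicy() {}

    /**
     * Returns true if a joiner should block until it receives initial state.
     * Proposed rule: DEGRADED mode implicitly suspends rebalancing, so it is
     * handled exactly like a cache whose rebalancing was suspended explicitly.
     */
    static boolean shouldWaitForInitialState(boolean awaitInitialTransfer,
                                             boolean rebalancingEnabled,
                                             Availability availability) {
        if (!awaitInitialTransfer) {
            return false; // configured not to wait for state at all
        }
        boolean rebalancingSuspended =
                !rebalancingEnabled || availability == Availability.DEGRADED;
        // With rebalancing suspended no rebalance starts, so waiting could only
        // end when the cache becomes AVAILABLE again or the transfer times out.
        return !rebalancingSuspended;
    }
}

class JoinPolicyDemo {
    public static void main(String[] args) {
        System.out.println(JoinPolicy.shouldWaitForInitialState(true, true, Availability.AVAILABLE));  // true
        System.out.println(JoinPolicy.shouldWaitForInitialState(true, false, Availability.AVAILABLE)); // false
        System.out.println(JoinPolicy.shouldWaitForInitialState(true, true, Availability.DEGRADED));   // false
    }
}
```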
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
3 years, 9 months
[JBoss JIRA] (ISPN-12195) Make a About page in the console
by Katia Aresti (Jira)
[ https://issues.redhat.com/browse/ISPN-12195?page=com.atlassian.jira.plugi... ]
Katia Aresti updated ISPN-12195:
--------------------------------
Description: Make an About page in the console that displays the running version and links to useful resources (was: Make an About page in the console that displays the running version and links to documentation, GitHub and tutorials)
> Make a About page in the console
> --------------------------------
>
> Key: ISPN-12195
> URL: https://issues.redhat.com/browse/ISPN-12195
> Project: Infinispan
> Issue Type: Enhancement
> Components: Console
> Affects Versions: 11.0.1.Final
> Reporter: Katia Aresti
> Assignee: Katia Aresti
> Priority: Major
>
> Make an About page in the console that displays the running version and links to useful resources
[JBoss JIRA] (ISPN-12195) Make a About page in the console
by Katia Aresti (Jira)
Katia Aresti created ISPN-12195:
-----------------------------------
Summary: Make a About page in the console
Key: ISPN-12195
URL: https://issues.redhat.com/browse/ISPN-12195
Project: Infinispan
Issue Type: Enhancement
Components: Console
Affects Versions: 11.0.1.Final
Reporter: Katia Aresti
Assignee: Katia Aresti
Make an About page in the console that displays the running version and links to documentation, GitHub and tutorials
[JBoss JIRA] (ISPN-11176) XSite Max Idle
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-11176?page=com.atlassian.jira.plugi... ]
Dan Berindei commented on ISPN-11176:
-------------------------------------
{quote}We assume that clustered max idle will invoke the remove expired command when a key is found to be expired on its local node and that any such command is valid when invoked.
[...]
If any site has a more recent access it will send that it is a valid access.
{quote}
I would have liked more details about "valid when invoked" and "valid access", specifically how clock skew and latency between sites/nodes are addressed (or not relevant).
{quote}The remote node will also send a touch command to any other node in its local site to update its access time (this can be done asynchronously).
{quote}
Are there any differences in consistency depending on whether the remote-site touch command is sync or async?
Maybe if the remote node crashes before actually sending the touch command?
{quote}Use Case 1: Read of non expired entry
NodeB reads the entry and finds it is not expired. NodeB synchronously sends a touch command to Node A (any other owner of the same key in the same site). Once the touch command completes, the non-null value is returned to the user.
{quote}
Couldn't this origin-site touch command be async as well, maybe using the IRAC version to avoid touching the wrong value?
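A minimal sketch of how a version-guarded asynchronous touch could behave on the receiving owner, assuming the reader sends the version it observed along with the access time. `VersionedOwner` and its methods are hypothetical, and a plain `long` stands in for the IRAC version, which in Infinispan is a richer type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical owner-side handling of an asynchronous touch. A plain long
// stands in for the IRAC version; real Infinispan versions are richer types.
final class VersionedOwner {
    private static final class Entry {
        final String value;
        final long version;
        final long lastAccess;

        Entry(String value, long version, long lastAccess) {
            this.value = value;
            this.version = version;
            this.lastAccess = lastAccess;
        }
    }

    private final Map<String, Entry> data = new HashMap<>();

    /** A write installs a new version and counts as an access. */
    void write(String key, String value, long version, long now) {
        data.put(key, new Entry(value, version, now));
    }

    /**
     * Applies a touch sent by the reader. Returns false, and changes nothing,
     * if the entry is gone or was overwritten after the reader saw it.
     */
    boolean touch(String key, long versionSeenByReader, long accessTime) {
        Entry e = data.get(key);
        if (e == null || e.version != versionSeenByReader) {
            return false; // stale touch for an old value: ignore it
        }
        // Async touches can arrive out of order; never move the time backwards.
        long newAccess = Math.max(e.lastAccess, accessTime);
        data.put(key, new Entry(e.value, e.version, newAccess));
        return true;
    }

    long lastAccess(String key) {
        Entry e = data.get(key);
        if (e == null) {
            throw new IllegalArgumentException("no entry for " + key);
        }
        return e.lastAccess;
    }
}
```

Because a touch for an overwritten value is simply dropped, the reader would not need to wait for the touch to complete before returning the value. This sketch does not address what happens if the reader crashes before the touch is sent.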
{quote}Use Case 3: Read of expired entry w/ no concurrent access and not expired remote site
NodeB reads the entry and finds it is expired. It then invokes a remove expired command. Before replicating or removing the entry locally, NodeB sends an xsite "check last access" command to Site 2, which ensures the command is run on an owner. In this case the owner, Node E, finds the entry is not expired, so it sends a touch command to the other owner in its site, Node D, and returns a value of 0, which NodeB receives. NodeB then broadcasts a touch command with the updated access time to all owners in its local cluster (NodeA) and, once that completes, returns the non-null value to the caller.
{quote}
Maybe the last local-site touch command from NodeB can be async?
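A minimal, single-threaded sketch of the per-site access-time check from Use Case 3 (option 1 in the issue description), assuming a "check last access" request simply returns the other site's most recent access time. `SiteCache` and its methods are hypothetical; owners within a site, locking, concurrent writes (Cases 6 to 9) and crashes are not modelled, and the expiry rule `now - lastAccess >= maxIdle` is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical single-threaded model of per-site access times. Owners within a
// site, locking, concurrent writes and crashes are not modelled.
final class SiteCache {
    private final long maxIdle;
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> lastAccess = new HashMap<>();
    private final List<SiteCache> otherSites = new ArrayList<>();

    SiteCache(long maxIdle) {
        this.maxIdle = maxIdle;
    }

    void connect(SiteCache other) {
        otherSites.add(other);
    }

    /** Stand-in for a write that xsite replication applies on every site. */
    void write(String key, String value, long now) {
        values.put(key, value);
        lastAccess.put(key, now);
    }

    /** Answers another site's "check last access" request. */
    OptionalLong lastAccessFor(String key) {
        Long t = lastAccess.get(key);
        return t == null ? OptionalLong.empty() : OptionalLong.of(t);
    }

    private boolean expired(long accessTime, long now) {
        return now - accessTime >= maxIdle; // assumed expiry rule
    }

    String read(String key, long now) {
        Long local = lastAccess.get(key);
        if (local == null) {
            return null;
        }
        if (expired(local, now)) {
            // Locally expired: before removing, ask the other sites for a newer access.
            long latest = local;
            for (SiteCache site : otherSites) {
                OptionalLong remote = site.lastAccessFor(key);
                if (remote.isPresent()) {
                    latest = Math.max(latest, remote.getAsLong());
                }
            }
            if (expired(latest, now)) {
                values.remove(key);
                lastAccess.remove(key);
                return null;
            }
        }
        lastAccess.put(key, now); // the local "touch" for this read
        return values.get(key);
    }
}
```

For example, with a max idle of 10, an entry written on both sites at time 0 and read only on site 2 at time 8 is still returned by site 1 at time 15, because site 2 reports the access at time 8.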
{quote}Use Case 6: Concurrent expired read with write in same site and expired in remote site
If the read is not on the primary, the xsite check may run concurrently with the write, possibly losing the write.
{quote}
I thought this was the scenario that the IRAC version was supposed to solve?
Actually I'm not sure how this works with the non-x-site clustered max-idle, either.
{quote}We may have to promote the xsite check to be done while holding the lock on primary.
{quote}
Or maybe only the primary should ever send remove expired commands, and if the topology changes it should not retry?
{quote}Use Case 9: Concurrent expired read with write in different site and not expired in remote site
Same as Case 8 however the xsite response will always say it is valid. Do we care that the value is different though?
{quote}
If you mean that we say the old value has not expired just because there's now a new value, then yes, it sounds like a consistency problem.
I'm not sure what the IRAC consistency guarantees are though, maybe something similar can happen even without max-idle.
{quote}Consistency Issues
1. Notification of previous value during a write conflict with IRAC may not be updated for all nodes in the same fashion.
{quote}
Not sure what you mean here, are you talking about listeners/cluster listeners/client listeners w/ event factory/query?
{quote}2. Write operations will not return the non-expired previous value if the entry is expired in the local site (but not in the other site). Should we do this?
{quote}
I would say yes, we do it for cluster max-idle.
The previous value is also needed for functional commands, both those in FunctionalMap and compute/merge/etc.
{quote}3. Case 6 above shows that we may require the xsite check to be done while holding the lock on primary. Need to confirm.
{quote}
Probably yes.
{quote}4. Case 9 shows that a read that says it is valid may actually be a read for a new value that isn't yet replicated from the other site. Is that okay?
{quote}
Probably no.
Nodes crashing during any of these scenarios will complicate things, but I haven't thought really hard about it (even though I think it might be a problem when it comes to the touch command being async).
> XSite Max Idle
> --------------
>
> Key: ISPN-11176
> URL: https://issues.redhat.com/browse/ISPN-11176
> Project: Infinispan
> Issue Type: Enhancement
> Components: Cross-Site Replication, Expiration
> Reporter: Will Burns
> Assignee: Will Burns
> Priority: Major
> Fix For: 12.0.0.Final
>
>
> Max idle expiration currently doesn't work with xsite. For example, suppose an entry is written and replicated to both sites, but only one site reads it. If clients later read the entry from the other site, it will be expired there (assuming the max idle time has elapsed), even though it was recently accessed on the first site.
> There are a few ways we can do this.
> 1. Keep access times local to every site. When a site finds an entry is expired it asks the other site(s) if it has a more recent access. If a site is known to have gone down we should touch all entries, since they may not have updated access times. Requires very little additional xsite communication.
> 2. Batch touch commands and only send them periodically. This has a window of loss, but it should be small. Requires more cross-site traffic. Wouldn't work for very low max idle times, as an entry could expire before the touch command is replicated.
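A minimal sketch of option 2, assuming touches are coalesced per key (keeping only the latest access time) and drained by some periodic task that ships the batch to the other site(s). `TouchBatcher` and its methods are hypothetical; the scheduler and the cross-site sender are not shown, and the interval check ignores replication latency.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of option 2: touches are coalesced per key and sent to
// the other site(s) in periodic batches. The sender and scheduler are not shown.
final class TouchBatcher {
    private final long flushIntervalMillis;
    private Map<String, Long> pending = new HashMap<>();

    TouchBatcher(long flushIntervalMillis) {
        this.flushIntervalMillis = flushIntervalMillis;
    }

    /** Records a local read; only the most recent access time per key is kept. */
    synchronized void recordAccess(String key, long accessTime) {
        pending.merge(key, accessTime, Long::max);
    }

    /**
     * Called by a periodic task: returns the batch to replicate and starts a new one.
     * Anything recorded since the last drain is lost if this node crashes first.
     */
    synchronized Map<String, Long> drain() {
        Map<String, Long> batch = pending;
        pending = new HashMap<>();
        return batch;
    }

    /**
     * Rough check for the low max idle problem: a touch may wait up to one flush
     * interval before it is even sent, so the interval must be shorter than max
     * idle. Replication latency is ignored here.
     */
    boolean flushIntervalFits(long maxIdleMillis) {
        return flushIntervalMillis < maxIdleMillis;
    }
}
```

Coalescing per key bounds the batch size by the number of distinct keys read in an interval rather than by the number of reads, at the cost of the window of loss described above.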