[Red Hat JIRA] (ISPN-12048) Improve Cross-Site statistics
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-12048?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-12048:
-----------------------------------
Fix Version/s: 12.1.0.Final
(was: 12.0.0.Final)
> Improve Cross-Site statistics
> -----------------------------
>
> Key: ISPN-12048
> URL: https://issues.redhat.com/browse/ISPN-12048
> Project: Infinispan
> Issue Type: Enhancement
> Affects Versions: 11.0.0.Final
> Reporter: Pedro Ruivo
> Assignee: Pedro Ruivo
> Priority: Major
> Fix For: 12.1.0.Final
>
>
> Improve the cross-site statistics by adding statistics per cache/site pair. Example (see the sketch after this list):
> * Sender site
> ** Cache_A
> *** Site_b (min/avg/max rtt, nr_requests_sent)
> ** Cache_B
> *** Site_b (min/avg/max rtt, nr_requests_sent)
> *** Site_c (min/avg/max rtt, nr_requests_sent)
> * Receiver site
> ** Cache_C
> *** Site_a (nr_requests_received)
> *** Site_c (nr_requests_received)
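> A minimal Java sketch of what such a per-cache/site statistics holder could look like (class and field names are hypothetical, not the actual Infinispan API):
> {code:java}
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
>
> // Sketch only: per-site request statistics, keyed by (cache, remote site).
> public class XSiteStatsSketch {
>    public static final class SiteStats {
>       long requestsSent;
>       long minRttMillis = Long.MAX_VALUE;
>       long maxRttMillis;
>       long totalRttMillis;
>
>       synchronized void recordRequest(long rttMillis) {
>          requestsSent++;
>          totalRttMillis += rttMillis;
>          minRttMillis = Math.min(minRttMillis, rttMillis);
>          maxRttMillis = Math.max(maxRttMillis, rttMillis);
>       }
>
>       synchronized double avgRttMillis() {
>          return requestsSent == 0 ? 0 : (double) totalRttMillis / requestsSent;
>       }
>    }
>
>    // cache name -> remote site name -> statistics
>    private final Map<String, Map<String, SiteStats>> statsByCache = new ConcurrentHashMap<>();
>
>    public void recordRequestSent(String cache, String site, long rttMillis) {
>       statsByCache.computeIfAbsent(cache, c -> new ConcurrentHashMap<>())
>                   .computeIfAbsent(site, s -> new SiteStats())
>                   .recordRequest(rttMillis);
>    }
> }
> {code}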
--
This message was sent by Atlassian Jira
(v8.13.1#813001)
[Red Hat JIRA] (ISPN-11993) Allow automatic registration of Protobuf schemas and marshallers
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-11993?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-11993:
-----------------------------------
Fix Version/s: 12.1.0.Final
(was: 12.0.0.Final)
> Allow automatic registration of Protobuf schemas and marshallers
> ----------------------------------------------------------------
>
> Key: ISPN-11993
> URL: https://issues.redhat.com/browse/ISPN-11993
> Project: Infinispan
> Issue Type: Feature Request
> Components: Hot Rod, Marshalling
> Affects Versions: 11.0.0.CR1
> Reporter: Ryan Emerson
> Assignee: Ryan Emerson
> Priority: Major
> Fix For: 12.1.0.Final
>
>
> Currently it's necessary for users to manually add {{SerializationContextInitializer}} instances to the client via {{addContextInitializers}} and for .proto files to be registered like so:
> {code:java}
> // locate the schema file on the classpath
> Path proto = Paths.get(Query.class.getClassLoader().getResource("proto/sheep.proto").toURI());
> // register the schema by writing it into the server's protobuf metadata cache
> cacheManager.getCache(ProtobufMetadataManagerConstants.PROTOBUF_METADATA_CACHE_NAME).put("sheep.proto", Files.readString(proto));
> {code}
> Instead we should allow users to configure the client so that:
> # {{SerializationContextInitializer}} services are automatically registered if {{builder.autoAddAllContextInitializers()}} is configured.
> # {{*.proto}} files on the classpath and in available {{SerializationContextInitializers}} are automatically registered with the server if {{.autoRegisterSchemas()}} is configured.
> ProtoStream's {{AutoProtoSchemaBuilder}} already provides a {{service}} attribute to generate the service files for {{SerializationContextInitializer}} implementations, however it is false by default. We should change the default to true so that users have one less knob to configure. A usage sketch of the proposed client configuration is shown after the note below.
> NOTE: {{autoRegisterSchemas()}} will fail if authorization is enabled and the client does not have the ___schema_manager role.
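> A usage sketch of the proposed client configuration, assuming the method names from this issue ({{autoAddAllContextInitializers()}} and {{autoRegisterSchemas()}}); these do not exist in the current API:
> {code:java}
> import org.infinispan.client.hotrod.RemoteCacheManager;
> import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;
>
> public class AutoRegistrationSketch {
>    public static void main(String[] args) {
>       ConfigurationBuilder builder = new ConfigurationBuilder();
>       builder.addServer().host("127.0.0.1").port(11222);
>       // Proposed in this issue (hypothetical methods):
>       // register all SerializationContextInitializer services found on the classpath ...
>       builder.autoAddAllContextInitializers();
>       // ... and upload their *.proto schemas to the server's protobuf metadata cache.
>       builder.autoRegisterSchemas();
>       try (RemoteCacheManager cacheManager = new RemoteCacheManager(builder.build())) {
>          // caches can now use Protobuf entities without manual schema registration
>       }
>    }
> }
> {code}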
--
This message was sent by Atlassian Jira
(v8.13.1#813001)
[Red Hat JIRA] (ISPN-12024) Preloading does not report unmarshalling errors
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-12024?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-12024:
-----------------------------------
Fix Version/s: 12.1.0.Final
(was: 12.0.0.Final)
> Preloading does not report unmarshalling errors
> -----------------------------------------------
>
> Key: ISPN-12024
> URL: https://issues.redhat.com/browse/ISPN-12024
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Loaders and Stores
> Affects Versions: 11.0.0.Final, 10.1.8.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 12.1.0.Final
>
>
> When cache preloading hits an unmarshalling error, it stops in order to preserve the consistency of the cache. However, the thrown exception and the log won't contain the key with the problem, just a generic {{Unable to preload!}} message.
> We should at the very least include the key in the exception message. But we could also provide a configuration option to ignore unmarshalling errors and only log a summary with the keys that could not be loaded at the end.
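> A rough sketch of the two proposed behaviours; the helper and configuration flag below are illustrative, not existing Infinispan code:
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.function.Consumer;
>
> // Sketch only: not the actual preload implementation.
> public class PreloadSketch {
>    public static List<Object> preload(Iterable<Object> keys, Consumer<Object> loadAndStore,
>                                       boolean ignoreUnmarshallingErrors) {
>       List<Object> failedKeys = new ArrayList<>();
>       for (Object key : keys) {
>          try {
>             loadAndStore.accept(key);   // unmarshal the stored entry and put it into the data container
>          } catch (RuntimeException e) {
>             if (!ignoreUnmarshallingErrors) {
>                // at the very least, the exception should identify the offending key
>                throw new IllegalStateException("Unable to preload key " + key, e);
>             }
>             failedKeys.add(key);
>          }
>       }
>       if (!failedKeys.isEmpty()) {
>          // optional behaviour: only log a summary of the keys that could not be loaded
>          System.err.println("Preload skipped " + failedKeys.size() + " keys: " + failedKeys);
>       }
>       return failedKeys;
>    }
> }
> {code}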
--
This message was sent by Atlassian Jira
(v8.13.1#813001)
[Red Hat JIRA] (ISPN-12005) Store purge should ignore errors
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-12005?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-12005:
-----------------------------------
Fix Version/s: 12.1.0.Final
(was: 12.0.0.Final)
> Store purge should ignore errors
> --------------------------------
>
> Key: ISPN-12005
> URL: https://issues.redhat.com/browse/ISPN-12005
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Loaders and Stores
> Affects Versions: 10.1.5.Final, 11.0.0.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 12.1.0.Final, 11.0.10.Final
>
>
> Purging of expired entries from stores is a pretty involved process, especially with {{RocksDBStore}}. When there's a problem unmarshalling the expired bucket or deleting an expired key, the purge task bails out immediately, without processing the remaining keys. To make matters worse, the exception is not logged anywhere. The only sign that something is wrong is a growing store (in the case of {{RocksDBStore}}, a growing number of SST files).
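> A rough sketch of a purge loop that logs per-key failures and keeps going instead of bailing out; the functional interfaces below are illustrative, not the actual store SPI:
> {code:java}
> import java.util.function.Consumer;
> import java.util.function.Function;
>
> // Sketch only: not the RocksDBStore implementation, just the error-handling idea.
> public class PurgeSketch {
>    public static int purgeExpired(Iterable<byte[]> expiredKeys,
>                                   Function<byte[], Object> unmarshalKey,
>                                   Consumer<Object> delete) {
>       int failures = 0;
>       for (byte[] rawKey : expiredKeys) {
>          try {
>             delete.accept(unmarshalKey.apply(rawKey));   // unmarshal the expired key and remove the entry
>          } catch (RuntimeException e) {
>             failures++;
>             // log and continue with the remaining keys instead of aborting the purge task silently
>             System.err.println("Failed to purge expired entry: " + e);
>          }
>       }
>       return failures;
>    }
> }
> {code}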
--
This message was sent by Atlassian Jira
(v8.13.1#813001)
[Red Hat JIRA] (ISPN-12151) Add error count metric
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-12151?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-12151:
-----------------------------------
Fix Version/s: 12.1.0.Final
(was: 12.0.0.Final)
> Add error count metric
> ----------------------
>
> Key: ISPN-12151
> URL: https://issues.redhat.com/browse/ISPN-12151
> Project: Infinispan
> Issue Type: Feature Request
> Components: Analytics
> Affects Versions: 11.0.1.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 12.1.0.Final
>
>
> {{CacheMgmtInterceptor}} has metrics for the different kinds of key operations, but errors are not tracked. We should add a metric to count errors, and perhaps a histogram for the duration of failed invocations as well.
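> A minimal sketch of the idea (an error counter plus the duration of failed invocations); the real change would live in {{CacheMgmtInterceptor}} and use Infinispan's metrics infrastructure rather than plain fields:
> {code:java}
> import java.util.concurrent.atomic.LongAdder;
> import java.util.function.Supplier;
>
> // Sketch only: counts failed invocations and accumulates their duration.
> public class ErrorMetricsSketch {
>    private final LongAdder errorCount = new LongAdder();
>    private final LongAdder failedInvocationNanos = new LongAdder();
>
>    public <T> T invoke(Supplier<T> operation) {
>       long start = System.nanoTime();
>       try {
>          return operation.get();
>       } catch (RuntimeException e) {
>          errorCount.increment();
>          failedInvocationNanos.add(System.nanoTime() - start);   // would feed a duration histogram in a real implementation
>          throw e;
>       }
>    }
>
>    public long getErrors() {
>       return errorCount.sum();
>    }
> }
> {code}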
--
This message was sent by Atlassian Jira
(v8.13.1#813001)
[Red Hat JIRA] (ISPN-12128) Remote read during state transfer should store entry in data container
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-12128?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-12128:
-----------------------------------
Fix Version/s: 12.1.0.Final
(was: 12.0.0.Final)
> Remote read during state transfer should store entry in data container
> ----------------------------------------------------------------------
>
> Key: ISPN-12128
> URL: https://issues.redhat.com/browse/ISPN-12128
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 11.0.1.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Minor
> Fix For: 12.1.0.Final
>
>
> A cache with {{await-initial-transfer="false"}} will execute cache operations while it is receiving state, during rebalance phase {{READ_OLD_WRITE_ALL}}. State transfer can take a long time, and during that time the node is not a read owner for any segments; it is only a write owner for the segments it is receiving.
> That means {{cache.get(k)}} will perform a remote lookup every time, even AFTER the node received entry {{k=v}} via transfer, as long as not all the nodes have confirmed the end of state transfer and the coordinator hasn't changed the rebalance phase to {{READ_ALL_WRITE_ALL}}.
> The extra remote lookups can have a negative impact on application performance. Especially in a replicated cache, the application would expect reads to be very fast, and the repeated remote lookups would break that assumption.
> In order to encourage {{await-initial-transfer="false"}} and eventually make it the default (ISPN-9112), we should limit the number of remote lookups performed by a node while it is a write-only owner of a key (see the sketch after this list):
> * Remote reads should write the entry in the data container, the same way state transfer would (i.e. skipping the write if a write operation already changed the entry).
> * Reads should first look up the key in the local data container before going remotely. If the key exists locally, the value can be returned directly.
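> A sketch of the proposed read path, using a plain {{Map}} as a stand-in for the data container and a {{Function}} as a stand-in for the remote lookup (the real change would be in the distribution and state-transfer interceptors):
> {code:java}
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.function.Function;
>
> // Sketch only: "local first, then remote, then store locally" for a write-only owner.
> public class WriteOwnerReadSketch<K, V> {
>    private final Map<K, V> dataContainer = new ConcurrentHashMap<>();
>    private final Function<K, V> remoteLookup;
>
>    public WriteOwnerReadSketch(Function<K, V> remoteLookup) {
>       this.remoteLookup = remoteLookup;
>    }
>
>    public V get(K key) {
>       // 1. look up the key in the local data container first, even though this node is only a write owner
>       V local = dataContainer.get(key);
>       if (local != null) {
>          return local;
>       }
>       // 2. go remote, then store the value locally the same way state transfer would,
>       //    so later reads do not need another remote lookup
>       V remote = remoteLookup.apply(key);
>       if (remote != null) {
>          dataContainer.putIfAbsent(key, remote);   // skip the write if a concurrent write already created the entry
>       }
>       return remote;
>    }
> }
> {code}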
--
This message was sent by Atlassian Jira
(v8.13.1#813001)
[Red Hat JIRA] (ISPN-1407) Server-initiated cluster switch
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-1407?page=com.atlassian.jira.plugin... ]
Tristan Tarrant updated ISPN-1407:
----------------------------------
Fix Version/s: 12.1.0.Final
(was: 12.0.0.Final)
> Server-initiated cluster switch
> -------------------------------
>
> Key: ISPN-1407
> URL: https://issues.redhat.com/browse/ISPN-1407
> Project: Infinispan
> Issue Type: Feature Request
> Components: Remote Protocols
> Reporter: Sanne Grinovero
> Assignee: Tristan Tarrant
> Priority: Major
> Fix For: 12.1.0.Final
>
>
> When starting a rolling upgrade, the clients' server_lists will need to be updated with new entries at the start of the upgrade, and some entries possibly removed when the upgrade is done.
> Aside from exposing this operation through the REST endpoint, the CLI should be extended with a {{migrate cluster clients}} command which will tell all clients connected to the source cluster to switch to the target cluster.
--
This message was sent by Atlassian Jira
(v8.13.1#813001)
[Red Hat JIRA] (ISPN-5290) Better automatic merge for caches with enabled partition handling
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-5290?page=com.atlassian.jira.plugin... ]
Tristan Tarrant updated ISPN-5290:
----------------------------------
Fix Version/s: 12.1.0.Final
(was: 12.0.0.Final)
> Better automatic merge for caches with enabled partition handling
> -----------------------------------------------------------------
>
> Key: ISPN-5290
> URL: https://issues.redhat.com/browse/ISPN-5290
> Project: Infinispan
> Issue Type: Feature Request
> Environment: JDG cluster with partitionHandling enabled
> Reporter: Wolf-Dieter Fink
> Assignee: Dan Berindei
> Priority: Major
> Labels: cluster, clustering, infinispan, partition_handling
> Fix For: 12.1.0.Final
>
>
> At the moment there is no detection of whether a node that joins a cluster is one of the nodes known from the "last stable view" or not.
> This has the drawback that the cluster will still be in DEGRADED_MODE if some nodes are restarted during the split-brain.
> Assuming the cluster split is caused by a power failure of some nodes, the available nodes are DEGRADED as >= numOwners nodes are lost.
> If the failed nodes are restarted, say for an application which uses library mode in EAP, these instances are now identified as new nodes because the node IDs are different.
> If these nodes join the cluster, all the nodes are still degraded because the restarted instances are treated as different nodes and not as the lost ones, so the cluster will not heal and come back to AVAILABLE.
> There is a way to prevent some of these cases by using server hinting to ensure that at least one owner will survive.
> But there are other cases where it would be good to have a different strategy to get the cluster back to AVAILABLE mode.
> During the split-brain there is no way to continue, as there is no possibility to know whether "the other" partition is gone or still accessible but not seen.
> With a shared persistent store it might be possible, but synchronizing that with locking and version columns is a huge drawback during normal operation.
> If the node ID can be kept, I see the following enhancements:
> - with a shared persistent store there should be no data loss; once all nodes are back in the cluster it can go AVAILABLE and reload the missing entries
> - for a 'side' cache the values are calculated or retrieved from other (slow) systems, so the cluster can be AVAILABLE and reload the entries
> - in other cases there might be a WARNING/ERROR that all members are back from the split, some data may have been lost, and the cluster is automatically or manually set back to AVAILABLE
> It might be complicated to implement these modes, but a partition-handling configuration option could let the administrator decide which behaviour is appropriate for a cache,
> e.g.
> <partition-handling enabled="true" healing="HEALING.MODE"/>
> where the modes are:
> - AVAILABLE_NO_WARNING: back to AVAILABLE after all nodes from the "last stable" view are back
> - AVAILABLE_WARNING_DATALOST: ditto, but log a warning that some data may have been lost
> - WARNING_DATALOST: only log a warning and a hint on how to set AVAILABLE manually
> - NONE: same as the current behaviour (if necessary; maybe WARNING_DATALOST is similar or better)
--
This message was sent by Atlassian Jira
(v8.13.1#813001)