[JBoss JIRA] (ISPN-12212) Emit Cloud Events for security audit events
by Pedro Zapata Fernandez (Jira)
Pedro Zapata Fernandez created ISPN-12212:
---------------------------------------------
Summary: Emit Cloud Events for security audit events
Key: ISPN-12212
URL: https://issues.redhat.com/browse/ISPN-12212
Project: Infinispan
Issue Type: Feature Request
Reporter: John Doyle
Assignee: Tristan Tarrant
If Infinispan emitted CloudEvents for security audit events, users could use Knative to trigger responses to those events.
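For illustration, here is a minimal sketch of what emitting such an event could look like with the CloudEvents Java SDK; the event type, source URI, and audit payload below are hypothetical, not an existing Infinispan API:
{code:java}
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.time.OffsetDateTime;
import java.util.UUID;

import io.cloudevents.CloudEvent;
import io.cloudevents.core.builder.CloudEventBuilder;

public class AuditCloudEvents {
    // Hypothetical mapping from an audit record to a CloudEvent.
    static CloudEvent auditEvent(String principal, String permission, boolean allowed) {
        String json = String.format(
                "{\"principal\":\"%s\",\"permission\":\"%s\",\"allowed\":%b}",
                principal, permission, allowed);
        return CloudEventBuilder.v1()
                .withId(UUID.randomUUID().toString())
                .withType("org.infinispan.security.audit")   // hypothetical event type
                .withSource(URI.create("/infinispan/audit")) // hypothetical source
                .withTime(OffsetDateTime.now())
                .withDataContentType("application/json")
                .withData(json.getBytes(StandardCharsets.UTF_8))
                .build();
    }
}
{code}
A Knative Trigger could then filter on the {{type}} attribute to route these events to a responder service.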
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-11176) XSite Max Idle
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-11176?page=com.atlassian.jira.plugi... ]
Dan Berindei commented on ISPN-11176:
-------------------------------------
{quote}
I thought I remembered hearing that all nodes can be site masters now, or something? I really don't know much about xsite/IRAC details though.
{quote}
In theory, yes. In practice, I think we only set one.
{quote}
But if a single node failure can cause a site to go offline, that would not be good.
{quote}
It's a bit trickier... a site will be taken offline based on the {{TakeOfflineConfiguration}}, which has two attributes: {{after-failures}} and {{min-wait}}. By default they are both 0, so only the administrator can take the site offline manually. If you set both, the site will be taken offline if {{after-failures}} consecutive backup operations fail (I assume we'll want to ignore {{check last access}} and {{touch}} commands here) and at least {{min-wait}} millis have passed since the first of those consecutive failures. This means you can have sites that are present in the bridge cluster view but not yet taken offline, sites that are offline and yet still in the bridge cluster view, and sites that are offline for some caches and online for other caches. You can also have operations that still wait for backup responses after the remote site was taken offline, and RPCs that time out and fail without the site being taken offline.
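For reference, a minimal sketch of setting those two attributes through the programmatic configuration, as I read the take-offline builder API (the site name and values are made up for the example):
{code:java}
import org.infinispan.configuration.cache.BackupConfiguration.BackupStrategy;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

public class TakeOfflineExample {
    static Configuration backupWithTakeOffline() {
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.sites().addBackup()
               .site("NYC")                   // example remote site name
               .strategy(BackupStrategy.SYNC)
               .takeOffline()
               .afterFailures(5)              // 5 consecutive failed backup operations...
               .minTimeToWait(60_000);        // ...and at least 60s since the first failure
        return builder.build();
    }
}
{code}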
I'd love it if we could improve the take-offline story, so it's more similar to how failure detection works in a JGroups cluster (and global instead of per-cache, and more friendly for active-active setups), but for now we need to be careful with terminology: a site being taken offline is different from a site becoming unreachable (because it doesn't have any node in the bridge cluster view), and "an entire site is lost" could mean either.
> XSite Max Idle
> --------------
>
> Key: ISPN-11176
> URL: https://issues.redhat.com/browse/ISPN-11176
> Project: Infinispan
> Issue Type: Enhancement
> Components: Cross-Site Replication, Expiration
> Reporter: Will Burns
> Assignee: Will Burns
> Priority: Major
> Fix For: 12.0.0.Final
>
>
> Max idle expiration currently doesn't work with xsite. That is, an entry may be written and replicated to both sites, but only one site ever reads the value while the other never does. If clients then need to read the value from the other site, it will already be expired there (assuming the max idle time has elapsed).
> There are a few ways we can do this.
> 1. Keep access times local to each site. When a site finds that an entry has expired, it asks the other site(s) whether they have a more recent access. If a site is known to have gone down, we should touch all entries, since they may not have up-to-date access times. Requires very little additional xsite communication.
> 2. Batch touch commands and only send them every so often. Has a window of loss, but it should be small. Requires more cross-site traffic. Wouldn't work for really low max idle times, as an entry could expire before the touch command is replicated.
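As a rough illustration of option 2 above, a batching touch sender could accumulate touched keys and flush them on a fixed schedule. This is only a sketch with made-up names, not an Infinispan API:
{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: collect keys whose max-idle was refreshed locally and
// replicate the touches to the other site(s) in batches.
public class BatchingTouchSender<K> {
    private final Set<K> touched = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public BatchingTouchSender(Consumer<Set<K>> sendTouchBatch, long intervalMillis) {
        scheduler.scheduleAtFixedRate(() -> {
            if (touched.isEmpty()) {
                return;
            }
            // Snapshot and clear; keys touched during the send land in the next batch.
            Set<K> batch = ConcurrentHashMap.newKeySet();
            batch.addAll(touched);
            touched.removeAll(batch);
            sendTouchBatch.accept(batch); // e.g. one cross-site touch command per batch
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Called on every local read that refreshes an entry's max-idle timer.
    public void onAccess(K key) {
        touched.add(key);
    }
}
{code}
The window of loss mentioned above is whatever accumulated between the last flush and a crash.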
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-12211) Combine no route to log messages
by Diego Lovison (Jira)
Diego Lovison created ISPN-12211:
------------------------------------
Summary: Combine no route to log messages
Key: ISPN-12211
URL: https://issues.redhat.com/browse/ISPN-12211
Project: Infinispan
Issue Type: Enhancement
Affects Versions: 11.0.3.Final, 12.0.0.Dev01
Reporter: Diego Lovison
When doing puts in a cache that has a backup site, and the backup site is not available, a lot of messages are printed in the logs. I think we could count the dropped messages and only print every 100 or 1000.
Current state:
{noformat}
21:42:38,542 ERROR (jgroups-146,edg-perf05-62972) [org.jgroups.protocols.relay.RELAY2] edg-perf05-62972: no route to site01: dropping message
21:42:38,542 ERROR (irac-sender-thread-edg-perf05-62972) [org.jgroups.protocols.relay.RELAY2] edg-perf05-62972: no route to site01: dropping message
{noformat}
Desired state: dropping X messages
{noformat}
21:42:38,542 ERROR (irac-sender-thread-edg-perf05-62972) [org.jgroups.protocols.relay.RELAY2] edg-perf05-62972: no route to site01: dropping X messages
{noformat}
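A minimal sketch of the counting idea, assuming a hypothetical interception point for the drop event and using slf4j purely for illustration (RELAY2 itself uses JGroups' own logging):
{code:java}
import java.util.concurrent.atomic.AtomicLong;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch: log the first dropped message, then only every Nth one.
public class DropLogThrottle {
    private static final Logger log = LoggerFactory.getLogger(DropLogThrottle.class);
    private static final long LOG_EVERY = 1000;

    private final AtomicLong dropped = new AtomicLong();

    public void onDroppedMessage(String site) {
        long count = dropped.incrementAndGet();
        if (count == 1 || count % LOG_EVERY == 0) {
            log.error("no route to {}: dropped {} messages so far", site, count);
        }
    }
}
{code}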
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-11176) XSite Max Idle
by Will Burns (Jira)
[ https://issues.redhat.com/browse/ISPN-11176?page=com.atlassian.jira.plugi... ]
Will Burns commented on ISPN-11176:
-----------------------------------
{quote}
Does the consistency loss mean a write could be undone, or is it something else that's maybe more palatable to users?
{quote}
The problem is if the node that recorded the access goes down before replicating the touch. In that case the entry can expire early.
{quote}
I agree sending a touch command for all non-expired entries would be very expensive, especially for x-site, where the user would estimate before deployment how much bandwidth Infinispan would require, and having to send and process lots of max-idle commands would increase the latency of all the other x-site commands going through the same site masters.
One thing that's not clear to me is when you say "an entire site is lost", are you talking about a site being taken offline? Because a site can easily disappear from the bridge cluster's view for a while just because its site master crashed.
OTOH the take offline policies are maybe too lenient (although I haven't checked if Pedro made any recent changes w/ IRAC), so a sync x-site RPC is likely to time out before the site is taken offline.
Edit: Forgot to mention one more way a site can become unavailable: if it splits and the cache is configured with when-split="DENY_READ_WRITES". Unless IRAC doesn't allow that configuration anyway?
{quote}
I thought I remembered hearing that all nodes can be site masters now, or something? I really don't know much about xsite/IRAC details though.
But if a single node failure can cause a site to go offline, that would not be good.
> XSite Max Idle
> --------------
>
> Key: ISPN-11176
> URL: https://issues.redhat.com/browse/ISPN-11176
> Project: Infinispan
> Issue Type: Enhancement
> Components: Cross-Site Replication, Expiration
> Reporter: Will Burns
> Assignee: Will Burns
> Priority: Major
> Fix For: 12.0.0.Final
>
>
> Max idle expiration currently doesn't work with xsite. That is, an entry may be written and replicated to both sites, but only one site ever reads the value while the other never does. If clients then need to read the value from the other site, it will already be expired there (assuming the max idle time has elapsed).
> There are a few ways we can do this.
> 1. Keep access times local to each site. When a site finds that an entry has expired, it asks the other site(s) whether they have a more recent access. If a site is known to have gone down, we should touch all entries, since they may not have up-to-date access times. Requires very little additional xsite communication.
> 2. Batch touch commands and only send them every so often. Has a window of loss, but it should be small. Requires more cross-site traffic. Wouldn't work for really low max idle times, as an entry could expire before the touch command is replicated.
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-12210) Always purge non-shared stores on cache startup
by Dan Berindei (Jira)
Dan Berindei created ISPN-12210:
-----------------------------------
Summary: Always purge non-shared stores on cache startup
Key: ISPN-12210
URL: https://issues.redhat.com/browse/ISPN-12210
Project: Infinispan
Issue Type: Bug
Components: Core, Loaders and Stores
Affects Versions: 11.0.3.Final
Reporter: Dan Berindei
Fix For: 12.0.0.Final
{{purgeOnStartup}} is unsafe with shared stores, but it must be enabled for non-shared (private) stores in order to avoid resurrecting removed entries.
We can enhance the graceful cluster/cache shutdown operations to save a "disable purge" flag in the persisted state, and, on cache startup, purge the non-shared stores of any cache missing the "disable purge" flag. This will remove the need to ever disable {{purgeOnStartup}}.
Then we can ignore the {{purgeOnStartup}} setting and, if the user enabled it for a shared store or disabled it for a non-shared store, only log a warning that it will be ignored.
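For context, this is the setting in question, shown via the programmatic store configuration (the store type and values are just an example):
{code:java}
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

public class PurgeOnStartupExample {
    static Configuration privateStoreConfig() {
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.persistence()
               .addSingleFileStore()     // example of a non-shared (private) store
               .shared(false)
               .purgeOnStartup(true);    // needed today so a restarted node does not
                                         // resurrect entries removed while it was down
        return builder.build();
    }
}
{code}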
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-12209) Possibly remove raw values and HotRod wrapping for RemoteStore
by Will Burns (Jira)
[ https://issues.redhat.com/browse/ISPN-12209?page=com.atlassian.jira.plugi... ]
Will Burns updated ISPN-12209:
------------------------------
Description:
ISPN-12165 converts RemoteStore to change its marshaller dynamically based on media types. This removes the need for HotRod wrapping or raw values in many scenarios. However, before removing these options we need to verify there are no other use cases for them.
We would also have to remove these options from the documentation.
was:ISPN-12165 converts RemoteStore to change marshaller dynamically based upon media types. This allows us to not need hot rod wrapping or raw values in many scenarios. However before their removal we need to verify there are no other use cases for these options.
> Possibly remove raw values and HotRod wrapping for RemoteStore
> --------------------------------------------------------------
>
> Key: ISPN-12209
> URL: https://issues.redhat.com/browse/ISPN-12209
> Project: Infinispan
> Issue Type: Enhancement
> Components: Loaders and Stores
> Reporter: Will Burns
> Priority: Major
> Fix For: 13.0.0.Final
>
>
> ISPN-12165 converts RemoteStore to change its marshaller dynamically based on media types. This removes the need for HotRod wrapping or raw values in many scenarios. However, before removing these options we need to verify there are no other use cases for them.
> We would also have to remove these options from the documentation.
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-12209) Possibly remove raw values and HotRod wrapping for RemoteStore
by Will Burns (Jira)
Will Burns created ISPN-12209:
---------------------------------
Summary: Possibly remove raw values and HotRod wrapping for RemoteStore
Key: ISPN-12209
URL: https://issues.redhat.com/browse/ISPN-12209
Project: Infinispan
Issue Type: Enhancement
Components: Loaders and Stores
Reporter: Will Burns
Fix For: 13.0.0.Final
ISPN-12165 converts RemoteStore to change its marshaller dynamically based on media types. This removes the need for HotRod wrapping or raw values in many scenarios. However, before removing these options we need to verify there are no other use cases for them.
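For reference, these are the two {{RemoteStore}} options in question, shown through the programmatic configuration (the cache name is an example):
{code:java}
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.persistence.remote.configuration.RemoteStoreConfigurationBuilder;

public class RemoteStoreOptionsExample {
    static Configuration remoteStoreConfig() {
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.persistence()
               .addStore(RemoteStoreConfigurationBuilder.class)
               .remoteCacheName("example-cache")
               .rawValues(true)          // the raw-values option under discussion
               .hotRodWrapping(false);   // the HotRod wrapping option under discussion
        return builder.build();
    }
}
{code}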
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-11005) HotRod decoder small performance improvements
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-11005?page=com.atlassian.jira.plugi... ]
Dan Berindei updated ISPN-11005:
--------------------------------
Git Pull Request: https://github.com/infinispan/infinispan/pull/8615 (was: https://github.com/infinispan/infinispan/pull/8615/files)
> HotRod decoder small performance improvements
> ---------------------------------------------
>
> Key: ISPN-11005
> URL: https://issues.redhat.com/browse/ISPN-11005
> Project: Infinispan
> Issue Type: Enhancement
> Components: Server
> Affects Versions: 10.1.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Minor
> Labels: performance
>
> I noticed some small inefficiencies in the flight recordings from the client-server dist read benchmarks:
> * {{Intrinsics.string()}} allocates a temporary {{byte[]}}; we could use {{ByteBuf.toString(start, length, Charset)}} instead (which reuses a thread-local buffer).
> * For reading the cache name it would be even better to use {{ByteString}} and avoid the UTF8 decoding.
> * {{MediaType.hashCode()}} allocates an iterator for the params map even though it's empty.
> * {{JBossMarshallingTranscoder.transcode()}} is called twice for each request, and even when there is no transcoding to perform it does a lot of {{String.equals()}} checks.
> * {{CacheImpl.getCacheEntryAsync()}} allocates a new {{CompletableFuture}} via {{thenApply()}} just to change the return type; it could do the same thing by casting to the erased type.
> * {{EncoderCache.getCacheEntryAsync()}} could also avoid allocating a {{CompletableFuture}} when the read was synchronous.
> * {{Encoder2x}} is stateless, and yet a new instance is created for each request.
> * {{Encoder2x.writeHeader()}} looks up the cache info a second time, even though most requests already needed that info to execute the operation, plus one useless (I think) {{String.equals()}} check for the counter cache.
> There are also a few issues with the benchmark itself:
> * The load stage took less than 3 mins according to the logs, but flight recordings show {{PutKeyValueCommand}}s being executed at least 1 minute after the end of the load phase.
> * Either RadarGun or FlightRecorder itself is doing lots of JMX calls that throw exceptions constantly throughout the benchmark, allocating lots of {{StackTraceElement}} instances.
> * Finally, the cluster is unstable, and some nodes are excluded even though the network seems to be fine and GC pauses are quite small.
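To illustrate the first bullet above, here is a sketch of the two ways to read a UTF-8 string from a Netty buffer; the decoder context is omitted and the helper names are made up, only the allocation difference matters:
{code:java}
import java.nio.charset.StandardCharsets;

import io.netty.buffer.ByteBuf;

public class StringReadExample {
    // Allocates a temporary byte[] for every string read.
    static String viaTempArray(ByteBuf buf, int length) {
        byte[] tmp = new byte[length];
        buf.readBytes(tmp);
        return new String(tmp, StandardCharsets.UTF_8);
    }

    // Decodes directly from the buffer; per the description above, this path
    // can reuse a thread-local buffer internally instead of a fresh byte[].
    static String viaToString(ByteBuf buf, int length) {
        String s = buf.toString(buf.readerIndex(), length, StandardCharsets.UTF_8);
        buf.skipBytes(length); // toString() does not advance the reader index
        return s;
    }
}
{code}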
--
This message was sent by Atlassian Jira
(v7.13.8#713008)