[JBoss JIRA] (ISPN-11005) HotRod decoder small performance improvements
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-11005?page=com.atlassian.jira.plugi... ]
Dan Berindei updated ISPN-11005:
--------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/8615/files
> HotRod decoder small performance improvements
> ---------------------------------------------
>
> Key: ISPN-11005
> URL: https://issues.redhat.com/browse/ISPN-11005
> Project: Infinispan
> Issue Type: Enhancement
> Components: Server
> Affects Versions: 10.1.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Minor
> Labels: performance
>
> I noticed some small inefficiencies in the flight recordings from the client-server dist read benchmarks:
> * {{Intrinsics.string()}} allocates a temporary {{byte[]}}; we could use {{ByteBuf.toString(start, length, Charset)}} instead, which reuses a thread-local buffer.
> * For reading the cache name it would be even better to use {{ByteString}} and avoid the UTF-8 decoding entirely.
> * {{MediaType.hashCode()}} allocates an iterator for the params map even though it's empty.
> * {{JBossMarshallingTranscoder.transcode()}} is called twice for each request, and even when there is no transcoding to perform it does a lot of {{String.equals()}} checks.
> * {{CacheImpl.getCacheEntryAsync()}} allocates a new {{CompletableFuture}} via {{thenApply()}} just to change the return type; the same thing could be done by casting to the erased type.
> * {{EncoderCache.getCacheEntryAsync()}} could also avoid allocating a {{CompletableFuture}} when the read was synchronous.
> * {{Encoder2x}} is stateless, and yet a new instance is created for each request.
> * {{Encoder2x.writeHeader()}} looks up the cache info a second time, even though most requests already needed that info to execute the operation, plus one useless (I think) {{String.equals()}} check for the counter cache.
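The first point above can be illustrated with a plain-JDK analogue (class and method names here are illustrative, not Infinispan's or Netty's actual code; Netty's {{ByteBuf.toString(start, length, Charset)}} additionally reuses a thread-local buffer):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SliceDecode {
    // Allocating approach: copy the region into a temporary byte[] first.
    static String viaCopy(ByteBuffer buf, int start, int length) {
        byte[] tmp = new byte[length];
        ByteBuffer dup = buf.duplicate();
        dup.position(start);
        dup.get(tmp);
        return new String(tmp, StandardCharsets.UTF_8);
    }

    // Copy-free approach: decode a bounded view of the buffer directly,
    // skipping the intermediate byte[] allocation.
    static String viaView(ByteBuffer buf, int start, int length) {
        ByteBuffer dup = buf.duplicate();
        dup.position(start);
        dup.limit(start + length);
        return StandardCharsets.UTF_8.decode(dup).toString();
    }
}
```

Both produce the same string; the second just never materializes the bytes as a separate array.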
> There are also a few issues with the benchmark itself:
> * The load stage took less than 3 minutes according to the logs, but the flight recordings show {{PutKeyValueCommand}}s being executed at least 1 minute after the end of the load phase.
> * Either RadarGun or FlightRecorder itself is making lots of JMX calls that throw exceptions constantly throughout the benchmark, allocating lots of {{StackTraceElement}} instances.
> * Finally, the cluster is unstable, and some nodes are excluded even though the network seems to be fine and GC pauses are quite small.
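The {{getCacheEntryAsync()}} points above boil down to the following sketch (class and method names are illustrative, not Infinispan's actual code): after erasure both futures have the same runtime class, so the cast allocates nothing.

```java
import java.util.concurrent.CompletableFuture;

class FutureWidening {
    // Allocating: thenApply() creates a second CompletableFuture whose only
    // purpose is to carry the wider static type.
    static CompletableFuture<Object> viaThenApply(CompletableFuture<String> f) {
        return f.thenApply(v -> (Object) v);
    }

    // Allocation-free: after erasure both futures are the same runtime class,
    // so an unchecked cast changes only the static type. Safe only if callers
    // treat the result as read-only and never complete it with a non-String.
    @SuppressWarnings("unchecked")
    static CompletableFuture<Object> viaCast(CompletableFuture<String> f) {
        return (CompletableFuture<Object>) (CompletableFuture<?>) f;
    }
}
```

The cast version returns the very same object, so on a synchronous read nothing extra is allocated per request.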
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
5 years, 8 months
[JBoss JIRA] (ISPN-11005) HotRod decoder small performance improvements
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-11005?page=com.atlassian.jira.plugi... ]
Dan Berindei updated ISPN-11005:
--------------------------------
Git Pull Request: https://github.com/infinispan/infinispan/pull/8615 (was: https://github.com/infinispan/infinispan/pull/8615/files)
[JBoss JIRA] (ISPN-11005) HotRod decoder small performance improvements
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-11005?page=com.atlassian.jira.plugi... ]
Dan Berindei updated ISPN-11005:
--------------------------------
Status: Open (was: New)
[JBoss JIRA] (ISPN-12208) Operator Docs: Disabling autoscale
by Donald Naro (Jira)
Donald Naro created ISPN-12208:
----------------------------------
Summary: Operator Docs: Disabling autoscale
Key: ISPN-12208
URL: https://issues.redhat.com/browse/ISPN-12208
Project: Infinispan
Issue Type: Enhancement
Components: Documentation
Reporter: Donald Naro
Assignee: Donald Naro
If the OpenShift service CA is available, the Operator enables encryption by default.
[JBoss JIRA] (ISPN-11176) XSite Max Idle
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-11176?page=com.atlassian.jira.plugi... ]
Dan Berindei edited comment on ISPN-11176 at 8/10/20 7:05 AM:
--------------------------------------------------------------
{quote}The problem if it is async is you have a window of consistency loss if a node is taken down. We can mitigate this issue by performing a touch all when it occurs. I was told this is not acceptable, and I agree, for clustered max idle where it would occur for a single node. However, I believe that it should be okay when it would occur only when an entire site is lost.
{quote}
Does the consistency loss mean a write could be undone, or is it something else that's maybe more palatable to users?
I agree that sending a touch command for all non-expired entries would be very expensive, especially for x-site, where the user typically estimates before deployment how much bandwidth Infinispan will require; having to send and process lots of max-idle commands would also increase the latency of all the other x-site commands going through the same site masters.
One thing that's not clear to me is when you say "an entire site is lost", are you talking about a site being taken offline? Because a site can easily disappear from the bridge cluster's view for a while just because its site master crashed.
OTOH the take offline policies are maybe too lenient (although I haven't checked if Pedro made any recent changes w/ IRAC), so a sync x-site RPC is likely to time out before the site is taken offline.
Edit: Forgot to mention one more way a site can become unavailable: if it splits and the cache is configured with {{when-split="DENY_READ_WRITES"}}. Unless IRAC doesn't allow that configuration anyway?
> XSite Max Idle
> --------------
>
> Key: ISPN-11176
> URL: https://issues.redhat.com/browse/ISPN-11176
> Project: Infinispan
> Issue Type: Enhancement
> Components: Cross-Site Replication, Expiration
> Reporter: Will Burns
> Assignee: Will Burns
> Priority: Major
> Fix For: 12.0.0.Final
>
>
> Max idle expiration currently doesn't work with xsite. That is, if an entry is written and replicated to both sites but only one site reads the value, the other site's copy will expire; if that site then needs to read the value, it will find it already expired (assuming the max idle time has elapsed).
> There are a few ways we can do this.
> 1. Keep access times local to every site. When a site finds an entry is expired, it asks the other site(s) whether they have a more recent access. If a site is known to have gone down, we should touch all entries, since they may not have updated access times. Requires very little additional xsite communication.
> 2. Batch touch commands and only send them every so often. Has a window of loss, but it should be small. Requires more cross-site traffic. Wouldn't work for really low max idle times, as an entry could expire before the touch command is replicated.
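Option 2 above can be sketched as follows ({{TouchBatcher}} and {{sendToRemoteSite}} are hypothetical names, not Infinispan API; a real implementation would flush on a timer, while here the flush is explicit so the batching behaviour is easy to see):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

class TouchBatcher {
    private final Set<Object> pendingTouches = new HashSet<>();
    private final Consumer<Set<Object>> sendToRemoteSite;

    TouchBatcher(Consumer<Set<Object>> sendToRemoteSite) {
        this.sendToRemoteSite = sendToRemoteSite;
    }

    // Called on every local read: just records the key, no cross-site RPC.
    synchronized void recordTouch(Object key) {
        pendingTouches.add(key);
    }

    // Ships one batched touch command instead of one RPC per read. Touches
    // recorded since the last flush form the "window of loss": if this node
    // dies before flushing, they never reach the remote site.
    synchronized void flush() {
        if (pendingTouches.isEmpty()) {
            return;
        }
        sendToRemoteSite.accept(new HashSet<>(pendingTouches));
        pendingTouches.clear();
    }
}
```

Repeated reads of the same key between flushes collapse into a single entry in the batch, which is where the bandwidth saving over per-read touch RPCs comes from.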