[JBoss JIRA] (ISPRK-22) InfinispanRDD is not fault tolerant
by Gustavo Fernandes (JIRA)
[ https://issues.jboss.org/browse/ISPRK-22?page=com.atlassian.jira.plugin.s... ]
Gustavo Fernandes updated ISPRK-22:
-----------------------------------
Status: Open (was: New)
> InfinispanRDD is not fault tolerant
> -----------------------------------
>
> Key: ISPRK-22
> URL: https://issues.jboss.org/browse/ISPRK-22
> Project: Infinispan Spark
> Issue Type: Bug
> Components: RDD
> Affects Versions: 0.3
> Reporter: Vojtech Juranek
> Assignee: Gustavo Fernandes
>
> When the primary ISPN server fails while an InfinispanRDD is being processed, Spark is not able to recover from the failure.
> This is caused by re-creating the {{RemoteCacheManager}} with a single pre-configured ISPN server address (for reads [here|https://github.com/infinispan/infinispan-spark/blob/master/src/main/...], for writes [here|https://github.com/infinispan/infinispan-spark/blob/master/src/main/...]). When that server fails during RDD processing and Spark calls some function that creates a {{RemoteCacheManager}} under the hood, the call fails with a connection refused exception.
> [Here|https://github.com/vjuranek/infinispan-spark/commit/c4a3101624e76d61...] are some basic tests and an example of the exception thrown by the HR client:
> {noformat}
> org.infinispan.client.hotrod.exceptions.TransportException:: Could not fetch transport
> at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory.borrowTransportFromPool(TcpTransportFactory.java:395)
> at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory.getTransport(TcpTransportFactory.java:241)
> at org.infinispan.client.hotrod.impl.operations.FaultTolerantPingOperation.getTransport(FaultTolerantPingOperation.java:26)
> at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:53)
> at org.infinispan.client.hotrod.impl.RemoteCacheImpl.ping(RemoteCacheImpl.java:490)
> at org.infinispan.client.hotrod.impl.RemoteCacheImpl.resolveCompatibility(RemoteCacheImpl.java:551)
> at org.infinispan.client.hotrod.RemoteCacheManager.createRemoteCache(RemoteCacheManager.java:341)
> at org.infinispan.client.hotrod.RemoteCacheManager.getCache(RemoteCacheManager.java:222)
> at org.infinispan.client.hotrod.RemoteCacheManager.getCache(RemoteCacheManager.java:217)
> at org.infinispan.spark.rdd.InfinispanRDD$$anonfun$1.apply(InfinispanRDD.scala:52)
> at org.infinispan.spark.rdd.InfinispanRDD$$anonfun$1.apply(InfinispanRDD.scala:52)
> at scala.Option.map(Option.scala:146)
> at org.infinispan.spark.rdd.InfinispanRDD.compute(InfinispanRDD.scala:52)
> at org.infinispan.spark.rdd.InfinispanRDD.compute(InfinispanRDD.scala:66)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.infinispan.client.hotrod.exceptions.TransportException:: Could not connect to server: /127.0.0.1:11222
> at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport.<init>(TcpTransport.java:78)
> at org.infinispan.client.hotrod.impl.transport.tcp.TransportObjectFactory.makeObject(TransportObjectFactory.java:35)
> at org.infinispan.client.hotrod.impl.transport.tcp.TransportObjectFactory.makeObject(TransportObjectFactory.java:16)
> at org.apache.commons.pool.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:1220)
> at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory.borrowTransportFromPool(TcpTransportFactory.java:390)
> ... 21 more
> Caused by: java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)
> at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport.<init>(TcpTransport.java:68)
> ... 25 more
> {noformat}
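Since the failure comes from pinning the re-created client to a single pre-configured address, the fix direction is to give the per-partition client the full server list so it can fail over. A minimal, self-contained sketch of that idea (`pickServer` and `canConnect` are illustrative stand-ins, not the real Hot Rod client API):

```java
import java.util.List;
import java.util.function.Predicate;

class ServerFailover {
    /**
     * Tries each configured server in order and returns the first one that
     * accepts a connection, instead of failing on a single pinned address.
     * `canConnect` stands in for the real transport's connection attempt.
     */
    static String pickServer(List<String> servers, Predicate<String> canConnect) {
        for (String address : servers) {
            if (canConnect.test(address)) {
                return address;
            }
        }
        throw new IllegalStateException("Could not connect to any of: " + servers);
    }
}
```

With this shape, losing the first server degrades to a retry against the remaining ones rather than a hard "connection refused" failure.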
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (ISPN-6130) # Activations shows incorrect value
by Pedro Zapata (JIRA)
[ https://issues.jboss.org/browse/ISPN-6130?page=com.atlassian.jira.plugin.... ]
Pedro Zapata closed ISPN-6130.
------------------------------
Resolution: Cannot Reproduce Bug
This appears to be the expected behavior; it has been verified to be correct.
> # Activations shows incorrect value
> ------------------------------------
>
> Key: ISPN-6130
> URL: https://issues.jboss.org/browse/ISPN-6130
> Project: Infinispan
> Issue Type: Bug
> Components: Console
> Affects Versions: 8.1.0.Beta1
> Reporter: Martin Vrabel
> Assignee: Pedro Ruivo
>
> Page: Caches -> select cache container -> select cache.
> Configuration of the cache: replicated cache, 2 nodes in the domain, eviction size = 10.
> In the "Entries lifecycle" tab there is a "# Activations" field, which should show the number of activations. When I put 10 entries in the cache and read those 10, the # Activations field shows 0, as it should. But when I insert another 10 entries (so the first 10 entries are evicted and stored in the cache store) and then read the first 10 entries again, the # Activations field still shows 0, when it should show 10.
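The expected accounting can be illustrated with a tiny self-contained sketch (hypothetical code, not the console or core implementation): an activation is counted each time a read pulls a passivated entry back out of the store, so re-reading the 10 evicted entries should yield # Activations = 10:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of passivation/activation accounting. */
class PassivatingCache {
    final Map<String, Integer> memory = new LinkedHashMap<>(); // insertion order
    final Map<String, Integer> store = new LinkedHashMap<>();
    final int maxEntries;
    int activations = 0;

    PassivatingCache(int maxEntries) { this.maxEntries = maxEntries; }

    void put(String key, int value) {
        if (!memory.containsKey(key) && memory.size() >= maxEntries) {
            // evict the eldest in-memory entry to the cache store (passivation)
            String victim = memory.keySet().iterator().next();
            store.put(victim, memory.remove(victim));
        }
        memory.put(key, value);
    }

    Integer get(String key) {
        Integer v = memory.get(key);
        if (v == null && store.containsKey(key)) {
            v = store.remove(key); // activation: entry moves back into memory
            put(key, v);
            activations++;
        }
        return v;
    }
}
```

Reading entries that are still in memory must not bump the counter; only store hits do.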
[JBoss JIRA] (ISPN-6576) Functional API does not load values from cache loader on non-primary owners
by Krzysztof Sobolewski (JIRA)
[ https://issues.jboss.org/browse/ISPN-6576?page=com.atlassian.jira.plugin.... ]
Krzysztof Sobolewski commented on ISPN-6576:
--------------------------------------------
The issue is basically independent but the tests won't work until ISPN-6542 is fixed.
> Functional API does not load values from cache loader on non-primary owners
> ---------------------------------------------------------------------------
>
> Key: ISPN-6576
> URL: https://issues.jboss.org/browse/ISPN-6576
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 8.2.1.Final
> Reporter: Krzysztof Sobolewski
> Attachments: AbstractFunctionalCachestoreTest.java, AbstractFunctionalCachestoreTest.java, FunctionalCachestoreTestNonTx.java, FunctionalCachestoreTestTx.java
>
>
> We have a cluster in DIST mode with numOwners > 1, so there are primary and secondary owners, with persistence enabled. When I do a read-write operation on the cluster using the Functional API on a key that is present in the cache loader, the entry that gets passed to the functional operation has a non-null value only on the primary owner. See:
> org.infinispan.interceptors.ClusteredCacheLoaderInterceptor.skipLoadForWriteCommand(), line 53.
> This is using a non-transactional cache.
> NOTE: The attached tests are modified versions of the ones in ISPN-6573.
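The reported behavior can be demonstrated with a self-contained sketch (illustrative names only, not the actual ClusteredCacheLoaderInterceptor code): if the store lookup is skipped on non-primary owners, the functional lambda sees a null previous value on backups even though the key exists in the loader:

```java
import java.util.Map;
import java.util.Optional;

class SkipLoadDemo {
    /**
     * Simulates the value each owner hands to a read-write functional
     * operation: taken from the data container, or loaded from the cache
     * store, except that the load is skipped on non-primary owners
     * (the suspected bug).
     */
    static Optional<String> valueSeenByOperation(Map<String, String> dataContainer,
                                                 Map<String, String> cacheStore,
                                                 String key,
                                                 boolean primaryOwner) {
        String value = dataContainer.get(key);
        if (value == null && primaryOwner) { // backups never consult the store
            value = cacheStore.get(key);
        }
        return Optional.ofNullable(value);
    }
}
```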
[JBoss JIRA] (ISPN-6580) Hotrod performance regressions after ISPN-5342 ISPN-6545
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-6580?page=com.atlassian.jira.plugin.... ]
William Burns edited comment on ISPN-6580 at 5/5/16 12:24 AM:
--------------------------------------------------------------
For an initial implementation, I am thinking the following operations would be executed on the netty socket read thread:
# ContainsKeyRequest
# GetRequest
# GetWithVersionRequest
# GetWithMetadataRequest
# PingRequest
# StatsRequest
And then I am also thinking that if this server has no remote cache manager (detectable?), we could run other single operations on the netty socket read thread as well, such as:
# PutRequest
# PutIfAbsentRequest
# ReplaceRequest
# ReplaceIfUnmodifiedRequest
# RemoveRequest
# RemoveIfUnmodifiedRequest
Thinking about it more, though, I don't know whether we can do this for a cache that has a cache loader either, as the load would be blocking and could delay async clients... is this okay?
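The split proposed above could be sketched as follows (a self-contained illustration; `Op`, the executor, and `hasCacheLoader` are stand-ins for the real server wiring): read-type ops run inline on the socket read thread, while writes, and anything that might block on a cache loader, are handed off:

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OpDispatcher {
    enum Op { CONTAINS_KEY, GET, GET_WITH_VERSION, GET_WITH_METADATA, PING, STATS,
              PUT, PUT_IF_ABSENT, REPLACE, REPLACE_IF_UNMODIFIED,
              REMOVE, REMOVE_IF_UNMODIFIED }

    /** Ops considered safe to run inline on the netty socket read thread. */
    static final Set<Op> INLINE = EnumSet.of(Op.CONTAINS_KEY, Op.GET,
            Op.GET_WITH_VERSION, Op.GET_WITH_METADATA, Op.PING, Op.STATS);

    final ExecutorService executor = Executors.newFixedThreadPool(4);

    /**
     * Runs inline-eligible ops on the calling (read) thread; anything else,
     * or any op on a cache with a loader (which may block), is off-loaded.
     */
    void dispatch(Op op, Runnable work, boolean hasCacheLoader) {
        if (INLINE.contains(op) && !hasCacheLoader) {
            work.run();
        } else {
            executor.submit(work);
        }
    }
}
```

This keeps the read thread non-blocking by construction: anything that could touch a loader, or do a remote write, never executes on it.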
was (Author: william.burns):
For an initial implementation, I am thinking the following operations would be executed on the netty socket read thread:
# ContainsKeyRequest
# GetRequest
# GetWithVersionRequest
# GetWithMetadataRequest
# PingRequest
# StatsRequest
And then I am also thinking that if this server has no remote cache manager (detectable?), we could run other single operations on the netty socket read thread as well, such as:
# PutRequest
# PutIfAbsentRequest
# ReplaceRequest
# ReplaceIfUnmodifiedRequest
# RemoveRequest
# RemoveIfUnmodifiedRequest
Also, thinking about it more, I don't know whether we can do this for a cache that has a cache loader either, as the load would be blocking and could delay async clients... is this okay?
> Hotrod performance regressions after ISPN-5342 ISPN-6545
> --------------------------------------------------------
>
> Key: ISPN-6580
> URL: https://issues.jboss.org/browse/ISPN-6580
> Project: Infinispan
> Issue Type: Bug
> Components: Remote Protocols, Server
> Reporter: Jakub Markos
> Assignee: William Burns
> Attachments: jfr_recordings.zip, pom.xml, Reproducer.java
>
>
> There were 2 recent regressions in hotrod performance, one between commits dd5501c5e and 628819461 and the second one between 628819461 and db0890270. I didn't look for the exact commits, so the name of the issue might not be 100% exact...
> It is easily reproducible locally with a single server instance; reproducer attached.
> The numbers on my machine:
> ||Build commit||Puts time||Gets time||
> |dd5501c5e|21|74|
> |628819461|26|102|
> |db0890270|48|224|
> The JFR recordings for db0890270 (attached; only the gets part of the test was captured) show that a lot of time is spent in HotRodDecoder#resetNow(), and the allocation rate goes from 100MB/s for dd5501c5e to over 1GB/s for db0890270. There are no glaring differences between dd5501c5e and 628819461.
[JBoss JIRA] (ISPN-6580) Hotrod performance regressions after ISPN-5342 ISPN-6545
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-6580?page=com.atlassian.jira.plugin.... ]
William Burns commented on ISPN-6580:
-------------------------------------
Also, I have a prototype that adds a local executor, but it didn't give me the same performance for gets that I got when I completely disabled the execution group. It did give quite a bit better performance than the current version, though.
I will look into it further tomorrow.
> Hotrod performance regressions after ISPN-5342 ISPN-6545
> --------------------------------------------------------
>
> Key: ISPN-6580
> URL: https://issues.jboss.org/browse/ISPN-6580
> Project: Infinispan
> Issue Type: Bug
> Components: Remote Protocols, Server
> Reporter: Jakub Markos
> Assignee: William Burns
> Attachments: jfr_recordings.zip, pom.xml, Reproducer.java
>
>
> There were 2 recent regressions in hotrod performance, one between commits dd5501c5e and 628819461 and the second one between 628819461 and db0890270. I didn't look for the exact commits, so the name of the issue might not be 100% exact...
> It is easily reproducible locally with a single server instance; reproducer attached.
> The numbers on my machine:
> ||Build commit||Puts time||Gets time||
> |dd5501c5e|21|74|
> |628819461|26|102|
> |db0890270|48|224|
> The JFR recordings for db0890270 (attached; only the gets part of the test was captured) show that a lot of time is spent in HotRodDecoder#resetNow(), and the allocation rate goes from 100MB/s for dd5501c5e to over 1GB/s for db0890270. There are no glaring differences between dd5501c5e and 628819461.
[JBoss JIRA] (ISPN-6580) Hotrod performance regressions after ISPN-5342 ISPN-6545
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-6580?page=com.atlassian.jira.plugin.... ]
William Burns commented on ISPN-6580:
-------------------------------------
For an initial implementation, I am thinking the following operations would be executed on the netty socket read thread:
# ContainsKeyRequest
# GetRequest
# GetWithVersionRequest
# GetWithMetadataRequest
# PingRequest
# StatsRequest
And then I am also thinking that if this server has no remote cache manager (detectable?), we could run other single operations on the netty socket read thread as well, such as:
# PutRequest
# PutIfAbsentRequest
# ReplaceRequest
# ReplaceIfUnmodifiedRequest
# RemoveRequest
# RemoveIfUnmodifiedRequest
Also, thinking about it more, I don't know whether we can do this for a cache that has a cache loader either, as the load would be blocking and could delay async clients... is this okay?
> Hotrod performance regressions after ISPN-5342 ISPN-6545
> --------------------------------------------------------
>
> Key: ISPN-6580
> URL: https://issues.jboss.org/browse/ISPN-6580
> Project: Infinispan
> Issue Type: Bug
> Components: Remote Protocols, Server
> Reporter: Jakub Markos
> Assignee: William Burns
> Attachments: jfr_recordings.zip, pom.xml, Reproducer.java
>
>
> There were 2 recent regressions in hotrod performance, one between commits dd5501c5e and 628819461 and the second one between 628819461 and db0890270. I didn't look for the exact commits, so the name of the issue might not be 100% exact...
> It is easily reproducible locally with a single server instance; reproducer attached.
> The numbers on my machine:
> ||Build commit||Puts time||Gets time||
> |dd5501c5e|21|74|
> |628819461|26|102|
> |db0890270|48|224|
> The JFR recordings for db0890270 (attached; only the gets part of the test was captured) show that a lot of time is spent in HotRodDecoder#resetNow(), and the allocation rate goes from 100MB/s for dd5501c5e to over 1GB/s for db0890270. There are no glaring differences between dd5501c5e and 628819461.
[JBoss JIRA] (ISPN-6580) Hotrod performance regressions after ISPN-5342 ISPN-6545
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-6580?page=com.atlassian.jira.plugin.... ]
William Burns edited comment on ISPN-6580 at 5/5/16 12:05 AM:
--------------------------------------------------------------
I tweaked my server to not have the execution group changes, and the performance is the same as the old server's. Unfortunately, by doing this we cannot support async clients, and the event OOM issue will not be solved. It seems the context switching caused by handing off the messages causes a major performance hit for operations that don't do remote operations.
I am thinking of fixing this by having a get operation performed on the handler thread instead of handing it off to the execution thread. Unfortunately, this means that for an async client the server could read multiple requests but only be able to handle 1 remote get at a time instead of possibly multiple in parallel, although other operations would still run in parallel. This might be okay, but if the client has an outdated topology this could cause a major slowdown when it has to go remote. I am thinking I might want to add an extra handler that delegates calls if no remote operations are required.
Also, if you have multiple servers this latency should be minimized for writes quite a bit, which is why I didn't notice such a big performance issue in my testing, since I was testing with 3 nodes.
was (Author: william.burns):
I tweaked my server to not have the execution group changes, and the performance is the same as the old server's. Unfortunately, by doing this we cannot support async clients, and the event OOM issue will not be solved. It seems the context switching caused by handing off the messages causes a major performance hit for operations that don't do remote operations.
I am thinking of fixing this by having a get operation performed on the handler thread instead of handing it off to the execution thread. Unfortunately, this means that for an async client the server could read multiple requests but only be able to handle 1 remote get at a time instead of possibly multiple in parallel, although other operations would still run in parallel. This might be okay, but if the client has an outdated topology this could cause a major slowdown when it has to go remote. I am thinking I might want to add an extra handler that delegates calls if no remote operations are required.
Also, if you have multiple servers this latency should be minimized quite a bit, which is why I didn't notice such a big performance issue in my testing, since I was testing with 3 nodes.
> Hotrod performance regressions after ISPN-5342 ISPN-6545
> --------------------------------------------------------
>
> Key: ISPN-6580
> URL: https://issues.jboss.org/browse/ISPN-6580
> Project: Infinispan
> Issue Type: Bug
> Components: Remote Protocols, Server
> Reporter: Jakub Markos
> Assignee: William Burns
> Attachments: jfr_recordings.zip, pom.xml, Reproducer.java
>
>
> There were 2 recent regressions in hotrod performance, one between commits dd5501c5e and 628819461 and the second one between 628819461 and db0890270. I didn't look for the exact commits, so the name of the issue might not be 100% exact...
> It is easily reproducible locally with a single server instance; reproducer attached.
> The numbers on my machine:
> ||Build commit||Puts time||Gets time||
> |dd5501c5e|21|74|
> |628819461|26|102|
> |db0890270|48|224|
> The JFR recordings for db0890270 (attached; only the gets part of the test was captured) show that a lot of time is spent in HotRodDecoder#resetNow(), and the allocation rate goes from 100MB/s for dd5501c5e to over 1GB/s for db0890270. There are no glaring differences between dd5501c5e and 628819461.
[JBoss JIRA] (ISPN-6580) Hotrod performance regressions after ISPN-5342 ISPN-6545
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-6580?page=com.atlassian.jira.plugin.... ]
William Burns edited comment on ISPN-6580 at 5/4/16 11:42 PM:
--------------------------------------------------------------
I tweaked my server to not have the execution group changes, and the performance is the same as the old server's. Unfortunately, by doing this we cannot support async clients, and the event OOM issue will not be solved. It seems the context switching caused by handing off the messages causes a major performance hit for operations that don't do remote operations.
I am thinking of fixing this by having a get operation performed on the handler thread instead of handing it off to the execution thread. Unfortunately, this means that for an async client the server could read multiple requests but only be able to handle 1 remote get at a time instead of possibly multiple in parallel, although other operations would still run in parallel. This might be okay, but if the client has an outdated topology this could cause a major slowdown when it has to go remote. I am thinking I might want to add an extra handler that delegates calls if no remote operations are required.
Also, if you have multiple servers this latency should be minimized quite a bit, which is why I didn't notice such a big performance issue in my testing, since I was testing with 3 nodes.
was (Author: william.burns):
I tweaked my server to not have the execution group changes, and the performance is the same as the old server's. Unfortunately, by doing this we cannot support async clients, and the event OOM issue will not be solved. It seems the context switching caused by handing off the messages causes a major performance hit for operations that don't do remote operations.
I am thinking of fixing this by having a get operation performed on the handler thread instead of handing it off to the execution thread. Unfortunately, this means that for an async client the server could read multiple requests but only be able to handle 1 remote get at a time instead of possibly multiple in parallel, although other operations would still run in parallel. This might be okay, but if the client has an outdated topology this could cause a major slowdown when it has to go remote. I am thinking I might want to add an extra handler that delegates calls if no remote operations are required, possibly.
> Hotrod performance regressions after ISPN-5342 ISPN-6545
> --------------------------------------------------------
>
> Key: ISPN-6580
> URL: https://issues.jboss.org/browse/ISPN-6580
> Project: Infinispan
> Issue Type: Bug
> Components: Remote Protocols, Server
> Reporter: Jakub Markos
> Assignee: William Burns
> Attachments: jfr_recordings.zip, pom.xml, Reproducer.java
>
>
> There were 2 recent regressions in hotrod performance, one between commits dd5501c5e and 628819461 and the second one between 628819461 and db0890270. I didn't look for the exact commits, so the name of the issue might not be 100% exact...
> It is easily reproducible locally with a single server instance; reproducer attached.
> The numbers on my machine:
> ||Build commit||Puts time||Gets time||
> |dd5501c5e|21|74|
> |628819461|26|102|
> |db0890270|48|224|
> The JFR recordings for db0890270 (attached; only the gets part of the test was captured) show that a lot of time is spent in HotRodDecoder#resetNow(), and the allocation rate goes from 100MB/s for dd5501c5e to over 1GB/s for db0890270. There are no glaring differences between dd5501c5e and 628819461.
[JBoss JIRA] (ISPN-6580) Hotrod performance regressions after ISPN-5342 ISPN-6545
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-6580?page=com.atlassian.jira.plugin.... ]
William Burns commented on ISPN-6580:
-------------------------------------
I tweaked my server to not have the execution group changes, and the performance is the same as the old server's. Unfortunately, by doing this we cannot support async clients, and the event OOM issue will not be solved. It seems the context switching caused by handing off the messages causes a major performance hit for operations that don't do remote operations.
I am thinking of fixing this by having a get operation performed on the handler thread instead of handing it off to the execution thread. Unfortunately, this means that for an async client the server could read multiple requests but only be able to handle 1 remote get at a time instead of possibly multiple in parallel, although other operations would still run in parallel. This might be okay, but if the client has an outdated topology this could cause a major slowdown when it has to go remote. I am thinking I might want to add an extra handler that delegates calls if no remote operations are required, possibly.
> Hotrod performance regressions after ISPN-5342 ISPN-6545
> --------------------------------------------------------
>
> Key: ISPN-6580
> URL: https://issues.jboss.org/browse/ISPN-6580
> Project: Infinispan
> Issue Type: Bug
> Components: Remote Protocols, Server
> Reporter: Jakub Markos
> Assignee: William Burns
> Attachments: jfr_recordings.zip, pom.xml, Reproducer.java
>
>
> There were 2 recent regressions in hotrod performance, one between commits dd5501c5e and 628819461 and the second one between 628819461 and db0890270. I didn't look for the exact commits, so the name of the issue might not be 100% exact...
> It is easily reproducible locally with a single server instance; reproducer attached.
> The numbers on my machine:
> ||Build commit||Puts time||Gets time||
> |dd5501c5e|21|74|
> |628819461|26|102|
> |db0890270|48|224|
> The JFR recordings for db0890270 (attached; only the gets part of the test was captured) show that a lot of time is spent in HotRodDecoder#resetNow(), and the allocation rate goes from 100MB/s for dd5501c5e to over 1GB/s for db0890270. There are no glaring differences between dd5501c5e and 628819461.
[JBoss JIRA] (ISPN-6096) Cache container page does not show any containers when more hotrod-connectors are defined
by Vladimir Blagojevic (JIRA)
[ https://issues.jboss.org/browse/ISPN-6096?page=com.atlassian.jira.plugin.... ]
Vladimir Blagojevic updated ISPN-6096:
--------------------------------------
Status: Resolved (was: Pull Request Sent)
Fix Version/s: 9.0.0.Alpha2
Resolution: Done
> Cache container page does not show any containers when more hotrod-connectors are defined
> -----------------------------------------------------------------------------------------
>
> Key: ISPN-6096
> URL: https://issues.jboss.org/browse/ISPN-6096
> Project: Infinispan
> Issue Type: Bug
> Components: Console
> Reporter: Martin Gencur
> Assignee: Ryan Emerson
> Fix For: 9.0.0.Alpha2
>
>
> When there is more than one <hotrod-connector> defined in the domain.xml configuration, the cache container page does not show any containers. Tested many times.
> The server should be able to provide multiple HotRod endpoints bound to different cache containers.
> More specifically, this bug appears whenever I add a "name" attribute to the <hotrod-connector> tag. This attribute is required to differentiate hotrod connectors when there is more than one.
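For reference, the shape of configuration that triggers it looks something like the following fragment (a hypothetical sketch; the connector names, socket bindings, and container names are illustrative, and the `name` attribute is the relevant part):

```xml
<hotrod-connector name="hotrod-one" socket-binding="hotrod" cache-container="container-one"/>
<hotrod-connector name="hotrod-two" socket-binding="hotrod2" cache-container="container-two"/>
```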