[JBoss JIRA] (ISPN-6925) Race condition in staggered gets
by Paul Ferraro (JIRA)
[ https://issues.jboss.org/browse/ISPN-6925?page=com.atlassian.jira.plugin.... ]
Paul Ferraro edited comment on ISPN-6925 at 8/16/16 10:00 AM:
--------------------------------------------------------------
Attached log files from 3 WF nodes. The request is initiated from node2 after the view is stable. The ClusteredGetCommand is sent to node1 at 2016-08-16 09:51:39,885 and to node3 at 2016-08-16 09:51:39,894. As indicated in the node1 & node3 logs, the command is received, executed, and a response is sent, as expected. However, node2 eventually logs this:
{noformat}
2016-08-16 09:51:54,885 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (timeout-thread--p11-t1) Responses: [sender=node3, received=false, suspected=false]
[sender=node1, received=false, suspected=false]
{noformat}
was (Author: pferraro):
Attached logs files from 3 WF nodes. Request is initiated from node2 after view is stable. ClusteredGetCommand is sent to node1 at 2016-08-16 09:51:39,885 and to node3 at 2016-08-16 09:51:39,894. As indicated in the node1 & node3 logs, the command is received, executed, and a response it sent as expected. However, node2 eventually logs this:
{noformat}
2016-08-16 09:51:54,885 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (timeout-thread--p11-t1) Responses: [sender=node3, received=false, suspected=false]
[sender=node1, received=false, suspected=false]
{noformat}
> Race condition in staggered gets
> --------------------------------
>
> Key: ISPN-6925
> URL: https://issues.jboss.org/browse/ISPN-6925
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.0.0.Alpha3, 8.2.3.Final
> Reporter: Radim Vansa
> Assignee: Radim Vansa
> Priority: Critical
> Attachments: server.log.node1, server.log.node2, server.log.node3
>
>
> There's a race condition in {{CommandAwareRpcDispatcher}} when we do staggered gets. The {{RspList}} is prepared first, and each {{Rsp}} is then filled in by {{processCallsStaggered$lambda}}. Two responding threads can each mark their own {{Rsp}} as received and still see the other response as not yet received, because there's no memory barrier between the {{setValue}}/{{setException}} call and the {{wasReceived}} check.
> The race above happens when two responses arrive but neither is accepted by the filter. There's a second race in {{JGroupsTransport}}, when the first response is accepted and then another one arrives. In {{JGroupsTransport.invokeRemotelyAsync}}, the lambda handling {{rspListFuture.thenApply}} may see another thread concurrently modifying the rsps; e.g. {{checkRsp}} finds that the concurrently written response was received and, according to its flags, is not an exception, but its value is still null, so you return null even though the other {{Rsp}} may hold a valid response.
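One way to picture a fix for both races is to funnel every response through atomic operations, so that exactly one thread observes "all responses are in" and readers never see a half-written result. The following is only a minimal sketch under that assumption: {{StaggeredGetSketch}}, {{onResponse}} and the {{accepted}} flag are hypothetical names that stand in for the response filter and bookkeeping, and none of this is the Infinispan or JGroups code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for staggered-get bookkeeping; not Infinispan code. */
final class StaggeredGetSketch<T> {
    private final CompletableFuture<T> result = new CompletableFuture<>();
    private final AtomicInteger pending;

    StaggeredGetSketch(int targets) {
        this.pending = new AtomicInteger(targets);
    }

    /** Called by whichever thread delivers a response; 'accepted' mimics the response filter. */
    void onResponse(T value, boolean accepted) {
        if (accepted) {
            // complete() is atomic: the first accepted value wins, and a reader of the
            // future sees either "not done" or a fully published value, never a mix.
            result.complete(value);
        }
        // decrementAndGet() is an atomic read-modify-write, so exactly one caller
        // observes zero even when both responses arrive at the same time.
        if (pending.decrementAndGet() == 0) {
            // No-op if an accepted value has already completed the future.
            result.complete(null);
        }
    }

    CompletableFuture<T> future() {
        return result;
    }

    public static void main(String[] args) {
        StaggeredGetSketch<String> get = new StaggeredGetSketch<>(2);
        get.onResponse(null, false);               // first response rejected by the filter
        get.onResponse("value-from-node3", true);  // second response accepted
        System.out.println(get.future().join());
    }
}
```

Because an accepting thread completes the future before it decrements the counter, the counter can only reach zero after any accepted value has been published, so a late "nothing accepted" completion cannot overwrite it.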
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
9 years, 7 months
[JBoss JIRA] (ISPN-6925) Race condition in staggered gets
by Paul Ferraro (JIRA)
[ https://issues.jboss.org/browse/ISPN-6925?page=com.atlassian.jira.plugin.... ]
Paul Ferraro updated ISPN-6925:
-------------------------------
Attachment: server.log.node3
server.log.node2
server.log.node1
Attached log files from 3 WF nodes. The request is initiated from node2 after the view is stable. The ClusteredGetCommand is sent to node1 at 2016-08-16 09:51:39,885 and to node3 at 2016-08-16 09:51:39,894. As indicated in the node1 & node3 logs, the command is received, executed, and a response is sent, as expected. However, node2 eventually logs this:
{noformat}
2016-08-16 09:51:54,885 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (timeout-thread--p11-t1) Responses: [sender=node3, received=false, suspected=false]
[sender=node1, received=false, suspected=false]
{noformat}
> Race condition in staggered gets
> --------------------------------
>
> Key: ISPN-6925
> URL: https://issues.jboss.org/browse/ISPN-6925
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.0.0.Alpha3, 8.2.3.Final
> Reporter: Radim Vansa
> Assignee: Radim Vansa
> Priority: Critical
> Attachments: server.log.node1, server.log.node2, server.log.node3
>
>
> There's a race condition in {{CommandAwareRpcDispatcher}} when we do staggered gets. The {{RspList}} is prepared first, and each {{Rsp}} is then filled in by {{processCallsStaggered$lambda}}. Two responding threads can each mark their own {{Rsp}} as received and still see the other response as not yet received, because there's no memory barrier between the {{setValue}}/{{setException}} call and the {{wasReceived}} check.
> The race above happens when two responses arrive but neither is accepted by the filter. There's a second race in {{JGroupsTransport}}, when the first response is accepted and then another one arrives. In {{JGroupsTransport.invokeRemotelyAsync}}, the lambda handling {{rspListFuture.thenApply}} may see another thread concurrently modifying the rsps; e.g. {{checkRsp}} finds that the concurrently written response was received and, according to its flags, is not an exception, but its value is still null, so you return null even though the other {{Rsp}} may hold a valid response.
[JBoss JIRA] (ISPN-6676) Use HTTP2/Upgrade in HotRod server
by Sebastian Łaskawiec (JIRA)
[ https://issues.jboss.org/browse/ISPN-6676?page=com.atlassian.jira.plugin.... ]
Sebastian Łaskawiec commented on ISPN-6676:
-------------------------------------------
Kubernetes and OpenShift architecture looks like the following:
{noformat}
Client ---the-Internet---> Route ---> Service ---Round-Robin---+--> Pod 1
                                                               +--> Pod 2
                                                               +--> Pod 3
{noformat}
Round robin is the default strategy, but we can also use Session Affinity by Client IP. The main problem, however, is how to reach a specific Pod: the client can calculate the consistent hash, so it knows where to look for the data, but the Service decides which Pod receives the request.
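To illustrate what "the client can calculate the consistent hash" means here, the sketch below maps a key to an owning Pod on the client side using a generic hash ring. It is an assumption-laden illustration only: {{PodRoutingSketch}}, the Pod names, the virtual-node count and the CRC32 hash are all made up for the example, and this is not Infinispan's actual consistent hash implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Generic hash-ring sketch; hypothetical, not the Hot Rod client's real algorithm. */
final class PodRoutingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    PodRoutingSketch(List<String> pods, int virtualNodes) {
        // Place each Pod on the ring several times to spread keys more evenly.
        for (String pod : pods) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(pod + "#" + i), pod);
            }
        }
    }

    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** The owner is the first ring position at or after the key's hash, wrapping around. */
    String ownerOf(String key) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        PodRoutingSketch routing = new PodRoutingSketch(List.of("pod-1", "pod-2", "pod-3"), 16);
        System.out.println("owner of 'user:42' = " + routing.ownerOf("user:42"));
    }
}
```

The client can compute the owner this way, yet with round robin the Service may still forward the request to a different Pod, which is the routing gap described above.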
> Use HTTP2/Upgrade in HotRod server
> ----------------------------------
>
> Key: ISPN-6676
> URL: https://issues.jboss.org/browse/ISPN-6676
> Project: Infinispan
> Issue Type: Feature Request
> Components: Cloud Integrations
> Reporter: Sebastian Łaskawiec
> Assignee: Sebastian Łaskawiec
> Labels: hackathon
>
> With HTTP/2 it is possible to reuse the same TCP connection and switch to a custom binary protocol. This would be a perfect way to cooperate with many load balancers, including those deployed in the Cloud.
[JBoss JIRA] (ISPN-6925) Race condition in staggered gets
by Paul Ferraro (JIRA)
[ https://issues.jboss.org/browse/ISPN-6925?page=com.atlassian.jira.plugin.... ]
Paul Ferraro commented on ISPN-6925:
------------------------------------
Raising priority to critical, as this is easily reproducible in WF 10.x resulting in request failures, forcing us to disable staggered gets for WF 10.1.
> Race condition in staggered gets
> --------------------------------
>
> Key: ISPN-6925
> URL: https://issues.jboss.org/browse/ISPN-6925
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.0.0.Alpha3, 8.2.3.Final
> Reporter: Radim Vansa
> Assignee: Radim Vansa
> Priority: Critical
>
> There's a race condition in {{CommandAwareRpcDispatcher}} when we do staggered gets. The {{RspList}} is prepared first, and each {{Rsp}} is then filled in by {{processCallsStaggered$lambda}}. Two responding threads can each mark their own {{Rsp}} as received and still see the other response as not yet received, because there's no memory barrier between the {{setValue}}/{{setException}} call and the {{wasReceived}} check.
> The race above happens when two responses arrive but neither is accepted by the filter. There's a second race in {{JGroupsTransport}}, when the first response is accepted and then another one arrives. In {{JGroupsTransport.invokeRemotelyAsync}}, the lambda handling {{rspListFuture.thenApply}} may see another thread concurrently modifying the rsps; e.g. {{checkRsp}} finds that the concurrently written response was received and, according to its flags, is not an exception, but its value is still null, so you return null even though the other {{Rsp}} may hold a valid response.
[JBoss JIRA] (ISPN-6925) Race condition in staggered gets
by Paul Ferraro (JIRA)
[ https://issues.jboss.org/browse/ISPN-6925?page=com.atlassian.jira.plugin.... ]
Paul Ferraro updated ISPN-6925:
-------------------------------
Priority: Critical (was: Major)
> Race condition in staggered gets
> --------------------------------
>
> Key: ISPN-6925
> URL: https://issues.jboss.org/browse/ISPN-6925
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.0.0.Alpha3, 8.2.3.Final
> Reporter: Radim Vansa
> Assignee: Radim Vansa
> Priority: Critical
>
> There's a race condition in {{CommandAwareRpcDispatcher}} when we do staggered gets. The {{RspList}} is prepared first, and each {{Rsp}} is then filled in by {{processCallsStaggered$lambda}}. Two responding threads can each mark their own {{Rsp}} as received and still see the other response as not yet received, because there's no memory barrier between the {{setValue}}/{{setException}} call and the {{wasReceived}} check.
> The race above happens when two responses arrive but neither is accepted by the filter. There's a second race in {{JGroupsTransport}}, when the first response is accepted and then another one arrives. In {{JGroupsTransport.invokeRemotelyAsync}}, the lambda handling {{rspListFuture.thenApply}} may see another thread concurrently modifying the rsps; e.g. {{checkRsp}} finds that the concurrently written response was received and, according to its flags, is not an exception, but its value is still null, so you return null even though the other {{Rsp}} may hold a valid response.
[JBoss JIRA] (ISPN-6676) Use HTTP2/Upgrade in HotRod server
by Sebastian Łaskawiec (JIRA)
[ https://issues.jboss.org/browse/ISPN-6676?page=com.atlassian.jira.plugin.... ]
Sebastian Łaskawiec commented on ISPN-6676:
-------------------------------------------
Thoughts on design for this feature:
* HTTP/2 [does not support upgrading to another protocol|https://http2.github.io/http2-spec/#rfc.section.8.1.2.2]. [TLS ALPN|https://http2.github.io/http2-spec/#rfc.section.3.1] should be used to provide this type of functionality.
* The HTTP/1.1 Upgrade header is supported for backwards compatibility. In other words, we should consider using [HTTP/1.1 Upgrade|https://http2.github.io/http2-spec/#rfc.section.3.2] or [ALPN|https://tools.ietf.org/html/rfc7301] to support this kind of functionality.
* [An HTTP/2 client with TLS support must use ALPN|https://http2.github.io/http2-spec/#rfc.section.3.4].
* [HTTP/2 supports extensions, which can use new types of Frames, Settings and Error Codes|https://http2.github.io/http2-spec/#rfc.section.5.5].
* HTTP/2 supports [PING|https://http2.github.io/http2-spec/#rfc.section.6.7] and connection shutdown ([GOAWAY|https://http2.github.io/http2-spec/#rfc.section.6.8]) out of the box.
* [ALPN is considered a protection against cross-protocol attacks|https://http2.github.io/http2-spec/#rfc.section.10.2] for HTTP/2.
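As a rough illustration of the ALPN option from the list above, a client could advertise the protocols it is willing to speak during the TLS handshake. This is a minimal sketch assuming Java 9 or later, where {{SSLParameters.setApplicationProtocols}} is available; {{AlpnSketch}} is a hypothetical class and {{hotrod-example}} is a made-up protocol ID used only for illustration, while {{h2}} and {{http/1.1}} are the registered ALPN IDs.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

/** Hypothetical sketch of advertising protocols via ALPN; not Hot Rod client code. */
final class AlpnSketch {
    // Ordered by preference. "hotrod-example" is an invented ID for this sketch.
    static final String[] PROTOCOLS = {"hotrod-example", "h2", "http/1.1"};

    static SSLEngine newClientEngine() {
        try {
            SSLEngine engine = SSLContext.getDefault().createSSLEngine();
            engine.setUseClientMode(true);
            SSLParameters params = engine.getSSLParameters();
            // The listed protocols are sent in the ClientHello's ALPN extension.
            params.setApplicationProtocols(PROTOCOLS);
            engine.setSSLParameters(params);
            return engine;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    public static void main(String[] args) {
        SSLEngine engine = newClientEngine();
        System.out.println(String.join(", ", engine.getSSLParameters().getApplicationProtocols()));
    }
}
```

After the handshake completes, {{SSLEngine.getApplicationProtocol()}} reports which protocol the server selected, and the client could switch to the binary protocol or stay on HTTP accordingly.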
> Use HTTP2/Upgrade in HotRod server
> ----------------------------------
>
> Key: ISPN-6676
> URL: https://issues.jboss.org/browse/ISPN-6676
> Project: Infinispan
> Issue Type: Feature Request
> Components: Cloud Integrations
> Reporter: Sebastian Łaskawiec
> Assignee: Sebastian Łaskawiec
> Labels: hackathon
>
> With HTTP/2 it is possible to reuse the same TCP connection and switch to a custom binary protocol. This would be a perfect way to cooperate with many load balancers, including those deployed in the Cloud.
[JBoss JIRA] (ISPN-6676) Use HTTP2/Upgrade in HotRod server
by Sebastian Łaskawiec (JIRA)
[ https://issues.jboss.org/browse/ISPN-6676?page=com.atlassian.jira.plugin.... ]
Sebastian Łaskawiec edited comment on ISPN-6676 at 8/16/16 8:37 AM:
--------------------------------------------------------------------
Since many browsers will enforce TLS, I believe we should support only secured connections in the HTTP/2 client.
was (Author: sebastian.laskawiec):
Since many browsers will enforce TLS, I believe we should focus on encryption here.
> Use HTTP2/Upgrade in HotRod server
> ----------------------------------
>
> Key: ISPN-6676
> URL: https://issues.jboss.org/browse/ISPN-6676
> Project: Infinispan
> Issue Type: Feature Request
> Components: Cloud Integrations
> Reporter: Sebastian Łaskawiec
> Assignee: Sebastian Łaskawiec
> Labels: hackathon
>
> With HTTP/2 it is possible to reuse the same TCP connection and switch to a custom binary protocol. This would be a perfect way to cooperate with many load balancers, including those deployed in the Cloud.
[JBoss JIRA] (ISPN-6442) NullPointerException in HotRodDecoder.channelActive
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-6442?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-6442:
-----------------------------------------------
Jakub Markos <jmarkos(a)redhat.com> changed the Status of [bug 1322679|https://bugzilla.redhat.com/show_bug.cgi?id=1322679] from ON_QA to VERIFIED
> NullPointerException in HotRodDecoder.channelActive
> ---------------------------------------------------
>
> Key: ISPN-6442
> URL: https://issues.jboss.org/browse/ISPN-6442
> Project: Infinispan
> Issue Type: Bug
> Components: Server, Test Suite - Server
> Affects Versions: 8.2.0.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Labels: testsuite_stability
> Fix For: 9.0.0.Alpha1, 9.0.0.Final
>
>
> {{HotRodServer.startInternal}} first starts the Netty transport (with {{super.startInternal()}}) and only then initializes the {{clientListenerRegistry}} field. That means the server can accept a request before {{clientListenerRegistry}} is initialized, causing an NPE in {{HotRodDecoder.channelActive()}}.
> Visible as random failures in {{DistTopologyChangeUnderLoadSingleOwnerTest.testRestartServerWhilePutting}}:
> {noformat}
> 00:10:54,718 ERROR (HotRodServerWorker-408-1) [CacheDecodeContext] ISPN005009: Unexpected error before any request parameters read java.lang.NullPointerException
> at org.infinispan.server.hotrod.HotRodDecoder.channelActive(HotRodDecoder.scala:284)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:183)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:169)
> at io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:817)
> at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:453)
> at io.netty.channel.AbstractChannel$AbstractUnsafe.access$100(AbstractChannel.java:377)
> at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:423)
> at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
> java.util.concurrent.ExecutionException: org.infinispan.client.hotrod.exceptions.HotRodClientException:Request for messageId=46498 returned server error (status=0x85): java.lang.NullPointerException
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at org.infinispan.client.hotrod.DistTopologyChangeUnderLoadSingleOwnerTest.testRestartServerWhilePutting(DistTopologyChangeUnderLoadSingleOwnerTest.java:64)
> Caused by: org.infinispan.client.hotrod.exceptions.HotRodClientException:Request for messageId=46498 returned server error (status=0x85): java.lang.NullPointerException
> at org.infinispan.client.hotrod.impl.protocol.Codec20.checkForErrorsInResponseStatus(Codec20.java:343)
> at org.infinispan.client.hotrod.impl.protocol.Codec20.readPartialHeader(Codec20.java:132)
> at org.infinispan.client.hotrod.impl.protocol.Codec20.readHeader(Codec20.java:118)
> at org.infinispan.client.hotrod.impl.operations.HotRodOperation.readHeaderAndValidate(HotRodOperation.java:56)
> at org.infinispan.client.hotrod.impl.operations.AbstractKeyValueOperation.sendPutOperation(AbstractKeyValueOperation.java:56)
> at org.infinispan.client.hotrod.impl.operations.PutOperation.executeOperation(PutOperation.java:32)
> at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:54)
> at org.infinispan.client.hotrod.impl.RemoteCacheImpl.put(RemoteCacheImpl.java:268)
> at org.infinispan.client.hotrod.impl.RemoteCacheSupport.put(RemoteCacheSupport.java:79)
> at org.infinispan.client.hotrod.DistTopologyChangeUnderLoadSingleOwnerTest$PutHammer.call(DistTopologyChangeUnderLoadSingleOwnerTest.java:76)
> at org.infinispan.client.hotrod.DistTopologyChangeUnderLoadSingleOwnerTest$PutHammer.call(DistTopologyChangeUnderLoadSingleOwnerTest.java:67)
> at org.infinispan.test.AbstractInfinispanTest$LoggingCallable.call(AbstractInfinispanTest.java:478)
> ... 4 more
> {noformat}
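The ordering problem described in the report can be reproduced in miniature. The sketch below is hypothetical: {{StartupOrderSketch}} and its {{Transport}} interface only stand in for the server and the Netty transport, and reordering the two steps is one plausible remedy, not necessarily the change that was committed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical miniature of the startup ordering issue; not the real HotRodServer. */
final class StartupOrderSketch {
    /** Stand-in for the transport: it may deliver a connection as soon as it starts. */
    interface Transport {
        void start(Runnable onChannelActive);
    }

    private final Transport transport;
    private List<String> clientListenerRegistry; // null until initialized

    StartupOrderSketch(Transport transport) {
        this.transport = transport;
    }

    /** Order described in the report: transport first, field second. */
    void startTransportFirst() {
        transport.start(this::channelActive);
        clientListenerRegistry = new ArrayList<>();
    }

    /** Reordered: the field is ready before any connection can arrive. */
    void startRegistryFirst() {
        clientListenerRegistry = new ArrayList<>();
        transport.start(this::channelActive);
    }

    private void channelActive() {
        // Throws NullPointerException if the registry has not been initialized yet.
        clientListenerRegistry.add("connected");
    }

    int connections() {
        return clientListenerRegistry.size();
    }

    public static void main(String[] args) {
        Transport eager = callback -> callback.run(); // a client connects immediately
        try {
            new StartupOrderSketch(eager).startTransportFirst();
        } catch (NullPointerException e) {
            System.out.println("transport-first ordering failed with an NPE");
        }
        StartupOrderSketch server = new StartupOrderSketch(eager);
        server.startRegistryFirst();
        System.out.println("registry-first connections: " + server.connections());
    }
}
```

In a real multi-threaded server the field would also need to be safely published to the worker threads; initializing it before those threads are started is one way to get that guarantee.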