[
https://issues.jboss.org/browse/ISPN-6925?page=com.atlassian.jira.plugin....
]
Radim Vansa commented on ISPN-6925:
-----------------------------------
Hi [~pferraro], I can confirm that the messages are properly sent and received (by
JGroups) but not processed by Infinispan. Regrettably, there's no further logging that
would pinpoint the issue, but given the fact that the responses are successful (therefore
there's no issue with copying the content) it could be only this race condition (which
I doubt, since the timestamps suggest the events are not interleaved) or unknown bug. I
guess you can't reproduce this in an unit test, can you? I would be also interested if
you can reproduce with the fix in PR.
node2:
2016-08-16 10:25:55,217 TRACE [org.jgroups.protocols.UDP] (default task-1) node2: sending
msg to node1, src=node2, headers are RequestCorrelator: corr_id=200, type=REQ, req_id=42,
rsp_expected=true, FORK: ee:web
, UNICAST3: DATA, seqno=65, TP: [cluster_name=ee]
node1:
2016-08-16 10:25:55,219 TRACE
[org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (thread-15,ee,node1)
About to send back response SuccessfulResponse{responseValue=ImmortalCacheValue
{value=org.wildfly.clustering.web.infinispan.session.SessionCreationMetaDataEntry@669d8e2a}}
for command ClusteredGetCommand{key=c-TToH8iNOx-6YCwJnHNKzxhNKEc0DenxW12ZMgo,
flags=[FORCE_WRITE_LOCK]}
node2:
2016-08-16 10:25:55,220 TRACE [org.jgroups.protocols.UDP] (thread-17,ee,node2) node2:
received [dst: node2, src: node1 (4 headers), size=22 bytes,
flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER], headers are RequestCorrelator: corr_id=200,
type=RSP, req_id=42, rsp_expected=true, FORK: ee:web, UNICAST3: DATA, seqno=65, conn_id=1,
TP: [cluster_name=ee]
2016-08-16 10:25:55,225 TRACE [org.jgroups.protocols.UDP] (timeout-thread--p12-t1) node2:
sending msg to node3, src=node2, headers are RequestCorrelator: corr_id=200, type=REQ,
req_id=43, rsp_expected=true, FORK: ee:web, UNICAST3: DATA, seqno=22, conn_id=1, TP:
[cluster_name=ee]
node3:
2016-08-16 10:25:55,228 TRACE
[org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (thread-18,ee,node3)
About to send back response SuccessfulResponse{responseValue=ImmortalCacheValue
{value=org.wildfly.clustering.web.infinispan.session.SessionCreationMetaDataEntry@2d3471f8}}
for command ClusteredGetCommand{key=c-TToH8iNOx-6YCwJnHNKzxhNKEc0DenxW12ZMgo,
flags=[FORCE_WRITE_LOCK]}
2016-08-16 10:25:55,232 TRACE [org.jgroups.protocols.UDP] (thread-17,ee,node2) node2:
received [dst: node2, src: node3 (4 headers), size=22 bytes,
flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER], headers are RequestCorrelator: corr_id=200,
type=RSP, req_id=43, rsp_expected=true, FORK: ee:web, UNICAST3: DATA, seqno=20, conn_id=1,
TP: [cluster_name=ee]
Race condition in staggered gets
--------------------------------
Key: ISPN-6925
URL:
https://issues.jboss.org/browse/ISPN-6925
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 9.0.0.Alpha3, 8.2.3.Final
Reporter: Radim Vansa
Assignee: Radim Vansa
Priority: Critical
Attachments: server.log.node1, server.log.node2, server.log.node3
There's a race condition in {{CommandAwareRpcDispatcher}}, as we do staggered gets.
When the {{RspList}} is prepared, and then in {{processCallsStaggered$lambda}} the {{Rsp}}
is filled in - both of them can set is as received but later see that the other response
was not received yet, because there's no memory barrieri n between the
{{setValue}}/{{setException}} and checking {{wasReceived}}.
The race above happens when two responses come but none of them is accepted by the
filter, but there's a second one in JGroupsTransport when the first response is
accepted but then comes another one. In {{JGroupsTransport.invokeRemotelyAsync}} in the
lambda handling {{rspListFuture.thenApply}} we may see another thread concurrently
modifying the rsps; e.g. in {{checkRsp}} you find out that the concurrently written
response was received and it's not an exception according to flags, but the value will
be null, so you return null while you can have valid response in the other {{Rsp}}.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)