[
https://issues.jboss.org/browse/ISPN-3918?page=com.atlassian.jira.plugin....
]
Dan Berindei edited comment on ISPN-3918 at 8/10/17 4:20 AM:
-------------------------------------------------------------
I found another failure mode while trying to work around this issue in
{{NonTxPutIfAbsentDuringLeaveStressTest}}. (I was also trying to merge it with
{{NonTxPutIfAbsentDuringJoinStressTest}} into a
{{NonTxPutIfAbsentDuringRebalanceStressTest}}, so the stack traces in the attached log
_NonTxPutIfAbsentDuringRebalanceStressTest.testPutIfAbsentDuringJoin_1.log.gz_ don't
match master.)
Say {{owners(k) = AB}} in topology {{t}}, and {{owners(k) = BA}} in topology {{t+1}}. In
the following scenario, {{putIfAbsent}} fails in threads _C-app1_ and _C-app2_ with
different values:
# _C-app1_: start putIfAbsent(k, v1)
# _C-app1_: send putIfAbsent(k, v1) to A (primary)
# _A-remote1_: receive putIfAbsent(k, v1)
# _A-remote1_: send backup request for putIfAbsent(k, v1) to B
# _A-remote1_: write k=v1
# _C-remote1_: receive null for putIfAbsent(k, v1)
# _C-app2_: start putIfAbsent(k, v2)
# _C-app2_: send putIfAbsent(k, v2) to A (primary)
# _A-remote2_: receive putIfAbsent(k, v2)
# _A-remote2_: read v1 from data container, fail the command
# _C-remote2_: receive v1 for putIfAbsent(k, v2)
# *_C-app2_: return v1 (put was unsuccessful)*
# _B-remote1_: install topology t+1, B is now primary owner
# _B-remote1_: receive backup request for putIfAbsent(k, v1)
# _B-remote1_: check topology, send OutdatedTopologyException ack to C
# _C-remote3_: install topology t+1
# _C-app3_: start putIfAbsent(k, v3)
# _C-app3_: send putIfAbsent(k, v3) to B (primary)
# _B-remote3_: receive putIfAbsent(k, v3)
# _B-remote3_: send backup request for putIfAbsent(k, v3) to A
# _B-remote3_: write k=v3
# _C-remote3_: receive null for putIfAbsent(k, v3)
# _A-remote3_: receive backup request for putIfAbsent(k, v3)
# _A-remote3_: write k=v3
# _A-remote3_: send backup ack for putIfAbsent(k, v3) to C
# _C-remote3_: receive backup ack for putIfAbsent(k, v3)
# *_C-app3_: return null (put was successful)*
# _B-remote1_: retry, B is now primary owner
# _B-remote1_: read v3 from data container, fail the command
# _C-remote1_: receive v3 for putIfAbsent(k, v1)
# *_C-app1_: return v3 (put was unsuccessful)*
I think both this scenario and the previous one are worse than the initial report of
seeing {{null}}, because we're not respecting the {{READ_COMMITTED}} isolation level
(a transaction is seeing a value that was never committed).
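The return values in the scenario above can be checked mechanically. The following is a minimal sketch, not Infinispan code: the class {{PutIfAbsentHistoryCheck}} and its methods are hypothetical, and it assumes the key starts out absent and that nothing except {{putIfAbsent}} touches it. Under those assumptions exactly one call should return {{null}}, and every other call should return the value written by that call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; it is not part of Infinispan.
public class PutIfAbsentHistoryCheck {

    /** One completed putIfAbsent(k, attempted) call and the value it returned. */
    public static final class Result {
        final String attempted;
        final String returned; // null means the put was successful

        public Result(String attempted, String returned) {
            this.attempted = attempted;
            this.returned = returned;
        }
    }

    /**
     * Assumes the key was absent initially and only putIfAbsent calls ran.
     * Returns one message per inconsistency; an empty list means the history is consistent.
     */
    public static List<String> violations(List<Result> history) {
        List<String> problems = new ArrayList<>();
        String winner = null;
        int winners = 0;
        for (Result r : history) {
            if (r.returned == null) {
                winners++;
                winner = r.attempted;
            }
        }
        if (winners != 1) {
            problems.add("expected exactly 1 successful put, found " + winners);
            return problems;
        }
        for (Result r : history) {
            // A failed put must have observed the value that actually won.
            if (r.returned != null && !r.returned.equals(winner)) {
                problems.add("put of " + r.attempted + " returned " + r.returned
                        + " but the committed value is " + winner);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Return values taken from the scenario: C-app1, C-app2, C-app3.
        List<Result> history = Arrays.asList(
                new Result("v1", "v3"),
                new Result("v2", "v1"),
                new Result("v3", null));
        violations(history).forEach(System.out::println);
    }
}
```

With the three return values from the scenario, the check reports a single violation: _C-app2_ saw {{v1}}, while the only successful put wrote {{v3}}.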
Inconsistent view of the cache with putIfAbsent in a non-tx cache during state transfer
---------------------------------------------------------------------------------------
Key: ISPN-3918
URL: https://issues.jboss.org/browse/ISPN-3918
Project: Infinispan
Issue Type: Bug
Components: Core, State Transfer
Affects Versions: 6.0.0.Final
Reporter: Dan Berindei
Labels: consistency
Fix For: 9.2.0.Final
Attachments:
NonTxPutIfAbsentDuringLeaveStressTest.testNodeLeavingDuringPutIfAbsent_8.log.gz,
NonTxPutIfAbsentDuringRebalanceStressTest.testPutIfAbsentDuringJoin_1.log.gz,
ntpiadjst.log.gz
In a non-tx cache, a {{get(k)}} can sometimes return {{null}} even though a previous
{{putIfAbsent(k, v)}} returned a non-null value and the only concurrent operations on the
cache are other {{putIfAbsent}} calls.
Say \[B, A, C] are the owners of k (C just joined)
1. A starts a {{putIfAbsent(k, v1)}} command, sends it to B
2. B forwards the command to A and C
3. C writes {{k=v1}}
4. C becomes the primary owner of k (owners are now \[C, A])
5. A/B see the new topology before committing and throw an {{OutdatedTopologyException}}
6. A retries the command, sends it to C
7. C forwards the command to A, which writes {{k=v1}}
8. C doesn't have to update the entry, returns null
If, between steps 3 and 7, another thread on A starts a {{putIfAbsent(k, v2)}} command,
the command will fail and return {{v1}} (because the primary owner already has a value).
However, a subsequent {{get(k)}} command will return {{null}}, because A is an owner and
doesn't have the value.
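The window between steps 3 and 7 can be replayed deterministically. This is only a sketch under simplifying assumptions: plain {{HashMap}} instances stand in for the data containers of A and C, the class name {{StaleLocalReadReplay}} is made up for illustration, and no real Infinispan routing, locking or topology code is involved.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded replay; plain maps stand in for each node's data container.
public class StaleLocalReadReplay {

    /** Returns { putIfAbsent(k, v2) result, get(k) on A before step 7, get(k) on A after step 7 }. */
    public static String[] replay() {
        Map<String, String> nodeA = new HashMap<>();
        Map<String, String> nodeC = new HashMap<>();

        // Step 3: C writes k=v1 while A has not committed yet.
        nodeC.put("k", "v1");

        // Step 4: C becomes the primary owner of k (owners are now [C, A]).
        Map<String, String> primary = nodeC;

        // Between steps 3 and 7: another thread on A runs putIfAbsent(k, v2),
        // which is decided on the primary owner and therefore sees v1.
        String putResult = primary.putIfAbsent("k", "v2");

        // A is an owner, so get(k) is served from A's own (still empty) container.
        String getBeforeRetry = nodeA.get("k");

        // Step 7: the retried command finally writes k=v1 on A.
        nodeA.put("k", "v1");
        String getAfterRetry = nodeA.get("k");

        return new String[] { putResult, getBeforeRetry, getAfterRetry };
    }

    public static void main(String[] args) {
        String[] r = replay();
        System.out.println("putIfAbsent(k, v2) returned " + r[0]);
        System.out.println("get(k) before step 7 returned " + r[1]);
        System.out.println("get(k) after step 7 returned " + r[2]);
    }
}
```

In this replay the failed {{putIfAbsent}} reports {{v1}}, yet the local read on A still returns {{null}} until the retry in step 7 has been applied.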