[infinispan-issues] [JBoss JIRA] (ISPN-3918) Inconsistent view of the cache with putIfAbsent in a non-tx cache during state transfer

Dan Berindei (JIRA) issues at jboss.org
Thu Aug 10 04:07:00 EDT 2017


    [ https://issues.jboss.org/browse/ISPN-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447115#comment-13447115 ] 

Dan Berindei commented on ISPN-3918:
------------------------------------

I found another failure mode while trying to work around this issue in {{NonTxPutIfAbsentDuringLeaveStressTest}} (I was also trying to merge it with {{NonTxPutIfAbsentDuringJoinStressTest}} into a {{NonTxPutIfAbsentDuringRebalanceStressTest}}, so the stack traces in the attached log _NonTxPutIfAbsentDuringRebalanceStressTest.testPutIfAbsentDuringJoin_1.log.gz_ don't match master).

Say {{owners(k) = AB}} in topology {{t}}, and {{owners(k) = BA}} in topology {{t+1}}. In the following scenario, {{putIfAbsent}} fails in both threads _C-app1_ and _C-app2_, and each thread returns a different value:

# _C-app1_: start putIfAbsent(k, v1)
# _C-app1_: send putIfAbsent(k, v1) to A (primary)
# _A-remote1_: receive putIfAbsent(k, v1)
# _A-remote1_: send backup request for putIfAbsent(k, v1) to B
# _A-remote1_: write k=v1
# _C-remote1_: receive null for putIfAbsent(k, v1)
# _C-app2_: start putIfAbsent(k, v2)
# _C-app2_: send putIfAbsent(k, v2) to A (primary)
# _A-remote2_: receive putIfAbsent(k, v2)
# _A-remote2_: read v1 from data container, fail the command
# _C-remote2_: receive v1 for putIfAbsent(k, v2)
# *_C-app2_: return v1 (put was unsuccessful)*
# _B-remote1_: install topology t+1, B is now primary owner
# _B-remote1_: receive backup request for putIfAbsent(k, v1)
# _B-remote1_: check topology, send OutdatedTopologyException ack to C
# _C-remote3_: install topology t+1
# _C-app3_: start putIfAbsent(k, v3)
# _C-app3_: send putIfAbsent(k, v3) to B (primary)
# _B-remote3_: receive putIfAbsent(k, v3)
# _B-remote3_: send backup request for putIfAbsent(k, v3) to A
# _B-remote3_: write k=v3
# _C-remote3_: receive null for putIfAbsent(k, v3)
# _A-remote3_: receive backup request for putIfAbsent(k, v3)
# _A-remote3_: write k=v3
# _A-remote3_: send backup ack for putIfAbsent(k, v3) to C
# _C-remote3_: receive backup ack for putIfAbsent(k, v3)
# *_C-app3_: return null (put was successful)*
# _B-remote1_: retry, B is now primary owner
# _B-remote1_: read v3 from data container, fail the command
# _C-remote1_: receive v3 for putIfAbsent(k, v1)
# *_C-app1_: return v3 (put was unsuccessful)*
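
To make the interleaving easier to follow, here is a single-threaded toy replay of steps 1-42. It is only a sketch, not Infinispan code: {{Node}}, {{primaryPutIfAbsent}}, {{backupWrite}} and the local {{OutdatedTopologyException}} class are hypothetical stand-ins, topology ids are plain integers, and a backup simply rejects commands from an older topology (a real node would also have to wait for a newer one). It also assumes A installs t+1 before the v3 backup reaches it, which the trace above doesn't show.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy replay of steps 1-42; every class and method here is hypothetical, not Infinispan API.
final class PutIfAbsentRebalanceReplay {

    // Stand-in for the exception a backup owner sends back when the command's topology is stale.
    static final class OutdatedTopologyException extends RuntimeException {
        OutdatedTopologyException(String message) {
            super(message);
        }
    }

    static final class Node {
        final String name;
        final Map<String, String> dataContainer = new HashMap<>();
        int topologyId;

        Node(String name, int topologyId) {
            this.name = name;
            this.topologyId = topologyId;
        }

        // Primary owner: write only if the key is absent; return the previous value (null means success).
        String primaryPutIfAbsent(String key, String value) {
            String previous = dataContainer.get(key);
            if (previous == null) {
                dataContainer.put(key, value);
            }
            return previous;
        }

        // Backup owner: reject commands from an older topology, otherwise apply the write unconditionally.
        void backupWrite(String key, String value, int commandTopologyId) {
            if (commandTopologyId < topologyId) {
                throw new OutdatedTopologyException(
                        name + " is in topology " + topologyId + ", command is from " + commandTopologyId);
            }
            dataContainer.put(key, value);
        }
    }

    static Map<String, String> replay() {
        final String k = "k";
        final int t = 1;
        Node a = new Node("A", t);
        Node b = new Node("B", t);
        Map<String, String> observed = new LinkedHashMap<>();

        // Topology t, owners(k) = AB: C sends both commands to A.
        a.primaryPutIfAbsent(k, "v1");                         // A writes k=v1; the backup to B is still in flight
        observed.put("C-app2", a.primaryPutIfAbsent(k, "v2")); // A already holds v1, so C-app2 fails with v1

        // B installs t+1 before the v1 backup arrives, so it rejects it.
        b.topologyId = t + 1;
        try {
            b.backupWrite(k, "v1", t);
        } catch (OutdatedTopologyException e) {
            observed.put("B backup for v1", "OutdatedTopologyException");
        }

        // Topology t+1, owners(k) = BA: C-app3 goes to B, which never stored v1.
        a.topologyId = t + 1;                                  // assumption: A has installed t+1 by now
        observed.put("C-app3", b.primaryPutIfAbsent(k, "v3")); // null: the put succeeds on B
        a.backupWrite(k, "v3", t + 1);                         // A overwrites the uncommitted v1 with v3

        observed.put("A final", a.dataContainer.get(k));
        observed.put("B final", b.dataContainer.get(k));
        return observed;
    }

    public static void main(String[] args) {
        replay().forEach((what, value) -> System.out.println(what + ": " + value));
    }
}
```

In this model C-app2 gets v1 back, C-app3 succeeds, and both A and B end up with v3, so the v1 that C-app2 observed never survives on any owner.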

I think both this scenario and the previous one are worse than the initial report of seeing {{null}}, because we're not respecting the {{READ_COMMITTED}} isolation level (a transaction is seeing a value that was never committed).
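
One way to state the violated guarantee is as a check over a stress test history. The sketch below assumes, as in the stress tests, that the only operations on {{k}} are concurrent putIfAbsent calls on an initially absent key, so the value left in the cache is the only one that was ever committed: at most one call may succeed, and every failed call must return that value. {{PutIfAbsentHistoryChecker}}, {{Call}} and {{violations}} are hypothetical names, not part of the Infinispan test suite.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical history checker for the putIfAbsent guarantee; not part of the Infinispan test suite.
final class PutIfAbsentHistoryChecker {

    // One completed putIfAbsent(k, proposed) call and the value it returned (null means it succeeded).
    record Call(String thread, String proposed, String returned) {}

    // Assumes only putIfAbsent calls touched k, so finalValue is the single committed value.
    // At most one call may succeed, and every failed call must return finalValue.
    static List<String> violations(List<Call> history, String finalValue) {
        List<String> problems = new ArrayList<>();
        long winners = history.stream().filter(c -> c.returned() == null).count();
        if (winners > 1) {
            problems.add(winners + " calls succeeded, expected at most one");
        }
        for (Call c : history) {
            if (c.returned() == null && !c.proposed().equals(finalValue)) {
                problems.add(c.thread() + " succeeded with " + c.proposed() + " but the cache holds " + finalValue);
            } else if (c.returned() != null && !c.returned().equals(finalValue)) {
                problems.add(c.thread() + " returned " + c.returned() + ", which was never the committed value");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Outcomes of C-app2 and C-app3 from the trace above; the cache ends up holding v3.
        List<Call> history = List.of(
                new Call("C-app2", "v2", "v1"),
                new Call("C-app3", "v3", null));
        violations(history, "v3").forEach(System.out::println);
    }
}
```

Fed the C-app2 and C-app3 outcomes from the trace, the checker flags C-app2, because v1 was returned to a caller but never became the committed value.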


> Inconsistent view of the cache with putIfAbsent in a non-tx cache during state transfer
> ---------------------------------------------------------------------------------------
>
>                 Key: ISPN-3918
>                 URL: https://issues.jboss.org/browse/ISPN-3918
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core, State Transfer
>    Affects Versions: 6.0.0.Final
>            Reporter: Dan Berindei
>              Labels: consistency
>             Fix For: 9.2.0.Final
>
>         Attachments: NonTxPutIfAbsentDuringLeaveStressTest.testNodeLeavingDuringPutIfAbsent_8.log.gz, NonTxPutIfAbsentDuringRebalanceStressTest.testPutIfAbsentDuringJoin_1.log.gz, ntpiadjst.log.gz
>
>
> In a non-tx cache, sometimes it's possible for a {{get(k)}} to return {{null}} even though a previous {{putIfAbsent(k, v)}} returned a non-null value and the only concurrent operations on the cache are concurrent putIfAbsent calls.
> Say \[B, A, C] are the owners of k (C just joined)
> 1. A starts a {{putIfAbsent(k, v1)}} command, sends it to B
> 2. B forwards the command to A and C
> 3. C writes {{k=v1}}
> 4. C becomes the primary owner of k (owners are now \[C, A])
> 5. A/B see the new topology before committing and throw an OutdatedTopologyException
> 6. A retries the command, sends it to C
> 7. C forwards the command to A, which writes {{k=v1}}
> 8. C doesn't have to update the entry, returns null
> If, between steps 3 and 7, another thread on A starts a {{putIfAbsent(k, v2)}} command, the command will fail and return {{v1}} (because the primary owner already has a value). However, a subsequent {{get(k)}} command will return {{null}}, because A is an owner and doesn't have the value.
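
The original report boils down to the primary owner and a backup owner disagreeing in the window between steps 3 and 7. A minimal sketch, assuming a drastically simplified model where each node's data container is a plain map, {{putIfAbsent}} is decided by the first node in the owner list, and an owner answers {{get(k)}} from its own container; {{StaleOwnerReadModel}} and its methods are hypothetical names, not Infinispan API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the original report: plain maps stand in for each node's data container.
final class StaleOwnerReadModel {

    private final Map<String, Map<String, String>> containers = new HashMap<>();

    Map<String, String> container(String node) {
        return containers.computeIfAbsent(node, n -> new HashMap<>());
    }

    // putIfAbsent is decided by the primary owner, the first node in the owner list.
    String putIfAbsent(List<String> owners, String key, String value) {
        Map<String, String> primary = container(owners.get(0));
        String previous = primary.get(key);
        if (previous == null) {
            primary.put(key, value);
        }
        return previous;
    }

    // An owner serves get(k) from its own data container.
    String getOnOwner(String node, String key) {
        return container(node).get(key);
    }

    // Returns {value returned by putIfAbsent(k, v2), value returned by get(k) on A}.
    static String[] run() {
        StaleOwnerReadModel cache = new StaleOwnerReadModel();
        // Step 3: C writes k=v1. A and B reject their copies with OutdatedTopologyException (step 5).
        cache.container("C").put("k", "v1");
        // Step 4: the owners of k are now [C, A].
        List<String> owners = List.of("C", "A");
        // Before step 7: another thread on A calls putIfAbsent(k, v2); the primary C already holds v1.
        String returned = cache.putIfAbsent(owners, "k", "v2");
        // A is an owner, so get(k) reads A's own container, which has no value yet.
        String read = cache.getOnOwner("A", "k");
        return new String[] {returned, read};
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println("putIfAbsent(k, v2) returned " + result[0] + ", get(k) on A returned " + result[1]);
    }
}
```

Running it prints that {{putIfAbsent(k, v2)}} returned v1 while {{get(k)}} on A returned null, which is the inconsistency described in the report.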


