[infinispan-issues] [JBoss JIRA] (ISPN-6350) Data race in the ShardIndexManager under topology changes

Thursday, 10 March 2016

    [ https://issues.jboss.org/browse/ISPN-6350?page=com.atlassian.jira.plugin.... ]

Gustavo Fernandes commented on ISPN-6350:
-----------------------------------------

One way to solve the issue would be to avoid closing the index immediately when the
notification of a topology change arrives; instead, issue a delayed index close command to
the Hibernate Search backend, for example closeAfterCommit(), so that the lock is released
only after the current write finishes.
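
A minimal sketch of the delayed-close idea, in plain Java. DeferredCloseWriter is a
hypothetical stand-in for a shard's writer, not a Hibernate Search class, and
closeAfterCommit() here only illustrates the proposed semantics: the close request is
recorded, and the actual close (and lock release) happens when the last pending write
commits.

```java
/** Hypothetical stand-in for a shard's index writer; not the Hibernate Search API. */
final class DeferredCloseWriter {
    private int inFlightWrites = 0;          // writes started but not yet committed
    private boolean closeRequested = false;  // set by the topology listener
    private boolean closed = false;          // true once the lock would be released

    /** Begins a write; returns false if the writer is closed or about to close. */
    synchronized boolean beginWrite() {
        if (closed || closeRequested) {
            return false;
        }
        inFlightWrites++;
        return true;
    }

    /** Commits a write and performs the deferred close if one was requested. */
    synchronized void commit() {
        inFlightWrites--;
        if (closeRequested && inFlightWrites == 0) {
            closed = true; // the shard lock would be released at this point
        }
    }

    /** Closes immediately when idle, otherwise defers the close to the last commit. */
    synchronized void closeAfterCommit() {
        closeRequested = true;
        if (inFlightWrites == 0) {
            closed = true;
        }
    }

    synchronized boolean isClosed() {
        return closed;
    }
}
```

With this shape the topology listener never closes the writer under a write in progress,
and a write that starts after the close request is rejected instead of reacquiring the lock.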

[~sannegrinovero] proposed another approach:

When the listener is notified of a topology change, figure out whether the index manager
needs to close the IndexWriter by inspecting the new ownership of the segment.
* If the ownership does not change, keep the IndexWriter open so the lock stays held
* If the segment ownership is moving to another node:
** Start forwarding the index works to the new owner, and issue a flushAndClose to the
local index manager. A flushAndClose will have the same effect as the closeAfterCommit()
described earlier, causing the index to be closed after the pending work finishes
*** On the node that is receiving the forwarded index works: wait for the lock to be
available
**** Once available, apply all the index changes
**** If a timeout happens \[1\], forcefully acquire the lock \[2\]

Open questions:
\[1\] How long should the timeout be? It should be longer than the time the original node
takes to finish its pending work, otherwise index corruption might happen.
\[2\] What if the node dies while still holding changes not yet applied to the index?
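
The second approach could be sketched roughly as follows. All names (SegmentAction,
TopologyDecision, ForwardedWorkReceiver) are hypothetical, the decision assumes the local
node currently owns the segment, and a java.util.concurrent.Semaphore with one permit stands
in for the shard's index lock. "Forcing" the lock is only modelled as a return value; how a
real index lock would be taken over is not specified here.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** What the current owner does when the topology listener fires (hypothetical names). */
enum SegmentAction { KEEP_OPEN, FORWARD_AND_FLUSH_CLOSE }

final class TopologyDecision {
    /** Assumes localNode currently owns the segment; newOwner comes from the new topology. */
    static SegmentAction decide(String localNode, String newOwner) {
        return localNode.equals(newOwner)
                ? SegmentAction.KEEP_OPEN                // ownership unchanged
                : SegmentAction.FORWARD_AND_FLUSH_CLOSE; // forward works, close after pending work
    }
}

final class ForwardedWorkReceiver {
    enum Result { ACQUIRED, FORCED }

    /** New owner: waits for the shard lock, taking it over once the timeout expires. */
    static Result awaitShardLock(Semaphore shardLock, long timeout, TimeUnit unit) {
        try {
            if (shardLock.tryAcquire(timeout, unit)) {
                return Result.ACQUIRED; // previous owner finished and released the lock
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the shard lock", e);
        }
        // Timed out: unsafe if the previous owner is still writing (open question [1]).
        return Result.FORCED;
    }
}
```

The caller would apply the forwarded index works only after awaitShardLock returns; a
FORCED result is the case where the timeout chosen in \[1\] decides whether the index is
still consistent.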

...
 Data race in the ShardIndexManager under topology changes
 ---------------------------------------------------------

                 Key: ISPN-6350
                 URL: https://issues.jboss.org/browse/ISPN-6350
             Project: Infinispan
          Issue Type: Bug
          Components: Embedded Querying
    Affects Versions: 8.2.0.Final
            Reporter: Gustavo Fernandes
            Assignee: Gustavo Fernandes
              Labels: affinity

 The following example data race can cause unrecoverable errors during indexing:
 \[node1\] cache.put(key)     // key maps to segment 48, owned by node1
 \[node1\] starts shard 48
 \[node1\] acquires lock on shard 48
 \[node1\] starts writing to the index
 \[node1\] notification of topology change arrives, lock released on shard 48
 \[node1\] lock reacquired (still writing to the index)
 \[node1\] commit on shard 48
 \[node1\] shard still locked
 \[node2\] cache.put(key)  // node2 now owns segment 48
 \[node2\] starts shard 48
 \[node2\] tries to acquire the lock on shard 48
 \[node2\] fails (lock still owned by node1)
 The current mechanism employed by the {{ShardIndexManager}} during topology changes
involves using a listener and closing the IndexWriter on all nodes upon ownership changes,
so that the lock is released and can be reacquired by the new owner (1 segment maps to 1
shard).
 Since writing to a shard can take some time, the listener can be triggered in the middle
of an index operation. In that case the IndexWriter stays closed only very briefly: the
in-flight write immediately reacquires the lock, and nothing releases it afterwards.
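
 The interleaving above can be replayed deterministically with a toy model. ShardLock and
RaceReplay are hypothetical names; the lock is just an owner reference, not the real index
lock, and the steps are executed in the order listed in the trace.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Toy model of the per-shard index lock: at most one owning node at a time. */
final class ShardLock {
    private final AtomicReference<String> owner = new AtomicReference<>();

    /** Succeeds if the lock is free or already held by the same node. */
    boolean tryAcquire(String node) {
        return owner.compareAndSet(null, node) || node.equals(owner.get());
    }

    /** Releases the lock only if the given node holds it. */
    void release(String node) {
        owner.compareAndSet(node, null);
    }
}

final class RaceReplay {
    /** Replays the reported interleaving; returns whether node2 obtained the lock. */
    static boolean replay() {
        ShardLock shard48 = new ShardLock();
        shard48.tryAcquire("node1"); // node1 starts writing to shard 48
        shard48.release("node1");    // topology listener closes the writer mid-write
        shard48.tryAcquire("node1"); // the in-flight write reacquires the lock
        // node1 commits, but nothing closes the writer again, so the lock stays held
        return shard48.tryAcquire("node2"); // new owner of segment 48 tries to write
    }
}
```

 In this model RaceReplay.replay() returns false, matching the last line of the trace:
node2 owns the segment but cannot lock the shard.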

--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
