[infinispan-issues] [JBoss JIRA] (ISPN-6350) Data race in the ShardIndexManager under topology changes

Wed Mar 9 10:52:00 EST 2016

     [ https://issues.jboss.org/browse/ISPN-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gustavo Fernandes updated ISPN-6350:
------------------------------------
    Description: 
The following example data race can cause unrecoverable errors during indexing:

\[node1\] cache.put(key)     // key maps to segment 48, owned by node1
\[node1\] starts shard 48
\[node1\] acquires lock on shard 48
\[node1\] starts writing to the index
\[node1\] topology change notification received, lock released on shard 48
\[node1\] lock reacquired (still writing to the index)   
\[node1\] commit on shard 48
\[node1\] shard still locked
\[node2\] cache.put(key)  // Node2 now owns segment 48
\[node2\] tries to acquire the lock
\[node2\] fails (lock still owned by node1)

The current mechanism employed by the {{ShardIndexManager}} during topology changes involves using a listener and closing the IndexWriter on all nodes upon ownership changes, so that the lock is released and can be reacquired by the new owner (1 segment maps to 1 shard).
Since writing to a shard can take some time, the listener can be triggered in the middle of an index operation. In that case, closing the IndexWriter releases the lock only very briefly: the in-flight operation immediately reacquires it, and it is not released again.
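
The interleaving above can be reproduced deterministically with a small single-threaded model. This is only a sketch under assumptions, not the actual {{ShardIndexManager}} code: {{ShardLock}}, {{LazyWriter}} and {{simulate()}} are hypothetical names, and the writer is assumed to reacquire the shard lock lazily whenever a write arrives after it has been closed.

```java
import java.util.concurrent.atomic.AtomicReference;

class ShardLockRace {

    /** Hypothetical stand-in for the write lock of a single index shard. */
    static final class ShardLock {
        private final AtomicReference<String> owner = new AtomicReference<>();

        /** Succeeds if the lock is free or already held by the same node. */
        boolean tryAcquire(String node) {
            return owner.compareAndSet(null, node) || node.equals(owner.get());
        }

        /** Releases the lock only if the given node currently holds it. */
        void release(String node) {
            owner.compareAndSet(node, null);
        }
    }

    /** Hypothetical writer that (re)acquires the shard lock on every write (assumption). */
    static final class LazyWriter {
        private final String node;
        private final ShardLock lock;

        LazyWriter(String node, ShardLock lock) {
            this.node = node;
            this.lock = lock;
        }

        /** Returns true if the write could take the lock, false otherwise. */
        boolean write() {
            return lock.tryAcquire(node);
        }

        /** Models the topology listener closing the writer and releasing the lock. */
        void close() {
            lock.release(node);
        }
    }

    /** Replays the sequence from the description; returns whether node2 got the lock. */
    static boolean simulate() {
        ShardLock shard48 = new ShardLock();
        LazyWriter node1 = new LazyWriter("node1", shard48);
        LazyWriter node2 = new LazyWriter("node2", shard48);

        node1.write();        // node1 acquires the lock and starts writing
        node1.close();        // topology change: listener closes the writer mid-operation
        node1.write();        // the in-flight operation continues and reacquires the lock
        // node1 commits; nothing closes its writer again, so the lock stays held

        return node2.write(); // node2, the new owner of segment 48, tries to write
    }

    public static void main(String[] args) {
        System.out.println("node2 acquired lock: " + simulate());
    }
}
```

In this model node2 can never acquire the lock, because the only release happened before node1's in-flight write completed.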

  was:
The following example data race can cause unrecoverable errors during indexing:

\[node1\] cache.put(key)     // key maps to segment 48, owned by node1
\[node1\] starts shard 48
\[node1\] acquires lock on shard 48
\[node1\] starts writing to the index
\[node1\] notification of topology changed, lock released on shard 48
\[node1\] lock reacquired (still writing to the index)   
\[node1\] commit on shard 48
\[node1\] shard still locked
\[node2\] cache.put(key)  // Node2 now owns segment 48
\[node2\] tries to acquire the lock
\[node2\] fail (lock still owned by node1)

The current mechanism employed by the {{ShardIndexManager}} during topology changes involves using a listener and closing the IndexWriter on all nodes upon ownership changes, so that the lock is released and can be reacquired by the new owner (1 segment maps to 1 shard).
Since writing to a shard can take some time, the listener can be triggered in the middle of an index operation and the closing of the index writing will have a very short duration because it is sudden reacquired, and not released again.

> Data race in the ShardIndexManager under topology changes
> ---------------------------------------------------------
>
>                 Key: ISPN-6350
>                 URL: https://issues.jboss.org/browse/ISPN-6350
>             Project: Infinispan
>          Issue Type: Bug
>            Reporter: Gustavo Fernandes

--
This message was sent by Atlassian JIRA
(v6.4.11#64026)