[infinispan-issues] [JBoss JIRA] Commented: (ISPN-939) Index corruption when remote node dies during commit

Monday, 21 February 2011

    [ https://issues.jboss.org/browse/ISPN-939?page=com.atlassian.jira.plugin.s... ]

Sanne Grinovero commented on ISPN-939:
--------------------------------------

Ah, right. What about using the SerialMergeScheduler? I know it's not a great solution,
I'm just looking for a temporary workaround for you; it would also be great to know
whether it helps.
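For context on why the scheduler matters, here is a toy, stdlib-only model of the two scheduling styles. It is not Lucene code: `ToyMergeScheduler`, `SerialToyScheduler` and `BackgroundToyScheduler` are hypothetical names. A serial scheduler runs each merge on the thread that requested it, so thread-bound state (a JTA transaction is associated with the calling thread) presumably also covers the merge, while a concurrent scheduler hands merges to another thread. With the real Lucene 3.0 API the switch should be roughly `writer.setMergeScheduler(new SerialMergeScheduler())`, but check the javadoc of the Lucene version you are on.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a merge scheduler: it only decides which thread runs a merge. */
interface ToyMergeScheduler {
    void merge(Runnable mergeTask);
}

/** Runs each merge synchronously on the calling thread (the SerialMergeScheduler idea). */
class SerialToyScheduler implements ToyMergeScheduler {
    public void merge(Runnable mergeTask) {
        mergeTask.run();
    }
}

/** Hands each merge to a separate worker thread (the ConcurrentMergeScheduler idea). */
class BackgroundToyScheduler implements ToyMergeScheduler {
    public void merge(Runnable mergeTask) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            worker.submit(mergeTask).get(); // wait here only so the demo is deterministic
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            worker.shutdown();
        }
    }
}

class MergeThreadDemo {
    /** True if the merge ran on the requesting thread, so thread-bound state would apply to it. */
    static boolean mergeRunsOnCallerThread(ToyMergeScheduler scheduler) {
        Thread caller = Thread.currentThread();
        AtomicReference<Thread> mergeThread = new AtomicReference<>();
        scheduler.merge(() -> mergeThread.set(Thread.currentThread()));
        return mergeThread.get() == caller;
    }

    public static void main(String[] args) {
        System.out.println("serial: " + mergeRunsOnCallerThread(new SerialToyScheduler()));         // true
        System.out.println("background: " + mergeRunsOnCallerThread(new BackgroundToyScheduler())); // false
    }
}
```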

Also, you didn't tell me which version you're using :) It makes quite a difference
whether or not you have ISPN-930 included: since ISPN-930 was committed (4.2.1.CR2, which I
suppose you're not using because of the XSD issue), the metadata of existing segments
is enabled only at segment close.
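To illustrate what "metadata only at segment close" means for a reader, here is a toy model, not the Infinispan implementation: `ToyDirectory` and `ToyOutput` are hypothetical classes that keep file contents and file metadata (the length) in separate maps, and a file only becomes visible once its output is closed.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical toy directory: file contents and file metadata live in separate maps. */
class ToyDirectory {
    final Map<String, byte[]> data = new ConcurrentHashMap<>();
    final Map<String, Integer> metadata = new ConcurrentHashMap<>(); // file name -> length

    ToyOutput createOutput(String name) {
        return new ToyOutput(this, name);
    }

    /** Readers only "see" files whose metadata has been published. */
    boolean fileExists(String name) {
        return metadata.containsKey(name);
    }

    int fileLength(String name) {
        Integer len = metadata.get(name);
        if (len == null) throw new IllegalStateException("no such file: " + name);
        return len;
    }
}

/** Buffers writes locally; nothing is published to the directory until close(). */
class ToyOutput implements AutoCloseable {
    private final ToyDirectory dir;
    private final String name;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    ToyOutput(ToyDirectory dir, String name) { this.dir = dir; this.name = name; }

    void writeBytes(byte[] bytes) { buffer.write(bytes, 0, bytes.length); }

    @Override
    public void close() {
        byte[] content = buffer.toByteArray();
        dir.data.put(name, content);            // store the content first...
        dir.metadata.put(name, content.length); // ...then make the file visible with its final length
    }
}

class MetadataOnCloseDemo {
    public static void main(String[] args) {
        ToyDirectory dir = new ToyDirectory();
        ToyOutput out = dir.createOutput("_0.cfs");
        out.writeBytes(new byte[] {1, 2, 3});
        System.out.println("visible before close: " + dir.fileExists("_0.cfs")); // false
        out.close();
        System.out.println("visible after close: " + dir.fileExists("_0.cfs")
                + ", length " + dir.fileLength("_0.cfs")); // true, length 3
    }
}
```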

About your question: the batches are effectively canceled if you're running a
transaction manager and use org.infinispan.lucene.locking.TransactionalLockFactory. For
this to apply index updates and make them visible (committed) to the other nodes,
you'll have to close the IndexWriter frequently (closing the IndexWriter == committing the
batched changes). With this approach the index is always guaranteed to be in a consistent
state through the transaction manager's capabilities, but you'll have to wrap your changes
in blocks of work: open the IW, apply changes, commit & close the IW, repeat. Also note the
javadoc of TransactionalLockFactory: you'll need the SerialMergeScheduler when using this
locking.
Still, an unfinished batch shouldn't have affected the other nodes. You might try
canceling the batch, but I'm not sure about the rollback capabilities of a dummy
transaction manager (which is the implementation behind batching).
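The "open IW, apply changes, commit & close IW, repeat" cycle could be sketched like this. Everything here is a hypothetical stdlib-only stand-in: `ToyTransactionalStore` plays the role of the cache plus transaction manager, and `applyInBlocks` commits each block as a unit or rolls it back on failure. With the real API each block would presumably be a fresh IndexWriter closed at the end of the block, inside a transaction begun and committed through the TransactionManager.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical store: changes are staged per transaction and applied together on commit. */
class ToyTransactionalStore {
    private final Map<String, String> committed = new HashMap<>();
    private Map<String, String> staged;

    void begin() {
        if (staged != null) throw new IllegalStateException("transaction already active");
        staged = new HashMap<>();
    }

    void put(String key, String value) {
        if (staged == null) throw new IllegalStateException("no active transaction");
        staged.put(key, value);
    }

    void commit() {
        committed.putAll(staged); // all staged changes become visible together
        staged = null;
    }

    void rollback() {
        staged = null; // discard everything from the failed block
    }

    int size() { return committed.size(); }
}

class BlocksOfWork {
    /**
     * Applies each batch as one block of work: begin, apply changes, commit
     * (the analogue of IndexWriter.close()), then repeat with the next batch.
     * A failing batch is rolled back; earlier blocks stay committed.
     */
    static int applyInBlocks(ToyTransactionalStore store, List<Consumer<ToyTransactionalStore>> batches) {
        int committedBlocks = 0;
        for (Consumer<ToyTransactionalStore> batch : batches) {
            store.begin();
            try {
                batch.accept(store);
                store.commit();
                committedBlocks++;
            } catch (RuntimeException e) {
                store.rollback();
            }
        }
        return committedBlocks;
    }

    public static void main(String[] args) {
        ToyTransactionalStore store = new ToyTransactionalStore();
        List<Consumer<ToyTransactionalStore>> batches = new ArrayList<>();
        batches.add(s -> { s.put("doc1", "a"); s.put("doc2", "b"); });
        batches.add(s -> { s.put("doc3", "c"); throw new RuntimeException("simulated node failure"); });
        batches.add(s -> s.put("doc4", "d"));
        int ok = applyInBlocks(store, batches);
        System.out.println(ok + " blocks committed, " + store.size() + " docs visible"); // 2 blocks committed, 3 docs visible
    }
}
```

The point of the design is that a failure only ever costs the current block: whatever was committed before it stays consistent.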

The better solution is to hide the SuspectException, or to understand whether we can remove it.
I'm going to work out a unit test.

...
 Index corruption when remote node dies during commit
 ----------------------------------------------------

                 Key: ISPN-939
                 URL: https://issues.jboss.org/browse/ISPN-939
             Project: Infinispan
          Issue Type: Bug
          Components: Lucene Directory
    Affects Versions: 4.2.1.CR2
            Reporter: Tristan Tarrant
            Assignee: Sanne Grinovero
         Attachments: read_past_eof.log, suspect_exception_node1.log

 Using a scenario similar to the one described in ISPN-909:
 Infinispan: 3 caches: lockCache (replicated, volatile, no eviction), metadataCache
(replicated, persisted, no eviction), dataCache (distributed, persisted, eviction, hash
numOwners=2)
 Node 1: coordinator, with an IndexWriter constantly open, writing a stream of documents and
committing after each one
 Node 2: opens a read-only IndexReader to perform queries, using reopen to keep in sync
with the updates coming from node 1
 If we "kill -9" node 2 (to simulate a crash), we get a SuspectException on node
1 during the pre-commit phase (within IndexWriter.commit()). We catch the Throwable and
then close() the writer, but from then on we get "Read past EOF" errors when
trying to access the index (with both readers and writers).
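As a possible starting point for a unit test, here is a stdlib-only skeleton of node 1's control flow as described above. It only reproduces the flow (commit per document, catch Throwable, close), not the index corruption itself, which needs a real cluster. `DocWriter`, `FlakyWriter`, `SimulatedSuspectException` and `Node1Loop` are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical minimal writer contract mirroring the IndexWriter calls node 1 makes. */
interface DocWriter {
    void addDocument(String doc);
    void commit();
    void close();
}

/** Hypothetical stand-in for the SuspectException thrown when a cluster member vanishes. */
class SimulatedSuspectException extends RuntimeException {
    SimulatedSuspectException(String message) { super(message); }
}

/** Fake writer whose commit starts failing after a fixed number of successful commits. */
class FlakyWriter implements DocWriter {
    final List<String> committedDocs = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private final int failAfterCommits;
    private int commits;
    boolean closed;

    FlakyWriter(int failAfterCommits) { this.failAfterCommits = failAfterCommits; }

    public void addDocument(String doc) { pending.add(doc); }

    public void commit() {
        if (commits >= failAfterCommits) {
            throw new SimulatedSuspectException("node 2 suspected during commit");
        }
        committedDocs.addAll(pending);
        pending.clear();
        commits++;
    }

    public void close() { closed = true; }
}

class Node1Loop {
    /** Commits after every document; on any Throwable, closes the writer. Returns docs committed. */
    static int writeStream(DocWriter writer, List<String> docs) {
        int committed = 0;
        try {
            for (String doc : docs) {
                writer.addDocument(doc);
                writer.commit();
                committed++;
            }
        } catch (Throwable t) {
            System.out.println("commit failed: " + t.getMessage());
            writer.close();
        }
        return committed;
    }

    public static void main(String[] args) {
        FlakyWriter writer = new FlakyWriter(2);
        int n = writeStream(writer, Arrays.asList("doc1", "doc2", "doc3", "doc4"));
        System.out.println(n + " committed, closed=" + writer.closed); // 2 committed, closed=true
    }
}
```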
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
