[
https://issues.jboss.org/browse/ISPN-939?page=com.atlassian.jira.plugin.s...
]
Sanne Grinovero commented on ISPN-939:
--------------------------------------
Ah, right. What about using the SerialMergeScheduler? I know it's not a great solution,
just searching for a temporary workaround for you; also it would be great to know if that
helps.
And you didn't answer about the version you're using :) It's quite different
if you have ISPN-930 included or not: since ISPN-930 was committed (4.2.1.CR2, which I
suppose you're not using because of the xsd issue), the metadata of existing segments
is enabled only at segment close.
About your question: the batches are effectively canceled if you're running a
TransactionManager and use org.infinispan.lucene.locking.TransactionalLockFactory. For
this one to apply index updates and make them visible/committed to the other nodes,
you'll have to close the IndexWriter frequently (IndexWriter close == commit batched
changes). With this approach the index is always guaranteed to be in a consistent state
through the TransactionManager's capabilities, but you'll have to wrap your changes
in repeated blocks of work (open IW - apply changes - commit & close IW). Also note the
javadoc of TransactionalLockFactory: you'll need SerialMergeScheduler when using this
locking.
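To make the "blocks of work" idea concrete, here is a rough sketch of the loop shape only. It deliberately does not use the Lucene or Infinispan APIs: SketchWriter, WorkBlocks and applyInBlocks are hypothetical stand-ins I made up for illustration, and the in-memory "committed" list just plays the role of what the other nodes would see. The only assumption carried over from above is that closing the writer is what commits the batched changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an index writer (NOT the Lucene IndexWriter API):
// changes are buffered and only become visible when close() is called.
final class SketchWriter {
    private final List<String> committed;                 // what other nodes would see
    private final List<String> pending = new ArrayList<String>();

    SketchWriter(List<String> committed) {
        this.committed = committed;
    }

    void addDocument(String doc) {
        pending.add(doc);                                  // buffered, not yet visible
    }

    void close() {
        committed.addAll(pending);                         // close == commit batched changes
        pending.clear();
    }
}

final class WorkBlocks {
    // Applies docs in blocks of at most blockSize documents, using one
    // open/apply/close cycle per block. Returns the number of cycles performed.
    static int applyInBlocks(List<String> docs, int blockSize, List<String> committed) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        int blocks = 0;
        for (int start = 0; start < docs.size(); start += blockSize) {
            int end = Math.min(start + blockSize, docs.size());
            SketchWriter writer = new SketchWriter(committed);   // open IW
            for (String doc : docs.subList(start, end)) {
                writer.addDocument(doc);                         // apply changes
            }
            writer.close();                                      // commit & close IW
            blocks++;
        }
        return blocks;
    }
}
```

In this sketch a block that fails before close() never reaches the committed list. With the real classes that guarantee would have to come from the TransactionManager, and the block size is a trade-off between how quickly the other nodes see updates and the cost of reopening the writer.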
Still, an unfinished batch shouldn't have affected the other nodes. You might try
cancelling the batch, but I'm not sure of the rollback capabilities of a dummy
transaction manager (which is the implementation behind batching).
The better solution is to hide the suspect exception, or understand if we can remove it.
I'm going to figure out a unit test.
Index corruption when remote node dies during commit
----------------------------------------------------
Key: ISPN-939
URL: https://issues.jboss.org/browse/ISPN-939
Project: Infinispan
Issue Type: Bug
Components: Lucene Directory
Affects Versions: 4.2.1.CR2
Reporter: Tristan Tarrant
Assignee: Sanne Grinovero
Attachments: read_past_eof.log, suspect_exception_node1.log
Using a scenario similar to the one described in ISPN-909:
Infinispan: 3 caches: lockCache (replicated, volatile, no eviction), metadataCache
(replicated, persisted, no eviction), dataCache (distributed, persisted, eviction, hash
numOwners=2)
Node 1: coordinator, IndexWriter open constantly and writing a stream of documents,
committing after each one
Node 2: opens a read-only IndexReader to perform queries, using reopen to keep in sync
with the updates coming from node 1
If we "kill -9" node 2 (to simulate a crash), we get a SuspectException in node
1 during the pre-commit phase (within IndexWriter.commit()). Catching the Throwable, we
then close() the writer, but from then on we get "Read past EOF" errors when
trying to access the index (both with readers and writers).
--
This message is automatically generated by JIRA.
For more information on JIRA, see:
http://www.atlassian.com/software/jira