[infinispan-issues] [JBoss JIRA] (ISPN-9044) In Cluster - Infinispan - SingleFileStore - fetchPersistentState/StateTransfer not transferring complete data to Joining Node

Mon Apr 9 03:59:00 EDT 2018

    [ https://issues.jboss.org/browse/ISPN-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557814#comment-13557814 ] 

Debashish Bharali edited comment on ISPN-9044 at 4/9/18 3:58 AM:
-----------------------------------------------------------------

{color:red}*---ReplStateTransferCacheLoaderTest.java---*{color}

@Test(groups = "functional", testName = "statetransfer.ReplStateTransferCacheLoaderTest")
@CleanupAfterMethod
public class ReplStateTransferCacheLoaderTest extends MultipleCacheManagersTest implements Serializable {

   private static final long serialVersionUID = 1L;

   private static final Log log = LogFactory.getLog(ReplStateTransferCacheLoaderTest.class);

   private File tmpDir;
   private ConfigurationBuilder builder;

   @Override
   protected void createCacheManagers() {
      tmpDir = new File(TestingUtil.tmpDirectory(this.getClass()));
      Util.recursiveFileRemove(tmpDir);

      // reproduce the MODE-1754 config as closely as possible
      builder = getDefaultClusteredCacheConfig(CacheMode.REPL_SYNC, true, true);
      builder.transaction().*transactionMode(TransactionMode.NON_TRANSACTIONAL)*.*lockingMode(LockingMode.OPTIMISTIC)*
            .transactionManagerLookup(new DummyTransactionManagerLookup())
            .eviction().*maxEntries(100)*.*strategy(EvictionStrategy.LRU)*
            .locking().lockAcquisitionTimeout(20000)
            .*concurrencyLevel(1000)* // lowering this to 50 makes the test pass also on 5.2 but it's just a temporary workaround
            .useLockStriping(false).writeSkewCheck(false).isolationLevel(IsolationLevel.READ_COMMITTED)
            .dataContainer().storeAsBinary()
            .clustering().sync().*replTimeout(120000)*
            .*stateTransfer().timeout(480000)*.*fetchInMemoryState(true)*.chunkSize(10000).*awaitInitialTransfer(true)*
            .persistence().passivation(false).addSingleFileStore().location(new File(tmpDir, "store0").getAbsolutePath()).shared(false).preload(false)
            .*fetchPersistentState(true)*
            .ignoreModifications(false)
            .purgeOnStartup(false);

      createCluster(builder, 1);
      waitForClusterToForm();
   }

   @AfterClass
   protected void clearTempDir() {
//      Util.recursiveFileRemove(tmpDir);
   }

   public void *testStateTransfer*() throws Exception {
      final Long *numKeys = 100000L*;

      for (Long i = 0L; i < numKeys; i++) {
         TestEntity testEntity = new TestEntity(i, "DEBA_" + i);
         cache(0).put(i, testEntity);
      }
      
      log.info("Finished putting keys");
      System.out.println("Debashish -- "+"Finished putting keys");

      for (Long i = 0L; i < numKeys; i++) {
         assertEquals(i, ((TestEntity)cache(0).get(i)).getId());
      }

      log.info("Adding a new node ..");
      System.out.println("Debashish -- "+"Adding a new node ..");
      
      builder.persistence().clearStores().addSingleFileStore().location(new File(tmpDir, "store1").getAbsolutePath())  // make sure this node writes in a different location
            .fetchPersistentState(true)
            .ignoreModifications(false)
            .purgeOnStartup(false);

      addClusterEnabledCacheManager(builder);
      log.info("Added a new node");
      System.out.println("Debashish -- "+"Added a new node");

      for (Long i = 0L; i < numKeys; i++) {
         assertEquals(i, ((TestEntity) cache(1).get(i)).getId());
//         assertEquals(i, cache(1).get(i));   // some keys are lost in 5.2
      }
   }
   
   
}
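
The test above relies on a `TestEntity` class that the comment does not include. Below is a minimal sketch of what it might look like, assuming only what the test itself uses: a `(Long, String)` constructor and a `getId()` accessor returning `Long`. The `name` field, `getName()`, and the `equals`/`hashCode` implementations are assumptions for illustration, not the reporter's actual class. It is declared `Serializable` on the assumption that entries are marshalled with default Java serialization for replication and for the file store.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical stand-in for the TestEntity used by the test. Only the
// (Long, String) constructor and getId() are implied by the original code.
final class TestEntity implements Serializable {
   private static final long serialVersionUID = 1L;

   private final Long id;
   private final String name; // assumed field name

   TestEntity(Long id, String name) {
      this.id = id;
      this.name = name;
   }

   Long getId() {
      return id;
   }

   String getName() {
      return name;
   }

   // Value equality, so an entity read back from another node compares
   // equal to the one that was originally put.
   @Override
   public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof TestEntity)) return false;
      TestEntity other = (TestEntity) o;
      return Objects.equals(id, other.id) && Objects.equals(name, other.name);
   }

   @Override
   public int hashCode() {
      return Objects.hash(id, name);
   }
}
```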



> In Cluster - Infinispan - SingleFileStore - fetchPersistentState/StateTransfer not transferring complete data to Joining Node
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ISPN-9044
>                 URL: https://issues.jboss.org/browse/ISPN-9044
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Lucene Directory
>    Affects Versions: 8.2.5.Final
>            Reporter: Debashish Bharali
>            Priority: Critical
>         Attachments: neutrino-hibernatesearch-infinispan.xml
>
>
> Infinispan - SingleFileStore - fetchPersistentState/StateTransfer not transferring complete data to Joining Node.
> Related to ISPN-8980 (https://issues.jboss.org/browse/ISPN-8980).
> We are using Hibernate Search with its Lucene indexes stored in Infinispan, persisted with SingleFileStore.
> With more than one node (for example, 4 nodes), we observe the behaviour below.
> Steps:
> # We start the first node *'N1'* in maintenance mode, with the MassIndexer creating the initial indexes.
> # After all the MassIndexer/EntityLoader threads have ended (after 1-2 hours), i.e. once mass indexing has completed, we start the other 3 nodes *'N2', 'N3' and 'N4'* without the MassIndexer.
> # Under moderate to heavy application usage (concurrency), we again get the same exception: *Exception occurred java.io.FileNotFoundException: Error loading metadata for index file. This indicates that {color:red}some entries are not present in the cache.{color}*
> # *But this exception occurs only on the other 3 nodes (N2, N3 and N4), not on the first node N1.*
> # On checking the sizes of the cache stores on all the nodes, the 3 nodes (N2, N3 and N4) have almost equal sizes (600 MB), which is 50%-70% of the size of the cache store of N1 (1.2 GB).
> # We have repeated these steps multiple times. We even switched the mass-indexing role to the other 3 nodes, and we have also reduced the number of nodes to 2.
> # *But the behaviour is exactly the same, i.e. the exception occurs on all nodes except the initial node doing the mass indexing.*
> # {color:red} It seems that *'N1's* cache store's persistent state is not being fetched by *'N2', 'N3' and 'N4'* when these nodes join.{color}
> # This is indicated by the fact that the FileNotFoundException does not occur on 'N1'; it occurs only on the nodes that joined later (N2, N3 and N4), and the cache store *'.DAT'* files on those nodes are smaller than *'N1's*.
> We require urgent support.
> Attaching the corresponding Infinispan config file (neutrino-hibernatesearch-infinispan.xml).
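
To check the suspicion in the last two steps, it may help to list exactly which keys a joining node is missing, rather than comparing store file sizes. The sketch below is one way to do that, assuming both caches can be read through the `java.util.Map` interface (Infinispan's `Cache` extends `ConcurrentMap`). `MissingKeys` and `missingOn` are hypothetical names, not part of Infinispan. Whether `keySet()` on the reference node also covers entries that live only in the cache store depends on the Infinispan version and flags in use, so the result is a hint rather than proof.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diagnostic helper, not part of Infinispan.
final class MissingKeys {
   private MissingKeys() {
   }

   // Returns, in sorted order, the keys present on the reference node
   // for which the joiner returns no value.
   static <K extends Comparable<K>> Set<K> missingOn(Map<K, ?> reference, Map<K, ?> joiner) {
      Set<K> missing = new TreeSet<>();
      for (K key : reference.keySet()) {
         // get() mirrors the lookup used in the test; a null result
         // means the joiner has no entry for this key.
         if (joiner.get(key) == null) {
            missing.add(key);
         }
      }
      return missing;
   }
}
```

Passing the first node's cache as `reference` and a later-joining node's cache as `joiner` would list the keys that did not arrive on the joiner.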

