[jboss-user] [JBoss Cache Users] - TimeoutException while running in cluster

Wed Nov 4 12:12:51 EST 2009

Hi all,

apologies if this was answered already but I have not yet found the solution.

I am running a JBoss cluster, 2 machines, Java 5 with: JBossCache 3.2.0.GA, JBoss 5.1.0.GA and JBossESB 4.6 and essentially performing the following:

  | try {
  |     Node<Object, Object> seriesNode = _addSeries(seriesUuid);
  |     _sanityCheck(info, seriesNode);
  |     _storeInCache(seriesNode, message, info);
  |     if (_isSequenceComplete(seriesNode))
  |         result = _createAggregatedMessage(seriesNode);
  | } catch (Exception e) {
  |     _logger.error("Exception intercepted while processing message: " + e.getLocalizedMessage());
  |     e.printStackTrace();
  | }
  | 

relevant code is also the following:

  | private Node<Object, Object> _addSeries(final String seriesUuid) throws Exception {
  |     final Fqn<String> seriesFQN = Fqn.fromRelativeElements(FQN_BASE, seriesUuid);
  |     try {
  |         txManager.begin();
  |         Node<Object, Object> seriesNode = rootNode.addChild(seriesFQN);
  |         txManager.commit();
  |         _logger.info("Added series " + seriesUuid + " to cache!");
  |         return seriesNode;
  |     } catch (Exception e) {
  |         _logger.warn("Could not add series " + seriesUuid + " to cache!");
  |         throw e;
  |     }
  | }
  | 
  | private void _sanityCheck(MessageDetails info, Node<Object, Object> seriesNode) throws Exception {
  |     try {
  |         if (info.getMessageTerminationCondition() instanceof Boolean && Boolean.TRUE.equals(info.getMessageTerminationCondition())) {
  |             txManager.begin();
  |             seriesNode.put("total", info.getMessageNumber());
  |             txManager.commit();
  |             _logger.info("Series total set (" + info.getMessageNumber() + ")");
  |         }
  |     } catch (Exception e) {
  |         _logger.warn("Could not save termination condition in cache!");
  |         throw e;
  |     }
  | }
  | 
  | private void _storeMessageInCache(final Node<Object, Object> seriesNode, final Message message, final MessageDetails info) throws Exception {
  |     try {
  |         txManager.begin();
  |         Node<Object, Object> messageNode = seriesNode.addChild(Fqn.fromString(info.getMessageNumber().toString()));
  |         messageNode.put("payload", Util.serialize(message));
  |         txManager.commit();
  |         _logger.info("\n******************\nStored message number " + info.getMessageNumber() + ": " + info + "\n******************\n");
  |     } catch (Exception e) {
  |         _logger.warn("Could not store message " + info.getMessageNumber());
  |         throw e;
  |     }
  | }
  | 

I'm omitting the rest as it is not relevant, as far as I can see.
Also, these are the properties I use:

  | <property name="transactionManagerLookupClass">org.jboss.cache.transaction.GenericTransactionManagerLookup</property>
  | <property name="clusterName">${jboss.partition.name:DefaultPartition}-MyCache</property>
  | <property name="multiplexerStack">${jboss.default.jgroups.stack:udp}</property>
  | <property name="fetchInMemoryState">true</property>
  | <property name="isolationLevel">READ_COMMITTED</property>
  | <property name="writeSkewCheck">true</property>
  | <property name="useLockStriping">true</property>
  | <property name="lockParentForChildInsertRemove">false</property>
  | <property name="cacheMode">REPL_SYNC</property>
  | <property name="syncReplTimeout">35000</property>
  | <property name="stateRetrievalTimeout">60000</property>
  | <property name="lockAcquisitionTimeout">30000</property>
  | <property name="useRegionBasedMarshalling">true</property>
  | <property name="serializationExecutorPoolSize">0</property>
  | <property name="listenerAsyncPoolSize">0</property>
  | <property name="cacheLoaderConfig">
  |     <bean class="org.jboss.cache.config.CacheLoaderConfig">
  |         <property name="passivation">false</property>
  |         <property name="shared">true</property>
  |         <property name="preload">/</property>
  |         <property name="individualCacheLoaderConfigs">
  |             <list>
  |                 <bean class="org.jboss.cache.loader.JDBCCacheLoaderConfig">
  |                     <property name="properties">
  |                         cache.jdbc.datasource=java:/pojoCacheDS
  |                         cache.jdbc.table.drop=false
  |                         cache.jdbc.table.name=jbosscache
  |                         cache.jdbc.table.create=true
  |                         cache.jdbc.table.primarykey=jbosscache_pk
  |                         cache.jdbc.fqn.column=fqn
  |                         cache.jdbc.fqn.type=VARCHAR(255)
  |                         cache.jdbc.node.column=node
  |                         cache.jdbc.node.type=BLOB
  |                         cache.jdbc.parent.column=parent
  |                     </property>
  |                     <property name="async">false</property>
  |                     <property name="fetchPersistentState">true</property>
  |                     <property name="purgeOnStartup">false</property>
  |                     <property name="ignoreModifications">false</property>
  |                 </bean>
  |             </list>
  |         </property>
  |     </bean>
  | </property>
  | 

So, when this is run with 3 threads per machine, 2 machines in cluster, 2000 messages coming in asynchronously, at some stage, sometimes, I see in my stacktrace:

  | 15:15:48,386 WARN  [TxInterceptor] Caught exception, will now set transaction to roll back
  | org.jboss.cache.lock.TimeoutException: Unable to acquire lock on Fqn [/aggregator/b41f62ab-4bbf-4314-9f79-341710c4957a/1996] after [30000] milliseconds for requestor [GlobalTransaction:<10.3.0.181:53730>:1329]! Lock held by [GlobalTransaction:<10.1.0.30:58999>:2852]
  |         at org.jboss.cache.mvcc.MVCCNodeHelper.acquireLock(MVCCNodeHelper.java:159)
  |         at org.jboss.cache.mvcc.MVCCNodeHelper.wrapNodeForWriting(MVCCNodeHelper.java:253)
  |         at org.jboss.cache.mvcc.MVCCNodeHelper.wrapNodeForWriting(MVCCNodeHelper.java:186)
  |         at org.jboss.cache.interceptors.CacheLoaderInterceptor.loadIfNeeded(CacheLoaderInterceptor.java:300)
  |         at org.jboss.cache.interceptors.CacheLoaderInterceptor.visitPutDataMapCommand(CacheLoaderInterceptor.java:124)
  |         at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104)
  |         at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
  |         at org.jboss.cache.interceptors.ReplicationInterceptor.handleCrudMethod(ReplicationInterceptor.java:150)
  |         at org.jboss.cache.interceptors.ReplicationInterceptor.visitPutDataMapCommand(ReplicationInterceptor.java:113)
  |         at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104)
  |         at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
  |         at org.jboss.cache.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:131)
  |         at org.jboss.cache.commands.AbstractVisitor.visitPutDataMapCommand(AbstractVisitor.java:60)
  |         at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104)
  |         at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
  |         at org.jboss.cache.interceptors.TxInterceptor.replayModifications(TxInterceptor.java:501)
  |         at org.jboss.cache.interceptors.TxInterceptor.handleRemotePrepare(TxInterceptor.java:388)
  |         at org.jboss.cache.interceptors.TxInterceptor.visitPrepareCommand(TxInterceptor.java:134)
  |         at org.jboss.cache.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:68)
  |         at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
  |         at org.jboss.cache.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:131)
  |         at org.jboss.cache.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:140)
  |         at org.jboss.cache.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:68)
  |         at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
  |         at org.jboss.cache.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:178)
  |         at org.jboss.cache.interceptors.InvocationContextInterceptor.visitPrepareCommand(InvocationContextInterceptor.java:106)
  |         at org.jboss.cache.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:68)
  |         at org.jboss.cache.interceptors.InterceptorChain.invokeRemote(InterceptorChain.java:316)
  |         at org.jboss.cache.commands.remote.ReplicateCommand.processSingleCommand(ReplicateCommand.java:139)
  |         at org.jboss.cache.commands.remote.ReplicateCommand.perform(ReplicateCommand.java:115)
  |         at org.jboss.cache.marshall.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:319)
  |         at org.jboss.cache.marshall.InactiveRegionAwareRpcDispatcher.handle(InactiveRegionAwareRpcDispatcher.java:95)
  |         at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:637)
  |         at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:545)
  |         at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:368)
  |         at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:775)
  |         at org.jgroups.JChannel.up(JChannel.java:1339)
  |         at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:462)
  |         at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:501)
  |         at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:310)
  |         at org.jgroups.protocols.FRAG2.up(FRAG2.java:188)
  |         at org.jgroups.protocols.FC.up(FC.java:473)
  |         at org.jgroups.protocols.pbcast.GMS.up(GMS.java:824)
  |         at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:233)
  |         at org.jgroups.protocols.UNICAST.up(UNICAST.java:299)
  |         at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:884)
  |         at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:716)
  |         at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
  |         at org.jgroups.protocols.FD.up(FD.java:284)
  |         at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:309)
  |         at org.jgroups.protocols.MERGE2.up(MERGE2.java:144)
  |         at org.jgroups.protocols.Discovery.up(Discovery.java:264)
  |         at org.jgroups.protocols.PING.up(PING.java:273)
  |         at org.jgroups.protocols.TP.passMessageUp(TP.java:1285)
  |         at org.jgroups.protocols.TP.access$100(TP.java:49)
  |         at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1838)
  |         at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1812)
  |         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
  |         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
  |         at java.lang.Thread.run(Thread.java:613)
  | 
  | 15:15:48,392 WARN  [ClusteredTest] Could not store message 1996
  | 

10.3.0.181 definitively wants to perform a _storeMessageInCache for Fqn /aggregator/b41f62ab-4bbf-4314-9f79-341710c4957a/1996 while 10.1.0.30 knows nothing about that Fqn but owns some lock...
Btw, /aggregator/b41f62ab-4bbf-4314-9f79-341710c4957a/1996 IS in the DB...

I wonder if this has anything to do with https://jira.jboss.org/jira/browse/JBCACHE-1540 and, if so, if you would recommend the proposed solution (i.e.: has anyone verified the implications of that workaround?)

Kind Regards
\j

View the original post : http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4263990#4263990

Reply to the post : http://www.jboss.org/index.html?module=bb&op=posting&mode=reply&p=4263990