[jboss-user] [JBoss Cache Users] - MVCC leading to deadlocks in synchronous replication mode

vsevel do-not-reply at jboss.com
Thu Nov 5 04:44:26 EST 2009


Hi,
I have an application where there is a high probability that 2 or more client calls will hit my server asking for a data that is not known from the cache running on this server. In that case, the server calculates the data (eg: it fetches from the db), puts it in the cache then returns it: simple standard use case.

The application is deployed on a cluster of 2 two nodes. clients get load balanced, so there is a high probability that client calls, asking for the same data at the same time, do not all land onto the same server.

Finally, I want to make sure that when the data is in the cache, it is in the other node as well. That is the reason why I decided to use synchronous replication.

As far as locking is concerned, I first tried with the PESSIMISTIC mode. I ended up with lots of deadlocks where 2 clients would land on 2 differents nodes, get each a local lock, then finally deadlock each other at commit time. On a simple stress test case, I was able to make fail most (almost all) of the client calls.

I switched to OPTIMISTIC, and the situation improved. in the same stress the failure ratio went from 95% with PESSIMISTIC to 5 to 10%, with no deadlocks at all (all calls returned in less than 200ms).

Finally, I decided to give MVCC a try. to my surprise, I found that the behavior I was getting was almost identical to the one with PESSIMISTIC. I linked that to the fact that writers immediately take a lock locally, and will take a remote lock at commit time only.

This leads me to 2 reflections:
 - Am I missing something obvious in either my configuration or my code? (I am attaching the jboss cache config and java code)
 - If this behavior is expected, and since OPTIMISTIC and PESSIMISTIC are supposed to be removed from the code base at some time, what is then the solution for users that cannot afford those deadlocks?

Thanks,

Here is the configuration:



mvcc_treecache-cfg.xml:

  | <?xml version="1.0" encoding="UTF-8"?>
  | <jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1">
  | 	<locking nodeLockingScheme="MVCC" isolationLevel="REPEATABLE_READ" lockAcquisitionTimeout="10000" lockParentForChildInsertRemove="false" />
  | 	<transaction transactionManagerLookupClass="org.jboss.cache.transaction.GenericTransactionManagerLookup" />
  | 	<clustering mode="replication" clusterName="mvcc_domain_TreeCache_VSE">
  | 		<sync replTimeout="10000" />
  | 		<stateRetrieval fetchInMemoryState="false" timeout="10000" />
  | 		<jgroupsConfig configFile="jbosscache_pessimistic/mvcc_treecache-cfg-jgroupsConfig.xml" />
  | 	</clustering>
  | 	<eviction wakeUpInterval="3000">
  | 		<default algorithmClass="org.jboss.cache.eviction.LRUAlgorithm">
  | 			<property name="maxNodes" value="5000" />
  | 			<property name="timeToLive" value="-1" />
  | 		</default>
  | 		<region name="/myregion" algorithmClass="org.jboss.cache.eviction.LRUAlgorithm" eventQueueSize="">
  | 			<property name="maxNodes" value="1000" />
  | 			<property name="timeToLive" value="-1" />
  | 		</region>
  | 		<region name="/myregion/Users" algorithmClass="org.jboss.cache.eviction.LRUAlgorithm" eventQueueSize="">
  | 			<property name="maxNodes" value="0" />
  | 			<property name="timeToLive" value="172800000" />
  | 		</region>
  | 	</eviction>
  | </jbosscache>
  | 

mvcc_treecache-cfg-jgroupsConfig.xml:

  | <config>
  |   <UDP mcast_addr="239.202.87.15" mcast_port="11001" ip_ttl="64"
  |     ip_mcast="true" mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
  |     ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
  |     bind_addr="${jbosscache_bind_addr}" loopback="true" />
  |   <PING timeout="10000" num_initial_members="3" />
  |   <MERGE2 min_interval="10000" max_interval="20000" />
  |   <FD_SOCK />
  |   <VERIFY_SUSPECT timeout="1500" />
  |   <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800" />
  |   <UNICAST timeout="600,1200,2400,4800" />
  |   <pbcast.STABLE desired_avg_gossip="40000" />
  |   <FRAG frag_size="8192" />
  |   <pbcast.GMS join_timeout="5000" shun="true"
  |     print_local_addr="true" />
  |   <pbcast.STATE_TRANSFER />
  | </config>
  | 

Here is the stateless session bean impl:

  | public class TestBean implements SessionBean {
  | 
  |     private static final Logger log = Logger.getLogger(TestBean.class);
  | 
  |     private static final long serialVersionUID = -1934030976428012964L;
  | 
  |     private static Map<String, Cache<String, String>> cacheMap = new HashMap<String, Cache<String, String>>();
  | 
  |     private SessionContext sessionContext;
  | 
  |     public static Object lookup(String jndi, String url) throws NamingException {
  |         Properties properties = new Properties();
  |         properties.put(INITIAL_CONTEXT_FACTORY, WLInitialContextFactory.class.getName());
  |         properties.put(PROVIDER_URL, url);
  |         properties.put(ALLOW_GLOBAL_RESOURCE_LOOKUP, "true");
  |         properties.put(ALLOW_EXTERNAL_APP_LOOKUP, "true");
  |         properties.put(RELAX_VERSION_LOOKUP, "true");
  |         return new InitialContext(properties).lookup(jndi);
  |     }
  | 
  |     static synchronized Cache<String, String> getCache(String cacheName) {
  |         try {
  |             Cache<String, String> cache = cacheMap.get(cacheName);
  |             if (cache != null) return cache;
  |             log.info("initializing cache " + cacheName);
  |             DefaultCacheFactory<String, String> factory = new DefaultCacheFactory<String, String>();
  |             cache = factory.createCache("jbosscache_pessimistic/" + cacheName + "_treecache-cfg.xml");
  |             cacheMap.put(cacheName, cache);
  |             log.info("initialized cache " + cacheName);
  |             return cache;
  |         } catch (Exception e) {
  |             log.error(cacheName + " init error", e);
  |             throw new RuntimeException(e.toString(), e);
  |         }
  |     }
  | 
  |     static synchronized void closeCaches() {
  |         for (Map.Entry<String, Cache<String, String>> entry : cacheMap.entrySet()) {
  |             String cacheName = entry.getKey();
  |             try {
  |                 log.info("closing cache " + cacheName);
  |                 Cache<String, String> cache = entry.getValue();
  |                 cache.stop();
  |                 cache.destroy();
  |             } catch (Exception e) {
  |                 log.error("unable to close cache " + cacheName, e);
  |             }
  |         }
  |     }
  | 
  |     public String test(String cacheKey, String cacheValue, String cacheName, long waittime) throws InterruptedException {
  |         log.debug("attempting to get user " + cacheKey);
  |         Cache<String, String> cache = getCache(cacheName);
  |         Fqn<String> userFQN = Fqn.fromString("/myregion/Users/" + cacheKey);
  | 
  |         if (cacheValue == null) {
  |             log.debug("removing node " + userFQN);
  |             cache.removeNode(userFQN);
  |             log.debug("removed node " + userFQN);
  |             return null;
  |         }
  | 
  |         Node<String, String> user = cache.getNode(userFQN);
  | 
  |         if (user == null) {
  |             log.debug("missing cached " + cacheKey + " => caching " + cacheKey + "=" + cacheValue);
  |             user = cache.getRoot().addChild(userFQN);
  |             if(waittime > 0) Thread.sleep(waittime);
  |             user.put("value", cacheValue);
  |             return cacheValue;
  |         } else {
  |             String value = user.get("value");
  |             log.debug("return cached " + cacheKey + "=" + value + "; ignoring passed value: " + cacheValue);
  |             return value;
  |         }
  |     }
  | 

And finally, the stress test client:

  | public class TestSystem {
  | 
  |     private static final Logger log = Logger.getLogger(TestSystem.class);
  | 
  |     @Test
  |     public void testServer() throws Exception {
  |         TestRemote ejb = createService();
  |         log.info("calling test ejb");
  |         ejb.test("myuser3", "toto", "optimistic", 0);
  |         ejb.remove();
  |         log.info("done");
  |     }
  | 
  |     @Test
  |     public void mt() throws Exception {
  | 
  |         // final String cacheName = "pessimistic";
  |         final String cacheName = "optimistic";
  |         // final String cacheName = "mvcc";
  |         final int count = 10000;
  |         int threads = 2;
  |         final String user = "myuser";
  |         final boolean stopOnFirstError = false;
  |         final long waittime = 0;
  | 
  |         Assert.assertNull(createService().test(user, null, cacheName, 0));
  |         Assert.assertEquals("warm up", createService().test(user, "warm up", cacheName, 0));
  |         Assert.assertNull(createService().test(user, null, cacheName, 0));
  | 
  |         log.info("warm up ok");
  |         Thread.sleep(5000);
  |         log.info("starting test");
  |         Thread.sleep(1000);
  | 
  |         final List<String> success = Collections.synchronizedList(new ArrayList<String>());
  |         final List<String> errors = Collections.synchronizedList(new ArrayList<String>());
  |         final List<Long> responseTime = Collections.synchronizedList(new ArrayList<Long>());
  | 
  |         for (int i = 0; i < threads; i++) {
  |             final int id = i;
  | 
  |             new Thread() {
  |                 @Override
  |                 public void run() {
  |                     for (int j = 0; j < count; j++) {
  |                         try {
  |                             if (stopOnFirstError && !errors.isEmpty()) return;
  |                             long start = System.currentTimeMillis();
  |                             String result = createService().test(user, "toto from " + id, cacheName, waittime);
  |                             long time = System.currentTimeMillis() - start;
  |                             responseTime.add(time);
  |                             success.add(result);
  |                             if (time > 1000) log.debug("thread " + id + " test " + j + ": executed in " + time);
  |                             // log.debug("thread " + id + " test " + j + ": " + result);
  |                         } catch (Exception e) {
  |                             String s = TestSystem.toString(e);
  |                             log.debug("thread " + id + " test " + j + ": " + s);
  |                             errors.add(s);
  |                         }
  |                     }
  |                 }
  |             }.start();
  |         }
  | 
  |         int size = 0;
  | 
  |         while ((size = success.size() + errors.size()) != count * threads && !(stopOnFirstError && !errors.isEmpty())) {
  |             try {
  |                 createService().test(user, null, cacheName, 0);
  |                 if (size % 100 == 0) log.debug("tests executed so far: " + size);
  |                 Thread.sleep(200);
  |             } catch (Exception e) {
  |                 log.error("unable to remove user: " + toString(e));
  |             }
  |         }
  | 
  |         Collections.sort(responseTime);
  |         Long maxTime = responseTime.get(responseTime.size() - 1);
  |         log.info("done: success=" + success.size() + "; error=" + errors.size() + "; max time=" + maxTime);
  |     }
  | 
  |     static String toString(Exception e) {
  |         Throwable ex = e;
  |         while (ex.getCause() != null && ex.getCause() != ex)
  |             ex = ex.getCause();
  |         return firstLine(ex.toString()) + "(original=" + firstLine(e.toString()) + ")";
  |     }
  | 
  |     static String firstLine(String s) {
  |         int index = s.indexOf('\n');
  |         return index == -1 ? s : s.substring(0, index - 1);
  |     }
  | 
  |     private TestRemote createService() throws NamingException, RemoteException, CreateException {
  |         // String url = "t3://localhost:7001,localhost:8001";
  |         String jndi = "com.lodh.bank.ejb.TestRemoteHome";
  | 
  |         Object home = TestBean.lookup(jndi, url);
  |         TestRemoteHome h = (TestRemoteHome) PortableRemoteObject.narrow(home, TestRemoteHome.class);
  |         return h.create();
  |     }
  | }
  | 

View the original post : http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4264064#4264064

Reply to the post : http://www.jboss.org/index.html?module=bb&op=posting&mode=reply&p=4264064



More information about the jboss-user mailing list