[jboss-dev-forums] [Design of Clustering on JBoss (Clusters/JBoss)] - Re: JBAS-4574 and JBAS-1476

Tue Jul 24 00:51:18 EDT 2007

I'm not going to get into any issues related to SFSBs, etc., as my understanding of the issue from the customer test Ben showed me is that it was related to caching of the HA-JNDI proxy in the static org.jnp.interfaces.NamingContext.haServers map.  If there's more to it beyond that, I'll let Ben comment.

I have a simple unit test that shows the issue.  I haven't checked it in this evening because the test deploys an instance of org.jboss.naming.NamingService and I'm concerned that might screw up the AS in some way.

But, here's essentially what the test does:

  | Properties env = new Properties();
  | env.setProperty("java.naming.provider.url", namingURL);
  |       
  | Context ctx1 = new InitialContext(env);
  | assertEquals("VALUE", ctx1.lookup("NamingRestartBinding"));
  |       
  | // HOLD ONTO REF to ctx1 so the weak ref to its Naming stub does 
  | // not get gc'ed from the static map in org.jnp.interfaces.NamingContext.
  |       
  | // Redeploy the local and HA naming services
  | redeploy("naming-restart.sar");
  |       
  | Context ctx2 = new InitialContext(env);
  | try
  | {
  |     // This lookup will fail
  |     assertEquals(ObjectBinder.VALUE, ctx2.lookup(ObjectBinder.NAME));
  | }
  | catch (NamingException e)
  | {
  |     log.error("Caught NamingException", e);
  |     fail(e.getMessage());
  | }

The test deploys both an alternate local JNDI and an alternate HA-JNDI.  (I figure bouncing the real services is not very friendly to other tests ;) )  The test fails when I test against the HA-JNDI service; passes with regular JNDI.

When I look into it in detail, the failure mode is clear.  The lookup by ctx1 results in a naming proxy being cached.  The server is then restarted, so the RMI stub in the cached proxy no longer matches the one exported by the server.  When ctx2 does a lookup, the cached proxy is used and the call fails with "java.rmi.NoSuchObjectException: no such object in table".  I see no indication that the failure has anything at all to do with the correctness or incorrectness of the viewId.

If I let the test continue after the failure and do another lookup with ctx2, it succeeds, since the failure flushes the stale proxy out of the haServers cache.
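
To make the sequence concrete, here is a toy model of the behavior described above.  It is only a sketch: FakeServer, FakeStub, CACHE and the provider URL are hypothetical stand-ins I made up for illustration, not the actual org.jnp.interfaces.NamingContext or HARMIClient code.  It shows a static cache handing a stale stub to a fresh caller after a restart, and the failure flushing the entry so the next call succeeds.

```java
import java.rmi.NoSuchObjectException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model only: every class here is a hypothetical stand-in, not JBoss code.
class StaleProxyCacheSketch {

    /** Stand-in for the naming server; each restart exports a new remote object. */
    static class FakeServer {
        private int exportId = 1;

        void restart() { exportId++; }

        FakeStub export() { return new FakeStub(this, exportId); }

        boolean isCurrent(int id) { return id == exportId; }
    }

    /** Stand-in for an RMI stub that is tied to one particular export. */
    static class FakeStub {
        private final FakeServer server;
        private final int exportId;

        FakeStub(FakeServer server, int exportId) {
            this.server = server;
            this.exportId = exportId;
        }

        Object lookup(String name) throws NoSuchObjectException {
            if (!server.isCurrent(exportId)) {
                // Same message the real failure reports
                throw new NoSuchObjectException("no such object in table");
            }
            return "VALUE";
        }
    }

    /** Static cache keyed by provider URL, playing the role of the haServers map. */
    static final Map<String, FakeStub> CACHE = new ConcurrentHashMap<>();

    static Object lookup(String url, FakeServer server, String name)
            throws NoSuchObjectException {
        // Reuse the cached stub if there is one, even if it is stale
        FakeStub stub = CACHE.computeIfAbsent(url, u -> server.export());
        try {
            return stub.lookup(name);
        } catch (NoSuchObjectException e) {
            CACHE.remove(url); // the failure flushes the stale entry
            throw e;
        }
    }

    /** Runs lookup, restart + lookup, lookup and records each outcome. */
    static String demo() {
        CACHE.clear();
        FakeServer server = new FakeServer();
        String url = "jnp://localhost:1100"; // hypothetical provider URL
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            if (i == 1) {
                server.restart(); // redeploy between first and second lookup
            }
            try {
                lookup(url, server, "NamingRestartBinding");
                out.append("ok");
            } catch (NoSuchObjectException e) {
                out.append("fail");
            }
            if (i < 2) {
                out.append(',');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "ok,fail,ok"
    }
}
```

In this model the second lookup fails purely because the cached stub predates the restart; nothing resembling a viewId is involved.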

The interesting thing is the test passes with regular JNDI.  Not sure at this point why.  In both cases the call uses RMI.  With regular JNDI a simple RMI stub is used; with HA-JNDI the RMI stub is encapsulated in an HARMIClient.

Part of the problem here is the use of RMI for the HA-JNDI transport.  If Remoting's socket transport were used, bouncing the server would not invalidate the client-side InvokerLocator.

[OT] Re: 'the "view change" is less than it previously was' after a cluster restart: No.  The viewId passed between the server and HA clients is not a counter.  It is a hash of the service's cluster topology.
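
As a rough illustration of why "less than" is not a meaningful comparison for such an id, here is a sketch that derives an id by hashing a member list.  The viewId() method and the hashing scheme below are assumptions for illustration only; the real computation in the clustering code may well differ.  The point is just that the same topology yields the same id before and after a restart, and that ids can only sensibly be compared for equality.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a hypothetical hashing scheme, not the actual JBoss algorithm.
class ViewIdSketch {

    /** Derives an id from the ordered list of member addresses. */
    static long viewId(List<String> members) {
        long hash = 0;
        for (String member : members) {
            hash = 31 * hash + member.hashCode();
        }
        return hash;
    }

    public static void main(String[] args) {
        // Hypothetical member addresses
        List<String> beforeRestart = Arrays.asList("nodeA:1099", "nodeB:1099");
        List<String> afterRestart = Arrays.asList("nodeA:1099", "nodeB:1099");

        // Same topology gives the same id; there is no notion of "earlier" or "later"
        System.out.println(viewId(beforeRestart) == viewId(afterRestart));
    }
}
```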

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4066876#4066876