[jboss-jira] [JBoss JIRA] Commented: (JBAS-6135) Concurrent connection of HAPartition channels fails

Tue Oct 28 11:19:21 EDT 2008

    [ https://jira.jboss.org/jira/browse/JBAS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12435769#action_12435769 ] 

Brian Stansberry commented on JBAS-6135:
----------------------------------------

In attached failed-start-server.log, relevant logging starts at line 5734.  In testrun-1018-server.log it's at line 5733.

In both cases, the code executing is from org.jboss.ha.framework.server.ClusterPartition.startService():

         // Do the channel connect in another thread while this
         // thread starts the cache and does that channel connect
         ChannelConnectTask task = new ChannelConnectTask(connectLatch);
         this.threadPool.run(task);
      }

      this.cacheHandler.startCache();

      try
      {
         // This will block waiting for any async channel connect above
         connectLatch.await();

         if (this.connectException != null)
         {
            throw this.connectException;
         }

         this.log.debug("Get current members");
         this.waitForView();

         // get current JG group properties
         this.log.debug("get nodeName");

The thread executing this code is named "main"; it is visible in both logs. The ChannelConnectTask runs in a thread pool thread. Its logging appears only in failed-start-server.log; in testrun-1018-server.log it logs nothing.
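
To reproduce this startup sequence outside the server, the handoff can be reduced to a minimal sketch. It assumes a hypothetical Connector interface in place of JChannel.connect() and a plain ExecutorService in place of the partition's thread pool. AsyncConnectSketch, Connector and start() are illustrative names, not JBoss or JGroups API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;

/**
 * Sketch of the startService() handoff: one channel connects on a pool
 * thread while the calling thread does other startup work.
 * Connector is a HYPOTHETICAL stand-in for JChannel.connect().
 */
final class AsyncConnectSketch {

    /** Hypothetical blocking connect; not the JGroups API. */
    interface Connector {
        void connect(String groupName) throws Exception;
    }

    private final Object channelLock = new Object();
    private Exception connectException; // guarded by channelLock

    void start(ExecutorService pool, Connector asyncConnector,
               Runnable syncWork, String groupName) throws Exception {
        CountDownLatch connectLatch = new CountDownLatch(1);

        // Async connect on a pool thread, mirroring ChannelConnectTask.run()
        pool.execute(() -> {
            try {
                asyncConnector.connect(groupName);
            } catch (Exception e) {
                synchronized (channelLock) {
                    connectException = e;
                }
            } finally {
                connectLatch.countDown(); // release the waiter however connect() exits
            }
        });

        // Work done on the calling thread meanwhile (startCache() in the original)
        syncWork.run();

        // Blocks until the pool thread has counted down
        connectLatch.await();
        synchronized (channelLock) {
            if (connectException != null) {
                throw connectException;
            }
        }
    }
}
```

Because countDown() sits in a finally block, start() can only hang at await() if the pool never ran the task or the async connect() call never returned. A hang before await() would instead point at syncWork.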

The ChannelConnectTask is as follows:

      public void run()
      {
         try
         {
            ClusterPartition.this.channel.connect(ClusterPartition.this.getPartitionName());
         }
         catch (Exception e)
         {
            synchronized (ClusterPartition.this.channelLock)
            {
               ClusterPartition.this.connectException = e;
            }
         }
         finally
         {
            this.latch.countDown();
         }
      }
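
The task can also be exercised in isolation. The following sketch keeps the same structure (record the exception under a lock, count down in finally) and adds a timed wait, so a connect that never returns shows up as a false result instead of an indefinite block. ConnectTaskSketch, Connector and awaitQuietly() are hypothetical names, not the actual ClusterPartition inner class.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical, testable analogue of ChannelConnectTask. */
final class ConnectTaskSketch implements Runnable {

    /** Hypothetical stand-in for JChannel.connect(); not the JGroups API. */
    interface Connector {
        void connect(String groupName) throws Exception;
    }

    private final Connector connector;
    private final String groupName;
    private final CountDownLatch latch;
    private final Object lock = new Object();
    private Exception failure; // guarded by lock

    ConnectTaskSketch(Connector connector, String groupName, CountDownLatch latch) {
        this.connector = connector;
        this.groupName = groupName;
        this.latch = latch;
    }

    @Override
    public void run() {
        try {
            connector.connect(groupName);
        } catch (Exception e) {
            synchronized (lock) {
                failure = e;
            }
        } finally {
            // Runs however connect() exits (normal return, Exception or Error),
            // so the latch stays at 1 only while connect() is still running.
            latch.countDown();
        }
    }

    Exception failure() {
        synchronized (lock) {
            return failure;
        }
    }

    /** Waits up to timeoutMillis; false means the latch was not released in time. */
    static boolean awaitQuietly(CountDownLatch latch, long timeoutMillis) {
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```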

I checked whether the failure to proceed could be due to the synchronized block in the catch block. However, the other code that might contend for ClusterPartition.this.channelLock does not execute until after the this.log.debug("Get current members"); call shown above, and as the logs show, that message is never logged.
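
To confirm at runtime, rather than by reading the code, that no thread is stuck entering a monitor such as channelLock, the JVM's ThreadMXBean can be queried. The helper below is a hypothetical diagnostic, not part of ClusterPartition; it uses only standard java.lang.management calls (getThreadInfo, findMonitorDeadlockedThreads).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical runtime check for monitor contention (not part of ClusterPartition). */
final class LockContentionCheck {

    private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();

    /**
     * Returns the name of the thread owning the monitor that t is BLOCKED on,
     * or null if t is not currently blocked entering a synchronized block.
     */
    static String blockedOnMonitorOwnedBy(Thread t) {
        ThreadInfo info = MX.getThreadInfo(t.getId());
        if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
            return null;
        }
        return info.getLockOwnerName();
    }

    /** True if the JVM reports a cycle of threads deadlocked on object monitors. */
    static boolean monitorDeadlockDetected() {
        return MX.findMonitorDeadlockedThreads() != null;
    }
}
```

Called on the pool thread running the connect task, a non-null result would mean it is waiting to enter a synchronized block and would name the owner; a null result rules that out.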

In testrun-1018-server.log, the "main" thread shows the expected logging for the JBC channel connect and state transfer, so in that case that connect appears to have succeeded, while no logging was emitted by the thread running the ChannelConnectTask. In failed-start-server.log, the connect run by the ChannelConnectTask logs as expected, but no logging from the "main" thread for the JBC channel connect appears.
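
When one thread's logging simply stops, a thread dump taken at that moment usually shows where it is blocked. Since the logs do not show whether "main" ever reaches connectLatch.await(), a timer-based watchdog is more useful here than a timed await. The sketch below is a hypothetical debugging aid (StartupWatchdog, arm() and dumpAllThreads() are not in the actual code): arm it before the channel connects, cancel it when startup completes, and if it fires, the dump captures the stacks of both "main" and the pool thread.

```java
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical startup watchdog; not part of ClusterPartition. */
final class StartupWatchdog {

    /** Formats the current stack of every live thread, similar to a jstack dump. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            sb.append('"').append(t.getName()).append("\" ").append(t.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /**
     * Schedules a thread dump to be handed to sink after the delay.
     * The caller cancels the returned future once startup has finished.
     */
    static ScheduledFuture<?> arm(ScheduledExecutorService scheduler, long delay,
                                  TimeUnit unit, Consumer<String> sink) {
        return scheduler.schedule(() -> sink.accept(dumpAllThreads()), delay, unit);
    }
}
```

In startService() the connect-and-wait section would be wrapped in try/finally, with the future cancelled in the finally block, so the dump is written only for a stalled startup.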

> Concurrent connection of HAPartition channels fails
> ---------------------------------------------------
>
>                 Key: JBAS-6135
>                 URL: https://jira.jboss.org/jira/browse/JBAS-6135
>             Project: JBoss Application Server
>          Issue Type: Bug
>      Security Level: Public(Everyone can see) 
>          Components: Clustering
>            Reporter: Brian Stansberry
>            Assignee: Brian Stansberry
>            Priority: Critical
>             Fix For: JBossAS-5.0.0.GA
>
>         Attachments: failed-start-server.log, testrun-1018-server.log
>
>
> Seeing intermittent testsuite server startup failures during HAPartition's attempt to concurrently start two JGroups channels. Attached log shows an example.
