[jboss-jira] [JBoss JIRA] (JGRP-1850) jGroups 3.4.3 timeouts after a node leaves and rejoins cluster

Justin Cranford (JIRA) issues at jboss.org
Tue Jun 10 17:01:16 EDT 2014


     [ https://issues.jboss.org/browse/JGRP-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Cranford updated JGRP-1850:
----------------------------------

    Description: 
I have an open thread on the HA-JDBC forum. The issue seems to be in JGroups. I attached my Tomcat log there, along with steps to reproduce the issue.

http://sourceforge.net/p/ha-jdbc/discussion/383397/thread/843c60ad/?limit=25&page=2#338a


Here is an example of what my JGroups error looks like, copied from the HA-JDBC thread. I can reproduce this quite often, possibly every time I try, though I am not sure it happens 100% of the time. I have reproduced it at least half a dozen times.

May 09, 2014 12:01:45 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
FINE: Received MemberAcquireLockCommand(writeLock()) from 10.0.0.187
May 09, 2014 12:01:46 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:01:49 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:01:50 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:01:51 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
WARNING: timeout sending message to 10.0.0.187
org.jgroups.TimeoutException: timeout sending message to 10.0.0.187
        at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:419)
        at net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher.execute(JGroupsCommandDispatcher.java:177)
        at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockCoordinator(DistributedLockManager.java:420)
        at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockInterruptibly(DistributedLockManager.java:320)
        at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:864)
        at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:156)
May 09, 2014 12:01:51 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
FINE: Received MemberAcquireLockCommand(writeLock()) from 10.0.0.187
May 09, 2014 12:01:53 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:01:54 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:01:57 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:01:57 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
WARNING: timeout sending message to 10.0.0.187
org.jgroups.TimeoutException: timeout sending message to 10.0.0.187
        at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:419)
        at net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher.execute(JGroupsCommandDispatcher.java:177)
        at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockCoordinator(DistributedLockManager.java:420)
        at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockInterruptibly(DistributedLockManager.java:320)
        at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:864)
        at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:156)
May 09, 2014 12:01:57 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
FINE: Received MemberAcquireLockCommand(writeLock()) from 10.0.0.187
May 09, 2014 12:01:58 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:02:01 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:02:02 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:02:03 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
WARNING: timeout sending message to 10.0.0.187
org.jgroups.TimeoutException: timeout sending message to 10.0.0.187
        at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:419)
        at net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher.execute(JGroupsCommandDispatcher.java:177)
        at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockCoordinator(DistributedLockManager.java:420)
        at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockInterruptibly(DistributedLockManager.java:320)
        at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:864)
        at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:156)
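
As a rough aid for reading longer logs of this kind, the repeating pattern in the excerpt above (a lock command received, three are-you-alive probes, then a timeout) can be tallied with a small program. This is only a sketch: the class name TimeoutLogSummary and its count helper are made up for illustration and are not part of JGroups or HA-JDBC, and the sample input is just the message lines of the first cycle above.

```java
import java.util.List;

public class TimeoutLogSummary {
    // Counts the log lines that contain the given text fragment.
    public static long count(List<String> lines, String fragment) {
        return lines.stream().filter(line -> line.contains(fragment)).count();
    }

    public static void main(String[] args) {
        // Message lines of the first cycle in the excerpt (timestamps omitted).
        List<String> lines = List.of(
            "FINE: Received MemberAcquireLockCommand(writeLock()) from 10.0.0.187",
            "FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187",
            "FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187",
            "FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187",
            "WARNING: timeout sending message to 10.0.0.187");

        System.out.println("probes=" + count(lines, "sending are-you-alive msg"));
        System.out.println("timeouts=" + count(lines, "WARNING: timeout sending message"));
    }
}
```

Run against a full Tomcat log (read with something like Files.readAllLines), the same two counts would show whether every timeout is preceded by unanswered probes, as it is in this excerpt.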

  was:
I have this open thread on HA-JDBC forum. The issue seems to be jGroups. I attached my Tomcat log there with steps to reproduce the issue.

http://sourceforge.net/p/ha-jdbc/discussion/383397/thread/843c60ad/?limit=25&page=2#338a


Here is an example of what my jGroups error looks like, copied from the HA-JDBC thread. I can reproducible this quite often, possibly every time I tried it but not sure. I have reproduced it at least half a dozen times when my Tomcat node leaves the cluster and rejoins.

May 09, 2014 12:01:45 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
FINE: Received MemberAcquireLockCommand(writeLock()) from 10.0.0.187
May 09, 2014 12:01:46 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:01:49 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:01:50 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:01:51 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
WARNING: timeout sending message to 10.0.0.187
org.jgroups.TimeoutException: timeout sending message to 10.0.0.187
        at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:419)
        at net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher.execute(JGroupsCommandDispatcher.java:177)
        at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockCoordinator(DistributedLockManager.java:420)
        at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockInterruptibly(DistributedLockManager.java:320)
        at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:864)
        at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:156)
May 09, 2014 12:01:51 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
FINE: Received MemberAcquireLockCommand(writeLock()) from 10.0.0.187
May 09, 2014 12:01:53 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:01:54 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:01:57 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:01:57 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
WARNING: timeout sending message to 10.0.0.187
org.jgroups.TimeoutException: timeout sending message to 10.0.0.187
        at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:419)
        at net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher.execute(JGroupsCommandDispatcher.java:177)
        at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockCoordinator(DistributedLockManager.java:420)
        at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockInterruptibly(DistributedLockManager.java:320)
        at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:864)
        at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:156)
May 09, 2014 12:01:57 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
FINE: Received MemberAcquireLockCommand(writeLock()) from 10.0.0.187
May 09, 2014 12:01:58 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:02:01 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:02:02 AM org.jgroups.logging.JDKLogImpl debug
FINE: 10.0.0.188: sending are-you-alive msg to 10.0.0.187
May 09, 2014 12:02:03 AM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
WARNING: timeout sending message to 10.0.0.187
org.jgroups.TimeoutException: timeout sending message to 10.0.0.187
        at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:419)
        at net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher.execute(JGroupsCommandDispatcher.java:177)
        at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockCoordinator(DistributedLockManager.java:420)
        at net.sf.hajdbc.lock.distributed.DistributedLockManager$DistributedLock.lockInterruptibly(DistributedLockManager.java:320)
        at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:864)
        at net.sf.hajdbc.sql.DatabaseClusterImpl.activate(DatabaseClusterImpl.java:156)



> jGroups 3.4.3 timeouts after a node leaves and rejoins cluster
> --------------------------------------------------------------
>
>                 Key: JGRP-1850
>                 URL: https://issues.jboss.org/browse/JGRP-1850
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.4.3
>         Environment: Tomcat 7.0.54
> Java 7u45 x32
> HA-JDBC 3.0.0, 3.0.1, 3.0.2, 3.0.3-SNAPSHOT
> Debian 5 x64
>            Reporter: Justin Cranford
>            Assignee: Bela Ban



--
This message was sent by Atlassian JIRA
(v6.2.3#6260)

