[jboss-jira] [JBoss JIRA] (JGRP-1272) FC: all threads are blocked at credits_available.await

Thu Jun 7 00:40:17 EDT 2012

     [ https://issues.jboss.org/browse/JGRP-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bela Ban deleted JGRP-1272:
---------------------------

    
> FC: all threads are blocked at credits_available.await
> ------------------------------------------------------
>
>                 Key: JGRP-1272
>                 URL: https://issues.jboss.org/browse/JGRP-1272
>             Project: JGroups
>          Issue Type: Bug
>         Environment: CentOS release 5.2 (Final)
> kernel 2.6.27.24rootserver-20090525a
> JDK 1.6.0_22 64-bit
>            Reporter: Victor N
>            Assignee: Bela Ban
>
> After several days of operation, one node cannot rejoin the cluster because of an issue in the FC protocol. The problem recurs once or several times per week.
> The problematic node is "gate9"; its view is outdated:
> [gate10.mydomain|1175] [gate10.mydomain, gate7.mydomain, gate8.mydomain, gate9.mydomain, gate11.mydomain, gate14.mydomain, gate5.mydomain, gate12.mydomain, gate2.mydomain, gate4.mydomain, gate3.mydomain, gate6.mydomain]
> The actual view (seen by all other nodes) is: [gate10.mydomain|1176] [gate10.mydomain, gate7.mydomain, gate8.mydomain, gate11.mydomain, gate14.mydomain, gate5.mydomain, gate12.mydomain, gate2.mydomain, gate4.mydomain, gate3.mydomain, gate6.mydomain]
> In the log file on gate9, I can see that it sends CREDIT_REQUEST constantly:
> 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.TP.down(): sending msg to gate10.mydomain, src=gate9.mydomain, headers are FC: CREDIT_REQUEST, UNICAST: DATA, seqno=4308, conn_id=14, TCP: [channel_name=GateCluster]                                                                                                                        
> 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.BasicTCP.sendUnicast(): dest=192.168.1.10:40001 (94 bytes)                                                                 
> 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.UNICAST.retransmit(): gate9.mydomain --> XMIT(gate10.mydomain: #2014)                                     
> 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.TP.down(): sending msg to gate10.mydomain, src=gate9.mydomain, headers are FC: CREDIT_REQUEST, UNICAST: DATA, seqno=2014, conn_id=14, TCP: [channel_name=GateCluster]                                                                                                                        
> 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.BasicTCP.sendUnicast(): dest=192.168.1.10:40001 (94 bytes)                                                                 
> 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.UNICAST.retransmit(): gate9.mydomain --> XMIT(gate10.mydomain: #1124)
> ...
> At gate10 (the coordinator) I see that it receives the requests and responds:
> 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.TP.passMessageUp(): received [dst: gate10.mydomain, src: gate9.mydomain (3 headers), size=9 bytes], headers are FC: CREDIT_REQUEST, UNICAST: DATA, seqno=2397, conn_id=14, TCP: [channel_name=GateCluster]                                                                                   
> 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.UNICAST.handleDataReceived(): gate10.mydomain <-- DATA(gate9.mydomain: #2397, conn_id=14)                 
> 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.UNICAST.sendRequestForFirstSeqno(): gate10.mydomain --> SEND_FIRST_SEQNO(gate9.mydomain)                  
> 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.TP.down(): sending msg to gate9.mydomain, src=gate10.mydomain, headers are UNICAST: SEND_FIRST_SEQNO, seqno=0, TCP: [channel_name=GateCluster]                                                                                                                                               
> 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.TP.passMessageUp(): received [dst: gate10.mydomain, src: gate9.mydomain (3 headers), size=9 bytes], headers are FC: CREDIT_REQUEST, UNICAST: DATA, seqno=1202, conn_id=14, TCP: [channel_name=GateCluster]                                                                                   
> 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.UNICAST.handleDataReceived(): gate10.mydomain <-- DATA(gate9.mydomain: #1202, conn_id=14)                 
> 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.UNICAST.sendRequestForFirstSeqno(): gate10.mydomain --> SEND_FIRST_SEQNO(gate9.mydomain)                  
> 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.TP.down(): sending msg to gate9.mydomain, src=gate10.mydomain, headers are UNICAST: SEND_FIRST_SEQNO, seqno=0, TCP: [channel_name=GateCluster]
> ...
> but I do not see any response to these messages at gate9.
> My config and stack traces are attached below. I do not know how to reproduce the problem in a test, but it does occur, so I can provide any details you need; just let me know.
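
For readers unfamiliar with credit-based flow control, the symptom in the title (sender threads parked in credits_available.await) can be illustrated with a minimal sketch. This is not the JGroups FC source; the class CreditGate and its methods tryAcquire, replenish and available are hypothetical names chosen for illustration. It assumes only the general scheme the report implies: a sender consumes credits per message, waits on a condition when it runs out, and is woken when the receiver replenishes credits. If the replenishment never arrives, every sender stays in the wait.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of a credit-based sender gate; not JGroups code.
final class CreditGate {
    private final ReentrantLock lock = new ReentrantLock();
    // Plays the role of the "credits_available" condition named in the report.
    private final Condition creditsAvailable = lock.newCondition();
    private long credits;

    CreditGate(long initialCredits) {
        this.credits = initialCredits;
    }

    // Tries to consume `bytes` credits, waiting up to `timeoutMs` for a
    // replenishment. Returns false on timeout or interruption.
    boolean tryAcquire(long bytes, long timeoutMs) {
        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (credits < bytes) {
                if (remainingNanos <= 0) {
                    return false; // no credits arrived in time
                }
                // Senders park here while the receiver has not replenished.
                remainingNanos = creditsAvailable.awaitNanos(remainingNanos);
            }
            credits -= bytes;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        } finally {
            lock.unlock();
        }
    }

    // Called when the receiver grants more credits; wakes all waiting senders.
    void replenish(long bytes) {
        lock.lock();
        try {
            credits += bytes;
            creditsAvailable.signalAll();
        } finally {
            lock.unlock();
        }
    }

    long available() {
        lock.lock();
        try {
            return credits;
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch the wait is bounded by a timeout, so a lost replenishment shows up as a false return value rather than a permanently blocked thread. Whether that matches the behaviour of the FC version in the report is not something this excerpt establishes.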
