[jboss-jira] [JBoss JIRA] Commented: (JGRP-1272) FC: all threads are blocked at credits_available.await

Monday, 17 January 2011

    [ https://issues.jboss.org/browse/JGRP-1272?page=com.atlassian.jira.plugin.... ]

Bela Ban commented on JGRP-1272:
--------------------------------

Rather than creating a JIRA issue right away, we should discuss this on the mailing list.
The way it looks, I'm going to close this one too, as it's not reproducible, like
your other case.

Having said that, there are a few things you could try out (again, we should discuss that
on jg-users):
- FILE_PING.num_initial_members should be more than 1; I suggest setting it to 10 for now
- Set TCP.logical_addr_max_size to 500
- Set TCP.logical_addr_cache_expiration to 600000
- Try the latest JGroups (2.12) from GitHub: github.com/belaban/JGroups

...
 FC: all threads are blocked at credits_available.await
 ------------------------------------------------------

                 Key: JGRP-1272
                 URL: https://issues.jboss.org/browse/JGRP-1272
             Project: JGroups
          Issue Type: Bug
    Affects Versions: 2.11
         Environment: ubuntu 10.04.1
            Reporter: Victor N
            Assignee: Bela Ban
         Attachments: jgroups-tcp.xml, s1.txt

 After several days of running, one node cannot rejoin because of an FC protocol issue. The
situation recurs once or several times per week.
 The problematic node is called "gate9", its view is outdated:
 [gate10.mydomain|1175] [gate10.mydomain, gate7.mydomain, gate8.mydomain, gate9.mydomain,
gate11.mydomain, gate14.mydomain, gate5.mydomain, gate12.mydomain, gate2.mydomain,
gate4.mydomain, gate3.mydomain, gate6.mydomain]
 Actual view (seen by all other nodes) is: [gate10.mydomain|1176] [gate10.mydomain,
gate7.mydomain, gate8.mydomain, gate11.mydomain, gate14.mydomain, gate5.mydomain,
gate12.mydomain, gate2.mydomain, gate4.mydomain, gate3.mydomain, gate6.mydomain]
 In the log file at gate9 I can see that it sends CREDIT_REQUEST constantly:
 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.TP.down(): sending msg to
gate10.mydomain, src=gate9.mydomain, headers are FC: CREDIT_REQUEST, UNICAST: DATA,
seqno=4308, conn_id=14, TCP: [channel_name=GateCluster]                                   

 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.BasicTCP.sendUnicast():
dest=192.168.1.10:40001 (94 bytes)                                                        

 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.UNICAST.retransmit():
gate9.mydomain --> XMIT(gate10.mydomain: #2014)                                     
 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.TP.down(): sending msg to
gate10.mydomain, src=gate9.mydomain, headers are FC: CREDIT_REQUEST, UNICAST: DATA,
seqno=2014, conn_id=14, TCP: [channel_name=GateCluster]                                   

 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.BasicTCP.sendUnicast():
dest=192.168.1.10:40001 (94 bytes)                                                        

 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.UNICAST.retransmit():
gate9.mydomain --> XMIT(gate10.mydomain: #1124)
 ...
 At gate10 (the coordinator) I see that it receives the requests and responds:
 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.TP.passMessageUp(): received [dst:
gate10.mydomain, src: gate9.mydomain (3 headers), size=9 bytes], headers are FC:
CREDIT_REQUEST, UNICAST: DATA, seqno=2397, conn_id=14, TCP: [channel_name=GateCluster]    

 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.UNICAST.handleDataReceived():
gate10.mydomain <-- DATA(gate9.mydomain: #2397, conn_id=14)                 
 17/01/2011 09:55:55 UTC| TRACE |
org.jgroups.protocols.UNICAST.sendRequestForFirstSeqno(): gate10.mydomain -->
SEND_FIRST_SEQNO(gate9.mydomain)                  
 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.TP.down(): sending msg to
gate9.mydomain, src=gate10.mydomain, headers are UNICAST: SEND_FIRST_SEQNO, seqno=0, TCP:
[channel_name=GateCluster]                                                                

 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.TP.passMessageUp(): received [dst:
gate10.mydomain, src: gate9.mydomain (3 headers), size=9 bytes], headers are FC:
CREDIT_REQUEST, UNICAST: DATA, seqno=1202, conn_id=14, TCP: [channel_name=GateCluster]    

 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.UNICAST.handleDataReceived():
gate10.mydomain <-- DATA(gate9.mydomain: #1202, conn_id=14)                 
 17/01/2011 09:55:55 UTC| TRACE |
org.jgroups.protocols.UNICAST.sendRequestForFirstSeqno(): gate10.mydomain -->
SEND_FIRST_SEQNO(gate9.mydomain)                  
 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.TP.down(): sending msg to
gate9.mydomain, src=gate10.mydomain, headers are UNICAST: SEND_FIRST_SEQNO, seqno=0, TCP:
[channel_name=GateCluster]
 ...
 but I do not see any answer to this at gate9.
 My config and stack traces are attached below. I do not know how to reproduce the problem
in tests, but it does occur, so I can provide you with any details; just let me know.
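
 For readers unfamiliar with credit-based flow control, the blocking described above
(sender threads waiting on a "credits available" condition until the receiver replenishes
them) can be sketched roughly as follows. This is a hypothetical illustration, not the
JGroups FC code: the class CreditGate and its methods acquire, replenish and available are
invented names, and the timeout path only marks the point where a real sender would send
something like a CREDIT_REQUEST.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of a credit gate; NOT the JGroups FC implementation. */
final class CreditGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition creditsAvailable = lock.newCondition();
    private long credits; // bytes the sender may still send

    CreditGate(long initialCredits) {
        this.credits = initialCredits;
    }

    /**
     * Tries to consume the given number of credits, waiting up to timeoutMs.
     * Returns false on timeout or interrupt; at that point a real sender
     * would ask the receiver for more credits and try again.
     */
    boolean acquire(long bytes, long timeoutMs) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (credits < bytes) {
                if (nanos <= 0L) {
                    return false; // no replenishment arrived in time
                }
                // awaitNanos returns the remaining wait time (<= 0 on timeout)
                nanos = creditsAvailable.awaitNanos(nanos);
            }
            credits -= bytes;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** Called when the receiver grants more credits; wakes all blocked senders. */
    void replenish(long bytes) {
        lock.lock();
        try {
            credits += bytes;
            creditsAvailable.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Current credit balance, read under the lock. */
    long available() {
        lock.lock();
        try {
            return credits;
        } finally {
            lock.unlock();
        }
    }
}
```

 In this sketch, if replenish is never called (for example because the replenishment
message never reaches the sender), every call to acquire keeps timing out and no data can
be sent, which resembles the symptom reported for gate9. Whether that is the actual cause
here is not established.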
-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
