[jboss-jira] [JBoss JIRA] Updated: (JGRP-1272) FC: all threads are blocked at credits_available.await
Victor N (JIRA)
jira-events at lists.jboss.org
Mon Jan 17 05:32:49 EST 2011
[ https://issues.jboss.org/browse/JGRP-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Victor N updated JGRP-1272:
---------------------------
Attachment: s1.txt
Stack traces from the problematic node, gate9
> FC: all threads are blocked at credits_available.await
> ------------------------------------------------------
>
> Key: JGRP-1272
> URL: https://issues.jboss.org/browse/JGRP-1272
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 2.11
> Environment: ubuntu 10.04.1
> Reporter: Victor N
> Assignee: Bela Ban
> Attachments: jgroups-tcp.xml, s1.txt
>
>
> After running for several days, one node cannot rejoin the cluster because of an FC protocol issue. The problem recurs once or several times per week.
> The problematic node is called "gate9"; its view is outdated:
> [gate10.mydomain|1175] [gate10.mydomain, gate7.mydomain, gate8.mydomain, gate9.mydomain, gate11.mydomain, gate14.mydomain, gate5.mydomain, gate12.mydomain, gate2.mydomain, gate4.mydomain, gate3.mydomain, gate6.mydomain]
> Actual view (seen by all other nodes) is: [gate10.mydomain|1176] [gate10.mydomain, gate7.mydomain, gate8.mydomain, gate11.mydomain, gate14.mydomain, gate5.mydomain, gate12.mydomain, gate2.mydomain, gate4.mydomain, gate3.mydomain, gate6.mydomain]
> In the log file on gate9 I can see that it sends CREDIT_REQUEST constantly:
> 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.TP.down(): sending msg to gate10.mydomain, src=gate9.mydomain, headers are FC: CREDIT_REQUEST, UNICAST: DATA, seqno=4308, conn_id=14, TCP: [channel_name=GateCluster]
> 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.BasicTCP.sendUnicast(): dest=192.168.1.10:40001 (94 bytes)
> 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.UNICAST.retransmit(): gate9.mydomain --> XMIT(gate10.mydomain: #2014)
> 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.TP.down(): sending msg to gate10.mydomain, src=gate9.mydomain, headers are FC: CREDIT_REQUEST, UNICAST: DATA, seqno=2014, conn_id=14, TCP: [channel_name=GateCluster]
> 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.BasicTCP.sendUnicast(): dest=192.168.1.10:40001 (94 bytes)
> 17/01/2011 09:55:52 UTC| TRACE | org.jgroups.protocols.UNICAST.retransmit(): gate9.mydomain --> XMIT(gate10.mydomain: #1124)
> ...
> At gate10 (the coordinator) I see that it receives the requests and responds:
> 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.TP.passMessageUp(): received [dst: gate10.mydomain, src: gate9.mydomain (3 headers), size=9 bytes], headers are FC: CREDIT_REQUEST, UNICAST: DATA, seqno=2397, conn_id=14, TCP: [channel_name=GateCluster]
> 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.UNICAST.handleDataReceived(): gate10.mydomain <-- DATA(gate9.mydomain: #2397, conn_id=14)
> 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.UNICAST.sendRequestForFirstSeqno(): gate10.mydomain --> SEND_FIRST_SEQNO(gate9.mydomain)
> 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.TP.down(): sending msg to gate9.mydomain, src=gate10.mydomain, headers are UNICAST: SEND_FIRST_SEQNO, seqno=0, TCP: [channel_name=GateCluster]
> 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.TP.passMessageUp(): received [dst: gate10.mydomain, src: gate9.mydomain (3 headers), size=9 bytes], headers are FC: CREDIT_REQUEST, UNICAST: DATA, seqno=1202, conn_id=14, TCP: [channel_name=GateCluster]
> 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.UNICAST.handleDataReceived(): gate10.mydomain <-- DATA(gate9.mydomain: #1202, conn_id=14)
> 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.UNICAST.sendRequestForFirstSeqno(): gate10.mydomain --> SEND_FIRST_SEQNO(gate9.mydomain)
> 17/01/2011 09:55:55 UTC| TRACE | org.jgroups.protocols.TP.down(): sending msg to gate9.mydomain, src=gate10.mydomain, headers are UNICAST: SEND_FIRST_SEQNO, seqno=0, TCP: [channel_name=GateCluster]
> ...
> but I do not see any answer to these requests at gate9.
> My config and stack traces are attached. I do not know how to reproduce the problem in tests, but it does occur, so I can provide any details you need. Just let me know.
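
For readers unfamiliar with credit-based flow control, the following is a minimal sketch of the general idea behind the symptom in the title. It is not the JGroups FC source: the class CreditGate and its methods tryAcquire and replenish are hypothetical names chosen for illustration, and the byte counts and timeouts are arbitrary. A sender spends credits for every message and waits on a condition when it runs out. If the receiver's replenishment never arrives, every sending thread ends up parked in the await call, which is roughly what the attached stack traces show at credits_available.await.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; this is not the JGroups FC implementation.
class CreditGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition creditsAvailable = lock.newCondition();
    private long credits;

    CreditGate(long initialCredits) {
        this.credits = initialCredits;
    }

    // Tries to spend `bytes` credits, waiting at most timeoutMs in total.
    // Returns false on timeout or interrupt. In a real protocol, a timeout
    // is the point at which the sender would ask the receiver for credits.
    boolean tryAcquire(long bytes, long timeoutMs) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (credits < bytes) {
                if (nanos <= 0L) {
                    return false; // no replenishment arrived in time
                }
                // Sender threads park here until replenish() signals them.
                nanos = creditsAvailable.awaitNanos(nanos);
            }
            credits -= bytes;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        } finally {
            lock.unlock();
        }
    }

    // Called when the receiver grants more credits; wakes all waiting senders.
    void replenish(long bytes) {
        lock.lock();
        try {
            credits += bytes;
            creditsAvailable.signalAll();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        CreditGate gate = new CreditGate(100);
        System.out.println(gate.tryAcquire(60, 50)); // enough credits available
        System.out.println(gate.tryAcquire(60, 50)); // 40 left, nothing replenished
        gate.replenish(100);
        System.out.println(gate.tryAcquire(60, 50)); // succeeds after replenishment
    }
}
```

In this sketch the wait is bounded, so a lost replenishment shows up as a false return rather than a permanent hang. Whether the blocked threads on gate9 are waiting with or without a bound is something only the attached stack traces and configuration can tell.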
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira