[jboss-jira] [JBoss JIRA] (JGRP-1671) It seems TCPConnectionMap didn't restore after network failure
Bela Ban (JIRA)
jira-events at lists.jboss.org
Tue Aug 6 07:05:26 EDT 2013
[ https://issues.jboss.org/browse/JGRP-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795318#comment-12795318 ]
Bela Ban commented on JGRP-1671:
--------------------------------
When node B times out reading A's address, it closes the socket to A, so the TCP connection between A and B should be re-created next time A sends a request to B.
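The close-then-re-create behaviour can be modelled in a few lines. This is a minimal Java sketch, not the `TCPConnectionMap` API. `ConnectionMapSketch`, `Conn`, `getConnection` and `onConnectionClosed` are hypothetical names, and sockets are replaced by a plain object with a `closed` flag. It assumes, as a simplification, that A drops a connection from its table as soon as A's reader notices that B closed the socket, so A's next send to B opens a fresh connection.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the policy described above; not JGroups code.
class ConnectionMapSketch {

    // Stand-in for a TCP connection; a real one would wrap a Socket plus a reader thread.
    static final class Conn {
        final String peer;
        final int id;
        volatile boolean closed;

        Conn(String peer, int id) {
            this.peer = peer;
            this.id = id;
        }
    }

    private final ConcurrentHashMap<String, Conn> conns = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    // Sender path: reuse an open connection to the peer, otherwise create a fresh one.
    Conn getConnection(String peer) {
        return conns.compute(peer, (p, c) ->
                (c == null || c.closed) ? new Conn(p, created.incrementAndGet()) : c);
    }

    // Reader path (assumed): called when the connection sees EOF or an I/O error,
    // e.g. because the peer closed its end after a handshake timeout.
    void onConnectionClosed(Conn c) {
        c.closed = true;
        conns.remove(c.peer, c); // remove only if it is still the current entry
    }

    public static void main(String[] args) {
        ConnectionMapSketch map = new ConnectionMapSketch();
        Conn first = map.getConnection("B");
        Conn again = map.getConnection("B");   // the open connection is reused
        map.onConnectionClosed(first);         // B closed the socket; A's reader noticed
        Conn second = map.getConnection("B");  // the next send re-creates it
        System.out.println(first.id + " " + again.id + " " + second.id); // prints "1 1 2"
    }
}
```

The sketch only re-creates the connection if the close is actually noticed. If A never observes the close, the stale entry would keep being reused.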
How does the problem you're seeing manifest itself? Is it that A and B cannot communicate with each other anymore? Can you come up with a scenario that reproduces the above? E.g.
* Run A in a debugger
* Set a breakpoint in the code that sends the peer address to B
* Start B
* When the breakpoint in A is hit, don't do anything for a few seconds (the default timeout is 2s)
* Remove the BP in A and continue running it
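The steps above can also be approximated outside JGroups with plain loopback sockets. This is a hedged sketch, not a JGroups test. `SlowHandshakeRepro`, its length-prefixed address format and the loopback setup are assumptions, and `Thread.sleep` in A stands in for the breakpoint. B reads the peer address under a read timeout (2 s in `main`, the default mentioned above) and gives up when A stalls for longer than that.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

class SlowHandshakeRepro {

    // Hypothetical handshake format (not JGroups' wire format): 2-byte length + UTF-8 bytes.
    static void writeAddress(Socket s, String addr) throws IOException {
        byte[] b = addr.getBytes(StandardCharsets.UTF_8);
        DataOutputStream out = new DataOutputStream(s.getOutputStream());
        out.writeShort(b.length);
        out.write(b);
        out.flush();
    }

    // B's acceptor: read the peer address under a read timeout.
    // Returns null if the timeout expires; in this sketch the socket is closed either way.
    static String acceptOne(ServerSocket server, int timeoutMs) throws IOException {
        try (Socket s = server.accept()) {
            s.setSoTimeout(timeoutMs);
            DataInputStream in = new DataInputStream(new BufferedInputStream(s.getInputStream()));
            byte[] b = new byte[in.readUnsignedShort()];
            in.readFully(b);
            return new String(b, StandardCharsets.UTF_8);
        } catch (SocketTimeoutException e) {
            return null; // analogous to "failed accepting connection ... Read timed out"
        }
    }

    // A: connect, pause for delayMs (standing in for the breakpoint), then send the address.
    // Returns the address B managed to read, or null if B timed out first.
    static String runOnce(int delayMs, int timeoutMs) {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, lo)) {
            int port = server.getLocalPort();
            Thread a = new Thread(() -> {
                try (Socket s = new Socket(lo, port)) {
                    Thread.sleep(delayMs);
                    writeAddress(s, "A");
                } catch (IOException | InterruptedException ignored) {
                    // B may already have closed the socket; a late write can fail or be dropped
                }
            });
            a.start();
            String seenByB = acceptOne(server, timeoutMs);
            a.join();
            return seenByB;
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("A delayed: " + runOnce(3000, 2000)); // B times out: prints "A delayed: null"
        System.out.println("A prompt: " + runOnce(0, 2000));     // handshake completes: prints "A prompt: A"
    }
}
```

This only reproduces the timeout on B's side. Whether A and B then fail to reconnect is exactly the open question above.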
Would this trigger the issue?
> It seems TCPConnectionMap didn't restore after network failure
> --------------------------------------------------------------
>
> Key: JGRP-1671
> URL: https://issues.jboss.org/browse/JGRP-1671
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.3.4
> Reporter: Igor Mazur
> Assignee: Bela Ban
>
> I got the following exception on one node (let's say node1):
> WARN [ConnectionMap.Acceptor [xxx.xxx.xxx.xxx:34383],null,null] org.jgroups.protocols.TCP [JGRP00006] failed accepting connection from peer: %s
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.7.0_17]
> at java.net.SocketInputStream.read(SocketInputStream.java:150) ~[na:1.7.0_17]
> at java.net.SocketInputStream.read(SocketInputStream.java:121) ~[na:1.7.0_17]
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) ~[na:1.7.0_17]
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) ~[na:1.7.0_17]
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334) ~[na:1.7.0_17]
> at java.io.DataInputStream.readFully(DataInputStream.java:195) ~[na:1.7.0_17]
> at org.jgroups.blocks.TCPConnectionMap$TCPConnection.readPeerAddress(TCPConnectionMap.java:495)
> at org.jgroups.blocks.TCPConnectionMap$TCPConnection.<init>(TCPConnectionMap.java:377)
> at org.jgroups.blocks.TCPConnectionMap$Acceptor.handleAccept(TCPConnectionMap.java:299)
> at org.jgroups.blocks.TCPConnectionMap$Acceptor.run(TCPConnectionMap.java:283)
> at java.lang.Thread.run(Thread.java:722) [na:1.7.0_17]
> After that, the two nodes behave as follows:
> node 1 sends discovery requests every 3 seconds:
> [2013-08-05 21:02:00,585] TRACE [TransferQueueBundler,global,_index-subscriber-node01] org.jgroups.protocols.TCPPING _index-subscriber-node01.yandex.ru:myt: sending discovery request to xxx.xxx.xxx.xxx:34383
> node 2 receives each request and sends a response:
> [2013-08-05 21:02:03,791] TRACE [OOB-2,global,_index-subscriber-node02] org.jgroups.protocols.TCPPING _index-subscriber-node02: received GET_MBRS_REQ from _index-subscriber-node01, sending response [PING: type=GET_MBRS_RSP, arg=_index-subscriber-node02, view_id=[_index-subscriber-node03|230], is_server=true, is_coord=false, logical_name=_index-subscriber-node02, physical_addrs=xxx.xxx.xxx.xxx:34383]
> But node 1 never receives a response and keeps sending discovery requests every 3 seconds.
> So the node has to be restarted to restore functionality.
> Interestingly, I see many more similar exceptions, and in most cases functionality is restored automatically. Only a few of them break a node.