[ https://issues.jboss.org/browse/JGRP-1671?page=com.atlassian.jira.plugin.... ]
lokesh raheja commented on JGRP-1671:
-------------------------------------
{code:java}
ERROR [TransferQueueBundler,CNC-prod,hybrisnode-703] [] () [org.jgroups.protocols.TCP] JGRP000034: hybrisnode-703: failure sending message to 10.71.193.206:7800: java.net.SocketException: Socket closed
ERROR [Timer-15,CNC-prod,hybrisnode-703] [] () [org.jgroups.protocols.TCP] JGRP000029: hybrisnode-703: failed sending message to 10.71.193.190:7800 (105 bytes): java.net.SocketException: Socket closed, headers: JDBC_PING: [PING: type=GET_MBRS_REQ, cluster=CNC-prod], TCP: [channel_name=CNC-prod]
2018-12-25 08:11:43,672 ERROR [TransferQueueBundler,CNC-prod,hybrisnode-703] [] () [org.jgroups.protocols.TCP] JGRP000036: hybrisnode-703: exception sending bundled msgs: java.lang.NullPointerException
WARN [ConnectionMap.Acceptor [10.71.193.174:7800],null,null] [] () [org.jgroups.protocols.TCP] JGRP000006: failed accepting connection from peer
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at org.jgroups.blocks.TCPConnectionMap$TCPConnection.readPeerAddress(TCPConnectionMap.java:495)
at org.jgroups.blocks.TCPConnectionMap$TCPConnection.<init>(TCPConnectionMap.java:377)
at org.jgroups.blocks.TCPConnectionMap$Acceptor.handleAccept(TCPConnectionMap.java:299)
at org.jgroups.blocks.TCPConnectionMap$Acceptor.run(TCPConnectionMap.java:283)
at java.lang.Thread.run(Thread.java:748)
{code}
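Both stack traces in this issue end in the same accept-side path: Acceptor.run -> handleAccept -> the TCPConnection constructor -> readPeerAddress -> DataInputStream.readFully, which throws SocketTimeoutException ("Read timed out" is only raised when a socket read timeout is set). The following is a minimal sketch using plain java.net, not JGroups code, of how that exception can arise: a peer completes the TCP connect but never sends the bytes the acceptor is waiting for, so a read bounded by a socket timeout gives up. The class AcceptHandshakeDemo, the 4-byte header and the 200 ms timeout are illustrative assumptions, not values taken from JGroups.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical, simplified model of the accept-side handshake seen in the traces.
// NOT JGroups code: the 4-byte header and the timeout values are illustrative assumptions.
public class AcceptHandshakeDemo {

    /** Accepts one connection and tries to read a fixed-size header within timeoutMs. */
    public static String acceptAndReadHeader(ServerSocket server, int timeoutMs) throws IOException {
        try (Socket s = server.accept()) {
            s.setSoTimeout(timeoutMs); // bound every blocking read on this socket
            DataInputStream in = new DataInputStream(s.getInputStream());
            byte[] header = new byte[4];
            try {
                in.readFully(header); // blocks until 4 bytes arrive, EOF, or the timeout fires
                return "handshake ok";
            } catch (SocketTimeoutException e) {
                return "read timed out"; // the peer connected but never sent its header
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, lo);
             Socket silentPeer = new Socket(lo, server.getLocalPort())) {
            // silentPeer completes the TCP connect but writes nothing, which is
            // one way a peer cut off by a network failure can look to the acceptor.
            System.out.println(acceptAndReadHeader(server, 200));
        }
    }
}
```

Running main prints `read timed out`; if the peer had written four bytes before the timeout, the method would return `handshake ok` instead.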
It seems TCPConnectionMap didn't restore after network failure
--------------------------------------------------------------
Key: JGRP-1671
URL: https://issues.jboss.org/browse/JGRP-1671
Project: JGroups
Issue Type: Bug
Affects Versions: 3.3.4
Reporter: Igor Mazur
Assignee: Bela Ban
Priority: Major
I got the following exception on one node (let's call it node1):
WARN [ConnectionMap.Acceptor [xxx.xxx.xxx.xxx:34383],null,null] org.jgroups.protocols.TCP [JGRP00006] failed accepting connection from peer: %s
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.7.0_17]
at java.net.SocketInputStream.read(SocketInputStream.java:150) ~[na:1.7.0_17]
at java.net.SocketInputStream.read(SocketInputStream.java:121) ~[na:1.7.0_17]
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) ~[na:1.7.0_17]
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) ~[na:1.7.0_17]
at java.io.BufferedInputStream.read(BufferedInputStream.java:334) ~[na:1.7.0_17]
at java.io.DataInputStream.readFully(DataInputStream.java:195) ~[na:1.7.0_17]
at org.jgroups.blocks.TCPConnectionMap$TCPConnection.readPeerAddress(TCPConnectionMap.java:495)
at org.jgroups.blocks.TCPConnectionMap$TCPConnection.<init>(TCPConnectionMap.java:377)
at org.jgroups.blocks.TCPConnectionMap$Acceptor.handleAccept(TCPConnectionMap.java:299)
at org.jgroups.blocks.TCPConnectionMap$Acceptor.run(TCPConnectionMap.java:283)
at java.lang.Thread.run(Thread.java:722) [na:1.7.0_17]
After that, the two nodes behave as follows:
node1 sends discovery requests every 3 seconds:
[2013-08-05 21:02:00,585] TRACE [TransferQueueBundler,global,_index-subscriber-node01] org.jgroups.protocols.TCPPING _index-subscriber-node01: sending discovery request to xxx.xxx.xxx.xxx:34383
node2 receives each request and sends a response:
[2013-08-05 21:02:03,791] TRACE [OOB-2,global,_index-subscriber-node02] org.jgroups.protocols.TCPPING _index-subscriber-node02: received GET_MBRS_REQ from _index-subscriber-node01, sending response [PING: type=GET_MBRS_RSP, arg=_index-subscriber-node02, view_id=[_index-subscriber-node03|230], is_server=true, is_coord=false, logical_name=_index-subscriber-node02, physical_addrs=xxx.xxx.xxx.xxx:34383]
But node1 never receives any response and keeps sending a discovery request every 3 seconds.
So the node has to be restarted to restore functionality.
Interestingly, I see many more similar exceptions, and in most cases functionality is restored automatically. Only a few of them break a node.
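The issue title attributes the stuck state to TCPConnectionMap not recovering after the network failure. As a hedged illustration of that failure mode only, here is a hypothetical, simplified per-peer connection cache; ConnectionCache, Conn, connector and evictOnFailure are invented names and are not the JGroups API. It compares a cache that keeps a dead connection for a peer with one that evicts the dead entry and reconnects once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified model of a per-peer connection cache.
// This is NOT the JGroups TCPConnectionMap API; all names are invented for illustration.
public class ConnectionCache {

    /** Stand-in for a TCP connection to one peer; only tracks whether it is open. */
    public static final class Conn {
        private boolean open = true;

        public boolean isOpen() { return open; }

        public void close() { open = false; }

        /** Models a send: succeeds only while the connection is open (the payload is ignored). */
        public boolean send(String msg) { return open; }
    }

    private final Map<String, Conn> conns = new HashMap<>();
    private final Function<String, Conn> connector; // opens a fresh connection to a peer
    private final boolean evictOnFailure;

    public ConnectionCache(Function<String, Conn> connector, boolean evictOnFailure) {
        this.connector = connector;
        this.evictOnFailure = evictOnFailure;
    }

    /** Returns the cached connection for peer, creating one if none is cached. */
    public Conn get(String peer) {
        return conns.computeIfAbsent(peer, connector);
    }

    /** Sends msg to peer and returns true if it was delivered. */
    public boolean send(String peer, String msg) {
        if (get(peer).send(msg)) {
            return true;
        }
        if (!evictOnFailure) {
            return false; // the dead connection stays cached, so every later send fails too
        }
        conns.remove(peer);         // drop the dead connection
        return get(peer).send(msg); // reconnect once and retry
    }

    public static void main(String[] args) {
        for (boolean evict : new boolean[] {false, true}) {
            ConnectionCache cache = new ConnectionCache(peer -> new Conn(), evict);
            cache.get("node1").close(); // simulate the socket being closed by a network failure
            System.out.println("evictOnFailure=" + evict + " -> delivered=" + cache.send("node1", "GET_MBRS_RSP"));
        }
    }
}
```

With evictOnFailure=false, sends to the peer keep failing until the process is restarted, which is analogous to node2's GET_MBRS_RSP responses never reaching node1; with true, the model recovers on the next send. Whether JGroups 3.3.4 actually keeps a stale entry like this is not established here.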
--
This message was sent by Atlassian Jira
(v7.12.1#712002)