Socket Exceptions on Coordinator after adding start_port in FD_SOCK
-------------------------------------------------------------------
Key: JGRP-1155
URL: https://jira.jboss.org/jira/browse/JGRP-1155
Project: JGroups
Issue Type: Bug
Affects Versions: 2.6.13, 2.6.10
Environment: RHEL 5.2.
135.9.147.170 is the coordinator; 135.9.147.158 is the other member.
Connection is over TCP.
The JGroups channel is on port 7802.
FD_SOCK is on port 7803.
iptables is running, which is why I need to specify a port; a randomly selected port gets
blocked otherwise.
Reporter: Ken Michie
Assignee: Bela Ban
I was able to reproduce it with the Draw demo program on both 2.6.10.GA and 2.6.13.GA. Here
is my command:
java -cp .:log4j.properties:log4j.jar:jgroups-all.jar:commons-logging.jar org.jgroups.demos.Draw -props kenTcp.xml
log4j.properties was set to log WARN and above for org.jgroups.
While tailing the jgroups.log file, I would eventually see these messages, ONLY on the
coordinator, printed every so often:
2010-02-16 15:49:43,144 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802
[135.9.147.170:50167 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR
org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803:
java.net.SocketException: Broken pipe
2010-02-16 15:49:43,144 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802
[135.9.147.170:50167 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR
org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803:
java.net.SocketException: Broken pipe
2010-02-16 15:50:07,624 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802
[135.9.147.170:38803 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR
org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803:
java.net.SocketException: Socket closed
2010-02-16 15:50:07,624 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802
[135.9.147.170:38803 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR
org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803:
java.net.SocketException: Socket closed
2010-02-16 15:50:33,608 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802
[135.9.147.170:44940 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR
org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803:
java.net.SocketException: Socket closed
2010-02-16 15:50:33,608 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802
[135.9.147.170:44940 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR
org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803:
java.net.SocketException: Socket closed
2010-02-16 15:50:33,611 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802
[135.9.147.170:55279 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR
org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803:
java.net.SocketException: Broken pipe
2010-02-16 15:50:33,611 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802
[135.9.147.170:55279 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR
org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803:
java.net.SocketException: Broken pipe
2010-02-16 15:50:35,115 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802
[135.9.147.170:42406 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR
org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803:
java.net.SocketException: Broken pipe
2010-02-16 15:50:35,115 [ConnectionTable.Connection.Sender local_addr=135.9.147.170:7802
[135.9.147.170:42406 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR
org.jgroups.blocks.ConnectionTable - failed sending data to 135.9.147.158:7803:
java.net.SocketException: Broken pipe
You can see it uses a different ephemeral port each time, but the output of
"netstat -an | grep 780" (FD_SOCK is on 7803, and the channel is on 7802) shows that
it is using the wrong socket (158 is the other member):
tcp  0  0  ::ffff:135.9.147.170:7802   :::*                        LISTEN
tcp  0  0  ::ffff:135.9.147.170:7803   :::*                        LISTEN
tcp  0  0  ::ffff:135.9.147.170:7803   ::ffff:135.9.147.158:41549  ESTABLISHED
tcp  0  0  ::ffff:135.9.147.170:57478  ::ffff:135.9.147.158:7802   ESTABLISHED
tcp  0  0  ::ffff:135.9.147.170:46929  ::ffff:135.9.147.158:7803   ESTABLISHED
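To make the port mismatch in the log entries explicit, here is a minimal sketch that pulls the "[local:port - remote:port]" pair out of a ConnectionTable sender thread name and compares the remote port with the channel port (7802). The class and method names are hypothetical and not part of JGroups; the sample line is the first log entry above, joined onto one line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogPortCheck {
    // Matches the "[local:port - remote:port]" pair in a ConnectionTable sender thread name.
    private static final Pattern PAIR =
        Pattern.compile("\\[[\\d.]+:(\\d+) - [\\d.]+:(\\d+)\\]");

    /** Returns {localPort, remotePort}, or null if the line contains no connection pair. */
    static int[] ports(String logLine) {
        Matcher m = PAIR.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        String line = "2010-02-16 15:49:43,144 [ConnectionTable.Connection.Sender "
            + "local_addr=135.9.147.170:7802 "
            + "[135.9.147.170:50167 - 135.9.147.158:7803],DrawGroupDemo,135.9.147.170:7802] ERROR";
        int transportPort = 7802; // channel port from the environment description
        int[] p = ports(line);
        System.out.println("local=" + p[0] + " remote=" + p[1]
            + " remoteIsTransportPort=" + (p[1] == transportPort));
        // prints: local=50167 remote=7803 remoteIsTransportPort=false
    }
}
```

For every entry in the log above, the remote port is 7803 (the FD_SOCK port) rather than 7802, although the sender thread belongs to the TCP transport's ConnectionTable.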
Protocol stack in XML (on each JGroups member, the TCPPING initial_hosts value is the
address of the other member):
<config>
    <TCP start_port="7802"
         loopback="false"
         recv_buf_size="20000000"
         send_buf_size="640000"
         discard_incompatible_packets="false"
         max_bundle_size="128000"
         max_bundle_timeout="100"
         use_incoming_packet_handler="true"
         enable_bundling="true"
         use_send_queues="true"
         sock_conn_timeout="300"
         skip_suspected_members="true"
         use_concurrent_stack="true"
         thread_pool.enabled="true"
         thread_pool.min_threads="2"
         thread_pool.max_threads="10"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="false"
         thread_pool.queue_max_size="1000"
         thread_pool.rejection_policy="run"
         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="2"
         oob_thread_pool.max_threads="10"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="1000"
         oob_thread_pool.rejection_policy="run"/>
    <TCPPING timeout="3000"
             initial_hosts="135.9.147.170[7800]}"
             port_range="4"/>
    <MERGE2 max_interval="100000"
            min_interval="20000"/>
    <FD_SOCK start_port="7803"/>
    <FD timeout="10000" max_tries="5" shun="true"/>
    <VERIFY_SUSPECT timeout="1500" />
    <BARRIER />
    <pbcast.NAKACK
        use_mcast_xmit="false" gc_lag="0"
        retransmit_timeout="300,600,1200,2400,4800"
        discard_delivered_msgs="true"/>
    <UNICAST timeout="300,600,1200" />
    <pbcast.STABLE stability_delay="1000"
                   desired_avg_gossip="50000"
                   max_bytes="400000"/>
    <VIEW_SYNC avg_send_interval="60000"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"
                shun="true"
                view_bundling="true"/>
    <FC max_credits="2000000"
        min_threshold="0.10"/>
    <FRAG2 frag_size="60000" />
    <pbcast.STREAMING_STATE_TRANSFER/>
    <!-- <pbcast.STATE_TRANSFER/> -->
</config>
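One thing worth checking in this stack is whether the ports TCPPING probes overlap the FD_SOCK port. The sketch below lists the ports implied by initial_hosts="135.9.147.170[7800]" and port_range="4" and tests whether 7803 is among them. It assumes port_range expands to the ports from the base port up to base + port_range - 1; the exact upper bound in JGroups 2.6.x is an assumption here, although 7803 falls inside the range whether the bound is inclusive or exclusive. The class and method names are hypothetical, and this is a way to check the configuration, not a confirmed diagnosis.

```java
import java.util.ArrayList;
import java.util.List;

public class PortOverlapCheck {
    /**
     * Ports probed for one initial_hosts entry, ASSUMING the range is
     * basePort .. basePort + portRange - 1 (not verified against the JGroups source).
     */
    static List<Integer> probedPorts(int basePort, int portRange) {
        List<Integer> ports = new ArrayList<>();
        for (int p = basePort; p < basePort + portRange; p++) {
            ports.add(p);
        }
        return ports;
    }

    /** True if otherPort falls inside the assumed probe range. */
    static boolean overlaps(int basePort, int portRange, int otherPort) {
        return probedPorts(basePort, portRange).contains(otherPort);
    }

    public static void main(String[] args) {
        int tcppingBase = 7800; // from initial_hosts="135.9.147.170[7800]"
        int portRange = 4;      // from port_range="4"
        int fdSockPort = 7803;  // from <FD_SOCK start_port="7803"/>

        System.out.println("probed ports: " + probedPorts(tcppingBase, portRange));
        System.out.println("overlaps FD_SOCK port: "
            + overlaps(tcppingBase, portRange, fdSockPort));
        // prints: probed ports: [7800, 7801, 7802, 7803]
        // prints: overlaps FD_SOCK port: true
    }
}
```

If the overlap matters, it would be consistent with the ConnectionTable errors above, which all target port 7803; that connection is an inference from the configuration and has not been verified.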