[
http://jira.jboss.com/jira/browse/JGRP-647?page=comments#action_12398430 ]
Galder Zamarreno commented on JGRP-647:
---------------------------------------
I had a look at some TRACE logs from a customer (I've asked the customer to see
whether it'd be Ok to post them here) and I think I understand what's happening:
I had a closer look to this and seems like PingSender thread is started and stopped
whenever new members need to be found, see PingWaiter.findInitialMembers():
public Vector findInitialMembers() {
...
ping_sender.start();
start_time=System.currentTimeMillis();
time_to_wait=timeout;
try {
while(rsps.size() < num_rsps && time_to_wait > 0 && thread !=
null && Thread.currentThread().equals(thread)) {
if(log.isTraceEnabled()) // +++ remove
log.trace(new StringBuffer("waiting for initial members:
time_to_wait=").append(time_to_wait)
.append(", got ").append(rsps.size()).append(" rsps"));
try {
rsps.wait(time_to_wait);
}
catch(InterruptedException intex) {
}
catch(Exception e) {
log.error("got an exception waiting for responses", e);
}
time_to_wait=timeout - (System.currentTimeMillis() - start_time);
}
if(log.isTraceEnabled())
log.trace(new StringBuffer("initial mbrs are ").append(rsps));
return new Vector(rsps);
}
finally {
if(ping_sender != null)
ping_sender.stop();
}
...
From the customer logs, I don't see the log message in
catch(Exception e)
or "initial mbrs are ", so I suspect that the ping
waiter might be interrupted while
waiting even before the sender sent the message. This leads to ping
sender being interrupted in ping_sender.stop() while it's sending the message.
The Exception itself probably can be ignored as the ping sender will try to start
the thread again when it wants to find new members again.
Looking at the log, there's a continuous attempt to send GET_MBRS_REQ which
comes from MERGE2.run(), which is periodically trying to find initial members.
My concern is, why do we start/interrupt the ping sender thread whenever the
node tries to find new members?
PING phase only happens once, but finding initial members via MERGE2 happens
regularly. Wouldn't it be better to have a thread running all the time and react based
on notifications? Interrupting and starting a thread whenever we need to find new
members seems to be rather costly.
Thoughts?
Potentially Meaningless InterruptedIOException Can Be Logged
Repeatedly
-----------------------------------------------------------------------
Key: JGRP-647
URL:
http://jira.jboss.com/jira/browse/JGRP-647
Project: JGroups
Issue Type: Bug
Affects Versions: 2.5, 2.6, 2.4.1 SP3, 2.4.1 SP4, 2.5.1
Reporter: Jimmy Wilson
Assigned To: Bela Ban
Priority: Minor
Fix For: 2.7, 2.4.2
Attachments: cluster.log.tar.gz
TP.down doesn't catch/handle the InterruptedIOException that can be thrown from the
underlying send call when interrupted:
try {
if(use_outgoing_packet_handler)
outgoing_queue.put(msg);
else
send(msg, dest, multicast);
}
catch(QueueClosedException closed_ex) {
}
catch(InterruptedException interruptedEx) {
}
catch(Throwable e) {
if(log.isErrorEnabled()) {
String dst=msg.getDest() == null? "null" :
msg.getDest().toString();
log.error("failed sending message to " + dst + " (" +
msg.getLength() + " bytes)", e.getCause());
}
}
This can lead to it being logged repeatedly at the ERROR level:
2007-11-14 10:02:30,040 ERROR -> (PingSender) [UDP] failed sending message to null (0
bytes)
java.io.InterruptedIOException: operation interrupted
at java.net.PlainDatagramSocketImpl.send(Native Method)
at java.net.DatagramSocket.send(DatagramSocket.java:612)
at org.jgroups.protocols.UDP._send(UDP.java:316)
at org.jgroups.protocols.UDP.sendToAllMembers(UDP.java:286)
at org.jgroups.protocols.TP.doSend(TP.java:937)
at org.jgroups.protocols.TP.send(TP.java:926)
at org.jgroups.protocols.TP.down(TP.java:712)
at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:499)
at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
at org.jgroups.protocols.PING.sendMcastDiscoveryRequest(PING.java:214)
at org.jgroups.protocols.PING.sendGetMembersRequest(PING.java:208)
at org.jgroups.protocols.PingSender.run(PingSender.java:59)
at java.lang.Thread.run(Thread.java:595)
2.5/2.6 don't handle the InterruptedIOException either, but since this was reported
with EAP 4.2 (JGroups 2.4.1.SP3), I 'd like to see it in 2.4.2 if possible. That way,
it can make it into a 4.2/4.3 EAP CP.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira