[jboss-jira] [JBoss JIRA] Commented: (JGRP-647) Potentially Meaningless InterruptedIOException Can Be Logged Repeatedly

Wed Feb 6 07:39:04 EST 2008

    [ http://jira.jboss.com/jira/browse/JGRP-647?page=comments#action_12398430 ] 

Galder Zamarreno commented on JGRP-647:
---------------------------------------

I had a look at some TRACE logs from a customer (I've asked the customer to see 
whether it'd be Ok to post them here) and I think I understand what's happening:

I had a closer look to this and seems like PingSender thread is started and stopped 
whenever new members need to be found, see PingWaiter.findInitialMembers():

public Vector findInitialMembers() {
...
 ping_sender.start();

 start_time=System.currentTimeMillis();
 time_to_wait=timeout;

 try {
   while(rsps.size() < num_rsps && time_to_wait > 0 && thread != null && Thread.currentThread().equals(thread)) {
     if(log.isTraceEnabled()) // +++ remove
       log.trace(new StringBuffer("waiting for initial members: time_to_wait=").append(time_to_wait)
       .append(", got ").append(rsps.size()).append(" rsps"));

       try {
         rsps.wait(time_to_wait);
       }
       catch(InterruptedException intex) {
       }
       catch(Exception e) {
         log.error("got an exception waiting for responses", e);
       }
       time_to_wait=timeout - (System.currentTimeMillis() - start_time);
    }
    if(log.isTraceEnabled())
      log.trace(new StringBuffer("initial mbrs are ").append(rsps));
    return new Vector(rsps);
  }
  finally {
    if(ping_sender != null)
      ping_sender.stop();
  }
...

>From the customer logs, I don't see the log message in catch(Exception e) 
or "initial mbrs are ", so I suspect that the ping waiter might be interrupted while 
waiting even before the sender sent the message. This leads to ping 
sender being interrupted in ping_sender.stop() while it's sending the message.

The Exception itself probably can be ignored as the ping sender will try to start 
the thread again when it wants to find new members again.

Looking at the log, there's a continuous attempt to send GET_MBRS_REQ which 
comes from MERGE2.run(), which is periodically trying to find initial members. 
My concern is, why do we start/interrupt the ping sender thread whenever the 
node tries to find new members? 

PING phase only happens once, but finding initial members via MERGE2 happens 
regularly. Wouldn't it be better to have a thread running all the time and react based 
on notifications? Interrupting and starting a thread whenever we need to find new 
members seems to be rather costly.

Thoughts?

> Potentially Meaningless InterruptedIOException Can Be Logged Repeatedly
> -----------------------------------------------------------------------
>
>                 Key: JGRP-647
>                 URL: http://jira.jboss.com/jira/browse/JGRP-647
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.5, 2.6, 2.4.1 SP3, 2.4.1 SP4, 2.5.1
>            Reporter: Jimmy Wilson
>         Assigned To: Bela Ban
>            Priority: Minor
>             Fix For: 2.7, 2.4.2
>
>         Attachments: cluster.log.tar.gz
>
>
> TP.down doesn't catch/handle the InterruptedIOException that can be thrown from the underlying send call when interrupted:
>         try {
>             if(use_outgoing_packet_handler)
>                 outgoing_queue.put(msg);
>             else
>                 send(msg, dest, multicast);
>         }
>         catch(QueueClosedException closed_ex) {
>         }
>         catch(InterruptedException interruptedEx) {
>         }
>         catch(Throwable e) {
>             if(log.isErrorEnabled()) {
>                 String dst=msg.getDest() == null? "null" : msg.getDest().toString();
>                 log.error("failed sending message to " + dst + " (" + msg.getLength() + " bytes)", e.getCause());
>             }
>         }
> This can lead to it being logged repeatedly at the ERROR level:
> 2007-11-14 10:02:30,040 ERROR -> (PingSender) [UDP] failed sending message to null (0 bytes)
> java.io.InterruptedIOException: operation interrupted
> at java.net.PlainDatagramSocketImpl.send(Native Method)
> at java.net.DatagramSocket.send(DatagramSocket.java:612)
> at org.jgroups.protocols.UDP._send(UDP.java:316)
> at org.jgroups.protocols.UDP.sendToAllMembers(UDP.java:286)
> at org.jgroups.protocols.TP.doSend(TP.java:937)
> at org.jgroups.protocols.TP.send(TP.java:926)
> at org.jgroups.protocols.TP.down(TP.java:712)
> at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:499)
> at org.jgroups.stack.Protocol.passDown(Protocol.java:533)
> at org.jgroups.protocols.PING.sendMcastDiscoveryRequest(PING.java:214)
> at org.jgroups.protocols.PING.sendGetMembersRequest(PING.java:208)
> at org.jgroups.protocols.PingSender.run(PingSender.java:59)
> at java.lang.Thread.run(Thread.java:595) 
> 2.5/2.6 don't handle the InterruptedIOException either, but since this was reported with EAP 4.2 (JGroups 2.4.1.SP3), I 'd like to see it in 2.4.2 if possible.  That way, it can make it into a 4.2/4.3 EAP CP.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira