[jboss-jira] [JBoss JIRA] Commented: (JGRP-1043) UNICAST: high contention

Sat Sep 12 10:34:23 EDT 2009

    [ https://jira.jboss.org/jira/browse/JGRP-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12485493#action_12485493 ] 

Brian Stansberry commented on JGRP-1043:
----------------------------------------

Further to the issue of contention for AckReceiverWindow.msgs. For acking OOB messages, the relevant logic in handleDataReceived is:

        if(msg.isFlagSet(Message.OOB)) {
            if(added)
                up_prot.up(new Event(Event.MSG, msg));
            Message oob_msg=win.removeOOBMessage();
            if(oob_msg != null)
                sendAckForMessage(oob_msg);

            if(!(win.hasMessagesToRemove() && undelivered_msgs.get() > 0))
                return;
        }

       .....  logic to possibly become thread that passes messages up, if yes, then...

            while(true) {
                Message m=win.remove(processing);
                if(m == null) {
                    released_processing=true;
                    return;
                }

                // discard OOB msg as it has already been delivered (http://jira.jboss.com/jira/browse/JGRP-377)
                if(m.isFlagSet(Message.OOB)) {
                    continue;
                }
                num_regular_msgs_removed++;
                sendAckForMessage(m);
                up_prot.up(new Event(Event.MSG, m));
            }

As we discussed, I believe there is a race where win.removeOOBMessage() can return null, resulting in the OOB message not getting acked, since the logic in the later while loop won't ack an OOB.

Wait! Just realized it's not just the race we discussed, where OOB msg 1 gets added and then another thread adds msg 2 and removes 1 before the first thread calls win.removeOOBMessage(). This can happen in the simple case where AckReceiverWindow.next_to_remove is only up to say seqno 5 and an OOB msg 10 comes in. In that case win.removeOOBMessage() will return null and the msg will not get ACKed. (As we discussed, AckSenderWindow.ack() will help clean up after this, by cleaning 10 once it gets an ack for 11).

Anyway, back to my original point. :)  The win.removeOOBMessage() call is adding contention on AckReceiverWindow.msgs. Just acking OOB messages in the while loop seems cleaner.

Is the goal of the win.removeOOBMessage() and  if(!(win.hasMessagesToRemove() && undelivered_msgs.get() > 0))  return; to try and provide an escape route for OOB threads so they don't become the thread responsible for passing message up?  OK, later comments about JGRP-781 indicate that's the case. In any case the test can be if(!(undelivered_msgs.get() > 0 && win.hasMessagesToRemove()))  Do the cheap test before the expensive one.

Perhaps the whole thing becomes:

        final AtomicBoolean processing=win.getProcessing();
        if(msg.isFlagSet(Message.OOB)) {
            if(added)
                up_prot.up(new Event(Event.MSG, msg));

            // JGRP-781. If no thread is processing the window and passing
            // messages up, we may need to take on that duty even 
            // though we are an OOB thread
            if (!processing.get())
            {
               // First try and clear our own message from the window
               Message oob_msg=win.removeOOBMessage();
	       if(oob_msg != null)
	        sendAckForMessage(oob_msg);

               // Check and see if there is processing work to do; if not
               // return and let whatever thread adds work become the 
               // processing thread
	       if(!(undelivered_msgs.get() > 0 && win.hasMessagesToRemove()))
		 return;
            }
            else {
                return;
            }
        }

I think the undelivered_msgs tracking would be better placed inside AckReceiverWindow. Right now the win.add() and undelivered_msgs.incrementAndGet() calls are not atomic, nor are the remove() and decrement calls. So you can get races plus it makes handleDataReceived more complicated.

> UNICAST: high contention
> ------------------------
>
>                 Key: JGRP-1043
>                 URL: https://jira.jboss.org/jira/browse/JGRP-1043
>             Project: JGroups
>          Issue Type: Task
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 2.6.13, 2.8
>
>
> If UNICAST receives a high number of messages *and* sends a high number of messages concurrently, there will be a lot of retransmissions. Reason is contention on 'connections', with up- and down messages accessing it concurrently. Both up- and down- messages impede each other and thus messages are not received in time, causing retransmissions (ACKs to be resent).
> SOLUTION:
> #1 Reduce contention and turn 'connections' from HashMap into ConcurrentHashMap
> #2 Break 'connections' into a send-table and receive-table: sends and acks access the send-table, receives the receive-table

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira