[jboss-jira] [JBoss JIRA] Commented: (JGRP-486) UNICAST: messages not retransmitted on load

Bela Ban (JIRA) jira-events at lists.jboss.org
Thu Apr 26 11:27:30 EDT 2007


    [ http://jira.jboss.com/jira/browse/JGRP-486?page=comments#action_12360584 ] 
            
Bela Ban commented on JGRP-486:
-------------------------------

The problem was caused by the timer (TimeScheduler), which dropped message 80265 under load. The TimeScheduler thread was still running, but the message was no longer in the retransmission queue.
SOLUTION: I rewrote TimeScheduler to use Timer rather than its elaborate message fragment handling, and the tests described in the issue now run without any problems.

Note that TimeScheduler has already been rewritten in 2.5, so this issue should not occur in 2.5. However, this still needs testing (the same tests as with 2.4.1).
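
For illustration only, here is a minimal sketch of what timer-driven retransmission on top of java.util.Timer can look like. It is not the actual TimeScheduler or AckSenderWindow code: the class RetransmitterSketch, its methods and the resend callback are hypothetical names chosen for this example, and the fixed retransmission interval is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;
import java.util.function.LongConsumer;

/** Hypothetical sketch of timer-driven retransmission; not the JGroups implementation. */
public class RetransmitterSketch {
    private final Timer timer = new Timer("retransmitter", true);   // daemon timer thread
    private final Map<Long, TimerTask> pending = new HashMap<>();   // unacked seqno -> its task
    private final LongConsumer resend;                              // assumed callback that re-sends a seqno
    private final long intervalMs;                                  // assumed fixed retransmission interval

    public RetransmitterSketch(LongConsumer resend, long intervalMs) {
        this.resend = resend;
        this.intervalMs = intervalMs;
    }

    /** Registers an unacknowledged seqno; it is re-sent every intervalMs until ack(seqno). */
    public synchronized void add(final long seqno) {
        if (pending.containsKey(seqno))
            return;                                  // already scheduled
        TimerTask task = new TimerTask() {
            @Override public void run() {
                try {
                    resend.accept(seqno);
                } catch (RuntimeException e) {
                    // An uncaught exception would terminate the Timer thread and
                    // silently stop all retransmissions, so it is swallowed here.
                }
            }
        };
        pending.put(seqno, task);
        timer.schedule(task, intervalMs, intervalMs);
    }

    /** Stops retransmitting seqno once the receiver has acknowledged it. */
    public synchronized void ack(long seqno) {
        TimerTask task = pending.remove(seqno);
        if (task != null)
            task.cancel();
    }

    /** Number of seqnos still waiting for an ack. */
    public synchronized int pendingCount() {
        return pending.size();
    }

    public void stop() {
        timer.cancel();
    }
}
```

In this sketch a seqno leaves the pending map only in ack(), so an entry cannot disappear while it is still unacknowledged. A sender would call add(80265) after sending and ack(80265) when the ACK arrives.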

> UNICAST: messages not retransmitted on load
> -------------------------------------------
>
>                 Key: JGRP-486
>                 URL: http://jira.jboss.com/jira/browse/JGRP-486
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Bela Ban
>         Assigned To: Bela Ban
>             Fix For: 2.5, 2.4.1 SP3
>
>
> To reproduce, run JBento in the ATL lab with 6 instances, then run Brian's stress test (3 instances with a total of 3000 threads).
> This only occurs with buddy replication (which uses UNICAST); it doesn't occur with NAKACK. It also doesn't occur with TCP as the transport (so that's a workaround).
> Let's say we have buddies A and B.
> After some time, B's UNICAST AckReceiverWindow for A shows next_to_remove=80265, msgs=[80266-82139]. This means that we expect 80265 as the next seqno, but the lowest seqno we've received is 80266. The window receives new messages every 5 secs (credit requests from A) and adds them, but it cannot deliver them because it hasn't received 80265 yet.
> A's UNICAST AckSenderWindow for B shows 1 message in the retransmission queue: 80265. The stack trace shows that the timer thread is still running (waiting for tasks to execute), but for some reason 80265 is never retransmitted to B. We don't see any retransmit() invocation in the TRACE logs (we do see the other UNICAST methods invoked, e.g. DATA and ACK traces).
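
The stalled receiver state described in the issue can be reproduced with a small in-order delivery window. The following is a simplified sketch, not the JGroups AckReceiverWindow: the class ReceiverWindowSketch and its methods are hypothetical, and only the field name nextToRemove mirrors the next_to_remove value quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical, simplified in-order delivery window; not the JGroups AckReceiverWindow. */
public class ReceiverWindowSketch {
    private long nextToRemove;                                   // next seqno that may be delivered
    private final TreeMap<Long, String> msgs = new TreeMap<>();  // buffered messages by seqno

    public ReceiverWindowSketch(long initialSeqno) {
        this.nextToRemove = initialSeqno;
    }

    /** Buffers a message, then returns every message that is now deliverable in order. */
    public List<String> add(long seqno, String payload) {
        if (seqno >= nextToRemove)
            msgs.putIfAbsent(seqno, payload);        // ignore already delivered or duplicate seqnos
        List<String> deliverable = new ArrayList<>();
        String next;
        // Delivery stops at the first gap: nothing above a missing seqno is released.
        while ((next = msgs.remove(nextToRemove)) != null) {
            deliverable.add(next);
            nextToRemove++;
        }
        return deliverable;
    }

    /** Number of messages received but not yet deliverable. */
    public int buffered() {
        return msgs.size();
    }

    public static void main(String[] args) {
        ReceiverWindowSketch win = new ReceiverWindowSketch(80265);
        for (long s = 80266; s <= 82139; s++)
            win.add(s, "msg-" + s);                  // everything queues up behind the gap
        System.out.println("buffered without 80265: " + win.buffered());
        System.out.println("delivered once 80265 arrives: " + win.add(80265, "msg-80265").size());
    }
}
```

With 80265 missing, all 1874 later messages stay buffered. As soon as 80265 is added, the window releases 1875 messages in one go, which is why a single lost retransmission blocks the whole stream.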
