[jboss-jira] [JBoss JIRA] Commented: (JGRP-455) Problem with retransmission: message not found

Bela Ban (JIRA) jira-events at lists.jboss.org
Thu May 31 06:05:08 EDT 2007


    [ http://jira.jboss.com/jira/browse/JGRP-455?page=comments#action_12363687 ] 
            
Bela Ban commented on JGRP-455:
-------------------------------

The issue was in STABLE: the heard_from list could be updated incorrectly, so we would sometimes send a STABILITY message before we had received votes from all members.
I changed the design slightly: instead of heard_from we now have votes, which is initially empty. Only when all members have sent their votes do we send a STABILITY message and clear votes again.
Clearing votes and resetting the digest are now done under the same lock ("lock"), because synchronizing separately on digest and votes caused a race condition.
Tests with 1M messages of 1K each per sender on 8 nodes passed multiple times (before the fix, the error occurred after 100K messages).
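
For illustration, here is a minimal sketch of the vote-collection idea described above. It is not the actual STABLE code: the class StabilityVotes, the method handleVote, the use of String addresses, and the rule that a duplicate vote is ignored are all assumptions made for this example. The point is only that votes and digest are read, updated, and reset under one lock, and that a result is produced only once every member has voted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the JGroups STABLE source: all names are invented.
class StabilityVotes {
    private final Object lock = new Object();                  // one lock guards votes AND digest
    private final Set<String> members;                         // members of the current view
    private final Set<String> votes = new HashSet<>();         // empty at the start of each round
    private final Map<String, Long> digest = new HashMap<>();  // per sender: min of highest seqnos seen

    StabilityVotes(Set<String> members) {
        this.members = new HashSet<>(members);
    }

    /**
     * Records one member's vote. Returns the agreed digest (what a STABILITY
     * message would carry) once all members have voted, otherwise null.
     */
    Map<String, Long> handleVote(String voter, Map<String, Long> highestSeen) {
        synchronized (lock) {
            // Assumption: votes from non-members and duplicate votes are ignored.
            if (!members.contains(voter) || !votes.add(voter)) {
                return null;
            }
            // Keep the minimum seqno reported for each sender.
            for (Map.Entry<String, Long> e : highestSeen.entrySet()) {
                digest.merge(e.getKey(), e.getValue(), Long::min);
            }
            if (!votes.containsAll(members)) {
                return null; // still waiting for at least one member
            }
            Map<String, Long> stable = new HashMap<>(digest);
            votes.clear();   // cleared and reset together, under the same lock
            digest.clear();
            return stable;
        }
    }

    public static void main(String[] args) {
        StabilityVotes sv = new StabilityVotes(new HashSet<>(Arrays.asList("A", "B")));
        Map<String, Long> seenByA = new HashMap<>();
        seenByA.put("A", 50L);
        Map<String, Long> seenByB = new HashMap<>();
        seenByB.put("A", 40L);
        System.out.println(sv.handleVote("A", seenByA)); // not all votes in yet
        System.out.println(sv.handleVote("B", seenByB)); // round complete
    }
}
```

Because the completeness check and the reset happen inside the same synchronized block, no other thread can observe votes already cleared while digest still holds values from the previous round.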

> Problem with retransmission: message not found
> ----------------------------------------------
>
>                 Key: JGRP-455
>                 URL: http://jira.jboss.com/jira/browse/JGRP-455
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.4.1 SP1
>            Reporter: Bela Ban
>         Assigned To: Bela Ban
>            Priority: Blocker
>             Fix For: 2.5
>
>         Attachments: tmp.txt, tmp2.txt, udp-atl.xml
>
>
> Sometimes a sender cannot find a message with the seqno requested by a receiver. The requested seqno (47176) had already been purged; sent_msgs: [48059 - 48090] (32)
> This could be an issue either in STABLE, which may prematurely purge messages before all members have agreed on the highest seqnos seen, or in NAKACK, which might incorrectly 'change' seqnos. Very hard to reproduce!
> Trace:
> 42032 [ERROR] [OOB Thread,perf,10.68.32.33:32968] NAKACK.handleXmitReq(): - (requester=10.68.28.33:33763, local_addr=10.68.32.33:32968) message 10.68.32.33:32968::47176 not found in sent msgs [48059 - 48090] (32)
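
To make the failure mode in the trace concrete, here is a small sketch of a sender-side table of sent messages. It is not NAKACK's implementation: the class SentMessages and its methods add, purge, and lookup are hypothetical names chosen for this example. It only shows why a retransmission request fails once the requested seqno lies below the purged range.

```java
import java.util.TreeMap;

// Hypothetical sketch of a sender's retransmission table; not NAKACK's code.
class SentMessages {
    private final TreeMap<Long, String> sentMsgs = new TreeMap<>();

    // Remember a sent message so it can be retransmitted later.
    void add(long seqno, String payload) {
        sentMsgs.put(seqno, payload);
    }

    // On a stability notification: drop everything up to and including stableSeqno.
    void purge(long stableSeqno) {
        sentMsgs.headMap(stableSeqno, true).clear();
    }

    // On a retransmission request: returns null if the message was already purged.
    String lookup(long seqno) {
        return sentMsgs.get(seqno);
    }

    public static void main(String[] args) {
        SentMessages table = new SentMessages();
        for (long seqno = 47000; seqno <= 48090; seqno++) {
            table.add(seqno, "msg-" + seqno);
        }
        table.purge(48058);                      // premature purge: a receiver still lacks 47176
        System.out.println(table.lookup(47176)); // the request can no longer be served
        System.out.println(table.lookup(48059)); // still within the retained range
    }
}
```

If the purge runs before every member has confirmed receipt up to that seqno, a slower receiver's later request for an older message finds nothing to retransmit.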
