[
https://issues.jboss.org/browse/JGRP-1396?page=com.atlassian.jira.plugin....
]
Bela Ban edited comment on JGRP-1396 at 1/2/12 9:45 AM:
--------------------------------------------------------
Here's another problem: we want to advance 'low' when removing a message (and
nulling it). This happens at the receivers when discard_delivered_msgs is true. But there
is a problem with the following scenario (threads T1 and T2):
- HD=9
- T1 calls add(10); the message with seqno 10 already exists in the window
- T2 calls remove() with nullify=true
- T1 reads HD, value is 9, continues (would terminate if HD was 10)
- T2 checks that element at index HD+1 is not null
- T2 gets element at index HD+1
- T2 increments HD to 10 and nulls the element at index 10
- T1 does a CAS(null, 10) and succeeds because T2 just nulled the element
==> We now deliver the message at index 10 TWICE (or more)!
SOLUTION:
- Order the reads and writes of T1 and T2 such that this outcome is impossible:
- remove():
#1 write HD
#2 null element
#3 write LOW
- add(seqno) [with nullify=true]:
#1 read HD (return if seqno <= HD)
#2 read element (return if element != null)
#3 read HD (again!) (return if seqno <= HD)
#4 CAS(null, element): if true: pass message up, else: discard
With the above solution, there is no sequence of add() and remove() that ends up with an
incorrect delivery!
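
The step ordering above can be sketched in Java roughly as follows. This is a minimal
illustration, not the JGroups code: the class name WindowSketch, the String payload, the
fixed capacity and the single-remover assumption are all hypothetical. It only shows the
order of reads and writes for the two-thread scenario and is not a proof of correctness.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch of the add()/remove() ordering; not the JGroups implementation.
// Assumes one remover thread and seqnos that never wrap past the capacity.
class WindowSketch {
    private final AtomicReferenceArray<String> buf;
    private final AtomicLong hd = new AtomicLong(0);   // highest delivered seqno
    private final AtomicLong low = new AtomicLong(0);  // lowest seqno, advanced on removal

    WindowSketch(int capacity) {
        buf = new AtomicReferenceArray<>(capacity);
    }

    private int index(long seqno) {
        return (int) (seqno % buf.length());
    }

    // Returns true if the message was stored and should be passed up, false if discarded.
    boolean add(long seqno, String msg) {
        if (seqno <= hd.get()) return false;       // #1 read HD
        int i = index(seqno);
        if (buf.get(i) != null) return false;      // #2 read element
        if (seqno <= hd.get()) return false;       // #3 read HD again
        return buf.compareAndSet(i, null, msg);    // #4 CAS(null, element)
    }

    // Removes the element at HD+1, or returns null if it has not been received yet.
    String remove() {
        long next = hd.get() + 1;
        int i = index(next);
        String msg = buf.get(i);
        if (msg == null) return null;
        hd.set(next);       // #1 write HD
        buf.set(i, null);   // #2 null element
        low.set(next);      // #3 write LOW
        return msg;
    }

    long highestDelivered() { return hd.get(); }

    long low() { return low.get(); }
}
```

Because remove() writes HD before nulling the element, an add() for a delivered seqno
that finds the slot empty is expected to be rejected by one of the two HD checks.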
was (Author: belaban):
Here's another problem: we want to advance 'low' when removing a message
(and nulling it). This happens at the receivers when discard_delivered_msgs is true. But
there is a problem with the following scenario (threads T1 and T2):
- HD=9
- T1 calls add(10). 10 already exists
- T2 calls remove() with nullify=true
- T1 reads HD, value is 9, continues (would terminate if HD was 10)
- T2 checks that element at index HD+1 is not null
- T2 gets element at index HD+1
- T2 increments HD to 10 and nulls the element at index 10
- T1 does a CAS(null, 10) and succeeds because T2 just nulled the element
==> We now deliver the message at index 10 TWICE (or more)!
SOLUTION:
- Interleave the reads and writes of T1 and T2 such that this outcome is impossible:
- remove():
#1 write HD
#2 null element
- add(seqno) [with nullify=true]:
#1 read HD (return if seqno <= HD)
#2 read element (return if element != null)
#3 read HD (again!) (return if seqno <= HD)
#4 CAS(null, element): if true: pass message up, else: discard
With the above solution, there is no sequence of add() and remove() that ends up with an
incorrect delivery!
NAKACK2: merge NakReceiverWindow and Retransmitter
--------------------------------------------------
Key: JGRP-1396
URL: https://issues.jboss.org/browse/JGRP-1396
Project: JGroups
Issue Type: Enhancement
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 3.1
NakReceiverWindow and Retransmitter each use their own data structure: NRW keeps a list of
messages received, and Retransmitter keeps the seqnos to be retransmitted. This is
redundant and costly memory-wise.
I suggest we merge the two classes, or at least let them share the data structure
which keeps track of received messages.
Suggestion II: create a ring buffer with a (changeable) capacity that keeps track of
received messages and messages to be retransmitted.
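
As an illustration of Suggestion II, here is a rough single-threaded sketch of such a
ring buffer. The class RingSketch and its methods are hypothetical names chosen for this
example, not JGroups API, and the String payload stands in for a real message. The idea
shown is that one array serves both purposes: filled slots are received messages, and
empty slots between 'low' and the highest received seqno are the ones to retransmit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a shared ring buffer; not thread-safe, not the JGroups code.
class RingSketch {
    private String[] buf;
    private long low;   // all seqnos <= low have been removed
    private long high;  // highest seqno received so far

    RingSketch(int capacity) {
        buf = new String[capacity];
    }

    private int index(long seqno) {
        return (int) (seqno % buf.length);
    }

    // Stores msg; returns false for duplicates, removed seqnos, or seqnos that do not fit.
    boolean add(long seqno, String msg) {
        if (seqno <= low || seqno - low > buf.length) return false;
        int i = index(seqno);
        if (buf[i] != null) return false;
        buf[i] = msg;
        if (seqno > high) high = seqno;
        return true;
    }

    // Empty slots between low and high are the seqnos that need retransmission.
    List<Long> missing() {
        List<Long> gaps = new ArrayList<>();
        for (long s = low + 1; s < high; s++) {
            if (buf[index(s)] == null) gaps.add(s);
        }
        return gaps;
    }

    // Removes the next in-order message, or returns null if it is still missing.
    String remove() {
        int i = index(low + 1);
        String msg = buf[i];
        if (msg == null) return null;
        buf[i] = null;
        low++;
        return msg;
    }

    // Changes the capacity by re-indexing the live range (low, high].
    void resize(int newCapacity) {
        if (high - low > newCapacity) {
            throw new IllegalArgumentException("live range does not fit");
        }
        String[] fresh = new String[newCapacity];
        for (long s = low + 1; s <= high; s++) {
            fresh[(int) (s % newCapacity)] = buf[index(s)];
        }
        buf = fresh;
    }
}
```

With this layout no separate retransmission list is stored; the gaps are recomputed from
the buffer on demand, which trades a scan for the memory saved.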