[https://issues.jboss.org/browse/JGRP-1644?page=com.atlassian.jira.plugin....]
Bela Ban commented on JGRP-1644:
--------------------------------
I've started looking into this issue now.
So what exactly are you doing? Can you come up with a small program that reproduces the issue?
Regarding concurrent calls to receive(): JGroups guarantees that regular (not OOB!) messages
are delivered FIFO per sender. So messages A1 and A2 from A will be delivered A1 -> A2,
but if there's a message from B, it can be delivered concurrently with A1 and A2.
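To make the per-sender guarantee concrete, here is a minimal sketch in plain Java (no JGroups dependency) of a checker an application could call from receive(). Because messages from different senders may be delivered concurrently, the state is kept per sender in a ConcurrentHashMap. The sketch assumes each message carries a sender id and a per-sender counter; the class names `PerSenderFifoChecker` and `FifoDemo` are hypothetical.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical receiver-side helper (not a JGroups class): checks that the
// counters from each sender arrive gap-free and in increasing order.
class PerSenderFifoChecker {
    // One counter per sender; messages from different senders may be checked concurrently.
    private final Map<String, AtomicLong> lastSeen = new ConcurrentHashMap<>();

    /** Returns true iff seq directly follows the last accepted seq from this sender. */
    boolean accept(String sender, long seq) {
        // The first message seen from a sender establishes the baseline.
        AtomicLong last = lastSeen.computeIfAbsent(sender, s -> new AtomicLong(seq - 1));
        // A rejected message leaves the stored value unchanged.
        return last.compareAndSet(seq - 1, seq);
    }
}

class FifoDemo {
    public static void main(String[] args) throws InterruptedException {
        PerSenderFifoChecker checker = new PerSenderFifoChecker();
        AtomicBoolean inOrder = new AtomicBoolean(true);
        String[] senders = {"A", "B"};
        Thread[] threads = new Thread[senders.length];
        // One thread per sender, running in parallel: interleaving across senders
        // is allowed, ordering within a sender is what the checker verifies.
        for (int i = 0; i < threads.length; i++) {
            String sender = senders[i];
            threads[i] = new Thread(() -> {
                for (long seq = 1; seq <= 1000; seq++) {
                    if (!checker.accept(sender, seq)) inOrder.set(false);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println("per-sender FIFO held: " + inOrder.get());
    }
}
{code}

In the demo, A's and B's messages interleave freely, while each sender's counters must still be accepted as 1, 2, 3, and so on.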
You realize that when sending from multiple threads, the *application ordering* of messages can
become incorrect, but you said that you only send from a single thread, so that should be fine.
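For comparison, if several application threads did send, assigning the counter and sending would have to happen as one step; otherwise one thread could take #1201, another #1202, and the second could call send() first. A minimal sketch, assuming a hypothetical `Transport` interface as a stand-in for `channel.send(...)` (it is not a JGroups API); `SequencedSender` and `SequencedSenderDemo` are likewise hypothetical names:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for channel.send(...); not a JGroups interface.
interface Transport {
    void send(long seq, byte[] payload);
}

// Assigns the sequence number and sends under one lock, so the order in
// which numbers are assigned is also the order in which send() is called.
class SequencedSender {
    private final Transport transport;
    private long nextSeq = 1;

    SequencedSender(Transport transport) { this.transport = transport; }

    synchronized long send(byte[] payload) {
        long seq = nextSeq++;
        transport.send(seq, payload); // still inside the lock
        return seq;
    }
}

class SequencedSenderDemo {
    public static void main(String[] args) throws InterruptedException {
        List<Long> sent = Collections.synchronizedList(new ArrayList<>());
        SequencedSender sender = new SequencedSender((seq, payload) -> sent.add(seq));
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 250; j++) sender.send(new byte[0]);
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        // The transport saw 1, 2, 3, ... even though four threads were sending.
        for (int i = 0; i < sent.size(); i++) {
            if (sent.get(i) != i + 1) throw new IllegalStateException("out of order at " + i);
        }
        System.out.println("sent " + sent.size() + " messages in order");
    }
}
{code}

Holding the lock across the send serializes all senders, which costs throughput but yields the same ordering as sending from a single thread.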
NAKACK2 violates FIFO property
------------------------------
Key: JGRP-1644
URL: https://issues.jboss.org/browse/JGRP-1644
Project: JGroups
Issue Type: Bug
Affects Versions: 3.3.1
Environment: Ubuntu 12.04 LTS, kernel 3.2.0-24-generic #39-Ubuntu SMP Mon May 21 16:52:17 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux, Java 1.7.0_21
Reporter: Vadim Tsesko
Assignee: Bela Ban
Fix For: 3.4
Attachments: TCP-NAKACK2.png, UDP-NAKACK2-NAKACK.png
In the [documentation|http://www.jgroups.org/manual/html/protlist.html#ReliableMe...] it is stated that:
{quote}
NAKACK provides reliable delivery and FIFO (= First In First Out) properties for messages
sent to all nodes in a cluster.
{quote}
and
{quote}
NAKACK2 was introduced in 3.1 and is a successor to NAKACK (at some point it will replace
NAKACK). It has the same properties as NAKACK, but its implementation is faster and uses
less memory, plus it creates fewer tasks in the timer.
{quote}
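The FIFO property quoted above can be illustrated with a simplified receive window: messages that arrive ahead of a gap are buffered until the gap is filled, and the missing sequence numbers are what a NAK-based protocol would ask the sender to retransmit. This is a minimal sketch of the general technique under simplifying assumptions (one sender, no retransmission timer, no stability-based garbage collection), not the actual implementation of NAKACK or NAKACK2; the `ReceiveWindow` class and its methods are hypothetical.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified per-sender receive window for NAK-based FIFO delivery.
class ReceiveWindow {
    private final TreeMap<Long, String> pending = new TreeMap<>();
    private long nextToDeliver;

    ReceiveWindow(long firstSeq) { this.nextToDeliver = firstSeq; }

    /** Adds a message and returns every message that can now be delivered, in order. */
    List<String> add(long seq, String payload) {
        List<String> deliverable = new ArrayList<>();
        if (seq < nextToDeliver) return deliverable; // already delivered: drop duplicate
        pending.putIfAbsent(seq, payload);            // ignore duplicate retransmissions
        // Deliver the contiguous run starting at nextToDeliver; stop at the first gap.
        while (pending.containsKey(nextToDeliver)) {
            deliverable.add(pending.remove(nextToDeliver));
            nextToDeliver++;
        }
        return deliverable;
    }

    /** Missing seqnos below the highest buffered one (what a NAK would request). */
    List<Long> missing() {
        List<Long> gaps = new ArrayList<>();
        if (pending.isEmpty()) return gaps;
        for (long s = nextToDeliver; s < pending.lastKey(); s++) {
            if (!pending.containsKey(s)) gaps.add(s);
        }
        return gaps;
    }
}
{code}

Because delivery stops at the first gap, a message numbered #1202 is held back until #1201 has arrived and been delivered; a trace in which #1202 reaches the application before #1201 is exactly what this scheme rules out.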
I have observed that sometimes multicast messages are received out of order.
We use the following protocol stack configuration:
{code:xml}
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups
                            http://www.jgroups.org/schema/JGroups-3.3.xsd">
    <UDP bind_addr="match-interface:$interface"
         bind_interface="$interface"
         bind_port="$unicastPort"
         ip_ttl="128"
         mcast_addr="$multicastGroup"
         mcast_port="$multicastPort"
         singleton_name="udp-transport"/>
    <PING return_entire_cache="true"
          break_on_coord_rsp="false"/>
    <MERGE3/>
    <FD_SOCK/>
    <FD_ALL/>
    <VERIFY_SUSPECT/>
    <BARRIER/>
    <pbcast.NAKACK print_stability_history_on_failed_xmit="true"/>
    <pbcast.STABLE/>
    <pbcast.GMS/>
    <MFC max_credits="8M"/>
    <FRAG2/>
    <RSVP/>
</config>
{code}
As you can see, we mostly use the defaults.
The messages are being sent from a single thread using the following code:
{code:java}
// A null destination sends the message to all cluster members
channel.send(new Message(null, msg));
{code}
Each message is between 300 KB and 4 MB in size, and the message rate is 1-5 messages per
second.
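For a sense of scale, the stated sizes and rates translate into the bandwidth and fragment counts computed here. This is back-of-the-envelope arithmetic that assumes 1 KB = 1024 bytes and a hypothetical fragment size of 60,000 bytes (`ASSUMED_FRAG_SIZE`); substitute the frag_size actually configured on FRAG2. `WorkloadEstimate` is a hypothetical name.

{code:java}
// Back-of-the-envelope numbers for the reported workload.
// Assumptions (not from the report): 1 KB = 1024 bytes, fragment size 60,000 bytes.
class WorkloadEstimate {
    static final long KB = 1024, MB = 1024 * KB;
    static final long ASSUMED_FRAG_SIZE = 60_000;

    /** Number of fragments needed for a message of the given size (ceiling division). */
    static long fragments(long messageBytes, long fragSize) {
        return (messageBytes + fragSize - 1) / fragSize;
    }

    public static void main(String[] args) {
        long minMsg = 300 * KB, maxMsg = 4 * MB;
        long minRate = 1, maxRate = 5; // messages per second
        // prints "bandwidth: 307200 to 20971520 bytes/s"
        System.out.println("bandwidth: " + minMsg * minRate + " to " + maxMsg * maxRate + " bytes/s");
        // prints "fragments per message: 6 to 70"
        System.out.println("fragments per message: " + fragments(minMsg, ASSUMED_FRAG_SIZE)
                + " to " + fragments(maxMsg, ASSUMED_FRAG_SIZE));
    }
}
{code}

At the upper end that is about 20 MB/s, with roughly 70 fragments per message under the assumed fragment size.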
Each message we send carries a sequential counter. Sometimes the messages are
received out of order, for instance:
{code}
#1198
#1199
#1200
#1202
#1201
#1203
#1204
{code}
If we replace {{NAKACK2}} with {{NAKACK}}, the problem disappears: everything works as
expected (FIFO).
If we replace the JGroups-based transport with a ZeroMQ-based transport (running over
EPGM, which we have been using for a year), everything works as expected (FIFO). I mention
this just to let you know that there are no bugs in our message numbering logic.