[ http://jira.jboss.com/jira/browse/JGRP-659?page=comments#action_12404533 ]
Troy Schulz commented on JGRP-659:
----------------------------------
The following is a description I posted on the newsgroup that may be helpful
in resolving the issue:
I am currently diagnosing a problem with our application where starting
multiple members concurrently fails to properly connect all of the
members into a single group. Since our stress test framework is fairly
involved and it is difficult to isolate issues there, I started by
testing the wrapper class used by our application; once I had replicated
the issue, I moved to an independent Eclipse project containing none of
our application code, so I could rule out our application logic. What I
ended up with is a test that starts a single member, waits a few
seconds, then spins up X additional members. The test then monitors each
of them until they all see each other and have gathered state, or until
10*X seconds pass. So, if there are 10 connections, the group has 100
seconds to become stable, and the members need to see 11 members (10
plus the coordinator). The test runs with X = 2, 5, and 10 concurrent
connections: 2 has a 0% failure rate, 5 fails about 40% of the time, and
10 fails about 60% of the time.
More details are below, and I can provide the project that I used to
replicate the issue, since it has none of our application code in it.
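For illustration, here is a minimal sketch of the shape of that test
against the JGroups 2.x JChannel API (class and cluster names are
invented, and the state-transfer callbacks on the coordinator are
omitted; the real test project is more involved):

import org.jgroups.JChannel;

public class ConcurrentJoinTest {
    public static void main(String[] args) throws Exception {
        final int X = 10;                          // number of concurrent joiners
        JChannel coord = new JChannel("udp.xml");  // coordinator comes up first
        coord.connect("test-cluster");
        Thread.sleep(3000);                        // "waits a few seconds"

        final JChannel[] members = new JChannel[X];
        for (int i = 0; i < X; i++) {
            final int idx = i;
            new Thread(new Runnable() {
                public void run() {
                    try {
                        members[idx] = new JChannel("udp.xml");
                        members[idx].connect("test-cluster");
                        members[idx].getState(null, 10000); // gather state
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }).start();
        }

        // poll until every member sees X+1 members, or 10*X seconds pass
        long deadline = System.currentTimeMillis() + 10000L * X;
        boolean stable = false;
        while (!stable && System.currentTimeMillis() < deadline) {
            Thread.sleep(500);
            stable = true;
            for (JChannel ch : members)
                if (ch == null || ch.getView() == null || ch.getView().size() != X + 1)
                    stable = false;
        }
        System.out.println(stable ? "group stable" : "FAILED to stabilize");
    }
}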
The cause of the issue is that, in UNICAST, the sender window and
receiver window get out of sync. Sometimes it is from the member to the
coordinator, sometimes the other way around. When UNICAST installs a
new view, it resets the entry holding the sender and receiver windows.
The sequence is something like this:
MemberA gets new view and resets connections
MemberA sends message(seq=1) to MemberB
MemberB expects seq=4 (for instance), so it drops message(seq=1)
MemberB gets new view and resets connections
MemberB requests state from MemberA
MemberA sends state_message(seq=2) to MemberB
MemberB queues state_message(seq=2) and proceeds to wait for message(seq=1)
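To make that sequence concrete, here is a small self-contained
simulation of the window behavior (the classes below are simplified
stand-ins written for illustration, not the actual UNICAST code):

import java.util.SortedMap;
import java.util.TreeMap;

public class WindowDesync {
    static class SenderWindow {
        long next = 1;
        long nextSeqno() { return next++; }
    }

    static class ReceiverWindow {
        long expected = 1;
        final SortedMap<Long, String> queued = new TreeMap<Long, String>();
        void receive(long seq, String msg) {
            if (seq < expected) {                  // looks stale: silently dropped
                System.out.println("dropped " + msg + " (seq=" + seq
                        + ", expected=" + expected + ")");
                return;
            }
            queued.put(seq, msg);
            while (queued.containsKey(expected))   // deliver strictly in order
                System.out.println("delivered " + queued.remove(expected++));
        }
    }

    public static void main(String[] args) {
        SenderWindow a = new SenderWindow();       // MemberA, reset by the new view
        ReceiverWindow b = new ReceiverWindow();
        b.expected = 4;                            // MemberB has not reset yet

        b.receive(a.nextSeqno(), "message");       // seq=1 dropped, B expects 4
        b.expected = 1;                            // MemberB now installs the view
        b.receive(a.nextSeqno(), "state_message"); // seq=2 queued, waiting for seq=1
        System.out.println("B stuck waiting for seq=" + b.expected
                + ", queued=" + b.queued.keySet());
    }
}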
When testing against our application code, MemberA is usually (maybe
always, I am not sure) the coordinator of the largest subgroup and
MemberB is usually (maybe always as well) the new coordinator; this
happens when processing a MergeView. That is most likely just because
these are the only point-to-point messages being generated by the
application logic.
With the test in the self-contained project, the failures are more
spread out, and not always in response to a getState request. My most
recent udp.xml file has FLUSH enabled, and it seems to help fill the
receive window with unprocessed messages.
Is there something in the protocol stack we can add or remove to
alleviate this problem? Is there perhaps something we have done to
inflict this on ourselves with improper message handling or similar?
Any insight would be appreciated.
Tas
PS.
Environment:
I have tried JGroups versions 2.5.0, 2.5.2, and 2.6.2, using the
default udp.xml from the respective versions. All of these versions
exhibit the same behavior. With 2.6.2, I then added FLUSH, changed GMS
max_bundling_time to 250, and changed PING's num_initial_members to 2
(sketched below). Again, none of these changed the behavior.
Using either JDK 1.5 or 1.6.
NOTE: Not all permutations of the above were tested. However, since the
failures were similar, that is probably not an issue.
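For reference, the tweaks to the 2.6.2 udp.xml looked roughly like the
fragment below (only the touched protocols are shown; the surrounding
attributes and protocols stay as shipped, so treat this as a sketch
rather than a complete configuration):

<config>
    <!-- ... UDP and the other protocols as in the default udp.xml ... -->
    <PING timeout="2000" num_initial_members="2"/>
    <!-- ... -->
    <pbcast.GMS print_local_addr="true" join_timeout="3000"
                max_bundling_time="250"/>
    <pbcast.STATE_TRANSFER/>
    <pbcast.FLUSH/>
</config>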
Logic:
I can provide a project if you wish to run the test yourself, but the
gist of the logic is that each member caches the membership when it
receives a view change, and if it is not the coordinator it requests
state from the coordinator. With merge views, there is additional logic
for the member to request state from the coordinator of the largest
subgroup, not necessarily from the new view's coordinator. This is how
our original implementation is programmed, so I kept the behavior for
the test (sketched below).
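A minimal sketch of that view-handling logic against the JGroups 2.x
API (helper names are invented, and a real implementation would request
the state off the callback thread rather than block in viewAccepted()):

import org.jgroups.Address;
import org.jgroups.JChannel;
import org.jgroups.MergeView;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class StateOnViewChange extends ReceiverAdapter {
    private final JChannel channel;
    private volatile View lastView;              // cached membership

    public StateOnViewChange(JChannel channel) { this.channel = channel; }

    public void viewAccepted(View view) {
        lastView = view;
        Address stateProvider;
        if (view instanceof MergeView) {
            // pick the coordinator of the largest subgroup, not
            // necessarily the new view's coordinator
            View largest = null;
            for (Object o : ((MergeView) view).getSubgroups()) {
                View sub = (View) o;
                if (largest == null || sub.size() > largest.size())
                    largest = sub;
            }
            stateProvider = (Address) largest.getMembers().get(0);
        } else {
            // first member of the view is the coordinator
            stateProvider = (Address) view.getMembers().get(0);
        }
        if (!stateProvider.equals(channel.getLocalAddress())) {
            try {
                channel.getState(stateProvider, 10000); // pull state point-to-point
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}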
Merge and UNICAST sequencing problem
------------------------------------
Key: JGRP-659
URL: http://jira.jboss.com/jira/browse/JGRP-659
Project: JGroups
Issue Type: Bug
Affects Versions: 2.6, 2.4, 2.5
Reporter: Vladimir Blagojevic
Assigned To: Bela Ban
Fix For: 2.7
The problem is related to the trashing of the connection table in UNICAST during a
merge. Consider the following scenario:
There are 4 nodes in a cluster: A, B, C, and D. After a network split we have two
islands, A,B and C,D. When the network heals, a MergeView eventually gets installed
in both islands. MergeView installation causes trashing of the UNICAST connection
table [1]. However, if the MergeView gets installed in island A,B at time T and in
island C,D at time T+N msec, and a node from island A,B sends a unicast message in
this N msec time window, then we run into problems with unicast sequencing at C and
D. Why? Because the next message coming from island A,B into C,D will arrive with a
sequence number > 1, while UNICAST sequencing in C,D, after the connection trashing
(from the merge), expects a starting sequence number of 1. This causes UNICAST in C
and/or D to wait forever for the missing messages. The final outcome is thus that no
more unicast messages coming from A and/or B will ever be delivered at C and/or D!
[1] http://jira.jboss.com/jira/browse/JGRP-348
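A compact sketch of that race for illustration (invented names, not the
actual JGroups code): A's sender window toward C resets at time T, while
C's receiver window for A resets only at T+N, so any unicast A sends
inside that window is dropped and everything after it queues forever:

public class MergeRace {
    static long nextSeqFromA = 42; // A -> C seqno before the merge
    static long expectedAtC  = 42; // C's expected seqno from A before the merge

    public static void main(String[] args) {
        nextSeqFromA = 1;          // time T: A,B install the MergeView, table trashed
        long seq = nextSeqFromA++; // A sends a unicast inside the N msec window
        if (seq < expectedAtC)
            System.out.println("C drops seq=" + seq
                    + " as stale (expects " + expectedAtC + ")");
        expectedAtC = 1;           // time T+N: C,D install the MergeView, table trashed
        seq = nextSeqFromA++;      // A's next unicast carries seq=2
        System.out.println("C queues seq=" + seq
                + " and waits forever for seq=1");
    }
}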