[JBoss JIRA] (JGRP-1901) GMS: view installation by consensus
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-1901?page=com.atlassian.jira.plugin.... ]
Bela Ban updated JGRP-1901:
---------------------------
Fix Version/s: 3.6.2
(was: 3.6.1)
> GMS: view installation by consensus
> -----------------------------------
>
> Key: JGRP-1901
> URL: https://issues.jboss.org/browse/JGRP-1901
> Project: JGroups
> Issue Type: Feature Request
> Reporter: Bela Ban
> Assignee: Bela Ban
> Fix For: 3.6.2
>
>
> Investigate whether view installation should optionally be done via 2PC. Example:
> * View is A,B,C,D, splits into A,B and C,D
> * Before A,B is installed in A and B, view A,B,C is installed
> ** (C hasn't been suspected yet. This can happen with {{FD}})
> Infinispan's rebalancing algorithm has problems with this, as it tries to assign state to C, which isn't reachable from the A,B side of the network partition. It would be better if A,B,C,D went directly to A,B and C,D.
> Investigate whether we should add a property to {{GMS}} which defines whether to use 2PC for view installation (default would be {{false}}). The algorithm would work as follows:
> * Send a {{PREPARE_VIEW(view)}} message
> * When all responses have been received -> send a {{COMMIT_VIEW}} message
> * Else
> ** Inject a SUSPECT event for every member with a missing ack (e.g. SUSPECT(C)) OR
> ** Set a timer to go off in N ms -> when it fires, send the {{COMMIT_VIEW}} msg
> ** If another view is installed before the timer goes off (e.g. A,B) -> kill the timer
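
The ack-tracking part of the proposal above could be sketched roughly as follows. This is not JGroups code: the class {{ViewInstallRound}}, its method names, and the use of plain strings for members are all hypothetical, and the sketch only models the first "Else" alternative (suspecting members whose acks are missing), not the timer-then-commit variant.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not part of GMS: tracks PREPARE_VIEW acks for one proposed view.
final class ViewInstallRound {
    enum Outcome { PENDING, COMMIT, SUSPECT_MISSING }

    private final List<String> proposedView;
    private final Set<String> missingAcks;

    ViewInstallRound(List<String> proposedView) {
        this.proposedView = new ArrayList<>(proposedView);
        // Every member of the proposed view owes us an ack.
        this.missingAcks = new LinkedHashSet<>(proposedView);
    }

    // Records an ack; the caller would send COMMIT_VIEW once this returns COMMIT.
    Outcome onAck(String member) {
        missingAcks.remove(member);
        return missingAcks.isEmpty() ? Outcome.COMMIT : Outcome.PENDING;
    }

    // Called when the caller gives up waiting: members without an ack get suspected.
    Outcome onTimeout() {
        return missingAcks.isEmpty() ? Outcome.COMMIT : Outcome.SUSPECT_MISSING;
    }

    // Members for which a SUSPECT event would be injected.
    List<String> missing() {
        return new ArrayList<>(missingAcks);
    }

    // The view that remains if all members with missing acks are excluded.
    List<String> viewWithoutMissing() {
        List<String> view = new ArrayList<>(proposedView);
        view.removeAll(missingAcks);
        return view;
    }
}
```

With the example from the description, a round for A,B,C in which only A and B answer ends in {{SUSPECT_MISSING}} with C as the only missing member, so the coordinator could go to A,B without ever installing A,B,C.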
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
[JBoss JIRA] (JGRP-1897) ENCRYPT might drop messages during key change
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-1897?page=com.atlassian.jira.plugin.... ]
Bela Ban updated JGRP-1897:
---------------------------
Fix Version/s: 3.6.2
(was: 3.6.1)
> ENCRYPT might drop messages during key change
> ---------------------------------------------
>
> Key: JGRP-1897
> URL: https://issues.jboss.org/browse/JGRP-1897
> Project: JGroups
> Issue Type: Bug
> Reporter: Tero Leppikangas
> Assignee: Bela Ban
> Fix For: 3.6.2
>
>
> ENCRYPT might drop some (unicast) messages encrypted with an unknown key if the delivery of a new view is delayed.
> This problem was noticed while doing some stress testing on the fix for JGRP-1893.
> When the view changes, the coordinator multicasts the new view, after which it starts using new symmetric keys. If some node receives a message sent with the new key before the new view is received, the received message will be dropped since it cannot be decrypted.
> We thought of two possible solutions:
> 1. Sender specific queue holding the messages received.
> 2. Starting to queue up messages until new view has been received
> I have implemented the second option, which is quite straightforward, but it could lead to problems when receiving a message with an unknown key that is not related to an upcoming view change.
> I wonder if there is another way to overcome this problem?
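
The second option (queue messages with an unknown key until the new view arrives) might look roughly like the sketch below. It is not the ENCRYPT implementation: {{PendingKeyBuffer}}, the string "key version" tags, and the {{maxPending}} bound are assumptions made for illustration, and decryption itself is left out. The bound is one possible way to limit the damage from messages whose unknown key has nothing to do with a view change.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch, not ENCRYPT code: holds messages whose key is not known yet.
final class PendingKeyBuffer {
    private static final class Msg {
        final String keyVersion;
        final String payload;

        Msg(String keyVersion, String payload) {
            this.keyVersion = keyVersion;
            this.payload = payload;
        }
    }

    private final Set<String> knownKeys = new HashSet<>();
    private final Queue<Msg> pending = new ArrayDeque<>();
    private final int maxPending;

    PendingKeyBuffer(String initialKeyVersion, int maxPending) {
        this.knownKeys.add(initialKeyVersion);
        this.maxPending = maxPending;
    }

    // Returns the payloads that can be delivered right away (empty if queued).
    List<String> receive(String keyVersion, String payload) {
        List<String> deliverable = new ArrayList<>();
        if (knownKeys.contains(keyVersion)) {
            deliverable.add(payload);
            return deliverable;
        }
        // Unknown key: queue instead of dropping, discarding the oldest entry when full.
        if (pending.size() >= maxPending) {
            pending.poll();
        }
        pending.add(new Msg(keyVersion, payload));
        return deliverable;
    }

    // Called once the new view (and its key) has arrived; releases matching messages in order.
    List<String> installKey(String keyVersion) {
        knownKeys.add(keyVersion);
        List<String> released = new ArrayList<>();
        for (Iterator<Msg> it = pending.iterator(); it.hasNext(); ) {
            Msg m = it.next();
            if (m.keyVersion.equals(keyVersion)) {
                released.add(m.payload);
                it.remove();
            }
        }
        return released;
    }
}
```

Messages tagged with a key that never gets installed stay queued until they are pushed out by the bound, which is the trade-off of this option compared with dropping them immediately.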
--
[JBoss JIRA] (JGRP-1902) Simplify failure detection and merge timeout configuration
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-1902?page=com.atlassian.jira.plugin.... ]
Bela Ban updated JGRP-1902:
---------------------------
Fix Version/s: 3.6.2
(was: 3.6.1)
> Simplify failure detection and merge timeout configuration
> ----------------------------------------------------------
>
> Key: JGRP-1902
> URL: https://issues.jboss.org/browse/JGRP-1902
> Project: JGroups
> Issue Type: Enhancement
> Affects Versions: 3.6
> Reporter: Dan Berindei
> Assignee: Bela Ban
> Priority: Minor
> Fix For: 3.6.2, 4.0
>
>
> The FD/FD_ALL/FD_ALL2/FD_SOCK javadoc doesn't give any guidance as to how long it would take to detect a leaving member. The MERGE2/MERGE3 javadoc also doesn't say how long it would take to detect that the network has healed.
> For an example of how misleading the current settings can be, I have seen MERGE3 take more than 20s to merge two partitions with min_interval=1000 and max_interval=5000. FD also detects a leaver after {{timeout * max_tries}} in the best case, and twice that if 2 consecutive nodes (in the members list) leave at the same time.
> The maximum time it takes to detect a leaver is of particular interest to Infinispan users, because Infinispan is supposed to protect against nodes leaving. But if users don't configure a high enough RPC timeout in Infinispan, the RPC times out before the leaving node is detected.
> Ideally, the user should be able to specify a maximum detection time, and the protocol should adjust the existing settings to meet that (most of the time).
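
The FD arithmetic mentioned above can be written down as a small helper. This is only a sketch of the relation stated in the description ({{timeout * max_tries}}, doubled when two consecutive members leave together); the class {{FdDetectionTime}} and its method names are hypothetical, and deriving {{timeout}} from a target detection time is an assumption about how such a setting might be computed, not existing FD behavior.

```java
// Hypothetical helper, not part of JGroups: relates FD settings to detection time.
final class FdDetectionTime {
    private FdDetectionTime() {
    }

    // Best case from the description: timeout * max_tries.
    static long bestCaseMillis(long timeoutMs, int maxTries) {
        return timeoutMs * maxTries;
    }

    // Two consecutive members (in the members list) leaving at once: twice the best case.
    static long twoConsecutiveLeaversMillis(long timeoutMs, int maxTries) {
        return 2 * bestCaseMillis(timeoutMs, maxTries);
    }

    // Assumed inverse: the largest timeout that keeps the two-leaver case within the target.
    static long timeoutForMaxDetectionMillis(long maxDetectionMs, int maxTries) {
        return maxDetectionMs / (2L * maxTries);
    }
}
```

For illustration, with {{timeout=3000}} and {{max_tries=3}} the best case is 9000 ms and the two-leaver case is 18000 ms; going the other way, a target of 18000 ms with {{max_tries=3}} gives a timeout of 3000 ms.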
--