[jboss-jira] [JBoss JIRA] (JGRP-2297) Coordinator with ASYM_ENCRYPT in the stack does not leave gracefully
Radoslav Husar (Jira)
issues at jboss.org
Mon Oct 8 13:02:00 EDT 2018
[ https://issues.jboss.org/browse/JGRP-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644746#comment-13644746 ]
Radoslav Husar edited comment on JGRP-2297 at 10/8/18 1:01 PM:
---------------------------------------------------------------
Moreover, a non-coordinator never seems to leave the cluster cleanly either: its departure is always accompanied by the following ERROR log line (in this case, node3 leaves the cluster with view [node3, node1]):
{code}18:57:38,468 ERROR [org.jgroups.protocols.ASYM_ENCRYPT] (thread-14,ejb,node1) key requester node3 is not in current view [node1|6] (1) [node1]; ignoring key request{code}
I think the fix for both is the same, so this probably does not need a separate Jira.
> Coordinator with ASYM_ENCRYPT in the stack does not leave gracefully
> --------------------------------------------------------------------
>
> Key: JGRP-2297
> URL: https://issues.jboss.org/browse/JGRP-2297
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.14
> Reporter: Radoslav Husar
> Assignee: Bela Ban
> Priority: Blocker
> Fix For: 4.0.16
>
>
> The {{ASYM_ENCRYPT_LeaveTest}} is designed to test graceful leaving of coordinator(s) with ASYM_ENCRYPT in the stack. However, the test currently passes only because MERGE3 is present in the stack: although the test is meant to verify that coordinator(s) leave gracefully, the cluster ends up with inconsistent views, which are only later resolved by MERGE3.
> Here is a run of the test, modified so that only a *single* coordinator leaves:
> https://gist.github.com/rhusar/89172882fae60a1f29327c33f2d124db
> The problem seems to be in the coordination of the key exchange. In this run, roughly:
> 1. node 1 is leaving
> 2. node 2 becomes coordinator and key server
> {noformat}
> 10:55:18.286 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.pbcast.GMS - 2: installing view [2|10] (9) [2, 3, 4, 5, 6, 7, 8, 9, 10]
> ...
> 10:55:18.299 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 2: I'm the new key server
> 10:55:18.300 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 2: created new secret key (version: AB1E6F44DE947D792A7D05D2E957AC85)
> ...
> 10:55:18.300 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 2: created new secret key (version: AB1E6F44DE947D792A7D05D2E957AC85)
> {noformat}
> 3. node 9 receives {{FETCH_SECRET_KEY}} but appears to receive a stale key; it looks like it still contacts the leaving coordinator, node 1?
> {noformat}
> 10:55:18.319 [SSL_KEY_EXCHANGE-runner-12,ASYM_ENCRYPT_LeaveTest,1] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 1: accepted SSL connection from /127.0.0.1:51812; protocol: TLSv1, cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
> ...
> 10:55:18.319 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 9: created SSL connection to 2 (/127.0.0.1:2157); protocol: TLSv1, cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
> 10:55:18.321 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 9: sending up secret key (version: AF7916A9394F49B085D4F35C4F5A0A3E)
> 10:55:18.321 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 9: ignoring secret key received from key exchange protocol (version: AF7916A9394F49B085D4F35C4F5A0A3E), as it has already been installed
> {noformat}
> 4. the new coordinator fails to collect all ACKs (since messages cannot be deciphered with the stale key?)
> {noformat}
> 10:55:20.307 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] WARN org.jgroups.protocols.pbcast.GMS - 2: failed to collect all ACKs (expected=8) for view [2|10] after 2000ms, missing 1 ACKs from (1) 9
> {noformat}
> 5. node 9 eventually obtains the key, but it still has a stale view in which node 1 is coordinator(?), and it fails to contact node 1
> {noformat}
> 10:55:20.307 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 9: asking key exchange protocol to get secret key from 2
> 10:55:20.322 [SSL_KEY_EXCHANGE-runner-26,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 2: accepted SSL connection from /127.0.0.1:51829; protocol: TLSv1, cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
> 10:55:20.322 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 9: created SSL connection to 2 (/127.0.0.1:2158); protocol: TLSv1, cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
> 10:55:20.322 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 9: sending up secret key (version: AB1E6F44DE947D792A7D05D2E957AC85)
> 10:55:20.322 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 9: installing secret key received from key exchange protocol (version: AB1E6F44DE947D792A7D05D2E957AC85)
> 10:55:23.341 [TQ-Bundler-10,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.TCP - JGRP000034: 9: failure sending message to 1: java.net.ConnectException: Connection refused (Connection refused)
> {noformat}
> 6. cluster is later healed with MERGE3
> {noformat}
> 10:55:27.103 [jgroups-27,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.pbcast.GMS - 2: I will be the merge leader. Starting the merge task. Views: {2=[2|10] (9) [2, 3, 4, 5, 6, 7, 8, 9, 10], 9=[1|9] (10) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
> {noformat}
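Steps 2 through 5 can be cross-checked mechanically by comparing the key version each node receives with the version the key server most recently created. The following is only a minimal sketch. It assumes the DEBUG message wording shown in the excerpts above, a time-ordered log, and a single active key server. `StaleKeyScan` and `staleDeliveries` are hypothetical names and not part of JGroups.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not JGroups API): scans DEBUG log lines for stale secret keys. */
final class StaleKeyScan {
    // Matches "... - <node>: created new secret key (version: <hex>)"
    private static final Pattern CREATED =
        Pattern.compile("- (\\S+): created new secret key \\(version: ([0-9A-F]+)\\)");
    // Matches "... - <node>: sending up secret key (version: <hex>)"
    private static final Pattern DELIVERED =
        Pattern.compile("- (\\S+): sending up secret key \\(version: ([0-9A-F]+)\\)");

    /**
     * Returns "node=version" for every delivered key whose version differs
     * from the most recently created key seen earlier in the log.
     */
    static List<String> staleDeliveries(List<String> lines) {
        String current = null;                 // version of the newest created key so far
        List<String> stale = new ArrayList<>();
        for (String line : lines) {
            Matcher created = CREATED.matcher(line);
            if (created.find()) {
                current = created.group(2);
                continue;
            }
            Matcher delivered = DELIVERED.matcher(line);
            if (delivered.find() && current != null && !current.equals(delivered.group(2))) {
                stale.add(delivered.group(1) + "=" + delivered.group(2));
            }
        }
        return stale;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java StaleKeyScan <logfile>");
            return;
        }
        staleDeliveries(Files.readAllLines(Paths.get(args[0]))).forEach(System.out::println);
    }
}
```

Applied to the lines quoted in steps 2 to 5, this flags node 9's first delivery (version AF7916A9394F49B085D4F35C4F5A0A3E) and not the second one (version AB1E6F44DE947D792A7D05D2E957AC85).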
> Another run with MERGE3 omitted from the stack is here:
> https://gist.github.com/rhusar/b51aeee03485a607041f9669bbc6e707
> Further investigation is ongoing, but this might be related to the graceful leaving of the coordinator (JGRP-2293) exacerbating the key exchange problem in ASYM_ENCRYPT.
> Scaling down is a typical cloud workflow, and {{ASYM_ENCRYPT}} is the recommended setup for encryption, which makes this problem critical.
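The divergence visible in the merge log (node 2 at [2|10], node 9 still at [1|9]) suggests a condition a test could assert directly instead of relying on MERGE3 to heal the cluster. The following is only a sketch under stated assumptions. Views are modelled as plain lists of member names, and `ViewConvergence` and `laggards` are hypothetical names rather than JGroups API. In a real test the lists would presumably be built from each channel's `getView().getMembers()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical test helper (not JGroups API): each member's view is a list of member names. */
final class ViewConvergence {
    /**
     * Returns, in sorted key order, the members whose view still contains the
     * leaver or differs from the view held by the expected new coordinator.
     */
    static List<String> laggards(Map<String, List<String>> views, String coordinator, String leaver) {
        List<String> expected = views.get(coordinator);   // null if the coordinator is unknown
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : new TreeMap<>(views).entrySet()) {
            List<String> view = e.getValue();
            if (view.contains(leaver) || !view.equals(expected)) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```

With the two views from the merge log and node 1 as the leaver, `laggards` returns only node 9. A graceful leave would be expected to yield an empty list with MERGE3 removed from the stack.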
--
This message was sent by Atlassian Jira
(v7.12.1#712002)