[jboss-jira] [JBoss JIRA] (JGRP-2297) Coordinator with ASYM_ENCRYPT in the stack does not leave gracefully

Radoslav Husar (JIRA) issues at jboss.org
Tue Sep 25 05:25:00 EDT 2018


     [ https://issues.jboss.org/browse/JGRP-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Husar updated JGRP-2297:
---------------------------------
    Description: 
The {{ASYM_ENCRYPT_LeaveTest}} is designed to test graceful leaving coordinator(s) with ASYM_ENCRYPT in the stack. However, the test currently passes due to presence of MERGE3 in the stack. While the intention of the test seems to be testing graceful leaving of coordinator(s), the cluster ends up with inconsistent views later resolved by MERGE3.

Here is a run of the test with a single coordinator leaving: 
https://gist.github.com/rhusar/89172882fae60a1f29327c33f2d124db

The problem seems to be with coordinating of key exchange. In this run, roughly:

1. node 1 is leaving
2. node 2 becomes coordinator and key server

{noformat}
10:55:18.286 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.pbcast.GMS - 2: installing view [2|10] (9) [2, 3, 4, 5, 6, 7, 8, 9, 10]
...
10:55:18.299 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 2: I'm the new key server
10:55:18.300 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 2: created new secret key (version: AB1E6F44DE947D792A7D05D2E957AC85)
...
10:55:18.300 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 2: created new secret key (version: AB1E6F44DE947D792A7D05D2E957AC85)
{noformat}

3. node 9 receives {{FETCH_SECRET_KEY}} however receives stale key? looks like it still contacts the leaving coordinator node 1?

{noformat}
10:55:18.319 [SSL_KEY_EXCHANGE-runner-12,ASYM_ENCRYPT_LeaveTest,1] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 1: accepted SSL connection from /127.0.0.1:51812; protocol: TLSv1, cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
...
10:55:18.319 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 9: created SSL connection to 2 (/127.0.0.1:2157); protocol: TLSv1, cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
10:55:18.321 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 9: sending up secret key (version: AF7916A9394F49B085D4F35C4F5A0A3E)
10:55:18.321 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 9: ignoring secret key received from key exchange protocol (version: AF7916A9394F49B085D4F35C4F5A0A3E), as it has already been installed
{noformat}

4. new coordinator fails to collect all acks (since it cannot decipher stale key?)

{noformat}
10:55:20.307 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] WARN  org.jgroups.protocols.pbcast.GMS - 2: failed to collect all ACKs (expected=8) for view [2|10] after 2000ms, missing 1 ACKs from (1) 9
{noformat}

5. node 9 eventually obtains the key but since it has stale view and still thinks node 1 is coordinator? and fails to contact it 

{noformat}
10:55:20.307 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 9: asking key exchange protocol to get secret key from 2
10:55:20.322 [SSL_KEY_EXCHANGE-runner-26,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 2: accepted SSL connection from /127.0.0.1:51829; protocol: TLSv1, cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
10:55:20.322 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 9: created SSL connection to 2 (/127.0.0.1:2158); protocol: TLSv1, cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
10:55:20.322 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 9: sending up secret key (version: AB1E6F44DE947D792A7D05D2E957AC85)
10:55:20.322 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 9: installing secret key received from key exchange protocol (version: AB1E6F44DE947D792A7D05D2E957AC85)
10:55:23.341 [TQ-Bundler-10,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.TCP - JGRP000034: 9: failure sending message to 1: java.net.ConnectException: Connection refused (Connection refused)
{noformat}

6. cluster is later fixed with MERGE3

{noformat}
10:55:27.103 [jgroups-27,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.pbcast.GMS - 2: I will be the merge leader. Starting the merge task. Views: {2=[2|10] (9) [2, 3, 4, 5, 6, 7, 8, 9, 10], 9=[1|9] (10) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
{noformat}

Another run with MERGE3 omitted from the stack is here:
https://gist.github.com/rhusar/b51aeee03485a607041f9669bbc6e707

Further investigation is ongoing, but this might be related to graceful leaving of coordinator JGRP-2293 exacerbating the problem with key exchange in ASYM_ENCRYPT.

Scaling down is typical cloud workflow, moreover encryption, especially ASYM_ENCRYPT is the recommended setup making this problem critical. 


  was:
The {{ASYM_ENCRYPT_LeaveTest}} is designed to test graceful leaving coordinator(s) with ASYM_ENCRYPT in the stack. However, the test currently passes due to presence of MERGE3 in the stack. While the intention of the test seems to be testing graceful leaving of coordinator(s), the cluster ends up with inconsistent views later resolved by MERGE3.

Here is a run of the test with a single coordinator leaving: 
https://gist.github.com/rhusar/89172882fae60a1f29327c33f2d124db

The problem seems to be with coordinating of key exchange. In this run, roughly:

1. node 1 is leaving
2. node 2 becomes coordinator and key server

{noformat}
10:55:18.286 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.pbcast.GMS - 2: installing view [2|10] (9) [2, 3, 4, 5, 6, 7, 8, 9, 10]
...
10:55:18.299 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 2: I'm the new key server
10:55:18.300 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 2: created new secret key (version: AB1E6F44DE947D792A7D05D2E957AC85)
...
10:55:18.300 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 2: created new secret key (version: AB1E6F44DE947D792A7D05D2E957AC85)
{noformat}

3. node 9 receives {{FETCH_SECRET_KEY}} however receives stale key (?) because it still contacts the leaving coordinator node 1 (?)

{noformat}
10:55:18.319 [SSL_KEY_EXCHANGE-runner-12,ASYM_ENCRYPT_LeaveTest,1] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 1: accepted SSL connection from /127.0.0.1:51812; protocol: TLSv1, cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
...
10:55:18.319 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 9: created SSL connection to 2 (/127.0.0.1:2157); protocol: TLSv1, cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
10:55:18.321 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 9: sending up secret key (version: AF7916A9394F49B085D4F35C4F5A0A3E)
10:55:18.321 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 9: ignoring secret key received from key exchange protocol (version: AF7916A9394F49B085D4F35C4F5A0A3E), as it has already been installed
{noformat}

4. new coordinator failes to collect all acks (since it cannot decipher stale key?)

{noformat}
10:55:20.307 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] WARN  org.jgroups.protocols.pbcast.GMS - 2: failed to collect all ACKs (expected=8) for view [2|10] after 2000ms, missing 1 ACKs from (1) 9
{noformat}

5. node 9 eventually obtains the key but since it has stale view and still thinks node 1 is coordinator and fails to contact it 

{noformat}
10:55:20.307 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 9: asking key exchange protocol to get secret key from 2
10:55:20.322 [SSL_KEY_EXCHANGE-runner-26,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 2: accepted SSL connection from /127.0.0.1:51829; protocol: TLSv1, cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
10:55:20.322 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 9: created SSL connection to 2 (/127.0.0.1:2158); protocol: TLSv1, cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
10:55:20.322 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 9: sending up secret key (version: AB1E6F44DE947D792A7D05D2E957AC85)
10:55:20.322 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 9: installing secret key received from key exchange protocol (version: AB1E6F44DE947D792A7D05D2E957AC85)
10:55:23.341 [TQ-Bundler-10,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.TCP - JGRP000034: 9: failure sending message to 1: java.net.ConnectException: Connection refused (Connection refused)
{noformat}

6. cluster is later fixed with MERGE3

{noformat}
10:55:27.103 [jgroups-27,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.pbcast.GMS - 2: I will be the merge leader. Starting the merge task. Views: {2=[2|10] (9) [2, 3, 4, 5, 6, 7, 8, 9, 10], 9=[1|9] (10) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
{noformat}

Another run with MERGE3 omitted from the stack is here:
https://gist.github.com/rhusar/b51aeee03485a607041f9669bbc6e707

Further investigation is ongoing, but this might be related to graceful leaving of coordinator JGRP-2293 exacerbating the problem with key exchange in ASYM_ENCRYPT.

Scaling down is typical cloud workflow, moreover encryption, especially ASYM_ENCRYPT is the recommended setup making this problem critical. 




> Coordinator with ASYM_ENCRYPT in the stack does not leave gracefully
> --------------------------------------------------------------------
>
>                 Key: JGRP-2297
>                 URL: https://issues.jboss.org/browse/JGRP-2297
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 4.0.14
>            Reporter: Radoslav Husar
>            Assignee: Bela Ban
>            Priority: Blocker
>             Fix For: 4.0.15
>
>
> The {{ASYM_ENCRYPT_LeaveTest}} is designed to test graceful leaving coordinator(s) with ASYM_ENCRYPT in the stack. However, the test currently passes due to presence of MERGE3 in the stack. While the intention of the test seems to be testing graceful leaving of coordinator(s), the cluster ends up with inconsistent views later resolved by MERGE3.
> Here is a run of the test with a single coordinator leaving: 
> https://gist.github.com/rhusar/89172882fae60a1f29327c33f2d124db
> The problem seems to be with coordinating of key exchange. In this run, roughly:
> 1. node 1 is leaving
> 2. node 2 becomes coordinator and key server
> {noformat}
> 10:55:18.286 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.pbcast.GMS - 2: installing view [2|10] (9) [2, 3, 4, 5, 6, 7, 8, 9, 10]
> ...
> 10:55:18.299 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 2: I'm the new key server
> 10:55:18.300 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 2: created new secret key (version: AB1E6F44DE947D792A7D05D2E957AC85)
> ...
> 10:55:18.300 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 2: created new secret key (version: AB1E6F44DE947D792A7D05D2E957AC85)
> {noformat}
> 3. node 9 receives {{FETCH_SECRET_KEY}} however receives stale key? looks like it still contacts the leaving coordinator node 1?
> {noformat}
> 10:55:18.319 [SSL_KEY_EXCHANGE-runner-12,ASYM_ENCRYPT_LeaveTest,1] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 1: accepted SSL connection from /127.0.0.1:51812; protocol: TLSv1, cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
> ...
> 10:55:18.319 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 9: created SSL connection to 2 (/127.0.0.1:2157); protocol: TLSv1, cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
> 10:55:18.321 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 9: sending up secret key (version: AF7916A9394F49B085D4F35C4F5A0A3E)
> 10:55:18.321 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 9: ignoring secret key received from key exchange protocol (version: AF7916A9394F49B085D4F35C4F5A0A3E), as it has already been installed
> {noformat}
> 4. new coordinator fails to collect all acks (since it cannot decipher stale key?)
> {noformat}
> 10:55:20.307 [jgroups-3,ASYM_ENCRYPT_LeaveTest,2] WARN  org.jgroups.protocols.pbcast.GMS - 2: failed to collect all ACKs (expected=8) for view [2|10] after 2000ms, missing 1 ACKs from (1) 9
> {noformat}
> 5. node 9 eventually obtains the key but since it has stale view and still thinks node 1 is coordinator? and fails to contact it 
> {noformat}
> 10:55:20.307 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 9: asking key exchange protocol to get secret key from 2
> 10:55:20.322 [SSL_KEY_EXCHANGE-runner-26,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 2: accepted SSL connection from /127.0.0.1:51829; protocol: TLSv1, cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
> 10:55:20.322 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 9: created SSL connection to 2 (/127.0.0.1:2158); protocol: TLSv1, cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
> 10:55:20.322 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.SSL_KEY_EXCHANGE - 9: sending up secret key (version: AB1E6F44DE947D792A7D05D2E957AC85)
> 10:55:20.322 [jgroups-3,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.ASYM_ENCRYPT - 9: installing secret key received from key exchange protocol (version: AB1E6F44DE947D792A7D05D2E957AC85)
> 10:55:23.341 [TQ-Bundler-10,ASYM_ENCRYPT_LeaveTest,9] DEBUG org.jgroups.protocols.TCP - JGRP000034: 9: failure sending message to 1: java.net.ConnectException: Connection refused (Connection refused)
> {noformat}
> 6. cluster is later fixed with MERGE3
> {noformat}
> 10:55:27.103 [jgroups-27,ASYM_ENCRYPT_LeaveTest,2] DEBUG org.jgroups.protocols.pbcast.GMS - 2: I will be the merge leader. Starting the merge task. Views: {2=[2|10] (9) [2, 3, 4, 5, 6, 7, 8, 9, 10], 9=[1|9] (10) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
> {noformat}
> Another run with MERGE3 omitted from the stack is here:
> https://gist.github.com/rhusar/b51aeee03485a607041f9669bbc6e707
> Further investigation is ongoing, but this might be related to graceful leaving of coordinator JGRP-2293 exacerbating the problem with key exchange in ASYM_ENCRYPT.
> Scaling down is typical cloud workflow, moreover encryption, especially ASYM_ENCRYPT is the recommended setup making this problem critical. 



--
This message was sent by Atlassian JIRA
(v7.5.0#75005)


More information about the jboss-jira mailing list