[jboss-jira] [JBoss JIRA] (JGRP-2239) AUTH + ASYM_ENCRYPT causes problem with re-joining cluster (MERGE)
Bela Ban (JIRA)
issues at jboss.org
Mon Feb 5 11:13:00 EST 2018
[ https://issues.jboss.org/browse/JGRP-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bela Ban resolved JGRP-2239.
----------------------------
Resolution: Done
> AUTH + ASYM_ENCRYPT causes problem with re-joining cluster (MERGE)
> ------------------------------------------------------------------
>
> Key: JGRP-2239
> URL: https://issues.jboss.org/browse/JGRP-2239
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.6
> Environment: Infinispan 9.1.1 + JGroups 4.0.6.Final + Vert.x 3.5.0
> Reporter: Boris Sh
> Assignee: Bela Ban
> Fix For: 4.0.11
>
> Attachments: asym-encrypt.xml
>
>
> Hello,
> I am using the following configuration:
> {code:xml}
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xmlns="urn:org:jgroups"
>         xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
>     <UDP />
>     <PING />
>     <MERGE3 />
>     <FD />
>     <VERIFY_SUSPECT />
>     <ASYM_ENCRYPT encrypt_entire_message="true" sym_keylength="128"
>                   sym_algorithm="AES/ECB/PKCS5Padding" asym_keylength="2048"
>                   asym_algorithm="RSA" />
>     <pbcast.NAKACK2 />
>     <UNICAST3 />
>     <pbcast.STABLE />
>     <FRAG2 />
>     <AUTH auth_class="org.jgroups.auth.X509Token" auth_value="auth"
>           keystore_path="keystore.jks" keystore_password="pwd" cert_alias="alias"
>           cipher_type="RSA" />
>     <pbcast.GMS />
> </config>
> {code}
> I have 7 services, but I will show logs for only 2 of them, the coordinator and one arbitrary node; all the other nodes behave similarly.
> Initially, when these nodes join the cluster, everything is fine.
> The server is a shared machine with a slow CPU and a slow HDD, so when other applications are busy, my whole cluster can freeze for 3-5 minutes. During or at the end of such a freeze, some services log the following:
> {code:java}
> org.jgroups.protocols.FD up
> WARNING: node-26978: I was suspected by node-27291; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> WARNING: node-26978: unrecognized cipher; discarding message from node-27291
> org.jgroups.protocols.Encrypt handleEncryptedMessage
> WARNING: node-26978: unrecognized cipher; discarding message from node-27291
> org.jgroups.protocols.Encrypt handleEncryptedMessage
> WARNING: node-26978: unrecognized cipher; discarding message from node-36734
> org.jgroups.protocols.Encrypt handleEncryptedMessage
> {code}
> So the node was excluded from the cluster because it became "suspect", but the node itself does not accept that. The cluster coordinator has already changed the shared symmetric key, so the subsequent logs of this node show "unrecognized cipher".
> In the cluster coordinator's logs I see the following:
> {code:java}
> INFO: ISPN100000: Node node-26978 joined the cluster
> ****
> WARN: node-27291: unrecognized cipher; discarding message from node-26978
> org.jgroups.logging.Slf4jLogImpl error
> ERROR: key requester node-26978 is not in current view [***]; ignoring key request
> org.jgroups.logging.Slf4jLogImpl warn
> WARN: node-27291: unrecognized cipher; discarding message from node-26978
> INFO: ISPN000093: Received new, MERGED cluster view for channel ISPN: MergeView::[node-26978|8] (7) [node-26978, node-12721, node-17625, node-45936, node-56674, node-36734, node-27291], 2 subgroups: [node-27291|7] (6) [node-27291, node-12721, node-17625, node-45936, node-56674, node-36734], [node-27291|6] (7) [node-27291, node-26978, node-12721, node-17625, node-45936, node-56674, node-36734]
> {code}
> My understanding of what has happened:
> For example, suppose I have 3 nodes {A, B, C} in the cluster. The cluster freezes for a few minutes, so node {C} becomes suspected and is excluded from the cluster by the coordinator. For some reason {C} ignores that fact. Later, once the cluster is responsive again, the remaining members discard messages from {C}, because ASYM_ENCRYPT is in use and the coordinator has regenerated the shared secret key. Also, for some reason the MERGE operation doesn't work, so {C} cannot rejoin the cluster; the cluster is now split into 2 subgroups that don't communicate with each other, and I don't fully understand why this happens.
> How I temporarily worked around this issue: I changed ASYM_ENCRYPT to SYM_ENCRYPT, and now any node can rejoin the cluster successfully after a freeze, because the key doesn't change.
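> A minimal sketch of that replacement, assuming a keystore that already holds the shared secret key. The keystore path, password and alias below are placeholders and are not taken from the attached asym-encrypt.xml; the rest of the stack stays as configured above:
> {code:xml}
> <!-- Takes the place of the ASYM_ENCRYPT element in the stack above. -->
> <!-- keystore_name, store_password and alias are hypothetical values. -->
> <SYM_ENCRYPT encrypt_entire_message="true"
>              sym_algorithm="AES/ECB/PKCS5Padding"
>              keystore_name="sym-keystore.jceks"
>              store_password="changeit"
>              alias="myKey" />
> {code}
> With this setup every member reads the same pre-shared key from its keystore, so a view change never invalidates the key held by a node that was excluded during a freeze.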
> Also, I have not tested it, but I think change_key_on_leave="false" would help; however, that is not how I want to run the cluster.
> So this looks like a problem with the AUTH + ASYM_ENCRYPT protocol combination, where in some cases a node cannot rejoin the cluster.
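> For reference, that untested alternative would be a one-attribute change to the ASYM_ENCRYPT element from the configuration above (a sketch only; every other attribute is copied unchanged):
> {code:xml}
> <!-- Untested: keep the shared key when a member leaves or is excluded. -->
> <ASYM_ENCRYPT encrypt_entire_message="true" sym_keylength="128"
>               sym_algorithm="AES/ECB/PKCS5Padding" asym_keylength="2048"
>               asym_algorithm="RSA"
>               change_key_on_leave="false" />
> {code}
> The trade-off is that a member that has left still holds a key that can decrypt later cluster traffic.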