[jboss-jira] [JBoss JIRA] (JGRP-2239) AUTH + ASYM_ENCRYPT causes problem with re-joining cluster (MERGE)

Boris Sh (JIRA) issues at jboss.org
Mon Dec 4 02:40:00 EST 2017


Boris Sh created JGRP-2239:
------------------------------

             Summary: AUTH + ASYM_ENCRYPT causes problem with re-joining cluster (MERGE)
                 Key: JGRP-2239
                 URL: https://issues.jboss.org/browse/JGRP-2239
             Project: JGroups
          Issue Type: Bug
    Affects Versions: 4.0.6
         Environment: Infinispan 9.1.1 + JGroups 4.0.6.Final + Vert.x 3.5.0
            Reporter: Boris Sh
            Assignee: Bela Ban


Hello,
I am using the following configuration:

{code:xml}
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns="urn:org:jgroups" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
	<UDP />
	<PING />
	<MERGE3 />
	<FD />
	<VERIFY_SUSPECT />

	<ASYM_ENCRYPT encrypt_entire_message="true" sym_keylength="128"
		sym_algorithm="AES/ECB/PKCS5Padding" asym_keylength="2048"
		asym_algorithm="RSA" />

	<pbcast.NAKACK2 />
	<UNICAST3 />
	<pbcast.STABLE />
	<FRAG2 />
	<AUTH auth_class="org.jgroups.auth.X509Token" auth_value="auth"
		keystore_path="keystore.jks" keystore_password="pwd" cert_alias="alias"
		cipher_type="RSA" />

	<pbcast.GMS />
</config>
{code}

I have 7 services, but I will show logs for only two of them: the coordinator and one arbitrary node. All the other nodes behave similarly.

Initially, when these nodes join the cluster, everything is fine.
The server is a shared machine with a slow CPU and a slow HDD, so sometimes, when other applications are busy with their tasks, my whole cluster can freeze for 3-5 minutes. During or at the end of such a freeze, a service may log the following:


{code:java}
org.jgroups.protocols.FD up
WARNING: node-26978: I was suspected by node-27291; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
WARNING: node-26978: unrecognized cipher; discarding message from node-27291
org.jgroups.protocols.Encrypt handleEncryptedMessage
WARNING: node-26978: unrecognized cipher; discarding message from node-27291
 org.jgroups.protocols.Encrypt handleEncryptedMessage
WARNING: node-26978: unrecognized cipher; discarding message from node-36734
org.jgroups.protocols.Encrypt handleEncryptedMessage
{code}

So the node was kicked out of the cluster because it became a suspect, but the node does not accept that fact. The cluster coordinator has already changed the symmetric (secret) key, so in the subsequent logs of this node I see "unrecognized cipher".

In the cluster coordinator's logs I see the following:

{code:java}
INFO: ISPN100000: Node node-26978 joined the cluster
****
WARN: node-27291: unrecognized cipher; discarding message from node-26978
org.jgroups.logging.Slf4jLogImpl error
ERROR: key requester  node-26978 is not in current view [***]; ignoring key request
org.jgroups.logging.Slf4jLogImpl warn
WARN: node-27291: unrecognized cipher; discarding message from node-26978

INFO: ISPN000093: Received new, MERGED cluster view for channel ISPN: MergeView::[node-26978|8] (7) [node-26978, node-12721, node-17625, node-45936, node-56674, node-36734, node-27291], 2 subgroups: [node-27291|7] (6) [node-27291, node-12721, node-17625, node-45936, node-56674, node-36734], [node-27291|6] (7) [node-27291, node-26978, node-12721, node-17625, node-45936, node-56674, node-36734]

{code}


My understanding of what happened:
Say I have 3 nodes {A, B, C} in the cluster. The cluster freezes for a few minutes, so node {C} becomes suspected and is kicked out of the cluster by the coordinator. For some reason {C} ignores that fact. Later, after the cluster is up again, the remaining members start ignoring messages from {C}, because the cluster is using ASYM_ENCRYPT and the symmetric key has been regenerated by the coordinator. Also, for some reason the MERGE operation doesn't work and {C} cannot rejoin the cluster, so the cluster now has 2 subgroups that don't communicate with each other. I don't fully understand why this happens.

How I temporarily worked around this issue: I changed ASYM_ENCRYPT to SYM_ENCRYPT. Now any node can rejoin the cluster successfully after a freeze, and the key doesn't change.
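
A minimal sketch of that workaround, assuming the rest of the stack stays exactly as posted and only the ASYM_ENCRYPT element is swapped for SYM_ENCRYPT. The keystore file name, password and alias are hypothetical placeholders, and the keystore is assumed to hold a pre-generated secret key that is distributed to all nodes.

{code:xml}
<!-- Replaces the ASYM_ENCRYPT element in the stack above; all other protocols are unchanged. -->
<!-- The keystore_name, store_password and alias values are hypothetical placeholders. -->
<SYM_ENCRYPT encrypt_entire_message="true" sym_algorithm="AES"
	keystore_name="sym-keystore.jceks" store_password="changeit"
	alias="myKey" />
{code}

Because every node reads the same secret key from its keystore, nothing is regenerated on a view change, which is consistent with nodes being able to come back after a freeze.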

Also, I haven't tested it, but I think change_key_on_leave="false" would help. However, that is not a setting I want to use.
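
For reference, that untested alternative would be a one-attribute change to the ASYM_ENCRYPT element posted above. This is only a sketch; whether it actually avoids the failed merge has not been verified.

{code:xml}
<!-- Untested: same ASYM_ENCRYPT element as in the configuration above, -->
<!-- with change_key_on_leave disabled so the key is kept when a member leaves the view. -->
<ASYM_ENCRYPT encrypt_entire_message="true" sym_keylength="128"
	sym_algorithm="AES/ECB/PKCS5Padding" asym_keylength="2048"
	asym_algorithm="RSA" change_key_on_leave="false" />
{code}

The trade-off is that a member that has left or been excluded still holds a key that can decrypt later cluster traffic.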

So this looks like a problem with the AUTH + ASYM_ENCRYPT protocol combination: in some cases a node cannot rejoin the cluster.


