[JBoss JIRA] (WFCORE-3404) Performance issue in audit endpoints
by Darran Lofthouse (JIRA)
[ https://issues.jboss.org/browse/WFCORE-3404?page=com.atlassian.jira.plugi... ]
Darran Lofthouse commented on WFCORE-3404:
------------------------------------------
I have dropped this to 'Critical' this is not in relation to any new code in WildFly 12.
> Performance issue in audit endpoints
> ------------------------------------
>
> Key: WFCORE-3404
> URL: https://issues.jboss.org/browse/WFCORE-3404
> Project: WildFly Core
> Issue Type: Bug
> Components: Security
> Affects Versions: 4.0.0.Alpha2
> Reporter: Josef Cacek
> Assignee: Jan Kalina
> Priority: Critical
> Attachments: elytron-flightrecording.jfr
>
>
> Synchronization in audit endpoint implementations leads to a great performance drop.
> We see the issue in {{FileAuditEndpoint.accept(EventPriority, String)}}, but similar code is also in the {{SyslogAuditEndpoint}}.
> Check the attached recording (FlightRecorder) in jmc (Java Mission Control).
> It can be seen in these tabs
> - Threads > Contention
> - Threads > Thread Dumps
> - Threads > Lock Instances
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
8 years, 3 months
[JBoss JIRA] (WFCORE-3404) Performance issue in audit endpoints
by Darran Lofthouse (JIRA)
[ https://issues.jboss.org/browse/WFCORE-3404?page=com.atlassian.jira.plugi... ]
Darran Lofthouse updated WFCORE-3404:
-------------------------------------
Priority: Critical (was: Blocker)
> Performance issue in audit endpoints
> ------------------------------------
>
> Key: WFCORE-3404
> URL: https://issues.jboss.org/browse/WFCORE-3404
> Project: WildFly Core
> Issue Type: Bug
> Components: Security
> Affects Versions: 4.0.0.Alpha2
> Reporter: Josef Cacek
> Assignee: Jan Kalina
> Priority: Critical
> Attachments: elytron-flightrecording.jfr
>
>
> Synchronization in audit endpoint implementations leads to a great performance drop.
> We see the issue in {{FileAuditEndpoint.accept(EventPriority, String)}}, but similar code is also in the {{SyslogAuditEndpoint}}.
> Check the attached recording (FlightRecorder) in jmc (Java Mission Control).
> It can be seen in these tabs
> - Threads > Contention
> - Threads > Thread Dumps
> - Threads > Lock Instances
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
8 years, 3 months
[JBoss JIRA] (WFCORE-3522) Module dependency order for a deployment is wrong
by Kabir Khan (JIRA)
[ https://issues.jboss.org/browse/WFCORE-3522?page=com.atlassian.jira.plugi... ]
Kabir Khan commented on WFCORE-3522:
------------------------------------
[~dmlloyd] assigning to you since it looks like you already started looking at this :)
> Module dependency order for a deployment is wrong
> -------------------------------------------------
>
> Key: WFCORE-3522
> URL: https://issues.jboss.org/browse/WFCORE-3522
> Project: WildFly Core
> Issue Type: Bug
> Affects Versions: 3.0.8.Final
> Reporter: Ken Finnigan
> Assignee: David Lloyd
> Priority: Blocker
>
> Within WildFly Swarm we need the ability to override the `javax.ws.rs.client.ClientBuilder` service file for a deployment to provide custom handling for implementing Eclipse MicroProfile.
> With a service file present in the deployment, the existing RESTEasy service file is always found first because the deployment module is added as a dependency after other non local dependencies. See https://github.com/wildfly/wildfly-core/blob/master/server/src/main/java/...
> After discussing with [~dmlloyd] on IRC, he agreed this was a bug and a better solution was for the deployment's dependency to always be first if `isLocalLast()` is false, and instead remove any spec packages that might be present on the deployment path instead.
> He explained the purpose of this ordering was to prevent a user deploying `javax.servlet.api` and breaking things, but it appears this can be achieved by alternative methods.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
8 years, 3 months
[JBoss JIRA] (WFCORE-3522) Module dependency order for a deployment is wrong
by Kabir Khan (JIRA)
[ https://issues.jboss.org/browse/WFCORE-3522?page=com.atlassian.jira.plugi... ]
Kabir Khan reassigned WFCORE-3522:
----------------------------------
Assignee: David Lloyd
> Module dependency order for a deployment is wrong
> -------------------------------------------------
>
> Key: WFCORE-3522
> URL: https://issues.jboss.org/browse/WFCORE-3522
> Project: WildFly Core
> Issue Type: Bug
> Affects Versions: 3.0.8.Final
> Reporter: Ken Finnigan
> Assignee: David Lloyd
> Priority: Blocker
>
> Within WildFly Swarm we need the ability to override the `javax.ws.rs.client.ClientBuilder` service file for a deployment to provide custom handling for implementing Eclipse MicroProfile.
> With a service file present in the deployment, the existing RESTEasy service file is always found first because the deployment module is added as a dependency after other non local dependencies. See https://github.com/wildfly/wildfly-core/blob/master/server/src/main/java/...
> After discussing with [~dmlloyd] on IRC, he agreed this was a bug and a better solution was for the deployment's dependency to always be first if `isLocalLast()` is false, and instead remove any spec packages that might be present on the deployment path instead.
> He explained the purpose of this ordering was to prevent a user deploying `javax.servlet.api` and breaking things, but it appears this can be achieved by alternative methods.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
8 years, 3 months
[JBoss JIRA] (JGRP-2239) AUTH + ASYM_ENCRYPT causes problem with re-joining cluster (MERGE)
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2239?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2239:
--------------------------------
Using the attached {{asym-encrypt.xml}} and nodes A,B,C,D, and injecting a split brain works all the time:
{noformat}
probe.sh op=INJECT_VIEW.injectView["A=A/B;B=A/B;C=C/D;D=C/D"]
{noformat}
Note that this requires {{INJECT_VIEW}} which was added in 4.0.10.Final.
> AUTH + ASYM_ENCRYPT causes problem with re-joining cluster (MERGE)
> ------------------------------------------------------------------
>
> Key: JGRP-2239
> URL: https://issues.jboss.org/browse/JGRP-2239
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.6
> Environment: Infinispan 9.1.1 + JGroups 4.0.6.Final + Vert.x 3.5.0
> Reporter: Boris Sh
> Assignee: Bela Ban
> Fix For: 4.0.11
>
> Attachments: asym-encrypt.xml
>
>
> Hello,
> I am using the following configuration:
> {code:java}
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xmlns="urn:org:jgroups" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
> <UDP />
> <PING />
> <MERGE3 />
> <FD />
> <VERIFY_SUSPECT />
> <ASYM_ENCRYPT encrypt_entire_message="true" sym_keylength="128"
> sym_algorithm="AES/ECB/PKCS5Padding" asym_keylength="2048"
> asym_algorithm="RSA" />
> <pbcast.NAKACK2 />
> <UNICAST3 />
> <pbcast.STABLE />
> <FRAG2 />
> <AUTH auth_class="org.jgroups.auth.X509Token" auth_value="auth"
> keystore_path="keystore.jks" keystore_password="pwd" cert_alias="alias"
> cipher_type="RSA" />
> <pbcast.GMS />
> </config>
> {code}
> I have 7 services, but will try to show logs for 2 ones, coordinator and some random node, and all the other nodes behave similarly.
> Initially, when these nodes join the cluster, everything is fine.
> The server is a shared machine with slow CPU and also slow HDD, so sometimes, when other applications are busy with their tasks, whole my cluster can get frozen for 3-5 minutes. During/in the end of this freeze, some service may tell me the following (in logs):
> {code:java}
> org.jgroups.protocols.FD up
> WARNING: node-26978: I was suspected by node-27291; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> WARNING: node-26978: unrecognized cipher; discarding message from node-27291
> org.jgroups.protocols.Encrypt handleEncryptedMessage
> WARNING: node-26978: unrecognized cipher; discarding message from node-27291
> org.jgroups.protocols.Encrypt handleEncryptedMessage
> WARNING: node-26978: unrecognized cipher; discarding message from node-36734
> org.jgroups.protocols.Encrypt handleEncryptedMessage
> {code}
> so the node was kicked out from the cluster, as it became "suspect", but the node doesn't agree with that fact. Cluster coordinator has already changed sym private key, so in the further logs of this server I see "unrecognized cipher".
> In cluster coordinator logs I see the following:
> {code:java}
> INFO: ISPN100000: Node node-26978 joined the cluster
> ****
> WARN: node-27291: unrecognized cipher; discarding message from node-26978
> org.jgroups.logging.Slf4jLogImpl error
> ERROR: key requester node-26978 is not in current view [***]; ignoring key request
> org.jgroups.logging.Slf4jLogImpl warn
> WARN: node-27291: unrecognized cipher; discarding message from node-26978
> INFO: ISPN000093: Received new, MERGED cluster view for channel ISPN: MergeView::[node-26978|8] (7) [node-26978, node-12721, node-17625, node-45936, node-56674, node-36734, node-27291], 2 subgroups: [node-27291|7] (6) [node-27291, node-12721, node-17625, node-45936, node-56674, node-36734], [node-27291|6] (7) [node-27291, node-26978, node-12721, node-17625, node-45936, node-56674, node-36734]
> {code}
> My understanding of what has happened:
> For example I have 3 nodes {A, B, C} in the cluster. The cluster gets frozen for some minutes, so node {C} becomes suspected, and kicked out from the cluster by coordinator. For some reason {C} ignores that fact. Later, after cluster is up again, it becomes ignoring messages from {C}, because it is using ASYM encryption and private key has been re-generated by coordinator. Also, for some reason MERGE operation doesn't work, and {C} can not join back to cluster, and now cluster has 2 subgroups, that don't communicate to each other, and I don't fully understand why this happens.
> How I temporary resolved this issue: changed ASYM_ENCRYPT to SYM_ENCRYPT, and now any node can come back to the cluster successfully after freeze, as the key doesn't change.
> Also, I didn't test, but think change_key_on_leave="false" will help, but this is not the way I want to use.
> So looks like this a problem with AUTH + ASYM_ENCRYPT protocol combination, when node in some cases can not rejoin the cluster.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
8 years, 3 months
[JBoss JIRA] (JGRP-2239) AUTH + ASYM_ENCRYPT causes problem with re-joining cluster (MERGE)
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2239?page=com.atlassian.jira.plugin.... ]
Bela Ban resolved JGRP-2239.
----------------------------
Resolution: Done
> AUTH + ASYM_ENCRYPT causes problem with re-joining cluster (MERGE)
> ------------------------------------------------------------------
>
> Key: JGRP-2239
> URL: https://issues.jboss.org/browse/JGRP-2239
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.6
> Environment: Infinispan 9.1.1 + JGroups 4.0.6.Final + Vert.x 3.5.0
> Reporter: Boris Sh
> Assignee: Bela Ban
> Fix For: 4.0.11
>
> Attachments: asym-encrypt.xml
>
>
> Hello,
> I am using the following configuration:
> {code:java}
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xmlns="urn:org:jgroups" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
> <UDP />
> <PING />
> <MERGE3 />
> <FD />
> <VERIFY_SUSPECT />
> <ASYM_ENCRYPT encrypt_entire_message="true" sym_keylength="128"
> sym_algorithm="AES/ECB/PKCS5Padding" asym_keylength="2048"
> asym_algorithm="RSA" />
> <pbcast.NAKACK2 />
> <UNICAST3 />
> <pbcast.STABLE />
> <FRAG2 />
> <AUTH auth_class="org.jgroups.auth.X509Token" auth_value="auth"
> keystore_path="keystore.jks" keystore_password="pwd" cert_alias="alias"
> cipher_type="RSA" />
> <pbcast.GMS />
> </config>
> {code}
> I have 7 services, but will try to show logs for 2 ones, coordinator and some random node, and all the other nodes behave similarly.
> Initially, when these nodes join the cluster, everything is fine.
> The server is a shared machine with slow CPU and also slow HDD, so sometimes, when other applications are busy with their tasks, whole my cluster can get frozen for 3-5 minutes. During/in the end of this freeze, some service may tell me the following (in logs):
> {code:java}
> org.jgroups.protocols.FD up
> WARNING: node-26978: I was suspected by node-27291; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> WARNING: node-26978: unrecognized cipher; discarding message from node-27291
> org.jgroups.protocols.Encrypt handleEncryptedMessage
> WARNING: node-26978: unrecognized cipher; discarding message from node-27291
> org.jgroups.protocols.Encrypt handleEncryptedMessage
> WARNING: node-26978: unrecognized cipher; discarding message from node-36734
> org.jgroups.protocols.Encrypt handleEncryptedMessage
> {code}
> so the node was kicked out from the cluster, as it became "suspect", but the node doesn't agree with that fact. Cluster coordinator has already changed sym private key, so in the further logs of this server I see "unrecognized cipher".
> In cluster coordinator logs I see the following:
> {code:java}
> INFO: ISPN100000: Node node-26978 joined the cluster
> ****
> WARN: node-27291: unrecognized cipher; discarding message from node-26978
> org.jgroups.logging.Slf4jLogImpl error
> ERROR: key requester node-26978 is not in current view [***]; ignoring key request
> org.jgroups.logging.Slf4jLogImpl warn
> WARN: node-27291: unrecognized cipher; discarding message from node-26978
> INFO: ISPN000093: Received new, MERGED cluster view for channel ISPN: MergeView::[node-26978|8] (7) [node-26978, node-12721, node-17625, node-45936, node-56674, node-36734, node-27291], 2 subgroups: [node-27291|7] (6) [node-27291, node-12721, node-17625, node-45936, node-56674, node-36734], [node-27291|6] (7) [node-27291, node-26978, node-12721, node-17625, node-45936, node-56674, node-36734]
> {code}
> My understanding of what has happened:
> For example I have 3 nodes {A, B, C} in the cluster. The cluster gets frozen for some minutes, so node {C} becomes suspected, and kicked out from the cluster by coordinator. For some reason {C} ignores that fact. Later, after cluster is up again, it becomes ignoring messages from {C}, because it is using ASYM encryption and private key has been re-generated by coordinator. Also, for some reason MERGE operation doesn't work, and {C} can not join back to cluster, and now cluster has 2 subgroups, that don't communicate to each other, and I don't fully understand why this happens.
> How I temporary resolved this issue: changed ASYM_ENCRYPT to SYM_ENCRYPT, and now any node can come back to the cluster successfully after freeze, as the key doesn't change.
> Also, I didn't test, but think change_key_on_leave="false" will help, but this is not the way I want to use.
> So looks like this a problem with AUTH + ASYM_ENCRYPT protocol combination, when node in some cases can not rejoin the cluster.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
8 years, 3 months
[JBoss JIRA] (JGRP-2239) AUTH + ASYM_ENCRYPT causes problem with re-joining cluster (MERGE)
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2239?page=com.atlassian.jira.plugin.... ]
Bela Ban updated JGRP-2239:
---------------------------
Attachment: asym-encrypt.xml
> AUTH + ASYM_ENCRYPT causes problem with re-joining cluster (MERGE)
> ------------------------------------------------------------------
>
> Key: JGRP-2239
> URL: https://issues.jboss.org/browse/JGRP-2239
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.6
> Environment: Infinispan 9.1.1 + JGroups 4.0.6.Final + Vert.x 3.5.0
> Reporter: Boris Sh
> Assignee: Bela Ban
> Fix For: 4.0.11
>
> Attachments: asym-encrypt.xml
>
>
> Hello,
> I am using the following configuration:
> {code:java}
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xmlns="urn:org:jgroups" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
> <UDP />
> <PING />
> <MERGE3 />
> <FD />
> <VERIFY_SUSPECT />
> <ASYM_ENCRYPT encrypt_entire_message="true" sym_keylength="128"
> sym_algorithm="AES/ECB/PKCS5Padding" asym_keylength="2048"
> asym_algorithm="RSA" />
> <pbcast.NAKACK2 />
> <UNICAST3 />
> <pbcast.STABLE />
> <FRAG2 />
> <AUTH auth_class="org.jgroups.auth.X509Token" auth_value="auth"
> keystore_path="keystore.jks" keystore_password="pwd" cert_alias="alias"
> cipher_type="RSA" />
> <pbcast.GMS />
> </config>
> {code}
> I have 7 services, but will try to show logs for 2 ones, coordinator and some random node, and all the other nodes behave similarly.
> Initially, when these nodes join the cluster, everything is fine.
> The server is a shared machine with slow CPU and also slow HDD, so sometimes, when other applications are busy with their tasks, whole my cluster can get frozen for 3-5 minutes. During/in the end of this freeze, some service may tell me the following (in logs):
> {code:java}
> org.jgroups.protocols.FD up
> WARNING: node-26978: I was suspected by node-27291; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> WARNING: node-26978: unrecognized cipher; discarding message from node-27291
> org.jgroups.protocols.Encrypt handleEncryptedMessage
> WARNING: node-26978: unrecognized cipher; discarding message from node-27291
> org.jgroups.protocols.Encrypt handleEncryptedMessage
> WARNING: node-26978: unrecognized cipher; discarding message from node-36734
> org.jgroups.protocols.Encrypt handleEncryptedMessage
> {code}
> so the node was kicked out from the cluster, as it became "suspect", but the node doesn't agree with that fact. Cluster coordinator has already changed sym private key, so in the further logs of this server I see "unrecognized cipher".
> In cluster coordinator logs I see the following:
> {code:java}
> INFO: ISPN100000: Node node-26978 joined the cluster
> ****
> WARN: node-27291: unrecognized cipher; discarding message from node-26978
> org.jgroups.logging.Slf4jLogImpl error
> ERROR: key requester node-26978 is not in current view [***]; ignoring key request
> org.jgroups.logging.Slf4jLogImpl warn
> WARN: node-27291: unrecognized cipher; discarding message from node-26978
> INFO: ISPN000093: Received new, MERGED cluster view for channel ISPN: MergeView::[node-26978|8] (7) [node-26978, node-12721, node-17625, node-45936, node-56674, node-36734, node-27291], 2 subgroups: [node-27291|7] (6) [node-27291, node-12721, node-17625, node-45936, node-56674, node-36734], [node-27291|6] (7) [node-27291, node-26978, node-12721, node-17625, node-45936, node-56674, node-36734]
> {code}
> My understanding of what has happened:
> For example I have 3 nodes {A, B, C} in the cluster. The cluster gets frozen for some minutes, so node {C} becomes suspected, and kicked out from the cluster by coordinator. For some reason {C} ignores that fact. Later, after cluster is up again, it becomes ignoring messages from {C}, because it is using ASYM encryption and private key has been re-generated by coordinator. Also, for some reason MERGE operation doesn't work, and {C} can not join back to cluster, and now cluster has 2 subgroups, that don't communicate to each other, and I don't fully understand why this happens.
> How I temporary resolved this issue: changed ASYM_ENCRYPT to SYM_ENCRYPT, and now any node can come back to the cluster successfully after freeze, as the key doesn't change.
> Also, I didn't test, but think change_key_on_leave="false" will help, but this is not the way I want to use.
> So looks like this a problem with AUTH + ASYM_ENCRYPT protocol combination, when node in some cases can not rejoin the cluster.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
8 years, 3 months
[JBoss JIRA] (JGRP-2239) AUTH + ASYM_ENCRYPT causes problem with re-joining cluster (MERGE)
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2239?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2239:
--------------------------------
OK, I found out that in some split brain scenarios, we would have multiple keyservers, e.g.: \{A,B\} in A and B and \{C,D\} in C and D, which would have keyservers A and C.
Both A and C would send a {{NEW_KEYSERVER}} message to *all* members, so on a merge A or C could end up with keyserver C!
The fix was to not set {{key_server_addr}} when receiving a {{NEW_KEYSERVER}} message, but only on the view.
> AUTH + ASYM_ENCRYPT causes problem with re-joining cluster (MERGE)
> ------------------------------------------------------------------
>
> Key: JGRP-2239
> URL: https://issues.jboss.org/browse/JGRP-2239
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.6
> Environment: Infinispan 9.1.1 + JGroups 4.0.6.Final + Vert.x 3.5.0
> Reporter: Boris Sh
> Assignee: Bela Ban
> Fix For: 4.0.11
>
>
> Hello,
> I am using the following configuration:
> {code:java}
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xmlns="urn:org:jgroups" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
> <UDP />
> <PING />
> <MERGE3 />
> <FD />
> <VERIFY_SUSPECT />
> <ASYM_ENCRYPT encrypt_entire_message="true" sym_keylength="128"
> sym_algorithm="AES/ECB/PKCS5Padding" asym_keylength="2048"
> asym_algorithm="RSA" />
> <pbcast.NAKACK2 />
> <UNICAST3 />
> <pbcast.STABLE />
> <FRAG2 />
> <AUTH auth_class="org.jgroups.auth.X509Token" auth_value="auth"
> keystore_path="keystore.jks" keystore_password="pwd" cert_alias="alias"
> cipher_type="RSA" />
> <pbcast.GMS />
> </config>
> {code}
> I have 7 services, but will try to show logs for 2 ones, coordinator and some random node, and all the other nodes behave similarly.
> Initially, when these nodes join the cluster, everything is fine.
> The server is a shared machine with slow CPU and also slow HDD, so sometimes, when other applications are busy with their tasks, whole my cluster can get frozen for 3-5 minutes. During/in the end of this freeze, some service may tell me the following (in logs):
> {code:java}
> org.jgroups.protocols.FD up
> WARNING: node-26978: I was suspected by node-27291; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
> WARNING: node-26978: unrecognized cipher; discarding message from node-27291
> org.jgroups.protocols.Encrypt handleEncryptedMessage
> WARNING: node-26978: unrecognized cipher; discarding message from node-27291
> org.jgroups.protocols.Encrypt handleEncryptedMessage
> WARNING: node-26978: unrecognized cipher; discarding message from node-36734
> org.jgroups.protocols.Encrypt handleEncryptedMessage
> {code}
> so the node was kicked out from the cluster, as it became "suspect", but the node doesn't agree with that fact. Cluster coordinator has already changed sym private key, so in the further logs of this server I see "unrecognized cipher".
> In cluster coordinator logs I see the following:
> {code:java}
> INFO: ISPN100000: Node node-26978 joined the cluster
> ****
> WARN: node-27291: unrecognized cipher; discarding message from node-26978
> org.jgroups.logging.Slf4jLogImpl error
> ERROR: key requester node-26978 is not in current view [***]; ignoring key request
> org.jgroups.logging.Slf4jLogImpl warn
> WARN: node-27291: unrecognized cipher; discarding message from node-26978
> INFO: ISPN000093: Received new, MERGED cluster view for channel ISPN: MergeView::[node-26978|8] (7) [node-26978, node-12721, node-17625, node-45936, node-56674, node-36734, node-27291], 2 subgroups: [node-27291|7] (6) [node-27291, node-12721, node-17625, node-45936, node-56674, node-36734], [node-27291|6] (7) [node-27291, node-26978, node-12721, node-17625, node-45936, node-56674, node-36734]
> {code}
> My understanding of what has happened:
> For example I have 3 nodes {A, B, C} in the cluster. The cluster gets frozen for some minutes, so node {C} becomes suspected, and kicked out from the cluster by coordinator. For some reason {C} ignores that fact. Later, after cluster is up again, it becomes ignoring messages from {C}, because it is using ASYM encryption and private key has been re-generated by coordinator. Also, for some reason MERGE operation doesn't work, and {C} can not join back to cluster, and now cluster has 2 subgroups, that don't communicate to each other, and I don't fully understand why this happens.
> How I temporary resolved this issue: changed ASYM_ENCRYPT to SYM_ENCRYPT, and now any node can come back to the cluster successfully after freeze, as the key doesn't change.
> Also, I didn't test, but think change_key_on_leave="false" will help, but this is not the way I want to use.
> So looks like this a problem with AUTH + ASYM_ENCRYPT protocol combination, when node in some cases can not rejoin the cluster.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
8 years, 3 months