[JBoss JIRA] (ISPN-5046) PartitionHandling: split during commit can leave the cache inconsistent after merge
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-5046?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-5046:
-----------------------------------------------
Dan Berindei <dberinde(a)redhat.com> changed the Status of [bug 1171073|https://bugzilla.redhat.com/show_bug.cgi?id=1171073] from NEW to ASSIGNED
> PartitionHandling: split during commit can leave the cache inconsistent after merge
> -----------------------------------------------------------------------------------
>
> Key: ISPN-5046
> URL: https://issues.jboss.org/browse/ISPN-5046
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.2.Final, 7.1.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 7.1.0.Beta1
>
>
> Say we have a cluster ABCD; a transaction T was started on A, with B as the primary owner and C as the backup owner. B and C both acknowledge the prepare, and the network splits into AB and CD right before A sends the commit command. Eventually A suspects C and D, but the commit still succeeds on B before C and D are suspected. Because SuspectExceptions are ignored for commit commands, the user won't see any error.
> However, C will eventually suspect A and B. When the CD cache topology is installed, it will roll back transaction T. After the merge, both partitions are in degraded mode, so we assume that they both have the latest data and the key is never updated on C.
> From C's point of view, this is very similar to ISPN-3421. The fix should also be similar: we could delay the transaction rollback on C until we get a confirmation from B that T was not committed there. Since B is inaccessible, C will eventually get a SuspectException and the CD cache topology, at which point the cache is in degraded mode and C can wait for a merge. On merge, C should check the status of the transaction on B again, and either commit or roll back based on what B did.
> We also need to suspend the cleanup of completed transactions while the cache is in degraded mode, otherwise C might not find T on B after the merge.
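
The proposed fix can be read as a small decision rule on the backup owner. The following is a minimal sketch of that rule only, not Infinispan code: the class `PendingTxResolver`, its enums, and the method `resolve` are hypothetical names. Treating a transaction that is still only prepared on the primary as "keep waiting" is also an assumption of this sketch; the report does not say how that case should be handled.

```java
import java.util.Optional;

// Hypothetical sketch, not Infinispan code: decide what a backup owner (C) does
// with a prepared transaction once it learns what the primary owner (B) did.
final class PendingTxResolver {
    // Outcome of the transaction as recorded on the primary owner.
    enum TxStatus { COMMITTED, ROLLED_BACK, PREPARED }

    // Action the backup owner takes for its own prepared copy.
    enum Action { COMMIT, ROLLBACK, WAIT_FOR_MERGE }

    // primaryStatus is empty while the primary is unreachable (degraded mode).
    static Action resolve(Optional<TxStatus> primaryStatus) {
        if (!primaryStatus.isPresent()) {
            // No confirmation from the primary yet: do not roll back, wait for the merge.
            return Action.WAIT_FOR_MERGE;
        }
        switch (primaryStatus.get()) {
            case COMMITTED:
                return Action.COMMIT;
            case ROLLED_BACK:
                return Action.ROLLBACK;
            default:
                // Assumption: a transaction still only prepared on the primary stays pending.
                return Action.WAIT_FOR_MERGE;
        }
    }
}
```

The rule only works if B still remembers T after the merge, which is why the cleanup of completed transactions would have to be suspended while the cache is in degraded mode.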
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 4 months
[JBoss JIRA] (ISPN-5059) JGroups subsystem doesn't support Vault
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-5059?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant reassigned ISPN-5059:
-------------------------------------
Assignee: Tristan Tarrant
> JGroups subsystem doesn't support Vault
> ---------------------------------------
>
> Key: ISPN-5059
> URL: https://issues.jboss.org/browse/ISPN-5059
> Project: Infinispan
> Issue Type: Bug
> Components: Security, Server
> Reporter: Vojtech Juranek
> Assignee: Tristan Tarrant
>
> The JGroups subsystem doesn't support passwords encrypted in Vault. E.g. when running [EncryptProtocolIT|https://github.com/infinispan/infinispan/blob/master/se...] with the following configuration:
> {noformat}
> <protocol type="ENCRYPT">
>     <property name="key_store_name">${jboss.server.config.dir}/server_jceks.keystore</property>
>     <property name="store_password">${VAULT::keystore::password::1}</property>
>     <property name="alias">memcached</property>
> </protocol>
> {noformat}
> i.e. with a Vault-encrypted password for the keystore, it fails with:
> {noformat}
> groups.channel.clustered: java.lang.Exception: Unable to load keystore infinispan/server/integration/testsuite/target/server/node2/standalone/configuration/server_jceks.keystore: java.io.IOException: Keystore was tampered with, or password was incorrect
> at org.jboss.as.clustering.jgroups.subsystem.ChannelService.start(ChannelService.java:74)
> at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1948) [jboss-msc-1.2.2.Final.jar:1.2.2.Final]
> at org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1881) [jboss-msc-1.2.2.Final.jar:1.2.2.Final]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_55]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_55]
> at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_55]
> Caused by: java.lang.Exception: Unable to load keystore infinispan/server/integration/testsuite/target/server/node2/standalone/configuration/server_jceks.keystore: java.io.IOException: Keystore was tampered with, or password was incorrect
> at org.jgroups.protocols.ENCRYPT.initConfiguredKey(ENCRYPT.java:309)
> at org.jgroups.protocols.ENCRYPT.init(ENCRYPT.java:250)
> at org.jgroups.stack.ProtocolStack.initProtocolStack(ProtocolStack.java:860)
> at org.jgroups.stack.ProtocolStack.setup(ProtocolStack.java:481)
> at org.jgroups.JChannel.init(JChannel.java:848)
> at org.jgroups.JChannel.<init>(JChannel.java:159)
> at org.jboss.as.clustering.jgroups.JChannelFactory.createChannel(JChannelFactory.java:87)
> at org.jboss.as.clustering.jgroups.subsystem.ChannelService.start(ChannelService.java:69)
> {noformat}
> The Vault record for {{keystore::password}} exists:
> {noformat}
> Task: Verify whether a secured attribute exists
> Enter Vault Block:keystore
> Enter Attribute Name:password
> A value exists for (keystore, password)
> {noformat}
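
The error suggests that the literal `${VAULT::...}` string reaches ENCRYPT as the keystore password instead of the decrypted value. As an illustration of the first step a fix would need, the sketch below recognises such an expression and splits it into its parts. It is a hypothetical helper, not part of JGroups or the server: the class name `VaultExpression`, its fields, and the assumption that no part contains a colon are all made up for this example, and the actual lookup of the secret in the Vault is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JGroups or the server: recognise a
// ${VAULT::block::attribute::sharedKey} expression and split it into its parts.
final class VaultExpression {
    // Assumption of this sketch: none of the three parts contains ':' or '}'.
    private static final Pattern PATTERN =
            Pattern.compile("^\\$\\{VAULT::([^:]+)::([^:]+)::([^:}]+)\\}$");

    final String block;
    final String attribute;
    final String sharedKey;

    private VaultExpression(String block, String attribute, String sharedKey) {
        this.block = block;
        this.attribute = attribute;
        this.sharedKey = sharedKey;
    }

    // Returns null when the value is a plain string rather than a Vault reference.
    static VaultExpression parse(String value) {
        Matcher m = PATTERN.matcher(value);
        if (!m.matches()) {
            return null;
        }
        return new VaultExpression(m.group(1), m.group(2), m.group(3));
    }
}
```

For the configuration above, `VaultExpression.parse("${VAULT::keystore::password::1}")` yields block `keystore` and attribute `password`, the same pair that the Vault tool reports as existing.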
[JBoss JIRA] (ISPN-5059) JGroups subsystem doesn't support Vault
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-5059?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-5059:
----------------------------------
Status: Open (was: New)
> JGroups subsystem doesn't support Vault
> ---------------------------------------
>
> Key: ISPN-5059
> URL: https://issues.jboss.org/browse/ISPN-5059
> Project: Infinispan
> Issue Type: Bug
> Components: Security, Server
> Reporter: Vojtech Juranek
> Assignee: Tristan Tarrant
[JBoss JIRA] (ISPN-5042) Remote gets caused by writes could be replicated only to the primary owner
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5042?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-5042:
-------------------------------
Status: Open (was: New)
> Remote gets caused by writes could be replicated only to the primary owner
> --------------------------------------------------------------------------
>
> Key: ISPN-5042
> URL: https://issues.jboss.org/browse/ISPN-5042
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core, State Transfer
> Affects Versions: 7.1.0.Alpha1
> Reporter: Dan Berindei
> Priority: Minor
> Fix For: 7.1.0.Final
>
>
> For write operations that need the previous value, a write CH-only owner that doesn't have a key locally will attempt to retrieve the key from the read CH-owners.
> Sending the remote get command to all the previous owners will create extra load on the cluster during state transfer, so it should be more efficient to send the remote get only to the primary owner. Even though the latency of some write operations will be higher, the average latency should be better.
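
The difference between the two strategies can be sketched as a target-selection function. This is an illustration under stated assumptions, not Infinispan code: the class `RemoteGetTargets` and its method are hypothetical, and the sketch assumes the owner list is ordered with the primary owner first.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Infinispan code: choose which read-CH owners receive
// the remote get issued on behalf of a write that needs the previous value.
final class RemoteGetTargets {
    // Assumption: readOwners is ordered with the primary owner first.
    static List<String> select(List<String> readOwners, boolean primaryOnly) {
        if (readOwners.isEmpty()) {
            return Collections.emptyList();
        }
        // Primary-only sends one request per write instead of one per owner.
        return primaryOnly
                ? Collections.singletonList(readOwners.get(0))
                : readOwners;
    }
}
```

With owners `[B, C]`, the primary-only variant sends one request (to B) instead of two. The trade-off is the one described in the issue: a write can no longer use whichever owner answers first.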
[JBoss JIRA] (ISPN-5042) Remote gets caused by writes could be replicated only to the primary owner
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5042?page=com.atlassian.jira.plugin.... ]
Dan Berindei reassigned ISPN-5042:
----------------------------------
Assignee: Dan Berindei
> Remote gets caused by writes could be replicated only to the primary owner
> --------------------------------------------------------------------------
>
> Key: ISPN-5042
> URL: https://issues.jboss.org/browse/ISPN-5042
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core, State Transfer
> Affects Versions: 7.1.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Minor
> Fix For: 7.1.0.Final
>
[JBoss JIRA] (ISPN-4949) Split brain: inconsistent data after merge
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4949?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-4949:
------------------------------------
The timeout is 4 minutes by default, and it's expected to always be above the time it takes JGroups to suspect a node. In fact, I might make it a full day just to be sure ;)
> Split brain: inconsistent data after merge
> ------------------------------------------
>
> Key: ISPN-4949
> URL: https://issues.jboss.org/browse/ISPN-4949
> Project: Infinispan
> Issue Type: Bug
> Components: State Transfer
> Affects Versions: 7.0.0.Final
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 7.1.0.Alpha1
>
>
> 1) cluster A, B, C, D splits into 2 parts:
> A, B (coord A) finds this out immediately and enters degraded mode with CH [A, B, C, D]
> C, D (coord D) first detects that B is lost, gets view A, C, D and starts rebalance with CH [A, C, D]. Segment X is primary owned by C (it had backup on B but this got lost)
> 2) D detects that A was lost as well, therefore enters degraded mode with CH [A, C, D]
> 3) C inserts an entry into X: all owners (only C) are present, therefore the modification is allowed
> 4) cluster is merged and the coordinator finds out that the max stable topology has CH [A, B, C, D] (it is the older of the two partitions' topologies, obtained from A, B) - logs 'No active or unavailable partitions, so all the partitions must be in degraded mode' (yes, all partitions are in degraded mode, but a write has happened in the meantime)
> 5) The old CH is broadcast in newest topology, no rebalance happens
> 6) Inconsistency: read in X may miss the update
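
Step 3 is the crux of the scenario: whether a degraded partition accepts a write depends on which CH it evaluates the owners against. The sketch below illustrates that check only. It is not Infinispan code; the class `DegradedModeCheck` and its method are hypothetical, and the owner lists passed in are taken from the steps above.

```java
import java.util.Collection;
import java.util.Set;

// Hypothetical sketch, not Infinispan code: in degraded mode a write is allowed
// only if every owner of the key's segment is a member of the local partition.
final class DegradedModeCheck {
    static boolean writeAllowed(Collection<String> segmentOwners, Set<String> partitionMembers) {
        return partitionMembers.containsAll(segmentOwners);
    }
}
```

In partition C, D the check passes under CH [A, C, D], where C is the only owner of X. Under the original CH [A, B, C, D], with B as backup, the same write would have been rejected, and the merge could safely assume that no partition had newer data.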
[JBoss JIRA] (ISPN-5062) Cross site state transfer - incorrect status of push operation
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-5062?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-5062:
----------------------------------
Status: Resolved (was: Pull Request Sent)
Fix Version/s: 7.1.0.Final
Resolution: Done
> Cross site state transfer - incorrect status of push operation
> ---------------------------------------------------------------
>
> Key: ISPN-5062
> URL: https://issues.jboss.org/browse/ISPN-5062
> Project: Infinispan
> Issue Type: Bug
> Components: State Transfer
> Affects Versions: 7.1.0.Alpha1
> Reporter: Matej Čimbora
> Assignee: Pedro Ruivo
> Fix For: 7.1.0.Beta1, 7.1.0.Final
>
>
> The status of the push operation remains "CANCELED" after the push operation was cancelled and reinvoked (even if state transfer is currently in progress); "SENDING" is expected. Otherwise it works as expected (after the state transfer completes, the status is switched to "OK").
> - Sites LON (lonCache) - main, BRN (brnCache) - backup
> Consider the following scenario:
> [standalone@localhost:9999 distributed-cache=lonCache] site --push BRN
> ok
> [standalone@localhost:9999 distributed-cache=lonCache] site --cancelpush BRN
> ok
> [standalone@localhost:9999 distributed-cache=lonCache] site --pushstatus
> BRN=CANCELED
> [standalone@localhost:9999 distributed-cache=lonCache] site --push BRN
> ok
> [standalone@localhost:9999 distributed-cache=lonCache] site --pushstatus
> BRN=CANCELED
> [standalone@localhost:9999 distributed-cache=lonCache] site --pushstatus
> BRN=OK
> Expected behavior:
> [standalone@localhost:9999 distributed-cache=lonCache] site --push BRN
> ok
> [standalone@localhost:9999 distributed-cache=lonCache] site --cancelpush BRN
> ok
> [standalone@localhost:9999 distributed-cache=lonCache] site --pushstatus
> BRN=CANCELED
> [standalone@localhost:9999 distributed-cache=lonCache] site --push BRN
> ok
> [standalone@localhost:9999 distributed-cache=lonCache] site --pushstatus
> BRN=SENDING
> [standalone@localhost:9999 distributed-cache=lonCache] site --pushstatus
> BRN=OK
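
The expected behavior amounts to a small state machine in which starting a push always resets the status, whatever it was before. A minimal sketch follows. It is not the Infinispan implementation: the class `PushStatusTracker`, its methods, and the initial `IDLE` state are hypothetical; only the `SENDING`, `CANCELED` and `OK` values come from the transcript above.

```java
// Hypothetical sketch, not the Infinispan implementation: track the status of a
// cross-site state push so that a new push overwrites an earlier CANCELED status.
final class PushStatusTracker {
    // IDLE is an assumption of this sketch; the other values appear in the CLI output.
    enum Status { IDLE, SENDING, CANCELED, OK }

    private Status status = Status.IDLE;

    // Starting a push always moves to SENDING, even after a cancellation.
    void startPush() {
        status = Status.SENDING;
    }

    // Cancelling only has an effect while a push is in progress.
    void cancelPush() {
        if (status == Status.SENDING) {
            status = Status.CANCELED;
        }
    }

    // Completion is ignored if the push was cancelled in the meantime.
    void pushCompleted() {
        if (status == Status.SENDING) {
            status = Status.OK;
        }
    }

    Status status() {
        return status;
    }
}
```

Replaying the expected transcript (push, cancel, push, complete) against this tracker gives CANCELED, then SENDING, then OK.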