[JBoss JIRA] (ISPN-5857) HotRod client should use the consistent hash in replicated mode
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-5857?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-5857:
-----------------------------------------------
Sebastian Łaskawiec <slaskawi(a)redhat.com> changed the Status of [bug 1272390|https://bugzilla.redhat.com/show_bug.cgi?id=1272390] from MODIFIED to ON_QA
> HotRod client should use the consistent hash in replicated mode
> ---------------------------------------------------------------
>
> Key: ISPN-5857
> URL: https://issues.jboss.org/browse/ISPN-5857
> Project: Infinispan
> Issue Type: Bug
> Components: Server
> Affects Versions: 8.0.1.Final, 8.1.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Labels: hotrod
> Fix For: 8.1.0.Alpha2, 8.1.0.Final
>
>
> The HotRod server assumes that in replicated mode the server accessed by the client doesn't matter, so it doesn't send a consistent hash in the topology updates.
> However, since replicated mode is now implemented on top of distributed mode, it makes a big difference whether the client hits a key's primary owner or another node: a write sent to a non-primary node has to be forwarded to the primary owner. We should change the server to send the consistent hash for replicated caches as well.
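A minimal sketch of why the consistent hash matters to a client. It is not the real HotRod client code. With a consistent hash, the client can map a key to a segment and send the request straight to that segment's primary owner. Without one, it can only pick an arbitrary server, and a non-primary server then has to forward the write. The class `SegmentRouter`, the modulo hash and the owner table are hypothetical simplifications. The actual client and server use their own hash function, segment count and owner layout received in topology updates.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical, simplified client-side router; not the HotRod client API.
final class SegmentRouter {
    private final int numSegments;
    // segment -> ordered owners; index 0 is assumed to be the primary owner
    private final Map<Integer, List<String>> owners;
    private final List<String> servers;
    private final Random random = new Random();

    SegmentRouter(int numSegments, Map<Integer, List<String>> owners, List<String> servers) {
        this.numSegments = numSegments;
        this.owners = owners;
        this.servers = servers;
    }

    // Map a key to a segment; Math.floorMod keeps the result non-negative.
    int segmentOf(Object key) {
        return Math.floorMod(key.hashCode(), numSegments);
    }

    // With a consistent hash: route directly to the primary owner of the key's segment.
    String primaryOwner(Object key) {
        return owners.get(segmentOf(key)).get(0);
    }

    // Without one: any server may be picked, and a non-primary
    // server has to forward the write to the primary owner.
    String anyServer() {
        return servers.get(random.nextInt(servers.size()));
    }
}
```

In a replicated cache every node holds every key, so the owner lists contain all servers. Only the primary owner differs per segment, and that is exactly the information the client needs to avoid the extra hop.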
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (ISPN-5885) NonTotalOrderPerCacheInvocationHandlerImpl should release locks
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-5885?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-5885:
------------------------------
Fix Version/s: (was: 8.0.3.Final)
> NonTotalOrderPerCacheInvocationHandlerImpl should release locks
> ---------------------------------------------------------------
>
> Key: ISPN-5885
> URL: https://issues.jboss.org/browse/ISPN-5885
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 8.0.1.Final, 8.1.0.Alpha2
> Reporter: Dan Berindei
> Assignee: Pedro Ruivo
> Labels: testsuite_stability
> Fix For: 8.1.0.Final
>
> Attachments: ThreeNodesReplicatedSplitAndMergeTest.log.zip
>
>
> Traditionally, the locking interceptor is the one that releases key locks. But after the ISPN-2849 fix, the locks are acquired by {{NonTotalOrderPerCacheInvocationHandlerImpl}} before the interceptor chain even starts executing. If there's an exception between the lock acquisition and the locking interceptor, the locks will never be released.
> I have so far reproduced this in only one situation, {{ThreeNodesReplicatedSplitAndMergeTest.testSplitAndMerge0}}. Node {{C}} is split into a partition by itself, and then it is merged with the {{AB}} partition. After the merge, JGroups sometimes replays on {{C}} the put command that was broadcast in the {{AB}} partition. {{C}}'s cache is still in degraded mode, so the write fails, but the lock acquired for it is never released.
> {noformat}
> 12:53:41,088 INFO (testng-ThreeNodesReplicatedSplitAndMergeTest:[]) [JGroupsTransport] ISPN000093: Received new, MERGED cluster view for channel ISPN: MergeView::[NodeA-44116|10] (3) [NodeA-44116, NodeB-58097, NodeC-3920], 2 subgroups: [NodeA-44116|8] (2) [NodeA-44116, NodeB-58097], [NodeC-3920|9] (1) [NodeC-3920]
> 12:53:42,143 TRACE (OOB-1,NodeC-3920:[]) [GlobalInboundInvocationHandler] Attempting to execute CacheRpcCommand: SingleRpcCommand{cacheName='___defaultcache', command=PutKeyValueCommand{key=MagicKey#null{ced8b1f2@NodeC-3920/9}, value=v22, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true}} [sender=NodeB-58097]
> 12:53:42,144 TRACE (OOB-1,NodeC-3920:[]) [DefaultLockManager] Lock key=MagicKey#null{ced8b1f2@NodeC-3920/9} for owner=CommandUUID{address=NodeB-58097, id=48899}. timeout=10000 (MILLISECONDS)
> 12:53:42,144 TRACE (OOB-1,NodeC-3920:[]) [InfinispanLock] LockPlaceHolder{lockState=ACQUIRED, owner=CommandUUID{address=NodeB-58097, id=48899}} successfully acquired the lock.
> 12:53:42,144 TRACE (remote-thread-NodeC-p36993-t5:[]) [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=MagicKey#null{ced8b1f2@NodeC-3920/9}, value=v22, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true} and InvocationContext [org.infinispan.context.SingleKeyNonTxInvocationContext@67bb1b50]
> 12:53:42,144 TRACE (remote-thread-NodeC-p36993-t5:[]) [PartitionHandlingManagerImpl] Checking availability for key=MagicKey#null{ced8b1f2@NodeC-3920/9}, status=DEGRADED_MODE
> 12:53:42,144 TRACE (remote-thread-NodeC-p36993-t5:[]) [PartitionHandlingManagerImpl] Partition is in DEGRADED_MODE mode, access is not allowed for key MagicKey#null{ced8b1f2@NodeC-3920/9}
> 12:53:42,144 ERROR (remote-thread-NodeC-p36993-t5:[]) [InvocationContextInterceptor] ISPN000136: Execution error
> org.infinispan.partitionhandling.AvailabilityException: ISPN000306: Key 'MagicKey#null{ced8b1f2@NodeC-3920/9}' is not available. Not all owners are in this partition
> 12:53:42,144 WARN (remote-thread-NodeC-p36993-t5:[]) [NonTotalOrderPerCacheInboundInvocationHandler] ISPN000071: Caught exception when handling command SingleRpcCommand{cacheName='___defaultcache', command=PutKeyValueCommand{key=MagicKey#null{ced8b1f2@NodeC-3920/9}, value=v22, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true}}
> {noformat}
> Of course, knowing that {{C}} is in degraded mode, perhaps we should not try to acquire the lock at all. But then we'd move even more responsibility into the invocation handler. There's also the possibility of having exceptions caused by -topology changes or- programming errors, so a solution that handles any exception would be preferable.
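A minimal sketch of the "handle any exception" approach: the handler that acquires the lock before the interceptor chain runs also releases it if anything in the chain throws, whatever the exception type. `SimpleLockManager`, `InboundHandler` and the `Runnable` standing in for the interceptor chain are hypothetical stand-ins, not Infinispan's `LockManager` or `NonTotalOrderPerCacheInboundInvocationHandler` API. The sketch assumes that on success the chain releases the lock itself, as the locking interceptor traditionally does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lock manager: one owner per key, no waiting (illustration only).
final class SimpleLockManager {
    private final Map<Object, Object> lockOwners = new HashMap<>();

    synchronized boolean tryLock(Object key, Object owner) {
        return lockOwners.putIfAbsent(key, owner) == null;
    }

    synchronized void unlock(Object key, Object owner) {
        // Only the current owner releases the lock.
        lockOwners.remove(key, owner);
    }

    synchronized boolean isLocked(Object key) {
        return lockOwners.containsKey(key);
    }
}

// Hypothetical inbound handler that locks before the chain starts executing.
final class InboundHandler {
    private final SimpleLockManager lockManager;

    InboundHandler(SimpleLockManager lockManager) {
        this.lockManager = lockManager;
    }

    void handle(Object key, Object commandId, Runnable interceptorChain) {
        if (!lockManager.tryLock(key, commandId)) {
            throw new IllegalStateException("Lock for " + key + " is held by another command");
        }
        try {
            // On success, the locking interceptor inside the chain releases the lock.
            interceptorChain.run();
        } catch (RuntimeException | Error e) {
            // Any failure (e.g. an availability check in degraded mode, or a
            // programming error) must not leak the lock.
            lockManager.unlock(key, commandId);
            throw e;
        }
    }
}
```

Releasing only on the failure path leaves the success path unchanged, since the locking interceptor still performs the normal release. Catching every `RuntimeException` and `Error`, rather than a specific type such as `AvailabilityException`, is what makes the sketch cover programming errors as well.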
[JBoss JIRA] (ISPN-5962) Implementation of RpcManagerImpl.invokeRemotely causes high CPU usage
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-5962?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-5962:
------------------------------
Fix Version/s: 8.1.0.Final
> Implementation of RpcManagerImpl.invokeRemotely causes high CPU usage
> ---------------------------------------------------------------------
>
> Key: ISPN-5962
> URL: https://issues.jboss.org/browse/ISPN-5962
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 8.1.0.Beta1
> Reporter: Sanne Grinovero
> Assignee: Pedro Ruivo
> Fix For: 8.1.0.Final
>
> Attachments: CompletableFutureBenchmarks.java
>
>
> The method {{org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(Collection<Address>, ReplicableCommand, RpcOptions)}} is implemented by blocking on a {{CompletableFuture}}, but the waiting thread then stalls in {{java.util.concurrent.CompletableFuture.waitingGet(boolean)}}, spending a significant amount of CPU time spinning.
> When implementing RPC calls that have to wait for remote operations, spinning is probably not a good idea. Could we implement this in a way that hints towards a more pessimistic lock, i.e. one that blocks instead of spinning?
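A minimal sketch of one way to wait for an RPC response without busy-spinning. It registers a completion callback that counts down a `CountDownLatch`, then blocks on the latch, which parks the waiting thread until the future completes. It only illustrates the blocking-wait idea. It is not how `RpcManagerImpl` was actually changed, and the `BlockingWait.awaitBlocking` helper and its timeout parameters are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: waits for a future by parking on a latch.
final class BlockingWait {
    private BlockingWait() {}

    static <T> T awaitBlocking(CompletableFuture<T> future, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        CountDownLatch done = new CountDownLatch(1);
        // Runs when the responses arrive, or immediately if the future is already complete.
        future.whenComplete((value, error) -> done.countDown());
        // Blocks the calling thread until the count reaches zero or the timeout elapses.
        if (!done.await(timeout, unit)) {
            throw new TimeoutException("No response within " + timeout + " " + unit);
        }
        // The future is complete here, so get() returns (or throws) without waiting.
        return future.get();
    }
}
```

Parking trades a little wake-up latency for not burning CPU while the remote nodes are still processing the command.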
[JBoss JIRA] (ISPN-5493) SiteProviderTopologyChangeTest.testXSiteSTDuringLeave random failures
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-5493?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-5493:
------------------------------
Fix Version/s: (was: 8.1.0.Final)
> SiteProviderTopologyChangeTest.testXSiteSTDuringLeave random failures
> ---------------------------------------------------------------------
>
> Key: ISPN-5493
> URL: https://issues.jboss.org/browse/ISPN-5493
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Cross-Site Replication, Test Suite - Core
> Affects Versions: 7.2.1.Final
> Reporter: Dan Berindei
> Assignee: Pedro Ruivo
> Priority: Blocker
> Labels: testsuite_failure
> Attachments: SiteProviderTopologyChangeTest.log.gz
>
>
> It looks like the node is killed before the {{XSiteStateTransferControlCommand(START_SEND)}} command has been replicated to all the nodes in the source cluster, and the resulting {{SuspectException}} stops the state push:
> {noformat}
> 23:33:22,834 DEBUG (testng-SiteProviderTopologyChangeTest:) [XSiteAdminOperations] Unable to pushState to 'NYC'.java.lang.Exception: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: Suspected member: SiteProviderTopologyChangeTest-NodeBG-25313
> at org.infinispan.xsite.statetransfer.XSiteStateTransferManagerImpl.startPushState(XSiteStateTransferManagerImpl.java:151)
> at org.infinispan.xsite.XSiteAdminOperations.pushState(XSiteAdminOperations.java:238)
> at org.infinispan.xsite.statetransfer.failures.AbstractTopologyChangeTest.startStateTransfer(AbstractTopologyChangeTest.java:144)
> at org.infinispan.xsite.statetransfer.failures.SiteProviderTopologyChangeTest.doXSiteStateTransferDuringTopologyChange(SiteProviderTopologyChangeTest.java:241)
> at org.infinispan.xsite.statetransfer.failures.SiteProviderTopologyChangeTest.testXSiteSTDuringLeave(SiteProviderTopologyChangeTest.java:78)
> Caused by: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: Suspected member: SiteProviderTopologyChangeTest-NodeBG-25313
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at org.infinispan.commons.util.concurrent.NotifyingFutureImpl.get(NotifyingFutureImpl.java:77)
> at org.infinispan.xsite.statetransfer.XSiteStateTransferManagerImpl.invokeRemotelyInLocalSite(XSiteStateTransferManagerImpl.java:376)
> at org.infinispan.xsite.statetransfer.XSiteStateTransferManagerImpl.controlStateTransferOnLocalSite(XSiteStateTransferManagerImpl.java:335)
> at org.infinispan.xsite.statetransfer.XSiteStateTransferManagerImpl.startPushState(XSiteStateTransferManagerImpl.java:142)
> ... 24 more
> Caused by: org.infinispan.remoting.transport.jgroups.SuspectException: Suspected member: SiteProviderTopologyChangeTest-NodeBG-25313
> at org.infinispan.remoting.transport.AbstractTransport.parseResponseAndAddToResponseList(AbstractTransport.java:74)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:586)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:287)
> at org.infinispan.remoting.rpc.RpcManagerImpl$2.call(RpcManagerImpl.java:382)
> at org.infinispan.remoting.rpc.RpcManagerImpl$2.call(RpcManagerImpl.java:378)
> ... 4 more
> {noformat}
[JBoss JIRA] (ISPN-5021) Nodes that finish the rebalance later can see outdated values
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-5021?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-5021:
------------------------------
Fix Version/s: (was: 8.1.0.Final)
> Nodes that finish the rebalance later can see outdated values
> -------------------------------------------------------------
>
> Key: ISPN-5021
> URL: https://issues.jboss.org/browse/ISPN-5021
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.2.Final
> Reporter: Dan Berindei
> Assignee: Pedro Ruivo
> Priority: Critical
>
> Copied from [ISPN-4444|https://issues.jboss.org/browse/ISPN-4444?focusedCommentId=1302...]
> If the CH_UPDATE command is delayed on the old owner, the new owners might update the key without the old owner knowing, and a locality check on the old owner won't help.
> One thing that struck me when reading about the Raft algorithm was that it installs configuration changes symmetrically, in 3 phases. We might need to do the same for our rebalance: start a rebalance with read_ch=old, write_ch=old+new; when the new owners have all the data, install read_ch=new, write_ch=old+new; and finally install read_ch=new, write_ch=new. For this to work, old cache entries are removed during the 2nd topology update, and further writes to them should be ignored.
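The three-phase rebalance proposed above can be sketched as a sequence of topologies, each pairing a read consistent hash with a write consistent hash for one key. The `Topology` and `RebalancePhases` names are hypothetical, not Infinispan's `CacheTopology` API. Owners are modeled as plain sets of node names, and the removal of old entries in the 2nd phase is not modeled. The check encodes one assumed invariant: two nodes that are one topology apart stay consistent only if each side's writes reach every owner the other side reads from.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: owners of a single key in the read and write consistent hashes.
final class Topology {
    final String name;
    final Set<String> readOwners;
    final Set<String> writeOwners;

    Topology(String name, Set<String> readOwners, Set<String> writeOwners) {
        this.name = name;
        this.readOwners = readOwners;
        this.writeOwners = writeOwners;
    }
}

final class RebalancePhases {
    private RebalancePhases() {}

    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    // The three phases from the comment, for one key moving from oldOwners to newOwners.
    static List<Topology> threePhase(Set<String> oldOwners, Set<String> newOwners) {
        Set<String> both = union(oldOwners, newOwners);
        return List.of(
            new Topology("start rebalance", oldOwners, both),
            new Topology("new owners have all data", newOwners, both),
            new Topology("rebalance done", newOwners, newOwners));
    }

    // Each side's writes must reach every owner the other side reads from.
    static boolean compatible(Topology a, Topology b) {
        return a.writeOwners.containsAll(b.readOwners)
            && b.writeOwners.containsAll(a.readOwners);
    }

    static boolean allAdjacentCompatible(List<Topology> phases) {
        for (int i = 0; i + 1 < phases.size(); i++) {
            if (!compatible(phases.get(i), phases.get(i + 1))) {
                return false;
            }
        }
        return true;
    }
}
```

With a two-phase scheme that jumps straight from read_ch=old, write_ch=old+new to read_ch=new, write_ch=new, `compatible` fails whenever an old owner is dropped. A node still on the first topology reads from an old owner that no longer receives writes, which resembles the delayed CH_UPDATE scenario described above.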
[JBoss JIRA] (ISPN-5885) NonTotalOrderPerCacheInvocationHandlerImpl should release locks
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-5885?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-5885:
------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/3874
> NonTotalOrderPerCacheInvocationHandlerImpl should release locks
> ---------------------------------------------------------------
>
> Key: ISPN-5885
> URL: https://issues.jboss.org/browse/ISPN-5885
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 8.0.1.Final, 8.1.0.Alpha2
> Reporter: Dan Berindei
> Assignee: Pedro Ruivo
> Labels: testsuite_stability
> Fix For: 8.1.0.Final, 8.0.3.Final
>
> Attachments: ThreeNodesReplicatedSplitAndMergeTest.log.zip