[JBoss JIRA] (ISPN-4998) Infinispan 7 MapReduce giving inconsistent results
by Vijay Bhoomireddy (JIRA)
[ https://issues.jboss.org/browse/ISPN-4998?page=com.atlassian.jira.plugin.... ]
Vijay Bhoomireddy commented on ISPN-4998:
-----------------------------------------
Can anyone please enlighten me on what would be timeline in general to look into a high priority major / blocker issue?
> Infinispan 7 MapReduce giving inconsistent results
> --------------------------------------------------
>
> Key: ISPN-4998
> URL: https://issues.jboss.org/browse/ISPN-4998
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Execution and Map/Reduce
> Affects Versions: 7.0.0.Final
> Reporter: Vijay Bhoomireddy
> Priority: Blocker
>
> Hi,
> We are using Infinispan Map Reduce for processing our datasets that are spread across nodes in the cluster. We are seeing some surprising results when Infinispan 7 is used. To provide the context, our input data contains 10965 records. When Map Reduce from Infinispan 6.0.2 is used, both Mapper and Reducer are seeing 10965 records and are processing the same. However, with the same code and input data, Infinispan 7 MR gives different results for different invocations of the program. For one run, it gave output as 10902 records and the next run it gave 10872 records. Results seem inconsistent across invocations.
> Not sure is this is an issue with the version7 of the framework. Any help would be greatly appreciated.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 4 months
[JBoss JIRA] (ISPN-4998) Infinispan 7 MapReduce giving inconsistent results
by Vijay Bhoomireddy (JIRA)
[ https://issues.jboss.org/browse/ISPN-4998?page=com.atlassian.jira.plugin.... ]
Vijay Bhoomireddy updated ISPN-4998:
------------------------------------
Priority: Blocker (was: Major)
> Infinispan 7 MapReduce giving inconsistent results
> --------------------------------------------------
>
> Key: ISPN-4998
> URL: https://issues.jboss.org/browse/ISPN-4998
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Execution and Map/Reduce
> Affects Versions: 7.0.0.Final
> Reporter: Vijay Bhoomireddy
> Priority: Blocker
>
> Hi,
> We are using Infinispan Map Reduce for processing our datasets that are spread across nodes in the cluster. We are seeing some surprising results when Infinispan 7 is used. To provide the context, our input data contains 10965 records. When Map Reduce from Infinispan 6.0.2 is used, both Mapper and Reducer are seeing 10965 records and are processing the same. However, with the same code and input data, Infinispan 7 MR gives different results for different invocations of the program. For one run, it gave output as 10902 records and the next run it gave 10872 records. Results seem inconsistent across invocations.
> Not sure is this is an issue with the version7 of the framework. Any help would be greatly appreciated.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 4 months
[JBoss JIRA] (ISPN-5007) Enhance the distribution script to detect missing artifacts
by Ion Savin (JIRA)
[ https://issues.jboss.org/browse/ISPN-5007?page=com.atlassian.jira.plugin.... ]
Ion Savin updated ISPN-5007:
----------------------------
Status: Open (was: New)
> Enhance the distribution script to detect missing artifacts
> -----------------------------------------------------------
>
> Key: ISPN-5007
> URL: https://issues.jboss.org/browse/ISPN-5007
> Project: Infinispan
> Issue Type: Feature Request
> Components: Build process
> Affects Versions: 7.0.2.Final
> Reporter: Ion Savin
> Assignee: Ion Savin
>
> The build should fail if the maven artifacts to be copied into the distribution jar are not found.
> To reproduce the issue apply this patch (the build will succeed but the distribution .zip will not contain the uberjars):
> {noformat}
> diff --git a/all/pom.xml b/all/pom.xml
> index cecd6c6..779ed97 100644
> --- a/all/pom.xml
> +++ b/all/pom.xml
> @@ -25,7 +25,7 @@
> <executions>
> <execution>
> <id>mrproper</id>
> - <phase>prepare-package</phase>
> + <phase>initialize</phase>
> <goals>
> <goal>clean</goal>
> </goals>
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 4 months
[JBoss JIRA] (ISPN-5007) Enhance the distribution script to detect missing artifacts
by Ion Savin (JIRA)
[ https://issues.jboss.org/browse/ISPN-5007?page=com.atlassian.jira.plugin.... ]
Ion Savin updated ISPN-5007:
----------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/3094
> Enhance the distribution script to detect missing artifacts
> -----------------------------------------------------------
>
> Key: ISPN-5007
> URL: https://issues.jboss.org/browse/ISPN-5007
> Project: Infinispan
> Issue Type: Feature Request
> Components: Build process
> Affects Versions: 7.0.2.Final
> Reporter: Ion Savin
> Assignee: Ion Savin
>
> The build should fail if the maven artifacts to be copied into the distribution jar are not found.
> To reproduce the issue apply this patch (the build will succeed but the distribution .zip will not contain the uberjars):
> {noformat}
> diff --git a/all/pom.xml b/all/pom.xml
> index cecd6c6..779ed97 100644
> --- a/all/pom.xml
> +++ b/all/pom.xml
> @@ -25,7 +25,7 @@
> <executions>
> <execution>
> <id>mrproper</id>
> - <phase>prepare-package</phase>
> + <phase>initialize</phase>
> <goals>
> <goal>clean</goal>
> </goals>
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 4 months
[JBoss JIRA] (ISPN-5007) Enhance the distribution script to detect missing artifacts
by Ion Savin (JIRA)
Ion Savin created ISPN-5007:
-------------------------------
Summary: Enhance the distribution script to detect missing artifacts
Key: ISPN-5007
URL: https://issues.jboss.org/browse/ISPN-5007
Project: Infinispan
Issue Type: Feature Request
Components: Build process
Affects Versions: 7.0.2.Final
Reporter: Ion Savin
Assignee: Ion Savin
The build should fail if the maven artifacts to be copied into the distribution jar are not found.
To reproduce the issue apply this patch (the build will succeed but the distribution .zip will not contain the uberjars):
{noformat}
diff --git a/all/pom.xml b/all/pom.xml
index cecd6c6..779ed97 100644
--- a/all/pom.xml
+++ b/all/pom.xml
@@ -25,7 +25,7 @@
<executions>
<execution>
<id>mrproper</id>
- <phase>prepare-package</phase>
+ <phase>initialize</phase>
<goals>
<goal>clean</goal>
</goals>
{noformat}
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 4 months
[JBoss JIRA] (ISPN-4995) ClusteredGet served for non-member of CH
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-4995?page=com.atlassian.jira.plugin.... ]
Radim Vansa edited comment on ISPN-4995 at 11/21/14 7:07 PM:
-------------------------------------------------------------
You're right, my argumentation was flawed. I was also mixing PRAM and causality. We need to redefine the 'all processes see updates from one process in the same order as this has executed them' with version for non-atomic updates. My request is this:
* before update is finished, other processes can see both old and new value
* after the update is finished, other processes can see only the new value
* update can start only after previous update was finished
If you'd rather promote eventually consistent system, there should be a way how to find out whether given entry is convergent. Or we could define that each entry converges within some timeout.
Any documentation about cache consistency is missing.
was (Author: rvansa):
You're right, my argumentation was flawed. I was also mixing PRAM and causality. We need to redefine the 'all processes see updates from one process in the same order as this has executed them' with version for non-atomic updates. My request is this:
* before update is finished, other processes can see both old and new value
* after the update is finished, other processes can see only the new value
* update can start only after previous update was finished
If you'd rather promote eventually consistent system, there should be a way how to find out whether given entry is convergent. Or we could define that each entry converges within some timeout.
I think that the documentation about cache consistency is missing.
> ClusteredGet served for non-member of CH
> ----------------------------------------
>
> Key: ISPN-4995
> URL: https://issues.jboss.org/browse/ISPN-4995
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Reporter: Radim Vansa
> Priority: Critical
>
> When nodes accept ClusteredGetCommand from node that is not member of CH, it can happen that when one thread does
> {code}
> put(K1, V1);
> put(K2, V2)
> {code}
> and another gets
> {code}
> get(K2) -> V2
> get(K1) -> V0 (some old value)
> {code}
> edg-perf01, 02 and 03 share this view and topology:
> {code}
> 04:40:08,714 TRACE [org.jgroups.protocols.FD_SOCK] (INT-8,edg-perf01-63779) edg-perf01-63779: i-have-sock: edg-perf02-45117 --> 172.18.1.3:37476 (cache is {edg-perf01-63779=172.18.1.1:40099, edg-perf02-45117=172.18.1.3:37476})
> 04:40:08,715 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p2-t6) Received new cluster view: 8, isCoordinator = true, becameCoordinator = false
> 04:40:11,203 DEBUG [org.infinispan.topology.LocalTopologyManagerImpl] (transport-thread--p2-t1) Updating local consistent hash(es) for cache testCache: new topology = CacheTopology{id=16, rebalanceId=4, currentC
> H=DefaultConsistentHash{ns = 512, owners = (3)[edg-perf02-45117: 171+170, edg-perf03-6264: 171+171, edg-perf01-63779: 170+171]}, pendingCH=null, unionCH=null, actualMembers=[edg-perf02-45117, edg-perf03-6264, edg-perf01-63779]}
> {code}
> Later, edg-perf02 and edg-perf03 get new view and install a new topology, where edg-perf01 does not exist:
> {code}
> 04:41:13,681 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,edg-perf03-6264) ISPN000093: Received new, MERGED cluster view for channel default: MergeView::[edg-perf02-45117|9] (3) [edg-perf02-45117, edg-perf03-6264, edg-perf04-10989], 1 subgroups: [edg-perf04-10989|7] (1) [edg-perf04-10989]
> 04:41:13,681 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p2-t22) Received new cluster view: 9, isCoordinator = false, becameCoordinator = false
> 04:41:13,760 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (remote-thread--p3-t32) Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=testCache, type=CH_UPDATE, sender=edg-perf02-45117, joinInfo=null, topologyId=18, rebalanceId=4, currentCH=DefaultConsistentHash{ns = 512, owners = (2)[edg-perf02-45117: 256+85, edg-perf03-6264: 256+86]}, pendingCH=null, availabilityMode=AVAILABLE, actualMembers=[edg-perf02-45117, edg-perf03-6264], throwable=null, viewId=9}[sender=edg-perf02-45117]
> {code}
> After that, edg-perf04 writes to {{key_00000000000020DB}} which is currently owned only by edg-perf03 - this key servers as K1 in example above. It is not backed up to edg-perf01, but edg-perf01 still thinks it's an owner of this key as it did not get any new view (this is a log from edg-perf03) :
> {code}
> 04:41:30,884 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (remote-thread--p3-t45) edg-perf03-6264 invoking PutKeyValueCommand{key=key_00000000000020DB, value=[33 #4: 0, 169, 284, 634, ], flags=[SKIP_CACHE_LOAD, SKIP_REMOTE_LOOKUP], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} to recipient list [edg-perf03-6264] with options RpcOptions{timeout=60000, unit=MILLISECONDS, fifoOrder=true, totalOrder=false, responseFilter=null, responseMode=SYNCHRONOUS, skipReplicationQueue=false}
> {code}
> Later, edg-perf04 writes to another key {{stressor_33}} (K2 in the example) value with operationId=650 (previous value is 600) which is replicated to edg-perf02 and edg-perf03.
> Now a merge view with all 4 nodes is installed:
> {code}
> 04:41:31,258 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,edg-perf01-63779) ISPN000093: Received new, MERGED cluster view for channel default: MergeView::[edg-perf01-63779|10] (4) [edg-perf01-63779, edg-perf03-6264, edg-perf02-45117, edg-perf04-10989], 6 subgroups: [edg-perf02-45117|7] (2) [edg-perf02-45117, edg-perf03-6264], [edg-perf01-63779|4] (2) [edg-perf01-63779, edg-perf02-45117], [edg-perf02-45117|9] (3) [edg-perf02-45117, edg-perf03-6264, edg-perf04-10989], [edg-perf03-6264|4] (2) [edg-perf03-6264, edg-perf04-10989], [edg-perf01-63779|8] (3) [edg-perf01-63779, edg-perf02-45117, edg-perf03-6264], [edg-perf01-63779|6] (1) [edg-perf01-63779]
> 04:41:31,258 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p2-t2) Received new cluster view: 10, isCoordinator = true, becameCoordinator = false
> {code}
> edg-perf01 now issues a remote get to edg-perf02 for key stressor_33 and receives the correct answer (operationId=650):
> {code}
> 04:41:32,494 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (BackgroundOps-Checker-1) Response(s) to ClusteredGetCommand{key=stressor_33, flags=null} is {edg-perf02-45117=SuccessfulResponse{responseValue=ImmortalCacheValue {value=LastOperation{operationId=650, seed=0000A15A4C2DD25A}}} }
> {code}
> However, when edg-perf01 reads {{key_00000000000020DB}}, it loads the old value from local data container as no CH update/rebalance happened so far:
> {code}
> 04:41:32,496 TRACE [org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl] (BackgroundOps-Checker-1) Checking availability for key=key_00000000000020DB, status=AVAILABLE
> 04:41:32,497 ERROR [org.radargun.stages.cache.background.LogChecker] (BackgroundOps-Checker-1) Missing operation 634 for thread 33 on key 8411 (key_00000000000020DB)
> 04:41:32,499 DEBUG [org.radargun.service.InfinispanDebugable] (BackgroundOps-Checker-1) Debug info for key testCache key_00000000000020DB: owners=edg-perf01-63779, edg-perf03-6264, local=true, uncertain=false, container.key_00000000000020DB=ImmortalCacheEntry[key=key_00000000000020DB, value=[33 #3: 0, 169, 284, ], created=-1, isCreated=false, lastUsed=-1, isChanged=false, expires=-1, isExpired=false, canExpire=false, isEvicted=true, isRemoved=false, isValid=false, lifespan=-1, maxIdle=-1], segmentId=173
> {code}
> Note that this was found on branch https://github.com/infinispan/infinispan/pull/3062/files trying to fix ISPN-4949.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 4 months
[JBoss JIRA] (ISPN-4995) ClusteredGet served for non-member of CH
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-4995?page=com.atlassian.jira.plugin.... ]
Radim Vansa edited comment on ISPN-4995 at 11/21/14 7:07 PM:
-------------------------------------------------------------
You're right, my argumentation was flawed. I was also mixing PRAM and causality. We need to redefine the 'all processes see updates from one process in the same order as this has executed them' with version for non-atomic updates. My request is this:
* before update is finished, other processes can see both old and new value
* after the update is finished, other processes can see only the new value
* update can start only after previous update was finished
If you'd rather promote eventually consistent system, there should be a way how to find out whether given entry is convergent. Or we could define that each entry converges within some timeout.
I think that the documentation about cache consistency is missing.
was (Author: rvansa):
You're right, my argumentation was flawed. I was also mixing PRAM and causality. We need to redefine the 'all processes see updates from one process in the same order as this has executed them' with version for non-atomic updates. My request is this:
* before update is finished, other processes can see both old and new value
* after the update is finished, other processes can see only the new value
* update can start only after previous update was finished
If you'd rather promote eventually consistent system, there should be a way how to find out whether given entry is convergent.
> ClusteredGet served for non-member of CH
> ----------------------------------------
>
> Key: ISPN-4995
> URL: https://issues.jboss.org/browse/ISPN-4995
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Reporter: Radim Vansa
> Priority: Critical
>
> When nodes accept ClusteredGetCommand from node that is not member of CH, it can happen that when one thread does
> {code}
> put(K1, V1);
> put(K2, V2)
> {code}
> and another gets
> {code}
> get(K2) -> V2
> get(K1) -> V0 (some old value)
> {code}
> edg-perf01, 02 and 03 share this view and topology:
> {code}
> 04:40:08,714 TRACE [org.jgroups.protocols.FD_SOCK] (INT-8,edg-perf01-63779) edg-perf01-63779: i-have-sock: edg-perf02-45117 --> 172.18.1.3:37476 (cache is {edg-perf01-63779=172.18.1.1:40099, edg-perf02-45117=172.18.1.3:37476})
> 04:40:08,715 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p2-t6) Received new cluster view: 8, isCoordinator = true, becameCoordinator = false
> 04:40:11,203 DEBUG [org.infinispan.topology.LocalTopologyManagerImpl] (transport-thread--p2-t1) Updating local consistent hash(es) for cache testCache: new topology = CacheTopology{id=16, rebalanceId=4, currentC
> H=DefaultConsistentHash{ns = 512, owners = (3)[edg-perf02-45117: 171+170, edg-perf03-6264: 171+171, edg-perf01-63779: 170+171]}, pendingCH=null, unionCH=null, actualMembers=[edg-perf02-45117, edg-perf03-6264, edg-perf01-63779]}
> {code}
> Later, edg-perf02 and edg-perf03 get new view and install a new topology, where edg-perf01 does not exist:
> {code}
> 04:41:13,681 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,edg-perf03-6264) ISPN000093: Received new, MERGED cluster view for channel default: MergeView::[edg-perf02-45117|9] (3) [edg-perf02-45117, edg-perf03-6264, edg-perf04-10989], 1 subgroups: [edg-perf04-10989|7] (1) [edg-perf04-10989]
> 04:41:13,681 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p2-t22) Received new cluster view: 9, isCoordinator = false, becameCoordinator = false
> 04:41:13,760 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (remote-thread--p3-t32) Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=testCache, type=CH_UPDATE, sender=edg-perf02-45117, joinInfo=null, topologyId=18, rebalanceId=4, currentCH=DefaultConsistentHash{ns = 512, owners = (2)[edg-perf02-45117: 256+85, edg-perf03-6264: 256+86]}, pendingCH=null, availabilityMode=AVAILABLE, actualMembers=[edg-perf02-45117, edg-perf03-6264], throwable=null, viewId=9}[sender=edg-perf02-45117]
> {code}
> After that, edg-perf04 writes to {{key_00000000000020DB}} which is currently owned only by edg-perf03 - this key servers as K1 in example above. It is not backed up to edg-perf01, but edg-perf01 still thinks it's an owner of this key as it did not get any new view (this is a log from edg-perf03) :
> {code}
> 04:41:30,884 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (remote-thread--p3-t45) edg-perf03-6264 invoking PutKeyValueCommand{key=key_00000000000020DB, value=[33 #4: 0, 169, 284, 634, ], flags=[SKIP_CACHE_LOAD, SKIP_REMOTE_LOOKUP], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} to recipient list [edg-perf03-6264] with options RpcOptions{timeout=60000, unit=MILLISECONDS, fifoOrder=true, totalOrder=false, responseFilter=null, responseMode=SYNCHRONOUS, skipReplicationQueue=false}
> {code}
> Later, edg-perf04 writes to another key {{stressor_33}} (K2 in the example) value with operationId=650 (previous value is 600) which is replicated to edg-perf02 and edg-perf03.
> Now a merge view with all 4 nodes is installed:
> {code}
> 04:41:31,258 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,edg-perf01-63779) ISPN000093: Received new, MERGED cluster view for channel default: MergeView::[edg-perf01-63779|10] (4) [edg-perf01-63779, edg-perf03-6264, edg-perf02-45117, edg-perf04-10989], 6 subgroups: [edg-perf02-45117|7] (2) [edg-perf02-45117, edg-perf03-6264], [edg-perf01-63779|4] (2) [edg-perf01-63779, edg-perf02-45117], [edg-perf02-45117|9] (3) [edg-perf02-45117, edg-perf03-6264, edg-perf04-10989], [edg-perf03-6264|4] (2) [edg-perf03-6264, edg-perf04-10989], [edg-perf01-63779|8] (3) [edg-perf01-63779, edg-perf02-45117, edg-perf03-6264], [edg-perf01-63779|6] (1) [edg-perf01-63779]
> 04:41:31,258 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p2-t2) Received new cluster view: 10, isCoordinator = true, becameCoordinator = false
> {code}
> edg-perf01 now issues a remote get to edg-perf02 for key stressor_33 and receives the correct answer (operationId=650):
> {code}
> 04:41:32,494 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (BackgroundOps-Checker-1) Response(s) to ClusteredGetCommand{key=stressor_33, flags=null} is {edg-perf02-45117=SuccessfulResponse{responseValue=ImmortalCacheValue {value=LastOperation{operationId=650, seed=0000A15A4C2DD25A}}} }
> {code}
> However, when edg-perf01 reads {{key_00000000000020DB}}, it loads the old value from local data container as no CH update/rebalance happened so far:
> {code}
> 04:41:32,496 TRACE [org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl] (BackgroundOps-Checker-1) Checking availability for key=key_00000000000020DB, status=AVAILABLE
> 04:41:32,497 ERROR [org.radargun.stages.cache.background.LogChecker] (BackgroundOps-Checker-1) Missing operation 634 for thread 33 on key 8411 (key_00000000000020DB)
> 04:41:32,499 DEBUG [org.radargun.service.InfinispanDebugable] (BackgroundOps-Checker-1) Debug info for key testCache key_00000000000020DB: owners=edg-perf01-63779, edg-perf03-6264, local=true, uncertain=false, container.key_00000000000020DB=ImmortalCacheEntry[key=key_00000000000020DB, value=[33 #3: 0, 169, 284, ], created=-1, isCreated=false, lastUsed=-1, isChanged=false, expires=-1, isExpired=false, canExpire=false, isEvicted=true, isRemoved=false, isValid=false, lifespan=-1, maxIdle=-1], segmentId=173
> {code}
> Note that this was found on branch https://github.com/infinispan/infinispan/pull/3062/files trying to fix ISPN-4949.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 4 months
[JBoss JIRA] (ISPN-4995) ClusteredGet served for non-member of CH
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-4995?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on ISPN-4995:
-----------------------------------
You're right, my argumentation was flawed. I was also mixing PRAM and causality. We need to redefine the 'all processes see updates from one process in the same order as this has executed them' with version for non-atomic updates. My request is this:
* before update is finished, other processes can see both old and new value
* after the update is finished, other processes can see only the new value
* update can start only after previous update was finished
If you'd rather promote eventually consistent system, there should be a way how to find out whether given entry is convergent.
> ClusteredGet served for non-member of CH
> ----------------------------------------
>
> Key: ISPN-4995
> URL: https://issues.jboss.org/browse/ISPN-4995
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Reporter: Radim Vansa
> Priority: Critical
>
> When nodes accept ClusteredGetCommand from node that is not member of CH, it can happen that when one thread does
> {code}
> put(K1, V1);
> put(K2, V2)
> {code}
> and another gets
> {code}
> get(K2) -> V2
> get(K1) -> V0 (some old value)
> {code}
> edg-perf01, 02 and 03 share this view and topology:
> {code}
> 04:40:08,714 TRACE [org.jgroups.protocols.FD_SOCK] (INT-8,edg-perf01-63779) edg-perf01-63779: i-have-sock: edg-perf02-45117 --> 172.18.1.3:37476 (cache is {edg-perf01-63779=172.18.1.1:40099, edg-perf02-45117=172.18.1.3:37476})
> 04:40:08,715 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p2-t6) Received new cluster view: 8, isCoordinator = true, becameCoordinator = false
> 04:40:11,203 DEBUG [org.infinispan.topology.LocalTopologyManagerImpl] (transport-thread--p2-t1) Updating local consistent hash(es) for cache testCache: new topology = CacheTopology{id=16, rebalanceId=4, currentC
> H=DefaultConsistentHash{ns = 512, owners = (3)[edg-perf02-45117: 171+170, edg-perf03-6264: 171+171, edg-perf01-63779: 170+171]}, pendingCH=null, unionCH=null, actualMembers=[edg-perf02-45117, edg-perf03-6264, edg-perf01-63779]}
> {code}
> Later, edg-perf02 and edg-perf03 get new view and install a new topology, where edg-perf01 does not exist:
> {code}
> 04:41:13,681 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,edg-perf03-6264) ISPN000093: Received new, MERGED cluster view for channel default: MergeView::[edg-perf02-45117|9] (3) [edg-perf02-45117, edg-perf03-6264, edg-perf04-10989], 1 subgroups: [edg-perf04-10989|7] (1) [edg-perf04-10989]
> 04:41:13,681 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p2-t22) Received new cluster view: 9, isCoordinator = false, becameCoordinator = false
> 04:41:13,760 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (remote-thread--p3-t32) Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=testCache, type=CH_UPDATE, sender=edg-perf02-45117, joinInfo=null, topologyId=18, rebalanceId=4, currentCH=DefaultConsistentHash{ns = 512, owners = (2)[edg-perf02-45117: 256+85, edg-perf03-6264: 256+86]}, pendingCH=null, availabilityMode=AVAILABLE, actualMembers=[edg-perf02-45117, edg-perf03-6264], throwable=null, viewId=9}[sender=edg-perf02-45117]
> {code}
> After that, edg-perf04 writes to {{key_00000000000020DB}} which is currently owned only by edg-perf03 - this key servers as K1 in example above. It is not backed up to edg-perf01, but edg-perf01 still thinks it's an owner of this key as it did not get any new view (this is a log from edg-perf03) :
> {code}
> 04:41:30,884 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (remote-thread--p3-t45) edg-perf03-6264 invoking PutKeyValueCommand{key=key_00000000000020DB, value=[33 #4: 0, 169, 284, 634, ], flags=[SKIP_CACHE_LOAD, SKIP_REMOTE_LOOKUP], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} to recipient list [edg-perf03-6264] with options RpcOptions{timeout=60000, unit=MILLISECONDS, fifoOrder=true, totalOrder=false, responseFilter=null, responseMode=SYNCHRONOUS, skipReplicationQueue=false}
> {code}
> Later, edg-perf04 writes to another key {{stressor_33}} (K2 in the example) value with operationId=650 (previous value is 600) which is replicated to edg-perf02 and edg-perf03.
> Now a merge view with all 4 nodes is installed:
> {code}
> 04:41:31,258 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,edg-perf01-63779) ISPN000093: Received new, MERGED cluster view for channel default: MergeView::[edg-perf01-63779|10] (4) [edg-perf01-63779, edg-perf03-6264, edg-perf02-45117, edg-perf04-10989], 6 subgroups: [edg-perf02-45117|7] (2) [edg-perf02-45117, edg-perf03-6264], [edg-perf01-63779|4] (2) [edg-perf01-63779, edg-perf02-45117], [edg-perf02-45117|9] (3) [edg-perf02-45117, edg-perf03-6264, edg-perf04-10989], [edg-perf03-6264|4] (2) [edg-perf03-6264, edg-perf04-10989], [edg-perf01-63779|8] (3) [edg-perf01-63779, edg-perf02-45117, edg-perf03-6264], [edg-perf01-63779|6] (1) [edg-perf01-63779]
> 04:41:31,258 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p2-t2) Received new cluster view: 10, isCoordinator = true, becameCoordinator = false
> {code}
> edg-perf01 now issues a remote get to edg-perf02 for key stressor_33 and receives the correct answer (operationId=650):
> {code}
> 04:41:32,494 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (BackgroundOps-Checker-1) Response(s) to ClusteredGetCommand{key=stressor_33, flags=null} is {edg-perf02-45117=SuccessfulResponse{responseValue=ImmortalCacheValue {value=LastOperation{operationId=650, seed=0000A15A4C2DD25A}}} }
> {code}
> However, when edg-perf01 reads {{key_00000000000020DB}}, it loads the old value from local data container as no CH update/rebalance happened so far:
> {code}
> 04:41:32,496 TRACE [org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl] (BackgroundOps-Checker-1) Checking availability for key=key_00000000000020DB, status=AVAILABLE
> 04:41:32,497 ERROR [org.radargun.stages.cache.background.LogChecker] (BackgroundOps-Checker-1) Missing operation 634 for thread 33 on key 8411 (key_00000000000020DB)
> 04:41:32,499 DEBUG [org.radargun.service.InfinispanDebugable] (BackgroundOps-Checker-1) Debug info for key testCache key_00000000000020DB: owners=edg-perf01-63779, edg-perf03-6264, local=true, uncertain=false, container.key_00000000000020DB=ImmortalCacheEntry[key=key_00000000000020DB, value=[33 #3: 0, 169, 284, ], created=-1, isCreated=false, lastUsed=-1, isChanged=false, expires=-1, isExpired=false, canExpire=false, isEvicted=true, isRemoved=false, isValid=false, lifespan=-1, maxIdle=-1], segmentId=173
> {code}
> Note that this was found on branch https://github.com/infinispan/infinispan/pull/3062/files trying to fix ISPN-4949.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 4 months
[JBoss JIRA] (ISPN-4969) Stopping a cache will stop all KeyAffinityServices created for other caches in the cache manager
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4969?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4969:
-----------------------------------------------
Paul Ferraro <paul.ferraro(a)redhat.com> changed the Status of [bug 1163820|https://bugzilla.redhat.com/show_bug.cgi?id=1163820] from NEW to POST
> Stopping a cache will stop all KeyAffinityServices created for other caches in the cache manager
> ------------------------------------------------------------------------------------------------
>
> Key: ISPN-4969
> URL: https://issues.jboss.org/browse/ISPN-4969
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 5.2.8.Final, 6.0.2.Final, 7.0.0.Final
> Reporter: Paul Ferraro
> Assignee: Paul Ferraro
> Priority: Critical
> Fix For: 7.0.1.Final, 5.2.9.Final, 6.0.3.Final
>
>
> We've had several reports in the WildFly forums of application runtime failures following undeployment of a separate application.
> WF creates a cache instance for each web application within the same cache container. However, KeyAffinityServiceImpl registers a cache manager listener that calls stop() on a @CacheStoppedEvent. However, this event might be triggered by any cache, not necessarily the cache with to which the KeyAffinityService is associated.
> The KeyAffinityServiceImpl.handleCacheStopped(CacheStoppedEvent) should only call stop() if the event.getCacheName() equals the name of the cache to which the affinity service is associated.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 4 months