[JBoss JIRA] (ISPN-4979) CacheStatusResponse map uses too much memory
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-4979?page=com.atlassian.jira.plugin.... ]
William Burns commented on ISPN-4979:
-------------------------------------
I have added reuse of CacheTopology instances that are equal in the returned CacheStatusResponse, and in my test the two changes together reduced the overall size substantially (my small test shrank anywhere from 20 to 35 times). This optimization works best when many caches share the same number of segments and owner nodes, and/or when the cluster has a large number of nodes.
There is still a problem in a couple of tests with topology changes that needs to be investigated.
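For reference, the reuse boils down to a canonicalization ("interning") map, along the lines of the sketch below (names are illustrative, not the actual patch):
{code}
import java.util.HashMap;
import java.util.Map;

/**
 * Canonicalizes equal values so repeated occurrences share one instance.
 * The actual change applies this idea to the CacheTopology instances
 * collected into the CacheStatusResponse map; the value type is generic here.
 */
public final class Canonicalizer<T> {
   private final Map<T, T> pool = new HashMap<T, T>();

   /** Returns the first instance equal to {@code value}, registering it if absent. */
   public T intern(T value) {
      T existing = pool.get(value);
      if (existing == null) {
         pool.put(value, value);
         return value;
      }
      return existing;
   }
}
{code}
Caches that share the same segments and owners produce equal CacheTopology objects, so interning keeps just one copy per distinct topology.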
> CacheStatusResponse map uses too much memory
> --------------------------------------------
>
> Key: ISPN-4979
> URL: https://issues.jboss.org/browse/ISPN-4979
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Final
> Reporter: Dan Berindei
> Assignee: William Burns
> Priority: Critical
> Fix For: 7.1.0.Final
>
>
> When the cluster is large and there are a lot of caches, the {{CacheStatusResponse}} map on the new coordinator can get quite large. One of the problems seems to be that the addresses in {{DefaultConsistentHash}} are duplicated on serialization, so the deserialized version occupies more memory.
> We need to investigate why the objects are not "shared" by the River marshaller, and maybe work around the problem by de-duplicating the addresses in the externalizer.
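> A de-duplicating externalizer would write each distinct address once and refer to it by index afterwards, roughly like this sketch (simplified signatures; the actual DefaultConsistentHash externalizer may differ):
> {code}
> // Write side: emit a table of distinct addresses, then per-segment owner indices.
> void writeOwners(ObjectOutput out, List<Address>[] segmentOwners) throws IOException {
>    Map<Address, Integer> index = new LinkedHashMap<Address, Integer>();
>    for (List<Address> owners : segmentOwners)
>       for (Address a : owners)
>          if (!index.containsKey(a))
>             index.put(a, index.size());
>    out.writeInt(index.size());
>    for (Address a : index.keySet())
>       out.writeObject(a);              // each address is serialized exactly once
>    out.writeInt(segmentOwners.length);
>    for (List<Address> owners : segmentOwners) {
>       out.writeInt(owners.size());
>       for (Address a : owners)
>          out.writeInt(index.get(a));   // reference the table by index, not by value
>    }
> }
> {code}
> On the read side the address table is materialized first, so every segment's owner list points at the same deserialized Address instances.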
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
[JBoss JIRA] (ISPN-4983) Public API for tracking completion of Infinispan work for a given user transaction
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4983?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-4983:
------------------------------
Fix Version/s: 7.1.0.Alpha1
> Public API for tracking completion of Infinispan work for a given user transaction
> ----------------------------------------------------------------------------------
>
> Key: ISPN-4983
> URL: https://issues.jboss.org/browse/ISPN-4983
> Project: Infinispan
> Issue Type: Feature Request
> Reporter: Randall Hauch
> Labels: modeshape
> Fix For: 7.1.0.Alpha1
>
>
> When using Infinispan with user transactions, Infinispan will persist the changes to the cache store using a synchronization on the user transaction. This means the persistence operation begins when the user transaction has committed. However, components using Infinispan will likely want to know when Infinispan's work has completed for a given transaction.
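> For context, "using a synchronization" means Infinispan registers a {{javax.transaction.Synchronization}} whose {{afterCompletion}} performs the persistence. A component could register its own synchronization too, but JTA does not define the order in which synchronizations run, so that alone cannot tell you when Infinispan is done - hence this request. A minimal sketch of the mechanism:
> {code}
> javax.transaction.Transaction tx = transactionManager.getTransaction();
> tx.registerSynchronization(new javax.transaction.Synchronization() {
>    public void beforeCompletion() {
>       // runs before the transaction outcome is decided
>    }
>    public void afterCompletion(int status) {
>       // runs after commit/rollback, but possibly BEFORE Infinispan's own
>       // synchronization has finished persisting to the cache store
>    }
> });
> {code}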
> In Infinispan 6, it was possible to do this by registering a transaction listener:
> {code}
> org.infinispan.Cache cache = ...
> javax.transaction.Transaction activeTransaction = ...
> org.infinispan.transaction.TransactionTable txnTable = cache.getAdvancedCache().getComponentRegistry().getComponent(TransactionTable.class);
> org.infinispan.transaction.xa.GlobalTransaction ispnTxID = txnTable.getLocalTransaction(activeTransaction).getGlobalTransaction();
> {code}
> We'd then use the {{GlobalTransaction}} in our {{@Listener}}:
> {code}
> @Listener
> class TxnListener {
>    @TransactionCompleted
>    public void transactionCompleted( TransactionCompletedEvent event ) {
>       if ( !event.isOriginLocal() ) return;
>       GlobalTransaction eventIspnTransaction = event.getGlobalTransaction();
>       // ispnTxID is the GlobalTransaction captured in the snippet above
>       if ( eventIspnTransaction == null ||
>            ispnTxID.getId() != eventIspnTransaction.getId() ) return;
>       if ( !event.isSuccessful() ) {
>          // the transaction rolled back
>       } else {
>          // the transaction committed; Infinispan's work is done
>       }
>    }
> }
> {code}
> However, this is no longer possible in Infinispan 7 since these classes were moved to an "impl" package.
> Can we please have a public API to be notified when Infinispan has completed its work for a specific user transaction? It doesn't have to be like it was in 6, but ModeShape needs something (see MODE-2353 for details).
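> A hypothetical shape for such an API, just to make the request concrete (all names here are invented, not an existing Infinispan interface):
> {code}
> // Strawman: a public callback registered for a specific user transaction.
> public interface TransactionCompletionListener {
>    // committed == false means the transaction rolled back
>    void onInfinispanWorkCompleted(javax.transaction.Transaction tx, boolean committed);
> }
>
> // Hypothetical registration point:
> // cache.getAdvancedCache().addTransactionCompletionListener(activeTransaction, listener);
> {code}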
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
[JBoss JIRA] (ISPN-4983) Public API for tracking completion of Infinispan work for a given user transaction
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4983?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo reassigned ISPN-4983:
---------------------------------
Assignee: Pedro Ruivo
> Public API for tracking completion of Infinispan work for a given user transaction
> ----------------------------------------------------------------------------------
>
> Key: ISPN-4983
> URL: https://issues.jboss.org/browse/ISPN-4983
> Project: Infinispan
> Issue Type: Feature Request
> Reporter: Randall Hauch
> Assignee: Pedro Ruivo
> Labels: modeshape
> Fix For: 7.1.0.Alpha1
>
>
> When using Infinispan with user transactions, Infinispan will persist the changes to the cache store using a synchronization on the user transaction. This means the persistence operation begins when the user transaction has committed. However, components using Infinispan will likely want to know when Infinispan's work has completed for a given transaction.
> In Infinispan 6, it was possible to do this by registering a transaction listener:
> {code}
> org.infinispan.Cache cache = ...
> javax.transaction.Transaction activeTransaction = ...
> org.infinispan.transaction.TransactionTable txnTable = cache.getAdvancedCache().getComponentRegistry().getComponent(TransactionTable.class);
> org.infinispan.transaction.xa.GlobalTransaction ispnTxID = txnTable.getLocalTransaction(activeTransaction).getGlobalTransaction();
> {code}
> We'd then use the {{GlobalTransaction}} in our {{@Listener}}:
> {code}
> @Listener
> class TxnListener {
>    @TransactionCompleted
>    public void transactionCompleted( TransactionCompletedEvent event ) {
>       if ( !event.isOriginLocal() ) return;
>       GlobalTransaction eventIspnTransaction = event.getGlobalTransaction();
>       // ispnTxID is the GlobalTransaction captured in the snippet above
>       if ( eventIspnTransaction == null ||
>            ispnTxID.getId() != eventIspnTransaction.getId() ) return;
>       if ( !event.isSuccessful() ) {
>          // the transaction rolled back
>       } else {
>          // the transaction committed; Infinispan's work is done
>       }
>    }
> }
> {code}
> However, this is no longer possible in Infinispan 7 since these classes were moved to an "impl" package.
> Can we please have a public API to be notified when Infinispan has completed its work for a specific user transaction? It doesn't have to be like it was in 6, but ModeShape needs something (see MODE-2353 for details).
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
[JBoss JIRA] (ISPN-4766) Cache can't start if coordinator leaves during join and joiner becomes the new coordinator
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4766?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4766:
-----------------------------------------------
Roman Macor <rmacor(a)redhat.com> changed the Status of [bug 1148723|https://bugzilla.redhat.com/show_bug.cgi?id=1148723] from ON_QA to VERIFIED
> Cache can't start if coordinator leaves during join and joiner becomes the new coordinator
> ------------------------------------------------------------------------------------------
>
> Key: ISPN-4766
> URL: https://issues.jboss.org/browse/ISPN-4766
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 7.0.0.Beta2
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Labels: testsuite_stability
> Fix For: 7.0.0.CR1
>
>
> When the joiner becomes the coordinator, it tries to recover the current cache topologies, but it receives just one expected member and no current topology. This causes an NPE in ClusterCacheStatus:
> {noformat}
> 22:51:49,547 ERROR (transport-thread-NodeB-p21124-t1:) [ClusterCacheStatus] ISPN000228: Failed to recover cache dist state after the current node became the coordinator
> java.lang.NullPointerException
> at org.infinispan.partionhandling.impl.PreferAvailabilityStrategy.onPartitionMerge(PreferAvailabilityStrategy.java:104)
> at org.infinispan.topology.ClusterCacheStatus.doMergePartitions(ClusterCacheStatus.java:452)
> at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:260)
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:180)
> at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$1.run(ClusterTopologyManagerImpl.java:427)
> {noformat}
> The LocalTopologyManagerImpl waits a bit after receiving the SuspectException and tries again, but this time it receives a {{null}} initial topology, causing another NPE:
> {noformat}
> 22:51:51,319 DEBUG (testng-GlobalKeySetTaskTest:) [LocalTopologyManagerImpl] Error sending join request for cache dist to coordinator
> java.lang.NullPointerException
> at org.infinispan.topology.LocalTopologyManagerImpl.resetLocalTopologyBeforeRebalance(LocalTopologyManagerImpl.java:222)
> at org.infinispan.topology.LocalTopologyManagerImpl.handleTopologyUpdate(LocalTopologyManagerImpl.java:191)
> at org.infinispan.topology.LocalTopologyManagerImpl.join(LocalTopologyManagerImpl.java:105)
> at org.infinispan.statetransfer.StateTransferManagerImpl.start(StateTransferManagerImpl.java:108)
> {noformat}
> This keeps going on until the state transfer timeout expires.
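> The second stack trace suggests the join path needs to tolerate a missing initial topology instead of dereferencing it, along these lines (illustrative sketch with invented names, not the actual fix):
> {code}
> // Inside the join retry loop: a null initial topology means the new coordinator
> // has not finished recovering state, so back off and retry instead of throwing an NPE.
> CacheTopology initialTopology = joinResponse.getCacheTopology();
> if (initialTopology == null) {
>    log.debugf("No topology received for cache %s, retrying join", cacheName);
>    Thread.sleep(retryBackoffMillis);
>    continue;
> }
> {code}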
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
[JBoss JIRA] (ISPN-4995) ClusteredGet served for non-member of CH
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4995?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-4995:
------------------------------------
It's not really possible to guarantee the proper ordering for get results, even without a merge: http://distributedthoughts.wordpress.com/2013/09/08/eventual-consistency/
To apply their example to our non-transactional cache, let's say we have a cluster with nodes {{ABC}}, and {{owners(k) = AB}}.
A {{put(k, v1)}} operation will update the value on node B when it receives the forwarded {{put(k, v1)}} command from A, and on node A when it receives the response from B.
If node C issues two {{get(k)}} operations in sequence, the first might receive a response from B after B has sent the {{put}} response to A, and the second might receive a response from A before A has received the {{put}} response from B - so the second read returns the older value.
In this particular scenario, we could send an OutdatedTopologyException to block the get operation on perf04 until it receives the merged topology, as you suggested on IRC. But that would make the get operation blocking, and that's against the spirit of Infinispan as a cache.
Besides, we have another level of indirection: if you consider an external client, when the cluster is split into two partitions {{ABC}} and {{D}} but D doesn't know it yet, the client could perfectly well read the old value of K1 from D and then the new value of K2 from A. So I'd rather not try to fix this.
I'm more concerned that perf04 reports that it successfully updated the value of both keys when both owners see a JGroups view in which it is not a member. How are you simulating network partitions? I remember JGroups has to allow unicast messages to non-members, otherwise GMS and MERGE2/3 wouldn't work, but I think we could filter those out - unless it turns out to be too expensive.
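Filtering would amount to checking the sender against the members we currently see before serving the command, something like this sketch (hand-written illustration, not actual Infinispan code):
{code}
// On the receiving node, before executing a remote get:
// reject senders that are not in the currently installed view.
boolean accept(Address sender) {
   List<Address> currentMembers = transport.getMembers();
   if (!currentMembers.contains(sender)) {
      // As far as we know the sender is in another partition; make it retry
      // after it installs the merged view instead of serving a possibly stale read.
      throw new SuspectException("Sender " + sender + " is not in view " + currentMembers);
   }
   return true;
}
{code}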
> ClusteredGet served for non-member of CH
> ----------------------------------------
>
> Key: ISPN-4995
> URL: https://issues.jboss.org/browse/ISPN-4995
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Reporter: Radim Vansa
> Priority: Critical
>
> When nodes accept a ClusteredGetCommand from a node that is not a member of the CH, it can happen that one thread does
> {code}
> put(K1, V1);
> put(K2, V2);
> {code}
> while another thread reads the updates out of order:
> {code}
> get(K2) -> V2
> get(K1) -> V0 (some old value)
> {code}
> edg-perf01, 02 and 03 share this view and topology:
> {code}
> 04:40:08,714 TRACE [org.jgroups.protocols.FD_SOCK] (INT-8,edg-perf01-63779) edg-perf01-63779: i-have-sock: edg-perf02-45117 --> 172.18.1.3:37476 (cache is {edg-perf01-63779=172.18.1.1:40099, edg-perf02-45117=172.18.1.3:37476})
> 04:40:08,715 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p2-t6) Received new cluster view: 8, isCoordinator = true, becameCoordinator = false
> 04:40:11,203 DEBUG [org.infinispan.topology.LocalTopologyManagerImpl] (transport-thread--p2-t1) Updating local consistent hash(es) for cache testCache: new topology = CacheTopology{id=16, rebalanceId=4, currentC
> H=DefaultConsistentHash{ns = 512, owners = (3)[edg-perf02-45117: 171+170, edg-perf03-6264: 171+171, edg-perf01-63779: 170+171]}, pendingCH=null, unionCH=null, actualMembers=[edg-perf02-45117, edg-perf03-6264, edg-perf01-63779]}
> {code}
> Later, edg-perf02 and edg-perf03 get a new view and install a new topology that no longer includes edg-perf01:
> {code}
> 04:41:13,681 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,edg-perf03-6264) ISPN000093: Received new, MERGED cluster view for channel default: MergeView::[edg-perf02-45117|9] (3) [edg-perf02-45117, edg-perf03-6264, edg-perf04-10989], 1 subgroups: [edg-perf04-10989|7] (1) [edg-perf04-10989]
> 04:41:13,681 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p2-t22) Received new cluster view: 9, isCoordinator = false, becameCoordinator = false
> 04:41:13,760 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (remote-thread--p3-t32) Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=testCache, type=CH_UPDATE, sender=edg-perf02-45117, joinInfo=null, topologyId=18, rebalanceId=4, currentCH=DefaultConsistentHash{ns = 512, owners = (2)[edg-perf02-45117: 256+85, edg-perf03-6264: 256+86]}, pendingCH=null, availabilityMode=AVAILABLE, actualMembers=[edg-perf02-45117, edg-perf03-6264], throwable=null, viewId=9}[sender=edg-perf02-45117]
> {code}
> After that, edg-perf04 writes to {{key_00000000000020DB}}, which is currently owned only by edg-perf03 - this key serves as K1 in the example above. The write is not backed up to edg-perf01, but edg-perf01 still thinks it is an owner of this key, as it did not get any new view (this is a log from edg-perf03):
> {code}
> 04:41:30,884 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (remote-thread--p3-t45) edg-perf03-6264 invoking PutKeyValueCommand{key=key_00000000000020DB, value=[33 #4: 0, 169, 284, 634, ], flags=[SKIP_CACHE_LOAD, SKIP_REMOTE_LOOKUP], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} to recipient list [edg-perf03-6264] with options RpcOptions{timeout=60000, unit=MILLISECONDS, fifoOrder=true, totalOrder=false, responseFilter=null, responseMode=SYNCHRONOUS, skipReplicationQueue=false}
> {code}
> Later, edg-perf04 writes to another key, {{stressor_33}} (K2 in the example), a value with operationId=650 (the previous value has operationId=600), and this write is replicated to edg-perf02 and edg-perf03.
> Now a merge view with all 4 nodes is installed:
> {code}
> 04:41:31,258 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,edg-perf01-63779) ISPN000093: Received new, MERGED cluster view for channel default: MergeView::[edg-perf01-63779|10] (4) [edg-perf01-63779, edg-perf03-6264, edg-perf02-45117, edg-perf04-10989], 6 subgroups: [edg-perf02-45117|7] (2) [edg-perf02-45117, edg-perf03-6264], [edg-perf01-63779|4] (2) [edg-perf01-63779, edg-perf02-45117], [edg-perf02-45117|9] (3) [edg-perf02-45117, edg-perf03-6264, edg-perf04-10989], [edg-perf03-6264|4] (2) [edg-perf03-6264, edg-perf04-10989], [edg-perf01-63779|8] (3) [edg-perf01-63779, edg-perf02-45117, edg-perf03-6264], [edg-perf01-63779|6] (1) [edg-perf01-63779]
> 04:41:31,258 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p2-t2) Received new cluster view: 10, isCoordinator = true, becameCoordinator = false
> {code}
> edg-perf01 now issues a remote get to edg-perf02 for key {{stressor_33}} and receives the correct answer (operationId=650):
> {code}
> 04:41:32,494 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (BackgroundOps-Checker-1) Response(s) to ClusteredGetCommand{key=stressor_33, flags=null} is {edg-perf02-45117=SuccessfulResponse{responseValue=ImmortalCacheValue {value=LastOperation{operationId=650, seed=0000A15A4C2DD25A}}} }
> {code}
> However, when edg-perf01 reads {{key_00000000000020DB}}, it loads the old value from its local data container, since no CH update/rebalance has happened so far:
> {code}
> 04:41:32,496 TRACE [org.infinispan.partitionhandling.impl.PartitionHandlingManagerImpl] (BackgroundOps-Checker-1) Checking availability for key=key_00000000000020DB, status=AVAILABLE
> 04:41:32,497 ERROR [org.radargun.stages.cache.background.LogChecker] (BackgroundOps-Checker-1) Missing operation 634 for thread 33 on key 8411 (key_00000000000020DB)
> 04:41:32,499 DEBUG [org.radargun.service.InfinispanDebugable] (BackgroundOps-Checker-1) Debug info for key testCache key_00000000000020DB: owners=edg-perf01-63779, edg-perf03-6264, local=true, uncertain=false, container.key_00000000000020DB=ImmortalCacheEntry[key=key_00000000000020DB, value=[33 #3: 0, 169, 284, ], created=-1, isCreated=false, lastUsed=-1, isChanged=false, expires=-1, isExpired=false, canExpire=false, isEvicted=true, isRemoved=false, isValid=false, lifespan=-1, maxIdle=-1], segmentId=173
> {code}
> Note that this was found on branch https://github.com/infinispan/infinispan/pull/3062/files trying to fix ISPN-4949.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)