[JBoss JIRA] (ISPN-2772) Implement REPLICATED mode as a degenerated DISTRIBUTED mode (nowOwners>=clusterSize)
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-2772?page=com.atlassian.jira.plugin.... ]
Work on ISPN-2772 started by Adrian Nistor.
> Implement REPLICATED mode as a degenerated DISTRIBUTED mode (nowOwners>=clusterSize)
> ------------------------------------------------------------------------------------
>
> Key: ISPN-2772
> URL: https://issues.jboss.org/browse/ISPN-2772
> Project: Infinispan
> Issue Type: Feature Request
> Affects Versions: 5.2.0.Final
> Reporter: Mircea Markus
> Assignee: Adrian Nistor
> Priority: Critical
> Fix For: 5.3.0.Final
>
>
> This has already been done in the case of state transfer, where the distribution state transfer code is reused for replicated caches as well.
> The main reason behind this improvement is to simplify/reduce the code. Also there will be some additional benefits:
> - ATM in replicated mode, the JGroups coordinator always plays the role of main lock owner. The coordinator might get overwhelmed as it has to process the additional TxCompletionNotificationCommand on every transaction (direct consequence of being main lock owner). OTOH in distributed mode, the lock owner is spread between the cluster members.
> As an optimisation, on REPL mode, we can use multicasting (when on UDP) for message sending.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 9 months
[JBoss JIRA] (ISPN-2950) In distributed mode cache store data should be read through the main data owner (vs directly from the store)
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-2950?page=com.atlassian.jira.plugin.... ]
Adrian Nistor commented on ISPN-2950:
-------------------------------------
Umm, sorry, I completely misread the part about deprecating ClusterCacheLoader ... I actually read ClusterCacheLoaderInterceptor, which indeed needs a small fix for ISPN-2974. Please ignore my previous comment :)
> In distributed mode cache store data should be read through the main data owner (vs directly from the store)
> ------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-2950
> URL: https://issues.jboss.org/browse/ISPN-2950
> Project: Infinispan
> Issue Type: Feature Request
> Components: Loaders and Stores
> Reporter: Sanne Grinovero
> Assignee: Mircea Markus
> Priority: Critical
> Fix For: 5.3.0.Alpha1, 5.3.0.Final
>
>
> Dist cache with a cache store (shared or not), k owned by \{N1, N2\}. k is read on N3. What currently happens at this stage, if k is not present in N3's memory (likely unless L1 is configured), the N3's cache store is queried and data is loaded from there. This has several drawbacks:
> - the data might already be in the memory of the owner node (N1,N2) so reading it from the disk is highly inefficient. Especially for hot data: data requested from various nodes at the same time (see also mailing list discussion around lucene query performance depending on this)
> - if this is a local cache store, it might contain stale data which would be returned to the user
> - for async configured cache store this would result in dirty reads, given that a change might be in the async store's memory but not in the store at the moment when it is in read by N3. (Note that using async stores still leaves place to inconsistencies when a node leaves, e.g. because of node crashing before managing to flush the async store.)
> This JIRA is about changing the distribution mode: when asked for a specific key, a node would only touch a cache store if it is an owner of that key, otherwise would first go to the main owner of the key to read the value from there. The ClusterCacheLoader should be deprecated as well.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 9 months
[JBoss JIRA] (ISPN-2986) Intermittent failure to start new nodes during heavy write load
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-2986?page=com.atlassian.jira.plugin.... ]
Dan Berindei resolved ISPN-2986.
--------------------------------
Resolution: Partially Completed
Increasing the number of OOB threads fixed the issue. ISPN-2849 will improve the situation with a lower number of OOB threads.
> Intermittent failure to start new nodes during heavy write load
> ---------------------------------------------------------------
>
> Key: ISPN-2986
> URL: https://issues.jboss.org/browse/ISPN-2986
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache, Server
> Affects Versions: 5.2.5.Final
> Environment: 4 servers running linux 2.6.32-220.13.1.el6.x86_64 with 2x QuadCore 2.4ghz CPU's
> Gigabit ethernet, same switch.
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build 1.7.0-b147)
> Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
> Reporter: Marc Bridner
> Assignee: Tristan Tarrant
> Attachments: logs.zip, test-infinispan.xml, test-jgroups.xml, test.infinispan.zip
>
>
> When under heavy write load from a hotrod client with 64+ threads and a new node is started, the new node will sometimes fail to start, eventually giving off state transfer timeouts and finally terminating. During the time it takes it to time out (~10 minutes) the hotrod client is totally blocked.
> Setup is as follows:
> 3 servers, 1 client
> * dl380x2385, 10.64.106.21, client
> * dl380x2384, 10.64.106.20, first node
> * dl380x2383, 10.64.106.19, second node
> * dl380x2382, 10.64.106.18, third node
> 2 caches, initial state transfer off, transactions on, config is attached.
> Small app that triggers the problem is also attached.
> Steps to reproduce:
> 1. Start first node
> 2. Start client, wait for counter to reach 50000 (in client)
> 3. Start second node. 10% chance it'll fail.
> 4. Wait for counter to reach 100000 in client.
> 5. Start third node, 50% chance it'll fail.
> If it doesn't fail, terminate everything and start over.
> I realize this may be hard to reproduce, so if any more logs or tests are needed, let me know.
> I've been unable to reproduce it on a single physical machine, and it only occurs when using more than 64 client threads. Changing the ratio of writes between the caches also seems to make it not occur. I was unable to reproduce it with TRACE log level on (too slow), but if you can specify some packages that you want traces of, that might work.
> Turning transactions off makes it worse, 90% chance to fail on second node. Funny enough, disabling the concurrent GC lowers the failure rate to 10% on third node. Guessing race condition somewhere, may be similar to ISPN-2982.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 9 months
[JBoss JIRA] (ISPN-2950) In distributed mode cache store data should be read through the main data owner (vs directly from the store)
by Sanne Grinovero (JIRA)
[ https://issues.jboss.org/browse/ISPN-2950?page=com.atlassian.jira.plugin.... ]
Sanne Grinovero commented on ISPN-2950:
---------------------------------------
Cool Adrian, looking forward to it!
Mircea, I have indeed been confusing ClusterCacheLoader vs RemoteCacheLoader but I think now they actually serve the same purpose, just one is meant to load from a different cache on the same embedded CacheManager, while the other uses Hot Rod ?
Using either to workaround the performance issue is a doubtful workaround but I don't see the relation with removing/deprecating any of them.
> In distributed mode cache store data should be read through the main data owner (vs directly from the store)
> ------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-2950
> URL: https://issues.jboss.org/browse/ISPN-2950
> Project: Infinispan
> Issue Type: Feature Request
> Components: Loaders and Stores
> Reporter: Sanne Grinovero
> Assignee: Mircea Markus
> Priority: Critical
> Fix For: 5.3.0.Alpha1, 5.3.0.Final
>
>
> Dist cache with a cache store (shared or not), k owned by \{N1, N2\}. k is read on N3. What currently happens at this stage, if k is not present in N3's memory (likely unless L1 is configured), the N3's cache store is queried and data is loaded from there. This has several drawbacks:
> - the data might already be in the memory of the owner node (N1,N2) so reading it from the disk is highly inefficient. Especially for hot data: data requested from various nodes at the same time (see also mailing list discussion around lucene query performance depending on this)
> - if this is a local cache store, it might contain stale data which would be returned to the user
> - for async configured cache store this would result in dirty reads, given that a change might be in the async store's memory but not in the store at the moment when it is in read by N3. (Note that using async stores still leaves place to inconsistencies when a node leaves, e.g. because of node crashing before managing to flush the async store.)
> This JIRA is about changing the distribution mode: when asked for a specific key, a node would only touch a cache store if it is an owner of that key, otherwise would first go to the main owner of the key to read the value from there. The ClusterCacheLoader should be deprecated as well.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 9 months
[JBoss JIRA] (ISPN-2982) CLONE - State transfer does not end because some segments are erroneously reported as unreceived
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-2982?page=com.atlassian.jira.plugin.... ]
Adrian Nistor commented on ISPN-2982:
-------------------------------------
[~ittayd] Does it still happen after increasing the number of threads?
> CLONE - State transfer does not end because some segments are erroneously reported as unreceived
> ------------------------------------------------------------------------------------------------
>
> Key: ISPN-2982
> URL: https://issues.jboss.org/browse/ISPN-2982
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.2.1.Final
> Reporter: Ittay Dror
> Assignee: Adrian Nistor
> Priority: Critical
> Attachments: jgroups.xml
>
>
> Hard to reproduce. I lost the last log where this was visible but still have a stack trace:
> org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
> at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
> at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:879)
> at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:650)
> at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:639)
> at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:542)
> at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:197)
> at org.infinispan.CacheImpl.start(CacheImpl.java:517)
> at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:689)
> at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:652)
> at org.infinispan.manager.DefaultCacheManager.access$100(DefaultCacheManager.java:126)
> at org.infinispan.manager.DefaultCacheManager$1.run(DefaultCacheManager.java:574)
> Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache LuceneIndexesMetadata on PersistentStateTransferQueryDistributedIndexTest-NodeC-6067
> at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:199)
> at sun.reflect.GeneratedMethodAccessor139.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
> ... 10 more
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 9 months
[JBoss JIRA] (ISPN-2950) In distributed mode cache store data should be read through the main data owner (vs directly from the store)
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-2950?page=com.atlassian.jira.plugin.... ]
Adrian Nistor commented on ISPN-2950:
-------------------------------------
[~sannegrinovero], [~mircea.markus] The ClusterCacheLoader is also responsible for causing ISPN-2974 which should be fixed on 5.2 because it impacts ModeShape. So before deciding to remove it in 5.3 lets at least fix in 5.2 the part that is needed for ISPN-2974. I have a fix for it and I'm very close to a PR.
> In distributed mode cache store data should be read through the main data owner (vs directly from the store)
> ------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-2950
> URL: https://issues.jboss.org/browse/ISPN-2950
> Project: Infinispan
> Issue Type: Feature Request
> Components: Loaders and Stores
> Reporter: Sanne Grinovero
> Assignee: Mircea Markus
> Priority: Critical
> Fix For: 5.3.0.Alpha1, 5.3.0.Final
>
>
> Dist cache with a cache store (shared or not), k owned by \{N1, N2\}. k is read on N3. What currently happens at this stage, if k is not present in N3's memory (likely unless L1 is configured), the N3's cache store is queried and data is loaded from there. This has several drawbacks:
> - the data might already be in the memory of the owner node (N1,N2) so reading it from the disk is highly inefficient. Especially for hot data: data requested from various nodes at the same time (see also mailing list discussion around lucene query performance depending on this)
> - if this is a local cache store, it might contain stale data which would be returned to the user
> - for async configured cache store this would result in dirty reads, given that a change might be in the async store's memory but not in the store at the moment when it is in read by N3. (Note that using async stores still leaves place to inconsistencies when a node leaves, e.g. because of node crashing before managing to flush the async store.)
> This JIRA is about changing the distribution mode: when asked for a specific key, a node would only touch a cache store if it is an owner of that key, otherwise would first go to the main owner of the key to read the value from there. The ClusterCacheLoader should be deprecated as well.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 9 months