[JBoss JIRA] (ISPN-2986) Intermittent failure to start new nodes during heavy write load
by Marc Bridner (JIRA)
[ https://issues.jboss.org/browse/ISPN-2986?page=com.atlassian.jira.plugin.... ]
Marc Bridner commented on ISPN-2986:
------------------------------------
Ahh, interesting.
I'll bump it up to around 200 tomorrow, see if that fixes it, and get back to you.
> Intermittent failure to start new nodes during heavy write load
> ---------------------------------------------------------------
>
> Key: ISPN-2986
> URL: https://issues.jboss.org/browse/ISPN-2986
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache, Server
> Affects Versions: 5.2.5.Final
> Environment: 4 servers running Linux 2.6.32-220.13.1.el6.x86_64 with 2x quad-core 2.4 GHz CPUs
> Gigabit Ethernet, same switch.
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build 1.7.0-b147)
> Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
> Reporter: Marc Bridner
> Assignee: Tristan Tarrant
> Attachments: logs.zip, test-infinispan.xml, test-jgroups.xml, test.infinispan.zip
>
>
> When a new node is started while the cluster is under heavy write load from a Hot Rod client with 64+ threads, the new node will sometimes fail to start, eventually reporting state transfer timeouts and finally terminating. During the time it takes to time out (~10 minutes), the Hot Rod client is completely blocked.
> Setup is as follows:
> 3 servers, 1 client
> * dl380x2385, 10.64.106.21, client
> * dl380x2384, 10.64.106.20, first node
> * dl380x2383, 10.64.106.19, second node
> * dl380x2382, 10.64.106.18, third node
> 2 caches, initial state transfer off, transactions on, config is attached.
> Small app that triggers the problem is also attached.
> Steps to reproduce:
> 1. Start first node
> 2. Start client, wait for counter to reach 50000 (in client)
> 3. Start second node. 10% chance it'll fail.
> 4. Wait for counter to reach 100000 in client.
> 5. Start third node, 50% chance it'll fail.
> If it doesn't fail, terminate everything and start over.
> I realize this may be hard to reproduce, so if any more logs or tests are needed, let me know.
> I've been unable to reproduce it on a single physical machine, and it only occurs when using more than 64 client threads. Changing the ratio of writes between the caches also seems to prevent it. I was unable to reproduce it with the TRACE log level enabled (too slow), but if you can specify which packages you want traces of, that might work.
> Turning transactions off makes it worse: a 90% chance of failure on the second node. Oddly enough, disabling the concurrent GC lowers the failure rate to 10% on the third node. I'm guessing there's a race condition somewhere; it may be similar to ISPN-2982.
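For reference, a minimal library-mode sketch (Infinispan 5.2 schema) of the cache settings described above: initial state transfer off, transactions on. The cache name and values here are assumptions, and "initial state transfer off" is read as fetchInMemoryState="false"; the actual configuration is in the attached test-infinispan.xml.

    <infinispan xmlns="urn:infinispan:config:5.2">
       <global>
          <transport clusterName="test-cluster"/>
       </global>
       <!-- One of the two caches; the second would be configured analogously. -->
       <namedCache name="cache1">
          <clustering mode="distribution">
             <sync/>
             <!-- Assumed meaning of "initial state transfer off" in the report -->
             <stateTransfer fetchInMemoryState="false"/>
          </clustering>
          <transaction transactionMode="TRANSACTIONAL"
                       transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup"/>
       </namedCache>
    </infinispan>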
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] (ISPN-2986) Intermittent failure to start new nodes during heavy write load
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-2986?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-2986:
------------------------------------
The number of OOB threads in your JGroups configuration is too small.
While a node is joining, the other nodes will forward all commands to the joiner, but the joiner can't process those commands until it receives the cache topology from the coordinator. When there are too few OOB threads, the forwarded commands can block all of them, so the joiner is never able to process the cache topology from the coordinator, resulting in a deadlock.
With the ISPN-2808 fix, commands are moved to a "remote commands" thread pool, but that thread pool still "overflows" to the OOB thread pool, so the OOB pool still has to have an available thread for each incoming command in order to avoid the deadlock.
We are currently working on ISPN-2849, which will avoid the deadlock by not scheduling commands to the remote commands thread pool until the proper topology has been installed (thus freeing the threads for installing the topology).
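For reference, here is a minimal sketch of the JGroups transport attributes this refers to. The transport element and the numbers below are illustrative assumptions, not a recommendation for this particular setup; the real settings live in the attached test-jgroups.xml.

    <!-- Sketch only: OOB thread pool sizing on the JGroups transport (TCP shown).
         All numbers here are assumed placeholders. -->
    <TCP bind_port="7800"
         thread_pool.min_threads="20"
         thread_pool.max_threads="200"
         thread_pool.queue_enabled="true"
         oob_thread_pool.min_threads="20"
         oob_thread_pool.max_threads="200"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.rejection_policy="discard"/>

Whether these particular numbers are appropriate depends on the workload; the point is only that the OOB pool must stay large enough to leave a thread free for installing the topology.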
> Intermittent failure to start new nodes during heavy write load
> ---------------------------------------------------------------
>
> Key: ISPN-2986
> URL: https://issues.jboss.org/browse/ISPN-2986
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache, Server
> Affects Versions: 5.2.5.Final
> Environment: 4 servers running Linux 2.6.32-220.13.1.el6.x86_64 with 2x quad-core 2.4 GHz CPUs
> Gigabit Ethernet, same switch.
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build 1.7.0-b147)
> Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
> Reporter: Marc Bridner
> Assignee: Tristan Tarrant
> Attachments: logs.zip, test-infinispan.xml, test-jgroups.xml, test.infinispan.zip
>
>
> When a new node is started while the cluster is under heavy write load from a Hot Rod client with 64+ threads, the new node will sometimes fail to start, eventually reporting state transfer timeouts and finally terminating. During the time it takes to time out (~10 minutes), the Hot Rod client is completely blocked.
> Setup is as follows:
> 3 servers, 1 client
> * dl380x2385, 10.64.106.21, client
> * dl380x2384, 10.64.106.20, first node
> * dl380x2383, 10.64.106.19, second node
> * dl380x2382, 10.64.106.18, third node
> 2 caches, initial state transfer off, transactions on, config is attached.
> Small app that triggers the problem is also attached.
> Steps to reproduce:
> 1. Start first node
> 2. Start client, wait for counter to reach 50000 (in client)
> 3. Start second node. 10% chance it'll fail.
> 4. Wait for counter to reach 100000 in client.
> 5. Start third node, 50% chance it'll fail.
> If it doesn't fail, terminate everything and start over.
> I realize this may be hard to reproduce, so if any more logs or tests are needed, let me know.
> I've been unable to reproduce it on a single physical machine, and it only occurs when using more than 64 client threads. Changing the ratio of writes between the caches also seems to prevent it. I was unable to reproduce it with the TRACE log level enabled (too slow), but if you can specify which packages you want traces of, that might work.
> Turning transactions off makes it worse: a 90% chance of failure on the second node. Oddly enough, disabling the concurrent GC lowers the failure rate to 10% on the third node. I'm guessing there's a race condition somewhere; it may be similar to ISPN-2982.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] (ISPN-2986) Intermittent failure to start new nodes during heavy write load
by Marc Bridner (JIRA)
[ https://issues.jboss.org/browse/ISPN-2986?page=com.atlassian.jira.plugin.... ]
Marc Bridner updated ISPN-2986:
-------------------------------
Description:
When a new node is started while the cluster is under heavy write load from a Hot Rod client with 64+ threads, the new node will sometimes fail to start, eventually reporting state transfer timeouts and finally terminating. During the time it takes to time out (~10 minutes), the Hot Rod client is completely blocked.
Setup is as follows:
3 servers, 1 client
* dl380x2385, 10.64.106.21, client
* dl380x2384, 10.64.106.20, first node
* dl380x2383, 10.64.106.19, second node
* dl380x2382, 10.64.106.18, third node
2 caches, initial state transfer off, transactions on, config is attached.
Small app that triggers the problem is also attached.
Steps to reproduce:
1. Start first node
2. Start client, wait for counter to reach 50000 (in client)
3. Start second node. 10% chance it'll fail.
4. Wait for counter to reach 100000 in client.
5. Start third node, 50% chance it'll fail.
If it doesn't fail, terminate everything and start over.
I realize this may be hard to reproduce, so if any more logs or tests are needed, let me know.
I've been unable to reproduce it on a single physical machine, and it only occurs when using more than 64 client threads. Changing the ratio of writes between the caches also seems to prevent it. I was unable to reproduce it with the TRACE log level enabled (too slow), but if you can specify which packages you want traces of, that might work.
Turning transactions off makes it worse: a 90% chance of failure on the second node. Oddly enough, disabling the concurrent GC lowers the failure rate to 10% on the third node. I'm guessing there's a race condition somewhere; it may be similar to ISPN-2982.
was:
When a new node is started while the cluster is under heavy write load from a Hot Rod client with 64+ threads, the new node will sometimes fail to start, eventually reporting state transfer timeouts and finally terminating. During the time it takes to time out (~10 minutes), the Hot Rod client is completely blocked.
Setup is as follows:
3 servers, 1 client
* dl30x2385, 10.64.106.21, client
* dl30x2384, 10.64.106.20, first node
* dl30x2383, 10.64.106.19, second node
* dl30x2382, 10.64.106.18, third node
2 caches, initial state transfer off, transactions on, config is attached.
Small app that triggers the problem is also attached.
Steps to reproduce:
1. Start first node
2. Start client, wait for counter to reach 50000 (in client)
3. Start second node. 10% chance it'll fail.
4. Wait for counter to reach 100000 in client.
5. Start third node, 50% chance it'll fail.
If it doesn't fail, terminate everything and start over.
I realize this may be hard to reproduce, so if any more logs or tests are needed, let me know.
I've been unable to reproduce it on a single physical machine, and it only occurs when using more than 64 client threads. Changing the ratio of writes between the caches also seems to prevent it. I was unable to reproduce it with the TRACE log level enabled (too slow), but if you can specify which packages you want traces of, that might work.
Turning transactions off makes it worse: a 90% chance of failure on the second node. Oddly enough, disabling the concurrent GC lowers the failure rate to 10% on the third node. I'm guessing there's a race condition somewhere; it may be similar to ISPN-2982.
> Intermittent failure to start new nodes during heavy write load
> ---------------------------------------------------------------
>
> Key: ISPN-2986
> URL: https://issues.jboss.org/browse/ISPN-2986
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache, Server
> Affects Versions: 5.2.5.Final
> Environment: 4 servers running Linux 2.6.32-220.13.1.el6.x86_64 with 2x quad-core 2.4 GHz CPUs
> Gigabit Ethernet, same switch.
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build 1.7.0-b147)
> Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
> Reporter: Marc Bridner
> Assignee: Tristan Tarrant
> Attachments: logs.zip, test-infinispan.xml, test-jgroups.xml, test.infinispan.zip
>
>
> When a new node is started while the cluster is under heavy write load from a Hot Rod client with 64+ threads, the new node will sometimes fail to start, eventually reporting state transfer timeouts and finally terminating. During the time it takes to time out (~10 minutes), the Hot Rod client is completely blocked.
> Setup is as follows:
> 3 servers, 1 client
> * dl380x2385, 10.64.106.21, client
> * dl380x2384, 10.64.106.20, first node
> * dl380x2383, 10.64.106.19, second node
> * dl380x2382, 10.64.106.18, third node
> 2 caches, initial state transfer off, transactions on, config is attached.
> Small app that triggers the problem is also attached.
> Steps to reproduce:
> 1. Start first node
> 2. Start client, wait for counter to reach 50000 (in client)
> 3. Start second node. 10% chance it'll fail.
> 4. Wait for counter to reach 100000 in client.
> 5. Start third node, 50% chance it'll fail.
> If it doesn't fail, terminate everything and start over.
> I realize this may be hard to reproduce, so if any more logs or tests are needed, let me know.
> I've been unable to reproduce it on a single physical machine, and it only occurs when using more than 64 client threads. Changing the ratio of writes between the caches also seems to prevent it. I was unable to reproduce it with the TRACE log level enabled (too slow), but if you can specify which packages you want traces of, that might work.
> Turning transactions off makes it worse: a 90% chance of failure on the second node. Oddly enough, disabling the concurrent GC lowers the failure rate to 10% on the third node. I'm guessing there's a race condition somewhere; it may be similar to ISPN-2982.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] (ISPN-2974) DeltaAware based fine-grained replication corrupts cache data, if eviction is enabled
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-2974?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-2974:
--------------------------------
Fix Version/s: 5.3.0.Alpha1
> DeltaAware based fine-grained replication corrupts cache data, if eviction is enabled
> -------------------------------------------------------------------------------------
>
> Key: ISPN-2974
> URL: https://issues.jboss.org/browse/ISPN-2974
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 5.2.1.Final, 5.2.5.Final
> Reporter: Horia Chiorean
> Assignee: Adrian Nistor
> Priority: Critical
> Fix For: 5.3.0.Alpha1
>
>
> When using a custom {{DeltaAware}} implementation in a cluster of 2 replicated nodes with eviction enabled, a delta transferred from one node (the writer) to the other (the reader) causes the entry stored on the reader, which had been evicted at the time of the change, to be overwritten with just the latest partial delta.
> In more detail:
> * configure 2 nodes in replicated mode, with eviction enabled
> * consider NodeA the writer and NodeB the reader
> * NodeA inserts some data (custom entries) into the cache
> * NodeB correctly receives via state transfer the initial data
> * NodeA loads & partially updates some information about an entry which was not in the cache (it had been evicted previously)
> * NodeB receives the partial delta with the changes from NodeA but, *instead of merging* it with whatever is stored in the persistent store, *replaces the entire entry in the cache*, leaving it, in effect, with "partial/corrupt information"
> If eviction is not enabled, everything works as expected.
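For context, a minimal sketch (again in the 5.2 library-mode schema, with assumed names and values) of the kind of configuration under which this is reported: replicated mode, eviction enabled, and a persistent store that the delta should be merged against.

    <!-- Sketch only: replicated cache with eviction and a file-based cache store.
         Cache name, store location and limits are assumed placeholders. -->
    <namedCache name="deltaAwareCache">
       <clustering mode="replication">
          <sync/>
       </clustering>
       <eviction strategy="LRU" maxEntries="1000"/>
       <loaders passivation="false" shared="false">
          <loader class="org.infinispan.loaders.file.FileCacheStore"
                  fetchPersistentState="true" purgeOnStartup="false">
             <properties>
                <property name="location" value="/tmp/ispn-2974-store"/>
             </properties>
          </loader>
       </loaders>
    </namedCache>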
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] (ISPN-2982) CLONE - State transfer does not end because some segments are erroneously reported as unreceived
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-2982?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-2982:
------------------------------------
Ittay, unfortunately this stack trace is very generic, so we can't tell anything from it. Logs with TRACE enabled for org.infinispan.topology and org.infinispan.statetransfer would be very helpful.
The number of transport threads does seem a bit low, however. Please try with at least 6 threads (the value we use in the test suite); that might fix your problem.
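A minimal sketch of how that TRACE logging could be scoped in a log4j 1.2 log4j.xml (assuming a log4j-based setup; the appender wiring is omitted):

    <!-- Enable TRACE only for the two packages mentioned above,
         so the rest of the log stays at its normal level. -->
    <category name="org.infinispan.topology">
       <priority value="TRACE"/>
    </category>
    <category name="org.infinispan.statetransfer">
       <priority value="TRACE"/>
    </category>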
> CLONE - State transfer does not end because some segments are erroneously reported as unreceived
> ------------------------------------------------------------------------------------------------
>
> Key: ISPN-2982
> URL: https://issues.jboss.org/browse/ISPN-2982
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.2.1.Final
> Reporter: Ittay Dror
> Assignee: Adrian Nistor
> Priority: Critical
> Attachments: jgroups.xml
>
>
> Hard to reproduce. I lost the last log where this was visible but still have a stack trace:
> org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
> at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
> at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:879)
> at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:650)
> at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:639)
> at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:542)
> at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:197)
> at org.infinispan.CacheImpl.start(CacheImpl.java:517)
> at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:689)
> at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:652)
> at org.infinispan.manager.DefaultCacheManager.access$100(DefaultCacheManager.java:126)
> at org.infinispan.manager.DefaultCacheManager$1.run(DefaultCacheManager.java:574)
> Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache LuceneIndexesMetadata on PersistentStateTransferQueryDistributedIndexTest-NodeC-6067
> at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:199)
> at sun.reflect.GeneratedMethodAccessor139.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
> ... 10 more
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] (ISPN-2982) CLONE - State transfer does not end because some segments are erroneously reported as unreceived
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-2982?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-2982:
--------------------------------
Fix Version/s: (was: 5.2.0.Beta4)
Affects Version/s: 5.2.1.Final
(was: 5.2.0.Beta1)
> CLONE - State transfer does not end because some segments are erroneously reported as unreceived
> ------------------------------------------------------------------------------------------------
>
> Key: ISPN-2982
> URL: https://issues.jboss.org/browse/ISPN-2982
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.2.1.Final
> Reporter: Ittay Dror
> Assignee: Adrian Nistor
> Priority: Critical
> Attachments: jgroups.xml
>
>
> Hard to reproduce. I lost the last log where this was visible but still have a stack trace:
> org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
> at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
> at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:879)
> at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:650)
> at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:639)
> at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:542)
> at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:197)
> at org.infinispan.CacheImpl.start(CacheImpl.java:517)
> at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:689)
> at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:652)
> at org.infinispan.manager.DefaultCacheManager.access$100(DefaultCacheManager.java:126)
> at org.infinispan.manager.DefaultCacheManager$1.run(DefaultCacheManager.java:574)
> Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache LuceneIndexesMetadata on PersistentStateTransferQueryDistributedIndexTest-NodeC-6067
> at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:199)
> at sun.reflect.GeneratedMethodAccessor139.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
> ... 10 more
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira