[JBoss JIRA] (ISPN-2240) Per-key lock container leads to superfluous TimeoutExceptions on concurrent access to same key
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-2240?page=com.atlassian.jira.plugin.... ]
Dan Berindei closed ISPN-2240.
------------------------------
Resolution: Cannot Reproduce Bug
I ran the test on 6.0.x and 5.3.x and I didn't get any TimeoutExceptions.
> Per-key lock container leads to superfluous TimeoutExceptions on concurrent access to same key
> ----------------------------------------------------------------------------------------------
>
> Key: ISPN-2240
> URL: https://issues.jboss.org/browse/ISPN-2240
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public (Everyone can see)
> Components: Transactions
> Affects Versions: 4.0.0.ALPHA1, 5.1.6.FINAL
> Reporter: Robert Stupp
> Assignee: Mircea Markus
> Priority: Critical
> Fix For: 7.0.0.Beta1, 7.0.0.Final
>
> Attachments: ISPN-2240_fix_TimeoutExceptions.patch, somehow.zip
>
>
> Hi,
> I've encountered a lot of TimeoutExceptions just by running a load test against an Infinispan cluster.
> I tracked down the reason and found that the code in org.infinispan.util.concurrent.locks.containers.AbstractPerEntryLockContainer#releaseLock() causes these superfluous TimeoutExceptions.
> A small test case is attached; it prints out timeouts and too-late timeouts, and "paints" a lot of dots to the console - more dots per second on the console means better throughput ;-)
> In a short test I extended the class ReentrantPerEntryLockContainer and changed the implementation of releaseLock() as follows:
> {noformat}
> public void releaseLock(Object lockOwner, Object key) {
>     ReentrantLock l = locks.get(key);
>     if (l != null) {
>         if (!l.isHeldByCurrentThread())
>             throw new IllegalStateException("Lock for [" + key + "] not held by current thread " + Thread.currentThread());
>         while (l.isHeldByCurrentThread())
>             unlock(l, lockOwner);
>         if (!l.hasQueuedThreads())
>             locks.remove(key);
>     } else {
>         throw new IllegalStateException("No lock for [" + key + ']');
>     }
> }
> {noformat}
> The main improvement is that locks are not removed from the concurrent map as long as other threads are waiting on that lock.
> If the lock is removed from the map while other threads are still waiting for it, they may run into timeouts and surface TimeoutExceptions to the client.
> The above method "paints more dots per second" - that is, it gives better throughput for concurrent accesses to the same key.
> The re-implemented method should also fix some replication timeout exceptions.
> Please, please add this to 5.1.7, if possible.
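To make the failure mode concrete: a minimal, hypothetical sketch (class and method names are illustrative, not Infinispan code) of what can go wrong when release removes the per-key lock from the map while other threads are still queued on it - a waiter wakes up holding a lock object that is no longer in the map, while another thread installs a fresh lock for the same key, which is one way superfluous timeouts and broken mutual exclusion can arise.
{noformat}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not Infinispan code: a per-key lock map where
// release removes the lock even while other threads are queued on it.
public class StaleLockDemo {
    static final ConcurrentMap<String, ReentrantLock> locks =
            new ConcurrentHashMap<String, ReentrantLock>();

    static ReentrantLock acquire(String key) {
        ReentrantLock l = new ReentrantLock();
        ReentrantLock prev = locks.putIfAbsent(key, l);
        if (prev != null) l = prev;
        l.lock();            // a waiter can block here on a lock the owner is about to remove
        return l;
    }

    static void releaseUnsafe(String key, ReentrantLock l) {
        locks.remove(key);   // BUG: removed even if threads are queued - a waiter then
        l.unlock();          // acquires this stale lock while another thread installs
    }                        // a fresh lock for the same key: two "owners" at once
}
{noformat}
The reporter's releaseLock() above avoids exactly this by removing the lock from the map only when no threads are queued on it.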
[JBoss JIRA] (ISPN-2240) Per-key lock container leads to superfluous TimeoutExceptions on concurrent access to same key
by Robert Stupp (JIRA)
[ https://issues.jboss.org/browse/ISPN-2240?page=com.atlassian.jira.plugin.... ]
Robert Stupp commented on ISPN-2240:
------------------------------------
We don't need it at the moment - thanks.
[JBoss JIRA] (ISPN-4424) getCacheEntry is not safe
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4424?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4424:
-----------------------------------------------
Martin Gencur <mgencur@redhat.com> changed the Status of [bug 1110647|https://bugzilla.redhat.com/show_bug.cgi?id=1110647] from ON_QA to VERIFIED
> getCacheEntry is not safe
> -------------------------
>
> Key: ISPN-4424
> URL: https://issues.jboss.org/browse/ISPN-4424
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public (Everyone can see)
> Components: Remote Protocols
> Affects Versions: 6.0.2.Final, 7.0.0.Alpha4
> Reporter: Galder Zamarreño
> Assignee: Galder Zamarreño
> Fix For: 7.0.0.Alpha5, 7.0.0.Final
>
>
> A versioned update with a multi-threaded Hot Rod client results in inconsistency: some replaceWithVersion calls return true, ignoring a version update executed in another thread. Here's a log excerpt from a concurrency stress test:
> {noformat}
> 2014-06-20 16:16:56,798 INFO [PutFromNull] (pool-7-thread-10) count=462,prev=462,new=463
> 2014-06-20 16:16:56,820 INFO [PutFromNull] (pool-7-thread-9) count=463,prev=463,new=464
> 2014-06-20 16:16:56,831 INFO [PutFromNull] (pool-7-thread-2) count=464,prev=463,new=464
> 2014-06-20 16:16:56,845 INFO [PutFromNull] (pool-7-thread-9) count=465,prev=464,new=465
> {noformat}
> Here you can see two threads applying the same replacement, from 463 to 464.
> The issue appears to be the result of a race condition in the Hot Rod server's protocol decoder. When replaceIfUnmodified is received, the cache entry is retrieved to verify whether the version in the server and the version sent in the command match. However, the retrieved cache entry is mutable, and its value can change midway through this operation as a result of another thread updating it. Please find below some log snippets showing this.
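In essence this is a check-then-act race on shared mutable state. A minimal, hypothetical sketch (store layout and names are illustrative, not the actual decoder code):
{noformat}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the check-then-act race, not the actual decoder
// code: the version check and the write are two separate steps, and the
// entry is shared mutable state in between.
public class ReplaceIfUnmodifiedSketch {
    static class VersionedValue {
        volatile long version;
        volatile String value;
    }

    static final ConcurrentMap<String, VersionedValue> store =
            new ConcurrentHashMap<String, VersionedValue>();

    // Two threads that both read the same version can both return true,
    // exactly the double "success" visible in the log above.
    static boolean replaceIfUnmodified(String key, long clientVersion, String newValue) {
        VersionedValue entry = store.get(key);        // mutable view of the live entry
        if (entry == null || entry.version != clientVersion) {
            return false;                             // step 1: check
        }
        // <-- another thread can update entry.version/value right here
        entry.value = newValue;                       // step 2: act on a stale decision
        entry.version = clientVersion + 1;
        return true;
    }
}
{noformat}
An atomic fix would perform the check and the write as a single step, for example under one lock or via an atomic compare operation against an immutable snapshot of the entry.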
[JBoss JIRA] (ISPN-2240) Per-key lock container leads to superfluous TimeoutExceptions on concurrent access to same key
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-2240?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-2240:
------------------------------------
[~snazy] there seems to be a problem with the attached test: it locks a key twice, but only releases it once. ReentrantPerEntryLockContainer expects every {{lock(owner, key, timeout, unit)}} call to be paired with an {{unlock(owner, key)}} call, so the test never actually releases any locks (a usage sketch follows below).
Once I changed that, the test worked beautifully with 7.0.0.Alpha4. I haven't checked older versions yet; do you still need to work with 5.3/6.0 for some reason?
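For reference, a usage sketch of the pairing described above. The interface here only models the API named in the comment ({{lock(owner, key, timeout, unit)}} / {{unlock(owner, key)}}); the real Infinispan signatures and return types may differ.
{noformat}
import java.util.concurrent.TimeUnit;

// Hypothetical model of the container API per the comment above; the point
// is the pairing discipline: one unlock per successful lock, via try/finally.
public class PairedLockUsage {
    interface PerEntryLockContainer {
        Object lock(Object owner, Object key, long timeout, TimeUnit unit) throws InterruptedException;
        void unlock(Object owner, Object key);
    }

    static void withLock(PerEntryLockContainer c, Object owner, Object key, Runnable work)
            throws InterruptedException {
        if (c.lock(owner, key, 10, TimeUnit.SECONDS) != null) {  // bounded acquire (non-null assumed to mean success)
            try {
                work.run();
            } finally {
                c.unlock(owner, key);  // exactly one unlock per successful lock
            }
        }
    }
}
{noformat}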
[JBoss JIRA] (ISPN-4484) Outbound transfers can be cancelled by old CANCEL_STATE_TRANSFER command
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4484?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4484:
-----------------------------------------------
Dan Berindei <dberinde@redhat.com> changed the Status of [bug 1116969|https://bugzilla.redhat.com/show_bug.cgi?id=1116969] from NEW to MODIFIED
> Outbound transfers can be cancelled by old CANCEL_STATE_TRANSFER command
> ------------------------------------------------------------------------
>
> Key: ISPN-4484
> URL: https://issues.jboss.org/browse/ISPN-4484
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public (Everyone can see)
> Components: Core, State Transfer
> Affects Versions: 6.0.2.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 7.0.0.Alpha5
>
>
> This appeared during the 32-nodes elasticity test in the Hyperion environment.
> Just as apex947 left, a rebalance started, which apex948 dutifully cancelled when it became the new coordinator. apex949 had already requested segments from apex959, so it sent a StateRequestCommand(CANCEL_STATE_TRANSFER) asynchronously to apex959. Then apex948 started a new rebalance, and apex949 asked apex959 for the same segments. When apex959 finally received the cancel request, it didn't check the topology id and incorrectly cancelled the outbound transfer to apex949.
> The solution would be to verify the topology id in the CANCEL_STATE_TRANSFER command before cancelling the transfer (see the sketch below). I also think we can avoid sending the cancel command entirely in this case, and only send it when we are about to stop.
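A minimal, hypothetical sketch of the proposed guard (all names illustrative, not actual Infinispan code): a cancel request is honoured only if its topology id is at least as new as the transfer it targets.
{noformat}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: drop CANCEL_STATE_TRANSFER requests whose topology
// id is older than the outbound transfer they would cancel.
public class CancelGuardSketch {
    interface OutboundTransfer { int topologyId(); void cancel(); }
    interface Address {}

    final ConcurrentMap<Address, OutboundTransfer> transfers =
            new ConcurrentHashMap<Address, OutboundTransfer>();

    void onCancelStateTransfer(Address requestor, int requestTopologyId) {
        OutboundTransfer transfer = transfers.get(requestor);
        if (transfer == null)
            return;
        if (requestTopologyId < transfer.topologyId())
            return;          // stale cancel from an older rebalance: ignore it
        transfer.cancel();   // the cancel matches the current transfer
    }
}
{noformat}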
[JBoss JIRA] (ISPN-4480) Messages sent to leavers can clog the JGroups bundler thread
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4480?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4480:
-----------------------------------------------
Dan Berindei <dberinde@redhat.com> changed the Status of [bug 1116965|https://bugzilla.redhat.com/show_bug.cgi?id=1116965] from NEW to MODIFIED
> Messages sent to leavers can clog the JGroups bundler thread
> ------------------------------------------------------------
>
> Key: ISPN-4480
> URL: https://issues.jboss.org/browse/ISPN-4480
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public (Everyone can see)
> Components: Core
> Affects Versions: 6.0.2.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
>
> In a stress test that repeatedly kills nodes while performing read/write operations, the TransferQueueBundler thread seems to spend a lot of time waiting for physical addresses:
> {noformat}
> 06:40:10,316 WARN [org.radargun.utils.Utils] (pool-5-thread-1) Stack for thread TransferQueueBundler,default,apex953-14666:
> java.lang.Thread.sleep(Native Method)
> org.jgroups.util.Util.sleep(Util.java:1504)
> org.jgroups.util.Util.sleepRandom(Util.java:1574)
> org.jgroups.protocols.TP.sendToSingleMember(TP.java:1685)
> org.jgroups.protocols.TP.doSend(TP.java:1670)
> org.jgroups.protocols.TP$TransferQueueBundler.sendBundledMessages(TP.java:2476)
> org.jgroups.protocols.TP$TransferQueueBundler.sendMessages(TP.java:2392)
> org.jgroups.protocols.TP$TransferQueueBundler.run(TP.java:2383)
> java.lang.Thread.run(Thread.java:744)
> {noformat}
> Two bugs related to this are already fixed in JGroups 3.5.0.Beta2+: JGRP-1814 and JGRP-1815.
> There is also a special case where the physical address can be removed from the cache too soon, exacerbating the effect of JGRP-1815: JGRP-1858.
> We can work around the problem by changing the JGroups configuration (a configuration sketch follows the list):
> * TP.logical_addr_cache_expiration=86400000
> ** Only expire addresses after 1 day
> * TP.physical_addr_max_fetch_attempts=1
> ** Sleep for only 20ms waiting for the physical address (default: 3 attempts / 1500ms)
> * UNICAST3.conn_close_timeout=10000
> ** Drop pending messages to leavers sooner
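A sketch of how those three attributes could look in a JGroups XML stack, assuming a UDP transport; all other protocols and attributes are elided, so this fragment is not a complete stack.
{noformat}
<!-- Sketch of the workaround values above (UDP transport assumed;
     the rest of the protocol stack is elided). -->
<config xmlns="urn:org:jgroups">
    <UDP logical_addr_cache_expiration="86400000"
         physical_addr_max_fetch_attempts="1"/>
    <UNICAST3 conn_close_timeout="10000"/>
</config>
{noformat}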
[JBoss JIRA] (ISPN-4154) Cancelled segment transfer causes future entry transfer to be ignored
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4154?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4154:
-----------------------------------------------
Dan Berindei <dberinde@redhat.com> changed the Status of [bug 1116963|https://bugzilla.redhat.com/show_bug.cgi?id=1116963] from NEW to MODIFIED
> Cancelled segment transfer causes future entry transfer to be ignored
> ---------------------------------------------------------------------
>
> Key: ISPN-4154
> URL: https://issues.jboss.org/browse/ISPN-4154
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public (Everyone can see)
> Components: State Transfer
> Affects Versions: 7.0.0.Alpha1
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
>
> Distributed transactional cache.
> 1) The coordinator is gracefully leaving the cluster and sends a REBALANCE_START with topologyId 14; state transfer begins.
> 2) The node receives a chunk from segment X and writes entry K=V to the data container.
> 3) The new coordinator jumps in with a CH_UPDATE for topology 16.
> 4) The node receives CANCEL_STATE_TRANSFER and cancels the transfer of segment X, invalidating K. In CommitManager this operation is tracked and the DiscardPolicy is set to DISCARD_STATE_TRANSFER for key K.
> 5) The new coordinator starts a rebalance with topology 17.
> 6) The node starts a new state transfer for segment X.
> 7) The node receives K=V for segment X, but CommitManager finds the policy set to DISCARD_STATE_TRANSFER and ignores the update.
> Result: the entry value is lost on some node.
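A minimal, hypothetical sketch of the flawed tracking in steps 4-7 (names illustrative, not the actual CommitManager): the discard flag set when the transfer is cancelled is never cleared, so the next rebalance's chunk for the same key is silently dropped.
{noformat}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch, not the actual CommitManager code.
public class DiscardPolicySketch {
    enum DiscardPolicy { NONE, DISCARD_STATE_TRANSFER }

    static final ConcurrentMap<Object, DiscardPolicy> tracked =
            new ConcurrentHashMap<Object, DiscardPolicy>();
    static final ConcurrentMap<Object, Object> container =
            new ConcurrentHashMap<Object, Object>();

    static void onSegmentCancelled(Object key) {
        container.remove(key);                                   // step 4: invalidate K...
        tracked.put(key, DiscardPolicy.DISCARD_STATE_TRANSFER);  // ...and mark it discarded (never cleared)
    }

    static boolean applyStateTransferWrite(Object key, Object value) {
        if (tracked.get(key) == DiscardPolicy.DISCARD_STATE_TRANSFER) {
            return false;   // step 7: the valid chunk from the *new* rebalance is ignored
        }
        container.put(key, value);
        return true;
    }
}
{noformat}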