On ParserRegistry and classloaders
by Sanne Grinovero
I see the ParserRegistry was changed quite a bit; in Infinispan 6 it
allowed specifying a different classloader for some operations, but now it
only takes a classloader at construction time.
For WildFly/JBoss users, it is quite common that the configuration
file we want parsed is not in the same classloader that the
ParserRegistry needs in order to look up its own parser components (since
its design uses the ServiceLoader to discover the parser's components).
This is especially true when Infinispan is not used by the application
directly but via another module (so I guess also critical for
capedwarf).
I initially thought to work around the problem using a "wrapping
classloader", so that I could pass a single CL instance which would try
both the deployment classloader and Infinispan's module
classloader, but - besides this being suboptimal - it doesn't work, as
I'd be violating isolation between modules: I can get exposed to an
Infinispan 6 module which also contains Parser components, which get
loaded as a service but are not compatible (different class
definition).
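For illustration, the kind of wrapper I had in mind was roughly this (class
name and structure are mine, and as explained above it's not a real solution):

import java.io.IOException;
import java.net.URL;
import java.util.Enumeration;

// Naive "wrapping" classloader: tries the deployment classloader first (as parent),
// then falls back to the Infinispan module classloader. Since ServiceLoader ends up
// seeing resources from both loaders, this breaks module isolation as described above.
public class WrappingClassLoader extends ClassLoader {

   private final ClassLoader fallback;

   public WrappingClassLoader(ClassLoader deploymentCl, ClassLoader infinispanModuleCl) {
      super(deploymentCl);               // parent-first: deployment classloader
      this.fallback = infinispanModuleCl;
   }

   @Override
   protected Class<?> findClass(String name) throws ClassNotFoundException {
      return fallback.loadClass(name);   // only reached when the parent can't find the class
   }

   @Override
   protected URL findResource(String name) {
      return fallback.getResource(name);
   }

   @Override
   protected Enumeration<URL> findResources(String name) throws IOException {
      return fallback.getResources(name);
   }
}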
I'll need these changes in the ParserRegistry reverted, please. Happy
to send a pull request myself, but before I attempt to patch it, could
someone explain what the goal was?
thanks,
Sanne
10 years, 2 months
DIST-SYNC, put(), a problem and a solution
by Bela Ban
Hi guys,
sorry for the long post, but I do think I ran into an important problem
and we need to fix it ... :-)
I've spent the last couple of days running the IspnPerfTest [1] perftest
on Google Compute Engine (GCE), and I've run into a problem with
Infinispan. It is a design problem and can be mitigated by sizing thread
pools correctly, but cannot be eliminated entirely.
Symptom:
--------
IspnPerfTest has every node in a cluster perform 20'000 requests on keys
in range [1..20000].
80% of the requests are reads and 20% writes.
By default, we have 25 requester threads per node and 100 nodes in a
cluster, so a total of 2500 requester threads.
The cache used is NON-TRANSACTIONAL / dist-sync / 2 owners:
<namedCache name="clusteredCache">
   <clustering mode="distribution">
      <stateTransfer awaitInitialTransfer="true"/>
      <hash numOwners="2"/>
      <sync replTimeout="20000"/>
   </clustering>
   <transaction transactionMode="NON_TRANSACTIONAL"
                useEagerLocking="true"
                eagerLockSingleNode="true" />
   <locking lockAcquisitionTimeout="5000" concurrencyLevel="1000"
            isolationLevel="READ_COMMITTED" useLockStriping="false" />
</namedCache>
It has 2 owners, a lock acquisition timeout of 5s and a repl timeout of
20s. Lock striping is off, so we have 1 lock per key.
When I run the test, I always get errors like those below:
org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock
after [10 seconds] on key [19386] for requestor [Thread[invoker-3,5,main]]!
Lock held by [Thread[OOB-194,ispn-perf-test,m5.1,5,main]]
and
org.infinispan.util.concurrent.TimeoutException: Node m8.1 timed out
Investigation:
------------
When I looked at UNICAST3, I saw a lot of missing messages on the
receive side and unacked messages on the send side. This caused me to
look into the (mainly OOB) thread pools and - voila - maxed out !
I learned from Pedro that the Infinispan internal thread pool (with a
default of 32 threads) can be configured, so I increased it to 300 and
increased the OOB pools as well.
This mitigated the problem somewhat, but when I increased the requester
threads to 100, I had the same problem again. Apparently, the Infinispan
internal thread pool uses a rejection policy of "run" and thus uses the
JGroups (OOB) thread when exhausted.
I learned (from Pedro and Mircea) that GETs and PUTs work as follows in
dist-sync / 2 owners:
- GETs are sent to the primary and backup owners and the first response
received is returned to the caller. No locks are acquired, so GETs
shouldn't cause problems.
- A PUT(K) is sent to the primary owner of K
- The primary owner
(1) locks K
(2) updates the backup owner synchronously *while holding the lock*
(3) releases the lock
Hypothesis
----------
(2) above is done while holding the lock. The sync update of the backup
owner is done with the lock held to guarantee that the primary and
backup owner of K have the same values for K.
However, the sync update *inside the lock scope* slows things down (can
it also lead to deadlocks?); there's the risk that the request is
dropped due to a full incoming thread pool, or that the response is not
received because of the same, or that the locking at the backup owner
blocks for some time.
If we have many threads modifying the same key, then we have a backlog
of locking work against that key. Say we have 100 requester threads and
a 100 node cluster. This means that we have 10'000 threads accessing
keys; with 2'000 writers there's a big chance that some writers pick the
same key at the same time.
For example, if we have 100 threads accessing key K and it takes 3ms to
replicate K to the backup owner, then the last of the 100 threads waits
~300ms before it gets a chance to lock K on the primary owner and
replicate it as well.
Just a small hiccup in sending the PUT to the primary owner, sending the
modification to the backup owner, waiting for the response, or GC, and
the delay will quickly become bigger.
Verification
----------
To verify the above, I set numOwners to 1. This means that the primary
owner of K does *not* send the modification to the backup owner, it only
locks K, modifies K and unlocks K again.
I ran the IspnPerfTest again on 100 nodes, with 25 requesters, and NO
PROBLEM !
I then increased the requesters to 100, 150 and 200 and the test
completed flawlessly ! Performance was around *40'000 requests per node
per sec* on 4-core boxes !
Root cause
---------
*******************
The root cause is the sync RPC of K to the backup owner(s) of K while
the primary owner holds the lock for K.
*******************
This causes a backlog of threads waiting for the lock and that backlog
can grow to exhaust the thread pools. First the Infinispan internal
thread pool, then the JGroups OOB thread pool. The latter causes
retransmissions to get dropped, which compounds the problem...
Goal
----
The goal is to make sure that primary and backup owner(s) of K have the
same value for K.
Simply sending the modification to the backup owner(s) asynchronously
won't guarantee this, as modification messages might get processed out
of order as they're OOB !
Suggested solution
----------------
The modification RPC needs to be invoked *outside of the lock scope*:
- lock K
- modify K
- unlock K
- send modification to backup owner(s) // outside the lock scope
The primary owner puts the modification of K into a queue from where a
separate thread/task removes it. The thread then invokes the PUT(K) on
the backup owner(s).
The queue has the modified keys in FIFO order, so the modifications
arrive at the backup owner(s) in the right order.
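To make the idea concrete, here's a minimal sketch of the primary owner side
(all names invented; locking, data container and RPC machinery reduced to
placeholders):

import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// The essential point: the RPC to the backup owner(s) happens *after* the lock
// on K is released, and a single drainer thread preserves FIFO order.
class PrimaryOwnerSketch<K, V> {

   private final BlockingQueue<Map.Entry<K, V>> backupQueue =
         new LinkedBlockingQueue<Map.Entry<K, V>>();

   PrimaryOwnerSketch() {
      Thread replicator = new Thread(new Runnable() {
         public void run() {
            try {
               for (;;) {
                  Map.Entry<K, V> mod = backupQueue.take();          // FIFO -> order preserved
                  sendToBackupOwners(mod.getKey(), mod.getValue());  // sync RPC, but no lock held
               }
            } catch (InterruptedException e) {
               // shutting down
            }
         }
      }, "backup-replicator");
      replicator.setDaemon(true);
      replicator.start();
   }

   void put(K key, V value) {
      lock(key);
      try {
         applyLocally(key, value);
         // enqueue while still holding the lock, so the per-key FIFO order matches
         // the order in which K was actually modified; enqueueing is cheap
         backupQueue.add(new SimpleImmutableEntry<K, V>(key, value));
      } finally {
         unlock(key);   // the expensive backup RPC happens outside the lock scope
      }
   }

   // placeholders for the real machinery
   void lock(K key) {}
   void unlock(K key) {}
   void applyLocally(K key, V value) {}
   void sendToBackupOwners(K key, V value) {}
}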
This requires that the way GET is implemented changes slightly: instead
of invoking a GET on all owners of K, we only invoke it on the primary
owner, then the next-in-line etc.
The reason for this is that the backup owner(s) may not yet have
received the modification of K.
This is a better impl anyway (we discussed this before) because it
generates less traffic; in the normal case, all but one of the GET
requests are unnecessary.
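A staggered GET would then look roughly like this (sketch only, names and
timeout invented; the actual fallback condition is up for discussion):

import java.util.List;
import java.util.concurrent.TimeoutException;

// Ask the primary owner first and only fall back to the next-in-line owner if
// it doesn't answer in time, instead of broadcasting the GET to every owner.
abstract class StaggeredGet<K, V> {

   private static final long TIMEOUT_PER_OWNER_MS = 500;

   V get(K key, List<String> owners) throws TimeoutException {
      for (String owner : owners) {                        // owners ordered: primary first
         try {
            return remoteGet(owner, key, TIMEOUT_PER_OWNER_MS);
         } catch (TimeoutException next) {
            // this owner didn't reply in time, try the next-in-line
         }
      }
      throw new TimeoutException("No owner of " + key + " replied");
   }

   // the actual RPC is out of scope for this sketch
   abstract V remoteGet(String owner, K key, long timeoutMs) throws TimeoutException;
}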
Improvement
-----------
The above solution can be simplified and even made more efficient.
Re-using concepts from IRAC [2], we can simply store the modified *keys*
in the modification queue. The modification replication thread removes
the key, gets the current value and invokes a PUT/REMOVE on the backup
owner(s).
Even better: a key is only ever added *once*, so if we have [5,2,17,3],
adding key 2 is a no-op because the processing of key 2 (in second
position in the queue) will fetch the up-to-date value anyway !
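In code terms (again just a sketch, names invented), an insertion-ordered set
gives exactly that behaviour:

import java.util.Iterator;
import java.util.LinkedHashSet;

// Keys waiting to be replicated, FIFO over *distinct* keys: re-adding a queued
// key is a no-op, and the replication thread reads the current value when it
// processes the key, so the latest modification always wins.
class KeyReplicationQueue<K> {

   private final LinkedHashSet<K> keys = new LinkedHashSet<K>();

   synchronized void onModification(K key) {
      keys.add(key);         // no-op if the key is already queued
      notifyAll();
   }

   synchronized K take() throws InterruptedException {
      while (keys.isEmpty())
         wait();
      Iterator<K> it = keys.iterator();
      K key = it.next();     // oldest queued key
      it.remove();
      return key;
   }
}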
Misc
----
- Could we possibly use total order (TO) to send the updates ? TBD (Pedro?)
Thoughts ?
[1] https://github.com/belaban/IspnPerfTest
[2]
https://github.com/infinispan/infinispan/wiki/RAC:-Reliable-Asynchronous-...
--
Bela Ban, JGroups lead (http://www.jgroups.org)
10 years, 5 months
Cache.size() on distributed caches?
by Alan Field
Hey,
I have been looking at adding the ability to get the total size of a cache in RadarGun. The first implementation I coded used the distributed iterators in Infinispan 7.[1] I then realized that implementing the getTotalSize() method using a distributed executor would allow the code to work with versions back to Infinispan 5.2. I have the code written, and I have been running some Jenkins jobs with Infinispan 6.0.1 Final to verify that the results are correct.[2] I use the RandomData stage to put data in the cache. Here is what it writes in the log:
04:11:59,573 INFO [org.radargun.stages.cache.RandomDataStage] (main) Received responses from all 4 slaves. Durations [0 = 17.04 minutes, 1 = 18.36 minutes, 2 = 18.44 minutes, 3 = 18.58 minutes]
04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) --------------------
04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) Filled cache with String objects totaling 25% of the Java heap
04:11:59,574 INFO [org.radargun.stages.cache.RandomDataStage] (main) Slave 0 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44900952
04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) Slave 1 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44914319
04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) Slave 2 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44906729
04:11:59,575 INFO [org.radargun.stages.cache.RandomDataStage] (main) Slave 3 wrote 479352 values to the cache with a total size of 958,704 kb; targetMemoryUse = 1,022,368 kb; countOfWordsInData = 44908687
04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) The cache contains 1917408 values with a total size of 3,834,816 kb
04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) 100 words were generated with a maximum length of 20 characters
04:11:59,576 INFO [org.radargun.stages.cache.RandomDataStage] (main) --------------------
These are the outputs from my getTotalSize() code:
04:11:59,591 INFO [org.radargun.service.Infinispan53CacheInfo] (main) org.radargun.service.Infinispan53CacheInfo$Cache.getTotalSize() for cache testCache
04:12:12,094 INFO [org.radargun.service.Infinispan53CacheInfo] (main) cache.size() = 1917408
04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) cache.getAdvancedCache().size() = 1917408
04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) cache.getAdvancedCache().getCacheConfiguration().clustering().hash().numOwners() = 2
04:12:26,283 INFO [org.radargun.service.Infinispan53CacheInfo] (main) cache.getCacheManager().getMembers().size() = 4
04:12:41,955 INFO [org.radargun.stages.cache.ClearCacheStage] (main) Cache size = 3834800
The "Cache size =" message is from the results of my distributed executor, and the other messages are informational. These outputs show that calling cache size on a distributed cache returns the size of the entire cache including any passivated entries, not just the size of the cache on the local node. This breaks the code of my distributed executor, but mostly makes it unnecessary if I can just call cache.size().
Is this an expected change in behavior?
Thanks,
Alan
[1] https://github.com/radargun/radargun/blob/master/plugins/infinispan70/src...
[2] https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-radargun-gettota...
--
Alan Field
Principal Quality Engineer - JBoss Data Grid
T: (919) 890-8932 | Ext. 8148932
10 years, 5 months
JPA & OSGi.. still a long way to go.
by Sanne Grinovero
Hi all,
I just noticed that "ISPN-4276 - Make JPA cache store work in Karaf"
was resolved.
I trust that a single instance might work now, but we need to take
into considerations some limitations of running Hibernate in OSGi, in
particular the caveats documented here:
http://docs.jboss.org/hibernate/orm/4.2/devguide/en-US/html/ch17.html#d5e...
When the classloader is overridden by an OSGi deployment, a static field is
overridden as well, and that affects all instances of Hibernate running in the
same classloader. This is meant to be fixed in Hibernate ORM 5.0.
Considering that a JPACacheStore can handle a single entity type, we
consequently need to suggest that users have a separate Cache for
each type - but you can't give that suggestion when running in Karaf,
so "work in Karaf" still needs some love unless we intend to
address only the single Cache / single type use case.
This is in addition to the functional limitations (like no support for
relations) that we already discussed in a different context: in OSGi,
you currently can't run more than one JPACacheStore.
Where should these limitations be tracked?
-- Sanne
10 years, 5 months
A question and an observation
by Bela Ban
1: Observation:
-------------
In my Infinispan perf test (IspnPerfTest), I used
cache.getAdvancedCache().withFlags(...).put(key,value) in a tight loop.
I've always thought that withFlags() was a fast operation, *but this is
not the case* !!
Once I changed this and predefined the 2 caches (sync and async) at the
start, outside the loop, things got 10x faster ! So please change this
if you made the same mistake !
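In code, the difference is this (the flag doesn't matter for the point;
FORCE_ASYNCHRONOUS is just an example):

import org.infinispan.AdvancedCache;
import org.infinispan.Cache;
import org.infinispan.context.Flag;

class WithFlagsPitfall {

   // slow: decorates the cache on *every* iteration
   static void slow(Cache<Integer, byte[]> cache) {
      for (int i = 0; i < 20000; i++)
         cache.getAdvancedCache().withFlags(Flag.FORCE_ASYNCHRONOUS).put(i, new byte[1000]);
   }

   // fast: decorate once, up front, and reuse the returned cache
   static void fast(Cache<Integer, byte[]> cache) {
      AdvancedCache<Integer, byte[]> asyncCache =
            cache.getAdvancedCache().withFlags(Flag.FORCE_ASYNCHRONOUS);
      for (int i = 0; i < 20000; i++)
         asyncCache.put(i, new byte[1000]);
   }
}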
2. Question:
-----------
In Infinispan 6, I defined my custom transport as follows:
<transport ... transportClass="org.perf.CustomTransport"/>
This is gone in 7. Do I now have to use programmatic configuration ? If
so, how would I do this ?
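If the answer is yes, I'm guessing something along these lines (untested;
assuming the transport() builder still accepts a Transport instance, and the
cluster name is just an example)?

import org.infinispan.configuration.global.GlobalConfiguration;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;
import org.perf.CustomTransport;

public class CustomTransportSetup {

   public static DefaultCacheManager createManager() {
      GlobalConfiguration global = new GlobalConfigurationBuilder()
            .transport()
               .transport(new CustomTransport())  // pass the Transport instance instead of transportClass
               .clusterName("ispn-perf-test")
            .build();
      return new DefaultCacheManager(global);
   }
}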
--
Bela Ban, JGroups lead (http://www.jgroups.org)
10 years, 5 months
Propagate the schema to the cachestore
by Emmanuel Bernard
A remark by Divya made me think of something.
With Infinispan moving in the direction of ProtoBuf and schemas, cache stores would greatly benefit from receiving that schema, in one shape or another, to transform a blob into something more structured, depending on the underlying capabilities of the datastore.
Has anyone explored that angle?
Emmanuel
10 years, 5 months