Dropping support for older versions of Apache Lucene
by Sanne Grinovero
All,
please note I've sent the pull request which removes support for
Apache Lucene versions 2.x and 3.x [1]
The currently recommended version is Apache Lucene 4.8.1; if someone
has a really strong need to keep support for Lucene 3, please speak up
now, as the patch is rather large and complex.
However, the primary reason for us to drop this is that keeping it makes it
very tricky and time-consuming to develop a better integration with
Lucene 4+, so I don't think it will be possible to maintain support
for the older versions unless someone volunteers for it.
Note that the Query engine and all the extras we built on it were
already migrated to Lucene 4, so there is no loss of other
functionality.
Regards,
Sanne
1 - https://github.com/infinispan/infinispan/pull/2676
Dangers of getCacheEntry (ISPN-4424)
by Galder Zamarreño
Hi all,
Over the last few days I’ve been having fun fixing [1], and the solution I’ve arrived at has some links to [2].
The problem is that when getCacheEntry() is called, the entry returned could be only partially correct. Up until now, we’d send back a cache entry that the user could not modify, but it was linked to the data container cache entry, which is mutable and hence could change on the fly. As a result, the value could sometimes be updated while the version still belonged to a previous value, for example.
A first attempt to fix it was to provide a truly immutable cache entry view which was initialised on construction, but even here there was the chance of a value and version mismatch, because getCacheEntry does not acquire locks, so an ongoing update could result in the cache entry being constructed while the update was only half way through: it would have the right value but the old version information.
All this didn’t work well with the replaceIfUnmodified logic in [3]. For example, you could easily get this situation:
Current Cache contents: key=k1, value=A, version=1
T1: Hot Rod client calls: replace(k1, value=B, old-version=1)
T2: Hot Rod client calls: replace(k1, value=B, old-version=1)
T1: Server calls getCacheEntry and retrieves value=A,version=1
T1: Cached and stream versions match (both are 1), so call replace(k1, old-value=A, new-value=B)
T1: Server updates value to B but version is still 1
T2: Server calls getCacheEntry and retrieves value=B,version=1 (wrong!)
T2: Cached and stream versions match (both are 1), so call replace(k1, old-value=B, new-value=B)
T1: Server updates version to 2
T1: Returns Success, replace worked
T2: Returns Success, replace worked
The end result is that the cache contained B, but both replaces returned true. This is wrong and would fail to consistently keep a counter in the value part. Imagine the value being a counter of the number of times the replace succeeded: in the test developed by Takayoshi, N replace() calls would return true, but the final value would be N-1, N-2, or N-5 :|
To fix this, I’ve been working with Dan on some solutions and we’ve taken inspiration from the new requirements appearing as a result of ISPN-2956. To be able to deal properly with partial application of conditional operations, transactional caches are needed. So, the solution that can be seen in [4] takes that approach: it creates a transaction around the replaceIfUnmodified and forces the getCacheEntry() call to acquire the lock via the FORCE_WRITE_LOCK flag. This solves the issue explained above, but of course it has an impact on performance: the test now runs about 1.5 to 2 times slower.
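For illustration only, here is a minimal sketch of that idea (this is not the actual server code; the versionOf() helper is hypothetical and the generics are simplified): the read-compare-write runs in one transaction, and FORCE_WRITE_LOCK turns the read into a locking read so no concurrent update can slip in between the version check and the write.

import javax.transaction.TransactionManager;
import org.infinispan.AdvancedCache;
import org.infinispan.container.entries.CacheEntry;
import org.infinispan.context.Flag;

public class ReplaceIfUnmodifiedSketch {
   static boolean replaceIfUnmodified(AdvancedCache<Object, Object> cache,
                                      Object key, Object newValue,
                                      long clientVersion) throws Exception {
      TransactionManager tm = cache.getTransactionManager();
      tm.begin();
      boolean replaced = false;
      try {
         // Locking read: the entry stays write-locked until the tx ends
         CacheEntry entry = cache.withFlags(Flag.FORCE_WRITE_LOCK).getCacheEntry(key);
         if (entry != null && versionOf(entry) == clientVersion) {
            cache.put(key, newValue);
            replaced = true;
         }
      } finally {
         // commit only when the replace actually happened; a read-only
         // attempt is simply rolled back
         if (replaced) tm.commit(); else tm.rollback();
      }
      return replaced;
   }

   // Hypothetical helper: extracts the numeric version the server keeps in
   // the entry metadata; not part of the Infinispan API.
   static long versionOf(CacheEntry entry) {
      throw new UnsupportedOperationException("illustration only");
   }
}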
This is probably the best that we can do in the 7.0 time frame, but there are several things that could improve this:
1. Truly immutable entries in the cache. If the entries in the cache were truly immutable, there would be no risk of sending a partially correct entry back to the client.
2. A cache replace method that does not compare objects based on equality (the current replace()), but instead takes a function. The function could check that the old entry’s version and the cached entry’s version match, and it would only be executed inside the inner container, with all the locks held properly. I already hinted at something similar in [5].
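To make the second idea concrete, here is a hypothetical sketch (this method does not exist in the Cache API; the name and signature are made up) of a function-based replace where the comparison runs inside the container while the entry lock is held:

import java.util.function.BiFunction;

public interface FunctionalReplace<K, V> {
   /**
    * Hypothetical: applies 'remapping' to the current value and version while
    * the entry lock is held; if the function returns a non-null value the
    * entry is replaced, otherwise it is left untouched. Returns true on
    * replacement.
    */
   boolean replaceWith(K key, BiFunction<V, Long, V> remapping);
}

// Usage sketch: replace only if the cached version matches the client's version.
// boolean done = cache.replaceWith(key, (currentValue, currentVersion) ->
//       currentVersion == clientVersion ? newValue : null);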
Finally, this was not a problem when the value stored in Hot Rod was a ByteArrayValue wrapping both the byte array and the version, because the value was treated atomically. In hindsight, adding getCacheEntry might have been premature, but the method has proven useful for other use cases too (rolling upgrades…etc).
Cheers,
[1] https://issues.jboss.org/browse/ISPN-4424
[2] https://issues.jboss.org/browse/ISPN-2956
[3] https://github.com/infinispan/infinispan/blob/master/server/core/src/main...
[4] https://github.com/galderz/infinispan/tree/t_4424
[5] https://github.com/infinispan/infinispan/blob/master/core/src/main/java/o...
--
Galder Zamarreño
galder(a)redhat.com
twitter.com/galderz
Test failures
by Sanne Grinovero
I'm having several failures; these are blocking our progress on Query.
Should I disable them all?
Sample output of three different runs:
Failed tests:
ThreadLocalLeakTest.testCheckThreadLocalLeaks:87 IllegalState Thread
locals st...
StateTransferOverwriteTest>BaseTxStateTransferOverwriteTest.testStateTransferInBetweenPrepareCommitWithPutIfAbsent:104->BaseTxStateTransferOverwriteTest.doStateTransferInBetweenPrepareCommit:265
» Runtime
Tests run: 5430, Failures: 2, Errors: 0, Skipped: 0
Failed tests:
ThreadLocalLeakTest.testCheckThreadLocalLeaks:87 IllegalState Thread
locals st...
StateTransferOverwriteTest>BaseTxStateTransferOverwriteTest.testStateTransferInBetweenPrepareCommitWithPutIfAbsent:104->BaseTxStateTransferOverwriteTest.doStateTransferInBetweenPrepareCommit:265
» Runtime
L1StateTransferOverwriteTest>BaseTxStateTransferOverwriteTest.testStateTransferInBetweenPrepareCommitWithPut:84->BaseTxStateTransferOverwriteTest.doStateTransferInBetweenPrepareCommit:265
» Runtime
Tests run: 5430, Failures: 3, Errors: 0, Skipped: 0
Failed tests:
ThreadLocalLeakTest.testCheckThreadLocalLeaks:87 IllegalState Thread
locals st...
CacheNotifierImplInitialTransferDistTest.testCreateAfterIterationBeganAndSegmentNotCompleteValueOwnerClustered:611->testIterationBeganAndSegmentNotComplete:510
expected [11] but found [6]
Tests run: 5430, Failures: 2, Errors: 0, Skipped: 0
Re: [infinispan-dev] CD datastore inserts slow
by Sanne Grinovero
Hi all,
some context, as this conversation started off-list: we're discussing
performance problems with CapeDwarf's storage system which, as you probably
know, is based on Infinispan.
So, storing a few thousand entities in a single transaction takes a
little more than 1 second, while running each put operation in its own
transaction takes ~50 seconds.
I'm afraid that's nothing new?
When you use indexing, this is expected: indexing performs much better with
parallel, batched operations than with sequential, individual transactions.
I expect some of the changes we have planned for Infinispan 7 to
improve on this, but the improvement is unlikely to be more than a
single order of magnitude.
These are some issues which affected my previous performance tests:
https://issues.jboss.org/browse/ISPN-3690
https://issues.jboss.org/browse/ISPN-1764
https://issues.jboss.org/browse/ISPN-3831
https://issues.jboss.org/browse/ISPN-3891
https://issues.jboss.org/browse/ISPN-3905
And of course the Hibernate Search and Lucene upgrades ;-)
(in no particular order, as relevance depends on the test and on what
metric I'm looking to improve)
If you need better figures, the best thing we can do is measure your
specific test and see if we need to add further tasks to my wishlist,
or if it's enough to prioritize these.
Currently I think that, if you expect to use a transaction for each
operation, there is nothing wrong in CD, as unfortunately these figures
match my expectations for a non-tuned system. You could tune things a
bit, like better sizing of packets for JGroups and the network stack,
but I wouldn't expect much higher figures without patching things in
Infinispan.
For the record, I'm more concerned that storing "a few thousand"
entities in a single TX takes a full second: that's not expected, but
my guess is that in this specific case you're not warming up the JVM
nor repeating the test. With batching enabled, I'd expect this to be
in the order of millions per second.
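For illustration, a minimal sketch of what batching the inserts looks like (the cache name, payloads and configuration file are made up; it assumes invocation batching is enabled in the configuration), grouping all writes into one batch instead of one transaction per put:

import org.infinispan.Cache;
import org.infinispan.manager.DefaultCacheManager;

public class BatchedInserts {
   public static void main(String[] args) throws Exception {
      DefaultCacheManager cm = new DefaultCacheManager("infinispan.xml");
      try {
         Cache<String, String> cache = cm.getCache("entities");
         cache.startBatch();
         boolean success = false;
         try {
            for (int i = 0; i < 2000; i++) {
               cache.put("entity-" + i, "payload-" + i);
            }
            success = true;
         } finally {
            // all 2000 writes are applied (or discarded) as a single unit
            cache.endBatch(success);
         }
      } finally {
         cm.stop();
      }
   }
}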
Sanne
Previous discussion - sorry, the formatting got crazy:
> 17:20:17,640 INFO [com.google.appengine.tck.benchmark.ObjectifyBenchmarkTest] (default task-1) >>>> Save [2000] time: 1149ms
> 17:21:55,659 INFO [com.google.appengine.tck.benchmark.ObjectifyBenchmarkTest] (default task-1) >>>> Save [2000] time: 50675ms
> The first time is when whole 2000 is inserted in single Tx,
> the other one is when there is no existing Tx.
> OK, in reality doing non-Tx for whole batch is a no go,
> so on that side we're good.
> But I would still like to see non-Tx do much better than 1sec vs. 50sec (for 2k entities).
> -Ales
> On 28 May 2014, at 17:20, Mircea Markus <mmarkus(a)redhat.com> wrote:
> Can we isolate the tests somehow?
> On May 28, 2014, at 15:53, Ales Justin <ales.justin(a)gmail.com> wrote:
> Objectify is a popular GAE ORM, so it does plenty of persistence magic.
> Hence my simple answer is no.
> But it should be easy to take latest CapeDwarf release -- 2.0.0.CR2,
> and run + debug things, and see why it's so slow.
> Either Terry or me is gonna add this test to GAE TCK.
> I'll shoot an email on how to run this against CapeDwarf once it's added.
> Looking at this:
> * https://gist.github.com/alesj/1d24ad24dfbef8b5e12c
> It looks like a problem of Tx usage -- Tx per Cache::put.
> This snapshot only shows the time spent within Infinispan. Out of this time, 9% is indeed spent within cache.put. How much time does the test spend doing non-Infinispan related activities? Where I'm trying to get to is: are we sure Infinispan is the culprit?
> It could be that our default config isn't suited to handle such things.
> So we're open for any suggestions.
> ---
> I've added this test to GAE TCK:
> * https://github.com/GoogleCloudPlatform/appengine-tck/blob/master/core/ben...
> You need CapeDwarf 2.0.0.CR2:
> * http://capedwarf.org/downloads/
> And clone GAE TCK:
> * https://github.com/GoogleCloudPlatform/appengine-tck
> Run CapeDwarf:
> * https://github.com/capedwarf/capedwarf-blue/blob/master/README.md (see (5))
> Then run the ObjectifyBenchmarkTest:
> * cd <GAE_TCK_HOME>
> * mvn clean install
> * cd core/benchmark
> * mvn clean install -Pcapedwarf,benchmark
> Ping me for any issues.
> Thanks!
> -Ales
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
Reliability of return values
by Radim Vansa
Hi,
recently I've stumbled upon an already expected behaviour (one instance
is [1]) which did not get much attention.
In a non-tx cache, when the primary owner fails after the request has been
replicated to the backup owner, the request is retried in the new topology.
The operation is then executed on the new primary (the previous backup).
The outcome has already been fixed in [2], but the return value
may be wrong. For example, when we do a put, the return value for the
second attempt will be the value we just inserted (although the entry
was only just created). The same situation may happen for other operations.
Currently, it's not possible to return the correct value (because it has
already been overwritten and we don't keep a history of values), but
shouldn't we rather throw an exception if we were not able to fulfil the
API contract?
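To make the contract at stake concrete, a small illustration (cache and key names are made up; on a single healthy node the assertion always holds, and the problem only appears with the failover retry described above):

import org.infinispan.Cache;
import org.infinispan.manager.EmbeddedCacheManager;

public class ReturnValueContract {
   static void createEntry(EmbeddedCacheManager cacheManager) {
      Cache<String, String> cache = cacheManager.getCache("nontx");
      // Contract: put() on a brand new key returns null (no previous value).
      String previous = cache.put("k1", "v1");
      // After a retried put (primary died after replicating to the backup),
      // 'previous' can instead be "v1" -- the value this very call just wrote.
      assert previous == null : "retry returned the value we just wrote: " + previous;
   }
}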
Radim
[1] https://issues.jboss.org/browse/ISPN-2956
[2] https://issues.jboss.org/browse/ISPN-3422
--
Radim Vansa <rvansa(a)redhat.com>
JBoss DataGrid QA
stabilize test suite
by Mircea Markus
Hi,
The test suite is all over the place, so please STOP integrating into master until the suite gets green.
Also please start looking into the failures: in order not to overlap, reply to this email with what you're looking at.
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
"Lightweight" security
by Tristan Tarrant
Hi all,
This morning I have issued a rather large PR [1] which contains a new
way to handle security. Because of the implications, I thought it best
to explain the rationale behind it here.
When a cache has been configured with authorization, retrieving such a
cache returns an instance of SecureCache. SecureCache is a simple
wrapper around a Cache which checks whether the "current user" has
enough permissions to perform an operation.
For us the "current user" is a Subject associated with the
AccessControlContext, which basically means Subject.doAs(subject, ...).
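As a small illustration (the cache reference, key and value are made up), performing a cache write as a given Subject looks roughly like this:

import java.security.PrivilegedAction;
import javax.security.auth.Subject;
import org.infinispan.Cache;

public class AsSubject {
   // SecureCache picks up 'subject' from the AccessControlContext set up by
   // doAs() and checks its permissions before letting the put through.
   static void putAs(Subject subject, Cache<String, String> secureCache) {
      Subject.doAs(subject, (PrivilegedAction<Void>) () -> {
         secureCache.put("k", "v");
         return null;
      });
   }
}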
Occasionally, however, we have internal code which needs to do things
with caches with "elevated privileges". As an example, the distexec code
needs to check the cache configuration and obtain the RpcManager,
operations which the current Subject might not be allowed to perform.
In my initial implementation, this was done using the
AccessController.doPrivileged(...) method, which requires the presence of
a SecurityManager to be able to grant the privileges to trusted code.
Unfortunately a SecurityManager is a bit overkill (usability-wise and
performance-wise) for our use case, since its purpose is mainly to avoid
running untrusted code (think of an applet).
But the typical "authorization"-enabled Infinispan application is
running trusted code, and it just needs a way to distinguish one Subject
from another when performing cache operations.
So, I present to you the new org.infinispan.security.Security class
which uses a much simpler approach: it identifies "privileged" code
simply by its package name, using the caller stack in a way similar to
what logging frameworks do. While this approach is "insecure" in the
usual meaning of the word, since it can be easily subverted, it covers
the common use-case in a much lighter and faster way.
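A rough sketch of the idea (this is not the actual Security implementation; the package prefix and the frame skipping are illustrative): walk the caller stack and treat frames coming from our own packages as privileged.

public final class CallerCheck {
   // Returns true if the nearest "interesting" frame belongs to Infinispan code.
   static boolean callerIsPrivileged() {
      for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
         String cls = frame.getClassName();
         // skip JDK frames and this helper itself
         if (cls.startsWith("java.") || cls.equals(CallerCheck.class.getName()))
            continue;
         return cls.startsWith("org.infinispan.");
      }
      return false;
   }
}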
Obviously using a SecurityManager, if so inclined, is still supported
and will be used to validate privileges if present.
Tristan
[1] https://github.com/infinispan/infinispan/pull/2629
More commits, or squashing?
by Sanne Grinovero
A discussion which could interest more of you:
https://github.com/infinispan/infinispan/pull/2613#issuecomment-45760469
There is no golden rule, but I believe it's highly desirable to give
the sequence of commits a logical flow which makes sense.
In this specific situation it also helped me, as I've been making
slow progress on it, fragmented over several months.
For example, this single commit is a single sed script, whose source
is included in the commit:
https://github.com/Sanne/infinispan/commit/c4b22038761d38da21480d87a37abc...
Guess what I need to do when rebasing?
Also, if you keep the flow "logical" and separate file renames/moves from
content changes, git will help you massively in resolving conflicts with any
change made concurrently on the same files. For example, the WildFly 81
PR would have been much less painful, and I think it would have been
integrated way earlier as it wouldn't have looked as scary.
Finally, I want to win the commits competition of course ;-)
https://github.com/Sanne
Cheers,
Sanne
Query integration tests performance and JGroups
by Gustavo Fernandes
Hi,
While investigating some CI failures in query (timeouts), I've narrowed
down the issue to the JGroups protocol stack being used.
Running a 'mvn clean install' in the query/ module takes about 6min
(when timeout does not happen). If I run instead:
mvn -Dtest.protocol.jgroups=udp clean install
the time goes down to around 50s. Recent changes in core's jgroups-tcp.xml
for the tests were the removal of the loopback=true setting and the
modification of the bundler_type, but they don't seem to affect the outcome.
FYI, taking a single test and stripping from it everything but the
cluster formation and data population (5 objects) leads to the CPU
hotspot below, and it takes almost 1 minute.
I'd be happy to change the query tests to UDP, but first I would like to
hear your thoughts about this issue.
Gustavo
+----------------------------------------------------------------------------------+------------------+--------------------+
| Name | Time (ms) | Invocation Count |
+----------------------------------------------------------------------------------+------------------+--------------------+
| +---java.net.SocketInputStream.read(byte[], int, int, int) | 101,742 100 % | 4,564 |
| | | | |
| +---java.net.SocketInputStream.read(byte[], int, int) | | |
| | | | |
| +---java.io.BufferedInputStream.fill() | | |
| | | | |
| +---java.io.BufferedInputStream.read() | | |
| | | | |
| +---java.io.DataInputStream.readInt() | | |
| | | | |
| +---org.jgroups.blocks.TCPConnectionMap$TCPConnection$Receiver.run() | | |
| | | | |
| +---java.lang.Thread.run() | | |
+----------------------------------------------------------------------------------+------------------+--------------------+
Leaking ThreadLocal
by Sanne Grinovero
Hi all,
as spotted by the testsuite - which was rightfully complaining - the
CHMv8 which we backported from the JDK has a static ThreadLocal that
it uses internally.
Our copy has the same, and so Dan enhanced the "ThreadLocal
detection" utility in the testsuite to "allow" specific ThreadLocal
instances; nice, as at least infinispan-core now has no more failures.
Next, the bigger question: this might be OK in the JDK - which
obviously doesn't get redeployed - but is it OK in our stuff?
I guess the answer is no. Would you all agree that the CHM instances
should share an instance whose lifecycle is managed by the
CacheManager?
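For context, a minimal sketch of the leak pattern under discussion (class and field names are made up, only mirroring the spirit of what the backported CHMv8 keeps per thread):

public class BackportedMap {
   // Per-thread counter cell, similar in spirit to what CHMv8 keeps per thread.
   static final class Cell { long value; }

   // A static ThreadLocal: each pooled worker thread keeps its Cell (and,
   // transitively, the classloader that loaded Cell) strongly reachable until
   // the thread dies or remove() is called -- a classic redeployment leak.
   private static final ThreadLocal<Cell> CELL = ThreadLocal.withInitial(Cell::new);

   public long bump() {
      return ++CELL.get().value;
   }
}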
Cheers,
Sanne