Dropping support for older versions of Apache Lucene
by Sanne Grinovero
All,
please note I've sent the pull request which removes support for
Apache Lucene versions 2.x and 3.x [1]
The currently recommended version is Apache Lucene 4.8.1; if someone
has a really strong need to keep support for Lucene 3, please speak up
now, as the patch is rather large and complex.
However, the primary reason for us to drop this is that keeping it makes it
very tricky and time-consuming to develop a better integration with
Lucene 4+, so I don't think it will be possible to maintain support
for the older versions unless someone volunteers for it.
Note that the Query engine and all the extras we built on it were
already migrated to Lucene 4, so there is no loss of other
functionality.
Regards,
Sanne
1 - https://github.com/infinispan/infinispan/pull/2676
Dangers of getCacheEntry (ISPN-4424)
by Galder Zamarreño
Hi all,
Over the last few days I’ve been having fun fixing [1], and the solution I’ve arrived at has some links to [2].
The problem is that when getCacheEntry() is called, the entry returned could be only partially correct. Up until now, we’d send back a cache entry that the user could not modify, but it was linked to the data container cache entry, which is mutable and hence could change on the fly. As a result, the value could sometimes be updated while the version still belonged to a previous value, for example.
A first attempt to fix it was to provide a truly immutable cache entry view which was initialised on construction, but even here there was the chance of a value and version mismatch, because getCacheEntry does not acquire locks, so an ongoing update could result in the cache entry being constructed while the update was only half way through: it would have the right value but the old version information.
All this didn’t work well with the replaceIfUnmodified logic in [3]. For example, you could easily get this situation:
Current Cache contents: key=k1, value=A, version=1
T1: Hot Rod client calls: replace(k1, value=B, old-version=1)
T2: Hot Rod client calls: replace(k1, value=B, old-version=1)
T1: Server calls getCacheEntry and retrieves value=A,version=1
T1: Cached and stream versions match (both are 1), so call replace(k1, old-value=A, new-value=B)
T1: Server updates value to B but version is still 1
T2: Server calls getCacheEntry and retrieves value=B,version=1 (wrong!)
T2: Cached and stream versions match (both are 1), so call replace(k1, old-value=B, new-value=B)
T1: Server updates version to 2
T1: Returns Success, replace worked
T2: Returns Success, replace worked
The end result is that the cache contained B, but both replaces returned true. This is wrong and would fail to consistently keep a counter in the value part. Imagine the value being a counter of the number of times the replace succeeded: in the test developed by Takayoshi, N replace() calls would return true, but the final value would be N-1, N-2, or N-5 :|
To fix this, I’ve been working with Dan on some solutions and we’ve taken inspiration from the new requirements appearing as a result of ISPN-2956. To be able to deal properly with partial application of conditional operations, transactional caches are needed. So, the solution that can be seen in [4] takes that approach: it creates a transaction around the replaceIfUnmodified and forces the getCacheEntry() call to acquire the lock via the FORCE_WRITE_LOCK flag. This solves the issue explained above, but of course it has an impact on performance: the test now runs about 1.5 to 2 times slower.
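For illustration only, here is a minimal sketch of that idea (this is not the actual server code; the versionOf() helper is hypothetical and the generics are simplified): the read-compare-write runs in one transaction, and FORCE_WRITE_LOCK turns the read into a locking read so no concurrent update can slip in between the version check and the write.

import javax.transaction.TransactionManager;
import org.infinispan.AdvancedCache;
import org.infinispan.container.entries.CacheEntry;
import org.infinispan.context.Flag;

public class ReplaceIfUnmodifiedSketch {
   static boolean replaceIfUnmodified(AdvancedCache<Object, Object> cache,
                                      Object key, Object newValue,
                                      long clientVersion) throws Exception {
      TransactionManager tm = cache.getTransactionManager();
      tm.begin();
      boolean replaced = false;
      try {
         // Locking read: the entry stays write-locked until the tx ends
         CacheEntry entry = cache.withFlags(Flag.FORCE_WRITE_LOCK).getCacheEntry(key);
         if (entry != null && versionOf(entry) == clientVersion) {
            cache.put(key, newValue);
            replaced = true;
         }
      } finally {
         // commit only when the replace actually happened; a read-only
         // attempt is simply rolled back
         if (replaced) tm.commit(); else tm.rollback();
      }
      return replaced;
   }

   // Hypothetical helper: extracts the numeric version the server keeps in
   // the entry metadata; not part of the Infinispan API.
   static long versionOf(CacheEntry entry) {
      throw new UnsupportedOperationException("illustration only");
   }
}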
This is probably the best that we can do in the 7.0 time frame, but there are several things that could improve this:
1. Truly immutable entries in the cache. If the entries in the cache were truly immutable, there would be no risk of sending a partially correct entry back to the client.
2. A cache replace method that does not compare objects based on equality (the current replace()), but instead takes a function. The function could check that the old entry’s version and the cached entry’s version match, and it would only be executed inside the inner container, with all the locks held properly. I already hinted at something similar in [5].
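To make the second idea concrete, here is a hypothetical sketch (this method does not exist in the Cache API; the name and signature are made up) of a function-based replace where the comparison runs inside the container while the entry lock is held:

import java.util.function.BiFunction;

public interface FunctionalReplace<K, V> {
   /**
    * Hypothetical: applies 'remapping' to the current value and version while
    * the entry lock is held; if the function returns a non-null value the
    * entry is replaced, otherwise it is left untouched. Returns true on
    * replacement.
    */
   boolean replaceWith(K key, BiFunction<V, Long, V> remapping);
}

// Usage sketch: replace only if the cached version matches the client's version.
// boolean done = cache.replaceWith(key, (currentValue, currentVersion) ->
//       currentVersion == clientVersion ? newValue : null);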
Finally, this was not a problem when the value stored in Hot Rod was a ByteArrayValue wrapping both the byte array and the version, because the value was treated atomically. In hindsight, adding getCacheEntry might have been premature, but the method has proven useful for other use cases too (rolling upgrades…etc).
Cheers,
[1] https://issues.jboss.org/browse/ISPN-4424
[2] https://issues.jboss.org/browse/ISPN-2956
[3] https://github.com/infinispan/infinispan/blob/master/server/core/src/main...
[4] https://github.com/galderz/infinispan/tree/t_4424
[5] https://github.com/infinispan/infinispan/blob/master/core/src/main/java/o...
--
Galder Zamarreño
galder(a)redhat.com
twitter.com/galderz
Test failures
by Sanne Grinovero
I'm having several failures; these are blocking our progress on Query.
Should I disable them all?
Sample output of three different runs:
Failed tests:
ThreadLocalLeakTest.testCheckThreadLocalLeaks:87 IllegalState Thread
locals st...
StateTransferOverwriteTest>BaseTxStateTransferOverwriteTest.testStateTransferInBetweenPrepareCommitWithPutIfAbsent:104->BaseTxStateTransferOverwriteTest.doStateTransferInBetweenPrepareCommit:265
» Runtime
Tests run: 5430, Failures: 2, Errors: 0, Skipped: 0
Failed tests:
ThreadLocalLeakTest.testCheckThreadLocalLeaks:87 IllegalState Thread
locals st...
StateTransferOverwriteTest>BaseTxStateTransferOverwriteTest.testStateTransferInBetweenPrepareCommitWithPutIfAbsent:104->BaseTxStateTransferOverwriteTest.doStateTransferInBetweenPrepareCommit:265
» Runtime
L1StateTransferOverwriteTest>BaseTxStateTransferOverwriteTest.testStateTransferInBetweenPrepareCommitWithPut:84->BaseTxStateTransferOverwriteTest.doStateTransferInBetweenPrepareCommit:265
» Runtime
Tests run: 5430, Failures: 3, Errors: 0, Skipped: 0
Failed tests:
ThreadLocalLeakTest.testCheckThreadLocalLeaks:87 IllegalState Thread
locals st...
CacheNotifierImplInitialTransferDistTest.testCreateAfterIterationBeganAndSegmentNotCompleteValueOwnerClustered:611->testIterationBeganAndSegmentNotComplete:510
expected [11] but found [6]
Tests run: 5430, Failures: 2, Errors: 0, Skipped: 0
Re: [infinispan-dev] CD datastore inserts slow
by Sanne Grinovero
Hi all,
some context, as this conversation started off-list: we're discussing
performance problems with CapeDwarf's storage system which, as you probably
know, is based on Infinispan.
So, storing a few thousand entities in a single transaction takes a
little more than 1 second, while running each put operation in its own
transaction takes ~50 seconds.
I'm afraid that's nothing new?
When you use indexing, this is expected: indexing performs much better with
parallel, batched operations than with sequential, individual transactions.
I expect some of the changes we have planned for Infinispan 7 to
improve on this, but the improvement is unlikely to be more than a
single order of magnitude.
These are some issues which affected my previous performance tests:
https://issues.jboss.org/browse/ISPN-3690
https://issues.jboss.org/browse/ISPN-1764
https://issues.jboss.org/browse/ISPN-3831
https://issues.jboss.org/browse/ISPN-3891
https://issues.jboss.org/browse/ISPN-3905
And of course the Hibernate Search and Lucene upgrades ;-)
(in no particular order, as relevance depends on the test and on what
metric I'm looking to improve)
If you need better figures, the best thing we can do is measure your
specific test and see if we need to add further tasks to my wishlist,
or if it's enough to prioritize these.
Currently I think that, if you expect to use a transaction for each
operation, there is nothing wrong in CD, as unfortunately these figures
match my expectations for a non-tuned system. You could tune things a
bit, like better sizing of packets for JGroups and the network stack,
but I wouldn't expect much higher figures without patching things in
Infinispan.
For the record, I'm more concerned that storing "a few thousand"
entities in a single TX takes a full second: that's not expected, but
my guess is that in this specific case you're not warming up the JVM
nor repeating the test. With batching enabled, I'd expect this to be
in the order of millions per second.
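For illustration, a minimal sketch of what batching the inserts looks like (the cache name, payloads and configuration file are made up; it assumes invocation batching is enabled in the configuration), grouping all writes into one batch instead of one transaction per put:

import org.infinispan.Cache;
import org.infinispan.manager.DefaultCacheManager;

public class BatchedInserts {
   public static void main(String[] args) throws Exception {
      DefaultCacheManager cm = new DefaultCacheManager("infinispan.xml");
      try {
         Cache<String, String> cache = cm.getCache("entities");
         cache.startBatch();
         boolean success = false;
         try {
            for (int i = 0; i < 2000; i++) {
               cache.put("entity-" + i, "payload-" + i);
            }
            success = true;
         } finally {
            // all 2000 writes are applied (or discarded) as a single unit
            cache.endBatch(success);
         }
      } finally {
         cm.stop();
      }
   }
}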
Sanne
Previous discussion - sorry, the formatting got crazy:
> 17:20:17,640 INFO [com.google.appengine.tck.benchmark.ObjectifyBenchmarkTest] (default task-1) >>>> Save [2000] time: 1149ms
> 17:21:55,659 INFO [com.google.appengine.tck.benchmark.ObjectifyBenchmarkTest] (default task-1) >>>> Save [2000] time: 50675ms
> The first time is when whole 2000 is inserted in single Tx,
> the other one is when there is no existing Tx.
> OK, in reality doing non-Tx for whole batch is a no go,
> so on that side we're good.
> But I would still like to see non-Tx do much better than 1sec vs. 50sec (for 2k entities).
> -Ales
> On 28 May 2014, at 17:20, Mircea Markus <mmarkus(a)redhat.com> wrote:
> Can we isolate the tests somehow?
> On May 28, 2014, at 15:53, Ales Justin <ales.justin(a)gmail.com> wrote:
> Objectify is a popular GAE ORM, so it does plenty of persistence magic.
> Hence my simple answer is no.
> But it should be easy to take latest CapeDwarf release -- 2.0.0.CR2,
> and run + debug things, and see why it's so slow.
> Either Terry or me is gonna add this test to GAE TCK.
> I'll shoot an email on how to run this against CapeDwarf once it's added.
> Looking at this:
> * https://gist.github.com/alesj/1d24ad24dfbef8b5e12c
> It looks like a problem of Tx usage -- Tx per Cache::put.
> This snapshot only shows the time spent within Infinispan. Out of this time, 9% is indeed spent within cache.put. How much time does the test spend doing non-Infinispan related activities? Where I'm trying to get to is: are we sure Infinispan is the culprit?
> It could be that our default config isn't suited to handle such things.
> So we're open for any suggestions.
> ---
> I've added this test to GAE TCK:
> * https://github.com/GoogleCloudPlatform/appengine-tck/blob/master/core/ben...
> You need CapeDwarf 2.0.0.CR2:
> * http://capedwarf.org/downloads/
> And clone GAE TCK:
> * https://github.com/GoogleCloudPlatform/appengine-tck
> Run CapeDwarf:
> * https://github.com/capedwarf/capedwarf-blue/blob/master/README.md (see (5))
> Then run the ObjectifyBenchmarkTest:
> * cd <GAE_TCK_HOME>
> * mvn clean install
> * cd core/benchmark
> * mvn clean install -Pcapedwarf,benchmark
> Ping me for any issues.
> Thanks!
> -Ales
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
Reliability of return values
by Radim Vansa
Hi,
recently I've stumbled upon an already expected behaviour (one instance
is [1]) which did not get much attention.
In a non-tx cache, when the primary owner fails after the request has been
replicated to the backup owner, the request is retried in the new topology.
The operation is then executed on the new primary (the previous backup).
The outcome has already been fixed in [2], but the return value
may be wrong. For example, when we do a put, the return value for the
second attempt will be the value we just inserted (although the entry
was only just created). The same situation may happen for other operations.
Currently, it's not possible to return the correct value (because it has
already been overwritten and we don't keep a history of values), but
shouldn't we rather throw an exception if we were not able to fulfil the
API contract?
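To make the contract at stake concrete, a small illustration (cache and key names are made up; on a single healthy node the assertion always holds, and the problem only appears with the failover retry described above):

import org.infinispan.Cache;
import org.infinispan.manager.EmbeddedCacheManager;

public class ReturnValueContract {
   static void createEntry(EmbeddedCacheManager cacheManager) {
      Cache<String, String> cache = cacheManager.getCache("nontx");
      // Contract: put() on a brand new key returns null (no previous value).
      String previous = cache.put("k1", "v1");
      // After a retried put (primary died after replicating to the backup),
      // 'previous' can instead be "v1" -- the value this very call just wrote.
      assert previous == null : "retry returned the value we just wrote: " + previous;
   }
}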
Radim
[1] https://issues.jboss.org/browse/ISPN-2956
[2] https://issues.jboss.org/browse/ISPN-3422
--
Radim Vansa <rvansa(a)redhat.com>
JBoss DataGrid QA
stabilize test suite
by Mircea Markus
Hi,
The test suite is all over the place, so please STOP integrating into master until the suite gets green.
Also please start looking into the failures: in order not to overlap, reply to this email with what you're looking at.
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
"Lightweight" security
by Tristan Tarrant
Hi all,
This morning I have issued a rather large PR [1] which contains a new
way to handle security. Because of the implications, I thought it best
to explain the rationale behind it here.
When a cache has been configured with authorization, retrieving such a
cache returns an instance of SecureCache. SecureCache is a simple
wrapper around a Cache which checks whether the "current user" has
enough permissions to perform an operation.
For us the "current user" is a Subject associated with the
AccessControlContext, which basically means Subject.doAs(subject, ...).
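As a small illustration (the cache reference, key and value are made up), performing a cache write as a given Subject looks roughly like this:

import java.security.PrivilegedAction;
import javax.security.auth.Subject;
import org.infinispan.Cache;

public class AsSubject {
   // SecureCache picks up 'subject' from the AccessControlContext set up by
   // doAs() and checks its permissions before letting the put through.
   static void putAs(Subject subject, Cache<String, String> secureCache) {
      Subject.doAs(subject, (PrivilegedAction<Void>) () -> {
         secureCache.put("k", "v");
         return null;
      });
   }
}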
Occasionally, however, we have internal code which needs to do things
with caches with "elevated privileges". As an example, the distexec code
needs to check the cache configuration and obtain the RpcManager,
operations which the current Subject might not be allowed to perform.
In my initial implementation, this was done using the
AccessController.doPrivileged(...) method, which requires the presence of
a SecurityManager to be able to grant the privileges to trusted code.
Unfortunately a SecurityManager is a bit overkill (usability-wise and
performance-wise) for our use case, since its purpose is mainly to avoid
running untrusted code (think of an applet).
But the typical "authorization"-enabled Infinispan application is
running trusted code, and it just needs a way to distinguish one Subject
from another when performing cache operations.
So, I present to you the new org.infinispan.security.Security class
which uses a much simpler approach: it identifies "privileged" code
simply by its package name, using the caller stack in a way similar to
what logging frameworks do. While this approach is "insecure" in the
usual meaning of the word, since it can be easily subverted, it covers
the common use-case in a much lighter and faster way.
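A rough sketch of the idea (this is not the actual Security implementation; the package prefix and the frame skipping are illustrative): walk the caller stack and treat frames coming from our own packages as privileged.

public final class CallerCheck {
   // Returns true if the nearest "interesting" frame belongs to Infinispan code.
   static boolean callerIsPrivileged() {
      for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
         String cls = frame.getClassName();
         // skip JDK frames and this helper itself
         if (cls.startsWith("java.") || cls.equals(CallerCheck.class.getName()))
            continue;
         return cls.startsWith("org.infinispan.");
      }
      return false;
   }
}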
Obviously using a SecurityManager, if so inclined, is still supported
and will be used to validate privileges if present.
Tristan
[1] https://github.com/infinispan/infinispan/pull/2629
More commits, or squashing?
by Sanne Grinovero
A discussion which could interest more of you:
https://github.com/infinispan/infinispan/pull/2613#issuecomment-45760469
There is no golden rule, but I believe it's highly desirable to give
the sequence of commits a logical flow which makes sense.
In this specific situation it also helped me, as I've been making
slow progress on it, fragmented over several months.
For example, this single commit is a single sed script, whose source
is included in the commit:
https://github.com/Sanne/infinispan/commit/c4b22038761d38da21480d87a37abc...
Guess what I need to do when rebasing?
Also, if you keep the flow "logical" and separate file renames/moves from
content changes, git will help you massively in resolving conflicts with any
change made concurrently on the same files. For example, the WildFly 81
PR would have been much less painful, and I think it would have been
integrated way earlier as it wouldn't have looked as scary.
Finally, I want to win the commits competition of course ;-)
https://github.com/Sanne
Cheers,
Sanne
Query integration tests performance and JGroups
by Gustavo Fernandes
Hi,
While investigating some CI failures in query (timeouts), I've narrowed
down the issue to the JGroups protocol stack being used.
Running a 'mvn clean install' in the query/ module takes about 6min
(when timeout does not happen). If I run instead:
mvn -Dtest.protocol.jgroups=udp clean install
the time goes down to around 50s. Recent changes in core's jgroups-tcp.xml
for the tests were the removal of the loopback=true setting and the
modification of the bundler_type, but they don't seem to affect the outcome.
FYI, taking a single test and stripping from it everything but the
cluster formation and data population (5 objects) leads to the CPU
hotspot below, and it takes almost 1 minute.
I'd be happy to change the query tests to UDP, but first I would like to
hear your thoughts about this issue.
Gustavo
+----------------------------------------------------------------------------------+------------------+--------------------+
| Name | Time (ms) | Invocation Count |
+----------------------------------------------------------------------------------+------------------+--------------------+
| +---java.net.SocketInputStream.read(byte[], int, int, int) | 101,742 100 % | 4,564 |
| | | | |
| +---java.net.SocketInputStream.read(byte[], int, int) | | |
| | | | |
| +---java.io.BufferedInputStream.fill() | | |
| | | | |
| +---java.io.BufferedInputStream.read() | | |
| | | | |
| +---java.io.DataInputStream.readInt() | | |
| | | | |
| +---org.jgroups.blocks.TCPConnectionMap$TCPConnection$Receiver.run() | | |
| | | | |
| +---java.lang.Thread.run() | | |
+----------------------------------------------------------------------------------+------------------+--------------------+
Leaking ThreadLocal
by Sanne Grinovero
Hi all,
as spotted by the testsuite - which was rightfully complaining - the
CHMv8 which we backported from the JDK has a static ThreadLocal that
it uses internally.
Our copy has the same, and so Dan enhanced the "ThreadLocal
detection" utility in the testsuite to "allow" specific ThreadLocal
instances; nice, as at least infinispan-core now has no more failures.
Next, the bigger question: this might be OK in the JDK - which
obviously doesn't get redeployed - but is it OK in our stuff?
I guess the answer is no. Would you all agree that the CHM instances
should share an instance whose lifecycle is managed by the
CacheManager?
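For context, a minimal sketch of the leak pattern under discussion (class and field names are made up, only mirroring the spirit of what the backported CHMv8 keeps per thread):

public class BackportedMap {
   // Per-thread counter cell, similar in spirit to what CHMv8 keeps per thread.
   static final class Cell { long value; }

   // A static ThreadLocal: each pooled worker thread keeps its Cell (and,
   // transitively, the classloader that loaded Cell) strongly reachable until
   // the thread dies or remove() is called -- a classic redeployment leak.
   private static final ThreadLocal<Cell> CELL = ThreadLocal.withInitial(Cell::new);

   public long bump() {
      return ++CELL.get().value;
   }
}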
Cheers,
Sanne