Lucene 5 is coming: pitfalls to consider
by Sanne Grinovero
Hi all,
the Hibernate Search branch upgrading to Apache Lucene 5.2.x is almost
ready; alongside the many nice efficiency improvements, there are
however some drawbacks to be aware of.
# API changes
The API changes are not too bad, and definitely an improvement. I'll
provide a detailed list as usual in the Hibernate Search migration
guide - for now, suffice it to say that it's an easy upgrade for
end users, as long as they were just creating Query instances and not
using the more powerful and complex stuff.
# Sorting
Sorting on a field will require an UninvertingReader to wrap the
cached IndexReaders, and the uninverting process is very inefficient.
On top of that, its result is not cacheable, so it will need to be
repeated on each index, for each query that is executed.
In short, I expect performance of sorted queries to be quite degraded
in our first milestone using Lucene 5, and we'll have to discuss how
to fix this.
Needless to say, fixing this is a blocking requirement before we can
consider the migration complete.
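To illustrate, this is roughly what the uninverting wrap looks like in
plain Lucene 5 - the index path and the "age" field are made-up examples:

```java
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.uninverting.UninvertingReader;
import org.apache.lucene.uninverting.UninvertingReader.Type;

public class SortWithUninverting {
   public static void main(String[] args) throws Exception {
      Directory directory = FSDirectory.open(Paths.get("/some/index/path"));

      // Declare which fields must be uninverted, and to which type:
      Map<String, Type> mapping = new HashMap<>();
      mapping.put("age", Type.INTEGER);

      // The wrap call itself is cheap, but the uninverting work it
      // triggers is expensive and its result is not cached across queries:
      DirectoryReader reader = DirectoryReader.open(directory);
      DirectoryReader uninverted = UninvertingReader.wrap(reader, mapping);

      IndexSearcher searcher = new IndexSearcher(uninverted);
      TopDocs hits = searcher.search(new MatchAllDocsQuery(), 10,
            new Sort(new SortField("age", SortField.Type.INT)));
      System.out.println(hits.totalHits);
   }
}
```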
Sorting will not need an UninvertingReader if the target field has
been indexed as DocValues, but that implies:
- we'll need an explicit, upfront (indexing time) flag to be set
- we'll need to detect whether the indexing options match what the
runtime query needs, so we can skip the uninverting process
This is mostly a job for Hibernate Search, but in terms of user
experience it means you have to mark fields for "sortability"
explicitly; will we need to extend the protobuf schema?
Please make sure we'll just have to hook into existing metadata; we
can't fix this after the API freeze.
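For illustration, at the Lucene level the upfront flag boils down to
indexing a DocValues twin of the field; the "price" field here is a
made-up example:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class SortableFieldExample {
   static Document makeDocument(long price) {
      Document doc = new Document();
      // The regular indexed field, used for searching:
      doc.add(new LongField("price", price, Store.NO));
      // Its DocValues twin: this is the upfront, indexing-time flag that
      // makes the field sortable without any uninverting at query time.
      doc.add(new NumericDocValuesField("price", price));
      return doc;
   }

   static Sort sortByPrice() {
      // No UninvertingReader needed for a field indexed this way:
      return new Sort(new SortField("price", SortField.Type.LONG));
   }
}
```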
# Filters
We had some clever bitset-level optimisations to merge multiple Filter
instances and save memory when caching them. I had to drop that code
and resort to the more traditional Lucene stack for filtering, as the
new design no longer deals with in-heap structures but iterates over
off-heap chunks of data.
I couldn't measure the performance impact yet; it's a significantly
different approach and while it sounds promising on paper, we'll need
some help testing this. The Lucene team can generally be trusted to go
in the better direction, but we'll have to verify if we're using it in
the best way.
# Analyzers
It is no longer possible to override the field->analyzer mapping at
runtime. We did expose this feature as a public API and I found a way
to still do it, but it comes with a performance price tag.
We'll soon deprecate this feature; if you can, start making sure
there's no need for this in Infinispan, as at some point in the near
future we'll have to drop it, with no replacement.
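For context, plain Lucene still supports a per-field analyzer mapping
as long as it is fixed when the Analyzer is constructed, rather than
overridden per query at runtime; a minimal sketch (the "isbn" field is
a made-up example):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class FixedPerFieldAnalyzers {
   static Analyzer build() {
      // The field -> analyzer mapping is decided once, at construction
      // time, and can no longer be swapped per query:
      Map<String, Analyzer> perField = new HashMap<>();
      perField.put("isbn", new KeywordAnalyzer());
      return new PerFieldAnalyzerWrapper(new StandardAnalyzer(), perField);
   }
}
```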
# Index encoding
As usual the index encoding evolves, and the easy solution is to
rebuild the index. Lucene 5 no longer ships with backwards-compatible
decoders, but these are available as separate dependencies. If you
feel the need to be able to read existing indexes, we should include
these.
(I'm including these as private dependencies in the Hibernate Search modules).
Thanks,
Sanne
Redis infinispan cache store
by Simon Paulger
Hi,
I'm interested in developing Infinispan integration with Redis for use in
JBoss. Before working on JBoss, I first need to add the capability to
Infinispan itself.
Is this an enhancement that the Infinispan community would be interested in?
Regards,
Simon
Shared vs Non-Shared CacheStores
by Sanne Grinovero
I would like to propose a clear-cut separation between our shared and
non-shared CacheStores,
in all terms such as:
- Configuration options
- Integration contracts (Split the CacheStore SPI)
- Implementations
- Terminology, to avoid any further confusion around valid
configurations and sensible architectures
We have loads of examples of users who got into trouble by configuring
one incorrectly, but there are also plenty of efficiency improvements
we could take advantage of by clearly splitting the integration points
and the implementations into two categories.
Not least, it's a very common and dangerous pitfall to assume that
Infinispan is able to restore a consistent state after a DIST cluster
which passivated into non-shared CacheStore instances has been
stopped, or even a REPL cluster whose nodes don't shut down at the
exact same time (and "exact same time" is a strange concept, at the
least). We need to clarify the different options, trade-offs and their
consequences, to users and to ourselves: a clearly defined use case
will avoid bugs and simplify the implementations.
# The purpose of each
I think that people should use a non-shared (local?) CacheStore for
the sole purpose of expanding the storage capacity of each single
node: be it because you don't have enough memory at all, or because
you prefer some extra safety margin because your estimates are
complex, or because we live in a real world where the hashing function
might not be perfect in practice. I hope we all agree that Infinispan
should take such situations with at worst a graceful performance
degradation, rather than complaining by sending OOMs to the admin and
setting the service on strike.
A Shared CacheStore is useful for very different purposes; primarily
to implement a Cache on some other service - for example your (single,
shared) RDBMS, a slow (or expensive) webservice your organization has
to call frequently, etc. It's also useful as a write-through cache
on a similar service, maybe internal but not able to handle the high
variation of load spikes which Infinispan can handle better.
Finally, a great use case is to have a consistent backup of all your
data-grid content, possibly in some "reference" form such as JPA
mapped entities.
# Benefits of a Non-Shared
A non-shared CacheStore implementor should be able to take advantage
of *its purpose*; among the big ones I see:
- Exclusive usage -> locking of a specific entry can be handled at
data-container level, which can simplify quite some internal code.
- Reliability -> since a clustered node needs to wipe its state at
reboot (after a crash), it's much simpler to code any such CacheStore
to avoid any form of disk sync or persistence guarantees.
- Encoding format -> this can be controlled entirely by Infinispan,
with no need to keep factors like rolling-upgrade-compatible encodings
in mind. JBoss Marshalling would be good enough, or some
implementations might not need to serialize at all.
Our non-shared CacheStore implementation(s) could take advantage of
lower-level, more complex code optimisations and interfaces, as users
would rarely want to customize one of these; the use case of mapping
data to a shared service, on the other hand, needs a more user-friendly
SPI so as to keep it simple to plug in custom stores: custom data
formats, custom connectors, and some help in implementing concurrency
correctly.
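To make this concrete, here's a purely hypothetical sketch of how the
two contracts could diverge - all names are made up, this is not an
existing Infinispan SPI:

```java
// Hypothetical non-shared (local) contract: exclusive access is assumed,
// Infinispan fully controls the encoding, and state is wiped at restart
// after a crash, so no durability guarantees are required.
public interface LocalStore {
   void write(int segment, byte[] key, byte[] value);
   byte[] read(int segment, byte[] key);
   void clear();
}

// Hypothetical shared contract: user-friendly and typed, meant for custom
// mappings to an external shared service such as an RDBMS or a webservice.
public interface SharedStore<K, V> {
   void write(K key, V value);
   V load(K key);
   boolean delete(K key);
}
```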
Proper transaction integration for the CacheStore has been on our
wishlist for some time too. I suspect that accepting that we have been
mixing up two different things under the same name so far would make it
simpler to implement further improvements such as transactions: the
way to do such a thing is very different in each of these use cases,
so it would help at least to implement it on a subset first - or maybe
only on one, if it turns out there's no need for such things in the
context of the local-only-dedicated "swapfile".
# Mixed types should be killed
I'm aware that some of our current implementations _could_ work both as
shared or non-shared, for example the JDBC or JPACacheStore or the
Remote CacheStore... but in most cases it doesn't make much sense. Why
would you ever want to use the JPACacheStore if not to share data with
a _shared_ database?
We should take such options away, and by doing so focus on the use
cases which actually matter and simplify the implementations and
improve the configuration validations.
If ever a compelling storage technology is identified which we'd like to
offer as an option for both shared or non-shared, I would still
recommend to make two different implementations, as there certainly are
different requirements and assumptions when coding such a thing.
Not least, I would very much like to see a default local CacheStore:
picking one for local "emergency swapping" should be a no-brainer for
users; we could set one up by default and not bother newcomers with
complex choices.
If we simplify the requirements of such a thing, it should be easy to
write one on standard Java NIO2 APIs and get rid of the complexities of
maintaining the native integration with things like LevelDB, not least
the inefficiency of making such native calls from Java.
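To give an idea, a trivially simplified sketch of such a store on plain
NIO2 APIs - one file per key, purely illustrative, ignoring segmenting,
indexing and size management:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Purely illustrative local "swapfile" store: since the state is wiped
// at restart after a crash, no fsync or crash-consistency is needed.
public class NioLocalStore {
   private final Path root;

   public NioLocalStore(Path root) throws IOException {
      this.root = Files.createDirectories(root);
   }

   public void write(String key, byte[] value) throws IOException {
      Files.write(root.resolve(key), value);
   }

   public byte[] read(String key) throws IOException {
      Path file = root.resolve(key);
      return Files.exists(file) ? Files.readAllBytes(file) : null;
   }

   public void delete(String key) throws IOException {
      Files.deleteIfExists(root.resolve(key));
   }
}
```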
Then as a second step we should attack the other use case, backups:
from a *purpose-driven perspective* I'd then see us revive the Cassandra
integration, obviously as a shared-only option.
Cheers,
Sanne
Special cache types and their configuration (or lack of)
by Tristan Tarrant
Hi all,
I wanted to bring attention to some discussion that has happened in the
context of Radim's work on simplified code for specific cache types [1].
In particular, Radim proposes adding explicit configuration options
(i.e. a new simple-cache cache type) to the programmatic/declarative API
to ensure that a user is aware of the limitations of the resulting cache
type (no interceptors, no persistence, no tx, etc).
My opinion is that we should aim for "less" configuration and not
"more", and that optimizations such as these should get enabled
implicitly when the parameters allow it, i.e. when the configuration
code detects it can use a "simple" cache.
Also, this choice should happen at cache construction time, and not
dynamically at cache usage time.
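Something along these lines, purely as a sketch - the exact conditions
(and whether these Configuration accessors cover them all) are exactly
what we'd need to agree on:

```java
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.transaction.TransactionMode;

public class SimpleCacheDetection {
   // Hypothetical detection logic: at cache construction time, inspect
   // the Configuration and pick the "simple" implementation only when
   // nothing rules it out.
   static boolean canUseSimpleCache(Configuration cfg) {
      return cfg.clustering().cacheMode() == CacheMode.LOCAL
            && cfg.transaction().transactionMode() == TransactionMode.NON_TRANSACTIONAL
            && !cfg.persistence().usesStores()
            && cfg.customInterceptors().interceptors().isEmpty();
   }
}
```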
WDYT ?
Tristan
[1] https://github.com/infinispan/infinispan/pull/3577
--
Tristan Tarrant
Infinispan Lead
JBoss, a division of Red Hat
Question about Hibernate ORM 5.0 + Infinispan 8.0...
by Scott Marlow
Hi,
I heard that Infinispan 8.0 may soon be integrated into WildFly 10.0.
If that happens, how does that impact Hibernate ORM 5.0 which currently
integrates with Infinispan 7.2.1.Final? Does Hibernate ORM 5.0 need any
changes to integrate with Infinispan 8.0?
Thanks,
Scott
Strict Expiration
by William Burns
This is a necro of [1].
With Infinispan 8.0 we are adding clustered expiration, which includes
a clustered expiration event as well. Unfortunately expiration events
currently occur multiple times (if numOwners > 1) and at different
times across nodes in a cluster. This makes coordinating a single
cluster-wide expiration event quite difficult.
To work around this I am proposing that the expiration of an entry is
performed solely by the owner of the key that has expired. This would
fix the issue of having multiple events, and the event can be raised
while holding the lock for the given key, so concurrent modifications
would not be an issue.
The problem arises when you have other nodes that have expiration set
but expire at different times. Max idle is the biggest offender here,
as a read on one owner only refreshes that owner's timestamp, meaning
the other owners would not be updated and would expire the entry
prematurely. To have expiration work properly in this case you would
need coordination between the owners to see if anyone has a higher
value. This requires blocking, and would have to be done while
accessing a key that is expired, to be sure whether expiration
happened or not.
The linked dev listing proposed instead to only expire an entry via the
reaper thread and not on access. In this case a read will return a
non-null value until the entry is fully expired, possibly increasing
hit ratios.
There are quite a few real benefits to this:
1. Cluster cache reads would be much simpler and wouldn't have to block to
verify the object exists or not since this would only be done by the reaper
thread (note this would have only happened if the entry was expired
locally). An access would just return the value immediately.
2. Each node only expires entries it owns in the reaper thread reducing how
many entries they must check or remove. This also provides a single point
where events would be raised as we need.
3. A lot of code can now be removed and made simpler as it no longer has to
check for expiration. The expiration check would only be done in 1 place,
the expiration reaper thread.
The main issue with this proposal, as the other listing mentions, is
when user code expects the value to be gone after expiration for
correctness. I would say this use case is not as compelling for
maxIdle, especially since we never supported it properly. And in the
case of lifespan the user could very easily store the expiration time
in the object itself and check it after a get, as pointed out in the
other thread and sketched below.
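A sketch of that workaround, purely illustrative and not an Infinispan
API - the application embeds the deadline in the stored value and checks
it after each get:

```java
public class TimestampedValue<V> {
   final V value;
   final long expiresAtMillis;

   public TimestampedValue(V value, long lifespanMillis) {
      this.value = value;
      this.expiresAtMillis = System.currentTimeMillis() + lifespanMillis;
   }

   /** Returns the value, or null once the lifespan has elapsed. */
   public V getIfNotExpired() {
      return System.currentTimeMillis() < expiresAtMillis ? value : null;
   }
}
```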
[1]
http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-stri...
Development process and handling of PRs
by Tristan Tarrant
Hi all,
there is something about our current development model which I feel is
holding us back a little. This is caused by a number of issues:
- Handling Pull Requests: we are really slow at doing this. When issuing
a PR, a developer expects at least one review to happen within the next
half-day at most. Instead, requests sit in the queue for days (or weeks)
before they even get considered. I don't expect everybody to just drop
what they are doing and review immediately, but at least be a bit more
responsive.
- It seems like we're always aiming for the perfect PR. Obviously a PR
should have zero failures, but we should be a bit more iterative about
the way we make changes. This is probably also a consequence of the
above: why should I break up my PR into small chunks, if it takes so
long to review each one and the cumulative delay is detrimental to my
progress? I like what Pedro has done for his locking changes.
- We're afraid of changes, but that's what a development phase is for,
especially for a new major release. We should be a bit more aggressive
with trying things out. A PR can be merged even if there are some
concerns (obviously not from a fundamental design POV), and it can be
refined in later steps.
This is what I would like to see in Beta2:
- The functional API (I can take care of rebasing the PR)
- The management console
- The query grouping/aggregation stuff
- anything else we can merge soon
I would like to release Wednesday at the latest, so please do your best
to help in achieving this goal.
Tristan
--
Tristan Tarrant
Infinispan Lead
JBoss, a division of Red Hat