Re: [infinispan-dev] Design change in Infinispan Query
by Mircea Markus
On Feb 25, 2014, at 3:46 PM, Adrian Nistor <anistor(a)gmail.com> wrote:
> They can do what they please. Either put multiple types in one basket or put them in separate caches (one type per cache). But allowing / recommending is one thing, mandating it is a different story.
>
> There's no reason to forbid _any_ of these scenarios / mandate one over the other! There was previously in this thread some suggestion of mandating the one type per cache usage. -1 for it
Agreed. I actually don't see how we can prevent people who declare Cache<Object,Object> from putting whatever they want in it. It also makes total sense for smaller caches, as it is easy to set up etc.
The debate in this email, the way I understood it, was: are people using multiple caches for storing data, and should they be? If yes, we should consider query functionality spanning multiple caches.
>
>
>
> On Tue, Feb 25, 2014 at 5:08 PM, Mircea Markus <mmarkus(a)redhat.com> wrote:
>
> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>
> >> On 24 févr. 2014, at 17:39, Mircea Markus <mmarkus(a)redhat.com> wrote:
> >>
> >>
> >>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
> >>>
> >>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome.
> >>
> >> Curious to hear the whole story :-)
> >> We cannot mandate that all users use OGM though, one of the reasons being that OGM is not platform independent (Hot Rod).
> >
> > Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested.
>
> People are going to use Infinispan with one cache per entity, because it makes sense:
> - different config (repl/dist | persistent/non-persistent) for different data types
> - have map/reduce tasks run only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18
> I don't see a reason to forbid this, on the contrary. The way I see it, the relation is (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well.
>
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
>
>
>
>
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
10 years, 10 months
RadarGun 1.1.0.Final released
by Radim Vansa
Hi all,
it has been a long time since the last release of RadarGun. We have been
using it intensively and have developed many new features - 1.0.0 had 7,340
lines of Java code, 1.1.0 has 32,978 lines. RadarGun has become a
multi-purpose tool, used for checking both the performance and functionality
of caches under stress.
During 1.1.0 development, most parts of the code changed beyond recognition,
but we tried to keep the old configuration compatible. However, the
design started to be rather limiting, and therefore we have decided to
make this the last release of the 1.1.x line and move on to RadarGun 2.0.0. On the 1.1.x
branch we will provide bugfixes, but all new features will go into 2.0.0.
A few teasers of features expected in RadarGun 2.0.0:
* non-homogeneous clusters: client/server setups, cooperation of
different versions of products, or easy setup of cross-site deployments
with different configurations
* abstraction from the cache wrapper: you will be able to use RadarGun for
more than just caches, without any hacks
** the current CacheWrapper interface will be redesigned to match JSR-107's
javax.cache.Cache rather than java.util.Map
* pluggable reporting: statistics will be multiplexed directly to the
configured reporters (again, without cheating on directories), and reporters
will provide output formatted as CSV or HTML, or can even deploy the
results to an external repository
* merged local and distributed benchmarks -> master + a single slave
within one JVM
* better property parsing: evaluation of expressions, property
replacement executed on slaves
I hope you will like it! And enjoy the 1.1.0.Final release now.
Radim
------
Radim Vansa <rvansa(a)redhat.com> JBoss DataGrid QA
Ditching ASYNC modes for REPL/DIST/INV/CacheStores?
by Galder Zamarreño
Hi all,
The following came to my mind yesterday: I think we should ditch the ASYNC modes for DIST/REPL/INV and our async cache store functionality.
Instead, whoever wants to store something asynchronously should use the asynchronous methods, i.e. call putAsync(). So, this would mean that when you call put(), it's always sync. This would reduce the complexity and configuration of our code base without affecting our functionality, and it would make things more logical IMO.
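The distinction between the two contracts can be sketched in plain Java (CompletableFuture and a ConcurrentHashMap stand in for the cache and its async API here; this is an illustration, not Infinispan's actual implementation):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AsyncPutSketch {
    static final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

    // Synchronous put: the caller blocks until the value is stored.
    static String put(String key, String value) {
        return store.put(key, value);
    }

    // Asynchronous put: returns immediately; the caller decides
    // whether and when to wait on the returned future.
    static CompletableFuture<String> putAsync(String key, String value) {
        return CompletableFuture.supplyAsync(() -> store.put(key, value));
    }

    public static void main(String[] args) {
        put("k1", "v1");                                     // blocks
        CompletableFuture<String> f = putAsync("k2", "v2");  // caller continues
        f.join();                                            // wait only when the result matters
        System.out.println(store.get("k1") + " " + store.get("k2"));
    }
}
```

With this split there is no per-cache "async mode" to configure: the choice is made per call site.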
WDYT?
Cheers,
--
Galder Zamarreño
galder(a)redhat.com
twitter.com/galderz
Project Lead, Escalante
http://escalante.io
Engineer, Infinispan
http://infinispan.org
How to add programmatic config to an existing xml configured cache
by Faseela K
Hi,
I have some Infinispan configurations available in "config.xml".
After loading this configuration, I want to append some more configuration programmatically, using ConfigurationBuilder.
I am doing something like this:
Configuration template = null;
ConfigurationBuilder builder = new ConfigurationBuilder();
DefaultCacheManager manager = new DefaultCacheManager("config.xml");
template = manager.getCacheConfiguration("evictionCache");
builder.read(template);
builder.loaders().passivation(false).shared(false).preload(true)
       .addFileCacheStore().fetchPersistentState(true)
       .purgerThreads(3).purgeSynchronously(true)
       .ignoreModifications(false).purgeOnStartup(false)
       .location("tmp").async()
       .enabled(true).flushLockTimeout(15000).threadPoolSize(5)
       .singletonStore().enabled(true).pushStateWhenCoordinator(true)
       .pushStateTimeout(20000);
manager.defineConfiguration("abcd", builder.build());
The problem with this code is that it overwrites the "evictionCache" configuration.
Can somebody help me fix this issue?
Thanks,
Faseela
Further dist.exec and M/R API improvements
by Vladimir Blagojevic
Hey guys,
As some of you might know, we have received additional requirements from
the community and internally to add a few things to distributed executors and
the map/reduce API. On the distributed executors front we need to enable
distributed executors to store results into a cache directly rather than
returning them to the invoker [1]. As soon as we introduce this API we also
need an async mechanism to allow notifications of subtask
completion/failure. I was thinking we add a concept of a
DistributedTaskExecutionListener which can be specified in the
DistributedTaskBuilder:
DistributedTaskBuilder<T> executionListener(DistributedTaskExecutionListener<K, T> listener);
We needed DistributedTaskExecutionListener anyway. All distributed tasks
might use some feedback about task progress, completion/failure and so on.
My proposal is roughly:
public interface DistributedTaskExecutionListener<K, T> {
   void subtaskSent(Address node, Set<K> inputKeys);
   void subtaskFailed(Address node, Set<K> inputKeys, Exception e);
   void subtaskSucceeded(Address node, Set<K> inputKeys, T result);
   void allSubtasksCompleted();
}
So much for that. If tasks do not use input keys, these parameters would
be empty sets. Now, for [1] we need to add additional methods to
DistributedExecutorService. We cannot specify the result cache in
DistributedTaskBuilder, as we are still bound to the submit methods in
DistributedExecutorService that return futures, and we don't want that.
We need two new void methods:
<T, K> void submitEverywhere(DistributedTask<T> task,
      Cache<DistExecResultKey<K>, T> result);
<T, K> void submitEverywhere(DistributedTask<T> task,
      Cache<DistExecResultKey<K>, T> result, K... input);
Now, why bother with DistExecResultKey? Well, we have tasks that use
input keys and tasks that don't, so the results cache could be keyed
either by input keys or by execution address, or by a combination of the two.
Therefore, DistExecResultKey could be something like:
public interface DistExecResultKey<K> {
   Address getExecutionAddress();
   K getKey();
}
If you have a better idea how to address this aspect let us know. So
much for distributed executors.
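A minimal sketch of how such a composite key could be implemented (plain Java; a String stands in for Infinispan's Address, and the class is a hypothetical concrete form of the interface proposed above, not the actual API):

```java
import java.util.Objects;

// Hypothetical composite result key: identifies a subtask result by the
// node that executed it plus the (optional) input key it processed.
public final class DistExecResultKey<K> {
    private final String executionAddress; // stand-in for org.infinispan Address
    private final K key;                   // null for tasks without input keys

    public DistExecResultKey(String executionAddress, K key) {
        this.executionAddress = Objects.requireNonNull(executionAddress);
        this.key = key;
    }

    public String getExecutionAddress() { return executionAddress; }
    public K getKey() { return key; }

    // equals/hashCode over both components, so results from different
    // nodes (or for different input keys) never collide in the cache.
    @Override public boolean equals(Object o) {
        if (!(o instanceof DistExecResultKey)) return false;
        DistExecResultKey<?> other = (DistExecResultKey<?>) o;
        return executionAddress.equals(other.executionAddress)
                && Objects.equals(key, other.key);
    }

    @Override public int hashCode() {
        return Objects.hash(executionAddress, key);
    }
}
```

The value-equality semantics are what make the results cache work: two subtasks only map to the same entry when both the executing node and the input key match.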
For map/reduce we also have to enable storing map/reduce task results
into a cache [2] and allow users to specify a custom cache for intermediate
results [3]. Part of task [2] is to allow notification about map/reduce
task progress and completion. Just as in dist.executor, I would add a
MapReduceTaskExecutionListener interface:
public interface MapReduceTaskExecutionListener {
   void mapTaskInitialized(Address executionAddress);
   void mapTaskSucceeded(Address executionAddress);
   void mapTaskFailed(Address executionTarget, Exception cause);
   void mapPhaseCompleted();
   void reduceTaskInitialized(Address executionAddress);
   void reduceTaskSucceeded(Address executionAddress);
   void reduceTaskFailed(Address address, Exception cause);
   void reducePhaseCompleted();
}
while MapReduceTask would have an additional method:
public void execute(Cache<KOut, VOut> resultsCache);
MapReduceTaskExecutionListener could be specified using the fluent
MapReduceTask API, just as the intermediate cache would be:
public MapReduceTask<KIn, VIn, KOut, VOut> usingIntermediateCache(Cache<KOut, List<VOut>> tmpCache);
thus addressing issue [3].
Let me know what you think,
Vladimir
[1] https://issues.jboss.org/browse/ISPN-4030
[2] https://issues.jboss.org/browse/ISPN-4002
[3] https://issues.jboss.org/browse/ISPN-4021
Re: [infinispan-dev] Design change in Infinispan Query
by Mircea Markus
On Feb 24, 2014, at 5:39 PM, Sanne Grinovero <sanne(a)infinispan.org> wrote:
> On 24 February 2014 16:51, Mircea Markus <mmarkus(a)redhat.com> wrote:
>> Just to recap, the main reason for the JPA cache store is to be a replacement for the JDBC cache store, nothing more than that.
>> And it certainly has advantages compared with the JDBC cache stores:
>> - JPA offers database independence/portability
>> - it doesn't put that many restrictions on the schema
>> - it's easier to write to/read from an existing database table
>
> Don't you dare hijacking my nice 2 years old thread :-D
:-D
> BTW why is this discussion not public anymore? I missed the switch to undercover.
I don't know when it switched to private; make it public again ;)
>
> Cheers,
> Sanne
>
>>
>>
>>
>> On Feb 18, 2014, at 1:18 PM, Tristan Tarrant <tristan(a)infinispan.org> wrote:
>>
>>> I think that the CacheLoader/Store SPI should be enhanced with "schema" information, whatever its source (JPA annotations, ProtoBuf, etc).
>>>
>>> A schema-aware store can then do what it pleases.
>>>
>>> Tristan
>>>
>>> On 18/02/2014 14:03, Emmanuel Bernard wrote:
>>>> On Tue 2014-02-18 13:16, Adrian Nistor wrote:
>>>>>> JPA cache store is a waste of time IMO :)
>>>>> +1 :)
>>>> My understanding is that the JPACacheStore discussion has been revived because
>>>> users want to map an existing database, load the data into the grid and
>>>> keep both synchronized.
>>>> At least that's the use case I was told needed to be covered.
>>>
>>
>> Cheers,
>> --
>> Mircea Markus
>> Infinispan lead (www.infinispan.org)
>>
>>
>>
>>
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
Introducing Infinispan OData server: Remote JSON documents querying
by Tomas Sykora
Hello all! :)
It's the right time to make it a little bit more public and finally share some results of the work on the Infinispan OData server!
This solution can serve as a proof of concept: we are able to remotely query JSON documents stored in Infinispan caches, using an industry-standard and platform-independent way of communicating with the server (OData).
There is still much to do/implement/improve in the server, but it is working as it is now.
Check out the blog post if you are interested:
http://tsykora-tech.blogspot.cz/2014/02/introducing-infinispan-odata-serv...
Any feedback is more than welcome.
+ I'd like to say a big THANK YOU to all who supported me!
Mainly: the JDG QE guys, Manik, Mircea, Sanne and Adrian.
It wouldn't have been done without your patience and willingness to help me :-)
Tomas
Design change in Infinispan Query
by Sanne Grinovero
Hello all,
currently Infinispan Query is an interceptor registered on the
specific Cache instance which has indexing enabled; one such
interceptor does all it needs to do in the sole scope of the
cache it was registered in.
If you enable indexing - for example - on 3 different caches, there
will be 3 different Hibernate Search engines started in the background,
and they are all unaware of each other.
After some design discussions with Ales for CapeDwarf, and also
drawing attention to something that has bothered me for some time, I'd
like to evaluate the option of having a single Hibernate Search engine
registered in the CacheManager and shared across indexed caches.
Current design limitations:
A- If they are all configured to use the same base directory to
store indexes, and happen to have same-named indexes, they'll share
the index without being aware of each other. This is going to break
unless the user configures some tricky parameters, and even so
performance won't be great: instances will lock each other out, or at
best write in alternate turns.
B- The search engine isn't particularly "heavy"; still, it would be
nice to share some components and internal services.
C- Configuration details which need some care - like injecting a
JGroups channel for clustering - need to be done right, isolating each
instance (so large parts of the configuration would be quite similar but
not totally equal).
D- Incoming messages into a JGroups Receiver need to be routed not
only among indexes, but also among engine instances. This prevents
Query from reusing code from Hibernate Search.
Problems with a unified Hibernate Search Engine:
1#- Isolation of types / indexes. If the same indexed class is
stored in different (indexed) caches, they'll share the same index. Is
it a problem? I'm tempted to consider this a good thing, but wonder if
it would surprise some users. Would you expect that?
2#- configuration format overhaul: indexing options won't be set in
the cache section but in the global section. I'm looking forward to
using the schema extensions anyway to provide a better configuration
experience than the current <properties />.
3#- Assuming 1# is fine, when a search hit is found I'd need to be
able to figure out from which cache the value should be loaded.
3#A we could have the cache name encoded in the index, as part
of the identifier: {PK, cacheName}
3#B we could actually shard the index, keeping a physically separate
index per cache. This would mean searching on the joint index view but
extracting hits from specific indexes to keep track of "which index".
I think we can do that, but it's definitely tricky.
It's likely easier to keep indexed values from different caches in
different indexes. That would mean rejecting #1 and messing with the
user-defined index name, to add for example the cache name to the
user-defined string.
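Option 3#A could be sketched as follows (a plain-Java illustration of encoding the cache name into the indexed identifier and routing a hit back to its cache; the method names and the '#' separator are hypothetical, not Hibernate Search or Infinispan APIs):

```java
import java.util.HashMap;
import java.util.Map;

public class IndexedIdSketch {
    // 3#A: fold {PK, cacheName} into a single string identifier stored
    // in the shared index.
    static String encode(String cacheName, String pk) {
        return cacheName + "#" + pk; // assumes '#' never appears in cache names
    }

    // On a search hit, decode the identifier to pick the right cache.
    static Object load(Map<String, Map<String, Object>> cacheManager, String indexedId) {
        int sep = indexedId.indexOf('#');
        String cacheName = indexedId.substring(0, sep);
        String pk = indexedId.substring(sep + 1);
        return cacheManager.get(cacheName).get(pk);
    }

    public static void main(String[] args) {
        // Nested maps stand in for a CacheManager holding named caches.
        Map<String, Map<String, Object>> cacheManager = new HashMap<>();
        cacheManager.put("people", new HashMap<>());
        cacheManager.get("people").put("42", "Alice");

        String id = encode("people", "42");
        System.out.println(load(cacheManager, id)); // Alice
    }
}
```

The same decode step is what 3#B avoids: with one physical index per cache, the shard a hit came from already identifies the cache.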
Any comment?
Cheers,
Sanne
ClusteredListeners: message delivered twice
by Mircea Markus
Hey Will,
With the current design, during a topology change an event might be delivered twice to a cluster listener. I think we might be able to identify such situations (a node becomes a key owner as a result of the topology change) and add this information to the event we send, e.g. a flag "potentiallyDuplicate" or something like that. Event implementors might be able to make good use of this, e.g. checking their internal state to see whether an event has been redelivered or not. What do you think? Are there any other more-than-once delivery situations we can't keep track of?
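One way a listener implementor might use such a flag (a plain-Java sketch; the potentiallyDuplicate flag and the event shape are hypothetical, following the proposal above, not the actual cluster listener API):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DedupListenerSketch {
    // Minimal stand-in for a cluster cache event carrying the proposed flag.
    record Event(String key, long version, boolean potentiallyDuplicate) {}

    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    int processed = 0;

    void onEvent(Event e) {
        String id = e.key() + "@" + e.version();
        boolean firstTime = seen.add(id);
        // Only the flagged events need the dedup check; the common
        // (non-topology-change) path never skips work.
        if (e.potentiallyDuplicate() && !firstTime) {
            return; // redelivered during a topology change; already handled
        }
        processed++;
    }
}
```

The flag lets the listener keep its dedup state small: only events delivered around a topology change ever need to be remembered and checked.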
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
MapReduce limitations and suggestions.
by Evangelos Vazaios
Hello everyone,
I started using the MapReduce implementation of Infinispan and I came
across some possible limitations. Thus, I want to make some suggestions
about the MapReduce (MR) implementation in Infinispan.
Depending on the algorithm, there might be some memory problems,
especially for intermediate results.
An example of such a case is group-by. Suppose that we have a cluster
of 2 nodes with 2 GB available each. Let there be a distributed cache in
which simple car objects (id, brand, colour) are stored, and the total size
of the data is 3.5 GB. If all objects have the same colour, then all 3.5 GB
would go to only one reducer, and as a result an OutOfMemoryError will be thrown.
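The skew described above can be illustrated with a toy in-memory group-by (plain Java; the car data and grouping logic are illustrative, not Infinispan's MapReduce API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupBySkewSketch {
    record Car(int id, String brand, String colour) {}

    // Map phase emits (colour, car) pairs; in the reduce phase each
    // colour's whole group lands on a single reducer.
    static Map<String, List<Car>> groupByColour(List<Car> cars) {
        Map<String, List<Car>> groups = new HashMap<>();
        for (Car c : cars) {
            groups.computeIfAbsent(c.colour(), k -> new ArrayList<>()).add(c);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Car> cars = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            cars.add(new Car(i, "brand" + (i % 5), "red")); // every car is red
        }
        Map<String, List<Car>> groups = groupByColour(cars);
        // One key means one reducer: the entire data set ends up in a
        // single group, which is exactly the memory hot spot described.
        System.out.println(groups.size() + " group(s), largest = "
                + groups.get("red").size());
    }
}
```

Scaled up to 3.5 GB of values behind a single key, that one in-memory group is what exhausts the reducer's heap; a dedicated, separately configured intermediate cache (e.g. with a store) is one way to absorb it.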
To overcome these limitations, I propose adding as a parameter the name of
the intermediate cache to be used. This will enable the creation of a
custom-configured cache that deals with the memory limitations.
Another feature that I would like to have is the ability to set the name of
the output cache. The reasoning behind this is similar to the one mentioned
above.
I await your thoughts on these two suggestions.
Regards,
Evangelos