Re: [infinispan-dev] Design change in Infinispan Query
by Mircea Markus
On Feb 25, 2014, at 3:46 PM, Adrian Nistor <anistor(a)gmail.com> wrote:
> They can do what they please. Either put multiple types in one basket or put them in separate caches (one type per cache). But allowing / recommending is one thing, mandating it is a different story.
>
> There's no reason to forbid _any_ of these scenarios / mandate one over the other! There was previously in this thread some suggestion of mandating the one type per cache usage. -1 for it
Agreed. I actually don't see how we can prevent people who declare Cache<Object,Object> from putting whatever they want in it. It also makes total sense for smaller caches, as it is easy to set up etc.
The debate in this email, the way I understood it, was: are people using multiple caches for storing data, and should they be? If yes, we should consider query functionality spanning multiple caches.
>
>
>
> On Tue, Feb 25, 2014 at 5:08 PM, Mircea Markus <mmarkus(a)redhat.com> wrote:
>
> On Feb 25, 2014, at 9:28 AM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>
> >> On 24 févr. 2014, at 17:39, Mircea Markus <mmarkus(a)redhat.com> wrote:
> >>
> >>
> >>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
> >>>
> >>> By the way, Mircea, Sanne and I had quite a long discussion about this one and the idea of one cache per entity. It turns out that the right (as in easy) solution does involve a higher level programming model like OGM provides. You can simulate it yourself using the Infinispan APIs but it is just cumbersome.
> >>
> >> Curious to hear the whole story :-)
> >> We cannot mandate that all users use OGM though, one of the reasons being that OGM is not platform independent (Hot Rod).
> >
> > Then solve all the issues I have raised with a magic wand and come back to me when you have done it, I'm interested.
>
> People are going to use Infinispan with one cache per entity, because it makes sense:
> - different config (repl/dist | persistent/non-persistent) for different data types
> - have map/reduce tasks run only on the Person entries, not on Dog as well, when you want to select (Person) where age > 18
> I don't see a reason to forbid this, on the contrary. The way I see it, the relation is (OGM, ISPN) <=> (Hibernate, JDBC). Indeed OGM would be a better abstraction and should be recommended as such for Java clients, but ultimately we're a general purpose storage engine that is available to different platforms as well.
>
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
>
>
>
>
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
10 years, 10 months
RadarGun 1.1.0.Final released
by Radim Vansa
Hi all,
it has been a long time since the last release of RadarGun. We have been
using it intensively and have developed many new features - 1.0.0 had 7,340
lines of Java code, 1.1.0 has 32,978 lines. RadarGun has become a
multi-purpose tool, used for checking both the performance and functionality
of caches under stress.
During 1.1.0 development, most parts of the code changed beyond recognition,
but we tried to keep the old configuration compatible. However, the
design started to be rather limiting, and therefore we have decided to
make this the last release of the 1.1.x line and move on to RadarGun 2.0.0. On the 1.1.x
branch we will provide bugfixes, but all new features will go into 2.0.0.
A few teasers of features expected in RadarGun 2.0.0:
* non-homogeneous clusters: client/server setups, cooperation of
different versions of products, or easy setup of cross-site deployments
with different configurations
* abstraction from the cache wrapper: you will be able to use RadarGun for
more than just caches, without any hacks
** the current CacheWrapper interface will be redesigned to match JSR-107's
javax.cache.Cache rather than java.util.Map
* pluggable reporting: statistics will be multiplexed directly to the
configured reporters (again, without cheating on directories), and reporters
will provide output formatted as CSV or HTML, or can even deploy the
results to an external repository
* merged local and distributed benchmarks -> master + a single slave
within one JVM
* better property parsing: evaluation of expressions, property
replacement executed on slaves
I hope you will like it! And enjoy the 1.1.0.Final release now.
Radim
------
Radim Vansa <rvansa(a)redhat.com> JBoss DataGrid QA
Ditching ASYNC modes for REPL/DIST/INV/CacheStores?
by Galder Zamarreño
Hi all,
The following came to my mind yesterday: I think we should ditch the ASYNC modes for DIST/REPL/INV and our async cache store functionality.
Instead, whoever wants to store something asynchronously should use the asynchronous methods, i.e. call putAsync(). So, this would mean that when you call put(), it's always sync. This would reduce the complexity and configuration of our code base without affecting our functionality, and it would make things more logical IMO.
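The distinction between the two contracts can be sketched in plain Java (CompletableFuture and a ConcurrentHashMap stand in for the cache and its async API here; this is an illustration, not Infinispan's actual implementation):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AsyncPutSketch {
    static final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

    // Synchronous put: the caller blocks until the value is stored.
    static String put(String key, String value) {
        return store.put(key, value);
    }

    // Asynchronous put: returns immediately; the caller decides
    // whether and when to wait on the returned future.
    static CompletableFuture<String> putAsync(String key, String value) {
        return CompletableFuture.supplyAsync(() -> store.put(key, value));
    }

    public static void main(String[] args) {
        put("k1", "v1");                                     // blocks
        CompletableFuture<String> f = putAsync("k2", "v2");  // caller continues
        f.join();                                            // wait only when the result matters
        System.out.println(store.get("k1") + " " + store.get("k2"));
    }
}
```

With this split there is no per-cache "async mode" to configure: the choice is made per call site.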
WDYT?
Cheers,
--
Galder Zamarreño
galder(a)redhat.com
twitter.com/galderz
Project Lead, Escalante
http://escalante.io
Engineer, Infinispan
http://infinispan.org
How to add programmatic config to an existing xml configured cache
by Faseela K
Hi,
I have some Infinispan configurations available in "config.xml".
After loading this configuration, I want to append some more configuration programmatically, using ConfigurationBuilder.
I am doing something like this:
Configuration template = null;
ConfigurationBuilder builder = new ConfigurationBuilder();
DefaultCacheManager manager = new DefaultCacheManager("config.xml");
template = manager.getCacheConfiguration("evictionCache");
builder.read(template);
builder.loaders().passivation(false).shared(false).preload(true)
       .addFileCacheStore().fetchPersistentState(true)
       .purgerThreads(3).purgeSynchronously(true)
       .ignoreModifications(false).purgeOnStartup(false)
       .location("tmp").async()
       .enabled(true).flushLockTimeout(15000).threadPoolSize(5)
       .singletonStore().enabled(true).pushStateWhenCoordinator(true)
       .pushStateTimeout(20000);
manager.defineConfiguration("abcd", builder.build());
The problem with this code is that it overwrites the "evictionCache" configuration.
Can somebody help me fix this issue?
Thanks,
Faseela
Further dist.exec and M/R API improvements
by Vladimir Blagojevic
Hey guys,
As some of you might know, we have received additional requirements from
the community and internally to add a few things to distributed executors and
the map/reduce API. On the distributed executors front we need to enable
distributed executors to store results into a cache directly rather than
returning them to the invoker [1]. As soon as we introduce this API we also
need an async mechanism to allow notifications of subtask
completion/failure. I was thinking we add a concept of a
DistributedTaskExecutionListener which can be specified in the
DistributedTaskBuilder:
DistributedTaskBuilder<T> executionListener(DistributedTaskExecutionListener<K, T> listener);
We needed DistributedTaskExecutionListener anyway. All distributed tasks
might use some feedback about task progress, completion/failure and so on.
My proposal is roughly:
public interface DistributedTaskExecutionListener<K, T> {
   void subtaskSent(Address node, Set<K> inputKeys);
   void subtaskFailed(Address node, Set<K> inputKeys, Exception e);
   void subtaskSucceeded(Address node, Set<K> inputKeys, T result);
   void allSubtasksCompleted();
}
So much for that. If tasks do not use input keys, these parameters would
be empty sets. Now, for [1] we need to add additional methods to
DistributedExecutorService. We cannot specify the result cache in
DistributedTaskBuilder, as we are still bound to the submit methods in
DistributedExecutorService that return futures, and we don't want that.
We need two new void methods:
<T, K> void submitEverywhere(DistributedTask<T> task,
      Cache<DistExecResultKey<K>, T> result);
<T, K> void submitEverywhere(DistributedTask<T> task,
      Cache<DistExecResultKey<K>, T> result, K... input);
Now, why bother with DistExecResultKey? Well, we have tasks that use
input keys and tasks that don't, so the results cache could be keyed
either by input keys or by execution address, or by a combination of the two.
Therefore, DistExecResultKey could be something like:
public interface DistExecResultKey<K> {
   Address getExecutionAddress();
   K getKey();
}
If you have a better idea how to address this aspect let us know. So
much for distributed executors.
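A minimal sketch of how such a composite key could be implemented (plain Java; a String stands in for Infinispan's Address, and the class is a hypothetical concrete form of the interface proposed above, not the actual API):

```java
import java.util.Objects;

// Hypothetical composite result key: identifies a subtask result by the
// node that executed it plus the (optional) input key it processed.
public final class DistExecResultKey<K> {
    private final String executionAddress; // stand-in for org.infinispan Address
    private final K key;                   // null for tasks without input keys

    public DistExecResultKey(String executionAddress, K key) {
        this.executionAddress = Objects.requireNonNull(executionAddress);
        this.key = key;
    }

    public String getExecutionAddress() { return executionAddress; }
    public K getKey() { return key; }

    // equals/hashCode over both components, so results from different
    // nodes (or for different input keys) never collide in the cache.
    @Override public boolean equals(Object o) {
        if (!(o instanceof DistExecResultKey)) return false;
        DistExecResultKey<?> other = (DistExecResultKey<?>) o;
        return executionAddress.equals(other.executionAddress)
                && Objects.equals(key, other.key);
    }

    @Override public int hashCode() {
        return Objects.hash(executionAddress, key);
    }
}
```

The value-equality semantics are what make the results cache work: two subtasks only map to the same entry when both the executing node and the input key match.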
For map/reduce we also have to enable storing map/reduce task results
into a cache [2] and allow users to specify a custom cache for intermediate
results [3]. Part of task [2] is to allow notification about map/reduce
task progress and completion. Just as in dist.executor, I would add a
MapReduceTaskExecutionListener interface:
public interface MapReduceTaskExecutionListener {
   void mapTaskInitialized(Address executionAddress);
   void mapTaskSucceeded(Address executionAddress);
   void mapTaskFailed(Address executionTarget, Exception cause);
   void mapPhaseCompleted();
   void reduceTaskInitialized(Address executionAddress);
   void reduceTaskSucceeded(Address executionAddress);
   void reduceTaskFailed(Address address, Exception cause);
   void reducePhaseCompleted();
}
while MapReduceTask would have an additional method:
public void execute(Cache<KOut, VOut> resultsCache);
MapReduceTaskExecutionListener could be specified using the fluent
MapReduceTask API, just as the intermediate cache would be:
public MapReduceTask<KIn, VIn, KOut, VOut> usingIntermediateCache(Cache<KOut, List<VOut>> tmpCache);
thus addressing issue [3].
Let me know what you think,
Vladimir
[1] https://issues.jboss.org/browse/ISPN-4030
[2] https://issues.jboss.org/browse/ISPN-4002
[3] https://issues.jboss.org/browse/ISPN-4021
Re: [infinispan-dev] Design change in Infinispan Query
by Mircea Markus
On Feb 24, 2014, at 5:39 PM, Sanne Grinovero <sanne(a)infinispan.org> wrote:
> On 24 February 2014 16:51, Mircea Markus <mmarkus(a)redhat.com> wrote:
>> Just to recap, the main reason for the JPA cache store is to be a replacement for the JDBC cache store, nothing more than that.
>> And it certainly has advantages compared with the JDBC cache stores:
>> - JPA offers database independence/portability
>> - it doesn't put that many restrictions on the schema
>> - it's easier to write to/read from an existing database table
>
> Don't you dare hijacking my nice 2 years old thread :-D
:-D
> BTW why is this discussion not public anymore? I missed the switch to undercover.
I don't know when it switched to private; make it public again ;)
>
> Cheers,
> Sanne
>
>>
>>
>>
>> On Feb 18, 2014, at 1:18 PM, Tristan Tarrant <tristan(a)infinispan.org> wrote:
>>
>>> I think that the CacheLoader/Store SPI should be enhanced with "schema" information, whatever its source (JPA annotations, ProtoBuf, etc).
>>>
>>> A schema-aware store can then do what it pleases.
>>>
>>> Tristan
>>>
>>> On 18/02/2014 14:03, Emmanuel Bernard wrote:
>>>> On Tue 2014-02-18 13:16, Adrian Nistor wrote:
>>>>>> JPA cache store is a waste of time IMO :)
>>>>> +1 :)
>>>> My understanding is that the JPACacheStore discussion has been revived because
>>>> users want to map an existing database, load the data into the grid and
>>>> keep both synchronized.
>>>> At least that's the use case I was told needed to be covered.
>>>
>>
>> Cheers,
>> --
>> Mircea Markus
>> Infinispan lead (www.infinispan.org)
>>
>>
>>
>>
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
Introducing Infinispan OData server: Remote JSON documents querying
by Tomas Sykora
Hello all! :)
It's the right time to make it a little bit more public and finally share some results of the work on the Infinispan OData server!
This solution can serve as a proof of concept: we are able to remotely query JSON documents stored in Infinispan caches, using an industry-standard and platform-independent way of communicating with the server (OData).
There is still much to do/implement/improve in the server, but it is working as it is now.
Check out the blog post if you are interested:
http://tsykora-tech.blogspot.cz/2014/02/introducing-infinispan-odata-serv...
Any feedback is more than welcome.
+ I'd like to say a big THANK YOU to all who supported me!
Mainly: the JDG QE guys, Manik, Mircea, Sanne and Adrian.
It wouldn't have been done without your patience and willingness to help me :-)
Tomas
Design change in Infinispan Query
by Sanne Grinovero
Hello all,
currently Infinispan Query is an interceptor registered on the
specific Cache instance which has indexing enabled; one such
interceptor does all it needs to do in the sole scope of the
cache it was registered in.
If you enable indexing - for example - on 3 different caches, there
will be 3 different Hibernate Search engines started in the background,
and they are all unaware of each other.
After some design discussions with Ales for CapeDwarf, and also
drawing attention to something that has bothered me for some time, I'd
like to evaluate the option of having a single Hibernate Search engine
registered in the CacheManager and shared across indexed caches.
Current design limitations:
A- If they are all configured to use the same base directory to
store indexes, and happen to have same-named indexes, they'll share
the index without being aware of each other. This is going to break
unless the user configures some tricky parameters, and even so
performance won't be great: instances will lock each other out, or at
best write in alternate turns.
B- The search engine isn't particularly "heavy"; still, it would be
nice to share some components and internal services.
C- Configuration details which need some care - like injecting a
JGroups channel for clustering - need to be done right, isolating each
instance (so large parts of the configuration would be quite similar but
not totally equal).
D- Incoming messages into a JGroups Receiver need to be routed not
only among indexes, but also among engine instances. This prevents
Query from reusing code from Hibernate Search.
Problems with a unified Hibernate Search Engine:
1#- Isolation of types / indexes. If the same indexed class is
stored in different (indexed) caches, they'll share the same index. Is
it a problem? I'm tempted to consider this a good thing, but wonder if
it would surprise some users. Would you expect that?
2#- configuration format overhaul: indexing options won't be set in
the cache section but in the global section. I'm looking forward to
using the schema extensions anyway to provide a better configuration
experience than the current <properties />.
3#- Assuming 1# is fine, when a search hit is found I'd need to be
able to figure out from which cache the value should be loaded.
3#A we could have the cache name encoded in the index, as part
of the identifier: {PK, cacheName}
3#B we could actually shard the index, keeping a physically separate
index per cache. This would mean searching on the joint index view but
extracting hits from specific indexes to keep track of "which index".
I think we can do that, but it's definitely tricky.
It's likely easier to keep indexed values from different caches in
different indexes. That would mean rejecting #1 and messing with the
user-defined index name, to add for example the cache name to the
user-defined string.
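Option 3#A could be sketched as follows (a plain-Java illustration of encoding the cache name into the indexed identifier and routing a hit back to its cache; the method names and the '#' separator are hypothetical, not Hibernate Search or Infinispan APIs):

```java
import java.util.HashMap;
import java.util.Map;

public class IndexedIdSketch {
    // 3#A: fold {PK, cacheName} into a single string identifier stored
    // in the shared index.
    static String encode(String cacheName, String pk) {
        return cacheName + "#" + pk; // assumes '#' never appears in cache names
    }

    // On a search hit, decode the identifier to pick the right cache.
    static Object load(Map<String, Map<String, Object>> cacheManager, String indexedId) {
        int sep = indexedId.indexOf('#');
        String cacheName = indexedId.substring(0, sep);
        String pk = indexedId.substring(sep + 1);
        return cacheManager.get(cacheName).get(pk);
    }

    public static void main(String[] args) {
        // Nested maps stand in for a CacheManager holding named caches.
        Map<String, Map<String, Object>> cacheManager = new HashMap<>();
        cacheManager.put("people", new HashMap<>());
        cacheManager.get("people").put("42", "Alice");

        String id = encode("people", "42");
        System.out.println(load(cacheManager, id)); // Alice
    }
}
```

The same decode step is what 3#B avoids: with one physical index per cache, the shard a hit came from already identifies the cache.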
Any comment?
Cheers,
Sanne
ClusteredListeners: message delivered twice
by Mircea Markus
Hey Will,
With the current design, during a topology change an event might be delivered twice to a cluster listener. I think we might be able to identify such situations (a node becomes a key owner as a result of the topology change) and add this information to the event we send, e.g. a flag "potentiallyDuplicate" or something like that. Event implementors might be able to make good use of this, e.g. checking their internal state to see whether an event has been redelivered or not. What do you think? Are there any other more-than-once delivery situations we can't keep track of?
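One way a listener implementor might use such a flag (a plain-Java sketch; the potentiallyDuplicate flag and the event shape are hypothetical, following the proposal above, not the actual cluster listener API):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DedupListenerSketch {
    // Minimal stand-in for a cluster cache event carrying the proposed flag.
    record Event(String key, long version, boolean potentiallyDuplicate) {}

    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    int processed = 0;

    void onEvent(Event e) {
        String id = e.key() + "@" + e.version();
        boolean firstTime = seen.add(id);
        // Only the flagged events need the dedup check; the common
        // (non-topology-change) path never skips work.
        if (e.potentiallyDuplicate() && !firstTime) {
            return; // redelivered during a topology change; already handled
        }
        processed++;
    }
}
```

The flag lets the listener keep its dedup state small: only events delivered around a topology change ever need to be remembered and checked.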
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
MapReduce limitations and suggestions.
by Evangelos Vazaios
Hello everyone,
I started using the MapReduce implementation of Infinispan and I came
across some possible limitations. Thus, I want to make some suggestions
about the MapReduce (MR) implementation in Infinispan.
Depending on the algorithm, there might be some memory problems,
especially for intermediate results.
An example of such a case is group-by. Suppose that we have a cluster
of 2 nodes with 2 GB available each. Let there be a distributed cache in
which simple car objects (id, brand, colour) are stored, and the total size
of the data is 3.5 GB. If all objects have the same colour, then all 3.5 GB
would go to only one reducer, and as a result an OutOfMemoryError will be thrown.
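The skew described above can be illustrated with a toy in-memory group-by (plain Java; the car data and grouping logic are illustrative, not Infinispan's MapReduce API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupBySkewSketch {
    record Car(int id, String brand, String colour) {}

    // Map phase emits (colour, car) pairs; in the reduce phase each
    // colour's whole group lands on a single reducer.
    static Map<String, List<Car>> groupByColour(List<Car> cars) {
        Map<String, List<Car>> groups = new HashMap<>();
        for (Car c : cars) {
            groups.computeIfAbsent(c.colour(), k -> new ArrayList<>()).add(c);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Car> cars = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            cars.add(new Car(i, "brand" + (i % 5), "red")); // every car is red
        }
        Map<String, List<Car>> groups = groupByColour(cars);
        // One key means one reducer: the entire data set ends up in a
        // single group, which is exactly the memory hot spot described.
        System.out.println(groups.size() + " group(s), largest = "
                + groups.get("red").size());
    }
}
```

Scaled up to 3.5 GB of values behind a single key, that one in-memory group is what exhausts the reducer's heap; a dedicated, separately configured intermediate cache (e.g. with a store) is one way to absorb it.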
To overcome these limitations, I propose adding as a parameter the name of
the intermediate cache to be used. This will enable the creation of a
custom-configured cache that deals with the memory limitations.
Another feature that I would like to have is the ability to set the name of
the output cache. The reasoning behind this is similar to the one mentioned
above.
I await your thoughts on these two suggestions.
Regards,
Evangelos