Design change in Infinispan Query
by Sanne Grinovero
Hello all,
currently Infinispan Query is an interceptor registered on each
specific Cache instance which has indexing enabled; each such
interceptor does everything it needs to do solely within the scope of
the cache it was registered in.
If you enable indexing - for example - on 3 different caches, 3
different Hibernate Search engines will be started in the background,
and they are all unaware of each other.
After some design discussions with Ales for CapeDwarf, and also
returning to something that has bothered me for some time, I'd like
to evaluate the option of having a single Hibernate Search engine
registered in the CacheManager and shared across the indexed caches.
Current design limitations:
A- If they are all configured to use the same base directory to
store indexes, and happen to have same-named indexes, they'll share
the index without being aware of each other. This is going to break
unless the user configures some tricky parameters, and even then
performance won't be great: instances will lock each other out, or at
best write in alternating turns.
B- The search engine isn't particularly "heavy", but it would still
be nice to share some components and internal services.
C- Configuration details which need some care - like injecting a
JGroups channel for clustering - need to be done while properly
isolating each instance (so large parts of the configuration would be
quite similar, but not identical).
D- Incoming messages into a JGroups Receiver need to be routed not
only among indexes, but also among Engine instances. This prevents
Query from reusing code from Hibernate Search.
Problems with a unified Hibernate Search Engine:
1#- Isolation of types / indexes. If the same indexed class is
stored in different (indexed) caches, they'll share the same index.
Is that a problem? I'm tempted to consider this a good thing, but I
wonder if it would surprise some users. Would you expect that?
2#- Configuration format overhaul: indexing options won't be set in
the cache section but in the global section. I'm looking forward to
using the schema extensions anyway to provide a better configuration
experience than the current <properties />.
3#- Assuming 1# is fine, when a search hit is found I'd need to be
able to figure out from which cache the value should be loaded.
3#A we could have the cache name encoded in the index, as part
of the identifier: {PK,cacheName} (see the sketch after 3#B)
3#B we actually shard the index, keeping a physically separate
index per cache. This would mean searching on the joint index view
but extracting hits from specific indexes to keep track of "which
index"... I think we can do that, but it's definitely tricky.
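To make 3#A concrete, here is a minimal sketch of the idea (class and
method names are hypothetical, purely for illustration):

   import java.io.Serializable;
   import org.infinispan.manager.EmbeddedCacheManager;

   // Hypothetical sketch for 3#A: the index identifier carries both the
   // primary key and the name of the cache the value lives in.
   public final class CacheAwareKey implements Serializable {
      private final Object primaryKey;
      private final String cacheName;

      public CacheAwareKey(Object primaryKey, String cacheName) {
         this.primaryKey = primaryKey;
         this.cacheName = cacheName;
      }

      // On a search hit, route the load back to the originating cache.
      public Object loadValue(EmbeddedCacheManager cacheManager) {
         return cacheManager.getCache(cacheName).get(primaryKey);
      }
   }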
It's likely easier to keep indexed values from different caches in
different indexes. That would mean rejecting 1# and altering the user
defined index name, for example by appending the cache name to the
user defined string.
Any comment?
Cheers,
Sanne
Re: [infinispan-dev] Removing Infinispan dependency on the Hibernate-Infinispan module in 4.x
by Galder Zamarreño
Scott, what do you suggest doing instead then? Without the commands, evictAll invalidation won't work.
Are you suggesting that I revert back to using the cache as a notification bus so that regions are invalidated?
On Feb 8, 2012, at 4:13 PM, Scott Marlow wrote:
> http://lists.jboss.org/pipermail/infinispan-dev/2012-February/010125.html has more context.
>
> Since there are no easy/quick fixes that can be applied at this time, to remove the AS7 Infinispan dependency on the Hibernate-Infinispan module, I think we should avoid depending on the service loader way to supply the custom commands (in the Hibernate-Infinispan module), at least until this can be addressed elsewhere.
>
> I propose that the Hibernate-Infinispan second level cache should not use the Service Loader to pass custom commands into Infinispan. If we agree, I'll create a jira for this.
>
> Scott
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache
Map Reduce 2.0
by Vladimir Blagojevic
Hey guys,
Before moving forward with the next iteration of map reduce I wanted
to hear your thoughts about the following proposal. After we agree on
the general direction I will transcribe the agreed design onto a wiki
page and start the implementation.
Shortcomings of the current map reduce implementation
While our current map reduce implementation is more than a proof of
concept, there are several drawbacks preventing it from being an
industrial grade map reduce solution. The main drawback is the
inability of the current solution to deal with large-data (GB/TB
scale) map reduce problems. This shortcoming lies mainly in our
reduce phase execution. The reduce phase, as you might know, is
currently done on a single Infinispan master task node; the size of
the map reduce problems we can support is therefore limited to the
working memory of a single node.
Proposed solution
The proposed solution involves distributing the execution of reduce
phase tasks across the cluster, thus achieving higher reduce task
parallelization and at the same time removing the above mentioned
reduce phase restriction. By leveraging our consistent hashing
solution even further, we can parallelize the reduce phase and
elevate our map reduce solution to an industrial level. Here is how
we can achieve that.
Map phase
MapReduceTask, as it currently does, will hash task input keys and
group them by the execution node N they are hashed to. For each node
N and its grouped input KIn keys, MapReduceTask creates a
MapCombineCommand which is migrated to the execution target node N.
MapCombineCommand is similar to the current MapReduceCommand.
MapCombineCommand takes an instance of a Mapper and an instance of a
Reducer, which acts as a combiner [1].
Once loaded into the target execution node, MapCombineCommand takes
each local KIn key and executes the Mapper method void map(KIn key,
VIn value, Collector<KOut, VOut> collector). Results are collected
into a common Collector<KOut, VOut> and the combine phase is
initiated. A Combiner, if specified, takes the KOut keys and
immediately invokes the reduce phase on them. The result of the
mapping phase executed on each node is a <KOut, VOut> map M. There
will be one resulting map M per execution node N.
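For reference, here is what a Mapper with the signature above looks
like, using the usual word count example (the class is purely
illustrative, not part of the proposal):

   import org.infinispan.distexec.mapreduce.Collector;
   import org.infinispan.distexec.mapreduce.Mapper;

   // Word count mapper: emits (word, 1) for every word found in the value.
   public class WordCountMapper implements Mapper<String, String, String, Integer> {
      @Override
      public void map(String key, String value, Collector<String, Integer> collector) {
         for (String word : value.split("\\s+")) {
            collector.emit(word, 1);
         }
      }
   }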
At the end of the combine phase, instead of returning map M to the
master task node (as we currently do), we now hash each KOut in map M
and group the KOut keys by the execution node N they are hashed to.
Each group of KOut keys and its VOut values, hashed to the same node,
is wrapped with a new command, Migrate. The Migrate command, which is
very similar to PutKeyValueCommand, is executed on the Infinispan
target node N and essentially maintains a KOut -> List<VOut> mapping,
i.e. all KOut/VOut pairs from all executed MapCombineCommands will be
collocated on the node N where KOut is hashed to, and the value for
KOut will be a list of all VOut values. We essentially collect all
VOut values under each KOut for all executed MapCombineCommands.
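In rough pseudo-Java, the grouping of intermediate keys could look
like this (I'm assuming something like
DistributionManager#getPrimaryLocation is available; the exact API is
to be confirmed):

   // Sketch: group intermediate KOut keys by the node they hash to.
   Map<Address, List<KOut>> keysByNode = new HashMap<Address, List<KOut>>();
   for (KOut key : intermediateKeys) {
      Address owner = distributionManager.getPrimaryLocation(key); // assumed API
      List<KOut> keys = keysByNode.get(owner);
      if (keys == null) {
         keys = new ArrayList<KOut>();
         keysByNode.put(owner, keys);
      }
      keys.add(key);
   }
   // One Migrate command is then sent per target node with its group of keys.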
At this point MapCombineCommand has finished its execution; the list
of KOut keys is returned to the master node and its MapReduceTask. We
do not return the VOut values as we do not need them at the master
task node. MapReduceTask is ready to start the reduce phase.
Reduce phase
MapReduceTask initializes a ReduceCommand with the user specified
Reducer. The KOut keys collected from the map phase are grouped by
the execution node N they are hashed to. For each node N and its
grouped input KOut keys, MapReduceTask creates a ReduceCommand and
sends it to the node N where the KOut keys are hashed. Once loaded on
the target execution node, the ReduceCommand grabs the list of VOut
values for each KOut key and invokes:
VOut reduce(KOut reducedKey, Iterator<VOut> iter).
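Again for reference, a matching Reducer with that signature (word
count again, purely illustrative):

   import java.util.Iterator;
   import org.infinispan.distexec.mapreduce.Reducer;

   // Word count reducer: sums the partial counts collected under a word.
   public class WordCountReducer implements Reducer<String, Integer> {
      @Override
      public Integer reduce(String reducedKey, Iterator<Integer> iter) {
         int sum = 0;
         while (iter.hasNext()) {
            sum += iter.next();
         }
         return sum;
      }
   }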
The result of a ReduceCommand is a map M where each key is a KOut and
each value is a VOut. Each Infinispan execution node N returns one
map M where each key KOut is hashed to N and each VOut is KOut's
reduced value.
When all ReduceCommands return to the calling node, MapReduceTask
simply combines all these M maps and returns the final Map<KOut,
VOut> as the result of the MapReduceTask. All intermediate
KOut -> List<VOut> maps left on the Infinispan cluster are then
cleaned up.
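From the user's perspective nothing should change in how a task is
invoked; something like the following (a sketch using the current
public API, with the illustrative classes from above) would
transparently use the distributed reduce phase:

   import java.util.Map;
   import org.infinispan.Cache;
   import org.infinispan.distexec.mapreduce.MapReduceTask;

   Cache<String, String> cache = ...; // a clustered cache holding the input text
   MapReduceTask<String, String, String, Integer> task =
         new MapReduceTask<String, String, String, Integer>(cache);
   Map<String, Integer> counts = task.mappedWith(new WordCountMapper())
                                     .reducedWith(new WordCountReducer())
                                     .execute();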
[1] See section 4.3 of http://research.google.com/archive/mapreduce.html
Time for a tryLock() ?
by Galder Zamarreño
Looks like rolling back the transaction when a lock timeout is
encountered can be problematic: https://community.jboss.org/message/731307#731307
Maybe it's time to implement a tryLock() that attempts to acquire the
lock but does not roll back the transaction if it cannot acquire it?
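To make it concrete, a hypothetical signature next to the existing
AdvancedCache.lock() (the name and shape are just a suggestion):

   // Today: a lock timeout marks the transaction for rollback.
   // Hypothetical addition: report failure and leave the transaction
   // alive, letting the caller decide whether to retry or give up.
   boolean tryLock(K... keys);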
Thoughts?
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache
Time for a tryPut()? :-)
by Thomas Fromm
Heyho,
Similar to the tryLock issue discussed in another thread, I have a
problem with put().
Cache.put(...) can fail for a lot of reasons, e.g.:
java.lang.RuntimeException: org.infinispan.CacheException: Member
ISNode-35671 no longer in cluster....
javax.transaction.HeuristicMixedException (different reasons)
...
Failing puts are often caused by normal cluster operations, e.g. a
new node joining or another one leaving. Under load, this causes lots
of transactions to fail.
For a single put, my solution was simply to retry (a limited number
of times) when an exception appears.
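Roughly what I do for the single-put case, as a sketch (the helper is
mine, not a proposed API; it assumes maxAttempts >= 1):

   import org.infinispan.Cache;

   // Bounded-retry workaround for a single put.
   public static <K, V> V putWithRetry(Cache<K, V> cache, K key, V value,
                                       int maxAttempts) {
      RuntimeException last = null;
      for (int attempt = 0; attempt < maxAttempts; attempt++) {
         try {
            return cache.put(key, value);
         } catch (RuntimeException e) {
            last = e; // e.g. CacheException because a member left the cluster
         }
      }
      throw last;
   }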
For transactions this is a bit more complex, since I need to "replay"
the operations of the whole failed transaction.
Do you have any best practice for these situations? Does the feature
request for an uncontended put make sense?
--tf
How the remote fetching for GET works in DIST mode
by Michal Linhard
Maybe this is an ancient topic discussed ages ago somewhere, but if
there's a quick answer, please save me from searching through tons of
archives.
Let's have a DIST mode cache.
When we're doing memcached/REST tests, for each request there's
roughly a numOwners/numNodes probability that we'll hit an owner;
with numNodes=4 and numOwners=2 this means roughly every second
request needs a remote fetch (assuming no L1 cache).
I'm just looking at DistributionManagerImpl.retrieveFromRemoteSource
and trying to understand how it works.
Does it really cause two more GET requests to be processed in the
cluster?
OK, even assuming that serving a GET request should be quick, I'd
like to see how it would perform if we only allowed getting from the
first owner.
I guess right now we don't allow configuring this. Are there any
catches with this?
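What I have in mind, as a very rough sketch (I haven't checked the
exact signatures, so treat the command construction as pseudo-code):

   // Rough sketch: ask only the first owner instead of all owners.
   List<Address> owners = distributionManager.locate(key);
   Address firstOwner = owners.get(0);
   ClusteredGetCommand get = commandsFactory.buildClusteredGetCommand(key); // simplified
   Map<Address, Response> responses = rpcManager.invokeRemotely(
         Collections.singleton(firstOwner), get, ResponseMode.SYNCHRONOUS, timeout);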
m.
--
Michal Linhard
Quality Assurance Engineer
JBoss Datagrid
Red Hat Czech s.r.o.
Purkynova 99 612 45 Brno, Czech Republic
phone: +420 532 294 320 ext. 62320
mobile: +420 728 626 363
Hot Rod Java client and NIO
by Manik Surtani
Mircea,
When you wrote the Hot Rod client, you abstracted the transport away so we could have alternate network implementations, right? The reason I ask is that at some point we should look at not only an NIO impl (I know you had one that you experimented with a while back) but also a JDK7 NIO2 one, and benchmark the three.
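For reference, the pluggability point as I remember it: the client
should pick up an alternate implementation via the transport factory
property (property name from memory; the NIO factory class below is
hypothetical):

   import java.util.Properties;
   import org.infinispan.client.hotrod.RemoteCacheManager;

   // Sketch: plugging an alternate transport implementation into the client.
   Properties props = new Properties();
   props.put("infinispan.client.hotrod.transport_factory",
             "org.example.NioTransportFactory"); // hypothetical NIO implementation
   RemoteCacheManager rcm = new RemoteCacheManager(props);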
WDYT?
Cheers
Manik
--
Manik Surtani
manik(a)jboss.org
twitter.com/maniksurtani
Lead, Infinispan
http://www.infinispan.org
PutMapCommand throws a NullPointerException in Distributed Mode
by Pedro Ruivo
Hi all,
I've spotted a bug in PutMapCommand. When the keys in the command map
to multiple nodes, the remote nodes (nodes that didn't create the
command) can throw the exception [1] when executing the perform()
method. I'm using a transactional cache.
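The shape of the failing call, for context (a transactional DIST-mode
cache; the key placement in the comments is what makes the command
span multiple nodes):

   import java.util.HashMap;
   import java.util.Map;

   // Keys that hash to different owners make the command span multiple nodes.
   Map<String, String> data = new HashMap<String, String>();
   data.put("key-on-node-A", "v1");
   data.put("key-on-node-B", "v2");
   cache.putAll(data); // a remote node then hits the NPE in PutMapCommand.perform()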
The test case in [2] reproduces the bug. If you want, I can open a
JIRA, and if you need more details let me know.
Cheers,
Pedro Ruivo
[1] Exception:
Caused by: java.lang.NullPointerException
   at org.infinispan.commands.write.PutMapCommand.perform(PutMapCommand.java:79)
   at org.infinispan.interceptors.CallInterceptor.handleDefault(CallInterceptor.java:83)
   at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:82)
   at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:67)
--
[2]
branch: https://github.com/pruivo/infinispan/tree/issue_2
test case:
https://github.com/pruivo/infinispan/blob/issue_2/core/src/test/java/org/...
Issue when preload from cache loader with write skew check
by Pedro Ruivo
Hi all,
I think I've spotted an issue when I use repeatable read with write
skew check and preload the cache.
I've made a test case to reproduce the bug. It can be found here [1].
The problem is that each preloaded key is put in the container with
version = null. When I try to commit a transaction, I get this exception:
java.lang.IllegalStateException: Entries cannot have null versions!
   at org.infinispan.container.entries.ClusteredRepeatableReadEntry.performWriteSkewCheck(ClusteredRepeatableReadEntry.java:44)
   at org.infinispan.transaction.WriteSkewHelper.performWriteSkewCheckAndReturnNewVersions(WriteSkewHelper.java:81)
   at org.infinispan.interceptors.locking.ClusteringDependentLogic$AllNodesLogic.createNewVersionsAndCheckForWriteSkews(ClusteringDependentLogic.java:133)
   at org.infinispan.interceptors.VersionedEntryWrappingInterceptor.visitPrepareCommand(VersionedEntryWrappingInterceptor.java:64)
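For context, a minimal configuration sketch that should hit this code
path, assuming the fluent builder API (method names from memory; the
test case [1] has the authoritative setup):

   import org.infinispan.configuration.cache.ConfigurationBuilder;
   import org.infinispan.configuration.cache.VersioningScheme;
   import org.infinispan.util.concurrent.IsolationLevel;

   // Repeatable read + write skew check + versioning, with a preloading loader.
   ConfigurationBuilder builder = new ConfigurationBuilder();
   builder.locking()
          .isolationLevel(IsolationLevel.REPEATABLE_READ)
          .writeSkewCheck(true);
   builder.versioning()
          .enable()
          .scheme(VersioningScheme.SIMPLE);
   builder.loaders()
          .preload(true); // preloaded entries end up with version = null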
I think all the info is in the test case, but if you need anything
else let me know.
Cheers,
Pedro
[1]
https://github.com/pruivo/infinispan/blob/issue_1/core/src/test/java/org/...