lock within tx
by Mircea Markus
Hi Manik,
In the following slightly modified version of NodeAPITest.testAddingDataTx,
the output is "false".
public void testAddingDataTx() throws Exception {
   Node<Object, Object> rootNode = cache.getRoot();
   tm.begin();
   Node<Object, Object> nodeA = rootNode.addChild(A);
   NodeKey dataKey = new NodeKey(Fqn.fromString("/a"), NodeKey.Type.DATA);
   System.out.println("is it locked??? " +
         cache.getCache().getAdvancedCache().getInvocationContextContainer().get().hasLockedKey(dataKey));
   nodeA.put("key", "value");
   assertEquals("value", nodeA.get("key"));
   tm.commit();
}
Now, after creating a node within a tx, shouldn't the corresponding
dataKey have a write lock (i.e., shouldn't the code print "true")?
Cheers,
Mircea
15 years, 5 months
Async cache interface
by Manik Surtani
Picking up on
https://jira.jboss.org/jira/browse/ISPN-72
I was thinking about the point from which the call should be placed on a
separate thread. The obvious place is the network stack, so that the
caller gets a Future before the call goes out on the network, but maybe
it makes sense to do this even earlier - perhaps as soon as the
invocation is made? Things to consider, though, are the transaction
context - which exists on the caller's thread - and potential context
class loaders on the thread. This is stuff that can be dealt with easily
enough: e.g., we could have an AsyncInterceptor that handles the thread
pooling and management of the Future, sitting *after* the TxInterceptor
so that transaction participation is already determined and the
transaction context attached to the call. Context classloaders and the
like could be attached to the worker thread that then carries the call
down the chain and onto the wire.
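A minimal sketch of that idea - capture the caller-side context (here just the context classloader and a placeholder tx name), hand the call to a pool, and return a Future immediately. AsyncInvoker, txContext, etc. are hypothetical names for illustration, not Infinispan API:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

public class AsyncInvoker {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    public <T> Future<T> invokeAsync(Supplier<T> call, String txContext) {
        // captured on the caller's thread, before the handoff
        ClassLoader callerCl = Thread.currentThread().getContextClassLoader();
        return pool.submit(() -> {
            ClassLoader old = Thread.currentThread().getContextClassLoader();
            Thread.currentThread().setContextClassLoader(callerCl);
            try {
                // txContext would be attached to the invocation context here
                return call.get();
            } finally {
                Thread.currentThread().setContextClassLoader(old);
            }
        });
    }

    // convenience: block on the Future without checked exceptions
    public <T> T join(Future<T> f) {
        try {
            return f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public void shutdown() { pool.shutdown(); }
}
```

The caller gets the Future back as soon as the task is queued, well before the worker thread carries the call down the chain.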
Thoughts?
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org
15 years, 6 months
JDBM cache store
by Manik Surtani
Genman
Could you please have a look at the Maven deps for this module - it's
pulling in a lot of seemingly unnecessary JARs, including antlr,
cglib, LDAP constants from Apache, SLF4J, etc. Surely Apache's fork
of JDBM doesn't need this stuff?!?
Cheers
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org
15 years, 6 months
Rehashing for DIST
by Manik Surtani
So the blocker for distribution now is rehashing. This is much
trickier a problem than I previously thought, since it brings all of
the concerns we have with state transfer - the ability to generate and
apply state, preferably while not stopping the cluster, not
overwriting state from ongoing transactions, etc.
I've detailed two approaches - one is pessimistic, based on FLUSH,
and probably won't be implemented, but I've included it here for
completeness. The second is more optimistic, based on NBST, although
several degrees more complex. I'm still not happy with it as a
solution, but it could be a fallback. I'm still researching other
alternatives as well, and am open to suggestions and ideas.
Anyway, here we go. For each of the steps outlined below, ML1 is the
old member list prior to a topology change, and ML2 is the new one.
A. Pessimistic approach
-------------------
JOIN:
1. Joiner FLUSHes
2. Joiner requests state from *all* other members (concurrently,
using separate threads?) using channel.getState(address)
3. All other caches iterate through their local data container.
3.1. If the cache is the primary owner based on ML1 and the joiner is
*an* owner based on ML2, this entry is written to the state stream.
3.2. If the cache is no longer an owner of the entry based on ML2,
the entry is moved to L1 cache if enabled (otherwise, removed).
4. All caches close their state streams.
5. Joiner, after applying state received, releases FLUSH and joins
the cluster.
LEAVE:
1. Coordinator requests a FLUSH
2. All caches iterate through their data containers.
2.1. If based on ML1, the cache is the primary owner of an entry, and
based on ML1 and ML2, there is a new owner who does not have the
entry, an RPC request is sent to the new owner.
2.1.1. The new owner requests state from the primary owner with
channel.getState()
2.1.2. The primary owner writes this state for the new owner.
2.2. If the owner is no longer an owner based on ML2, the entry is
moved to L1 if enabled (otherwise, removed)
3. Caches close any state streams opened. Readers apply all state
received.
4. Caches send an RPC to the coordinator informing the coord of an
end-of-rehash phase.
5. Coordinator releases FLUSH
This process will involve multiple streams connecting each cache to
potentially every other cache. Perhaps streams over UDP multicast may
be more efficient here?
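The ownership checks in steps 3.1/3.2 of JOIN (and 2.1/2.2 of LEAVE) could be sketched as below. Note the hash function here is a deliberately naive placeholder, not Infinispan's actual consistent hash:

```java
import java.util.ArrayList;
import java.util.List;

public class RehashCheck {
    // Placeholder ownership function: hash the key onto the member list,
    // with numOwners consecutive members owning it; index 0 is the primary.
    static List<String> owners(String key, List<String> members, int numOwners) {
        int primary = Math.abs(key.hashCode()) % members.size();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(numOwners, members.size()); i++)
            result.add(members.get((primary + i) % members.size()));
        return result;
    }

    // Step 3.1: stream the entry if we are the primary owner under ML1
    // and the joiner is *an* owner under ML2.
    static boolean streamToJoiner(String key, String self, String joiner,
                                  List<String> ml1, List<String> ml2, int numOwners) {
        return owners(key, ml1, numOwners).get(0).equals(self)
            && owners(key, ml2, numOwners).contains(joiner);
    }

    // Step 3.2: demote to L1 (or remove) if we are no longer an owner under ML2.
    static boolean demoteToL1(String key, String self, List<String> ml2, int numOwners) {
        return !owners(key, ml2, numOwners).contains(self);
    }
}
```

Each cache would run these predicates while iterating its local data container, writing matching entries to the state stream.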
B. NBST approach
-------------------
This approach is very similar to NBST where the FLUSH phases defined
in the pessimistic approach are replaced by modification logging and
streaming of the modification log, using RPC to control a brief
partial lock on modifications when the last of the state is written.
Also, to allow ongoing remote gets to retrieve correct state - when
dealing with a race between making a request to a new owner that hasn't
yet applied state, or to an old owner that has already removed state -
we should support responding to a remote get even if you are no longer
an owner: either by using L1, or by making another remote get in turn
to the view that you deem correct.
This does increase complexity over NBST quite significantly though,
since each cache would need to maintain a transaction log for each and
every other cache based on which keys are mapped there, to allow state
application as per the pessimistic steps above to be accurate.
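The bookkeeping this implies - a modification log per peer, keyed by which caches own the modified keys - could be sketched purely illustratively (strings standing in for addresses and modifications) as:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Each cache records every modification against every peer that owns the
// modified key, so that state application can replay only the relevant
// changes to each node.
public class ModLog {
    private final Map<String, List<String>> logPerNode = new HashMap<>();

    public void record(String key, String value, List<String> owners) {
        for (String node : owners)
            logPerNode.computeIfAbsent(node, n -> new ArrayList<>())
                      .add(key + "=" + value);
    }

    // the log segment that would be streamed to a given peer
    public List<String> drainFor(String node) {
        return logPerNode.getOrDefault(node, Collections.emptyList());
    }
}
```

The cost is visible even in the sketch: every write fans out into one log entry per owning peer, which is exactly the complexity concern above.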
Like I said, not the most elegant, but I thought I'd just throw it out
there until I come up with a better approach. :-)
Cheers
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org
15 years, 6 months
Eviction overhaul
by Manik Surtani
Hello all.
I have finished my work on the eviction code in Infinispan; here is a
summary of what has happened.
For a user perspective (including API and configuration) as well as
a design overview, please have a look at
http://www.jboss.org/community/docs/DOC-13449
For an implementation perspective, have a look at the sources of
FIFODataContainer and LRUDataContainer. These two classes are where
everything happens. The javadocs should explain the details, but in a
nutshell you can expect constant-time operations for all puts, gets,
removes, and iterations. :-)
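This isn't the actual FIFODataContainer/LRUDataContainer code, but the constant-time LRU idea in miniature can be seen with a LinkedHashMap in access order:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A LinkedHashMap in access order gives O(1) put/get/remove while keeping
// entries ordered by recency, so the eldest entry is always the LRU victim.
public class MiniLruContainer<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public MiniLruContainer(int maxEntries) {
        super(16, 0.75f, true); // true = access order, i.e. LRU
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least-recently-used entry
    }
}
```

A FIFO variant is the same map with access order set to false, so insertion order decides the victim.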
Feedback on the impls would be handy. :-)
Cheers
--
Manik Surtani
Lead, JBoss Cache
http://www.jbosscache.org
manik(a)jboss.org
15 years, 6 months
Re: Infinispan and search APIs
by Manik Surtani
Emmanuel and I started discussing this offline, bringing this on to
the ML now.
On 5 May 2009, at 16:49, Emmanuel Bernard wrote:
>
> On May 1, 2009, at 16:13, Manik Surtani wrote:
>
>> Hey dude
>>
>> When you have some time, could we chat about this? Since we have a
>> chance to build any hooks into Infinispan that we may need in
>> future for the query API, it would be good to get an idea of this
>> now.
>
> you need:
> - change notification (create, update, delete)
> - define a change notification as belonging to a context (tx usually)
> - define a way to execute things at context ending (typically
> tx.commit)
> - start / stop notifications incl the ability to pass properties the
> Infinispan way (probably materialized as Map<String, Object> and the
> list of object types being indexed. (relaxing this constraint leads
> to a reduction of concurrency in HSearch)
> - ability to access the HSearch change interceptor to retrieve the
> SearchFactory from the main Infinispan api (could be an SPI) (i.e. from
> a Hibernate SessionImpl we can go access the SessionFactory and the
> interceptor and then the SearchFactory)
> - might need an object key from an object (not sure)
> - ability to load a set of objects by id (in batch)
Yes, this is all stuff we figured out in JBC-Searchable. All
straightforward enough to do. And rather than implement as a listener
as we did in JBC, we ought to make this an interceptor - it will
perform better and you have greater access to transaction context.
> I think that's the most important hooks.
> Also the ability to put a Lucene index on infinispan
Well, there should be two options for storing indexes. The first, a
simplistic approach where indexes are replicated everywhere, is
achieved by using a separate cache for indexes which uses REPL
regardless of what cache mode the data cache uses. Any query on any
instance uses the cached indexes, to retrieve keys, and then does a
get() on the data cache to load keys. This load may retrieve the
entries from across the network if the cache is in DIST mode, load off
a cache store, etc.
The second approach to storing indexes is more interesting IMO. Each
node only maintains LOCAL (*) indexes for keys that are mapped onto
itself. E.g., in a cluster of {A, B, C, D, E}, we have {K1, K2, K3}
mapped to A and {K4, K5, K6} mapped to C. A stores indexes for K1 -
K3 locally, and C stores indexes for K4 - K6. Running a query on,
say, E, would result in a query RPC call being broadcast around the
cluster; each node runs the query on its local indexes only, returning
results to E, which then collates them. I think this is much
more scalable as a) the indexes themselves are fragmented and the
system will be able to cope with more indexes as you add more nodes,
and b) processing time to search through the indexes is divided up
between the processors.
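The scatter/gather flow described above could be sketched like this, with plain maps standing in for per-node Lucene indexes and an executor standing in for the RPC broadcast (illustrative only, not Infinispan or HSearch API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScatterGatherQuery {
    // each node runs the query against its own local "index" only
    static List<String> queryLocal(Map<String, String> localIndex, String term) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : localIndex.entrySet())
            if (e.getValue().contains(term)) hits.add(e.getKey());
        return hits;
    }

    // the requester "broadcasts" the query and collates the partial results
    static List<String> broadcast(List<Map<String, String>> nodes, String term) {
        ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
        List<Future<List<String>>> futures = new ArrayList<>();
        for (Map<String, String> node : nodes)
            futures.add(pool.submit(() -> queryLocal(node, term)));
        List<String> collated = new ArrayList<>();
        for (Future<List<String>> f : futures) {
            try {
                collated.addAll(f.get());
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException(e);
            }
        }
        pool.shutdown();
        return collated;
    }
}
```

A lazy result set would replace the blocking collate loop: iterate results as each node's Future completes rather than waiting for all of them.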
Naturally there is overhead in broadcasting the query and collating
results, and this is why we need efficient result set implementations
that could retrieve this lazily, etc. E.g., your result iterator
should not wait for all results to arrive before starting to iterate
through what you have, and nodes should first send back result counts
before actually loading and sending back results, etc. But this is a
part of what we need to design.
(*) We could use DIST for the indexes as well, provided the data
indexed and the indexes are hashed to the same nodes for storage.
This is, IMO, tricky since indexes would span several entries, not all
of which are guaranteed to be hashed to the same nodes. Hence my
thoughts on indexes being stored in a LOCAL cache.
>> Like I was saying, one of the things we would need is the ability
>> to do aggregate queries. The "nice to have" would be to offer an
>> EJBQL style interface rather than a Lucene one, but the second part
>> is not crucial IMO.
>
> aggregate query is kinda possible:
> - count(*) is really just getResultSize()
> - distinct count(property) is harder, needs some aggregation in
> memory but is likely possible
> - sum() / avg() would need some actual aggregation in memory from
> all matching results; we could store some values in the lucene index
> and sum them
> - group by / having is harder and will probably have to either be
> done in memory or done by calling n queries, one per "group" (likely
> doable)
If you feel this can be achieved easily enough using Lucene queries,
I'm fine with this being the basis of our impl.
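The in-memory aggregation described above, in miniature - count(*) is just the result size, while sum()/avg() need a pass over values pulled from the matching results (helper names here are illustrative, not HSearch API):

```java
import java.util.List;

public class Aggregates {
    // count(*) is just the size of the matching result set
    static int count(List<Double> matches) { return matches.size(); }

    // sum() needs an in-memory pass over the matched values
    static double sum(List<Double> matches) {
        double total = 0;
        for (double v : matches) total += v;
        return total;
    }

    // avg() derives from sum() and count()
    static double avg(List<Double> matches) {
        return matches.isEmpty() ? 0 : sum(matches) / matches.size();
    }
}
```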
> JPA-QL query is more a fantasy, just like GAE has "support" for JPA-
> QL. Every time join is used, you would be fucked.
It's just a familiarity thing - I don't necessarily need JPA-QL in
itself, just that it has decent and familiar support for aggregation
and query writing in general. Joins would obviously not be supported
and the query parser would barf on encountering one.
The other plus with JPA-QL is that it could offer easy migration off
JPA and onto a data grid to some degree (certain caveats exist such as
joins though). Like I said, this isn't important, could even be a
plugin for later on that translates JPA-QL to Lucene queries.
Cheers
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org
15 years, 6 months
expose LogFactory.IS_LOG4J_AVAILABLE ?
by Adrian Cole
Hi, team.
Does anyone mind if I expose LogFactory.IS_LOG4J_AVAILABLE? I'd rather not
copy/paste the detection logic into the S3 module, which has similar needs.
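The detection logic presumably boils down to a Class.forName probe; a generic version of that check (hypothetical helper, not the actual LogFactory code) would be:

```java
public class ClassAvailability {
    // Probe for a class on the classpath without initializing it;
    // log4j detection would pass "org.apache.log4j.Logger" here.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false,
                          ClassAvailability.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```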
Cheers,
-Adrian
15 years, 6 months
OOB for commits
by Mircea Markus
Hi,
Currently we send all commit messages flagged as OOB, but not prepares.
I guess the reason for this is to make commits move quicker on the wire -
am I right?
If so, what about doing the same for the rest of the tx messages
(prepares and rollbacks)?
Cheers,
Mircea
15 years, 6 months
Infinispan - runGuiDemo.bat
by list. rb
I thought I'd share my infinispan start script for win32 platforms.
Is it worthy of committing? This is my first open-source contribution; just
wanted to help.
Kind regards
@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
rem set JAVA_HOME="C:\Program Files\Java\jre1.5.0_06"
rem set PATH=%JAVA_HOME%\bin;%PATH%
rem run from the distribution root, even if started from bin\
if not exist bin (cd ..)
set INFINISPAN_HOME=%CD%
set CP=%INFINISPAN_HOME%\etc
set JVM_PARAMS=-Djava.net.preferIPv4Stack=true -Dlog4j.configuration=%INFINISPAN_HOME%\etc\log4j.xml
rem add all JARs from the core and gui-demo modules to the classpath
set module=core
for /F "tokens=* delims=\" %%a in ('dir /S /B /AA %INFINISPAN_HOME%\modules\%module%\*.jar') do set CP=%%a;!CP!
set module=gui-demo
for /F "tokens=* delims=\" %%a in ('dir /S /B /AA %INFINISPAN_HOME%\modules\%module%\*.jar') do set CP=%%a;!CP!
java -cp "%CP%" %JVM_PARAMS% org.infinispan.demo.InfinispanDemo
15 years, 6 months