Hi Kapil
After reading through this again, it is indeed an interesting use case. My comments
inline:
On 9 Sep 2011, at 05:23, kapil nayar wrote:
We have two data sets {A1, A2, A3...} and {B1, B2, B3...}
Each B has some associated data {C1, C2, C3...} with a 1:1 mapping (each Bn has exactly one Cn).
The mappings would be something like (assume that C would be stored alongside B):
A1-> B1, B2
A2-> B3, B5
A3-> B4, B6, B7
Now, we would need the following indexes:
A->B and B->A
Notice that both are unique mappings. However, as shown, A has multiple mappings to B.
Big-table-style data structures allow this and make it pretty easy off the shelf.
Now, I am trying to explore if we can implement these mappings with Infinispan.
We may need a basic multi-map - to store multiple values for the same key in the cache.
1. The "get" would return the complete list of the values.
2. The "put" would add the new value without replacing the existing value.
3. The "remove" would remove a specific value or optionally all values
associated with the key.
4. These operations (especially "put") on the same key can occur simultaneously
from multiple nodes.
I know there is an atomic map option in Infinispan which may be applicable, but AFAIK it
requires transactions (which we want to avoid).
The AtomicMap does do this, but will lock the entire map for any operation. We're
working on a FineGrainedMap as well, which will allow concurrent updates to contents
within the map. See
https://issues.jboss.org/browse/ISPN-1115
However, this too is likely to require JTA transactions for consistency. Could you explain
why you wish to avoid transactions?
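For illustration only, the four multi-map operations listed above can be sketched as a
compare-and-swap retry loop over the conditional operations of
java.util.concurrent.ConcurrentMap (putIfAbsent, replace, remove), which Infinispan's Cache
interface extends. This is not the AtomicMap or the FineGrainedMap, and MultiMapSketch is a
hypothetical name. A ConcurrentHashMap stands in for the cache here; whether the conditional
operations give the same guarantees across nodes without transactions is an assumption that
would need to be verified.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper, not an Infinispan API: multi-map semantics on a ConcurrentMap.
final class MultiMapSketch<K, V> {
    private final ConcurrentMap<K, Set<V>> store;

    MultiMapSketch(ConcurrentMap<K, Set<V>> store) {
        this.store = store;
    }

    // 1. "get" returns the complete (immutable) set of values for the key.
    Set<V> get(K key) {
        Set<V> values = store.get(key);
        return values == null ? Collections.<V>emptySet() : values;
    }

    // 2. "put" adds a value without replacing existing ones; retries on conflict.
    void put(K key, V value) {
        while (true) {
            Set<V> current = store.get(key);
            if (current == null) {
                Set<V> fresh = new HashSet<>();
                fresh.add(value);
                if (store.putIfAbsent(key, Collections.unmodifiableSet(fresh)) == null) {
                    return;
                }
            } else {
                if (current.contains(value)) {
                    return;
                }
                Set<V> updated = new HashSet<>(current);
                updated.add(value);
                if (store.replace(key, current, Collections.unmodifiableSet(updated))) {
                    return;
                }
            }
        }
    }

    // 3a. "remove" of one specific value; drops the key when the set becomes empty.
    boolean remove(K key, V value) {
        while (true) {
            Set<V> current = store.get(key);
            if (current == null || !current.contains(value)) {
                return false;
            }
            Set<V> updated = new HashSet<>(current);
            updated.remove(value);
            boolean swapped = updated.isEmpty()
                    ? store.remove(key, current)
                    : store.replace(key, current, Collections.unmodifiableSet(updated));
            if (swapped) {
                return true;
            }
        }
    }

    // 3b. "remove" of all values associated with the key.
    void removeAll(K key) {
        store.remove(key);
    }

    public static void main(String[] args) {
        MultiMapSketch<String, String> aToB =
                new MultiMapSketch<>(new ConcurrentHashMap<String, Set<String>>());
        aToB.put("A1", "B1");
        aToB.put("A1", "B2");
        System.out.println(aToB.get("A1").size());
    }
}
```

Each update copies the whole value set, so this only suits keys with a modest number of
values, and under heavy contention on one key the retry loop may spin several times.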
Alternatively, perhaps Infinispan (in combination with Lucene) can be used.
1. We should be able to create data structure {B, C} and store A-> {B,C} with indexes
defined for B.
2. Also, the key A could be structured as a combination of A+B to store multiple entries
like A1B1->{B1,C1} and A1B2->{B2,C2}. Lucene would allow wildcard searches, e.g.
to look for all A1 values we could query for A1*, which should return both A1B1 and
A1B2. I may be making some assumptions here (feel free to correct!)
Yes, this should be possible.
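As a rough sketch of the composite-key layout described above, the following uses a sorted
in-memory map and a prefix range scan as a stand-in for the Lucene wildcard query; it does
not use Lucene or Infinispan at all. The class name CompositeKeySketch and the '|' separator
are assumptions for illustration. A separator of some kind seems worth having, because with
plain concatenation a prefix match on A1 would also pick up keys belonging to A10.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration of A+B composite keys with a prefix lookup.
final class CompositeKeySketch {
    // Assumed separator; keeps the prefix "A1|" from matching keys such as "A10|B9".
    static final char SEP = '|';

    // Composite key "A|B" -> value holding B and its associated C.
    private final NavigableMap<String, String> entries = new TreeMap<>();

    void put(String a, String b, String c) {
        entries.put(a + SEP + b, b + "," + c);
    }

    // Equivalent of searching for "A1|*": every key that starts with a + SEP.
    List<String> keysForA(String a) {
        String from = a + SEP;               // inclusive lower bound
        String to = a + (char) (SEP + 1);    // exclusive upper bound just past the prefix
        return new ArrayList<>(entries.subMap(from, true, to, false).keySet());
    }

    public static void main(String[] args) {
        CompositeKeySketch index = new CompositeKeySketch();
        index.put("A1", "B1", "C1");
        index.put("A1", "B2", "C2");
        index.put("A2", "B3", "C3");
        System.out.println(index.keysForA("A1"));
    }
}
```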
3. There seems to be one bottleneck though: since the cache mode is
"distribution", it seems to be mandatory to use a backend DB to store these
indexes, and moreover the DB needs to be shared. This requirement seems to defeat
the purpose of using Infinispan.
Not necessarily. You can configure Lucene to store indexes in a replicated Infinispan
cache as well. This means the indexes are globally available, and in-memory. You would
need a lot of memory though! :)
Cheers
Manik
--
Manik Surtani
manik(a)jboss.org
twitter.com/maniksurtani
Lead, Infinispan
http://www.infinispan.org