Thanks Manik.
We want to avoid transactions because of the additional, perceptible
overhead. Our opinion is that the use case does not really involve
multiple datastores/caches, and hence transactions should be avoided as far
as possible. Do you have different thoughts/inputs?
If we go with the Lucene option, is there a way to calculate the memory
footprint required for the indices based on the field lengths etc.?
Kapil
On Wed, Sep 14, 2011 at 8:22 AM, Manik Surtani <manik(a)jboss.org> wrote:
Hi Kapil
After reading through this again, it is indeed an interesting use case. My
comments inline:
On 9 Sep 2011, at 05:23, kapil nayar wrote:
We have two data sets {A1, A2, A3...} and {B1, B2, B3...}
Each B has some associated data {C1, C2, C3...} with a 1:1 mapping.
The mappings would be something like this (assume that C would be stored
alongside B):
A1-> B1, B2
A2-> B3, B5
A3-> B4, B6, B7
Now, we would need the following indexes:
A->B and B->A
Notice that both are unique mappings. However, as shown, each A can map to
multiple Bs.
Big-table-style data structures allow this and make it pretty easy off
the shelf.
Now, I am trying to explore if we can implement these mappings with
Infinispan.
We may need a basic multi-map - to store multiple values for the same key
in the cache.
1. The "get" would return the complete list of the values.
2. The "put" would add the new value without replacing the existing value.
3. The "remove" would remove a specific value or optionally all values
associated with the key.
4. These operations (especially "put") on the same key can occur
simultaneously from multiple nodes.
I know there is an atomic map option in Infinispan which may be applicable,
but AFAIK it requires transactions (which we want to avoid..).
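To make the four requirements above concrete, here is a minimal sketch of a multi-map layered on the plain java.util.concurrent.ConcurrentMap operations (putIfAbsent, replace, remove), which Infinispan's Cache interface also exposes. MultiMapSketch and its method names are hypothetical, not an Infinispan API. It uses a compare-and-retry loop instead of transactions; whether these conditional operations give the needed guarantees in a non-transactional distributed cache is an assumption to verify for the Infinispan version in use.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper (not part of Infinispan): multi-valued entries on any ConcurrentMap.
final class MultiMapSketch<K, V> {
    private final ConcurrentMap<K, Set<V>> cache;

    MultiMapSketch(ConcurrentMap<K, Set<V>> cache) {
        this.cache = cache;
    }

    // 1. "get" returns the complete set of values (empty if the key is absent).
    Set<V> get(K key) {
        Set<V> current = cache.get(key);
        return current == null ? Collections.<V>emptySet() : current;
    }

    // 2. "put" adds a value without replacing existing ones.
    // 4. If another writer changed the entry in between, the conditional op fails and we retry.
    void put(K key, V value) {
        while (true) {
            Set<V> old = cache.get(key);
            if (old == null) {
                if (cache.putIfAbsent(key, Collections.singleton(value)) == null) return;
            } else {
                if (old.contains(value)) return;
                Set<V> updated = new HashSet<>(old);
                updated.add(value);
                if (cache.replace(key, old, Collections.unmodifiableSet(updated))) return;
            }
        }
    }

    // 3. "remove" drops one specific value; the key goes away with its last value.
    void remove(K key, V value) {
        while (true) {
            Set<V> old = cache.get(key);
            if (old == null || !old.contains(value)) return;
            Set<V> updated = new HashSet<>(old);
            updated.remove(value);
            boolean done = updated.isEmpty()
                    ? cache.remove(key, old)
                    : cache.replace(key, old, Collections.unmodifiableSet(updated));
            if (done) return;
        }
    }

    // 3 (optional variant). Remove all values associated with the key.
    void removeAll(K key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        // A ConcurrentHashMap stands in for the cache here.
        MultiMapSketch<String, String> aToB = new MultiMapSketch<>(new ConcurrentHashMap<String, Set<String>>());
        aToB.put("A1", "B1");
        aToB.put("A1", "B2");
        System.out.println(aToB.get("A1").size()); // prints 2
    }
}
```

The trade-off is that every update rewrites the whole value set for the key, so this suits keys with a modest number of values and moderate write contention.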
The AtomicMap does do this, but will lock the entire map for any operation.
We're working on a FineGrainedMap as well, which will allow concurrent
updates to contents within the map. See
https://issues.jboss.org/browse/ISPN-1115
However this too is likely to require JTA transactions for consistency.
Could you explain why you wish to avoid transactions?
Alternatively, perhaps Infinispan (in combination with Lucene) can be used.
1. We should be able to create data structure {B, C} and store A-> {B,C}
with indexes defined for B.
2. Also, the key A could be structured as a combination of A+B to store
multiple entries like A1B1->{B1,C1} and A1B2->{B2,C2}. Lucene would allow
wildcard searches, e.g. to look for all A1 values we could do something
like A1*, which should return both A1B1 and A1B2... I may be making some
assumptions here (feel free to correct!)
Yes, this should be possible.
3. There seems to be one bottleneck though - since the cache mode is
"distribution", it seems it is mandatory to use a backend DB to store these
indexes and moreover the DB needs to be shared. This requirement actually
seems to defeat the purpose of using Infinispan.
Not necessarily. You can configure Lucene to store indexes in a replicated
Infinispan cache as well. This means the indexes are globally available,
and in-memory. You would need a lot of memory though! :)
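The composite-key idea from point 2 above can be illustrated with a small sketch. It uses a sorted in-memory map as a stand-in for the index, so it is neither Lucene nor Infinispan API; CompositeKeySketch, key() and byA() are hypothetical names. The "|" separator is also an assumption that is not in the original proposal: without some delimiter, a prefix search for A1* would also match keys that belong to A10, A11 and so on.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Stand-in for an index over composite keys; not Lucene or Infinispan API.
final class CompositeKeySketch {
    // Hypothetical separator so that the prefix "A1|" cannot match keys of "A10".
    static final String SEP = "|";

    // Builds the composite key, e.g. ("A1", "B1") -> "A1|B1".
    static String key(String a, String b) {
        return a + SEP + b;
    }

    // Equivalent in spirit to a prefix query "A1|*": every entry whose key starts with a + SEP.
    static SortedMap<String, String> byA(TreeMap<String, String> index, String a) {
        String from = a + SEP;
        return index.subMap(from, from + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        TreeMap<String, String> index = new TreeMap<>();
        // Values hold the {B, C} payload; plain strings keep the sketch short.
        index.put(key("A1", "B1"), "B1,C1");
        index.put(key("A1", "B2"), "B2,C2");
        index.put(key("A2", "B3"), "B3,C3");
        index.put(key("A10", "B9"), "B9,C9");
        System.out.println(byA(index, "A1").keySet()); // prints [A1|B1, A1|B2]
    }
}
```

With this layout each A->B pair is its own entry, so concurrent additions for the same A touch different keys and do not overwrite each other; the cost is that reading all Bs for an A becomes a query rather than a single get.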
Cheers
Manik
--
Manik Surtani
manik(a)jboss.org
twitter.com/maniksurtani
Lead, Infinispan
http://www.infinispan.org
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev