[infinispan-dev] multi-mapping with indexing - do we need big-table

Thu Sep 15 09:39:53 EDT 2011

Thanks Manik.

We want to avoid transactions because of the additional perceptible
overhead. Our view is that the use case does not actually involve
multiple datastores/caches, and hence transactions should be avoided as far
as possible. Do you have different thoughts or inputs?

If we go with the Lucene option, is there a way to calculate the memory
footprint required for the indices based on field length etc.?

Kapil

On Wed, Sep 14, 2011 at 8:22 AM, Manik Surtani <manik at jboss.org> wrote:

> Hi Kapil
>
> After reading through this again, it is indeed an interesting use case.  My
> comments inline:
>
> On 9 Sep 2011, at 05:23, kapil nayar wrote:
>
> We have two data sets {A1, A2, A3...} and {B1, B2, B3...}
> Each B has some associated data {C1, C2, C3....}  which has 1:1 mapping.
>
> The mappings would be something like (assume that C would be stored along
> side B):
> A1-> B1, B2
> A2-> B3, B5
> A3-> B4, B6, B7
>
> Now, we would need the following indexes:
> A->B and B->A
>
> Notice that both are unique mappings. However, as shown, each A maps to
> multiple Bs.
> Big-table style data structures allow this and make it pretty easy off
> the shelf.
>
> Now, I am trying to explore if we can implement these mappings with
> Infinispan.
> We may need a basic multi-map - to store multiple values for the same key
> in the cache.
>
> 1. The "get" would return the complete list of the values.
> 2. The "put" would add the new value without replacing the existing value.
> 3. The "remove" would remove a specific value or optionally all values
> associated with the key.
> 4. These operations (especially "put") on the same key can occur
> simultaneously from multiple nodes.
>
> I know there is an atomic map option in Infinispan which may be applicable,
> but AFAIK it requires transactions (which we want to avoid..).
>
>
> The AtomicMap does do this, but will lock the entire map for any operation.
> We're working on a FineGrainedMap as well, which will allow concurrent
> updates to contents within the map.  See
> https://issues.jboss.org/browse/ISPN-1115
>
> However this too is likely to require JTA transactions for consistency.
> Could you explain why you wish to avoid transactions?
>
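
For illustration only, the four multi-map operations listed above ("get", "put", "remove" of one value or of all values, concurrent "put" on the same key) could be approximated without JTA by an optimistic retry loop over the conditional ConcurrentMap methods (putIfAbsent, replace, remove), which Infinispan's Cache interface inherits. The sketch below is an assumption-laden stand-in, not an Infinispan feature: the class name MultiMapSketch is hypothetical, it runs against a plain JDK ConcurrentHashMap, and whether the conditional operations give the required guarantees in a non-transactional distributed cache would need to be verified for the Infinispan version and cache mode in use.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper: a multi-map layered on any ConcurrentMap.
// Each key holds an immutable Set; updates copy the set and swap it in
// with a compare-and-set style call, retrying if another writer got in first.
class MultiMapSketch<K, V> {
    private final ConcurrentMap<K, Set<V>> store;

    MultiMapSketch(ConcurrentMap<K, Set<V>> store) {
        this.store = store;
    }

    // 1. "get" returns the complete set of values (empty if none).
    Set<V> get(K key) {
        Set<V> values = store.get(key);
        return values == null ? Collections.<V>emptySet() : values;
    }

    // 2. "put" adds a value without replacing the existing ones.
    void put(K key, V value) {
        while (true) {
            Set<V> old = store.get(key);
            if (old == null) {
                Set<V> fresh = Collections.unmodifiableSet(
                        new HashSet<>(Collections.singleton(value)));
                if (store.putIfAbsent(key, fresh) == null) {
                    return;
                }
            } else {
                if (old.contains(value)) {
                    return;
                }
                Set<V> next = new HashSet<>(old);
                next.add(value);
                if (store.replace(key, old, Collections.unmodifiableSet(next))) {
                    return;
                }
            }
            // Lost a race with a concurrent writer: re-read and retry.
        }
    }

    // 3a. "remove" of one specific value; returns false if it was not present.
    boolean remove(K key, V value) {
        while (true) {
            Set<V> old = store.get(key);
            if (old == null || !old.contains(value)) {
                return false;
            }
            Set<V> next = new HashSet<>(old);
            next.remove(value);
            boolean swapped = next.isEmpty()
                    ? store.remove(key, old)
                    : store.replace(key, old, Collections.unmodifiableSet(next));
            if (swapped) {
                return true;
            }
        }
    }

    // 3b. "remove" of all values associated with the key.
    void removeAll(K key) {
        store.remove(key);
    }
}
```

The trade-off in this design is that every update rewrites the whole value set for the key, so it suits keys with a modest number of values; heavy contention on one key leads to repeated retries rather than blocking.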
>
> Alternatively, perhaps Infinispan (in combination with lucene) can be used.
> 1. We should be able to create data structure {B, C} and store A-> {B,C}
> with indexes defined for B.
> 2. Also, the key A could be structured as a combination of A+B to store
> multiple entries like A1B1->{B1,C1} and A1B2->{B2,C2}. Lucene would allow
> wildcard searches, e.g. to look for all A1 values we could query for
> A1*, which should return both A1B1 and A1B2. I may be making some
> assumptions here (feel free to correct!)
>
>
> Yes, this should be possible.
>
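
As a rough illustration of the composite-key layout described in point 2, the sketch below stores one entry per (A, B) pair and finds all Bs for an A with a prefix range scan, plus a reverse B->A lookup. It does not use Lucene or Infinispan: a JDK TreeMap stands in for the index purely to show the key scheme, and the class name CompositeKeySketch and the ':' separator are assumptions. The separator is there because a bare prefix such as A1* would otherwise also match keys belonging to A10, A11 and so on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for an indexed cache: keys are "A:B", values are C.
class CompositeKeySketch {
    // Assumed separator; it must not occur inside A or B identifiers.
    static final char SEP = ':';

    // Forward entries, sorted by key so a prefix maps to one contiguous range.
    private final NavigableMap<String, String> entries = new TreeMap<>();
    // Reverse index B -> A (each B belongs to exactly one A).
    private final Map<String, String> reverse = new HashMap<>();

    void put(String a, String b, String c) {
        entries.put(a + SEP + b, c);
        reverse.put(b, a);
    }

    // Equivalent of the "A1*" search: all Bs stored under the given A.
    List<String> bsFor(String a) {
        String prefix = a + SEP;
        List<String> result = new ArrayList<>();
        // Range [prefix, prefix + '\uffff') covers every key starting with prefix.
        for (String key : entries
                .subMap(prefix, true, prefix + Character.MAX_VALUE, false)
                .keySet()) {
            result.add(key.substring(prefix.length()));
        }
        return result;
    }

    // B -> A lookup; returns null if B is unknown.
    String aFor(String b) {
        return reverse.get(b);
    }

    // The C value stored alongside a given (A, B) pair, or null.
    String cFor(String a, String b) {
        return entries.get(a + SEP + b);
    }
}
```

With the sample mappings from earlier in the thread (A1 -> B1, B2 and A2 -> B3, B5), bsFor("A1") would yield B1 and B2, and aFor("B3") would yield A2.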
> 3. There seems to be one bottleneck though - since the cache mode is
> "distribution", it seems it is mandatory to use a backend DB to store these
> indexes and moreover the DB needs to be shared. This requirement actually
> seems to defeat the purpose of using Infinispan.
>
>
> Not necessarily.  You can configure Lucene to store indexes in a replicated
> Infinispan cache as well.  This means the indexes are globally available,
> and in-memory.  You would need a lot of memory though!  :)
>
> Cheers
> Manik
> --
> Manik Surtani
> manik at jboss.org
> twitter.com/maniksurtani
>
> Lead, Infinispan
> http://www.infinispan.org
>