On 15 Sep 2011, at 14:39, kapil nayar wrote:

Thanks Manik.

We want to avoid transactions because of the additional, perceptible overhead. In our view, the use case does not really involve multiple datastores or caches, so transactions should be avoided as far as possible. Do you have different thoughts or inputs?

If we go with the Lucene option, is there a way to estimate the memory footprint required for the indexes based on field length etc.?

Kapil

On Wed, Sep 14, 2011 at 8:22 AM, Manik Surtani <manik@jboss.org> wrote:

Hi Kapil

After reading through this again, it is indeed an interesting use case. My comments inline:

On 9 Sep 2011, at 05:23, kapil nayar wrote:

We have two data sets {A1, A2, A3...} and {B1, B2, B3...}
Each B has some associated data {C1, C2, C3....} which has 1:1 mapping.

The mappings would be something like (assume that C would be stored along side B):
A1-> B1, B2
A2-> B3, B5
A3-> B4, B6, B7

Now, we would need the following indexes:
A->B and B->A

Note that both mappings are unique. However, as shown, each A can map to multiple Bs.
Big-table-style data structures allow this and make it pretty easy off the shelf.

Now, I am trying to explore if we can implement these mappings with Infinispan.
We may need a basic multi-map - to store multiple values for the same key in the cache.

1. The "get" would return the complete list of the values.
2. The "put" would add the new value without replacing the existing value.
3. The "remove" would remove a specific value or optionally all values associated with the key.
4. These operations (especially "put") on the same key can occur simultaneously from multiple nodes.
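
A minimal sketch of the four requirements above, assuming the store exposes the conditional operations of java.util.concurrent.ConcurrentMap (putIfAbsent, replace, remove). A plain ConcurrentHashMap stands in for the cache here, and the class name MultiMapSketch is hypothetical. Whether these conditional operations give the same guarantee across nodes in a non-transactional distributed cache is an assumption that would need to be verified.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical multi-map built on compare-and-swap style retries.
final class MultiMapSketch {
    // Stand-in for the cache (assumption): any ConcurrentMap works for this sketch.
    private final ConcurrentMap<String, Set<String>> store = new ConcurrentHashMap<>();

    // 1. "get" returns the complete set of values (empty if the key is absent).
    Set<String> get(String key) {
        Set<String> values = store.get(key);
        return values == null ? Collections.<String>emptySet() : values;
    }

    // 2. "put" adds a value without replacing the existing ones.
    // 4. Concurrent writers retry until their conditional update succeeds.
    void put(String key, String value) {
        while (true) {
            Set<String> old = store.get(key);
            if (old == null) {
                Set<String> fresh = Collections.unmodifiableSet(
                        new HashSet<>(Collections.singleton(value)));
                if (store.putIfAbsent(key, fresh) == null) return;
            } else {
                if (old.contains(value)) return;
                Set<String> next = new HashSet<>(old);
                next.add(value);
                if (store.replace(key, old, Collections.unmodifiableSet(next))) return;
            }
        }
    }

    // 3a. "remove" of one specific value; drops the key when the set becomes empty.
    void remove(String key, String value) {
        while (true) {
            Set<String> old = store.get(key);
            if (old == null || !old.contains(value)) return;
            Set<String> next = new HashSet<>(old);
            next.remove(value);
            boolean done = next.isEmpty()
                    ? store.remove(key, old)
                    : store.replace(key, old, Collections.unmodifiableSet(next));
            if (done) return;
        }
    }

    // 3b. "remove" of all values associated with the key.
    void removeAll(String key) {
        store.remove(key);
    }

    public static void main(String[] args) {
        MultiMapSketch index = new MultiMapSketch();
        index.put("A1", "B1");
        index.put("A1", "B2");
        System.out.println(index.get("A1"));
    }
}
```

Each stored set is replaced as a whole and never mutated in place, so a reader always sees a consistent snapshot. The cost is that every update copies the set, which only suits keys with a modest number of values.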

I know there is an atomic map option in Infinispan which may be applicable, but AFAIK it requires transactions (which we want to avoid..).

The AtomicMap does do this, but will lock the entire map for any operation. We're working on a FineGrainedMap as well, which will allow concurrent updates to contents within the map. See https://issues.jboss.org/browse/ISPN-1115

However this too is likely to require JTA transactions for consistency. Could you explain why you wish to avoid transactions?

Alternatively, perhaps Infinispan (in combination with Lucene) can be used.
1. We should be able to create data structure {B, C} and store A-> {B,C} with indexes defined for B.
2. Also, the key A could be structured as a combination of A+B to store multiple entries like A1B1->{B1,C1} and A1B2->{B2,C2}. Lucene would allow wildcard searches, e.g. to look for all A1 values we could do something like A1*, which should return both A1B1 and A1B2. I may be making some assumptions here (feel free to correct!)

Yes, this should be possible.
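
The composite-key idea in point 2 can be illustrated without Lucene. The sketch below is a stand-in only: it uses a sorted in-memory map (ConcurrentSkipListMap) and a range scan in place of a Lucene wildcard or prefix query, and the class name CompositeKeySketch and the '|' separator are assumptions made for illustration. The separator matters because a bare prefix such as A1* would also match keys belonging to A10, A11 and so on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical stand-in for a prefix search over composite A+B keys.
final class CompositeKeySketch {
    // Separator between the A and B parts (assumption: it never occurs inside A).
    private static final char SEP = '|';

    // Sorted map so that all keys sharing a prefix are adjacent.
    private final ConcurrentNavigableMap<String, String> entries =
            new ConcurrentSkipListMap<>();

    static String key(String a, String b) {
        return a + SEP + b;
    }

    // Stores one entry per (A, B) pair, so concurrent puts for the same A
    // never touch the same key.
    void put(String a, String b, String c) {
        entries.put(key(a, b), c);
    }

    // Returns every composite key whose A part equals the argument exactly.
    List<String> keysFor(String a) {
        String from = a + SEP;               // smallest key with this prefix
        String to = a + (char) (SEP + 1);    // first key past the prefix range
        return new ArrayList<>(entries.subMap(from, true, to, false).keySet());
    }

    public static void main(String[] args) {
        CompositeKeySketch index = new CompositeKeySketch();
        index.put("A1", "B1", "C1");
        index.put("A1", "B2", "C2");
        index.put("A10", "B9", "C9");
        System.out.println(index.keysFor("A1"));
    }
}
```

Because each (A, B) pair has its own key, adding a B to an A is a plain put with no read-modify-write step, which sidesteps the concurrent-update problem of a single multi-valued entry.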

3. There seems to be one bottleneck though - since the cache mode is "distribution", it seems it is mandatory to use a backend DB to store these indexes and moreover the DB needs to be shared. This requirement actually seems to defeat the purpose of using Infinispan.

Not necessarily. You can configure Lucene to store indexes in a replicated Infinispan cache as well. This means the indexes are globally available, and in-memory. You would need a lot of memory though! :)

Cheers
Manik
--
Manik Surtani
manik@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org

_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
