[infinispan-dev] Experiment: Affinity Tagging

Dan Berindei dan.berindei at gmail.com
Thu Jan 22 10:44:20 EST 2015


On Wed, Jan 21, 2015 at 3:28 AM, Sanne Grinovero <sanne at infinispan.org>
wrote:

> On 20 January 2015 at 14:33, Adrian Nistor <anistor at redhat.com> wrote:
> > None of the existing Hash implementations can, but this new one will be
> > special. It could have access to the config (and CH) of the user's cache,
> > so it will know the number of segments. The index cache will have to use
> > the same type of CH as the data cache in order to keep ownership in sync,
> > and the Hash implementation will be the special delegating Hash.
> >
> > There is a twist though: the above only works with SyncConsistentHash,
> > because when two caches with identical topology use DefaultConsistentHash
> > they could still not be in sync in terms of key ownership. Only
> > SyncConsistentHash ensures that.
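For illustration, a rough sketch of such a delegating Hash (assuming the
org.infinispan.commons.hash.Hash interface exposes hash(byte[]), hash(int) and
hash(Object), as MurmurHash3 does; the SegmentTagged marker and the
equal-blocks segment math are assumptions, not an existing contract):

import org.infinispan.commons.hash.Hash;

// Rough sketch of the delegating Hash described above: stock hashing for
// normal keys, and for "tagged" keys a hash code picked so that the key
// lands in the requested segment. The equal-blocks segment math below is an
// assumption about how the CH buckets hash codes, not a documented contract.
public class AffinityTaggingHash implements Hash {

   /** Hypothetical marker for keys that carry a target segment. */
   public interface SegmentTagged {
      int targetSegment();
   }

   private final Hash delegate;     // e.g. the stock MurmurHash3
   private final int numSegments;   // taken from the data cache's config

   public AffinityTaggingHash(Hash delegate, int numSegments) {
      this.delegate = delegate;
      this.numSegments = numSegments;
   }

   @Override
   public int hash(byte[] payload) {
      return delegate.hash(payload);
   }

   @Override
   public int hash(int hashcode) {
      return delegate.hash(hashcode);
   }

   @Override
   public int hash(Object o) {
      if (o instanceof SegmentTagged) {
         // Pick a hash code in the middle of the target segment's block.
         int segmentSize = (int) Math.ceil((double) Integer.MAX_VALUE / numSegments);
         return ((SegmentTagged) o).targetSegment() * segmentSize + segmentSize / 2;
      }
      return delegate.hash(o);
   }
}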
>
> Many thanks for pointing out the need for a SyncConsistentHashFactory,
> I was not aware of the limitations described in the javadoc.
>
> Side note: I'm surprised by the limitation of the normal
> ConsistentHashFactory as described in the javadoc of
> SyncConsistentHashFactory... is it because our normal implementation
> is actually not "consistent"? Or is it referring to additional
> properties of our Hash function?
>
>
Yes, it's because our DefaultConsistentHash isn't really consistent - i.e.
the mapping of segments to nodes depends on more than just the addresses of
the nodes. It seemed like the best way to fix the load distribution
problems we had at the time, but it is starting to feel a little painful
now.

Another property of the "real" consistent hash is that a key will only move
from an existing owner to a joiner, and there are no unnecessary moves
between the existing nodes. It's a nice property, but I'm afraid we never
really had it in Infinispan because of the way we handled multiple nodes
with the same hashcode. I didn't manage to get this working in
SyncConsistentHashFactory while also keeping a nice mostly-even
distribution of segments, but I haven't completely given up on it yet...
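As a toy illustration of that minimal-movement property (a plain single-owner
hash ring with made-up node names, not Infinispan code):

import java.util.*;

// Toy hash ring: each node sits at its hash on a ring of ints, and a key is
// owned by the first node clockwise from the key's hash. When a node joins,
// the only keys that change owner are the ones that now map to the joiner;
// nothing moves between the existing nodes.
public class ToyHashRing {
   private final TreeMap<Integer, String> ring = new TreeMap<>();

   void addNode(String node) {
      ring.put(node.hashCode(), node);
   }

   String ownerOf(String key) {
      Map.Entry<Integer, String> e = ring.ceilingEntry(key.hashCode());
      return e != null ? e.getValue() : ring.firstEntry().getValue(); // wrap around
   }

   public static void main(String[] args) {
      ToyHashRing ring = new ToyHashRing();
      ring.addNode("nodeA");
      ring.addNode("nodeB");

      List<String> keys = Arrays.asList("k1", "k2", "k3", "k4", "k5");
      Map<String, String> before = new HashMap<>();
      keys.forEach(k -> before.put(k, ring.ownerOf(k)));

      ring.addNode("nodeC"); // the joiner

      for (String k : keys) {
         // Each key either keeps its owner or moves to the joiner nodeC.
         System.out.printf("%s: %s -> %s%n", k, before.get(k), ring.ownerOf(k));
      }
   }
}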


> Cheers,
> Sanne
>
>
> >
> > Knowledge of how the CH currently maps hash codes to segments is already
> > assumed. I've spotted at least 3 places in the code where it happens, so
> > it is time to document it or, as you suggest, move this responsibility to
> > the Hash interface to make it really pluggable.
> >
> > Adrian
> >
> >
> > On 01/20/2015 03:32 PM, Dan Berindei wrote:
> >
> > Adrian, I don't think that will work. The Hash doesn't know the number of
> > segments so it can't tell where a particular key will land - even
> > assuming knowledge about how the ConsistentHash will map hash codes to
> > segments.
> >
> > However, I'm all for replacing the current Hash interface with another
> > interface that maps keys directly to segments.
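Something along these lines, purely as a strawman (the name is made up, not an
existing Infinispan interface):

// Strawman replacement for Hash: map a key straight to a segment instead of
// producing a hash code that the ConsistentHash then buckets on its own.
public interface KeyToSegmentMapper {
   int getSegment(Object key, int numSegments);
}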
> >
> > Cheers
> > Dan
> >
> >
> > On Tue, Jan 20, 2015 at 4:08 AM, Adrian Nistor <anistor at redhat.com>
> > wrote:
> >>
> >> Hi Sanne,
> >>
> >> An alternative approach would be to implement an
> >> org.infinispan.commons.hash.Hash which delegates to the stock
> >> implementation for all keys except those that need to be assigned to a
> >> specific segment. It should return the desired segment for those.
> >>
> >> Adrian
> >>
> >>
> >> On 01/20/2015 02:48 AM, Sanne Grinovero wrote:
> >> > Hi all,
> >> >
> >> > I'm playing with an idea for some internal components to be able to
> >> > "tag" the key for an entry to be stored into Infinispan in a very
> >> > specific segment of the CH.
> >> >
> >> > Conceptually the plan is easy to understand by looking at this patch:
> >> >
> >> > https://github.com/Sanne/infinispan/commit/45a3d9e62318d5f5f950a60b5bb174d23037335f
> >> >
> >> > Hacking the change into ReplicatedConsistentHash is quite barbaric;
> >> > please bear with me, as I couldn't figure out a better way to
> >> > experiment with this. I'll probably want to extend this class, but
> >> > then I'm not sure how to plug it in?
> >
> >
> > You would need to create your own ConsistentHashFactory, possibly
> > extending ReplicatedConsistentHashFactory. You can then plug the factory
> > in with
> >
> > configurationBuilder.clustering().hash().consistentHashFactory(yourFactory)
> >
> > However, this isn't a really good idea, because then you need a different
> > implementation for distributed mode, and then another implementation for
> > topology-aware clusters (with rack/machine/site ids). And your users
> > would also need to select the proper factory for each cache.
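For reference, a minimal sketch of wiring such a factory in (only the
consistentHashFactory(...) call above is taken from the thread;
MyAffinityFactory and the other settings are placeholders):

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

public class AffinityConfig {
   // Builds a replicated-cache configuration that plugs in a custom
   // ConsistentHashFactory. MyAffinityFactory is a placeholder for a class
   // extending ReplicatedConsistentHashFactory.
   static Configuration build() {
      return new ConfigurationBuilder()
            .clustering().cacheMode(CacheMode.REPL_SYNC)
            .hash()
               .consistentHashFactory(new MyAffinityFactory())
               .numSegments(256)
            .build();
   }
}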
> >
> >>
> >> >
> >> > What would you all think of such a "tagging" mechanism?
> >> >
> >> > # Why I didn't use the KeyAffinityService
> >> > - I need to use my own keys, not the meaningless stuff produced by the
> >> > service
> >> > - the extensive usage of Random in there doesn't seem suited for a
> >> > performance-critical path
> >
> >
> > You can plug in your own KeyGenerator to generate keys, and maybe replace
> > the Random with a static/thread-local counter.
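A rough sketch of that, assuming the affinity API of that era
(KeyAffinityServiceFactory.newKeyAffinityService(cache, executor, generator,
bufferSize) and KeyAffinityService.getKeyForAddress(...)); the counter-based
generator and the key format are just illustrative:

import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

import org.infinispan.Cache;
import org.infinispan.affinity.KeyAffinityService;
import org.infinispan.affinity.KeyAffinityServiceFactory;
import org.infinispan.affinity.KeyGenerator;
import org.infinispan.remoting.transport.Address;

public class CounterKeyAffinity {

   // A KeyGenerator backed by a counter instead of Random, as suggested
   // above. The "idx|" naming scheme is made up.
   static class CounterKeyGenerator implements KeyGenerator<String> {
      private final AtomicLong counter = new AtomicLong();

      @Override
      public String getKey() {
         return "idx|" + counter.incrementAndGet();
      }
   }

   // Returns a generated key that maps to the given node.
   static String keyLocalTo(Cache<String, ?> cache, Address node) {
      KeyAffinityService<String> service =
            KeyAffinityServiceFactory.newKeyAffinityService(
                  cache, Executors.newSingleThreadExecutor(),
                  new CounterKeyGenerator(), 100 /* keys buffered per node */);
      try {
         return service.getKeyForAddress(node);
      } finally {
         service.stop();
      }
   }
}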
> >
> >>
> >>
> >>
> >> >
> >> > # Why I didn't use the Grouping API
> >> > - I need to pick the specific storage segment, not just co-locate with
> >> > a different key
> >> >
> >
> >
> > This is actually a drawback of the KeyAffinityService more than Grouping.
> > With grouping, you can actually follow the KeyAffinityService strategy
> > and generate random strings until you get one in the proper segment, and
> > then tag all your keys with that exact string.
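A rough sketch of that tagging idea with the Grouping API (the @Group
annotation on a method of the key class is the documented mechanism; the
brute-force tag search and the segmentOf function it takes are assumptions):

import java.util.function.ToIntFunction;

import org.infinispan.distribution.group.Group;

// Sketch of "tagging" keys via grouping: every index key carries a group
// string that was pre-computed to land in the desired segment, so all keys
// sharing that tag are co-located in that segment.
public class IndexKey {
   private final String segmentTag; // pre-computed tag for the target segment
   private final String name;

   public IndexKey(String segmentTag, String name) {
      this.segmentTag = segmentTag;
      this.name = name;
   }

   @Group
   public String getSegmentTag() {
      return segmentTag;
   }

   // equals()/hashCode() omitted for brevity; real keys need them.

   // Brute-force search mirroring the KeyAffinityService strategy: try
   // candidate strings until one maps to the desired segment. segmentOf is
   // whatever function exposes the CH's string-to-segment mapping.
   static String findTagForSegment(int targetSegment, ToIntFunction<String> segmentOf) {
      for (long i = 0; ; i++) {
         String candidate = "tag-" + i;
         if (segmentOf.applyAsInt(candidate) == targetSegment) {
            return candidate;
         }
      }
   }
}

Grouping also needs to be enabled in the cache configuration (via
hash().groups().enabled()).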
> >
> >>
> >> >
> >> > The general goal is to make it possible to "tag" all entries of an
> >> > index, and have an independent index for each segment of the CH. The
> >> > resulting effect would be that when the primary owner of any key K
> >> > makes an update, and this triggers an index update, that update is
> >> >   A) going to happen on the same node -> no need to forward to a
> >> > "master indexing node"
> >> >   B) each such write on the index happens on the same node, which is
> >> > the primary owner of all the written entries of the index.
> >> >
> >> > There are two additional nice consequences:
> >> >   - there would be no need to perform a reliable "master election":
> >> > singleton ownership is already guaranteed by Infinispan's essential
> >> > logic, so it would reuse that
> >> >   - the propagation of writes on the index from the primary owner
> >> > (which is the local node by definition) to backup owners could use
> >> > REPL_ASYNC for most practical use cases.
> >> >
> >> > So the net result is that the overhead for indexing is reduced to 0
> >> > (ZERO) blocking RPCs if async replication is acceptable, or to only
> >> > one blocking roundtrip if very strict consistency is required.
> >
> >
> > Sounds very interesting, but I think there may be a problem with your
> > strategy: Infinispan doesn't guarantee that one of the nodes executing
> > the CommitCommand is the primary owner at the time the CommitCommand is
> > executed. You could have something like this:
> >
> > 1. Cluster [A, B, C, D], key k, owners(k) = [A, B] (A is primary)
> > 2. C initiates a tx that executes put(k, v)
> > 3. Tx prepare succeeds on A and B
> > 4. A crashes, but the other nodes don't detect the crash yet
> > 5. Tx commit succeeds on B, which still thinks it is a backup owner
> > 6. B detects the crash and installs a new consistent hash for the new
> >    cluster view, with owners(k) = [B]
> >
> >
> >>
> >> >
> >> > Thanks,
> >> > Sanne