[infinispan-dev] Experiment: Affinity Tagging

Sanne Grinovero sanne at infinispan.org
Tue Jan 20 20:12:14 EST 2015


On 20 January 2015 at 13:32, Dan Berindei <dan.berindei at gmail.com> wrote:
> Adrian, I don't think that will work. The Hash doesn't know the number of
> segments so it can't tell where a particular key will land - even assuming
> knowledge about how the ConsistentHash will map hash codes to segments.
>
> However, I'm all for replacing the current Hash interface with another
> interface that maps keys directly to segments.

Right, I'll eventually need a different abstraction, or a change to
the Hash interface. However, my need seems highly specialized; I'm not
sure whether there would be general interest in such a capability from
other Hash implementors?
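
To make it a bit more concrete, this is the kind of shape I have in
mind - a purely hypothetical sketch, not a proposal for the final API:

   // Hypothetical sketch only: a segment-aware variant of the existing
   // org.infinispan.commons.hash.Hash which maps keys directly to
   // segments, so the caller no longer needs to know how hash codes
   // are mapped to segments.
   public interface SegmentAwareHash extends org.infinispan.commons.hash.Hash {
      // returns the segment this key should be stored in,
      // with 0 <= result < numSegments
      int segment(Object key, int numSegments);
   }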

>
> Cheers
> Dan

Never sign off when you still have interesting comments below; I only
saw them by chance ;)


> On Tue, Jan 20, 2015 at 4:08 AM, Adrian Nistor <anistor at redhat.com> wrote:
>>
>> Hi Sanne,
>>
>> An alternative approach would be to implement an
>> org.infinispan.commons.hash.Hash which delegates to the stock
>> implementation for all keys except those that need to be assigned to a
>> specific segment. It should return the desired segment for those.
>>
>> Adrian
>>
>>
>> On 01/20/2015 02:48 AM, Sanne Grinovero wrote:
>> > Hi all,
>> >
>> > I'm playing with an idea for some internal components to be able to
>> > "tag" the key for an entry to be stored into Infinispan in a very
>> > specific segment of the CH.
>> >
>> > Conceptually the plan is easy to understand by looking at this patch:
>> >
>> >
>> > https://github.com/Sanne/infinispan/commit/45a3d9e62318d5f5f950a60b5bb174d23037335f
>> >
>> > Hacking the change into ReplicatedConsistentHash is quite barbaric,
>> > please bear with me as I couldn't figure out a better way to be able to
>> > experiment with this. I'll probably want to extend this class, but
>> > then I'm not sure how to plug it in?
>
>
> You would need to create your own ConsistentHashFactory, possibly extending
> ReplicatedConsistentHashFactory. You can then plug the factory in with
>
> configurationBuilder.clustering().hash().consistentHashFactory(yourFactory)
>
> However, this isn't a really good idea, because then you need a different
> implementation for distributed mode, and then another implementation for
> topology-aware clusters (with rack/machine/site ids). And your users would
> also need to select the proper factory for each cache.

Right, this is the complexity I was facing. I'll stick to my hacky
solution for our little POC... but ultimately I'll need to plug this
in properly if none of the solutions below work out.

>> > What would you all think of such a "tagging" mechanism?
>> >
>> > # Why I didn't use the KeyAffinityService
>> > - I need to use my own keys, not the meaningless stuff produced by the
>> > service
>> > - the extensive usage of Random in there doesn't seem suited for a
>> > performance critical path
>
>
> You can plug in your own KeyGenerator to generate keys, and maybe replace
> the Random with a static/thread-local counter.

Thanks for the tip on KeyGenerator, I'll look into that :)
But I'll never add a static/thread-local... I'd rather commit my ugly
code from the commit linked above.
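
If the KeyGenerator route works out, I'd expect something along these
lines - a rough sketch only, assuming plain String keys with a
meaningful prefix:

   import java.util.concurrent.atomic.AtomicLong;

   import org.infinispan.affinity.KeyGenerator;

   // Sketch only: deterministic, meaningful keys instead of random ones;
   // the counter is an instance field, no statics or thread-locals involved.
   public class IndexChunkKeyGenerator implements KeyGenerator<String> {

      private final AtomicLong sequence = new AtomicLong();

      @Override
      public String getKey() {
         return "index-chunk-" + sequence.incrementAndGet();
      }
   }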

>
>>
>>
>>
>> >
>> > # Why I didn't use the Grouping API
>> > - I need to pick the specific storage segment, not just co-locate with
>> > a different key
>> >
>
>
> This is actually a drawback of the KeyAffinityService more than Grouping.
> With grouping, you can actually follow the KeyAffinityService strategy and
> generate random strings until you get one in the proper segment, and then
> tag all your keys with that exact string.

Interesting! A bit convoluted, but it could spare me from plugging in
the HashFactory.
BTW I really dislike this idea of the KeyAffinityService generating
random keys until one happens to fit... I guess it might not be too bad
if you want to pick a node out of ten, but I'm working at segment
granularity and with bad luck it could take a long time.
It would be nice to have a function for this which returns in a
deterministic amount of time - essentially an inverse Hash.
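
If we went down that road, I guess the tagging itself would be done
with the @Group annotation on the key type - a sketch, where segmentTag
is the pre-computed string that happens to land in the desired segment
(equals/hashCode omitted for brevity, and grouping would need to be
enabled on the cache):

   import org.infinispan.distribution.group.Group;

   // Sketch only: all keys returning the same group string are co-located,
   // so a group string chosen to hash into the desired segment would
   // effectively tag every entry of that segment's index.
   public class IndexKey {

      private final String chunkName;
      private final String segmentTag;

      public IndexKey(String chunkName, String segmentTag) {
         this.chunkName = chunkName;
         this.segmentTag = segmentTag;
      }

      @Group
      public String getSegmentTag() {
         return segmentTag;
      }
   }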

>> > The general goal is to make it possible to "tag" all entries of an
>> > index, and have an independent index for each segment of the CH. So
>> > the resulting effect would be, that when a primary owner for any key K
>> > is making an update, and this triggers an index update, that update is
>> >   A) going to happen on the same node -> no need to forward to a
>> > "master indexing node"
>> >   B) each such write on the index happens on the same node, which is
>> > the primary owner for all the written entries of the index.
>> >
>> > There are two additional nice consequences:
>> >   - there would be no need to perform a reliable "master election":
>> > ownership singleton is already guaranteed by Infinispan's essential
>> > logic, so it would reuse that
>> >   - the propagation of writes on the index from the primary owner
>> > (which is the local node by definition) to backup owners could use
>> > REPL_ASYNC for most practical use cases.
>> >
>> > So net result is that the overhead for indexing is reduced to 0 (ZERO)
>> > blocking RPCs if the async repl is acceptable, or to only one blocking
>> > roundtrip if very strict consistency is required.
>
>
> Sounds very interesting, but I think there may be a problem with your
> strategy: Infinispan doesn't guarantee you that one of the nodes executing
> the CommitCommand is the primary owner at the time the CommitCommand is
> executed. You could have something like this:

Index storage is generally used without transactions, but even
assuming we had transactions enabled, or that the "vanilla" put
operation suffered from a similar timing issue (since we'd determine
this node to be the owner higher up in the search stack, before the
actual put reaches the Infinispan core API), it's not a problem: we'd
simply lose locality of the write. It would be slightly less efficient,
but still write the "right thing".
The intention is to maximise locality with the hints, but if locality
fails on a write I just expect it to be handled like any other put
operation on Infinispan... with a couple more RPCs, and with any race
condition regarding topology changes being handled at a lower level.
Which is exactly why I'm now working on top of container segments:
they are a stable building block and allow us not to worry about how
the data will actually be distributed or update commands re-routed.

Thanks all for the suggestions!
Sanne

>
> Cluster [A, B, C, D], key k, owners(k) = [A, B] (A is primary)
> C initiates a tx that executes put(k, v)
> Tx prepare succeeds on A and B
> A crashes, but the other nodes don't detect the crash yet
> Tx commit succeeds on B, who still thinks it is a backup owner
> B detects the crash, installs a new cluster view consistent hash with
> owners(k) = [B]
>
>
>>
>> >
>> > Thanks,
>> > Sanne

