On 20 January 2015 at 13:32, Dan Berindei <dan.berindei(a)gmail.com> wrote:
Adrian, I don't think that will work. The Hash doesn't know the number of
segments, so it can't tell where a particular key will land, even assuming
knowledge about how the ConsistentHash will map hash codes to segments.
However, I'm all for replacing the current Hash interface with another
interface that maps keys directly to segments.
Right, I'll eventually need a different abstraction, or a change to
the Hash interface. However, my need seems highly specialized; I'm not
sure whether there would be general interest in such a capability from
other Hash implementors.
Cheers
Dan
Never sign off if you have more interesting comments below; I only
saw them by chance ;)
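Dan's suggestion of an interface that maps keys directly to segments could look roughly like this. This is a minimal sketch, and `SegmentMapper`, `TaggedKey` and `TaggingSegmentMapper` are hypothetical names, not existing Infinispan types. `Math.floorMod(hash, numSegments)` is only a stand-in for whatever mapping the real ConsistentHash uses. The point is that the mapper is told the number of segments, which is exactly the information the Hash interface lacks.

```java
import java.util.function.ToIntFunction;

/** Hypothetical interface: maps a key straight to a segment index. */
interface SegmentMapper {
    int segmentOf(Object key);
}

/** Hypothetical key wrapper carrying an explicit segment tag (equals/hashCode omitted for brevity). */
final class TaggedKey {
    final Object key;
    final int segment;

    TaggedKey(Object key, int segment) {
        this.key = key;
        this.segment = segment;
    }
}

/** Honours explicit tags; every other key goes through a hash-based fallback. */
final class TaggingSegmentMapper implements SegmentMapper {
    private final int numSegments;
    private final ToIntFunction<Object> hash;

    TaggingSegmentMapper(int numSegments, ToIntFunction<Object> hash) {
        if (numSegments <= 0) {
            throw new IllegalArgumentException("numSegments must be positive");
        }
        this.numSegments = numSegments;
        this.hash = hash;
    }

    @Override
    public int segmentOf(Object key) {
        if (key instanceof TaggedKey) {
            int segment = ((TaggedKey) key).segment;
            if (segment < 0 || segment >= numSegments) {
                throw new IllegalArgumentException("segment out of range: " + segment);
            }
            return segment; // the tag wins, whatever the key hashes to
        }
        // Stand-in mapping (assumption); the real ConsistentHash uses its own formula.
        return Math.floorMod(hash.applyAsInt(key), numSegments);
    }
}
```

Consider `new TaggingSegmentMapper(60, Object::hashCode).segmentOf(new TaggedKey("k", 13))`. It returns 13 regardless of what `"k"` hashes to.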
On Tue, Jan 20, 2015 at 4:08 AM, Adrian Nistor
<anistor(a)redhat.com> wrote:
>
> Hi Sanne,
>
> An alternative approach would be to implement an
> org.infinispan.commons.hash.Hash which delegates to the stock
> implementation for all keys except those that need to be assigned to a
> specific segment. It should return the desired segment for those.
>
> Adrian
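For comparison, here is a sketch of Adrian's delegating Hash idea. It is written as a plain class so it compiles without Infinispan on the classpath, and it does not reproduce the real `org.infinispan.commons.hash.Hash` interface.

It assumes a range-based mapping, segment = (hash & Integer.MAX_VALUE) / segmentSize with segmentSize = ceil(Integer.MAX_VALUE / numSegments). That formula is an assumption about the default consistent hash, not something the Hash itself can see. `PinnedKey` and `SegmentPinningHash` are hypothetical. Note that the constructor needs `numSegments`, which a Hash implementation does not normally have.

```java
import java.util.function.ToIntFunction;

/** Hypothetical key type that must land in a specific segment. */
final class PinnedKey {
    final Object key;
    final int segment;

    PinnedKey(Object key, int segment) {
        this.key = key;
        this.segment = segment;
    }
}

/**
 * Delegates to a stock hash for ordinary keys and returns a hand-picked hash
 * code for PinnedKey. Only correct if segmentOf() below matches what the
 * consistent hash really does (an assumption) and numSegments is known up front.
 */
final class SegmentPinningHash {
    private final ToIntFunction<Object> stock;
    private final int numSegments;
    private final int segmentSize;

    SegmentPinningHash(ToIntFunction<Object> stock, int numSegments) {
        this.stock = stock;
        this.numSegments = numSegments;
        // Assumed range partitioning of the non-negative int space.
        this.segmentSize = (int) Math.ceil((double) Integer.MAX_VALUE / numSegments);
    }

    int hash(Object o) {
        if (o instanceof PinnedKey) {
            int segment = ((PinnedKey) o).segment;
            if (segment < 0 || segment >= numSegments) {
                throw new IllegalArgumentException("segment out of range: " + segment);
            }
            // First hash code of the target segment's range.
            return segment * segmentSize;
        }
        return stock.applyAsInt(o); // ordinary keys: stock behaviour
    }

    /** The assumed hash-code-to-segment mapping. */
    int segmentOf(int hashCode) {
        return (hashCode & Integer.MAX_VALUE) / segmentSize;
    }
}
```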
>
>
> On 01/20/2015 02:48 AM, Sanne Grinovero wrote:
> > Hi all,
> >
> > I'm playing with an idea for some internal components to be able to
> > "tag" the key for an entry to be stored into Infinispan in a very
> > specific segment of the CH.
> >
> > Conceptually the plan is easy to understand by looking at this patch:
> >
> > https://github.com/Sanne/infinispan/commit/45a3d9e62318d5f5f950a60b5bb174...
> >
> > Hacking the change into ReplicatedConsistentHash is quite barbaric;
> > please bear with me, as I couldn't figure out a better way to
> > experiment with this. I'll probably want to extend this class, but
> > then I'm not sure how to plug it in.
You would need to create your own ConsistentHashFactory, possibly extending
ReplicatedConsistentHashFactory. You can then plug the factory in with
configurationBuilder.clustering().hash().consistentHashFactory(yourFactory).
However, this isn't really a good idea, because then you need a different
implementation for distributed mode, and yet another implementation for
topology-aware clusters (with rack/machine/site ids). And your users would
also need to select the proper factory for each cache.
Right, this is the complexity I was facing. I'll stick to my hacky
solution for our little POC, but ultimately I'll need to plug this
in if none of the solutions below works out.
> > What would you all think of such a "tagging" mechanism?
> >
> > # Why I didn't use the KeyAffinityService
> > - I need to use my own keys, not the meaningless stuff produced by the
> > service
> > - the extensive usage of Random in there doesn't seem suited for a
> > performance-critical path
You can plug in your own KeyGenerator to generate keys, and maybe replace
the Random with a static/thread-local counter.
Thanks for the tip on KeyGenerator, I'll look into that :)
But I'll never add a static/threadlocal; I'd rather commit the ugly
code from the commit linked above.
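Here is a minimal sketch of a KeyGenerator that produces meaningful, deterministic keys from a counter instead of Random. The single-method interface below mirrors the shape of Infinispan's `org.infinispan.affinity.KeyGenerator` (one `getKey()` method). It is declared locally so the snippet stands alone, and `CountingKeyGenerator` is hypothetical.

The counter is an instance field, so there is no static or ThreadLocal state. This only replaces the Random calls: the service still has to generate keys until one maps to the wanted owner.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Local stand-in with the same shape as Infinispan's KeyGenerator: one getKey() method. */
interface KeyGenerator<K> {
    K getKey();
}

/** Produces "prefix-0", "prefix-1", ... from an instance-scoped, thread-safe counter. */
final class CountingKeyGenerator implements KeyGenerator<String> {
    private final String prefix;
    private final AtomicLong counter = new AtomicLong(); // no static, no ThreadLocal

    CountingKeyGenerator(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public String getKey() {
        return prefix + "-" + counter.getAndIncrement();
    }
}
```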
>
>
>
> >
> > # Why I didn't use the Grouping API
> > - I need to pick the specific storage segment, not just co-locate with
> > a different key
> >
This is actually a drawback of the KeyAffinityService more than Grouping.
With grouping, you can actually follow the KeyAffinityService strategy and
generate random strings until you get one in the proper segment, and then
tag all your keys with that exact string.
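Dan's grouping trick could be sketched like this: try candidate group strings until one lands in the target segment, then use that string as the group for every key that should live there. To keep the search deterministic, the candidates come from a counter ("tag-0", "tag-1", ...) rather than Random, and it gives up after `maxAttempts`.

`GroupTagFinder` is hypothetical. The segment function is passed in because the sketch does not reproduce Infinispan's real key-to-segment mapping. If the hash spreads candidates uniformly, each attempt succeeds with probability 1/numSegments. The expected number of attempts is therefore numSegments (a geometric distribution), with no hard upper bound.

```java
import java.util.Optional;
import java.util.function.ToIntFunction;

/** Finds a group string that maps to a wanted segment (hypothetical helper). */
final class GroupTagFinder {
    private final ToIntFunction<String> segmentOf; // assumed key-to-segment mapping

    GroupTagFinder(ToIntFunction<String> segmentOf) {
        this.segmentOf = segmentOf;
    }

    /** Tries "tag-0", "tag-1", ... and returns the first one in targetSegment. */
    Optional<String> find(int targetSegment, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            String candidate = "tag-" + i;
            if (segmentOf.applyAsInt(candidate) == targetSegment) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty(); // give up instead of looping forever
    }
}
```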
Interesting! A bit convoluted, but it could spare me from plugging in
the HashFactory.
BTW I really dislike the KeyAffinityService approach of generating
random keys until one works out. I guess it might not be too bad if
you want to pick one node out of ten, but I'm working at segment
granularity, and with bad luck it could take a long time.
It would be nice to have a function like this which returns in a
deterministic amount of time, like simply an inverse Hash.
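One way to get a bounded lookup without a true inverse Hash is to pay the search cost once. Scan candidate tags until every segment has one, and keep the result in an array indexed by segment. This is a minimal sketch: `SegmentTagTable` is hypothetical, and the caller supplies the segment function because Infinispan's real mapping is not reproduced.

With uniform hashing the scan is a coupon-collector problem. It needs about numSegments * H(numSegments) candidates on average, where H is the harmonic number, which is roughly 280 for 60 segments. After that, each lookup is a single array access. The table stays valid only as long as the segment count and the hash function don't change.

```java
import java.util.function.ToIntFunction;

/** Precomputes one group tag per segment so later lookups are O(1). */
final class SegmentTagTable {
    private final String[] tagBySegment;

    SegmentTagTable(int numSegments, ToIntFunction<String> segmentOf, int maxAttempts) {
        tagBySegment = new String[numSegments];
        int missing = numSegments;
        for (int i = 0; i < maxAttempts && missing > 0; i++) {
            String candidate = "tag-" + i;
            int segment = segmentOf.applyAsInt(candidate);
            if (segment < 0 || segment >= numSegments) {
                throw new IllegalArgumentException("segment out of range: " + segment);
            }
            if (tagBySegment[segment] == null) { // keep the first tag found per segment
                tagBySegment[segment] = candidate;
                missing--;
            }
        }
        if (missing > 0) {
            throw new IllegalStateException(missing + " segments have no tag after " + maxAttempts + " attempts");
        }
    }

    /** Constant-time lookup once the table is built. */
    String tagFor(int segment) {
        return tagBySegment[segment];
    }
}
```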
> > The general goal is to make it possible to "tag" all entries of an
> > index, and have an independent index for each segment of the CH. So
> > the resulting effect would be that when a primary owner for any key K
> > is making an update, and this triggers an index update, that update is
> > A) going to happen on the same node -> no need for forwarding to a
> > "master indexing node"
> > B) each such write on the index happens on the same node, which is
> > primary owner for all the written entries of the index.
> >
> > There are two additional nice consequences:
> > - there would be no need to perform a reliable "master election":
> > a single primary owner is already guaranteed by Infinispan's core
> > logic, so we would reuse that
> > - the propagation of writes on the index from the primary owner
> > (which is the local node by definition) to backup owners could use
> > REPL_ASYNC for most practical use cases.
> >
> > So the net result is that the overhead for indexing is reduced to 0 (ZERO)
> > blocking RPCs if async replication is acceptable, or to only one blocking
> > roundtrip if very strict consistency is required.
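The arithmetic behind the "0 or 1 blocking RPCs" claim, as a small sketch. It assumes two things:

- the owner list is ordered with the primary owner first, as in owners(k) = [A, B] with A primary;
- the backups of a synchronous write are contacted in parallel, so together they cost one round trip.

`SegmentIndexing` and the `index-segment-N` naming are hypothetical. The case where the local node is not the primary owner is left out. There the write falls back to a regular put with extra RPCs.

```java
import java.util.List;

/** Sketch of the per-segment index idea; all names are hypothetical. */
final class SegmentIndexing {
    /** One independent index per CH segment. */
    static String indexNameFor(int segment) {
        return "index-segment-" + segment;
    }

    /** True when this node is the primary owner (first in the owner list). */
    static boolean isLocalPrimary(List<String> owners, String self) {
        return !owners.isEmpty() && owners.get(0).equals(self);
    }

    /**
     * Blocking round trips for an index write done on the primary owner:
     * none with async backups (or no backups), otherwise one, since all
     * backups are assumed to be contacted in parallel.
     */
    static int blockingRoundTrips(int backupOwners, boolean asyncBackups) {
        return (asyncBackups || backupOwners == 0) ? 0 : 1;
    }
}
```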
Sounds very interesting, but I think there may be a problem with your
strategy: Infinispan doesn't guarantee you that one of the nodes executing
the CommitCommand is the primary owner at the time the CommitCommand is
executed. You could have something like this:
Cluster [A, B, C, D], key k, owners(k) = [A, B] (A is primary)
C initiates a tx that executes put(k, v)
Tx prepare succeeds on A and B
A crashes, but the other nodes don't detect the crash yet
Tx commit succeeds on B, which still thinks it is a backup owner
B detects the crash and installs a new cluster view and consistent hash
with owners(k) = [B]
Index storage is generally used without transactions. But even if we
had transactions enabled, or the "vanilla" put operation suffered a
similar timing issue (since we'd determine this node to be the owner
higher up in the search stack, before the actual put reaches the
Infinispan core API), it wouldn't be a problem: we'd simply lose
locality of the write. It would be slightly less efficient, but it
would still write the "right thing".
The intention is to maximise locality with the hints; when a write
can't be local, I simply expect it to be handled like any other put
operation on Infinispan, with a couple more RPCs, and with any race
condition regarding topology changes handled at a lower level.
This is exactly why I'm now working on top of container segments:
they are a stable building block and let us avoid worrying about how
you'll actually distribute data or re-route update commands.
Thanks all for the suggestions!
Sanne
>
> >
> > Thanks,
> > Sanne