Experiment: Affinity Tagging


Sanne Grinovero

Monday, 19 January 2015

6:48 p.m.

Hi all,

I'm playing with an idea for some internal components to be able to "tag" the key for an entry to be stored into Infinispan in a very specific segment of the CH.

Conceptually the plan is easy to understand by looking at this patch:

https://github.com/Sanne/infinispan/commit/45a3d9e62318d5f5f950a60b5bb174...

Hacking the change into ReplicatedConsistentHash is quite barbaric, please bear with me as I couldn't figure out a better way to experiment with this. I'll probably want to extend this class, but then I'm not sure how to plug it in?

What would you all think of such a "tagging" mechanism?

# Why I didn't use the KeyAffinityService
- I need to use my own keys, not the meaningless stuff produced by the service
- the extensive usage of Random in there doesn't seem suited for a performance critical path

# Why I didn't use the Grouping API
- I need to pick the specific storage segment, not just co-locate with a different key

The general goal is to make it possible to "tag" all entries of an index, and have an independent index for each segment of the CH. The resulting effect would be that when the primary owner for any key K is making an update, and this triggers an index update, that update is
A) going to happen on the same node -> no need to forward it to a "master indexing node"
B) each such write on the index happens on the same node which is primary owner for all the written entries of the index.

There are two additional nice consequences:
- there would be no need to perform a reliable "master election": ownership singleton is already guaranteed by Infinispan's essential logic, so it would reuse that
- the propagation of writes on the index from the primary owner (which is the local node by definition) to backup owners could use REPL_ASYNC for most practical use cases.

So the net result is that the overhead for indexing is reduced to 0 (ZERO) blocking RPCs if the async repl is acceptable, or to only one blocking roundtrip if very strict consistency is required.

Thanks,
Sanne


Adrian Nistor

Monday, 19 January

8:08 p.m.

...

Dan Berindei

Tuesday, 20 January

7:32 a.m.

...

You would need to create your own ConsistentHashFactory, possibly extending ReplicatedConsistentHashFactory. You can then plug the factory in with

configurationBuilder.clustering().hash().consistentHashFactory(yourFactory)

However, this isn't a really good idea, because then you need a different implementation for distributed mode, and then another implementation for topology-aware clusters (with rack/machine/site ids). And your users would also need to select the proper factory for each cache.

...

> What would you all think of such a "tagging" mechanism?
>
> # Why I didn't use the KeyAffinityService
> - I need to use my own keys, not the meaningless stuff produced by the service
> - the extensive usage of Random in there doesn't seem suited for a
> performance critical path

You can plug in your own KeyGenerator to generate keys, and maybe replace the Random with a static/thread-local counter.

...

> # Why I didn't use the Grouping API
> - I need to pick the specific storage segment, not just co-locate with
> a different key

This is actually a drawback of the KeyAffinityService more than Grouping. With grouping, you can actually follow the KeyAffinityService strategy and generate random strings until you get one in the proper segment, and then tag all your keys with that exact string.
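A minimal sketch of the "try candidates until one lands in the proper segment" strategy described above. It is not Infinispan code: `segmentOf` is a hypothetical stand-in (hash modulo segment count) for whatever mapping the real consistent hash applies, and the class and method names are made up for illustration. A counter replaces Random so the search is repeatable.

```java
/**
 * Sketch: find a group tag that lands in a chosen segment by trial and error.
 * segmentOf() is a hypothetical stand-in, NOT Infinispan's real mapping.
 */
final class SegmentTagFinder {

    /** Hypothetical key-to-segment mapping: hash code modulo the segment count. */
    static int segmentOf(String groupTag, int numSegments) {
        return Math.floorMod(groupTag.hashCode(), numSegments);
    }

    /**
     * Tries candidate tags "tag-0", "tag-1", ... until one maps to targetSegment.
     * maxAttempts bounds the search so a bad mapping cannot loop forever.
     */
    static String findTag(int targetSegment, int numSegments, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            String candidate = "tag-" + i;
            if (segmentOf(candidate, numSegments) == targetSegment) {
                return candidate;
            }
        }
        throw new IllegalStateException("no tag found for segment " + targetSegment);
    }

    public static void main(String[] args) {
        int numSegments = 8;
        for (int segment = 0; segment < numSegments; segment++) {
            // Every key of the per-segment index would be grouped under this tag.
            String tag = findTag(segment, numSegments, 10_000);
            System.out.println("segment " + segment + " -> " + tag);
        }
    }
}
```

The tag for each segment only has to be found once and can then be cached, so the search cost stays off the write path.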

...

> The general goal is to make it possible to "tag" all entries of an
> index, and have an independent index for each segment of the CH. So
> the resulting effect would be, that when a primary owner for any key K
> is making an update, and this triggers an index update, that update is
> A) going to happen on the same node -> no need to forwarding to a
> "master indexing node"
> B) each such writes on the index happen on the same node which is
> primary owner for all the written entries of the index.
>
> There are two additional nice consequences:
> - there would be no need to perform a reliable "master election":
> ownership singleton is already guaranteed by Infinispan's essential
> logic, so it would reuse that
> - the propagation of writes on the index from the primary owner
> (which is the local node by definition) to backup owners could use
> REPL_ASYNC for most practical use cases.
>
> So net result is that the overhead for indexing is reduced to 0 (ZERO)
> blocking RPCs if the async repl is acceptable, or to only one blocking
> roundtrip if very strict consistency is required.

Sounds very interesting, but I think there may be a problem with your strategy: Infinispan doesn't guarantee that one of the nodes executing the CommitCommand is the primary owner at the time the CommitCommand is executed. You could have something like this:

Cluster [A, B, C, D], key k, owners(k) = [A, B] (A is primary)
C initiates a tx that executes put(k, v)
Tx prepare succeeds on A and B
A crashes, but the other nodes don't detect the crash yet
Tx commit succeeds on B, which still thinks it is a backup owner
B detects the crash, installs a new cluster view consistent hash with owners(k) = [B]
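The scenario can be replayed with a toy model. This is only a sketch under simplifying assumptions: each node's view is reduced to the owner list it currently believes in for key k, and none of the names below come from Infinispan.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Toy model of the crash scenario; it does not use any Infinispan classes. */
final class StaleViewScenario {

    /** True if, in its own (possibly stale) view, the node is first in owners(k). */
    static boolean believesItIsPrimary(String node, Map<String, List<String>> ownersSeenBy) {
        List<String> owners = ownersSeenBy.get(node);
        return owners != null && !owners.isEmpty() && owners.get(0).equals(node);
    }

    /** Live nodes that would run "primary owner only" logic with the given views. */
    static List<String> selfDeclaredPrimaries(List<String> liveNodes,
                                              Map<String, List<String>> ownersSeenBy) {
        return liveNodes.stream()
                .filter(node -> believesItIsPrimary(node, ownersSeenBy))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> live = List.of("B", "C", "D"); // A has already crashed

        // Commit time: nobody has detected the crash, so every view still says [A, B].
        Map<String, List<String>> staleViews = Map.of(
                "B", List.of("A", "B"), "C", List.of("A", "B"), "D", List.of("A", "B"));
        System.out.println("at commit: " + selfDeclaredPrimaries(live, staleViews));

        // After the new view is installed, B is the only owner and therefore primary.
        Map<String, List<String>> newViews = Map.of(
                "B", List.of("B"), "C", List.of("B"), "D", List.of("B"));
        System.out.println("after new view: " + selfDeclaredPrimaries(live, newViews));
    }
}
```

At commit time no live node considers itself primary, so any logic that only runs "on the primary owner" is skipped for this write; B becomes primary only once the new view is installed.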

...

_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Adrian Nistor

8:33 a.m.

None of the existing Hash implementations can, but this new one will be special. It could have access to the config (and CH) of the user's cache so it will know the number of segments. The index cache will have to use the same type of CH as the data cache in order to keep ownership in sync, and the Hash implementation will be the special delegating Hash.

There is a twist though: the above only works with SyncConsistentHash. Because when two caches with identical topology use DefaultConsistentHash they could still not be in sync in terms of key ownership. Only SyncConsistentHash ensures that.

Knowledge of how CH currently maps hashcodes to segments is assumed already. I've spotted at least 3 places in code where it happens, so it is time to document it or move this responsibility to the Hash interface as you suggest to make it really pluggable.

Adrian

On 01/20/2015 03:32 PM, Dan Berindei wrote:

...

> Adrian, I don't think that will work. The Hash doesn't know the number of
> segments so it can't tell where a particular key will land - even assuming
> knowledge about how the ConsistentHash will map hash codes to segments.
>
> However, I'm all for replacing the current Hash interface with another
> interface that maps keys directly to segments.
>
> Cheers
> Dan
>
> On Tue, Jan 20, 2015 at 4:08 AM, Adrian Nistor <anistor(a)redhat.com> wrote:
>> Hi Sanne,
>>
>> An alternative approach would be to implement an
>> org.infinispan.commons.hash.Hash which delegates to the stock
>> implementation for all keys except those that need to be assigned to a
>> specific segment. It should return the desired segment for those.
>>
>> Adrian
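To make the two positions concrete, here is a sketch of an interface-free "key to segment" mapper that knows the segment count (Dan's point) and special-cases tagged keys while delegating everything else to a stock hash (Adrian's point). All names (`SegmentTagged`, `IndexKey`, `DelegatingMapper`) are hypothetical, and the modulo mapping is an assumption rather than the way Infinispan maps hash codes to segments.

```java
import java.util.function.ToIntFunction;

/** Sketch only: hypothetical types, not part of Infinispan. */
final class SegmentMapping {

    /** Hypothetical marker for keys that carry their target segment. */
    interface SegmentTagged {
        int segment();
    }

    /** Example tagged key: an index entry pinned to one segment. */
    static final class IndexKey implements SegmentTagged {
        final String name;
        final int segment;

        IndexKey(String name, int segment) {
            this.name = name;
            this.segment = segment;
        }

        @Override
        public int segment() {
            return segment;
        }
    }

    /** Maps keys directly to segments; it needs numSegments, which a plain hash lacks. */
    static final class DelegatingMapper {
        private final int numSegments;
        private final ToIntFunction<Object> stockHash;

        DelegatingMapper(int numSegments, ToIntFunction<Object> stockHash) {
            if (numSegments <= 0) {
                throw new IllegalArgumentException("numSegments must be positive");
            }
            this.numSegments = numSegments;
            this.stockHash = stockHash;
        }

        int segmentOf(Object key) {
            if (key instanceof SegmentTagged) {
                int segment = ((SegmentTagged) key).segment();
                if (segment < 0 || segment >= numSegments) {
                    throw new IllegalArgumentException("segment out of range: " + segment);
                }
                return segment; // tagged keys bypass hashing entirely
            }
            // Assumed mapping for ordinary keys: stock hash modulo the segment count.
            return Math.floorMod(stockHash.applyAsInt(key), numSegments);
        }
    }

    public static void main(String[] args) {
        DelegatingMapper mapper = new DelegatingMapper(16, Object::hashCode);
        System.out.println(mapper.segmentOf(new IndexKey("index-chunk", 5)));
        System.out.println(mapper.segmentOf("ordinary-key"));
    }
}
```

As noted above, pinning a key to a segment only pins it to a node if the index cache and the data cache agree on segment ownership.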

Sanne Grinovero

7:28 p.m.

On 20 January 2015 at 14:33, Adrian Nistor <anistor(a)redhat.com> wrote:

...

Many thanks for pointing out the need for a SyncConsistentHashFactory, I was not aware of the limitations described in the javadoc.

Side note: I'm surprised by the limitation of the normal ConsistentHashFactory as described in the javadoc of SyncConsistentHashFactory... is it because our normal implementation is actually not "Consistent"? Or is it referring to additional properties of our Hash function?

Cheers,
Sanne

...


Dan Berindei

Thursday, 22 January

9:44 a.m.

On Wed, Jan 21, 2015 at 3:28 AM, Sanne Grinovero <sanne(a)infinispan.org> wrote:

...

> On 20 January 2015 at 14:33, Adrian Nistor <anistor(a)redhat.com> wrote:
>> None of the existing Hash implementations can, but this new one will be
>> special. It could have access to the config (and CH) of the user's cache so
>> it will know the number of segments. The index cache will have to use the
>> same type of CH as the data cache in order to keep ownership in sync and the
>> Hash implementation will be the special delegating Hash.
>>
>> There is a twist though, the above only works with SyncConsistentHash.
>> Because when two caches with identical topology use DefaultConsistentHash
>> they could still not be in sync in terms of key ownership. Only
>> SyncConsistentHash ensures that.
>
> Many thanks for pointing out the need for a SyncConsistentHashFactory,
> I was not aware of the limitations described in the javadoc.
>
> Side note: I'm surprised by the limitation of the normal
> ConsistentHashFactory as described in the javadoc of
> SyncConsistentHashFactory... is it because our normal implementation is
> actually not "Consistent"? Or is it referring to additional properties
> of our Hash function?

Yes, it's because our DefaultConsistentHash isn't really consistent - i.e. the mapping of segments to nodes depends on more than just the addresses of the nodes. It seemed like the best way to fix the load distribution problems we had at the time, but it is starting to feel a little painful now.

Another property of the "real" consistent hash is that a key will only move from an existing owner to a joiner, and there are no unnecessary moves between the existing nodes. It's a nice property, but I'm afraid we never really had it in Infinispan because of the way we handled multiple nodes with the same hashcode. I didn't manage to get this working in SyncConsistentHashFactory while also keeping a nice mostly-even distribution of segments, but I haven't completely given up on it yet...
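For readers unfamiliar with the "keys only move to the joiner" property, here is a textbook hash ring sketch that has it by construction. It is not how DefaultConsistentHash or SyncConsistentHashFactory work; the class name, the single point per node, and the bit-mixing step are all illustrative assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

/** Textbook hash ring with one point per node; a sketch, not Infinispan code. */
final class HashRing {

    private final TreeMap<Integer, String> points = new TreeMap<>();

    /** Bit mixer (murmur3-style finalizer) so similar hash codes spread over the ring. */
    static int spread(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /**
     * Places the node on the ring. A node whose point collides with an existing
     * one is ignored here, which is one (lossy) way to treat equal hash codes.
     */
    void addNode(String node) {
        points.putIfAbsent(spread(node.hashCode()), node);
    }

    /** Owner = first node point at or after the key's point, wrapping around. */
    String ownerOf(Object key) {
        if (points.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Integer, String> entry = points.ceilingEntry(spread(key.hashCode()));
        return (entry != null ? entry : points.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        for (String node : new String[] {"A", "B", "C", "D"}) {
            ring.addNode(node);
        }
        String[] before = new String[1000];
        for (int i = 0; i < before.length; i++) {
            before[i] = ring.ownerOf("key-" + i);
        }

        ring.addNode("E"); // the joiner
        int moved = 0;
        for (int i = 0; i < before.length; i++) {
            String now = ring.ownerOf("key-" + i);
            if (!now.equals(before[i])) {
                moved++; // by construction, 'now' can only be "E" here
            }
        }
        System.out.println("keys moved to the joiner: " + moved);
    }
}
```

Adding a point to the ring can only redirect keys to that new point, which is why no key moves between existing nodes. The price is an uneven split of the ring, which is the load distribution problem mentioned above.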

...


Sanne Grinovero

Tuesday, 20 January

7:12 p.m.

On 20 January 2015 at 13:32, Dan Berindei <dan.berindei(a)gmail.com> wrote:

...

Right, I'll eventually need a different abstraction, or a change to the Hash interface. However my need seems highly specialised, so I'm not sure whether there would be general interest in such a capability from other Hash implementors?

...

> Cheers
> Dan

Never ever sign if you have more interesting comments below, I only saw them by chance ;)

...

> You would need to create your own ConsistentHashFactory, possibly extending
> ReplicatedConsistentHashFactory. You can then plug the factory in with
>
> configurationBuilder.clustering().hash().consistentHashFactory(yourFactory)
>
> However, this isn't a really good idea, because then you need a different
> implementation for distributed mode, and then another implementation for
> topology-aware clusters (with rack/machine/site ids). And your users would
> also need to select the proper factory for each cache.

Right, this is the complexity I was facing. I'll stick to my hack solution for our little POC... but ultimately I'll need to plug this in if none of the solutions below work out.

...

>> What would you all think of such a "tagging" mechanism?
>>
>> # Why I didn't use the KeyAffinityService
>> - I need to use my own keys, not the meaningless stuff produced by the
>> service
>> - the extensive usage of Random in there doesn't seem suited for a
>> performance critical path
>
> You can plug in your own KeyGenerator to generate keys, and maybe replace
> the Random with a static/thread-local counter.

Thanks for the tip on KeyGenerator, I'll look into that :) But I'll never add a static/thread-local... I'd rather commit my ugly code from the commit linked above.

...

>> # Why I didn't use the Grouping API
>> - I need to pick the specific storage segment, not just co-locate with
>> a different key
>
> This is actually a drawback of the KeyAffinityService more than Grouping.
> With grouping, you can actually follow the KeyAffinityService strategy and
> generate random strings until you get one in the proper segment, and then
> tag all your keys with that exact string.

Interesting! A bit convoluted, but it could spare me from plugging in the HashFactory.

BTW I really dislike this idea of the KeyAffinityService generating random keys until it works out... I guess it might not be too bad if you want to pick a node out of ten, but I'm working at segment granularity and with bad luck it could take a long time. It would be nice to have a function like this which would return in a deterministic amount of time, like simply an inverse Hash.
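To put a rough number on "it could take a long time", assume (as an idealisation, not a measured property of any Infinispan hash) that every candidate key lands independently and uniformly in one of N segments. The number of attempts T needed to hit one specific segment is then geometrically distributed:

```latex
\[
  P(T = k) = \left(1 - \frac{1}{N}\right)^{k-1} \frac{1}{N},
  \qquad
  \mathbb{E}[T] = N,
  \qquad
  P(T > k) = \left(1 - \frac{1}{N}\right)^{k}
\]
```

With a hypothetical N = 256 segments the average search takes 256 attempts, and the chance of still not having a hit after 1000 attempts is (1 - 1/256)^1000, roughly 2%. The tail never reaches zero, which is the lack of a deterministic bound that an inverse Hash would fix.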

...

> Sounds very interesting, but I think there may be a problem with your
> strategy: Infinispan doesn't guarantee you that one of the nodes executing
> the CommitCommand is the primary owner at the time the CommitCommand is
> executed. You could have something like this:

Index storage is generally used without transactions. But even assuming we had transactions enabled, or that the "vanilla" put operation suffered from a similar timing issue (as we'd determine this node to be the owner higher up in the search stack, before the actual put reaches the Infinispan core API), it's not a problem as we'd simply lose locality of the write: it would be slightly less efficient, but still write the "right thing".

The intention is to maximise locality with the hints; when locality fails on a write I just expect it to be handled as any other put operation on Infinispan, with a couple more RPCs and with any race condition regarding topology changes being handled at a lower level.

Which is exactly why I'm now working on top of container segments: they are a stable building block and allow us to not worry about how you'll actually distribute data or re-route update commands.

Thanks all for the suggestions!
Sanne

...

> Cluster [A, B, C, D], key k, owners(k) = [A, B] (A is primary)
> C initiates a tx that executes put(k, v)
> Tx prepare succeeds on A and B
> A crashes, but the other nodes don't detect the crash yet
> Tx commit succeeds on B, which still thinks it is a backup owner
> B detects the crash, installs a new cluster view consistent hash with
> owners(k) = [B]
>
>> Thanks,
>> Sanne

Dan Berindei

Thursday, 22 January

9:04 a.m.

On Wed, Jan 21, 2015 at 3:12 AM, Sanne Grinovero <sanne(a)infinispan.org> wrote:

> On 20 January 2015 at 13:32, Dan Berindei <dan.berindei(a)gmail.com> wrote:
> > Adrian, I don't think that will work. The Hash doesn't know the number of
> > segments so it can't tell where a particular key will land - even assuming
> > knowledge about how the ConsistentHash will map hash codes to segments.
> >
> > However, I'm all for replacing the current Hash interface with another
> > interface that maps keys directly to segments.
>
> Right, I'll eventually need a different abstraction, or a change to the Hash interface. However my need seems highly specialistic, I'm not sure if there would be a general interest into such a capability for other Hash implementors?

If it's better than KeyAffinityService and/or grouping, I'm pretty sure there will be other takers.
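
A minimal sketch of what "an interface that maps keys directly to segments" might look like, only to make the idea concrete. Everything in it is an assumption made for illustration: `SegmentMapper`, `SegmentTagged`, `HashingMapper`, `TagAwareMapper` and `IndexChunkKey` are hypothetical names, not Infinispan types, and the fallback mapper uses `hashCode()` modulo the segment count as a stand-in for the stock hashing.

```java
import java.util.Objects;

/** Illustrative only: none of these types exist in Infinispan under these names. */
final class SegmentMappingSketch {

    /** Hypothetical replacement for Hash: maps a key straight to a segment id. */
    interface SegmentMapper {
        int segmentOf(Object key);
    }

    /** Hypothetical marker for keys that already know their target segment. */
    interface SegmentTagged {
        int taggedSegment();
    }

    /** Stand-in for the stock behaviour: spread untagged keys by hash code. */
    static final class HashingMapper implements SegmentMapper {
        private final int numSegments;

        HashingMapper(int numSegments) {
            if (numSegments <= 0) {
                throw new IllegalArgumentException("numSegments must be positive");
            }
            this.numSegments = numSegments;
        }

        @Override
        public int segmentOf(Object key) {
            return Math.floorMod(Objects.hashCode(key), numSegments);
        }
    }

    /** Honours the tag when the key carries one, otherwise delegates. */
    static final class TagAwareMapper implements SegmentMapper {
        private final SegmentMapper delegate;
        private final int numSegments;

        TagAwareMapper(SegmentMapper delegate, int numSegments) {
            this.delegate = Objects.requireNonNull(delegate);
            this.numSegments = numSegments;
        }

        @Override
        public int segmentOf(Object key) {
            if (key instanceof SegmentTagged) {
                int segment = ((SegmentTagged) key).taggedSegment();
                if (segment < 0 || segment >= numSegments) {
                    throw new IllegalArgumentException("segment out of range: " + segment);
                }
                return segment;
            }
            return delegate.segmentOf(key);
        }
    }

    /** Example key: one chunk of an index, pinned to a fixed segment. */
    static final class IndexChunkKey implements SegmentTagged {
        final String chunkName;
        final int segment;

        IndexChunkKey(String chunkName, int segment) {
            this.chunkName = chunkName;
            this.segment = segment;
        }

        @Override
        public int taggedSegment() {
            return segment;
        }
    }

    public static void main(String[] args) {
        SegmentMapper mapper = new TagAwareMapper(new HashingMapper(60), 60);
        System.out.println("tagged key   -> segment " + mapper.segmentOf(new IndexChunkKey("books", 42)));
        System.out.println("untagged key -> segment " + mapper.segmentOf("plain-key"));
    }
}
```

With a contract like this the mapper is told the segment count up front, so a tagged key never needs a forged hash value.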

> Never ever sign if you have more interesting comments below, I only saw them by chance ;)

Oops, I only intended to reply to Adrian, and I forgot to remove the signature when I went further... I removed it now :)

> > On Tue, Jan 20, 2015 at 4:08 AM, Adrian Nistor <anistor(a)redhat.com> wrote:
> >>
> >> Hi Sanne,
> >>
> >> An alternative approach would be to implement an
> >> org.infinispan.commons.hash.Hash which delegates to the stock
> >> implementation for all keys except those that need to be assigned to a
> >> specific segment. It should return the desired segment for those.
> >>
> >> Adrian
> >>
> >> On 01/20/2015 02:48 AM, Sanne Grinovero wrote:
> >> > Hi all,
> >> >
> >> > I'm playing with an idea for some internal components to be able to
> >> > "tag" the key for an entry to be stored into Infinispan in a very
> >> > specific segment of the CH.
> >> >
> >> > Conceptually the plan is easy to understand by looking at this patch:
> >> >
> >> > https://github.com/Sanne/infinispan/commit/45a3d9e62318d5f5f950a60b5bb174...
> >> >
> >> > Hacking the change into ReplicatedConsistentHash is quite barbaric,
> >> > please bear with me as I couldn't figure a better way to be able to
> >> > experiment with this. I'll probably want to extend this class, but
> >> > then I'm not sure how to plug it in?
> >
> > You would need to create your own ConsistentHashFactory, possibly extending
> > ReplicatedConsistentHashFactory. You can then plug the factory in with
> >
> > configurationBuilder.clustering().hash().consistentHashFactory(yourFactory)
> >
> > However, this isn't a really good idea, because then you need a different
> > implementation for distributed mode, and then another implementation for
> > topology-aware clusters (with rack/machine/site ids). And your users would
> > also need to select the proper factory for each cache.
>
> Right, this is the complexity I was facing. I'll stick to my hack solution for our little POC... but ultimately I'll need to plug this in if none of the solutions below work out.

In the meantime Adrian clarified that you're creating the configuration yourself, so there's no problem picking the proper consistent hash factory - as long as the user didn't plug in his own for the indexed cache.

> >> > What would you all think of such a "tagging" mechanism?
> >> >
> >> > # Why I didn't use the KeyAffinityService
> >> > - I need to use my own keys, not the meaningless stuff produced by the
> >> > service
> >> > - the extensive usage of Random in there doesn't seem suited for a
> >> > performance critical path
> >
> > You can plug in your own KeyGenerator to generate keys, and maybe replace
> > the Random with a static/thread-local counter.
>
> Thanks for the tip on KeyGenerator, I'll investigate on that :)
> But I'll never add a static/threadlocal.. I'd rather commit my ugly code from the commit linked above.

Indeed, the fact that you need to generate a different key every time makes KeyAffinityService harder to work with.

> >> > # Why I didn't use the Grouping API
> >> > - I need to pick the specific storage segment, not just co-locate with
> >> > a different key
> >
> > This is actually a drawback of the KeyAffinityService more than Grouping.
> > With grouping, you can actually follow the KeyAffinityService strategy and
> > generate random strings until you get one in the proper segment, and then
> > tag all your keys with that exact string.
>
> Interesting! A bit convoluted but could spare me to plug in the HashFactory.
>
> BTW I really dislike this idea of the KeyAffinityService to generate random keys until it works out.. I guess it might not be too bad if you want to pick a node out of ten, but I'm working at segment granularity level and with the right luck it could take a long time. It would be nice to have a function like this which would return in a deterministic amount of time, like simply an inverse Hash.

Weird, I was 100% sure that there's no way to reverse MurmurHash3, but it seems there is [1]. Our implementation is a bit different, but it shouldn't be very hard to adapt.

On the other hand, AbstractTopologyAwareEncoder1x.denormalizeSegmentHashIds brute-forces MurmurHash3 on each topology update to get a "denormalized" start value for each segment, which has to map to 0.2% of the segment. I don't remember how long it took, but it wasn't that bad.

The good part about using grouping is that you can have lots of keys with the same group key, so you would only have to find the inverse once.

[1] https://131002.net/siphash/#at
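
The "search once, then reuse the group key" approach could be sketched roughly as follows. This is an illustration under stated assumptions, not how Infinispan does it: `GroupNameSearchSketch`, `groupFor` and the `index-group-N` candidate names are made up, the hash is a stand-in (`String.hashCode()` passed through the MurmurHash3 32-bit finalizer, not the cache's real hash function), and the hash-to-segment mapping is assumed to cut the non-negative `int` range into equal slices. A counter replaces `Random` so the search is repeatable and bounded by `maxAttempts`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: brute-force a group name per segment, then cache it. */
final class GroupNameSearchSketch {
    private final int numSegments;
    private final long segmentSize;
    private final int maxAttempts;
    private final Map<Integer, String> groupBySegment = new ConcurrentHashMap<>();

    GroupNameSearchSketch(int numSegments, int maxAttempts) {
        if (numSegments <= 0 || maxAttempts <= 0) {
            throw new IllegalArgumentException("numSegments and maxAttempts must be positive");
        }
        this.numSegments = numSegments;
        // Assumed mapping: ceil(2^31 / numSegments) hash values per segment.
        this.segmentSize = ((1L << 31) + numSegments - 1) / numSegments;
        this.maxAttempts = maxAttempts;
    }

    /** Stand-in mixer (MurmurHash3 32-bit finalizer); NOT the cache's real hash. */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Segment that a group name lands in, under the assumptions above. */
    int segmentOf(String group) {
        int normalized = mix(group.hashCode()) & Integer.MAX_VALUE;
        return (int) (normalized / segmentSize);
    }

    /** Returns a group name for the segment; the search runs once per segment. */
    String groupFor(int targetSegment) {
        if (targetSegment < 0 || targetSegment >= numSegments) {
            throw new IllegalArgumentException("segment out of range: " + targetSegment);
        }
        return groupBySegment.computeIfAbsent(targetSegment, this::search);
    }

    /** Tries counter-based candidates until one hashes into the target segment. */
    private String search(int targetSegment) {
        for (int i = 0; i < maxAttempts; i++) {
            String candidate = "index-group-" + i;
            if (segmentOf(candidate) == targetSegment) {
                return candidate;
            }
        }
        throw new IllegalStateException("no group found for segment " + targetSegment
                + " within " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        GroupNameSearchSketch finder = new GroupNameSearchSketch(60, 1_000_000);
        for (int segment : new int[] {0, 7, 59}) {
            System.out.println("segment " + segment + " <- group " + finder.groupFor(segment));
        }
    }
}
```

On average the search needs about as many attempts as there are segments, but only the attempt cap bounds the worst case, which is the concern raised above about non-deterministic running time.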

> >> > The general goal is to make it possible to "tag" all entries of an
> >> > index, and have an independent index for each segment of the CH. So
> >> > the resulting effect would be, that when a primary owner for any key K
> >> > is making an update, and this triggers an index update, that update is
> >> > A) going to happen on the same node -> no need to forwarding to a
> >> > "master indexing node"
> >> > B) each such writes on the index happen on the same node which is
> >> > primary owner for all the written entries of the index.
> >> >
> >> > There are two additional nice consequences:
> >> > - there would be no need to perform a reliable "master election":
> >> > ownership singleton is already guaranteed by Infinispan's essential
> >> > logic, so it would reuse that
> >> > - the propagation of writes on the index from the primary owner
> >> > (which is the local node by definition) to backup owners could use
> >> > REPL_ASYNC for most practical use cases.
> >> >
> >> > So net result is that the overhead for indexing is reduced to 0 (ZERO)
> >> > blocking RPCs if the async repl is acceptable, or to only one blocking
> >> > roundtrip if very strict consistency is required.
> >
> > Sounds very interesting, but I think there may be a problem with your
> > strategy: Infinispan doesn't guarantee you that one of the nodes executing
> > the CommitCommand is the primary owner at the time the CommitCommand is
> > executed. You could have something like this:
>
> Index storage is generally used without transactions, but even assuming we had transactions enabled, or the "vanilla" put operation suffered a similar timing issue (as we'd determine this node to be owner higher up in the search stack, before the actual put reaches the Infinispan core API), it's not a problem, as we'd simply lose locality of the write: it would be slightly less efficient, but still write the "right thing".

I was thinking about the indexed cache, not the storage cache. In the storage cache I think the only serious problem is when the originator dies, especially if it's also the primary owner: neither tx nor non-tx caches deal with that properly ATM.

However, if I understood your idea correctly, only the primary owner in the indexed cache will ever write to the index, and the indexed cache may well use transactions. So I think my scenario is relevant, and the index won't be updated by B or anyone else.

Non-transactional caches do not have the issue; the index update can only disappear if you use async replication in the index storage cache. Instead you can have a single write triggering multiple index updates, but I'm guessing you have that covered.

> The intention is to maximise locality with the hints, but failing locality on writes I just expect it to be handled as any other put operation on Infinispan, with a couple more RPCs, and with any race condition regarding topology changes being handled at a lower level. Which is exactly why I'm now working on top of container segments: they are a stable building block and allow us to not worry about how you'll actually distribute data or re-route update commands.
>
> Thanks all for the suggestions!
> Sanne
>
> > Cluster [A, B, C, D], key k, owners(k) = [A, B] (A is primary)
> > C initiates a tx that executes put(k, v)
> > Tx prepare succeeds on A and B
> > A crashes, but the other nodes don't detect the crash yet
> > Tx commit succeeds on B, who still thinks is a backup owner
> > B detects the crash, installs a new cluster view consistent hash with
> > owners(k) = [B]

Cheers
Dan
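
The crash scenario discussed in this message can be replayed with a toy model. It is a sketch under simplifying assumptions, not Infinispan's transaction protocol: `StaleOwnerScenarioSketch`, `Node` and `runScenario` are hypothetical names, each node only tracks its own view of `owners(k)`, and the rule "only a node that believes it is the primary owner updates the index" is taken from the discussion above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy replay of the scenario: the index update is lost when the primary crashes. */
final class StaleOwnerScenarioSketch {

    static final class Node {
        final String name;
        List<String> ownersView;   // this node's view of owners(k), primary first
        boolean alive = true;
        final Map<String, String> data = new HashMap<>();
        final Set<String> indexedKeys = new HashSet<>();

        Node(String name, List<String> ownersView) {
            this.name = name;
            this.ownersView = ownersView;
        }

        boolean believesPrimary() {
            return !ownersView.isEmpty() && ownersView.get(0).equals(name);
        }

        /** Applies the commit; indexes only if this node thinks it is primary. */
        void commit(String key, String value) {
            if (!alive) {
                return; // a crashed node processes nothing
            }
            data.put(key, value);
            if (believesPrimary()) {
                indexedKeys.add(key);
            }
        }
    }

    /** Runs the timeline and returns node B for inspection. */
    static Node runScenario() {
        Node a = new Node("A", Arrays.asList("A", "B"));
        Node b = new Node("B", Arrays.asList("A", "B"));

        // Tx prepare succeeds on A and B (nothing is written yet).
        // A crashes, but B has not detected the crash.
        a.alive = false;

        // Tx commit is delivered to both owners; only B is able to apply it,
        // and B still sees itself as a backup owner.
        a.commit("k", "v");
        b.commit("k", "v");

        // B detects the crash and installs the new view: owners(k) = [B].
        b.ownersView = Collections.singletonList("B");
        return b;
    }

    public static void main(String[] args) {
        Node b = runScenario();
        System.out.println("B stores k:        " + b.data.containsKey("k"));
        System.out.println("B is primary now:  " + b.believesPrimary());
        System.out.println("B indexed k:       " + b.indexedKeys.contains("k"));
    }
}
```

In this model B ends up as the primary owner holding the value, yet no node ever ran the index update.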

Sanne Grinovero

Tuesday, 20 January

6:54 p.m.

Thanks Adrian, right I initially expected doing something like that, but the Hash contract doesn't expose/leak details about segments. I guess I could forge a specific hash result but that seems fragile, while my needs are very simple as I already know the segment id: for a given indexing back-end it's a constant.

On 20 January 2015 at 02:08, Adrian Nistor <anistor(a)redhat.com> wrote:
> Hi Sanne,
>
> An alternative approach would be to implement an
> org.infinispan.commons.hash.Hash which delegates to the stock
> implementation for all keys except those that need to be assigned to a
> specific segment. It should return the desired segment for those.
>
> Adrian
>
> On 01/20/2015 02:48 AM, Sanne Grinovero wrote:
> > Hi all,
> >
> > I'm playing with an idea for some internal components to be able to
> > "tag" the key for an entry to be stored into Infinispan in a very
> > specific segment of the CH.
> >
> > Conceptually the plan is easy to understand by looking at this patch:
> >
> > https://github.com/Sanne/infinispan/commit/45a3d9e62318d5f5f950a60b5bb174...
> >
> > Hacking the change into ReplicatedConsistentHash is quite barbaric,
> > please bear with me as I couldn't figure a better way to be able to
> > experiment with this. I'll probably want to extend this class, but
> > then I'm not sure how to plug it in?
> >
> > What would you all think of such a "tagging" mechanism?
> >
> > # Why I didn't use the KeyAffinityService
> > - I need to use my own keys, not the meaningless stuff produced by the service
> > - the extensive usage of Random in there doesn't seem suited for a
> > performance critical path
> >
> > # Why I didn't use the Grouping API
> > - I need to pick the specific storage segment, not just co-locate with
> > a different key
> >
> > The general goal is to make it possible to "tag" all entries of an
> > index, and have an independent index for each segment of the CH. So
> > the resulting effect would be, that when a primary owner for any key K
> > is making an update, and this triggers an index update, that update is
> > A) going to happen on the same node -> no need to forwarding to a
> > "master indexing node"
> > B) each such writes on the index happen on the same node which is
> > primary owner for all the written entries of the index.
> >
> > There are two additional nice consequences:
> > - there would be no need to perform a reliable "master election":
> > ownership singleton is already guaranteed by Infinispan's essential
> > logic, so it would reuse that
> > - the propagation of writes on the index from the primary owner
> > (which is the local node by definition) to backup owners could use
> > REPL_ASYNC for most practical use cases.
> >
> > So net result is that the overhead for indexing is reduced to 0 (ZERO)
> > blocking RPCs if the async repl is acceptable, or to only one blocking
> > roundtrip if very strict consistency is required.
> >
> > Thanks,
> > Sanne
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
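
Why forging a hash result is fragile can be shown with a small sketch. The mapping used here is an assumption made for illustration (the non-negative `int` hash range cut into `numSegments` equal slices), and `ForgedHashSketch`, `segmentSize`, `segmentOf` and `forgeHash` are hypothetical names. Under that assumption, a hash value forged for one segment count lands in a different segment as soon as the cache is configured with another count, which the Hash itself has no way to know.

```java
/** Illustrative only: a forged hash is tied to one particular segment count. */
final class ForgedHashSketch {

    /** Assumed slice width: ceil(2^31 / numSegments) hash values per segment. */
    static long segmentSize(int numSegments) {
        if (numSegments <= 0) {
            throw new IllegalArgumentException("numSegments must be positive");
        }
        return ((1L << 31) + numSegments - 1) / numSegments;
    }

    /** Segment a hash code lands in, under the assumed mapping. */
    static int segmentOf(int hash, int numSegments) {
        return (int) ((hash & Integer.MAX_VALUE) / segmentSize(numSegments));
    }

    /** Forges a hash for targetSegment; only valid for assumedNumSegments. */
    static int forgeHash(int targetSegment, int assumedNumSegments) {
        if (targetSegment < 0 || targetSegment >= assumedNumSegments) {
            throw new IllegalArgumentException("segment out of range: " + targetSegment);
        }
        long start = targetSegment * segmentSize(assumedNumSegments);
        if (start > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("segment start does not fit the hash range");
        }
        return (int) start; // first hash value of the target slice
    }

    public static void main(String[] args) {
        int forged = forgeHash(42, 60);
        System.out.println("with  60 segments -> " + segmentOf(forged, 60));
        System.out.println("with 256 segments -> " + segmentOf(forged, 256));
    }
}
```

A key that carries its segment id directly avoids this coupling, since the id stays meaningful whatever mapping the consistent hash applies to ordinary keys.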


