Denormalizing hashes

Clean ThreadLocals

The monthly failing tests...

Radim Vansa

Tuesday, 10 December 2013 Tue, 10 Dec '13

12:17 p.m.

Hi Galder, as I am trying to debug some problem in C++ client, I was looking into the server code. And I am not sure whether I understand the code correctly, but it seems to me that the server denormalizes the consistent hash for each client anew (after each topology change or client joining). Is this true? Looking into trace logs, I can see stuff like 18:15:17,339 TRACE [org.infinispan.server.hotrod.Encoders$Encoder12$] (HotRodServerWorker-12) Writing hash id 639767 for 192.168.11.101:11222 From denormalizeSegmentHashIds() method I see that this means that we have executed the hash function 639768 times just to notify one client. Is my understanding correct? Also, there is nothing like the concept of primary owner, is this right? I thought that every first request in HotRod will go to primary owner, so that the PUT does not have to do the first hop and is executed directly on the primary. But it seems to me that it goes to any of the owners (practically random one, as you are only looking for the numOwner ids in leeway = on the beginning of the range - then, 99.98% or more requests should go to the server with last position in the leeway). This looks pretty suboptimal for writes, isn't it? Cheers Radim PS: for every line of code you write in Scala, God kills a kitten -- Radim Vansa <rvansa(a)redhat.com> JBoss DataGrid QA

Show replies by date

Dan Berindei

Wednesday, 11 December Wed, 11 Dec

2:18 a.m.

...

Yes, this happens every time a client joins and/or every time the cache topology changes. We could easily cache the result of denormalizeSegmentHashIds, as it only depends on the number of segments. It's just that I wasn't expecting it to take so many iterations.

...

Also, there is nothing like the concept of primary owner, is this right?

The client CH doesn't have a concept of backup owners. But for each (hash id, server) pair that gets sent to the client, it means all the hash codes between the previous hash id and this hash id have this server as the primary owner. The server in the next (hash id, server) pair is the first backup, and so on. For each segment, the server generates numOwners (hash id, server) pairs. That means, for most of the hash codes in the segment, the list of owners on the client will be the same as the list of owners on the server. But for 0.0002 (leewayFraction) of the hash codes, the client primary owner will be indeed one of the server backup owners.

...

I thought that every first request in HotRod will go to primary owner, so that the PUT does not have to do the first hop and is executed directly on the primary. But it seems to me that it goes to any of the owners (practically random one, as you are only looking for the numOwner ids in leeway = on the beginning of the range - then, 99.98% or more requests should go to the server with last position in the leeway). This looks pretty suboptimal for writes, isn't it?

I'm not sure what you mean here, but I'm pretty sure the request goes to the correct server because we have a test for it: ConsistentHashV1IntegrationTest Cheers Dan

...

Cheers Radim PS: for every line of code you write in Scala, God kills a kitten -- Radim Vansa <rvansa(a)redhat.com> JBoss DataGrid QA _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev

Radim Vansa

2:38 a.m.

Hi Dan I am not speaking about changing something for the C++ client, I understand that the client code cannot be changed in order to keep the backward compatibility. The current hash-wheel approach is working well, but there are few flaws that could be fixed keeping the client code untouched. Please, correct me if I am wrong. 1) The denormalization is executed for every client for every topology change/client join. I don't have any numbers, but calling the hashing algorithm million times per every such occasion sounds as wasting computing power. -> cache the denormalized stuff on server 2) The server is sending numOwners hashIds per segment, one for each owner. What's the reason for that? I think that only primary owners should be inserted there. This would: a) target all PUT requests to primary owner, reducing PUT latency and lowering the general load in cluster b) reduce the routing information And yes, ISPN-3530 and ISPN-3701 are pretty serious, but IMO rather orthogonal to the segment vs. hash wheel approach and its details. Radim On 12/11/2013 09:18 AM, Dan Berindei wrote:

...

Hi Radim Actually, it's me that wrote the denormalization code :) It was meant as a stop-gap measure before we upgraded the HotRod protocol to support the segment-based consistent hash, but the denormalization worked well enough (or so we thought) that we didn't get to changing the protocol yet. That's not a big change in itself, but we also wanted to make the consistent hash per-cache on the client (it's now per-cache manager), and that's a bit more complicated to do. And it's not like it would have been a good idea to change this before starting the C++ client, the client would still have to support the current style of consistent hash. On Tue, Dec 10, 2013 at 8:17 PM, Radim Vansa <rvansa(a)redhat.com <mailto:rvansa@redhat.com>> wrote: Hi Galder, as I am trying to debug some problem in C++ client, I was looking into the server code. And I am not sure whether I understand the code correctly, but it seems to me that the server denormalizes the consistent hash for each client anew (after each topology change or client joining). Is this true? Looking into trace logs, I can see stuff like 18:15:17,339 TRACE [org.infinispan.server.hotrod.Encoders$Encoder12$] (HotRodServerWorker-12) Writing hash id 639767 for 192.168.11.101:11222 <http://192.168.11.101:11222> From denormalizeSegmentHashIds() method I see that this means that we have executed the hash function 639768 times just to notify one client. Is my understanding correct? Yes, this happens every time a client joins and/or every time the cache topology changes. We could easily cache the result of denormalizeSegmentHashIds, as it only depends on the number of segments. It's just that I wasn't expecting it to take so many iterations. Also, there is nothing like the concept of primary owner, is this right? The client CH doesn't have a concept of backup owners. But for each (hash id, server) pair that gets sent to the client, it means all the hash codes between the previous hash id and this hash id have this server as the primary owner. The server in the next (hash id, server) pair is the first backup, and so on. For each segment, the server generates numOwners (hash id, server) pairs. That means, for most of the hash codes in the segment, the list of owners on the client will be the same as the list of owners on the server. But for 0.0002 (leewayFraction) of the hash codes, the client primary owner will be indeed one of the server backup owners. I thought that every first request in HotRod will go to primary owner, so that the PUT does not have to do the first hop and is executed directly on the primary. But it seems to me that it goes to any of the owners (practically random one, as you are only looking for the numOwner ids in leeway = on the beginning of the range - then, 99.98% or more requests should go to the server with last position in the leeway). This looks pretty suboptimal for writes, isn't it? I'm not sure what you mean here, but I'm pretty sure the request goes to the correct server because we have a test for it: ConsistentHashV1IntegrationTest Cheers Dan Cheers Radim PS: for every line of code you write in Scala, God kills a kitten -- Radim Vansa <rvansa(a)redhat.com <mailto:rvansa@redhat.com>> JBoss DataGrid QA _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org <mailto:infinispan-dev@lists.jboss.org> https://lists.jboss.org/mailman/listinfo/infinispan-dev _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev

-- Radim Vansa <rvansa(a)redhat.com> JBoss DataGrid QA

Dan Berindei

3:01 a.m.

On Wed, Dec 11, 2013 at 10:38 AM, Radim Vansa <rvansa(a)redhat.com> wrote:

...

Hi Dan I am not speaking about changing something for the C++ client, I understand that the client code cannot be changed in order to keep the backward compatibility.

Sure, I was just trying to give some background information on what we discussed and why we still have the wheel-based CH in the client.

...

The current hash-wheel approach is working well, but there are few flaws that could be fixed keeping the client code untouched. Please, correct me if I am wrong. 1) The denormalization is executed for every client for every topology change/client join. I don't have any numbers, but calling the hashing algorithm million times per every such occasion sounds as wasting computing power. -> cache the denormalized stuff on server

+1, like I said it would be easy to do but it never came up as a problem before.

...

2) The server is sending numOwners hashIds per segment, one for each owner. What's the reason for that? I think that only primary owners should be inserted there. This would:

The main reason is to support clients from Infinispan 5.1, which pick a random owner instead of always choosing the primary ( https://issues.jboss.org/browse/ISPN-2655).

...

a) target all PUT requests to primary owner, reducing PUT latency and lowering the general load in cluster

Nope, it wouldn't. The same fraction of requests would go to the primary owner as before, because we won't find the exact "denormalized" hash id that maps to the segment border when normalized. b) reduce the routing information

...

For 7.0, I guess we could say that 5.1 clients are no longer supported and we could switch to sending only the primary owners to the clients. But I'm not sure whether the loss of backwards compatibility is worth a couple hundred bytes sent once for every client. > And yes, ISPN-3530 and ISPN-3701 are pretty serious, but IMO rather > orthogonal to the segment vs. hash wheel approach and its details.

...

Agree. Could you create issues in JIRA for both your proposals? > Radim

...

> On 12/11/2013 09:18 AM, Dan Berindei wrote:

...

> Hi Radim

...

> Actually, it's me that wrote the denormalization code :)

...

> It was meant as a stop-gap measure before we upgraded the HotRod protocol > to support the segment-based consistent hash, but the denormalization > worked well enough (or so we thought) that we didn't get to changing the > protocol yet.

...

> That's not a big change in itself, but we also wanted to make the > consistent hash per-cache on the client (it's now per-cache manager), and > that's a bit more complicated to do. And it's not like it would have been a > good idea to change this before starting the C++ client, the client would > still have to support the current style of consistent hash.

...

> On Tue, Dec 10, 2013 at 8:17 PM, Radim Vansa <rvansa(a)redhat.com> wrote:

...

>> Hi Galder, >

...

>> as I am trying to debug some problem in C++ client, I was looking into >> the server code. And I am not sure whether I understand the code >> correctly, but it seems to me that the server denormalizes the >> consistent hash for each client anew (after each topology change or >> client joining). Is this true? Looking into trace logs, I can see stuff >> like >

...

>> 18:15:17,339 TRACE [org.infinispan.server.hotrod.Encoders$Encoder12$] >> (HotRodServerWorker-12) Writing hash id 639767 for 192.168.11.101:11222 >

...

>> From denormalizeSegmentHashIds() method I see that this means that we >> have executed the hash function 639768 times just to notify one client. >> Is my understanding correct? >

...

> Yes, this happens every time a client joins and/or every time the cache > topology changes.

...

> We could easily cache the result of denormalizeSegmentHashIds, as it only > depends on the number of segments. It's just that I wasn't expecting it to > take so many iterations.

...

>> Also, there is nothing like the concept of primary owner, is this right? >

...

> The client CH doesn't have a concept of backup owners. But for each > (hash id, server) pair that gets sent to the client, it means all the hash > codes between the previous hash id and this hash id have this server as the > primary owner. The server in the next (hash id, server) pair is the first > backup, and so on.

...

> For each segment, the server generates numOwners (hash id, server) > pairs. That means, for most of the hash codes in the segment, the list of > owners on the client will be the same as the list of owners on the server. > But for 0.0002 (leewayFraction) of the hash codes, the client primary owner > will be indeed one of the server backup owners.

...

>> I thought that every first request in HotRod will go to primary owner, >> so that the PUT does not have to do the first hop and is executed >> directly on the primary. But it seems to me that it goes to any of the >> owners (practically random one, as you are only looking for the numOwner >> ids in leeway = on the beginning of the range - then, 99.98% or more >> requests should go to the server with last position in the leeway). This >> looks pretty suboptimal for writes, isn't it? >

...

> I'm not sure what you mean here, but I'm pretty sure the request goes to > the correct server because we have a test for it: > ConsistentHashV1IntegrationTest

> Cheers > Dan

>> Cheers >

>> Radim >

>> PS: for every line of code you write in Scala, God kills a kitten >

...

>> -- >> Radim Vansa <rvansa(a)redhat.com

...

>> JBoss DataGrid QA >

...

>> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev(a)lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >

...

> _______________________________________________ > infinispan-dev mailing listinfinispan-dev@lists.jboss.orghttps://lists.jboss.org/mailman/listinfo/infinispan-dev

...

> -- > Radim Vansa <rvansa(a)redhat.com> <rvansa(a)redhat.com

...

> JBoss DataGrid QA

...

> _______________________________________________ > infinispan-dev mailing list > infinispan-dev(a)lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev

...

Radim Vansa

3:37 a.m.

...

On Wed, Dec 11, 2013 at 10:38 AM, Radim Vansa <rvansa(a)redhat.com <mailto:rvansa@redhat.com>> wrote: Hi Dan I am not speaking about changing something for the C++ client, I understand that the client code cannot be changed in order to keep the backward compatibility. Sure, I was just trying to give some background information on what we discussed and why we still have the wheel-based CH in the client. The current hash-wheel approach is working well, but there are few flaws that could be fixed keeping the client code untouched. Please, correct me if I am wrong. 1) The denormalization is executed for every client for every topology change/client join. I don't have any numbers, but calling the hashing algorithm million times per every such occasion sounds as wasting computing power. -> cache the denormalized stuff on server +1, like I said it would be easy to do but it never came up as a problem before.

Fair enough.

...

2) The server is sending numOwners hashIds per segment, one for each owner. What's the reason for that? I think that only primary owners should be inserted there. This would: The main reason is to support clients from Infinispan 5.1, which pick a random owner instead of always choosing the primary (https://issues.jboss.org/browse/ISPN-2655).

You can always report numKeyOwners=1 and the old code should handle that.

...

a) target all PUT requests to primary owner, reducing PUT latency and lowering the general load in cluster Nope, it wouldn't. The same fraction of requests would go to the primary owner as before, because we won't find the exact "denormalized" hash id that maps to the segment border when normalized.

Oh, only now have I noticed that the hashIds are sorted by the normalized ID, therefore, the primary owner always picks the first position (and most requests will hit the first). Mea culpa. Still, it makes no sense to include the backup owners into the routing table, as the probability that a read will hit them is negligable.

...

b) reduce the routing information For 7.0, I guess we could say that 5.1 clients are no longer supported and we could switch to sending only the primary owners to the clients. But I'm not sure whether the loss of backwards compatibility is worth a couple hundred bytes sent once for every client. And yes, ISPN-3530 and ISPN-3701 are pretty serious, but IMO rather orthogonal to the segment vs. hash wheel approach and its details. Agree. Could you create issues in JIRA for both your proposals?

OK. By the way, shouldn't we tag the features that should be included for hotrod protocol v1.4 with some tag, such as hotrod14? But as I said, if we fake the numOwners to be always 1, it should be backwards compatible even now. Nevertheless, as you in fact do route most requests to the primary owner, it won't provide much performance gain and it's not as hot as I first thought. Radim

Dan Berindei

4:37 a.m.

On Wed, Dec 11, 2013 at 11:37 AM, Radim Vansa <rvansa(a)redhat.com> wrote:

...

On Wed, Dec 11, 2013 at 10:38 AM, Radim Vansa <rvansa(a)redhat.com> wrote: > Hi Dan > > I am not speaking about changing something for the C++ client, I > understand that the client code cannot be changed in order to keep the > backward compatibility. > Sure, I was just trying to give some background information on what we discussed and why we still have the wheel-based CH in the client. > > The current hash-wheel approach is working well, but there are few flaws > that could be fixed keeping the client code untouched. Please, correct me > if I am wrong. > > 1) The denormalization is executed for every client for every topology > change/client join. I don't have any numbers, but calling the hashing > algorithm million times per every such occasion sounds as wasting computing > power. -> cache the denormalized stuff on server > +1, like I said it would be easy to do but it never came up as a problem before. Fair enough. > > 2) The server is sending numOwners hashIds per segment, one for each > owner. What's the reason for that? I think that only primary owners should > be inserted there. This would: > The main reason is to support clients from Infinispan 5.1, which pick a random owner instead of always choosing the primary ( https://issues.jboss.org/browse/ISPN-2655). You can always report numKeyOwners=1 and the old code should handle that.

Yeah, it looks like it would work. I was thinking that when retrying a failed operation, the pre-5.2 client would still try one of the key owners, but I see now that it always chose a random server when retrying. Also, the client doesn't expose numKeyOwners to the user, like I had assumed.

...

> a) target all PUT requests to primary owner, reducing PUT latency and > lowering the general load in cluster > Nope, it wouldn't. The same fraction of requests would go to the primary owner as before, because we won't find the exact "denormalized" hash id that maps to the segment border when normalized. Oh, only now have I noticed that the hashIds are sorted by the normalized ID, therefore, the primary owner always picks the first position (and most requests will hit the first). Mea culpa. Still, it makes no sense to include the backup owners into the routing table, as the probability that a read will hit them is negligable. b) reduce the routing information > For 7.0, I guess we could say that 5.1 clients are no longer supported and we could switch to sending only the primary owners to the clients. But I'm not sure whether the loss of backwards compatibility is worth a couple hundred bytes sent once for every client. > And yes, ISPN-3530 and ISPN-3701 are pretty serious, but IMO rather > orthogonal to the segment vs. hash wheel approach and its details. > > Agree. Could you create issues in JIRA for both your proposals? OK. By the way, shouldn't we tag the features that should be included for hotrod protocol v1.4 with some tag, such as hotrod14? But as I said, if we fake the numOwners to be always 1, it should be backwards compatible even now. Nevertheless, as you in fact do route most requests to the primary owner, it won't provide much performance gain and it's not as hot as I first thought. Radim _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev

4595

days inactive

4596

days old

infinispan-dev@lists.jboss.org

Manage subscription

5 comments

2 participants

tags (0)

participants (2)

Dan Berindei
Radim Vansa

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Denormalizing hashes