I have created a wiki page with some early thoughts around this.
http://community.jboss.org/wiki/DesigningServerHinting
Defining such hints in XML is easy enough, and so is sharing them cluster-wide,
since they can be added to the join handshake process in DIST.
This info can be added to the Address of each node and used when calculating the hash
that places the Address on the hash wheel. However, every technique I have seen so far
would only increase spread and *reduce* the chances of colocated nodes being adjacent on
the hash wheel, not *guarantee* that they never are. The constraint is that such placing
needs to be done deterministically by any node in the grid, so the only inputs to such a
function can be an Address and the set of hints.
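As a rough illustration of that constraint, here is a minimal Java sketch that derives a wheel position purely from an address and its hints. It is not Infinispan code: the names HintedWheelSketch, HintedAddress, position() and hasColocatedNeighbours(), and the mixing via Objects.hash, are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch; names and the mixing function are assumptions, not Infinispan code.
class HintedWheelSketch {
    static final class HintedAddress {
        final String address, machineId, rackId, siteId;
        HintedAddress(String address, String machineId, String rackId, String siteId) {
            this.address = address;
            this.machineId = machineId;
            this.rackId = rackId;
            this.siteId = siteId;
        }
    }

    // The position depends only on the address and its hints, so every node
    // in the grid computes the same value without any coordination.
    static int position(HintedAddress a) {
        int h = Objects.hash(a.siteId, a.rackId, a.machineId, a.address);
        h ^= (h >>> 16);               // spread the high bits a little
        return h & Integer.MAX_VALUE;  // keep the position non-negative
    }

    // Wheel order is independent of the order in which members were seen;
    // equal positions are tie-broken by address to stay deterministic.
    static List<HintedAddress> wheel(Collection<HintedAddress> members) {
        List<HintedAddress> sorted = new ArrayList<>(members);
        sorted.sort((x, y) -> {
            int c = Integer.compare(position(x), position(y));
            return c != 0 ? c : x.address.compareTo(y.address);
        });
        return sorted;
    }

    // True if any two neighbours on the wheel (including last -> first) share a machine.
    static boolean hasColocatedNeighbours(List<HintedAddress> wheel) {
        for (int i = 0; i < wheel.size(); i++) {
            HintedAddress a = wheel.get(i);
            HintedAddress b = wheel.get((i + 1) % wheel.size());
            if (a != b && a.machineId.equals(b.machineId)) return true;
        }
        return false;
    }
}
```

Nothing in position() stops two nodes on the same machine from landing next to each other; hasColocatedNeighbours() can only detect that after the fact, which is the "reduce but not guarantee" problem.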
I'm not sure how useful or acceptable this is though. Thoughts?
Cheers
Manik
On 22 Mar 2010, at 16:25, Manik Surtani wrote:
This relates to
https://jira.jboss.org/jira/browse/ISPN-180.
In JBoss Cache, we had a provision to allow for pluggable buddy selection algorithms. By
default, the buddy selection process would first try to pick a buddy in the same buddy
group, failing which any buddy *not* on the same physical machine, failing which any buddy
not in the same JVM, and finally any buddy at all. Further, being pluggable, people could
write their own buddy selection algorithms to pick buddies based on any additional
metrics, such as machine performance obtained by hooking into monitoring tools.
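The default fallback order described above could be sketched roughly as follows. This is not the JBoss Cache buddy replication API; the Node class, its fields and pickBuddy() are hypothetical names used only to show the order of preference.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch of the fallback order; not the JBoss Cache API.
class BuddySelectionSketch {
    static final class Node {
        final String name, buddyGroup, machine, jvm;
        Node(String name, String buddyGroup, String machine, String jvm) {
            this.name = name;
            this.buddyGroup = buddyGroup;
            this.machine = machine;
            this.jvm = jvm;
        }
    }

    // Try each preference in turn and return the first candidate that satisfies it.
    static Optional<Node> pickBuddy(Node self, List<Node> candidates) {
        List<Predicate<Node>> preferences = List.of(
            n -> self.buddyGroup != null && self.buddyGroup.equals(n.buddyGroup), // same buddy group
            n -> !n.machine.equals(self.machine),                                 // different physical machine
            n -> !n.jvm.equals(self.jvm),                                         // different JVM
            n -> true);                                                           // any buddy at all
        for (Predicate<Node> preference : preferences) {
            for (Node candidate : candidates) {
                if (candidate != self && preference.test(candidate)) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty(); // no other node is available
    }
}
```

A pluggable variant would simply let users supply their own list of preferences, for example one backed by a monitoring tool.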
In Infinispan we do not have an equivalent as yet. The consistent hash approach to
distribution takes a hash of each server's address and uses this to place the server
on a consistent hash wheel. Owners for keys are picked based on consecutive places on the
wheel. So there is every possibility that nodes on the same physical host or rack are
selected to back each other up, which is not optimal for data durability.
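To make the concern concrete, here is a minimal Java sketch of such a wheel. It is not Infinispan's actual consistent hash implementation; the class name HashWheelSketch, the position() function and the use of plain String addresses are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch, not Infinispan's actual consistent hash implementation.
class HashWheelSketch {
    // Wheel position -> node address, kept sorted by position.
    // Addresses that collide on a position overwrite each other in this sketch.
    private final TreeMap<Integer, String> wheel = new TreeMap<>();

    // Place a node on the wheel using only a hash of its address.
    void add(String address) {
        wheel.put(position(address), address);
    }

    // Illustrative hash: spread String.hashCode() and keep it non-negative.
    static int position(String s) {
        int h = s.hashCode();
        h ^= (h >>> 16);
        return h & Integer.MAX_VALUE;
    }

    // Walk clockwise from the key's position and take the next numOwners nodes,
    // wrapping around to the start of the wheel if needed.
    List<String> owners(String key, int numOwners) {
        List<String> result = new ArrayList<>();
        int wanted = Math.min(numOwners, wheel.size());
        for (String node : wheel.tailMap(position(key), true).values()) {
            if (result.size() == wanted) return result;
            result.add(node);
        }
        for (String node : wheel.values()) {
            if (result.size() == wanted) break;
            result.add(node);
        }
        return result;
    }
}
```

Note that owners() has no idea where a node physically lives, so nothing prevents two consecutive owners of a key from sitting on the same host or rack.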
One approach is for each node to provide additional hints as to where it is - hints
including "machine id", "rack id" and maybe even "site id".
The hash function that calculates an address's position on the hash wheel would take these
three metrics into account, so this should be robust and pretty efficient. The only drawback
with this approach is that for each address, this additional data needs to be globally
available, since consistent hashes (CHs) need to work globally and deterministically. This
information could be a part of a DIST JOIN request, which would work well.
What do people think? Any interesting alternate approaches to this problem?
Cheers
Manik
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev