[
https://issues.jboss.org/browse/ISPN-1000?page=com.atlassian.jira.plugin....
]
Manik Surtani edited comment on ISPN-1000 at 4/20/11 9:54 AM:
--------------------------------------------------------------
Some design ideas:
{code}
# PUSH based rehashing
The purpose behind this is to provide for a better rehashing scheme that is:
* more robust
* has fewer RPC messages
* performs better
The current rehashing scheme is RPC-heavy and attempts to be non-blocking, resulting in
brittleness.
## Overview
* Every node registers a view change listener
* On view change:
  * Determine if current node is affected and kick off a RehashTask
  * If related to a JOIN, RehashTask blocks on JOIN message to grab TopologyInfo (if config
    is topology-aware).
  * Enter READ_ONLY mode
  * If affected, loop through keyset in data container and non-shared cache store. For
    each key:
    * if LAST_OWNER on OLD_CH and key has a new owner in NEW_CH, add to push set
    * if LAST_OWNER is no longer an owner based on NEW_CH, add key to invalidation set
      (deals with JOINs and LEAVEs in the same way now!)
  * Push state to new owners
  * Apply NEW_CH to local node, exit READ_ONLY mode.
  * If successful, invalidate keys in invalidation set
### Determining if node is affected by ViewChange:
* If JOIN:
  * If idx(JOINER) is within + or - numOwners of idx(self)
* Or if LEAVE:
  * If idx(LEAVER) is within + or - (numOwners + 1) of idx(self)
_TODO_: How would this affect vnodes? LEAVER or JOINER will have multiple positions to
check, and self will have multiple positions too. Can this be encapsulated in the
ConsistentHash impl?
### READ_ONLY mode:
I think attempting a non-blocking rehash at this stage is way too complex. We have it
with the current scheme and we can see how brittle it is. IMO we should go for a blocking
model for now and make sure rehashing is stable and robust, and consider non-blocking
later.
READ_ONLY mode is achieved by using the TransactionLogger. But we won't actually use
this to *log* any transactions for now. Instead we can use it to block new transactions
and non-transactional writes since it exists in all the appropriate places.
### Parallel joiners and leavers
How will this work with parallel joiners and leavers?
{code}
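A rough sketch of the two checks described above (is this node affected, and which keys go into the push and invalidation sets), in case it helps the discussion. Everything here is an assumption made for illustration: the class and method names are hypothetical, node addresses and keys are plain strings, each consistent hash is modelled as a function from key to an ordered owner list, LAST_OWNER is read as "the local node is the last entry in the old owner list", and index distance is assumed to wrap around the hash wheel. It is not the actual Infinispan API.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical sketch only; none of these names exist in Infinispan.
final class PushRehashSketch {

    /** Outcome of scanning the local keys during a rehash. */
    static final class Plan {
        // key -> owners in NEW_CH that did not own the key in OLD_CH
        final Map<String, Set<String>> pushSet = new TreeMap<>();
        // keys the local node no longer owns under NEW_CH
        final Set<String> invalidationSet = new TreeSet<>();
    }

    /**
     * Affected check: JOIN uses a window of numOwners, LEAVE uses numOwners + 1.
     * Assumes both indexes are in [0, ringSize) and that distance wraps around.
     */
    static boolean isAffected(int selfIdx, int changedIdx, int ringSize,
                              int numOwners, boolean leave) {
        int window = leave ? numOwners + 1 : numOwners;
        int d = Math.abs(selfIdx - changedIdx);
        int ringDistance = Math.min(d, ringSize - d);
        return ringDistance <= window;
    }

    /** Classifies each local key into the push set and/or the invalidation set. */
    static Plan classify(String self, Collection<String> localKeys,
                         Function<String, List<String>> oldCh,
                         Function<String, List<String>> newCh) {
        Plan plan = new Plan();
        for (String key : localKeys) {
            List<String> oldOwners = oldCh.apply(key);
            List<String> newOwners = newCh.apply(key);

            // Assumption: LAST_OWNER means self is the last owner listed by OLD_CH.
            boolean lastOwner = !oldOwners.isEmpty()
                    && oldOwners.get(oldOwners.size() - 1).equals(self);

            // Owners that appear only in NEW_CH need the state pushed to them.
            Set<String> added = new TreeSet<>(newOwners);
            added.removeAll(oldOwners);
            if (lastOwner && !added.isEmpty()) {
                plan.pushSet.put(key, added);
            }

            // Self dropped out of the owner list: invalidate after a successful push.
            if (!newOwners.contains(self)) {
                plan.invalidationSet.add(key);
            }
        }
        return plan;
    }
}
```

For example, if node B holds k1 with old owners [A, B] and new owners [A, D], B is the last old owner and D is new, so k1 lands in the push set (target D) and, since B is no longer an owner, also in the invalidation set. The vnodes TODO is not covered: with multiple positions per node, isAffected would have to be evaluated per position pair.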
was (Author: manik):
Some design ideas:
{code}
# PUSH based rehashing
The purpose behind this is to provide for a better rehashing scheme that is:
* more robust
* has fewer RPC messages
* performs better
The current rehashing scheme is RPC-heavy and attempts to be non-blocking, resulting in
brittleness.
## Overview
* Every node registers a view change listener
* On JOIN, broadcast JOIN message containing optional topology info (if config is topology
aware).
* On view change:
  * Determine if current node is affected and kick off a RehashTask
  * If related to a JOIN, RehashTask blocks on JOIN message to grab TopologyInfo (if config
    is topology-aware).
  * Enter READ_ONLY mode
  * If affected, loop through keyset in data container and non-shared cache store. For
    each key:
    * if LAST_OWNER on OLD_CH and key has a new owner in NEW_CH, add to push set
    * if LAST_OWNER is no longer an owner based on NEW_CH, add key to invalidation set
      (deals with JOINs and LEAVEs in the same way now!)
  * Push state to new owners
  * Apply NEW_CH to local node, exit READ_ONLY mode.
  * If successful, invalidate keys in invalidation set
### Determining if node is affected by ViewChange:
* If JOIN:
  * If idx(JOINER) is within + or - numOwners of idx(self)
* Or if LEAVE:
  * If idx(LEAVER) is within + or - (numOwners + 1) of idx(self)
_TODO_: How would this affect vnodes? LEAVER or JOINER will have multiple positions to
check, and self will have multiple positions too. Can this be encapsulated in the
ConsistentHash impl?
### READ_ONLY mode:
I think attempting a non-blocking rehash at this stage is way too complex. We have it
with the current scheme and we can see how brittle it is. IMO we should go for a blocking
model for now and make sure rehashing is stable and robust, and consider non-blocking
later.
READ_ONLY mode is achieved by using the TransactionLogger. But we won't actually use
this to *log* any transactions for now. Instead we can use it to block new transactions
and non-transactional writes since it exists in all the appropriate places.
### Parallel joiners and leavers
How will this work with parallel joiners and leavers?
{code}
PUSH based rehashing
--------------------
Key: ISPN-1000
URL: https://issues.jboss.org/browse/ISPN-1000
Project: Infinispan
Issue Type: Feature Request
Components: Distributed Cache
Affects Versions: 4.2.0.Final
Reporter: Manik Surtani
Assignee: Manik Surtani
Labels: rehash
Fix For: 5.0.0.CR1, 5.0.0.FINAL
Current rehash schemes are based on a PULL of state. Joiners (and new owners after a
leave) pull state from their neighbours. This JIRA is to reimplement this as a PUSH based
scheme, where all nodes detect new joiners (or leavers) and analyse their internal state
and determine what needs to be pushed where.
The scheme should be more robust, involving far fewer RPCs and coordination, and would
work better for merge views detected when partitions heal.
Based on Bela's prototype on
https://github.com/belaban/infinispan/tree/rebalance-changes