Hi Manik, see inline:
On Wed, Jan 5, 2011 at 7:32 PM, Manik Surtani <manik(a)jboss.org> wrote:
On 4 Jan 2011, at 16:12, Eduardo Martins wrote:
4) AtomicMap doubt
I read in the Infinispan blog that AtomicMap provides colocation of
all entries. Is that idea outdated? If not, we may need a way to turn
that off :) For instance, wouldn't that mean the Tree API does not
work well with distribution mode? I apologize in advance if I'm
missing something, but if AtomicMap defines colocation, AtomicMap is
good for the node's data map, but not for the node's child FQNs.
Shouldn't each child fqn be freely distributed, being colocated
instead with the related node cache entry and data (atomic)map? Our
impl is a kind of "hybrid" of the Tree API: it allows cache entry
references (similar to children) but no data map, and storing those
references through AtomicMap in the same way as the Tree API worries me.
Please clarify.
Each FQN has an AtomicMap underneath, so you won't find k,v pairs
belonging to a particular Fqn on different nodes.
We make no guarantees wrt child fqn nodes. So, just cos FQN B is a child of
FQN A, it does not mean that the atomic map of B will be on the same node as
the atomic map of A.
Before I proceed with my reasoning, please clarify: is the colocation
within AtomicMap real? If I store data there, will all of it be
colocated?
Yup, all *data* stored in the AtomicMap will be located in the same node.
It's treated as a single entity.
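
A minimal sketch of the "single entity" point above. This is a toy model and not the Infinispan API: it assumes the owner of an entry is picked from the key's hashCode() (the real distribution logic is more involved), and it models the AtomicMap as one map value stored under one key. All class and method names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, NOT the Infinispan API: each simulated cluster node is a plain map,
// and the owner of a key is chosen from key.hashCode() (a simplifying assumption).
class AtomicMapColocationSketch {
    static final int NUM_NODES = 3;

    // node index -> (fqn -> attribute map stored as a single value)
    static final Map<Integer, Map<String, Map<String, String>>> cluster = new HashMap<>();

    // Picks the simulated owner node for a key.
    static int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), NUM_NODES);
    }

    // Stores one attribute inside the map value held under the fqn key.
    static void putAttribute(String fqn, String k, String v) {
        cluster.computeIfAbsent(ownerOf(fqn), n -> new HashMap<>())
               .computeIfAbsent(fqn, f -> new HashMap<>())
               .put(k, v);
    }

    // Counts how many simulated nodes hold an entry for the given fqn.
    static long nodesHolding(String fqn) {
        return cluster.values().stream().filter(m -> m.containsKey(fqn)).count();
    }

    public static void main(String[] args) {
        putAttribute("/a/b/c", "name", "x");
        putAttribute("/a/b/c", "state", "active");
        putAttribute("/a/b/c", "ttl", "30");
        System.out.println("nodes holding /a/b/c: " + nodesHolding("/a/b/c"));
    }
}
```

Because the attributes live inside one value under one key, they can only ever land on that key's owner in this model, however many attributes are added.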
Ok, then let's think about the Tree API: typically/optimally you add, get
and remove a specific node in the same cluster node/zone; iterating
through a node's children is rare and usually without much perf
constraints. With the current impl the node's child map entries are
colocated, since each node does that through an AtomicMap with the child's
last FQN element. IMHO this is not great for performance:
1. Consider parent node P in node N1
2. In node N2, add a P child, this goes to N1. Adding P Child also
creates the child cache entries, all 3 seem to be colocated through
hashCode(), correct me if I'm wrong. Let's assume all hash ideally to
local node N2.
3. In node N2, get P child, this may go to N1 if we use P, but skips it if
we use Cache.get(...)
4. In node N2, remove P child, this needs to go to N1
Do you see the issue? If P is a popular parent node (which
happens a lot with children of the root), N1 will be hammered by other nodes!
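
The scenario in steps 1-4 can be sketched as a small counter. This is only an illustration of the concern as described, under the assumption that every add or remove of a child updates the parent's structure map and that this map lives on a single owner; it does not measure real Infinispan behavior, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the concern: counts structure updates routed to a parent's owner.
class HotParentSketch {
    // parent fqn -> child names, standing in for the parent's structure map
    final Map<String, Set<String>> structure = new HashMap<>();
    // parent fqn -> number of structure updates sent to that parent's owner
    final Map<String, Integer> updatesToParentOwner = new HashMap<>();

    // Adding a child touches the parent's structure map (assumption of this model).
    void addChild(String parent, String child) {
        structure.computeIfAbsent(parent, p -> new HashSet<>()).add(child);
        updatesToParentOwner.merge(parent, 1, Integer::sum);
    }

    // Removing a child touches the parent's structure map as well.
    void removeChild(String parent, String child) {
        Set<String> children = structure.get(parent);
        if (children != null) {
            children.remove(child);
        }
        updatesToParentOwner.merge(parent, 1, Integer::sum);
    }

    int updatesFor(String parent) {
        return updatesToParentOwner.getOrDefault(parent, 0);
    }

    public static void main(String[] args) {
        HotParentSketch sim = new HotParentSketch();
        for (int i = 0; i < 1000; i++) {
            sim.addChild("/P", "child-" + i);
            sim.removeChild("/P", "child-" + i);
        }
        System.out.println("structure updates routed to owner of /P: " + sim.updatesFor("/P"));
    }
}
```

In this model the update count for a parent grows with every child added or removed, regardless of which node issued the call, which is the load pattern the steps above describe for N1.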
What you said above is not exactly true. A TreeNode contains 2 AtomicMaps -
one for data and one for structure. The DataMap contains the K/V attribute
pairs on the node. The StructureMap contains information about children.
Firstly, these 2 AtomicMaps aren't necessarily colocated. This is OK,
I just said both are colocated because both maps are stored in
Infinispan using an entry key object that defers hashCode() to the fqn's
hashCode() only; it does not also use the NodeKey type. I had the
idea that would result in colocation, is that incorrect?
since you rarely update structure (adding/removing children) and data
(attributes on a node) in the same transaction. Secondly, there is nothing
that says parents and children are colocated since they use different keys.
So,
/a/b/c <-- could be on N1
/a/b/c/d <-- could be on N2
so transactions doing stuff on /a/b/c and /a/b/c/d won't be affecting the
same node - unless it is structure that is changing. So as per your example
above, in step 2, all 3 don't necessarily go to the same node. Also if you
do something like TreeCache.getNode() with an FQN, you don't walk through
parents.
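
The exchange above turns on how keys map to owners, so here is a small sketch of a key whose hashCode() depends on the FQN only. It assumes the owner is chosen as hashCode() modulo the cluster size, which is a simplification and exactly the point under discussion; whether real Infinispan distribution behaves this way is not settled by this sketch. The Key class and ownerOf helper are hypothetical.

```java
// Toy model: a node key whose equals() uses fqn and type, but whose hashCode()
// uses the fqn only, as described in the thread.
class NodeKeySketch {
    enum Type { DATA, STRUCTURE }

    static final class Key {
        final String fqn;
        final Type type;

        Key(String fqn, Type type) {
            this.fqn = fqn;
            this.type = type;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return fqn.equals(other.fqn) && type == other.type;
        }

        // Only the fqn contributes, so DATA and STRUCTURE keys collide on hash.
        @Override
        public int hashCode() {
            return fqn.hashCode();
        }
    }

    // Assumed owner selection: hashCode modulo cluster size (a simplification).
    static int ownerOf(Object key, int numNodes) {
        return Math.floorMod(key.hashCode(), numNodes);
    }

    public static void main(String[] args) {
        int numNodes = 4;
        Key data = new Key("/a/b/c", Type.DATA);
        Key structure = new Key("/a/b/c", Type.STRUCTURE);
        Key child = new Key("/a/b/c/d", Type.DATA);
        System.out.println("data owner:      " + ownerOf(data, numNodes));
        System.out.println("structure owner: " + ownerOf(structure, numNodes));
        System.out.println("child owner:     " + ownerOf(child, numNodes));
    }
}
```

Under that assumption the data and structure keys of one FQN always share an owner, while a child FQN hashes independently and may or may not land on the same node.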
That's what I meant in 3: using Cache.get(...) would not need the parent.
I think you misunderstood the real issue: the node structure is a map
where all entries are colocated. As I said, this means that for popular
parent nodes which have lots of children, there will be a lot of traffic
to add/remove children. This would not happen if the parent's structure
entry related to the child was colocated with the child entries
themselves.
-- Eduardo
..............................................
http://emmartins.blogspot.com
http://redhat.com/solutions/telco