On Mon, Jul 14, 2014 at 6:16 AM, Pierre Sutra <pierre.sutra@unine.ch> wrote:
> Hello,
>> It doesn't look like what you have currently is safe for rehashes
>> though, since the owners would change nodes. You would need to move
>> the listener between nodes in this case. Also you removed an edge
>> case where a listener might not be installed if a CH change occurs
>> right when sending to nodes (talked about later).
> Indeed, this modification is not safe in the presence of cluster changes.
> In fact, I missed that the current implementation was ensuring elasticity.
>> The problem was that there is an overlap when you have a node joining
>> while you are sending the initial requests, in which case it wouldn't
>> install the listener.
>>
>> Cluster -> Node A, B, C
>>
>> 1. User installs listener on Node C
>> 2. Node C is sending listeners to Nodes A + B
>> 3. Node D joins in the meantime and asks the coordinator for the
>> listener, but it isn't fully installed yet to be retrieved
>> 4. Node C finishes installing listeners on Nodes A + B
>>
>> In this case Node D would never have gotten the listener, so Node C
>> also checks whether anyone else has joined.
> I understood this was necessary for cache.addListener() atomicity, but I
> thought, erroneously, that elasticity was not implemented (I also needed a
> quick fix). In my view, and please correct me if I am wrong, the
> architecture you describe still has an issue because the coordinator can
> fail. It is in fact necessary to re-execute the installation code until a
> stable view is obtained (i.e., the same view twice in a row). Going back
> to your example, consider a step 5 where the coordinator, say A, fails
> while C is installing the listener on D, and some node E is joining. If D
> is the newly elected coordinator, E will never retrieve the listener. What
> do you think of this scenario?
Sorry, I didn't describe the entire implementation, I was just talking
about the successful path. The joiner asks each member in its view one at
a time until the request succeeds; it just happens that it always asks the
coordinator first.
>>
>> The difference is that Node D only sends 1 message to the coordinator to
>> ask for listeners instead of sending N messages to all nodes (which would
>> be required on every JOIN from any node). This should scale better in the
>> long run, especially since in most cases this shouldn't happen.
> In fact, D needs to retrieve the filters from the nodes that were holding
> the keys it is now in charge of replicating. At first glance this requires
> sending a JOIN message to all nodes. If you believe that this key-based
> (or key-range-based) listener functionality is a direction of interest, I
> can try amending my code to ensure elasticity.
I think it would probably be useful to have some sort of specific-key
implementation, as it would be more performant, as you have found out. The
changes when a rebalance occurs could be a bit difficult to implement
though, to get all the edge cases right, but if you want to take a stab at
it, I think it would be pretty cool.
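By a specific-key implementation I mean something along these lines (just a
sketch against a made-up filter class, not the real interface): the filter
keeps the set of keys it was registered for and drops everything else
before the listener is ever invoked.

import java.util.Set;

public class SpecificKeyFilter<K> {

   private final Set<K> keys;

   public SpecificKeyFilter(Set<K> keys) {
      this.keys = keys;
   }

   /**
    * Constant-time membership test instead of evaluating an arbitrary
    * user-supplied predicate for every modification.
    */
   public boolean accept(K key) {
      return keys.contains(key);
   }
}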
Another approach (if making the other design elastic doesn't work) would be
to have a shared listener on every node: when you register a specific-key
listener, it registers with all nodes that this node wants to be notified
about that key. That way the filter only does a constant-time lookup in a
hash map instead of having to iterate over all listeners on every
modification. The rebalance is then simple as well, in that only joiners
have to ask for the shared listener info. The only tricky part is making
sure a node unregisters itself properly, but that shouldn't be bad.
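Roughly, each node would keep one shared registry like the following (again
only a sketch with made-up types; NodeId stands in for a cluster address),
and the single shared listener consults it with a constant-time lookup on
every modification:

import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class SharedKeyListenerRegistry<K> {

   /** Identifies a node that wants notifications for a key. */
   public interface NodeId {}

   // key -> set of nodes that registered a listener for that key
   private final Map<K, Set<NodeId>> interests = new ConcurrentHashMap<>();

   /** Called when some node registers a specific-key listener. */
   public void register(K key, NodeId node) {
      interests.computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet()).add(node);
   }

   /** Called when that node unregisters (the part that needs care on leave). */
   public void unregister(K key, NodeId node) {
      interests.computeIfPresent(key, (k, nodes) -> {
         nodes.remove(node);
         return nodes.isEmpty() ? null : nodes;
      });
   }

   /** Constant-time lookup the shared listener does on every modification. */
   public Set<NodeId> interestedNodes(K key) {
      return interests.getOrDefault(key, Collections.emptySet());
   }
}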
> Pierre