On Mon, Feb 17, 2014 at 7:44 PM, William Burns <mudokonman@gmail.com> wrote:

On Mon, Feb 17, 2014 at 7:53 AM, Sanne Grinovero <sanne@infinispan.org> wrote:
> On 12 February 2014 10:40, Mircea Markus <mmarkus@redhat.com> wrote:
>> Hey Will,
>>
>> With the current design, during a topology change, an event might be delivered twice to a cluster listener. I think we might be able to identify such situations (a node becomes a key owner as a result of the topology change) and add this information to the event we send, e.g. a flag "potentiallyDuplicate" or something like that. Event implementors might be able to make good use of this, e.g. checking their internal state if an event is redelivered or not. What do you think? Are there any other more-than-once delivery situations we can't keep track of?

I agree, this would be important to track. I have thus added a new
flag to listeners that is set to true when a modification, removal, or
create is done on behalf of a command that was retried because of a
topology change in the middle of it. This benefits not just cluster
listeners but regular listeners too, since even today they can be
notified twice.
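
To illustrate how a listener might use such a flag, here is a minimal
sketch. It is not the Infinispan API: EntryEvent, commandRetried and
RetryAwareListener are hypothetical names, and the value-comparison
check is only one possible heuristic. The listener consults its own
state only when the event is flagged as coming from a retried command.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical event type for illustration; not an Infinispan class.
final class EntryEvent {
    final String key;
    final String value;
    // Assumed name for the "potentially duplicate" flag discussed above.
    final boolean commandRetried;

    EntryEvent(String key, String value, boolean commandRetried) {
        this.key = key;
        this.value = value;
        this.commandRetried = commandRetried;
    }
}

// Hypothetical listener that skips a flagged event if it already saw that value.
final class RetryAwareListener {
    private final Map<String, String> lastSeen = new HashMap<>();
    private int applied = 0;

    // Returns true if the event was applied, false if it was treated as a duplicate.
    boolean onModified(EntryEvent e) {
        // Only flagged events pay for the state check; unflagged ones are trusted.
        if (e.commandRetried && e.value.equals(lastSeen.get(e.key))) {
            return false;
        }
        lastSeen.put(e.key, e.value);
        applied++;
        return true;
    }

    int appliedCount() {
        return applied;
    }
}
```

With this heuristic a flagged event that carries a value the listener
has already seen for that key is dropped; a genuine rewrite of the same
value by a retried command would be dropped as well, which is the
trade-off the consumer has to accept.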

>
> I really wish we would not push such a burden onto the API
> consumer. If we at least had a modification counter associated with
> each entry this could help to identify duplicate triggers as well (on
> top of ordering of modification events as already discussed many
> times).

The particular issue we have with listeners is when the
primary owner replicates the update to backup owners and then crashes
before the notification is sent. In this case we have no idea from
the originator's perspective whether the backup owner has the update.
When the topology changes, the update (if the backup received it) will
be persisted to new owners, possibly without notification. We could
add a counter, however the backup owner then has no idea if the
primary owner has sent the notification or not. Without adding some
kind of 2PC to the primary owner to tell the backup that it occurred,
it won't know. And even that doesn't reliably tell the backup owner
whether the notification was fired if the node goes down during this
period. Without seriously rewriting our nontx dist code I don't see a
viable way to do this without the API consumer having to be alerted.
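
For reference, the per-entry counter idea would look roughly like this
on the consumer side. This is a sketch under assumptions: VersionedEvent,
modCount and CounterDedupListener are hypothetical names, and it assumes
every event carries a counter that increases with each modification of
its key. It shows how a consumer could drop repeats; it does not solve
the server-side problem described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical event carrying a per-entry modification counter.
final class VersionedEvent {
    final String key;
    final long modCount; // assumed to increase with every modification of the key

    VersionedEvent(String key, long modCount) {
        this.key = key;
        this.modCount = modCount;
    }
}

// Hypothetical listener that remembers the highest counter seen for each key.
final class CounterDedupListener {
    private final Map<String, Long> highestSeen = new HashMap<>();

    // Returns true for a new modification, false for a duplicate or stale event.
    boolean accept(VersionedEvent e) {
        Long seen = highestSeen.get(e.key);
        if (seen != null && e.modCount <= seen) {
            return false;
        }
        highestSeen.put(e.key, e.modCount);
        return true;
    }
}
```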

There's always going to be the possibility that replication to one of the backup owners fails and the command is aborted (though not on the backup owners where it succeeded) after the listener was notified. And even in tx mode, the listeners are notified during the prepare phase and not during the commit.

So I don't think we'll ever be able to make listeners 100% reliable, but the "potentially duplicate" flag should be good enough.

Cheers

Dan