On Fri, Dec 9, 2016 at 9:13 AM, Radim Vansa <rvansa(a)redhat.com> wrote:
1) 'cache that would persist the events with a monotonically
increasing id'
I assume that you mean globally (for all entries) monotonic. How will
you obtain such an ID? Currently, commands have unique IDs that are
<Address, Long>, where the number part is monotonic per node. That's
easy to achieve. But introducing a globally monotonic counter means
that there will be a single contention point. (You can introduce
additional contention points by adding backups, but this is probably
unnecessary, as you can find out the last id from the indexed cache
data.) A per-segment monotonic counter would probably be more scalable,
though it increases complexity.
Having it per segment would imply that only operations within the same
segment (and in particular on the same key) are ordered relative to
each other, which is probably fine for most cases.
Could this order be affected by topology changes, though? From what I
have observed, there is a small window where there is more than one
primary owner for a given key, because the CH propagation is not
complete.
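
To make the per-segment idea concrete, a minimal sketch could look like
the following (EventLogKey and its counter map are made-up names, not
existing Infinispan API):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: a per-segment event id, totally ordered within a segment
// but not across segments. equals()/hashCode() omitted for brevity.
public final class EventLogKey implements Comparable<EventLogKey> {

   private static final ConcurrentMap<Integer, AtomicLong> COUNTERS =
         new ConcurrentHashMap<>();

   private final int segment;   // segment of the original data key
   private final long sequence; // monotonically increasing within that segment

   private EventLogKey(int segment, long sequence) {
      this.segment = segment;
      this.sequence = sequence;
   }

   // Meant to be called on the primary owner of the segment, so the counter
   // is local and contention is limited to writes hitting the same segment.
   public static EventLogKey next(int segment) {
      long seq = COUNTERS.computeIfAbsent(segment, s -> new AtomicLong())
            .incrementAndGet();
      return new EventLogKey(segment, seq);
   }

   public int getSegment() {
      return segment;
   }

   @Override
   public int compareTo(EventLogKey other) {
      int cmp = Integer.compare(segment, other.segment);
      return cmp != 0 ? cmp : Long.compare(sequence, other.sequence);
   }
}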
2) 'The write to the event log would be async in order to not affect
normal data writes'
Who should write to the cache?
a) originator - what if the originator crashes (even though the change
has been applied)? Besides, the originator would have to do an (async)
RPC to the primary owner (which would be the primary owner of the
event, too).
b) primary owner - with triangle, the primary does not really know if
the change has been written on the backup. Piggybacking that info won't
be trivial - we don't want to send another message explicitly. But even
if we get the confirmation, since the write to the event cache is
async, if the primary owner crashes before replicating the event to the
backup, we lose the event.
c) all owners, but locally - that will require more complex
reconciliation to figure out whether the event really happened on all
surviving nodes or not. And backups could have trouble resolving the
order, too.
IIUC clustered listeners are called from the primary owner before the
change is really confirmed on backups (@Pedro correct me if I am wrong,
please), but for this reliable event cache you need a higher level of
consistency.
Async writes to a cache event log would not provide the best of guarantees,
agreed.
OTOH, with the writes done synchronously, it'd be hard to avoid extra
RPCs.
Some can be prevented by using a KeyPartitioner similar to the one used
by the AffinityIndexManager [1], so that Segment(K) = Segment(KE),
where K is the data key and KE the related event log key.
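
A rough sketch of such a partitioner, modeled on the AffinityPartitioner
and using the hypothetical EventLogKey above, could be:

import org.infinispan.distribution.ch.impl.HashFunctionPartitioner;

// Sketch only: route an event-log key to the segment of the data key it
// refers to, so that Segment(K) == Segment(KE). Everything else falls
// back to the default hash-based segmentation.
public class EventLogKeyPartitioner extends HashFunctionPartitioner {

   @Override
   public int getSegment(Object key) {
      if (key instanceof EventLogKey) {
         // Segment captured from the original data key at event creation time
         return ((EventLogKey) key).getSegment();
      }
      return super.getSegment(key);
   }
}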
Still, RPCs would happen to replicate the events and, as you pointed
out, it is not trivial to piggyback this on the triangle data RPCs.
I'm starting to think that an extra cache to store events is overkill.
An alternative could be to bypass the event log cache altogether and
store the events in a Lucene index directly.
For this, a custom interceptor would write them to a local index when
it's "safe" to do so, similar to what the QueryInterceptor does with
the Index.ALL flag, but writing only on primary + backup, more like a
hypothetical "Index.OWNER" setup.
This index does not necessarily need to be stored in extra caches (like
the Infinispan directory does) but can use a local MMap-based
directory, making it OS-cache friendly. At event consumption time,
though, broadcast queries to the primary owners would be needed to
collect the events on each of the nodes and merge them before serving
them to the clients.
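
As a sketch of what that interceptor could write locally (plain Lucene;
field names and the payload layout are just illustrative):

import java.nio.file.Paths;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.MMapDirectory;

// Sketch only: a per-node, MMap-backed event log index written by a custom
// interceptor once the write is considered "safe" on this owner.
public class LocalEventLogIndex implements AutoCloseable {

   private final MMapDirectory directory;
   private final IndexWriter writer;

   public LocalEventLogIndex(String path) throws Exception {
      // MMapDirectory keeps the index in local files and lets the OS page
      // cache do most of the work.
      directory = new MMapDirectory(Paths.get(path));
      writer = new IndexWriter(directory, new IndexWriterConfig());
   }

   public void append(int segment, long sequence, String commandId,
         byte[] serializedEvent) throws Exception {
      Document doc = new Document();
      doc.add(new LongPoint("segment", segment));
      doc.add(new LongPoint("sequence", sequence));
      // Doc values allow sorting by sequence when the events are pulled
      doc.add(new NumericDocValuesField("sequence", sequence));
      // Indexing the command id makes it possible to filter out retries
      doc.add(new StringField("commandId", commandId, Field.Store.YES));
      doc.add(new StoredField("event", serializedEvent));
      writer.addDocument(doc);
   }

   @Override
   public void close() throws Exception {
      writer.close();
      directory.close();
   }
}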
[1]
https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/distribution/ch/impl/AffinityPartitioner.java
3) The log will also have to filter out retried operations (based on
command ID - though this can be indexed, too). Still, I would prefer to
see a per-event command-id log to deal with retries properly.
4) The client should pull data, but I would keep push notifications
that 'something happened' (throttled on the server). There could be use
cases for rarely updated caches, and polling the servers would be
excessive there.
Radim
Makes sense, the push could be a notification that the event log
changed and the client would then proceed with its normal pull.
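
Purely as an illustration of that interaction (none of these client
types or methods exist today), the client side could look roughly like:

// Sketch only: the server pushes a throttled "the event log changed"
// hint and the client pulls everything newer than what it has already
// seen. Both interfaces are hypothetical, just to show the flow.
interface HypotheticalEvent {
   long sequence();
}

interface HypotheticalEventLogClient {
   Iterable<HypotheticalEvent> pullEventsAfter(long sequence);
}

public class EventLogPoller {

   private long lastSeenSequence; // per segment in practice, simplified here

   // Invoked when the push notification arrives
   public void onEventLogChanged(HypotheticalEventLogClient client) {
      for (HypotheticalEvent event : client.pullEventsAfter(lastSeenSequence)) {
         process(event);
         lastSeenSequence = event.sequence();
      }
   }

   private void process(HypotheticalEvent event) {
      // application callback
   }
}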
>
>
> [1]
> https://github.com/infinispan/infinispan/wiki/Remote-Listeners-improvement-proposal
>
> Thanks,
> Gustavo
>
>
>
--
Radim Vansa <rvansa(a)redhat.com>
JBoss Performance Team
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev