Reviving this old thread, and as before I appreciate any help the
Infinispan community might provide. There definitely is interest in
Debezium capturing the changes being made to an Infinispan cluster. This
isn’t as important when Infinispan is used as a cache, but when Infinispan
is used as a store then it is important for other apps/services to be able
to accurately keep up with the changes being made in the store.
On Jul 29, 2016, at 8:47 AM, Galder Zamarreño <galder(a)redhat.com> wrote:
--
Galder Zamarreño
Infinispan, Red Hat
On 11 Jul 2016, at 16:41, Randall Hauch <rhauch(a)redhat.com> wrote:
On Jul 11, 2016, at 3:42 AM, Adrian Nistor <anistor(a)redhat.com> wrote:
Hi Randall,
Infinispan supports both push and pull access models. The push model is
supported by events (and listeners), which are cluster wide and are
available in both library and remote mode (Hot Rod). The notification system
is pretty advanced, as there is a filtering mechanism available that can use
a hand-coded filter / converter or one specified in JPQL (experimental
at the moment). Getting a snapshot of the initial data is also possible. But
Infinispan does not produce a transaction log to be used for determining
all changes that happened since a previous connection time, so you'll
always have to get a new full snapshot when re-connecting.
So if Infinispan is the data store I would base the Debezium connector
implementation on Infinispan's event notification system. Not sure about
the other use case though.
Thanks, Adrian, for the feedback. A couple of questions.
You mentioned Infinispan has a pull model — is this just using the normal
API to read the entries?
With event listeners, a single connection will receive all of the events
that occur in the cluster, correct? Is it possible (e.g., a very
unfortunately timed crash) for a change to be made to the cache without an
event being produced and sent to listeners?
^ Yeah, that can happen due to the asynchronous nature of remote events.
However, there's the possibility for clients, upon receiving a new topology,
to receive the current state of the server as events; see [1] and [2].
[1] http://infinispan.org/docs/dev/user_guide/user_guide.html#client_event_listener_state_consumption
[2] http://infinispan.org/docs/dev/user_guide/user_guide.html#client_event_listener_failure_handling
It is critical that any change event stream be consistent with the store;
without that consistency the stream is worthless. Only when the change
event stream is an accurate representation of what changed can downstream
consumers use the stream to rebuild their own perfect copy of the upstream
store and to keep those copies consistent with the upstream store.
So, given that events are handled asynchronously, how are multiple changes
to a single entry handled in a cluster? For example, if a client sets
entry <A,Foo>, then a short time after that (or another) client sets entry
<A,Bar>, is it guaranteed that a client listening to events will see
<A,Foo> first and <A,Bar> some time later? Or is it possible that a client
listening might first see <A,Bar> and then <A,Foo>?
What happens if the network fails or partitions? How does cross site
replication address this?
In terms of cross-site, it depends on what the client is connected to.
Clients can now fail over between sites, so they should be able to deal with
events too, in the same way as explained above.
Has there been any thought about adding to Infinispan a write ahead log or
transaction log to each node or, better yet, for the whole cluster?
Not that I'm aware of, but we've recently added a security audit log, so a
transaction log might make sense too.
Without a transaction log, Debezium would have to use a client listener
with includeCurrentState=true to obtain the state every time it reconnects.
If Debezium just included all of this state in the event stream, then the
stream might contain lots of superfluous events, which would force all
downstream consumers to spend a lot of time processing changes that never
really happened. So the only way to avoid
that would be for Debezium to use an external store to track the changes it
has seen so far so that it doesn’t include unnecessary events in the change
event stream. It’d be a shame to have to require this much infrastructure.
A transaction log would really be a great way to solve this problem. Has
there been any more thought about Infinispan using and exposing a
transaction log? Or perhaps Infinispan could record the changes in a Kafka
topic directly?
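
To make the "external store" idea above concrete, here is a minimal sketch of
the bookkeeping Debezium would need in order to drop replayed state after a
reconnect. Everything in it is an assumption for illustration: the class name
SeenVersionTracker and its methods are hypothetical and are not part of
Debezium or Infinispan, keys are plain strings, and each event is assumed to
carry a per-entry version that changes whenever the entry changes. The map
here is in memory only; a real connector would have to persist it somewhere
durable, which is exactly the extra infrastructure in question.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of Debezium or Infinispan. */
final class SeenVersionTracker {
    // Last version forwarded downstream for each key. Kept in memory here;
    // a real connector would need to store this durably across restarts.
    private final Map<String, Long> lastEmitted = new HashMap<>();

    /** Records the version and reports whether it differs from the last one seen. */
    boolean shouldEmit(String key, long version) {
        Long previous = lastEmitted.put(key, version);
        return previous == null || previous != version;
    }

    /** Forgets a removed key so that a later re-creation is forwarded again. */
    void removed(String key) {
        lastEmitted.remove(key);
    }

    public static void main(String[] args) {
        SeenVersionTracker tracker = new SeenVersionTracker();
        System.out.println(tracker.shouldEmit("A", 1L)); // first time the entry is seen
        System.out.println(tracker.shouldEmit("A", 1L)); // same state replayed on reconnect
        System.out.println(tracker.shouldEmit("A", 2L)); // entry changed since last seen
    }
}
```

Note that this only suppresses duplicates. It does not recover intermediate
changes made while disconnected, and it cannot by itself detect entries that
were removed during that time, since a state replay only contains entries
that still exist.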
(I guess if the Infinispan cache used relational database(s) as a cache
store(s), then Debezium could just capture the changes from there. That
seems like a big constraint, though.)
Thoughts?
I recently updated a proposal [1], based on several discussions we had in
the past, that is essentially about introducing an event storage mechanism
(write ahead log) in order to improve reliability, failover and
"replayability" for the remote listeners. Any feedback is greatly appreciated.
[1]