One way in which Debezium and Infinispan can be used
together is when Infinispan is being used as a cache for data
stored in a database. In this case, Debezium can capture the
changes to the database and produce a stream of events; a
separate process can consume these changes and evict entries from
an Infinispan cache.
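To make the eviction pattern concrete, here is a minimal sketch of such a consumer. It assumes a heavily simplified, hypothetical event type (`RowChange`) and a hypothetical cache-key convention of `table:primaryKey`, and it uses a plain `java.util.Map` as a stand-in for the Infinispan cache. It does not use any real Debezium, Kafka, or Infinispan API.

```java
import java.util.Map;

/**
 * Hypothetical, heavily simplified change event. Real Debezium events carry much
 * more (source metadata, before/after row state, timestamps); only what eviction
 * needs is kept here.
 */
final class RowChange {
    enum Op { CREATE, UPDATE, DELETE }

    final String table;
    final String primaryKey;
    final Op op;

    RowChange(String table, String primaryKey, Op op) {
        this.table = table;
        this.primaryKey = primaryKey;
        this.op = op;
    }
}

/**
 * Evicts the cache entry that corresponds to each database change, so the next
 * read misses and reloads fresh data from the database. A plain Map stands in
 * for the Infinispan cache.
 */
final class CacheInvalidator {
    private final Map<String, Object> cache;

    CacheInvalidator(Map<String, Object> cache) {
        this.cache = cache;
    }

    /** Hypothetical key convention: "table:primaryKey". */
    static String cacheKey(RowChange change) {
        return change.table + ":" + change.primaryKey;
    }

    /** Evicts on every operation type; evicting a key that is not cached is a no-op. */
    void onChange(RowChange change) {
        cache.remove(cacheKey(change));
    }
}
```

Evicting, rather than writing the new state into the cache, keeps the consumer simple: the database remains the source of truth, and the cost is one extra database read on the next cache miss.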
If Infinispan is instead used as the data store, then it
would be useful for Debezium to be able to capture the changes made to Infinispan
so that other apps and services can consume them. First of all,
does this make sense? Secondly, if it does, then Debezium would
need an Infinispan connector, and it’s not clear to me how that
connector might capture the changes from Infinispan.
Debezium typically monitors the log of
transactions/changes that are committed to a database. Of
course, how this works varies for each type of database. For
example, MySQL internally produces a transaction log that
contains information about every committed row change, and
MySQL ensures that every committed change is included and that
non-committed changes are excluded. The MySQL mechanism is
actually part of the replication mechanism, so slaves update
their internal state by reading the master’s log. The Debezium
MySQL connector [2] simply reads the same log.
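The property that matters here, that readers of the log see every committed change and no uncommitted ones, can be illustrated with a toy in-memory log. This is a hypothetical sketch (`ToyTransactionLog` and its methods are invented for illustration) and says nothing about MySQL's actual binlog format or internals: changes are buffered per transaction and appended to the log only on commit, and each reader keeps its own position, which is exactly what a connector would record as its offset.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy, in-memory stand-in for a database transaction log. Changes are appended
 * only when their transaction commits, so readers never see uncommitted or
 * rolled-back changes. Readers track their own position in the log.
 */
final class ToyTransactionLog {
    private final List<String> entries = new ArrayList<>();

    /** A transaction buffers its changes until commit. */
    final class Transaction {
        private final List<String> pending = new ArrayList<>();
        private boolean done;

        void write(String change) {
            if (done) throw new IllegalStateException("transaction already finished");
            pending.add(change);
        }

        void commit() {
            if (done) throw new IllegalStateException("transaction already finished");
            done = true;
            synchronized (ToyTransactionLog.this) {
                entries.addAll(pending); // all-or-nothing append of the whole transaction
            }
        }

        void rollback() {
            done = true;
            pending.clear(); // rolled-back changes never reach the log
        }
    }

    Transaction begin() {
        return new Transaction();
    }

    /** Returns committed entries from {@code position} onward; the caller advances its own position. */
    synchronized List<String> readFrom(int position) {
        return new ArrayList<>(entries.subList(position, entries.size()));
    }
}
```

A reader that remembers `position + entries read` can stop, come back later, and continue exactly where it left off, which is the same idea a replica or a log-reading connector relies on.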
Infinispan has several mechanisms that may be
useful:
- Interceptors - See [3]. This seems pretty
straightforward and IIUC provides access to all internal
operations. However, it’s not clear to me whether a single
interceptor will see all the changes in a cluster (perhaps
in local and replicated modes) or only those changes that
happen on that particular node (in distributed mode). It’s
also not clear whether this interceptor is called within
the context of the cache’s transaction; if it is not, a
failure at just the wrong time might mean that a change is
made to the cache but not seen by the interceptor (or
vice versa).
- Cross-site replication - See [4][5]. A
potential advantage of this mechanism is that it appears
to be defined (more) globally, and it seems to keep working
when a remote backup comes back online after being offline
for a period of time.
- State transfer - is it possible to participate
as a non-active member of the cluster, and to effectively
read all state transfer activities that occur within the
cluster?
- Cache store - tie into the cache store
mechanism, perhaps by wrapping an existing cache store and
sitting between the cache and the cache store.
- Monitor the cache store - don’t monitor
Infinispan at all, and instead monitor the store in which
Infinispan is storing entries. (This is probably the least
attractive option, since some stores can’t be monitored and
others persist entries as opaque binary values.)
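The "wrap an existing cache store" option might look roughly like the following decorator. This is a sketch under explicit assumptions: `SimpleStore` is a hypothetical minimal interface, not Infinispan's actual persistence SPI, and `MapStore`, `StoreChange`, and `ChangeCapturingStore` are invented names. The wrapper forwards each call to the real store and then reports the change with its before and after state, at the cost of an extra read per write.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical minimal store contract; this is NOT Infinispan's persistence SPI. */
interface SimpleStore<K, V> {
    V load(K key);

    void write(K key, V value);

    boolean delete(K key);
}

/** A change event; before/after are null when that state does not exist. */
final class StoreChange<K, V> {
    enum Op { CREATE, UPDATE, DELETE }

    final Op op;
    final K key;
    final V before;
    final V after;

    StoreChange(Op op, K key, V before, V after) {
        this.op = op;
        this.key = key;
        this.before = before;
        this.after = after;
    }
}

/** Trivial, non-thread-safe in-memory store, used here only as the wrapped delegate. */
final class MapStore<K, V> implements SimpleStore<K, V> {
    private final Map<K, V> data = new HashMap<>();

    public V load(K key) { return data.get(key); }

    public void write(K key, V value) { data.put(key, value); }

    public boolean delete(K key) { return data.remove(key) != null; }
}

/**
 * Sits between the cache and the real store: forwards every call to the delegate
 * and reports each write or delete as a change event with before/after state.
 * Assumes values are never null, so a null load result means "absent".
 */
final class ChangeCapturingStore<K, V> implements SimpleStore<K, V> {
    private final SimpleStore<K, V> delegate;
    private final Consumer<StoreChange<K, V>> listener;

    ChangeCapturingStore(SimpleStore<K, V> delegate, Consumer<StoreChange<K, V>> listener) {
        this.delegate = delegate;
        this.listener = listener;
    }

    @Override
    public V load(K key) {
        return delegate.load(key);
    }

    @Override
    public synchronized void write(K key, V value) {
        V before = delegate.load(key); // extra read, needed only for the "before" state
        delegate.write(key, value);
        // Not atomic with the write: a crash at this point would lose the event.
        StoreChange.Op op = (before == null) ? StoreChange.Op.CREATE : StoreChange.Op.UPDATE;
        listener.accept(new StoreChange<>(op, key, before, value));
    }

    @Override
    public synchronized boolean delete(K key) {
        V before = delegate.load(key);
        boolean removed = delegate.delete(key);
        if (removed) {
            listener.accept(new StoreChange<>(StoreChange.Op.DELETE, key, before, null));
        }
        return removed;
    }
}
```

Two caveats carry over directly from the concerns above: the event is emitted after the delegate write rather than atomically with it, so this has the same failure window as the interceptor option; and the wrapper can only see writes that actually reach the store it wraps.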
Are there other mechanisms that might be used?
There are a few important requirements for
change data capture to be able to work correctly:
- Upon initial connection, the CDC connector must
be able to obtain a snapshot of all existing data,
followed by seeing all changes to data that may have
occurred since the snapshot was started. If the connector
is stopped or fails, upon restart it needs to be able to
reconnect and either see all changes that occurred since
it was last capturing changes, or perform a new snapshot.
(Performing a snapshot upon restart is very inefficient
and undesirable.) Resuming works as follows: the CDC connector
only records the “offset” in the source’s sequence of
events; what this “offset” entails depends on the source.
Upon restart, the connector can use this offset
information to tell the source where it wants
to start reading. (In MySQL, every event includes the
filename of the binlog and the position in that file;
in PostgreSQL, it includes the log sequence number, i.e.
the position in the write-ahead log. MongoDB includes in
each event the monotonically increasing timestamp of the
transaction.)
- No change can be missed, even when things go
wrong and components crash.
- When a new entry is added, the “after” state of
the entry must be included. When an entry is updated, the
“after” state must be included in the event; if possible,
the event should also include the “before” state. When an
entry is removed, the “before” state should be included in
the event.
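Taken together, these requirements amount to "snapshot when there is no stored offset, otherwise stream from the stored offset, and store a new offset only after events are safely delivered". The sketch below illustrates that contract against a toy in-memory source. Everything in it (`ChangeEvent`, `ToySource`, `ToyConnector`, the integer offsets) is hypothetical and not Debezium's actual API. It uses a separate `READ` operation for snapshot records, loosely similar to how Debezium marks snapshot events, and captures the snapshot state and its resume offset atomically so that no change can fall between the snapshot and the stream.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical change event; before/after are null when that state does not exist. */
final class ChangeEvent {
    enum Op { READ, CREATE, UPDATE, DELETE } // READ marks records emitted by a snapshot

    final long offset; // position in the source's history (snapshot records: resume position)
    final Op op;
    final String key;
    final String before;
    final String after;

    ChangeEvent(long offset, Op op, String key, String before, String after) {
        this.offset = offset;
        this.op = op;
        this.key = key;
        this.before = before;
        this.after = after;
    }
}

/** Toy source: current state plus an append-only history in which entry i has offset i. */
final class ToySource {
    static final class Snapshot {
        final Map<String, String> state;
        final long offset; // offset of the first change NOT reflected in the snapshot
        Snapshot(Map<String, String> state, long offset) { this.state = state; this.offset = offset; }
    }

    private final Map<String, String> state = new LinkedHashMap<>();
    private final List<ChangeEvent> history = new ArrayList<>();

    synchronized void put(String key, String value) {
        String before = state.put(key, value);
        ChangeEvent.Op op = (before == null) ? ChangeEvent.Op.CREATE : ChangeEvent.Op.UPDATE;
        history.add(new ChangeEvent(history.size(), op, key, before, value));
    }

    synchronized void remove(String key) {
        String before = state.remove(key);
        if (before != null) {
            history.add(new ChangeEvent(history.size(), ChangeEvent.Op.DELETE, key, before, null));
        }
    }

    /** The state copy and the resume offset are taken atomically, so no change slips between them. */
    synchronized Snapshot snapshot() {
        return new Snapshot(new LinkedHashMap<>(state), history.size());
    }

    synchronized List<ChangeEvent> changesFrom(long offset) {
        return new ArrayList<>(history.subList((int) offset, history.size()));
    }
}

/** Snapshots when it has no stored offset; otherwise streams from the stored offset. */
final class ToyConnector {
    private Long storedOffset; // null means no run has been committed yet: snapshot first
    private long pendingOffset;

    ToyConnector(Long storedOffset) {
        this.storedOffset = storedOffset;
    }

    List<ChangeEvent> poll(ToySource source) {
        List<ChangeEvent> batch = new ArrayList<>();
        long from;
        if (storedOffset == null) {
            ToySource.Snapshot snap = source.snapshot();
            for (Map.Entry<String, String> e : snap.state.entrySet()) {
                batch.add(new ChangeEvent(snap.offset, ChangeEvent.Op.READ, e.getKey(), null, e.getValue()));
            }
            from = snap.offset;
        } else {
            from = storedOffset;
        }
        List<ChangeEvent> changes = source.changesFrom(from);
        batch.addAll(changes);
        pendingOffset = changes.isEmpty() ? from : changes.get(changes.size() - 1).offset + 1;
        return batch;
    }

    /** Store the new offset; call only after the batch is durably delivered downstream. */
    void commit() {
        storedOffset = pendingOffset;
    }

    Long storedOffset() {
        return storedOffset;
    }
}
```

A "restart" is simply `new ToyConnector(previouslyStoredOffset)`. Because the offset is stored only after delivery, a crash before `commit()` causes events to be re-sent rather than lost, which gives at-least-once delivery rather than missed changes.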
Any thoughts or advice would be greatly
appreciated.
Best regards,
Randall