The Debezium project [1] is working on building change data capture connectors for a
variety of databases. MySQL is available now, MongoDB will be soon, and PostgreSQL and
Oracle are next on our roadmap.
One way in which Debezium and Infinispan can be used together is when Infinispan is being
used as a cache for data stored in a database. In this case, Debezium can capture the
changes to the database and produce a stream of events; a separate process can consume
these changes and evict the affected entries from an Infinispan cache.
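To make the cache-eviction idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the event shape (`op`, `key`), the `evict_on_change` helper, and the plain dict standing in for an Infinispan cache are hypothetical and are not Debezium's or Infinispan's actual API.

```python
# Hypothetical sketch: a dict stands in for the cache, and the event shape
# ("op", "key") is made up for illustration only.
from typing import Any, Dict, Iterable


def evict_on_change(cache: Dict[str, Any], events: Iterable[Dict[str, Any]]) -> int:
    """Evict the cached entry for every changed row; return how many were evicted."""
    evicted = 0
    for event in events:
        key = event["key"]
        # Any create/update/delete makes a cached copy stale, so drop it if present.
        if key in cache:
            del cache[key]
            evicted += 1
    return evicted


cache = {"customer:1": {"name": "Ann"}, "customer:2": {"name": "Bob"}}
events = [
    {"op": "u", "key": "customer:1"},  # update to a cached row
    {"op": "d", "key": "customer:3"},  # delete of a row that is not cached
]
evicted_count = evict_on_change(cache, events)
```

The consumer only needs the key of each changed row, so it can ignore the rest of the event and let the next read repopulate the cache from the database.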
If Infinispan is to be used as a data store, then it would be useful for Debezium to be
able to capture those changes so other apps/services can consume the changes. First of
all, does this make sense? Secondly, if it does, then Debezium would need an Infinispan
connector, and it’s not clear to me how that connector might capture the changes from
Infinispan.
Debezium typically monitors the log of transactions/changes that are committed to a
database. Of course, how this works varies for each type of database. For example, MySQL
internally produces a transaction log that contains information about every committed row
change, and MySQL ensures that every committed change is included and that non-committed
changes are excluded. The MySQL mechanism is actually part of the replication mechanism,
so slaves update their internal state by reading the master’s log. The Debezium MySQL
connector [2] simply reads the same log.
Infinispan has several mechanisms that may be useful:
Interceptors - See [3]. This seems pretty straightforward and IIUC provides access to all
internal operations. However, it’s not clear to me whether a single interceptor will see
all the changes in a cluster (perhaps in local and replicated modes) or only those changes
that happen on that particular node (in distributed mode). It’s also not clear whether
this interceptor is called within the context of the cache’s transaction. If a failure
happens at just the wrong time, could a change be made to the cache but not be seen by the
interceptor (or vice versa)?
Cross-site replication - See [4][5]. A potential advantage of this mechanism appears to be
that it is defined (more) globally, and it appears to function if the remote backup comes
back online after being offline for a period of time.
State transfer - is it possible to participate as a non-active member of the cluster, and
to effectively read all state transfer activities that occur within the cluster?
Cache store - tie into the cache store mechanism, perhaps by wrapping an existing cache
store and sitting between the cache and the cache store.
Monitor the cache store - don’t monitor Infinispan at all, and instead monitor the store
in which Infinispan is storing entries. (This is probably the least attractive, since some
stores can’t be monitored, or because the store is persisting an opaque binary value.)
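As a rough illustration of the "wrap an existing cache store" option, here is a minimal Python sketch. The `InMemoryStore` and `CapturingStore` classes and their `write`/`delete` methods are hypothetical stand-ins, not Infinispan's real cache store SPI.

```python
# Hypothetical sketch: these classes are illustrative stand-ins and do not
# correspond to Infinispan's actual cache store interfaces.
from typing import Any, Dict, List, Optional, Tuple

Change = Tuple[str, str, Optional[Any]]  # (operation, key, value)


class InMemoryStore:
    """Stand-in for an existing cache store."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    def write(self, key: str, value: Any) -> None:
        self.data[key] = value

    def delete(self, key: str) -> bool:
        return self.data.pop(key, None) is not None


class CapturingStore:
    """Sits between the cache and the real store and records each change."""

    def __init__(self, delegate: InMemoryStore, sink: List[Change]) -> None:
        self.delegate = delegate
        self.sink = sink

    def write(self, key: str, value: Any) -> None:
        self.delegate.write(key, value)
        # A crash between the line above and the line below would lose the change.
        self.sink.append(("write", key, value))

    def delete(self, key: str) -> bool:
        removed = self.delegate.delete(key)
        if removed:
            self.sink.append(("delete", key, None))
        return removed


changes: List[Change] = []
store = CapturingStore(InMemoryStore(), changes)
store.write("k1", "v1")
store.delete("k1")
store.delete("missing")  # nothing was removed, so nothing is recorded
```

The comment in `write` marks the weak spot: unless the store write and the capture are atomic, a failure at the wrong moment can make the two diverge.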
Are there other mechanisms that might be used?
There are a couple of important requirements for change data capture to be able to work
correctly:
Upon initial connection, the CDC connector must be able to obtain a snapshot of all
existing data, followed by seeing all changes to data that may have occurred since the
snapshot was started. If the connector is stopped/fails, upon restart it needs to be able
to reconnect and either see all changes that occurred since it last was capturing changes,
or perform a snapshot. (Performing a snapshot upon restart is very inefficient and
undesirable.) This works as follows: the CDC connector only records the “offset” in the
source’s sequence of events; what this “offset” entails depends on the source. Upon
restart, the connector can use this offset information to coordinate with the source where
it wants to start reading. (In MySQL and PostgreSQL, every event includes the filename of
the log and position in that file. MongoDB includes in each event the monotonically
increasing timestamp of the transaction.)
No change can be missed, even when things go wrong and components crash.
When a new entry is added, the “after” state of the entity will be included. When an entry
is updated, the “after” state will be included in the event; if possible, the event should
also include the “before” state. When an entry is removed, the “before” state should be
included in the event.
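The requirements above can be sketched with a toy model in Python. All names here (`Source`, `Connector`, `ChangeEvent`, and the integer offset) are hypothetical and chosen only for illustration; a real connector would persist the offset durably rather than pass it in by hand.

```python
# Hypothetical sketch of snapshot + offset-based resume with before/after state.
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ChangeEvent:
    offset: int             # position in the source's log (-1 for snapshot records)
    key: str
    before: Optional[Any]   # None for newly added entries
    after: Optional[Any]    # None for removed entries


class Source:
    """Toy source that keeps current state plus an append-only change log."""

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}
        self.log: List[ChangeEvent] = []

    def put(self, key: str, value: Any) -> None:
        before = self.state.get(key)
        self.state[key] = value
        self.log.append(ChangeEvent(len(self.log), key, before, value))

    def remove(self, key: str) -> None:
        before = self.state.pop(key)
        self.log.append(ChangeEvent(len(self.log), key, before, None))

    def snapshot(self) -> Tuple[Dict[str, Any], int]:
        """Return a copy of the state and the offset of the first later change."""
        return dict(self.state), len(self.log)


class Connector:
    def __init__(self, source: Source, offset: Optional[int] = None) -> None:
        self.source = source
        self.offset = offset  # next log offset to read; None means "never ran"
        self.emitted: List[ChangeEvent] = []

    def run(self) -> None:
        if self.offset is None:
            # First connection: snapshot existing data, then follow the log.
            state, self.offset = self.source.snapshot()
            for key, value in state.items():
                self.emitted.append(ChangeEvent(-1, key, None, value))
        for event in self.source.log[self.offset:]:
            self.emitted.append(event)
            self.offset = event.offset + 1


source = Source()
source.put("a", 1)
first = Connector(source)
first.run()                      # snapshot only; nothing in the log after it yet
source.put("a", 2)               # happens while the connector is "down"
source.remove("a")
restarted = Connector(source, offset=first.offset)
restarted.run()                  # resumes from the recorded offset, no snapshot
```

After the restart, the connector sees exactly the update and the removal that happened while it was stopped, each carrying its "before" and "after" state.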
Any thoughts or advice would be greatly appreciated.
Best regards,
Randall
[1] http://debezium.io
[2] http://debezium.io/docs/connectors/mysql/
[3] http://infinispan.org/docs/stable/user_guide/user_guide.html#_custom_inte...
[4] http://infinispan.org/docs/stable/user_guide/user_guide.html#CrossSiteRep...
[5] https://github.com/infinispan/infinispan/wiki/Design-For-Cross-Site-Repli...