[infinispan-dev] Infinispan and change data capture

Adrian Nistor anistor at redhat.com
Mon Jul 11 04:42:23 EDT 2016


Hi Randall,

Infinispan supports both push and pull access models. The push model is 
supported by events (and listeners), which are cluster wide and are 
available in both library and remote mode (hotrod). The notification 
system is pretty advanced as there is a filtering mechanism available 
that can use a hand coded filter / converter or one specified in jpql 
(experimental atm). Getting a snapshot of the initial data is also 
possible. But infinispan does not produce a transaction log to be used 
for determining all changes that happened since a previous connection 
time, so you'll always have to get a new full snapshot when re-connecting.

So if Infinispan is the data store I would base the Debezium connector 
implementation on Infinispan's event notification system. Not sure about 
the other use case though.

Adrian

On 07/09/2016 04:38 PM, Randall Hauch wrote:
> The Debezium project [1] is working on building change data capture 
> connectors for a variety of databases. MySQL is available now, MongoDB 
> will be soon, and PostgreSQL and Oracle are next on our roadmap.
>
> One way in which Debezium and Infinispan can be used together is when 
> Infinispan is being used as a cache for data stored in a database. In 
> this case, Debezium can capture the changes to the database and 
> produce a stream of events; a separate process can consume these 
> change and evict entries from an Infinispan cache.
>
> If Infinispan is to be used as a data store, then it would be useful 
> for Debezium to be able to capture those changes so other 
> apps/services can consume the changes. First of all, does this make 
> sense? Secondly, if it does, then Debezium would need an Infinispan 
> connector, and it’s not clear to me how that connector might capture 
> the changes from Infinispan.
>
> Debezium typically monitors the log of transactions/changes that are 
> committed to a database. Of course how this works varies for each type 
> of database. For example, MySQL internally produces a transaction log 
> that contains information about every committed row change, and MySQL 
> ensures that every committed change is included and that non-committed 
> changes are excluded. The MySQL mechanism is actually part of the 
> replication mechanism, so slaves update their internal state by 
> reading the master’s log. The Debezium MySQL connector [2] simply 
> reads the same log.
>
> Infinispan has several mechanisms that may be useful:
>
>   * Interceptors - See [3]. This seems pretty straightforward and IIUC
>     provides access to all internal operations. However, it’s not
>     clear to me whether a single interceptor will see all the changes
>     in a cluster (perhaps in local and replicated modes) or only those
>     changes that happen on that particular node (in distributed mode).
>     It’s also not clear whether this interceptor is called within the
>     context of the cache’s transaction, so if a failure happens just
>     at the wrong time whether a change might be made to the cache but
>     is not seen by the interceptor (or vice versa).
>   * Cross-site replication - See [4][5]. A potential advantage of this
>     mechanism appears to be that it is defined (more) globally, and it
>     appears to function if the remote backup comes back online after
>     being offline for a period of time.
>   * State transfer - is it possible to participate as a non-active
>     member of the cluster, and to effectively read all state transfer
>     activities that occur within the cluster?
>   * Cache store - tie into the cache store mechanism, perhaps by
>     wrapping an existing cache store and sitting between the cache and
>     the cache store
>   * Monitor the cache store - don’t monitor Infinispan at all, and
>     instead monitor the store in which Infinispan is storing entries.
>     (This is probably the least attractive, since some stores can’t be
>     monitored, or because the store is persisting an opaque binary value.)
>
>
> Are there other mechanism that might be used?
>
> There are a couple of important requirements for change data capture 
> to be able to work correctly:
>
>  1. Upon initial connection, the CDC connector must be able to obtain
>     a snapshot of all existing data, followed by seeing all changes to
>     data that may have occurred since the snapshot was started. If the
>     connector is stopped/fails, upon restart it needs to be able to
>     reconnect and either see all changes that occurred since it last
>     was capturing changes, or perform a snapshot. (Performing a
>     snapshot upon restart is very inefficient and undesirable.) This
>     works as follows: the CDC connector only records the “offset” in
>     the source’s sequence of events; what this “offset” entails
>     depends on the source. Upon restart, the connector can use this
>     offset information to coordinate with the source where it wants to
>     start reading. (In MySQL and PostgreSQL, every event includes the
>     filename of the log and position in that file. MongoDB includes in
>     each event the monotonically increasing timestamp of the transaction.
>  2. No change can be missed, even when things go wrong and components
>     crash.
>  3. When a new entry is added, the “after” state of the entity will be
>     included. When an entry is updated, the “after” state will be
>     included in the event; if possible, the event should also include
>     the “before” state. When an entry is removed, the “before” state
>     should be included in the event.
>
>
> Any thoughts or advice would be greatly appreciated.
>
> Best regards,
>
> Randall
>
>
> [1] http://debezium.io
> [2] http://debezium.io/docs/connectors/mysql/
> [3] 
> http://infinispan.org/docs/stable/user_guide/user_guide.html#_custom_interceptors_chapter
> [4] 
> http://infinispan.org/docs/stable/user_guide/user_guide.html#CrossSiteReplication
> [5] 
> https://github.com/infinispan/infinispan/wiki/Design-For-Cross-Site-Replication
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20160711/6366c9f5/attachment-0001.html 


More information about the infinispan-dev mailing list