[infinispan-dev] Infinispan and change data capture

Mon Dec 12 12:56:51 EST 2016

On Mon, Dec 12, 2016 at 3:13 PM, Sanne Grinovero <sanne at infinispan.org>
wrote:

> In short, what's the ultimate goal? I see two main but different
> options intertwined:
>  - allow to synchronize the *final state* of a replica
>

I'm assuming this case is already covered by remote listeners with
includeCurrentState=true, and that we are discussing how to improve it, as
described in the proposal on the wiki and in the 5th email of this thread.

>  - inspect specific changes
>
> For the first case, it would be enough for us to be able to provide a
> "squashed history" (as in Git squash), but we'd need to keep versioned
> snapshots around and someone needs to tell you which ones can be
> garbage collected.
> For example when a key is: written, updated, updated, deleted since
> the snapshot, we'll send only "deleted" as the intermediary states are
> irrelevant.
> For the second case, say the goal is to inspect fluctuations of price
> variations of some item, then the intermediary states are not
> irrelevant.
>
> Which one will we want to solve? Both?
>

Judging by http://debezium.io/, the target there is the second case:

"[...] Start it up, point it at your databases, and your apps can start
responding to all of the inserts, updates,
and deletes that other apps commit to your databases. [...] your apps can
respond quickly and never miss an event,
even when things go wrong."

IMO the choice between squashed and full history, and even the retention
time, is highly application specific. Deletes might not even be involved:
one may be interested in answering "what is the peak value of a certain key
during the day?"
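
To make the difference concrete, here is a minimal sketch in plain Java. The
Event record and the method names are hypothetical and are not Infinispan
API. It derives both views from the same change log: a squashed history that
keeps only the last change per key, and a peak-value answer that needs every
intermediary state.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;

class HistoryViews {
    enum Type { WRITTEN, UPDATED, DELETED }

    // Hypothetical change event; value is null for deletes.
    record Event(String key, Type type, Integer value) {}

    // Squashed history: only the last event per key survives.
    static Map<String, Event> squash(List<Event> log) {
        Map<String, Event> last = new LinkedHashMap<>();
        for (Event e : log) {
            last.put(e.key(), e);
        }
        return last;
    }

    // Peak value: needs the full history, including overwritten states.
    static OptionalInt peak(List<Event> log, String key) {
        return log.stream()
                .filter(e -> e.key().equals(key) && e.value() != null)
                .mapToInt(e -> e.value())
                .max();
    }

    public static void main(String[] args) {
        List<Event> log = List.of(
                new Event("ACME", Type.WRITTEN, 10),
                new Event("ACME", Type.UPDATED, 42),
                new Event("ACME", Type.UPDATED, 17),
                new Event("ACME", Type.DELETED, null));
        System.out.println(squash(log).get("ACME").type());
        System.out.println(peak(log, "ACME").getAsInt());
    }
}
```

The squashed view ends up with a single "deleted" entry for the key, while
the peak can only be computed if the overwritten values are still available.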

> Personally the attempt of solving the second one seems like a huge
> pivot of the project, the current data-structures and storage are not
> designed for this.

+1, as I wrote earlier about ditching the idea of event cache storage in
favor of Lucene.

> I see the value of such benefits, but maybe
> Infinispan is not the right tool for such a problem.
>
> I'd prefer to focus on the benefits of the squashed history, and have
> versioned entries soon, but even in that case we need to define which
> versions need to be kept around, and how garbage collection /
> vacuuming is handled.
>

Is that proposal written or recorded somewhere? It'd be interesting to know
how a client interested in data changes would consume those multi-versioned
entries (push, or pull with an offset? sorted or unsorted? tracking per
client, per key, or per version?), as there seems to be some storage
impedance mismatch as well.
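
As an illustration of just one of those options (pull with an offset, sorted
by write order, with the client tracking its own position), here is a
minimal sketch in plain Java. VersionedChangeLog and Change are hypothetical
names for the sake of the example, not an existing or proposed Infinispan
API.

```java
import java.util.ArrayList;
import java.util.List;

class VersionedChangeLog {
    // Hypothetical record of one version of one key.
    record Change(long offset, String key, String value) {}

    private final List<Change> changes = new ArrayList<>();

    // Append a new version; the offset is the position in the log.
    synchronized long append(String key, String value) {
        long offset = changes.size();
        changes.add(new Change(offset, key, value));
        return offset;
    }

    // Pull up to max changes starting at fromOffset, in write order.
    synchronized List<Change> pull(long fromOffset, int max) {
        int start = (int) Math.min(Math.max(fromOffset, 0), changes.size());
        int end = (int) Math.min((long) start + Math.max(max, 0), changes.size());
        return List.copyOf(changes.subList(start, end));
    }

    public static void main(String[] args) {
        VersionedChangeLog log = new VersionedChangeLog();
        log.append("k1", "a");
        log.append("k1", "b");
        log.append("k2", "c");

        long next = 0; // position tracked by the client, not the server
        List<Change> batch;
        while (!(batch = log.pull(next, 2)).isEmpty()) {
            for (Change c : batch) {
                System.out.println(c);
            }
            next = batch.get(batch.size() - 1).offset() + 1;
        }
    }
}
```

With this shape the server never needs to know who is reading, but it does
need to know how far back it must retain versions, which is the garbage
collection question again.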

>
> In short, I'd like to see an agreement that analyzing e.g.
> fluctuations in stock prices would be a non-goal, if these are stored
> as {"stock name", value} key/value pairs. One could still implement
> such a thing by using a more sophisticated model, just don't expect to
> be able to see all intermediary values each entry has ever had since
> the key was first used.
>

Continuous Query listens to key/value data using a query. Shouldn't it be
expected to see all the intermediary values when changes on the server cause
an entry to start or stop matching the query?
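
A minimal sketch of why that matters, in plain Java with hypothetical names
(this is not the Infinispan continuous query API): join and leave
notifications are derived from transitions between successive values, so a
value that is later overwritten can still be the one that triggers an event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class MatchTransitions {
    // Emit JOIN/LEAVE whenever successive values of one key start or
    // stop matching the query.
    static List<String> transitions(List<Integer> values, IntPredicate query) {
        List<String> events = new ArrayList<>();
        boolean matching = false;
        for (int v : values) {
            boolean now = query.test(v);
            if (now && !matching) {
                events.add("JOIN " + v);
            }
            if (!now && matching) {
                events.add("LEAVE " + v);
            }
            matching = now;
        }
        return events;
    }

    public static void main(String[] args) {
        IntPredicate priceAbove50 = v -> v > 50;
        // Full history of one key: the middle value matches the query.
        System.out.println(transitions(List.of(10, 60, 20), priceAbove50));
        // Squashed history: only the final value is seen.
        System.out.println(transitions(List.of(20), priceAbove50));
    }
}
```

With the full history the listener gets a join followed by a leave; with
only the final value it gets nothing at all.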