[infinispan-dev] Infinispan and change data capture

Radim Vansa rvansa at redhat.com
Tue Dec 13 03:33:53 EST 2016


On 12/12/2016 06:56 PM, Gustavo Fernandes wrote:
> On Mon, Dec 12, 2016 at 3:13 PM, Sanne Grinovero
> <sanne at infinispan.org> wrote:
>
>     In short, what's the ultimate goal? I see two main but different
>     options intertwined:
>      - allow to synchronize the *final state* of a replica
>
>
> I'm assuming this case is already in place when using remote listeners 
> and includeCurrentState=true and we are
> discussing how to improve it, as described in the proposal in the wiki 
> and on the 5th email of this thread.

I don't think the guarantees for any listeners are explicitly stated 
anywhere in the docs. There are two parts to it:

- ideal state: I assume that in the ideal state we don't want to miss any 
committed operation, but we have to define "committed". We should also 
mention that events can be received multiple times (we aim at 
at-least-once semantics).
- current limitations: behaviour that does not match the ideal but that 
we have not been able to fix so far. Even [1] does not mention 
listeners (and it would be outdated).

[1] 
https://github.com/infinispan/infinispan/wiki/Consistency-guarantees-in-Infinispan
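
To illustrate what at-least-once would mean for a consumer: if events 
can be redelivered, the receiving side has to be idempotent. Below is a 
minimal sketch in plain Java. It is not Infinispan API; the class name 
and the assumption that every event carries a per-key, monotonically 
increasing version are both hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical consumer that tolerates duplicate deliveries.
// Assumption: each event carries a per-key version that only grows.
class IdempotentConsumer {
    // Highest version applied so far, per key.
    private final Map<String, Long> lastVersion = new HashMap<>();
    // Replica state built from the applied events.
    private final Map<String, String> state = new HashMap<>();

    // Applies the event unless it is a duplicate or stale.
    // A null value stands for a removal. Returns true if applied.
    boolean onEvent(String key, long version, String value) {
        Long seen = lastVersion.get(key);
        if (seen != null && seen >= version) {
            return false; // redelivered or out-of-date event, ignore it
        }
        lastVersion.put(key, version);
        if (value == null) {
            state.remove(key);
        } else {
            state.put(key, value);
        }
        return true;
    }

    String get(String key) {
        return state.get(key);
    }
}
```

With something like this, receiving the same event twice leaves the 
replica unchanged, so duplicates are harmless. It does nothing about 
missed events, which is the part the guarantees would have to cover.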

>      - inspect specific changes
>
>     For the first case, it would be enough for us to be able to provide a
>     "squashed history" (as in Git squash), but we'd need to keep versioned
>     snapshots around and someone needs to tell you which ones can be
>     garbage collected.
>     For example when a key is: written, updated, updated, deleted since
>     the snapshot, we'll send only "deleted" as the intermediary states are
>     irrelevant.
>     For the second case, say the goal is to inspect fluctuations of price
>     variations of some item, then the intermediary states are not
>     irrelevant.
>
>     Which one will we want to solve? Both?
>
>
> Looking at http://debezium.io/, it implies the second case.
>
> "[...] Start it up, point it at your databases, and your apps can 
> start responding to all of the inserts, updates,
> and deletes that other apps commit to your databases. [...] your apps 
> can respond quickly and never miss an event,
> even when things go wrong."
>
> IMO the choice between squashed/full history, and even retention time, 
> is highly application specific. Deletes might
> not even be involved; one may be interested in answering "what is the 
> peak value of a certain key during the day?"
>
>     Personally the attempt of solving the second one seems like a huge
>     pivot of the project, the current data-structures and storage are not
>     designed for this. 
>
>
>
> +1, as I wrote earlier about ditching the idea of event cache storage 
> in favor of Lucene.
>
>     I see the value of such benefits, but maybe
>     Infinispan is not the right tool for such a problem.
>
>     I'd prefer to focus on the benefits of the squashed history, and have
>     versioned entries soon, but even in that case we need to define which
>     versions need to be kept around, and how garbage collection /
>     vacuuming is handled.
>
>
> Is that proposal written/recorded somewhere? It'd be interesting to 
> know how a client interested in data
> changes would consume those multi-versioned entries (push/pull with 
> offset?, sorted/unsorted?, client tracking/per key/per version?),
> as it seems there is some storage impedance as well.
>
>
>     In short, I'd like to see an agreement that analyzing e.g.
>     fluctuations in stock prices would be a non-goal, if these are stored
>     as {"stock name", value} key/value pairs. One could still implement
>     such a thing by using a more sophisticated model, just don't expect to
>     be able to see all intermediary values each entry has ever had since
>     the key was first used.
>
>
>
> Continuous Queries listen to key/value data using a query; 
> should it not be expected to
> see all the intermediary values when changes in the server cause an 
> entry to start/stop matching
> the query?

In Konstanz we were discussing listeners with Dan and later with Adrian 
and found out that CQ expects listeners to be much more reliable than 
they actually are. So CQ is already broken and people can live with 
that. Theoretically, Debezium can do the same: boldly claim that "your 
apps can respond quickly and never miss an event, even when things go 
wrong" and push the blame to Infinispan :)

Radim



-- 
Radim Vansa <rvansa at redhat.com>
JBoss Performance Team
