<div dir="ltr">On Mon, Dec 12, 2016 at 3:13 PM, Sanne Grinovero <span dir="ltr"><<a href="mailto:sanne@infinispan.org" target="_blank">sanne@infinispan.org</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">In short, what's the ultimate goal? I see two main but different<br>
options intertwined:<br>
- allow to synchronize the *final state* of a replica<br></blockquote><div><br>I'm assuming this case is already covered by remote listeners with includeCurrentState=true, and that we are<br></div><div>discussing how to improve it, as described in the wiki proposal and in the 5th email of this thread.<br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
- inspect specific changes<br>
<br>
For the first case, it would be enough for us to be able to provide a<br>
"squashed history" (as in Git squash), but we'd need to keep versioned<br>
snapshots around and someone needs to tell you which ones can be<br>
garbage collected.<br>
For example when a key is: written, updated, updated, deleted since<br>
the snapshot, we'll send only "deleted" as the intermediary states are<br>
irrelevant.<br>
For the second case, say the goal is to inspect fluctuations of price<br>
variations of some item, then the intermediary states are not<br>
irrelevant.<br>
<br>
Which one will we want to solve? Both?<br></blockquote><div> </div><div><br></div><div>Looking at <a href="http://debezium.io/" target="_blank">http://debezium.io/</a>, it implies the second case.<br><br>"[...] Start it up, point it at your databases, and your apps can start
responding to all of the inserts, updates, <br>and deletes that other apps
commit to your databases. [...] your apps can
respond quickly and never miss an event, <br>even when things go wrong."<br><br></div><div>IMO the choice between squashed and full history, and even the retention time, is highly application-specific. Deletes might <br>not even be involved: one may be interested in answering "what is the peak value of a certain key during the day?"<br><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Personally the attempt of solving the second one seems like a huge<br>
pivot of the project, the current data-structures and storage are not<br>
designed for this. </blockquote><div><br><br></div><div>+1, as I wrote earlier about ditching the idea of event cache storage in favor of Lucene.<br></div><div><br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I see the value of such benefits, but maybe<br>
Infinispan is not the right tool for such a problem.<br>
<br>
I'd prefer to focus on the benefits of the squashed history, and have<br>
versioned entries soon, but even in that case we need to define which<br>
versions need to be kept around, and how garbage collection /<br>
vacuuming is handled.<br></blockquote><div><br></div><div>Is that proposal written down or recorded somewhere? It'd be interesting to know how a client interested in data <br>changes would consume those multi-versioned entries (push/pull with offset? sorted/unsorted? client tracking/per key/per version?), <br>as it seems there is some storage impedance as well.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
In short, I'd like to see an agreement that analyzing e.g.<br>
fluctuations in stock prices would be a non-goal, if these are stored<br>
as {"stock name", value} key/value pairs. One could still implement<br>
such a thing by using a more sophisticated model, just don't expect to<br>
be able to see all intermediary values each entry has ever had since<br>
the key was first used.<br></blockquote><div><br><br>A Continuous Query listens to key/value data using a query. Should it not be expected to <br>see all the intermediary values when changes on the server cause an entry to start or stop matching <br>the query?<br></div></div></div></div>
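<div dir="ltr"><br>To make the squashed vs. full history distinction from this thread concrete, here is a minimal sketch in plain Python. It is not Infinispan code: the event tuples, the key names and the helpers squash() and peak() are all hypothetical, made up for illustration. squash() keeps only the last event per key, which is enough to synchronize the final state of a replica, while peak() answers the "peak value during the day" kind of question and therefore needs every intermediary value.<br>
<pre>
```python
# Hypothetical change events: (key, operation, value). Not an Infinispan API.
events = [
    ("stock:ACME", "write", 10),
    ("stock:ACME", "update", 12),
    ("stock:ACME", "update", 11),
    ("stock:ACME", "delete", None),
    ("stock:INIT", "write", 5),
    ("stock:INIT", "update", 7),
]


def squash(events):
    """Squashed history: keep only the most recent event for each key."""
    latest = {}
    for key, op, value in events:
        latest[key] = (op, value)  # later events overwrite earlier ones
    return latest


def peak(events, key):
    """Full history: highest value the key ever held, or None if it had none."""
    values = [v for k, op, v in events if k == key and op != "delete"]
    return max(values) if values else None


squashed = squash(events)
print(squashed["stock:ACME"])      # only the final "delete" survives
print(peak(events, "stock:ACME"))  # needs the intermediary updates
```
</pre>
Once the history is squashed, the peak of stock:ACME can no longer be recovered from it, which is the trade-off being discussed.<br></div>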