<div dir="ltr">On Mon, Dec 12, 2016 at 3:13 PM, Sanne Grinovero <span dir="ltr">&lt;<a href="mailto:sanne@infinispan.org" target="_blank">sanne@infinispan.org</a>&gt;</span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">In short, what&#39;s the ultimate goal? I see two main but different<br>

options intertwined:<br>

 - allow to synchronize the *final state* of a replica<br></blockquote><div><br>I&#39;m assuming this case is already covered by remote listeners with includeCurrentState=true, and that we are<br></div><div>discussing how to improve it, as described in the proposal on the wiki and in the 5th email of this thread.<br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

 - inspect specific changes<br>

<br>

For the first case, it would be enough for us to be able to provide a<br>

&quot;squashed history&quot; (as in Git squash), but we&#39;d need to keep versioned<br>

snapshots around and someone needs to tell you which ones can be<br>

garbage collected.<br>

For example when a key is: written, updated, updated, deleted since<br>

the snapshot, we&#39;ll send only &quot;deleted&quot; as the intermediary states are<br>

irrelevant.<br>

For the second case, say the goal is to inspect fluctuations of price<br>

variations of some item, then the intermediary states are not<br>

irrelevant.<br>

<br>

Which one will we want to solve? Both?<br></blockquote><div> </div><div><br></div><div>Looking at <a href="http://debezium.io/" target="_blank">http://debezium.io/</a>, it implies the second case.<br><br>&quot;[...] Start it up, point it at your databases, and your apps can start 

responding to all of the inserts, updates, <br>and deletes that other apps 

commit to your databases. [...] your apps can

 respond quickly and never miss an event, <br>even when things go wrong.&quot;<br><br></div><div>IMO the choice between squashed and full history, and even the retention time, is highly application-specific. Deletes might <br>not even be involved: one may be interested in answering &quot;what is the peak value of a certain key during the day?&quot;<br><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Personally the attempt of solving the second one seems like a huge<br>

pivot of the project, the current data-structures and storage are not<br>

designed for this. </blockquote><div><br><br></div><div>+1, as I wrote earlier when suggesting we ditch the idea of event cache storage in favor of Lucene.<br></div><div><br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I see the value of such benefits, but maybe<br>

Infinispan is not the right tool for such a problem.<br>

<br>

I&#39;d prefer to focus on the benefits of the squashed history, and have<br>

versioned entries soon, but even in that case we need to define which<br>

versions need to be kept around, and how garbage collection /<br>

vacuuming is handled.<br></blockquote><div><br></div><div>Is that proposal written down somewhere? It&#39;d be interesting to know how a client interested in data <br>changes would consume those multi-versioned entries (push/pull with offset?, sorted/unsorted?, client tracking/per key/per version?), <br>as it seems there is some storage impedance mismatch as well.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

In short, I&#39;d like to see an agreement that analyzing e.g.<br>

fluctuations in stock prices would be a non-goal, if these are stored<br>

as {&quot;stock name&quot;, value} key/value pairs. One could still implement<br>

such a thing by using a more sophisticated model, just don&#39;t expect to<br>

be able to see all intermediary values each entry has ever had since<br>

the key was first used.<br></blockquote><div><br><br>Continuous Queries listen to key/value data using a query. Shouldn&#39;t they be expected to <br>see all the intermediary values when changes on the server cause an entry to start/stop matching <br>the query?<br></div></div></div></div>
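<div dir="ltr"><br>For illustration only, the difference between the two cases discussed in this thread can be sketched with a toy Python model. This is not Infinispan code: the event tuples and the names squash and peak are hypothetical, and the log is simply an in-memory list ordered by time. It shows that a squashed history is enough to synchronize the final state, while a question like the daily peak value needs the intermediary states.<br></div>
<pre>
```python
def squash(events):
    """Collapse an ordered event log to the last event seen per key."""
    latest = {}
    for key, op, value in events:
        # A later event for the same key overwrites the earlier one.
        latest[key] = (op, value)
    return latest


def peak(events, key):
    """Highest value ever written to the key; needs the full history."""
    values = [v for k, op, v in events if k == key and op != "deleted"]
    return max(values) if values else None


# Hypothetical log: a key is written, updated twice, then deleted.
events = [
    ("ACME", "written", 10),
    ("ACME", "updated", 42),
    ("ACME", "updated", 17),
    ("ACME", "deleted", None),
]

squashed = squash(events)        # only the final "deleted" event survives
full_peak = peak(events, "ACME") # the peak is recoverable from the full log
```
</pre>
<div dir="ltr">In this sketch the squashed view keeps a single "deleted" entry for the key, so the peak value can only be answered from the full log.<br></div>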