In short, what's the ultimate goal? I see two main but distinct
goals intertwined:
- synchronize the *final state* of a replica
- inspect specific changes
For the first case, it would be enough for us to provide a "squashed
history" (as in Git squash), but we'd need to keep versioned
snapshots around, and someone needs to tell us which ones can be
garbage collected.
For example, when a key has been written, updated, updated and then
deleted since the snapshot, we'll send only "deleted", as the
intermediary states are irrelevant.
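
To make the squashing idea concrete, here is a minimal sketch in Java.
It is not how Infinispan works today: the Change type, the Op enum and
the in-memory change log are all hypothetical names made up for this
illustration. It only shows that replaying a log and keeping the last
operation per key yields the "deleted" result described above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SquashedHistory {

    // Hypothetical operation types recorded since the snapshot.
    enum Op { WRITTEN, UPDATED, DELETED }

    // One entry of a hypothetical change log (not an Infinispan type).
    static final class Change {
        final String key;
        final Op op;
        final String value; // null for DELETED

        Change(String key, Op op, String value) {
            this.key = key;
            this.op = op;
            this.value = value;
        }
    }

    // Squash the log: a later change to a key replaces any earlier one,
    // so only the final state per key survives.
    static Map<String, Change> squash(List<Change> log) {
        Map<String, Change> latest = new LinkedHashMap<>();
        for (Change c : log) {
            latest.put(c.key, c);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Change> log = Arrays.asList(
            new Change("k", Op.WRITTEN, "v1"),
            new Change("k", Op.UPDATED, "v2"),
            new Change("k", Op.UPDATED, "v3"),
            new Change("k", Op.DELETED, null));

        for (Change c : squash(log).values()) {
            System.out.println(c.key + " -> " + c.op); // prints "k -> DELETED"
        }
    }
}
```

The receiver gets one change per key, which is enough to reach the
final state but says nothing about the values in between.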
For the second case, say the goal is to inspect fluctuations in the
price of some item: then the intermediary states do matter.
Which one will we want to solve? Both?
Personally, I think attempting to solve the second one would be a
huge pivot for the project: the current data structures and storage
are not designed for this.
I see the value of such a feature, but maybe Infinispan is not the
right tool for such a problem.
I'd prefer to focus on the benefits of the squashed history, and have
versioned entries soon, but even in that case we need to define which
versions need to be kept around, and how garbage collection /
vacuuming is handled.
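
As one possible rule, and purely as an assumption for discussion:
suppose someone reports the oldest snapshot version still in use by
any replica. Then, per key, every version older than the one visible
at that snapshot can be vacuumed. The sketch below uses a plain
TreeMap as a stand-in for per-key version storage; VersionVacuum and
vacuum() are hypothetical names, not existing Infinispan API.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class VersionVacuum {

    // Removes the versions of one key that no snapshot can see anymore.
    // 'versions' maps version number -> value for a single key.
    // Returns how many versions were dropped.
    static int vacuum(TreeMap<Long, String> versions, long oldestSnapshotInUse) {
        // The version a reader at the oldest snapshot would observe.
        Long visible = versions.floorKey(oldestSnapshotInUse);
        if (visible == null) {
            return 0; // every version is newer than the snapshot: keep all
        }
        // Everything strictly older than 'visible' is unreachable.
        SortedMap<Long, String> obsolete = versions.headMap(visible);
        int removed = obsolete.size();
        obsolete.clear(); // clearing the view removes from the backing map
        return removed;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> versions = new TreeMap<>();
        versions.put(1L, "a");
        versions.put(3L, "b");
        versions.put(7L, "c");
        versions.put(9L, "d");

        int removed = vacuum(versions, 8L);
        System.out.println(removed + " removed, kept " + versions.keySet());
        // prints "2 removed, kept [7, 9]"
    }
}
```

Who tracks the oldest snapshot in use, and how, is exactly the open
question; the sketch only shows what could be dropped once that is
known.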
So, I'd like to see an agreement that analyzing e.g. fluctuations in
stock prices would be a non-goal if these are stored as
{"stock name", value} key/value pairs. One could still implement such
a thing by using a more sophisticated model; just don't expect to be
able to see all intermediary values each entry has had since the key
was first used.