On Fri, Oct 10, 2014 at 8:51 PM, Mircea Markus <mmarkus@redhat.com> wrote:
> On Oct 10, 2014, at 15:25, Dan Berindei <dan.berindei@gmail.com> wrote:
>
> > On Fri, Oct 10, 2014 at 5:20 PM, Mircea Markus <mmarkus@redhat.com> wrote:
> >
> > > On Oct 10, 2014, at 15:18, Radim Vansa <rvansa@redhat.com> wrote:
> > >
> > > > That we should expose that as one method, not forcing people to
> > > > implement the sum() themselves.
> > >
> > > Hmm, isn't the method you mention cache.size()? :-)
> >
> > Nope, because we decided to make cache.size() precise-but-slow :)
>
> It's not possible to make it precise unless we provide snapshot isolation/MVCC support.
> IMO the formula Tristan provides is a good-enough approximation of the size of the data, and definitely much better than what we currently have.
> (Looking at CHM.size(), it offers an "accurate" size of the map by counting in a loop and making sure the count is reproducible. I don't think that's accurate in the general case, though, as you might count intermediate sizes in that loop.)

CHM.size() actually tracks the modCount of each segment and locks all the segments in the final retry, so the result should be accurate. CHMV8 doesn't do that; instead, it keeps a striped counter and doesn't try very hard to get a reproducible sum from the counter cells.
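
A simplified sketch of the retry-then-lock idea, written from the description above rather than copied from the JDK source: each segment keeps a count and a modCount, size() makes a few unlocked passes and accepts a sum once two consecutive passes see the same total modCount, and otherwise locks every segment. `SegmentedCounter`, `Segment` and `UNLOCKED_ATTEMPTS` are illustrative names, not JDK API.

```java
import java.util.concurrent.locks.ReentrantLock;

// Simplified sketch of the JDK 7 ConcurrentHashMap.size() strategy (not the JDK source).
class SegmentedCounter {
    // One segment: an element count and a modification counter, guarded by its own lock.
    static final class Segment extends ReentrantLock {
        volatile int count;
        volatile int modCount;

        void add(int delta) {
            lock();
            try {
                count += delta;
                modCount++; // every write bumps modCount so readers can detect it
            } finally {
                unlock();
            }
        }
    }

    // Number of unlocked passes to try before locking everything (illustrative value).
    static final int UNLOCKED_ATTEMPTS = 3;

    private final Segment[] segments;

    SegmentedCounter(int segmentCount) {
        segments = new Segment[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            segments[i] = new Segment();
        }
    }

    // Route a write to the segment that owns the key.
    void add(int key, int delta) {
        segments[Math.floorMod(key, segments.length)].add(delta);
    }

    long size() {
        long previousModSum = -1;
        for (int attempt = 0; attempt < UNLOCKED_ATTEMPTS; attempt++) {
            long sum = 0;
            long modSum = 0;
            for (Segment s : segments) {
                sum += s.count;
                modSum += s.modCount;
            }
            // Two consecutive passes saw the same total modCount, so no write
            // was detected between them: accept this pass's sum.
            if (modSum == previousModSum) {
                return sum;
            }
            previousModSum = modSum;
        }
        // Still racing with writers: lock all segments (in a fixed order) so counts cannot change.
        for (Segment s : segments) {
            s.lock();
        }
        try {
            long sum = 0;
            for (Segment s : segments) {
                sum += s.count;
            }
            return sum;
        } finally {
            for (Segment s : segments) {
                s.unlock();
            }
        }
    }
}
```

For contrast, the JDK 8 ConcurrentHashMap (which CHMV8 backports) sums LongAdder-style counter cells, and the `LongAdder.sum()` javadoc states that the result is not an atomic snapshot: updates made while the sum is being computed might not be included.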

I thought we concluded that size() should be stable (and accurate) when there is no write activity, and that the way to implement it is with the entry iterator. The result of Tristan's formula can change without any write activity on the cache, just because a state transfer is in progress.
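
A toy model of the state-transfer point, under loud assumptions: the thread doesn't spell out Tristan's formula, so a plain per-node sum with one owner per key stands in for "a sum of per-node counts", and a `HashSet` stands in for whatever deduplication a real distributed entry iterator would do. Each node is a `Map`, not an Infinispan DataContainer, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical, not Infinispan's API): each Map stands in for one node's
// local data container, with a single owner per key outside of state transfer.
class SizeDuringStateTransfer {

    // Per-node sum: every copy a node currently holds is counted.
    static int sumOfContainerSizes(List<Map<String, String>> nodes) {
        int total = 0;
        for (Map<String, String> node : nodes) {
            total += node.size();
        }
        return total;
    }

    // Iterator-style count: visit every entry on every node, count each key once.
    static int distinctKeyCount(List<Map<String, String>> nodes) {
        Set<String> seen = new HashSet<>();
        for (Map<String, String> node : nodes) {
            seen.addAll(node.keySet());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        Map<String, String> nodeA = new HashMap<>();
        Map<String, String> nodeB = new HashMap<>();
        nodeA.put("k1", "v1");
        nodeA.put("k2", "v2");
        nodeB.put("k3", "v3");
        List<Map<String, String>> nodes = Arrays.asList(nodeA, nodeB);
        report("stable", nodes);

        // State transfer moves k2 from A to B: B receives a copy before A drops its own.
        // No application write happens during these steps.
        nodeB.put("k2", nodeA.get("k2"));
        report("in transfer", nodes);

        nodeA.remove("k2");
        report("done", nodes);
    }

    private static void report(String phase, List<Map<String, String>> nodes) {
        System.out.println(phase + ": sum=" + sumOfContainerSizes(nodes)
                + " distinct=" + distinctKeyCount(nodes));
    }
}
```

Running main prints sum=3, then sum=4, then sum=3, while distinct stays at 3 throughout: the per-node sum moves with no writes at all, which is the behaviour the iterator-based size() is meant to avoid.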

For monitoring tools, I'd rather have separate methods, entriesInMemory() -> sum(dataContainer.size()) and entriesInStores() -> sum(cacheStore.size()), that are allowed to rise and fall as nodes join and leave the cache.
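
A minimal sketch of what those two monitoring methods could look like, assuming a hypothetical per-node `NodeStats` view: the names only mirror the shorthand above and are not Infinispan's actual DataContainer or cache store interfaces. Both methods are plain sums over the current members with no consistency promise.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical per-node statistics; names mirror the mail's shorthand, not a real SPI.
interface NodeStats {
    long dataContainerSize();

    long cacheStoreSize();
}

// Fixed numbers, handy for tests and demos.
final class FixedNodeStats implements NodeStats {
    private final long inMemory;
    private final long inStore;

    FixedNodeStats(long inMemory, long inStore) {
        this.inMemory = inMemory;
        this.inStore = inStore;
    }

    @Override
    public long dataContainerSize() {
        return inMemory;
    }

    @Override
    public long cacheStoreSize() {
        return inStore;
    }
}

// Hypothetical monitoring view: backup copies and entries in flight during state
// transfer are all counted, so values may rise and fall as nodes join and leave.
final class ClusterMonitor {
    private final List<NodeStats> members;

    ClusterMonitor(List<NodeStats> members) {
        this.members = members;
    }

    // sum(dataContainer.size())
    long entriesInMemory() {
        long total = 0;
        for (NodeStats n : members) {
            total += n.dataContainerSize();
        }
        return total;
    }

    // sum(cacheStore.size())
    long entriesInStores() {
        long total = 0;
        for (NodeStats n : members) {
            total += n.cacheStoreSize();
        }
        return total;
    }

    public static void main(String[] args) {
        ClusterMonitor monitor = new ClusterMonitor(Arrays.<NodeStats>asList(
                new FixedNodeStats(10, 40), new FixedNodeStats(12, 38)));
        System.out.println("inMemory=" + monitor.entriesInMemory()
                + " inStores=" + monitor.entriesInStores());
    }
}
```

Keeping these separate from size() lets monitoring stay cheap and approximate while size() keeps the stable, iterator-based contract.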

Cheers
Dan