On Fri, Oct 10, 2014 at 8:51 PM, Mircea Markus <mmarkus(a)redhat.com> wrote:
On Oct 10, 2014, at 15:25, Dan Berindei <dan.berindei(a)gmail.com> wrote:
> On Fri, Oct 10, 2014 at 5:20 PM, Mircea Markus <mmarkus(a)redhat.com> wrote:
>
> On Oct 10, 2014, at 15:18, Radim Vansa <rvansa(a)redhat.com> wrote:
>
> > That we should expose that as one method, not forcing people to
> > implement the sum() themselves.
>
> Hmm, isn't the method you mention cache.size()? :-)
>
> Nope, because we decided to make cache.size() precise-but-slow :)
It's not possible to make it precise unless we provide snapshot
isolation/MVCC support.
IMO the formula Tristan provides is a good enough approximation of the size
of the data. And definitely way better than what we currently have.
(Looking at CHM.size() they offer an "accurate" size of the map by
counting it in a loop and making sure that the size is reproducible. I
don't think that's accurate in the general case, though, as you might count
intermediate sizes in that loop).
CHM.size() actually tracks the modCount of each segment and locks all the
segments in the final retry, so the result should be accurate. CHMV8
doesn't do that, instead it keeps a striped counter and doesn't try very
hard to get a reproducible sum from the counter cells.
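To make the difference between the two approaches concrete, here is a rough sketch. It is not the JDK code: Segment, SegmentedSize and StripedSize are made-up names, the segments hold only counters and no entries, and java.util.concurrent.atomic.LongAdder stands in for the counter cells that CHMV8 keeps internally. The first size() retries until two passes see the same modCount sum and locks every segment on the final attempt; the second just sums the stripes.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantLock;

/** Toy segment (hypothetical): tracks only an entry count and a modification counter. */
final class Segment {
    final ReentrantLock lock = new ReentrantLock();
    volatile int modCount; // bumped on every structural change
    volatile int count;    // number of entries in this segment

    /** Pretends a new key was inserted into this segment. */
    void put() {
        lock.lock();
        try {
            modCount++; // publish "something changed" before the new count
            count++;
        } finally {
            lock.unlock();
        }
    }
}

/** Segment-based size: optimistic passes first, lock everything as a last resort. */
final class SegmentedSize {
    static final int RETRIES_BEFORE_LOCK = 2; // assumed value for this sketch

    static long size(Segment[] segments) {
        long previousModSum = -1;
        for (int attempt = 0; attempt <= RETRIES_BEFORE_LOCK; attempt++) {
            long sum = 0;
            long modSum = 0;
            for (Segment s : segments) {
                sum += s.count;
                modSum += s.modCount;
            }
            // Two consecutive passes with the same modCount sum: nothing changed in between.
            if (modSum == previousModSum) {
                return sum;
            }
            previousModSum = modSum;
        }
        // Final attempt: block all writers, then count.
        for (Segment s : segments) {
            s.lock.lock();
        }
        try {
            long sum = 0;
            for (Segment s : segments) {
                sum += s.count;
            }
            return sum;
        } finally {
            for (Segment s : segments) {
                s.lock.unlock();
            }
        }
    }
}

/** Striped-counter size: cheap to update, but sum() is not an atomic snapshot. */
final class StripedSize {
    private final LongAdder counter = new LongAdder();

    void put() {
        counter.increment();
    }

    long size() {
        return counter.sum(); // concurrent increments may or may not be included
    }
}
```

With no concurrent writers both variants return the same number; they differ only in what they promise while writes are in flight.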
I thought we concluded size() should be stable (and accurate) when there is
no write activity, and the way to implement that is with the entry
iterator. The result of Tristan's formula can change without any write
activity on the cache, just because there is a state transfer in progress.
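A toy model of the state transfer case, with several assumptions: nodes are plain Set<String> instances, the plain sum of per-node sizes stands in for any estimate built from container sizes (it is not Tristan's actual formula, which is not reproduced here), and countByIteration only mimics an iterator that reports each key once. SizeDuringTransfer is a made-up name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model (hypothetical): each node is just the set of keys it currently holds. */
final class SizeDuringTransfer {

    /** Stand-in for an estimate derived from per-node container sizes. */
    static long sumOfContainerSizes(List<Set<String>> nodes) {
        long total = 0;
        for (Set<String> node : nodes) {
            total += node.size();
        }
        return total;
    }

    /**
     * Stand-in for counting with an entry iterator that reports each key once.
     * A real implementation would not collect every key in memory like this.
     */
    static long countByIteration(List<Set<String>> nodes) {
        Set<String> seen = new HashSet<>();
        for (Set<String> node : nodes) {
            seen.addAll(node);
        }
        return seen.size();
    }
}
```

If a key has been copied to its new owner but not yet removed from the old one, sumOfContainerSizes goes up by one while countByIteration stays the same, even though nobody wrote to the cache.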
For monitoring tools I'd rather have separate methods entriesInMemory() ->
sum(dataContainer.size()) and entriesInStores() -> sum(cacheStore.size())
that are allowed to rise and fall as nodes join and leave the cache.
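Roughly what I have in mind, as a sketch only: NodeView, FixedNode and ClusterStats are hypothetical names, not existing Infinispan types, and the two accessors simply stand for dataContainer.size() and cacheStore.size() on each node.

```java
import java.util.List;

/** Hypothetical per-node view; not an Infinispan interface. */
interface NodeView {
    int dataContainerSize(); // stands for dataContainer.size() on that node
    int cacheStoreSize();    // stands for cacheStore.size() on that node
}

/** Trivial implementation with fixed numbers, for illustration only. */
final class FixedNode implements NodeView {
    private final int inMemory;
    private final int inStore;

    FixedNode(int inMemory, int inStore) {
        this.inMemory = inMemory;
        this.inStore = inStore;
    }

    @Override
    public int dataContainerSize() {
        return inMemory;
    }

    @Override
    public int cacheStoreSize() {
        return inStore;
    }
}

/** Monitoring-style sums; the results move as nodes join and leave. */
final class ClusterStats {

    static long entriesInMemory(List<? extends NodeView> nodes) {
        long total = 0;
        for (NodeView node : nodes) {
            total += node.dataContainerSize(); // an entry held by two nodes counts twice
        }
        return total;
    }

    static long entriesInStores(List<? extends NodeView> nodes) {
        long total = 0;
        for (NodeView node : nodes) {
            total += node.cacheStoreSize();
        }
        return total;
    }
}
```

These are plain sums over whatever members are present at the time of the call, so they make no attempt to be stable or to count each entry only once.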
Cheers
Dan