This is an interesting analysis, and the work you did with the window operator is awesome.
It is the first non-time-boundary example I have seen, and trust me when I say that I have
been looking for one :) I am curious what percentage of the response time is taken by
the object mapping. I am inclined to think that these numbers are closer to a best-case
scenario in terms of read performance.
How many SSTables did you have on disk? The more SSTables you have to touch, the more
I/O you do, which most likely means higher latencies.
Was there any other load on Cassandra? If you were putting additional load on Cassandra
that triggered GC, any pauses would affect latencies as well.
How many rows were you fetching? The driver’s default page size is 5000. To put that into
perspective, consider a metric collected once a minute: a single page would hold a little
more than 3 days’ worth of data points. We will also want to consider the impact of
HWKMETRICS-189, which will partition each time series by day. Numbers will be different
with those changes, and they will also change based on cluster topology.
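To make the page-size arithmetic above concrete, here is a small Java sketch. It assumes one row per data point, a fixed collection interval and no gaps; the class and method names are made up for illustration. It shows how much time a 5000-row page spans at a one-minute resolution, and how many points a one-day partition would hold at that same resolution:

```java
import java.time.Duration;

public class PageCoverage {
    // Time covered by one page, assuming one row per data point collected
    // at a fixed interval with no gaps (hypothetical helper).
    static Duration pageSpan(int pageSize, Duration collectionInterval) {
        return collectionInterval.multipliedBy(pageSize);
    }

    // Maximum number of points in one partition of the given span
    // (e.g. one day) at the given collection interval (hypothetical helper).
    static long pointsPerPartition(Duration partitionSpan, Duration collectionInterval) {
        return partitionSpan.toMillis() / collectionInterval.toMillis();
    }

    public static void main(String[] args) {
        Duration oneMinute = Duration.ofMinutes(1);
        Duration span = pageSpan(5000, oneMinute); // driver default page size
        System.out.printf("one page spans %d minutes (%.2f days)%n",
                span.toMinutes(), span.toMinutes() / 1440.0);
        System.out.println("points per day partition: "
                + pointsPerPartition(Duration.ofDays(1), oneMinute));
    }
}
```

5000 minutes is about 3.47 days. A one-day partition at this resolution holds at most 1440 points, which is fewer than one default page.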
On Aug 27, 2015, at 10:15 AM, Thomas Segismont <tsegismo(a)redhat.com> wrote:
Hi,
Recently, we've ported the stats computation code, a part that was not yet fully
"reactive", to RxJava. The intention was to be able to compute statistics on raw data
over long periods of time with low memory requirements (the initial implementation
needed all data loaded at once into a list).
If you're interested in the $subject and don't know what stats computation is,
it's an API that Hawkular Metrics provides to give the user stats (min/max/avg/median/95th
percentile) on raw data. The stats are given in buckets (portions of time). For example,
it's the API you use if you want monthly stats on a year of data, hourly stats for
a day, etc.
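As a rough illustration of what the stats for one bucket involve, here is a hypothetical Java helper. It computes min/max/avg/median/95th percentile over the values of a single bucket. It uses the nearest-rank percentile definition, which is an assumption on my part; Hawkular Metrics may use a different estimator. Exact median and percentiles need all of a bucket's values (sorted here), so per-bucket collection is still needed even when the overall pipeline is streamed:

```java
import java.util.Arrays;

public class BucketPointStats {
    // Nearest-rank percentile (assumed definition): the smallest value with at
    // least p% of the points at or below it. Integer math computes ceil(p * n / 100).
    static double percentile(double[] sorted, int p) {
        int rank = (p * sorted.length + 99) / 100;
        return sorted[Math.max(rank, 1) - 1];
    }

    // Returns {min, max, avg, median, p95} for the values of one bucket.
    static double[] stats(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty bucket");
        }
        double[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        double sum = 0;
        for (double v : sorted) {
            sum += v;
        }
        return new double[] {
            sorted[0],
            sorted[sorted.length - 1],
            sum / sorted.length,
            percentile(sorted, 50),
            percentile(sorted, 95)
        };
    }

    public static void main(String[] args) {
        double[] bucket = new double[20];
        for (int i = 0; i < bucket.length; i++) {
            bucket[i] = i + 1; // values 1.0 .. 20.0
        }
        System.out.println(Arrays.toString(stats(bucket))); // [1.0, 20.0, 10.5, 10.0, 19.0]
    }
}
```

With nearest rank and an even count, the "median" is the lower of the two middle values (10.0 here) rather than their mean.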
The implementation makes use of the Observable#groupBy operator. This lets us determine
the bucket each data point belongs to; we then #collect the points in each bucket to
compute the stats.
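The groupBy-then-collect shape can be sketched without RxJava using java.util.stream. There, Collectors.groupingBy plays the role of Observable#groupBy and a downstream collector plays the role of #collect. This is a stand-in for illustration, not the actual Hawkular code. The DataPoint class and the bucket arithmetic are hypothetical, and only min/max/avg are reduced:

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class BucketStats {
    // Hypothetical data point: timestamp in milliseconds and a numeric value.
    static final class DataPoint {
        final long timestamp;
        final double value;

        DataPoint(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Index of the fixed-size, non-overlapping bucket a point falls into.
    static long bucketIndex(DataPoint p, long start, long bucketSizeMs) {
        return (p.timestamp - start) / bucketSizeMs;
    }

    // "groupBy" each point into its bucket, then "collect" each group into
    // min/max/avg. A TreeMap keeps the buckets in time order.
    static Map<Long, DoubleSummaryStatistics> stats(List<DataPoint> points, long start, long bucketSizeMs) {
        return points.stream().collect(Collectors.groupingBy(
                p -> bucketIndex(p, start, bucketSizeMs),
                TreeMap::new,
                Collectors.summarizingDouble(p -> p.value)));
    }

    public static void main(String[] args) {
        List<DataPoint> points = List.of(
                new DataPoint(0, 1.0), new DataPoint(30_000, 3.0),
                new DataPoint(60_000, 10.0), new DataPoint(90_000, 20.0));
        stats(points, 0, 60_000).forEach((bucket, s) -> System.out.println(
                bucket + ": min=" + s.getMin() + " max=" + s.getMax() + " avg=" + s.getAverage()));
    }
}
```

Grouping does not depend on the order in which points arrive, but it keeps every open group alive until the input ends.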
https://github.com/hawkular/hawkular-metrics/blob/master/core/metrics-cor...
While we were discussing a related topic (stats for sliding windows), John wondered whether
a solution based on Observable#window would perform as well.
So I implemented this solution and instrumented the code to determine where the
system spends its execution time. Then I tried both solutions with different bucket
sizes, metric resolutions, and numbers of buckets.
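Observable#window, by contrast, emits consecutive sub-sequences of the source. The following plain-Java loop is a hypothetical illustration of that shape for fixed-size, non-overlapping buckets, not the patch attached to this mail. Unlike the grouping approach, it assumes the points arrive sorted by timestamp, and it closes each window as soon as a point falls into a new bucket:

```java
import java.util.ArrayList;
import java.util.List;

public class ContiguousWindows {
    // Splits timestamps (assumed sorted ascending) into consecutive,
    // non-overlapping windows, one per non-empty fixed-size bucket.
    static List<List<Long>> windows(List<Long> sortedTimestamps, long start, long bucketSizeMs) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBucket = 0;
        for (long ts : sortedTimestamps) {
            long bucket = (ts - start) / bucketSizeMs;
            // Boundary reached: close the open window before starting a new one.
            if (!current.isEmpty() && bucket != currentBucket) {
                result.add(current);
                current = new ArrayList<>();
            }
            currentBucket = bucket;
            current.add(ts);
        }
        if (!current.isEmpty()) {
            result.add(current); // close the last window
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> timestamps = List.of(0L, 30_000L, 60_000L, 200_000L);
        System.out.println(windows(timestamps, 0, 60_000)); // [[0, 30000], [60000], [200000]]
    }
}
```

Empty buckets (bucket 2 in the example) produce no window in this sketch. A caller that wants one entry per bucket would have to fill the gaps.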
Here are my observations.
- #groupBy and #window behave about the same in our case (I tried from a few buckets
up to a thousand buckets).
- Response time increases linearly with the number of data points loaded from C*.
Response time here means the time elapsed between the moment WildFly invokes the JAX-RS
handler method and the moment we resume the AsyncResponse.
- The bottleneck is data loading and mapping: 95% of execution time. C* Row-to-object
mapping accounts for nearly 50% of that.
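Taking the figures above at face value, the share of total response time spent on Row-to-object mapping works out to just under half:

$$
\frac{t_{\text{mapping}}}{t_{\text{response}}} \approx 0.95 \times 0.50 = 0.475
$$

Because the mapping share is "nearly" rather than exactly 50%, this is an upper estimate.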
And my conclusions.
- When working with non-overlapping buckets (not sliding windows), readability, not
performance, should determine whether to use #groupBy or #window.
- It seems possible to get better response times. Currently we transform a ResultSet into
an Observable<Row> with a simple call to Observable#from (ResultSet is an
Iterable<Row>). By default, the C* driver fetches a page of data and only fetches the
next one once the current page has been entirely consumed. But ResultSet has a
#fetchMoreResults method, which we could use to fetch pages ahead of time (see
http://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/Res...).
Then, while Rx computation threads spend time mapping each Row to a
DataPoint<T>, the C* driver could load more Rows.
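The read-ahead idea can be simulated with the JDK alone. In the sketch below, the loadPage function is a hypothetical stand-in for the driver, not the DataStax API. When the number of buffered rows drops to a threshold, the next page is requested asynchronously, so loading overlaps with the consumer's per-row work (the Row to DataPoint<T> mapping in our case). Against the real driver, ResultSet also exposes getAvailableWithoutFetching() and isFullyFetched(), which could in principle serve as the same trigger before calling fetchMoreResults():

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntFunction;

// Simulated read-ahead paging. loadPage is hypothetical and returns an empty
// list past the last page; rows must be non-null (ArrayDeque rejects nulls).
public class PrefetchingIterator<T> implements Iterator<T> {
    private final IntFunction<List<T>> loadPage;
    private final int threshold;     // prefetch once this many rows or fewer are buffered
    private final Executor executor; // runs page loads off the consumer thread
    private final Deque<T> buffer = new ArrayDeque<>();
    private CompletableFuture<List<T>> pending;
    private int nextPage = 0;
    private boolean exhausted = false;

    PrefetchingIterator(IntFunction<List<T>> loadPage, int threshold, Executor executor) {
        if (threshold < 0) throw new IllegalArgumentException("threshold must be >= 0");
        this.loadPage = loadPage;
        this.threshold = threshold;
        this.executor = executor;
        startFetch(); // request the first page right away
    }

    private void startFetch() {
        int page = nextPage++;
        pending = CompletableFuture.supplyAsync(() -> loadPage.apply(page), executor);
    }

    // Moves a finished page into the buffer; waits for it only when asked to block.
    private void drainPending(boolean block) {
        if (pending == null || (!block && !pending.isDone())) return;
        List<T> page = pending.join();
        pending = null;
        if (page.isEmpty()) exhausted = true;
        else buffer.addAll(page);
    }

    @Override
    public boolean hasNext() {
        while (true) {
            drainPending(buffer.isEmpty()); // block only if nothing is left to consume
            // Read ahead: start the next load while buffered rows are still being processed.
            if (!exhausted && pending == null && buffer.size() <= threshold) startFetch();
            if (!buffer.isEmpty()) return true;
            if (exhausted) return false;
        }
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return buffer.poll();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Simulated source: pages 0..3 hold three rows each, page 4 is empty.
        IntFunction<List<Integer>> loader = page -> page < 4
                ? List.of(page * 3, page * 3 + 1, page * 3 + 2)
                : List.of();
        PrefetchingIterator<Integer> rows = new PrefetchingIterator<>(loader, 1, pool);
        int count = 0;
        while (rows.hasNext()) {
            rows.next(); // per-row mapping work would happen here
            count++;
        }
        pool.shutdown();
        System.out.println(count + " rows");
    }
}
```

The threshold trades memory for overlap. A larger value starts the next load earlier but may hold up to threshold plus one page of rows at once.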
Attached are:
- a patch of the changes implemented for testing
- some of the results
Regards,
Thomas
<getXXXStats_Perf.iterable+window.txt> <getXXXStats_Perf.patch> <getXXXStats_Perf.toList+groupBy.txt> <getXXXStats_Perf.toList+window.txt>
_______________________________________________
hawkular-dev mailing list
hawkular-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hawkular-dev