Lukas,

That's excellent news. Multiple backends are not something we really want to deal with. Also, it might be nice to see a short presentation on "best practices" for Tx handling. But then again, is that Tx stuff handled at the Gremlin level? If so, it is perhaps not relevant to direct C* consumers like Alerts.


On 2/23/2016 12:43 PM, Lukas Krejci wrote:
Hi all,

lately I've become really dissatisfied with how Inventory performed and
semi-publicly blamed Titan for that (because that was what looked like
the cause of all the world's problems in my then-uneducated eyes ;) ).

I decided to do some performance comparisons. Because we didn't want 
Hawkular to ship with 2 different NoSQL backends (C* for metrics and 
whatever else for Inventory), I chose an RDBMS as a good conservative 
alternative (because people, IMHO, are still more comfortable dealing 
with an RDBMS than with NoSQL databases).

Currently, inventory is written against the graph DSL called Gremlin 
(from Tinkerpop 2.6.0). Fortunately, there exists a "toy" SQL backend 
for Tinkerpop 2 that we could try to see whether it performed well at all
(which would frankly be surprising, given the fact it stores the graph 
data rather naively). With some luck, no code would have to be changed 
on our side to see the results.

We had no such luck.

Making Inventory run with the SQL backend was literally a day's worth
of work (if that), and the first preliminary tests showed that Inventory
with a Postgres backend performed much, much better than Titan with
embedded Cassandra. But the tests also uncovered some problems with the 
way Inventory code handled transactions.

Fast forward 3 weeks, and large parts of Hawkular Inventory have been
updated to handle transactions correctly. Now a single call to Inventory
really results in at most 1 transaction in the backend.

So, I went and re-ran the tests. Also, I refrained from using embedded 
Cassandra and instead used a locally running 2-node cluster.

The results caught me by surprise. What surprised me was not so much
that the naive SQL backend didn't perform particularly well, but the
difference in Titan's performance before and after the transaction
handling fixes.

To not keep you waiting any longer for the results: Titan + C* is the 
winner.

For nice charts that include comparison to the old misbehaving impl, see:
https://dashboards.ly/ua-tALzrY9rEoRBXvsLXbZJHT

Cheers,