Lukas, do you have any insight into how Titan supports transactions with Cassandra? I know that Titan still uses the Thrift API, which presumably rules out the lightweight transactions introduced in C* 2.0. Those are not to be confused with ACID transactions; a lightweight transaction is more of an atomic compare-and-set update using a consensus protocol (Paxos). Then you have atomic, or logged, batches, which are atomic in the sense that either all of the statements will succeed or none will; however, they may only succeed *eventually*. They have more overhead than unlogged batches because the mutations are first written to a batch log. Finally, applying multiple mutations to the same partition is an atomic operation. In other words, whether you update 1 or 50 columns, if it is done within the same partition (and within the same operation), it is atomic.
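
To make the "succeed eventually" part concrete, here is a rough toy model of a logged batch in Python. It is only a sketch of the idea (write the batch to a log first, apply the mutations, replay the log if the coordinator dies halfway); the ToyCoordinator class, its method names and the fail_after parameter are all made up for illustration and are not how Cassandra or Titan implement it.

```python
class ToyCoordinator:
    """Toy model of a logged batch; hypothetical, not Cassandra's real code."""

    def __init__(self):
        self.batchlog = []  # batches recorded but not yet fully applied
        self.table = {}     # (partition_key, column) -> value

    def logged_batch(self, mutations, fail_after=None):
        # Step 1: record the whole batch before touching any data.
        batch = list(mutations)
        self.batchlog.append(batch)
        # Step 2: apply the mutations; fail_after simulates a crash partway.
        for i, (key, column, value) in enumerate(batch):
            if fail_after is not None and i >= fail_after:
                return False  # "crashed"; the batch stays in the log
            self.table[(key, column)] = value
        # Step 3: everything was applied, so the log entry can be dropped.
        self.batchlog.remove(batch)
        return True

    def replay(self):
        # Re-apply every batch still in the log. Writes are idempotent,
        # so re-applying an already applied mutation is harmless.
        while self.batchlog:
            for key, column, value in self.batchlog.pop(0):
                self.table[(key, column)] = value


coordinator = ToyCoordinator()
# The batch spans two partitions and "crashes" after the first mutation.
coordinator.logged_batch([("p1", "a", 1), ("p2", "b", 2)], fail_after=1)
# At this point only ("p1", "a") is visible: no isolation, not ACID.
coordinator.replay()
# After the replay both mutations are visible: all-or-nothing, eventually.
```

Between the crash and the replay a reader can see half of the batch, which is exactly why I would not call this a transaction.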

My point being that Cassandra is definitely not a transactional data store, so I am really curious about what Titan is doing.

On Feb 23, 2016, at 1:48 PM, Lukas Krejci <lkrejci@redhat.com> wrote:



On 02/23/2016 07:42 PM, Jay Shaughnessy wrote:

Lukas,

That's excellent news.  Multiple backends is not something we really
want to deal with.  Also, it might be nice to see a short presentation
on the "best practices" for Tx handling.  But then again, that Tx stuff
is handled at the Gremlin level?  So, perhaps not relevant to direct C*
consumers like Alerts.


You're right. Inventory uses Gremlin to handle transactions, so it 
doesn't directly "see" what Titan is doing behind the scenes.

On 2/23/2016 12:43 PM, Lukas Krejci wrote:
Hi all,

lately I've become really dissatisfied with how Inventory performed and
semi-publicly blamed Titan for that (because that was what looked like
the cause of all the world's problems in my then-uneducated eyes ;) ).

I decided to do some performance comparisons. Because we didn't want
Hawkular to ship with 2 different NoSQL backends (C* for metrics and
whatever else for Inventory), I chose an RDBMS as a good conservative
alternative (because people, IMHO, are still more comfortable dealing
with an RDBMS than with NoSQL databases).

Currently, inventory is written against the graph DSL called Gremlin
(from Tinkerpop 2.6.0). Fortunately, there exists a "toy" SQL backend
for Tinkerpop 2 that we could try to see if it performed at all well
(which would frankly be surprising, given the fact it stores the graph
data rather naively). With some luck, no code would have to be changed
on our side to see the results.

We had no such luck.

Making the inventory run with the SQL backend was literally a day's worth
of work (if that) and the first preliminary tests showed that Inventory
with the Postgres backend performed much, much better than Titan with
embedded Cassandra. But the tests also uncovered some problems with the
way Inventory code handled transactions.

Fast forward 3 weeks and see large parts of Hawkular inventory updated
to correctly handle transactions. Now a single call to Inventory really
results in at most 1 transaction in the backend.

So, I went and re-ran the tests. Also, I refrained from using embedded
Cassandra and instead used a locally running 2-node cluster.

The results caught me by surprise. Not so much because the naive SQL
backend didn't perform particularly well, but because of the difference
in Titan's performance before and after the transaction handling fixes.

To not keep you waiting any longer for the results: Titan + C* is the
winner.

For nice charts that include comparison to the old misbehaving impl, see:
https://dashboards.ly/ua-tALzrY9rEoRBXvsLXbZJHT

Cheers,




_______________________________________________
hawkular-dev mailing list
hawkular-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hawkular-dev


-- 
Lukas Krejci