[Hawkular-dev] [Inventory] Performance of Tinkerpop3 backends

Thomas Heute theute at redhat.com
Thu Sep 8 10:03:09 EDT 2016


I discussed some more with Lukas and looked around a bit.

Titan is basically the link between TinkerPop (graph API) and Cassandra.

I may be jumping to conclusions, but:
 - Datastax (the company behind Cassandra) bought ThinkAurelius (the
company behind Titan)
 - Since then Datastax built a product "inspired by Titan", a graph DB with
TinkerPop and Cassandra. The product is closed-source and completely
targeted to Cassandra. Datastax has no real incentive to maintain Titan as
it competes with their product and all engineers stopped contributing to it.
 - Last release of Titan was in Sept 2015 (they used to release ~ every 3
months)
 - While the community is relatively active, no pull request has been
approved since June, and we don't know of any well-maintained fork
 - It's a fairly large and complex piece of code, too large and too
specialized for us to take over.

Conclusion: in the medium/long term, Titan is no longer an option.

The other concern is that there is then no good solution for storing to
Cassandra, which is the only storage dependency for Hawkular Services today
(and was an important requirement).

So we have to make a difficult choice here but we don't seem to have many
options if we stick with TinkerPop at least...

According to Lukas, Sqlg is a good option for the following reasons:
   - Performance
   - Size/complexity
   - The "community" is really small but the lead developer is responsive
(and in case he stops, it would be easier to fork and maintain).

The big drawback is that for production we would require Postgres (for
non-prod or for Hawkular Services users who don't use the inventory
service, we can use the embedded H2).

Thoughts?

Thomas





On Thu, Jul 21, 2016 at 2:08 PM, Lukas Krejci <lkrejci at redhat.com> wrote:

> Hi all,
>
> to move inventory forward, we need to port it to Tinkerpop3 - a new(ish) and
> actively maintained version of the Tinkerpop graph API.
>
> Apart from the huge improvement in the API expressiveness and capabilities,
> the important thing is that it comes with a variety of backends, 2 of which
> are of particular interest to us ATM. The Titan backend (with Titan in
> version 1.0) and SQL backend (using the sqlg library).
>
> The SQL backend is a much improved (yet still unfinished in terms of
> optimizations and some corner case features) version of the toy SQL backend
> for Tinkerpop2.
>
> Back in March I ran performance comparisons for SQL/postgres and Titan
> (0.5.4) on Tinkerpop2 and concluded that Titan was the best choice then.
>
> After completing a simplistic port of inventory to Tinkerpop3 (not taking
> advantage of any new features or opportunities to simplify inventory
> codebase), I've run the performance tests again for the 2 new backends -
> Titan 1.0 and Sqlg (on postgres).
>
> This time the results are not so clear as the last time.
> From the charts [1] you can see that Postgres is actually quite a bit
> faster on reads and can better handle concurrent read access while Titan
> shines in writes (arguably thanks to Cassandra as its storage).
>
> Of course, I can imagine that the read performance advantage of Postgres
> would decrease with the growing amount of data stored (the tests ran with
> the inventory size of ~10k entities) but I am quite positive we'd get
> competitive read performance from both solutions up to the sizes of
> inventory we anticipate (100k-1M entities).
>
> Now the question is whether the insert performance is something we should
> be worried about in Postgres too much. IMHO, there should be some room for
> improvement in Sqlg and also our move to /sync for agent synchronization
> would make this less of a problem (because there would be not that many
> initial imports that would create vast amounts of entities).
>
> Nevertheless I currently cannot say who is the "winner" here. Each backend
> has its pros and cons:
>
> Titan:
> Pros:
> - high write throughput
> - backed by cassandra
>
> Cons:
> - slower reads
> - project virtually dead
> - complex codebase (self-made fixes unlikely)
>
> Sqlg:
> Pros:
> - small codebase
> - everybody knows SQL
> - faster reads
> - faster concurrent reads
>
> Cons:
> - slow writes
> - another backend needed (Postgres)
>
> Therefore my intention here is to go forward with a "proper" port to
> Tinkerpop3 with Titan still enabled but focus primarily on Sqlg to see if
> we can do anything with the write performance.
>
> IMHO, any choice we make is "workable" as it is even today but we need to
> weigh in the productization requirements. For those Sqlg with its small dep
> footprint and postgres backend seems preferable to the huge dependency
> mess of Titan.
>
> [1] https://dashboards.ly/ua-TtqrpCXcQ3fnjezP5phKhc
>
> --
> Lukas Krejci