[Hawkular-dev] [Inventory] Performance of Tinkerpop3 backends

Lukas Krejci lkrejci at redhat.com
Thu Sep 8 19:00:01 EDT 2016


On Thursday, September 8, 2016 10:31:27 AM CEST Jay Shaughnessy wrote:
> On 9/8/2016 10:13 AM, John Mazzitelli wrote:
> > Considering that the agent has already hit a major problem where Titan is
> > most likely the cause, I do see that we need to move away from it.
> > 
> > If we go to the next option (Sqlg), how difficult would it be to be able
> > to switch backends (embedded H2 versus external Postgres)?
> > 

At the moment, it's a matter of changing the connection string and having the 
appropriate plugins for sqlg (similar to Hibernate dialects) and jdbc drivers 
on the classpath.
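To make that concrete, here is a minimal sketch. It assumes Sqlg reads its JDBC settings from the `jdbc.url`, `jdbc.username` and `jdbc.password` keys, as in a `sqlg.properties` file. The `SqlgBackendConfig` class and its `forBackend` method are hypothetical. The sketch only shows that the connection string is the one thing that changes between backends. The matching Sqlg dialect plugin and JDBC driver still have to be on the classpath.

```java
import java.util.Properties;

/** Hypothetical helper (not part of Sqlg or inventory): builds Sqlg connection settings. */
final class SqlgBackendConfig {

    enum Backend { H2_EMBEDDED, POSTGRES }

    private SqlgBackendConfig() {}

    /**
     * Only the JDBC URL differs between the two backends.
     * Assumption: Sqlg picks these keys up (e.g. from sqlg.properties).
     */
    static Properties forBackend(Backend backend, String location, String user, String password) {
        String url;
        switch (backend) {
            case H2_EMBEDDED:
                // Embedded, file-based H2 database running inside the server process.
                url = "jdbc:h2:file:" + location;
                break;
            case POSTGRES:
                // External Postgres server; location is "host:port/database".
                url = "jdbc:postgresql://" + location;
                break;
            default:
                throw new IllegalArgumentException("Unknown backend: " + backend);
        }
        Properties props = new Properties();
        props.setProperty("jdbc.url", url);
        props.setProperty("jdbc.username", user);
        props.setProperty("jdbc.password", password);
        return props;
    }
}
```

The resulting properties would then be handed to Sqlg, for instance by writing them out as `sqlg.properties`. Everything above the storage layer stays the same whether it points at H2 or Postgres.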

> > And how bad would it be to just support H2 for single-node Hawkular
> > Services installations (even if in production)?
> > 

That's a question I haven't had time to answer yet. From previous testing,
though, I know that H2 is faster than Postgres (it doesn't pay the
serialization cost, it runs in-process, ...). Recently I compared the perf of
Titan and Postgres (Sqlg): writes were quite a bit slower on Postgres, but
reads were actually faster on Postgres. These perf tests weren't really
scaled up, though - the inventory had ~10k resources and a single tenant. In
addition, I tried the H2-backed Sqlg impl with the agent itests and they
consistently pass (unlike with the Titan-backed inventory, with which they
fail most of the time).

We need a truly scaled-up perf test - tens of tenants, hundreds of feeds in
each, all writing and reading in parallel - to see how badly we can handle
that with any solution ;)
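Here is a rough sketch of the shape such a test could take. It assumes a hypothetical `InventoryStore` abstraction, which is not an existing inventory API. In a real run it would wrap the Titan- or Sqlg-backed inventory. Here it is backed by a thread-safe in-memory stand-in so that the harness itself runs. Each (tenant, feed) pair gets one writer task and one reader task, and all of them execute in parallel on a fixed thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical store abstraction; a real run would wrap the Titan or Sqlg backend. */
interface InventoryStore {
    void addResource(String tenant, String feed, String resourceId);
    int countResources(String tenant, String feed);
}

/** Thread-safe in-memory stand-in so the harness can be exercised on its own. */
final class InMemoryStore implements InventoryStore {
    private final ConcurrentHashMap<String, Set<String>> byFeed = new ConcurrentHashMap<>();

    @Override
    public void addResource(String tenant, String feed, String resourceId) {
        byFeed.computeIfAbsent(tenant + "/" + feed, k -> ConcurrentHashMap.newKeySet()).add(resourceId);
    }

    @Override
    public int countResources(String tenant, String feed) {
        Set<String> resources = byFeed.get(tenant + "/" + feed);
        return resources == null ? 0 : resources.size();
    }
}

/** Runs one writer and one reader per (tenant, feed) pair, all in parallel. */
final class ScaledLoadTest {
    private ScaledLoadTest() {}

    /** Returns the elapsed wall-clock time in milliseconds. */
    static long run(InventoryStore store, int tenants, int feedsPerTenant,
                    int resourcesPerFeed, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> futures = new ArrayList<>();
        long start = System.nanoTime();
        try {
            for (int t = 0; t < tenants; t++) {
                for (int f = 0; f < feedsPerTenant; f++) {
                    final String tenant = "tenant-" + t;
                    final String feed = "feed-" + f;
                    // Writer: simulates a feed (agent) reporting its resources.
                    futures.add(pool.submit(() -> {
                        for (int r = 0; r < resourcesPerFeed; r++) {
                            store.addResource(tenant, feed, "resource-" + r);
                        }
                    }));
                    // Reader: queries the same feed while the writer is running.
                    futures.add(pool.submit(() -> {
                        for (int r = 0; r < resourcesPerFeed; r++) {
                            store.countResources(tenant, feed);
                        }
                    }));
                }
            }
            for (Future<?> future : futures) {
                future.get(); // rethrows any failure from a worker task
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

Setting `tenants` in the tens and `feedsPerTenant` in the hundreds would approximate the load described above. The in-memory timings mean nothing on their own. Only the same run against each real backend would give comparable numbers.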

> > I don't think we'd want to try to cluster H2 DBs (is that even possible?)
> > in the case of a multi-node H-Services deployment, but for simple
> > deployments, why make people install and manage a separate Postgres
> > backend if we can avoid it? But still have the option to let the person
> > use Postgres if they want.
> > 
> > Obviously, when running in MiQ environments, we could point inventory
> > to MiQ's Postgres DB (and that's most likely the vast majority of
> > production deployments - if not all of them). But for dev environments, I
> > wonder if being able to use the embedded H2 DB would be easier.
> I'm not sure this is obvious. It's unclear to me whether we could hijack
> the MiQ Postgres DB.
>

I'm not sure about that either and actually wouldn't count on it.
 
> > In any event, we do need to move off of Titan soon - I can't merge this
> > until we do (or we fix Titan) -
> > https://github.com/hawkular/hawkular-agent/pull/249
> If there is any way we could work around this problem in the near
> term and keep Titan for an initial release, that would seem like the
> safest approach until we find an alternative.
> 
> How extensive is our use of Tinkerpop?  Is it at all feasible for us to
> drop it and look into having our own C* persistence layer to support
> Inventory? 

Right now Inventory has its own layer of "query trees" on top of Tinkerpop
queries for this very reason. Frankly speaking, though, I'm not a fan of
rolling our own storage mechanism, because the queries we have to support can
get QUITE complex (see all the insanities you can do with /traversal, which
are there for a reason and which are fully described by our query trees).
Sooner or later, IMHO, we'd end up re-implementing a large part of what Titan
already implements, and I am not confident that we'd get anywhere near the
performance and stability they achieved (even if we had the couple of years
of dev time they had).
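To illustrate why these queries are the hard part, here is a heavily simplified, hypothetical query tree. It is not the actual Hawkular Inventory query-tree API. Each node applies a sequence of steps to the current set of entities. Child nodes are alternative branches, and their results are unioned. A home-grown storage layer would need to evaluate trees like this efficiently, including the branching and multi-hop steps, and against real data volumes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy graph: entity types plus "contains" edges. */
final class ToyGraph {
    final Map<String, String> typeOf = new HashMap<>();
    final Map<String, List<String>> contains = new HashMap<>();

    void add(String id, String type, String parentId) {
        typeOf.put(id, type);
        if (parentId != null) {
            contains.computeIfAbsent(parentId, k -> new ArrayList<>()).add(id);
        }
    }
}

/** One step applied to the current set of entity ids. */
interface Step {
    Set<String> apply(ToyGraph g, Set<String> current);
}

/** Filter step: keeps only entities of the given type. */
final class OfType implements Step {
    private final String type;

    OfType(String type) { this.type = type; }

    public Set<String> apply(ToyGraph g, Set<String> current) {
        Set<String> out = new LinkedHashSet<>();
        for (String id : current) {
            if (type.equals(g.typeOf.get(id))) out.add(id);
        }
        return out;
    }
}

/** Hop step: moves to the entities contained in the current ones. */
final class Contained implements Step {
    public Set<String> apply(ToyGraph g, Set<String> current) {
        Set<String> out = new LinkedHashSet<>();
        for (String id : current) {
            out.addAll(g.contains.getOrDefault(id, Collections.emptyList()));
        }
        return out;
    }
}

/** Query tree node: steps run in order, then each child branch continues from the result. */
final class QueryNode {
    private final List<Step> steps;
    private final List<QueryNode> children = new ArrayList<>();

    QueryNode(Step... steps) { this.steps = Arrays.asList(steps); }

    QueryNode branch(QueryNode child) {
        children.add(child);
        return this;
    }

    Set<String> evaluate(ToyGraph g, Set<String> start) {
        Set<String> current = start;
        for (Step step : steps) current = step.apply(g, current);
        if (children.isEmpty()) return current;
        // Branches are alternatives; their results are unioned.
        Set<String> union = new LinkedHashSet<>();
        for (QueryNode child : children) union.addAll(child.evaluate(g, current));
        return union;
    }
}
```

For example, a tree can start at a tenant, branch into its feeds and its environments, and then collect the resources contained in each. That query returns the resources from both branches in a single evaluation.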

If anyone is willing to take a stab at it though, all you need to do is
implement this interface (the methods with Query parameters are especially
tricky):

https://github.com/hawkular/hawkular-inventory/blob/master/hawkular-inventory-api/src/main/java/org/hawkular/inventory/base/spi/InventoryBackend.java
 
> I know we could again get beaten up for NIH syndrome, but
> we're paying a high price for trying to use a 3rd party OS solution.
> The idea of requiring more than one database, and the baggage that goes
> with it, is not a pleasant thought.

That's why I'm pushing Sqlg actually. Even though it's 3rd party, it's simple 
enough to be taken over if it should be assassinated the same way Titan was.

> 
> > ----- Original Message -----
> > 
> >> I discussed some more with Lukas and looked around a bit.
> >> 
> >> Titan is basically the link between TinkerPop (graph API) and Cassandra.
> >> 
> >> I may have jumped to conclusions, but:
> >> - Datastax (the company behind Cassandra) bought ThinkAurelius (the
> >>   company behind Titan)
> >> - Since then, Datastax has built a product "inspired by Titan", a graph
> >>   DB with TinkerPop and Cassandra. The product is closed-source and
> >>   completely targeted at Cassandra. Datastax has no real incentive to
> >>   maintain Titan, as it competes with their product, and all engineers
> >>   stopped contributing to it.
> >> - The last release of Titan was in Sept 2015 (they used to release
> >>   roughly every 3 months)
> >> - While the community is relatively active, no pull request has been
> >>   approved since June, and we don't know of any well-maintained fork
> >> - It's a fairly large and complex piece of code, too large and too
> >>   narrowly focused for us to take over.
> >> 
> >> Conclusion: medium/long term Titan is no longer an option.
> >> 
> >> The other concern is that there is no good alternative for storing to
> >> Cassandra, which is the only storage dependency for Hawkular Services
> >> today (and was an important requirement).
> >> 
> >> So we have to make a difficult choice here but we don't seem to have many
> >> options if we stick with TinkerPop at least...
> >> 
> >> According to Lukas, Sqlg is a good option for the following reasons:
> >> - Performance
> >> - Size/complexity
> >> - The "community" is really small but the lead developer is responsive
> >>   (and if he stops, it would be easier to fork and maintain).
> >> 
> >> The big drawback is that for production we would require Postgres (for
> >> non-prod or for Hawkular Services users who don't use the inventory
> >> service, we can use the embedded H2).
> >> 
> >> Thoughts?
> >> 
> >> Thomas
> >> 
> >> On Thu, Jul 21, 2016 at 2:08 PM, Lukas Krejci < lkrejci at redhat.com >
> >> wrote:
> >> 
> >> 
> >> Hi all,
> >> 
> >> to move inventory forward, we need to port it to Tinkerpop3 - a new(ish)
> >> and actively maintained version of the Tinkerpop graph API.
> >> 
> >> Apart from the huge improvement in API expressiveness and capabilities,
> >> the important thing is that it comes with a variety of backends, 2 of
> >> which are of particular interest to us ATM: the Titan backend (Titan
> >> 1.0) and the SQL backend (using the Sqlg library).
> >> 
> >> The SQL backend is a much improved (yet still unfinished in terms of
> >> optimizations and some corner-case features) version of the toy SQL
> >> backend for Tinkerpop2.
> >> 
> >> Back in March I ran performance comparisons for SQL/Postgres and Titan
> >> (0.5.4) on Tinkerpop2 and concluded that Titan was the best choice then.
> >> 
> >> After completing a simplistic port of inventory to Tinkerpop3 (not taking
> >> advantage of any new features or opportunities to simplify the inventory
> >> codebase), I've run the performance tests again for the 2 new backends -
> >> Titan 1.0 and Sqlg (on Postgres).
> >> 
> >> This time the results are not as clear-cut as last time.
> >> 
> >> From the charts [1] you can see that Postgres is actually quite a bit
> >> faster on reads and can better handle concurrent read access, while Titan
> >> shines in writes (arguably thanks to Cassandra as its storage).
> >> 
> >> Of course, I can imagine that the read performance advantage of Postgres
> >> would decrease as the amount of stored data grows (the tests ran with an
> >> inventory size of ~10k entities), but I am quite positive we'd get
> >> competitive read performance from both solutions up to the inventory
> >> sizes we anticipate (100k-1M entities).
> >> 
> >> Now the question is whether we should worry too much about the insert
> >> performance of Postgres. IMHO, there should be some room for improvement
> >> in Sqlg, and our move to /sync for agent synchronization would also make
> >> this less of a problem (because there would not be that many initial
> >> imports creating vast amounts of entities).
> >> 
> >> Nevertheless, I currently cannot say which one is the "winner" here. Each
> >> backend has its pros and cons:
> >> 
> >> Titan:
> >> Pros:
> >> - high write throughput
> >> - backed by Cassandra
> >> 
> >> Cons:
> >> - slower reads
> >> - project virtually dead
> >> - complex codebase (self-made fixes unlikely)
> >> 
> >> Sqlg:
> >> Pros:
> >> - small codebase
> >> - everybody knows SQL
> >> - faster reads
> >> - faster concurrent reads
> >> 
> >> Cons:
> >> - slow writes
> >> - another backend needed (Postgres)
> >> 
> >> Therefore my intention here is to go forward with a "proper" port to
> >> Tinkerpop3 with Titan still enabled but focus primarily on Sqlg to see if
> >> we can do anything with the write performance.
> >> 
> >> IMHO, any choice we make is "workable" as it is even today, but we need
> >> to weigh the productization requirements. For those, Sqlg with its small
> >> dependency footprint and Postgres backend seems preferable to the huge
> >> dependency mess of Titan.
> >> 
> >> [1] https://dashboards.ly/ua-TtqrpCXcQ3fnjezP5phKhc
> >> 
> >> --
> >> Lukas Krejci
> >> _______________________________________________
> >> hawkular-dev mailing list
> >> hawkular-dev at lists.jboss.org
> >> https://lists.jboss.org/mailman/listinfo/hawkular-dev


-- 
Lukas Krejci

