[Hawkular-dev] [Inventory] Performance of Tinkerpop3 backends
Lukas Krejci
lkrejci at redhat.com
Thu Sep 8 19:00:01 EDT 2016
On Thursday, September 8, 2016 10:31:27 AM CEST Jay Shaughnessy wrote:
> On 9/8/2016 10:13 AM, John Mazzitelli wrote:
> > Considering that the agent has already hit a major problem where Titan is
> > most likely the cause, I do see that we need to move away from it.
> >
> > If we go to the next option (Sqlg), how difficult would it be to be able
> > to switch backends (embedded H2 versus external Postgres)?
> >
At the moment, it's a matter of changing the connection string and having the
appropriate plugins for sqlg (similar to Hibernate dialects) and jdbc drivers
on the classpath.
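
As a rough illustration of what "changing the connection string" could look like, here is a minimal sketch that builds the backend settings for either embedded H2 or external Postgres. The `jdbc.url` / `jdbc.username` / `jdbc.password` keys follow the style of Sqlg's properties file, but the class name, credentials and locations are assumptions made up for this example, not Hawkular's actual configuration code.

```java
import java.util.Properties;

/**
 * Sketch only: pick the storage backend purely through configuration.
 * Class name, credentials and locations are hypothetical.
 */
final class SqlgBackendConfig {

    enum Backend { H2_EMBEDDED, POSTGRES }

    /** Builds the connection properties for the chosen backend. */
    static Properties forBackend(Backend backend, String location) {
        Properties props = new Properties();
        switch (backend) {
            case H2_EMBEDDED:
                // location is a file path for the embedded, in-process database
                props.setProperty("jdbc.url", "jdbc:h2:file:" + location);
                props.setProperty("jdbc.username", "sa");
                props.setProperty("jdbc.password", "");
                break;
            case POSTGRES:
                // location is host:port/database of the external server
                props.setProperty("jdbc.url", "jdbc:postgresql://" + location);
                props.setProperty("jdbc.username", "hawkular"); // assumed user
                props.setProperty("jdbc.password", "hawkular"); // assumed password
                break;
            default:
                throw new IllegalArgumentException("Unknown backend: " + backend);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties dev = forBackend(Backend.H2_EMBEDDED, "./data/inventory");
        Properties prod = forBackend(Backend.POSTGRES, "localhost:5432/inventory");
        System.out.println(dev.getProperty("jdbc.url"));
        System.out.println(prod.getProperty("jdbc.url"));
    }
}
```

The matching Sqlg dialect and JDBC driver jars would still need to be on the classpath for whichever URL is used.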
> > And how bad would it be to just support H2 for single-node Hawkular
> > Services installations (even if in production)?
> >
That's a question I haven't had time to answer yet. From previous testing,
though, I know that H2 is faster than Postgres (it doesn't pay the
serialization cost, is in-process, ...). Recently I compared the perf of Titan
and Postgres (Sqlg): writes were quite a bit slower on Postgres, but reads
were actually faster. These perf tests weren't really scaled up, though: the
inventory had ~10k resources and a single tenant. In addition, I tried the
H2-backed Sqlg impl with the agent itests and they consistently pass (unlike
with the Titan-backed inventory, with which they fail most of the time).
We need a truly scaled up perf test - tens of tenants, hundreds of feeds in
each, all writing and reading in parallel - to see how badly we can handle
that with any solution ;)
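
To make the shape of such a test concrete, here is a minimal sketch of a parallel load driver: one task per feed, each writing resources and reading them back, all running on a shared thread pool. The `Store` interface and the in-memory implementation are hypothetical stand-ins for a real inventory backend, and the path format is purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelLoadSketch {

    /** Hypothetical stand-in for an inventory backend; swap in real calls. */
    interface Store {
        void write(String path, String value);
        String read(String path);
    }

    static final class InMemoryStore implements Store {
        private final Map<String, String> data = new ConcurrentHashMap<>();
        public void write(String path, String value) { data.put(path, value); }
        public String read(String path) { return data.get(path); }
        int size() { return data.size(); }
    }

    /** Runs one task per feed in parallel; returns the number of successful read-backs. */
    static long run(Store store, int tenants, int feedsPerTenant, int resourcesPerFeed, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int t = 0; t < tenants; t++) {
                for (int f = 0; f < feedsPerTenant; f++) {
                    final String prefix = "tenant" + t + "/feed" + f;
                    tasks.add(() -> {
                        long ok = 0;
                        for (int r = 0; r < resourcesPerFeed; r++) {
                            String path = prefix + "/res" + r;
                            store.write(path, "resource " + r);
                            if (store.read(path) != null) {
                                ok++;
                            }
                        }
                        return ok;
                    });
                }
            }
            long total = 0;
            for (Future<Long> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("load run failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        long start = System.nanoTime();
        long ok = run(store, 20, 200, 10, 16); // tens of tenants, hundreds of feeds each
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(ok + " write+read pairs in " + millis + " ms");
    }
}
```

A real test would replace `InMemoryStore` with calls against each candidate backend and record per-operation latencies rather than just a total.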
> > I don't think we'd want to try to cluster H2 DBs (is that even possible?)
> > in the case of a multi-node H-Services deployment, but for simple
> > deployments, why make people install and manage a separate Postgres
> > backend if we can avoid it? But still have the option to let the person
> > use Postgres if they want.
> >
> > Obviously, for when running in MiQ environments, we could point inventory
> > to MiQ's Postgres DB (and that's most likely the vast majority of
> > production deployments - if not all of them). But for dev environments, I
> > wonder if being able to use the embedded H2 DB would be easier.
> I'm not sure this is obvious. It's unclear to me that we could hijack
> the MIQ postgres db, maybe.
>
I'm not sure about that either and actually wouldn't count on it.
> > In any event, we do need to move off of Titan soon - I can't merge this
> > until we do (or we fix Titan) -
> > https://github.com/hawkular/hawkular-agent/pull/249
> If there is any way we could find a way around this problem for the
> near-term, and maintain Titan for an initial release, it would seem like
> the safest approach until we find an alternative.
>
> How extensive is our use of Tinkerpop? Is it at all feasible for us to
> drop it and look into having our own C* persistence layer to support
> Inventory?
Right now Inventory has its own layer of "query trees" on top of tinkerpop
queries for this very reason. Frankly speaking though, I'm not a fan of
rolling our own storage mechanism, because the queries we have to support can
get QUITE complex (see all the insanities you can do with /traversal, which
are there for a reason and which are fully described by our query trees).
Sooner or later, we'd IMHO re-implement a large part of what Titan has
already implemented, and I am not confident enough to say that we'd get
anywhere near the performance and stability they could achieve (even if we had
the couple of years of dev time they had).
If anyone is willing to take a stab at it though, all you need to do is to
implement this interface (especially the methods with Query parameters are
tricky):
https://github.com/hawkular/hawkular-inventory/blob/master/hawkular-inventory-api/src/main/java/org/hawkular/inventory/base/spi/InventoryBackend.java
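
To give a feel for why the Query-taking methods are the tricky part, here is a deliberately simplified sketch of a backend-independent query tree and a toy "backend" that translates it. None of these types are the actual Hawkular Inventory classes; the node structure, filter strings and translation are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class QueryTreeSketch {

    /** One step of a query: a filter plus the alternative continuations after it. */
    static final class Node {
        final String filter;
        final List<Node> children;

        Node(String filter, Node... children) {
            this.filter = filter;
            this.children = Arrays.asList(children);
        }
    }

    /** Toy "backend": flattens every root-to-leaf branch into a textual traversal. */
    static List<String> translate(Node root) {
        List<String> out = new ArrayList<>();
        walk(root, "", out);
        return out;
    }

    private static void walk(Node node, String prefix, List<String> out) {
        String path = prefix.isEmpty() ? node.filter : prefix + " -> " + node.filter;
        if (node.children.isEmpty()) {
            out.add(path); // leaf: one complete branch of the query
            return;
        }
        for (Node child : node.children) {
            walk(child, path, out);
        }
    }

    public static void main(String[] args) {
        Node query = new Node("type=tenant",
                new Node("contains feed", new Node("type=resource")),
                new Node("contains environment"));
        for (String branch : translate(query)) {
            System.out.println(branch);
        }
    }
}
```

A real backend has to turn each branch into an efficient native query (Gremlin, SQL, CQL, ...) and merge the results, which is where most of the complexity lives.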
> I know we could again get beaten up for NIH syndrome, but
> we're paying a high price for trying to use a 3rd party OS solution.
> The idea of requiring more than one database, and the baggage that goes
> with it, is not a pleasant thought.
That's why I'm pushing Sqlg actually. Even though it's 3rd party, it's simple
enough to be taken over if it should be assassinated the same way Titan was.
>
> > ----- Original Message -----
> >
> >> I discussed some more with Lukas and looked around a bit.
> >>
> >> Titan is basically the link between TinkerPop (graph API) and Cassandra.
> >>
> >> I may have jumped to conclusions but:
> >> - Datastax (the company behind Cassandra) bought ThinkAurelius (the
> >>   company behind Titan)
> >> - Since then Datastax built a product "inspired by Titan", a graph DB
> >>   with TinkerPop and Cassandra. The product is closed-source and
> >>   completely targeted to Cassandra. Datastax has no real incentive to
> >>   maintain Titan as it competes with their product and all engineers
> >>   stopped contributing to it.
> >> - Last release of Titan was in Sept 2015 (they used to release ~ every
> >>   3 months)
> >> - While the community is relatively active, no pull request was approved
> >>   after June, and we don't know about any fork that is well maintained
> >> - It's a fairly large and complex piece of code, too large and too
> >>   narrowed for us to take over.
> >>
> >> Conclusion: medium/long term Titan is no longer an option.
> >>
> >> The other concern is that there is no good solution for storing to
> >> Cassandra, which is the only storage dependency for Hawkular Services
> >> today (and was an important requirement).
> >>
> >> So we have to make a difficult choice here but we don't seem to have many
> >> options if we stick with TinkerPop at least...
> >>
> >> According to Lukas, Sqlg is a good option for the following reasons:
> >> - Performance
> >> - Size/complexity
> >> - The "community" is really small but the lead developer is responsive
> >>   (and in the case he stops, it would be easier to fork and maintain).
> >>
> >> The big drawback is that for production we would require Postgres (for
> >> non-prod or for Hawkular Services users who don't use the inventory
> >> service, we can use the embedded H2).
> >>
> >> Thoughts ?
> >>
> >> Thomas
> >>
> >> On Thu, Jul 21, 2016 at 2:08 PM, Lukas Krejci < lkrejci at redhat.com >
> >> wrote:
> >>
> >>
> >> Hi all,
> >>
> >> to move inventory forward, we need to port it to Tinkerpop3 - a new(ish)
> >> and actively maintained version of the Tinkerpop graph API.
> >>
> >> Apart from the huge improvement in the API expressiveness and
> >> capabilities, the important thing is that it comes with a variety of
> >> backends, 2 of which are of particular interest to us ATM: the Titan
> >> backend (with Titan in version 1.0) and the SQL backend (using the sqlg
> >> library).
> >>
> >> The SQL backend is a much improved (yet still unfinished in terms of
> >> optimizations and some corner case features) version of the toy SQL
> >> backend for Tinkerpop2.
> >>
> >> Back in March I ran performance comparisons for SQL/postgres and Titan
> >> (0.5.4) on Tinkerpop2 and concluded that Titan was the best choice then.
> >>
> >> After completing a simplistic port of inventory to Tinkerpop3 (not taking
> >> advantage of any new features or opportunities to simplify the inventory
> >> codebase), I've run the performance tests again for the 2 new backends:
> >> Titan 1.0 and Sqlg (on postgres).
> >>
> >> This time the results are not as clear as last time.
> >>
> >> From the charts [1] you can see that Postgres is actually quite a bit
> >> faster on reads and can better handle concurrent read access while Titan
> >> shines in writes (arguably thanks to Cassandra as its storage).
> >>
> >> Of course, I can imagine that the read performance advantage of Postgres
> >> would decrease with the growing amount of data stored (the tests ran with
> >> an inventory size of ~10k entities) but I am quite positive we'd get
> >> competitive read performance from both solutions up to the sizes of
> >> inventory we anticipate (100k-1M entities).
> >>
> >> Now the question is whether the insert performance in Postgres is
> >> something we should be worried about too much. IMHO, there should be some
> >> room for improvement in Sqlg, and our move to /sync for agent
> >> synchronization would also make this less of a problem (because there
> >> would not be that many initial imports that would create vast amounts of
> >> entities).
> >>
> >> Nevertheless I currently cannot say who is the "winner" here. Each
> >> backend has its pros and cons:
> >>
> >> Titan:
> >> Pros:
> >> - high write throughput
> >> - backed by cassandra
> >>
> >> Cons:
> >> - slower reads
> >> - project virtually dead
> >> - complex codebase (self-made fixes unlikely)
> >>
> >> Sqlg:
> >> Pros:
> >> - small codebase
> >> - everybody knows SQL
> >> - faster reads
> >> - faster concurrent reads
> >>
> >> Cons:
> >> - slow writes
> >> - another backend needed (Postgres)
> >>
> >> Therefore my intention here is to go forward with a "proper" port to
> >> Tinkerpop3 with Titan still enabled but focus primarily on Sqlg to see if
> >> we can do anything with the write performance.
> >>
> >> IMHO, any choice we make is "workable" as it is even today but we need to
> >> weigh in the productization requirements. For those, Sqlg with its small
> >> dep footprint and postgres backend seems preferable to the huge
> >> dependency mess of Titan.
> >>
> >> [1] https://dashboards.ly/ua-TtqrpCXcQ3fnjezP5phKhc
> >>
> >> --
> >> Lukas Krejci
> >> _______________________________________________
> >> hawkular-dev mailing list
> >> hawkular-dev at lists.jboss.org
> >> https://lists.jboss.org/mailman/listinfo/hawkular-dev
--
Lukas Krejci