On Thursday, September 8, 2016 10:31:27 AM CEST Jay Shaughnessy wrote:
On 9/8/2016 10:13 AM, John Mazzitelli wrote:
> Considering that the agent has already hit a major problem where Titan is
> most likely the cause, I do see that we need to move away from it.
>
> If we go to the next option (Sqlg), how difficult would it be to be able
> to switch backends (embedded H2 versus external Postgres)?
>
At the moment, it's a matter of changing the connection string and having the
appropriate Sqlg plugins (similar to Hibernate dialects) and JDBC drivers
on the classpath.
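
To make the "changing the connection string" point concrete, here is a minimal
sketch of what the two configurations might look like. It is an illustration
only, not the actual inventory configuration code: the class name, database
names, file path and credentials are all made up, and the property keys
(jdbc.url, jdbc.username, jdbc.password) are what I understand Sqlg to read
from its configuration, so treat them as an assumption to verify against the
Sqlg docs.

```java
import java.util.Properties;

// Hypothetical helper: picks connection settings for an embedded H2 or an
// external Postgres backend. Nothing here is taken from Hawkular Inventory.
public class SqlgBackendConfig {

    /** Returns connection settings for the named backend ("h2" or "postgres"). */
    public static Properties forBackend(String backend) {
        Properties p = new Properties();
        switch (backend) {
            case "h2":
                // Embedded, in-process database; the file path is an assumption.
                p.setProperty("jdbc.url", "jdbc:h2:file:./data/inventory");
                p.setProperty("jdbc.username", "sa");
                p.setProperty("jdbc.password", "");
                break;
            case "postgres":
                // External server; host, port, database and user are assumptions.
                p.setProperty("jdbc.url", "jdbc:postgresql://localhost:5432/inventory");
                p.setProperty("jdbc.username", "hawkular");
                p.setProperty("jdbc.password", "hawkular");
                break;
            default:
                throw new IllegalArgumentException("Unknown backend: " + backend);
        }
        return p;
    }

    public static void main(String[] args) {
        // Only the URL (plus the driver and Sqlg plugin on the classpath) differs.
        System.out.println(forBackend("h2").getProperty("jdbc.url"));
        System.out.println(forBackend("postgres").getProperty("jdbc.url"));
    }
}
```

The rest of the code would stay the same for both backends; the matching JDBC
driver and Sqlg dialect still have to be on the classpath.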
> And how bad would it be to just support H2 for single-node Hawkular
> Services installations (even if in production)?
>
That's a question I haven't had time to answer yet. From previous testing,
though, I know that H2 is faster than Postgres (it doesn't pay the
serialization cost, it is in-process, ...). Recently I compared the perf of
Titan and Postgres (Sqlg): writes were quite a bit slower on Postgres, but
reads were actually faster. These perf tests weren't really scaled up,
though - the inventory had ~10k resources and a single tenant. In
addition, I tried the H2-backed Sqlg impl with the agent itests and they
consistently pass (unlike with the Titan-backed inventory, with which they
fail most of the time).
We need a truly scaled up perf test - tens of tenants, hundreds of feeds in
each, all writing and reading in parallel - to see how badly we can handle
that with any solution ;)
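
As a rough sketch of the shape such a test could take (not an existing
Hawkular test): one task per feed, all feeds of all tenants submitted to a
shared thread pool, each writing its resources and reading them back. The
in-memory map is a stand-in I made up for illustration; a real test would call
the inventory API against the backend under test, and all names below are
hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical load driver: tenants x feeds tasks writing and reading in parallel.
public class ScaledLoadSketch {
    // Stand-in for the inventory backend (assumption, not the real storage).
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final AtomicLong reads = new AtomicLong();

    /** Runs one task per feed and returns the elapsed time in nanoseconds. */
    public long run(int tenants, int feedsPerTenant, int resources, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < tenants; t++) {
            for (int f = 0; f < feedsPerTenant; f++) {
                final String prefix = "tenant-" + t + "/feed-" + f + "/";
                pool.submit(() -> {
                    for (int r = 0; r < resources; r++) {
                        String key = prefix + "resource-" + r;
                        store.put(key, "payload");      // write
                        if (store.get(key) != null) {   // read back
                            reads.incrementAndGet();
                        }
                    }
                });
            }
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                throw new IllegalStateException("load test did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load test interrupted", e);
        }
        return System.nanoTime() - start;
    }

    public int written() { return store.size(); }

    public long readCount() { return reads.get(); }

    public static void main(String[] args) {
        ScaledLoadSketch sketch = new ScaledLoadSketch();
        // Tens of tenants, hundreds of feeds in each; the sizes are arbitrary.
        long nanos = sketch.run(20, 200, 50, 16);
        System.out.println(sketch.written() + " resources written, "
                + sketch.readCount() + " reads, "
                + TimeUnit.NANOSECONDS.toMillis(nanos) + " ms");
    }
}
```

Swapping the map for calls into each candidate backend would give comparable
numbers for the same tenant/feed mix.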
> I don't think we'd want to try to cluster H2 DBs (is that even possible?)
> in the case of a multi-node H-Services deployment, but for simple
> deployments, why make people install and manage a separate Postgres
> backend if we can avoid it? But still have the option to let the person
> use Postgres if they want.
>
> Obviously, for when running in MiQ environments, we could point inventory
> to MiQ's Postgres DB (and that's most likely the vast majority of
> production deployments - if not all of them). But for dev environments, I
> wonder if being able to use the embedded H2 DB would be easier.
I'm not sure this is obvious. It's unclear to me that we could hijack
the MIQ postgres db, maybe.
I'm not sure about that either and actually wouldn't count on it.
> In any event, we do need to move off of Titan soon - I can't merge this
> until we do (or we fix Titan) -
> https://github.com/hawkular/hawkular-agent/pull/249
If there is any way we could find a way around this problem for the
near-term, and maintain Titan for an initial release, it would seem like
the safest approach until we find an alternative.
How extensive is our use of Tinkerpop? Is it at all feasible for us to
drop it and look into having our own C* persistence layer to support
Inventory?
Right now Inventory has its own layer of "query trees" on top of Tinkerpop
queries for this very reason. Frankly speaking though, I'm not a fan of
rolling our own storage mechanism, because the queries we have to support can
get QUITE complex (see all the insanities you can do with /traversal, which
are there for a reason and which are fully described by our query trees).
Sooner or later, we'd IMHO re-implement in large part everything Titan has
already implemented and I am not confident enough to say that we'd get
anywhere near the performance and stability they could achieve (even if we had
the couple of years of dev time they had).
If anyone is willing to take a stab at it though, all you need to do is to
implement this interface (the methods with Query parameters are especially
tricky):
https://github.com/hawkular/hawkular-inventory/blob/master/hawkular-inven...
I know we could again get beaten up for NIH syndrome, but
we're paying a high price for trying to use a 3rd party OS solution.
The idea of requiring more than one database, and the baggage that goes
with it, is not a pleasant thought.
That's why I'm pushing Sqlg actually. Even though it's 3rd party, it's simple
enough to be taken over if it should be assassinated the same way Titan was.
> ----- Original Message -----
>
>> I discussed some more with Lukas and looked around a bit.
>>
>> Titan is basically the link between TinkerPop (graph API) and Cassandra.
>>
>> I may have jumped to conclusions, but:
>> - Datastax (the company behind Cassandra) bought ThinkAurelius (the
>> company behind Titan)
>> - Since then Datastax built a product "inspired by Titan", a graph DB
>> with TinkerPop and Cassandra. The product is closed-source and completely
>> targeted to Cassandra. Datastax has no real incentive to maintain Titan
>> as it competes with their product and all engineers stopped contributing
>> to it.
>> - Last release of Titan was in Sept 2015 (they used to release ~ every
>> 3 months)
>> - While the community is relatively active, no Pull request was approved
>> after June, and we don't know about any fork that is well maintained
>> - It's a fairly large and complex piece of code, too large and too
>> narrowed for us to take over.
>>
>> Conclusion: medium/long term Titan is no longer an option.
>>
>> The other concern is that there is no good solution for storing to
>> Cassandra, which is the only storage dependency for Hawkular Services
>> today (and was an important requirement).
>>
>> So we have to make a difficult choice here but we don't seem to have many
>> options if we stick with TinkerPop at least...
>>
>> According to Lukas, Sqlg is a good option for the following reasons:
>> - Performance
>> - Size/complexity
>> - The "community" is really small but the lead developer is responsive
>> (and in the case he stops, it would be easier to fork and maintain).
>>
>> The big drawback is that for production we would require Postgres (for
>> non-prod or for Hawkular Services users who don't use the inventory
>> service, we can use the embedded H2).
>>
>> Thoughts ?
>>
>> Thomas
>>
>>
>>
>>
>>
>> On Thu, Jul 21, 2016 at 2:08 PM, Lukas Krejci < lkrejci(a)redhat.com >
>> wrote:
>>
>>
>> Hi all,
>>
>> to move inventory forward, we need to port it to Tinkerpop3 - a new(ish)
>> and actively maintained version of the Tinkerpop graph API.
>>
>> Apart from the huge improvement in the API expressiveness and
>> capabilities, the important thing is that it comes with a variety of
>> backends, 2 of which are of particular interest to us ATM: the Titan
>> backend (with Titan in version 1.0) and the SQL backend (using the sqlg
>> library).
>>
>> The SQL backend is a much improved (yet still unfinished in terms of
>> optimizations and some corner case features) version of the toy SQL
>> backend for Tinkerpop2.
>>
>> Back in March I ran performance comparisons for SQL/postgres and Titan
>> (0.5.4) on Tinkerpop2 and concluded that Titan was the best choice then.
>>
>> After completing a simplistic port of inventory to Tinkerpop3 (not taking
>> advantage of any new features or opportunities to simplify inventory
>> codebase), I've run the performance tests again for the 2 new backends -
>> Titan 1.0 and Sqlg (on postgres).
>>
>> This time the results are not so clear as the last time.
>>
>> From the charts [1] you can see that Postgres is actually quite a bit
>> faster on reads and can better handle concurrent read access while Titan
>> shines in writes (arguably thanks to Cassandra as its storage).
>>
>> Of course, I can imagine that the read performance advantage of Postgres
>> would decrease with the growing amount of data stored (the tests ran with
>> the inventory size of ~10k entities) but I am quite positive we'd get
>> competitive read performance from both solutions up to the sizes of
>> inventory we anticipate (100k-1M entities).
>>
>> Now the question is whether the insert performance is something we should
>> be worried about in Postgres too much. IMHO, there should be some room
>> for improvement in Sqlg and also our move to /sync for agent
>> synchronization would make this less of a problem (because there would be
>> not that many initial imports that would create vast amounts of entities).
>>
>> Nevertheless I currently cannot say who is the "winner" here. Each
>> backend has its pros and cons:
>>
>> Titan:
>> Pros:
>> - high write throughput
>> - backed by cassandra
>>
>> Cons:
>> - slower reads
>> - project virtually dead
>> - complex codebase (self-made fixes unlikely)
>>
>> Sqlg:
>> Pros:
>> - small codebase
>> - everybody knows SQL
>> - faster reads
>> - faster concurrent reads
>>
>> Cons:
>> - slow writes
>> - another backend needed (Postgres)
>>
>> Therefore my intention here is to go forward with a "proper" port to
>> Tinkerpop3 with Titan still enabled but focus primarily on Sqlg to see if
>> we can do anything with the write performance.
>>
>> IMHO, any choice we make is "workable" as it is even today but we need to
>> weigh in the productization requirements. For those Sqlg with its small
>> dep footprint and postgres backend seems preferable to the huge dependency
>> mess of Titan.
>>
>> [1] https://dashboards.ly/ua-TtqrpCXcQ3fnjezP5phKhc
>>
>> --
>> Lukas Krejci
>> _______________________________________________
>> hawkular-dev mailing list
>> hawkular-dev(a)lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/hawkular-dev
--
Lukas Krejci