tl;dr - This is gonna be fun (in the worst sense of the word)
We had a very good discussion with Stefan and John from the metrics team and I
think we identified the main problematic areas of the cassandra backend and
brainstormed possible solutions to them.
1) Inconsistencies
Today, inventory guarantees that if some entity has a certain sync hash, the
users can be sure that the entity and all its children are in a certain state
(names, configurations, defined operations, etc, all exactly match). This is
also used to identify identical stuff across feeds (i.e. to find all identical
types defined by various feeds and operate on them as if they were a single
type - /traversal/f;feed/rt;Type/identical/rl;defines/type=resource). This can
no longer be guaranteed in Cassandra. We could try to overcome this with one of
several approaches:
a) using a "staging" area for updates (copy current state, apply changes and
then "replace" the live area with staging) but that essentially means
implementing serialized transactions on top of an eventually consistent
storage - something I am not completely ecstatic about given the manpower and
time constraints we have.
b) Essentially treating C* as a blob store and just dumping serialized
(portions of) the graph to it, with all the processing being done in memory on
the inventory server. This still means we have to implement transactional
behavior on our own (albeit in memory, not in C*) and it still means that
stored data could conflict if inventory was clustered.
c) Give up consistent sync and just write everything all the time.
Inconsistencies will arise because sync doesn't touch external relationships
(feed doesn't know that some glue code discovered that a war is part of
"something bigger", be it a cluster, a logical app, whatever). At that point
we can also outright get rid of the hashes, because they will never be
guaranteed to be consistent. This means that we will no longer be able to tell
whether two feeds define the same resource types, because that depends on the
resource types having the same hash.
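
To make the hash guarantee from the start of this section a bit more concrete, here is a minimal sketch of the idea. It is not the actual inventory code: the dict layout, the sync_hash() and make_type() names and the use of SHA-1 are all assumptions made up for illustration. The only point is that the hash of an entity covers its whole subtree.

```python
import hashlib


def sync_hash(entity):
    """Hash an entity together with all of its children (Merkle-tree style).

    `entity` is a hypothetical dict with "name", "config" and "children" keys;
    this layout is an assumption, not the real inventory model.
    """
    h = hashlib.sha1()
    h.update(entity["name"].encode("utf-8"))
    for key in sorted(entity["config"]):
        h.update(f"\0{key}={entity['config'][key]}".encode("utf-8"))
    # Child hashes are sorted so the result does not depend on storage order.
    for child_hash in sorted(sync_hash(child) for child in entity["children"]):
        h.update(b"\0" + child_hash.encode("utf-8"))
    return h.hexdigest()


def make_type(operation_name):
    """Build a hypothetical resource type that defines a single operation."""
    return {
        "name": "Type",
        "config": {"kind": "server"},
        "children": [{"name": operation_name, "config": {}, "children": []}],
    }


feed_a_type = make_type("reload")
feed_b_type = make_type("reload")

# The same type as a reader might see it when the write of the child
# has not become visible yet.
half_written = make_type("reload")
half_written["children"] = []

print(sync_hash(feed_a_type) == sync_hash(feed_b_type))   # True
print(sync_hash(feed_a_type) == sync_hash(half_written))  # False
```

With transactional storage the stored hash and the subtree change together. With eventually consistent writes a reader could see the new hash on the parent while still seeing the old children (or the other way around), which is exactly the situation in which a stored hash stops telling you anything.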
2) Performant Traversals
Right now I am trying to implement a naive approach to graph traversals where
each "hop" between nodes of the graph is represented by (at least one) query
(a single hop can take very many queries if results have to be retrieved
separately for every incoming vertex in the traversal). This has been
identified as a potential performance problem.
The only "remedy" suggested for this was to consider 1b) - just store whole
portions of the graph as a "blob" in C* and do the processing in-memory.
This scares me a little bit because it opens up many possibilities for
operating on stale/incorrect data, raises the question of how to "partition"
the graph (more granularly than by tenant) while at the same time avoiding the
complexity of handling inter-partition relationships, etc.
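
The per-hop query cost of the naive approach can be illustrated with a small in-memory stand-in. This is only a sketch under stated assumptions: CountingStore, targets() and traverse() are hypothetical names, and a dict lookup plays the role of one C* query. It is not how the backend is implemented.

```python
from collections import defaultdict


class CountingStore:
    """Hypothetical stand-in for an edge table; counts the lookups issued."""

    def __init__(self, edges):
        self._out = defaultdict(list)
        for source, label, target in edges:
            self._out[(source, label)].append(target)
        self.queries = 0

    def targets(self, source, label):
        # One lookup models one round trip to the storage.
        self.queries += 1
        return list(self._out.get((source, label), []))


def traverse(store, start, labels):
    """Follow `labels` hop by hop, issuing one query per incoming vertex."""
    frontier = [start]
    for label in labels:
        next_frontier = []
        for vertex in frontier:
            next_frontier.extend(store.targets(vertex, label))
        frontier = next_frontier
    return frontier


# One tenant with 3 feeds, each feed with 2 resources.
edges = []
for f in range(3):
    edges.append(("tenant", "contains", f"feed{f}"))
    for r in range(2):
        edges.append((f"feed{f}", "contains", f"feed{f}/res{r}"))

store = CountingStore(edges)
resources = traverse(store, "tenant", ["contains", "contains"])
print(len(resources), store.queries)  # 6 4
```

The first hop costs 1 query and the second costs 3, one per feed. In general each hop costs as many queries as there are vertices coming into it, so the total grows with the fan-out of the graph rather than with the number of hops.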
3) Conclusion
We will start with a naive implementation with no guarantees of consistency
and will try to identify the concrete problematic areas of the code (the above
already hints at some we assume will cause problems). Then we will try to
modify the implementation/storage model/functionality/guarantees iteratively
to fix the concrete problems identified.
Lukas
On Wednesday, November 30, 2016 12:08:05 PM CET theute(a)redhat.com wrote:
I can't join today, and neither can Heiko.
Feel free to go ahead with the call though and please send feedback to
hawkular-dev.
Storing Inventory data in Cassandra
Currently, we're storing inventory data in an SQL database. Metrics on the
other hand store the data in Cassandra. We're exploring how to unify the
storage backends for Hawkular components and hence the title.
We'll use
https://docs.google.com/document/d/1Lgv8WE1j0r7rir5hTpV-xutKFChyoNPEH3XSz... as a
starting point for the discussion.
When
Wed Nov 30, 2016 3pm – 4pm Zurich
Where
https://bluejeans.com/8169978803
Who
• lkrejci(a)redhat.com - creator
• jsanda(a)redhat.com
• jtakvori(a)redhat.com
• hawkular-dev(a)lists.jboss.org
--
Lukas Krejci