RfC: Layered Hawkular-services vs packaged one
by Heiko W. Rupp
Hey,
right now, when we deploy Hawkular-services (H-S) on OpenShift, the
user may already have deployed Hawkular-Metrics (HAM) in OpenShift[1].
Deploying H-S then results in HAM being deployed twice - once for the
platform and once for H-S.
One solution could be to 'just' deploy H-S in OpenShift instead of HAM,
but this has some drawbacks:
- larger deployment
- inclusion of parts that are not needed in 'classic' OpenShift
- different security model for OS-HAM than for H-Services
Another option could be to logically split and layer H-S:
- Have an H-S container (H-S-2) that does not contain HAM
- This container would provide everything of H-S except HAM
- Calls to the HAM endpoints of H-S-2 are forwarded to OS-HAM
Of course there is no such thing as a free lunch:
- need to reserve the 'hawkular' tenant in OS-Metrics
- OS-Metrics has a different security concept
H-S-2 could act as a proxy that receives HAM calls from agents and
clients, 'translates' the credentials, and then forwards the calls to
OS-HAM.
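The credential-translation step of that proxy could look roughly like
the following sketch. The class name, the base URL handling, the use of
an OpenShift bearer token, and the header names are assumptions for
illustration, not a description of existing code:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch: take a metrics call that arrived at H-S-2 with
// Hawkular credentials and rebuild it for OS-HAM, swapping in an
// OpenShift bearer token and pinning the reserved 'hawkular' tenant.
public class MetricsForwarder {

    private final String osMetricsBase; // assumed OS-HAM endpoint base URL
    private final String osBearerToken; // token replacing the original credentials

    public MetricsForwarder(String osMetricsBase, String osBearerToken) {
        this.osMetricsBase = osMetricsBase;
        this.osBearerToken = osBearerToken;
    }

    // Only builds the outgoing request; actually sending it (e.g. via
    // java.net.http.HttpClient) is left out so the translation step
    // stays visible and testable on its own.
    public HttpRequest translate(String path, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(osMetricsBase + path))
                .header("Authorization", "Bearer " + osBearerToken)
                .header("Hawkular-Tenant", "hawkular") // the reserved tenant
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

Keeping request construction separate from sending would also make it
easy to unit-test the credential translation without a live OS-HAM.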
Does the above idea make any sense? I am sure I am missing a ton of
items in the above list.
Heiko
[1] (e.g. with oc cluster up --metrics=true)
Some food for thought about improving the release of (large) features
by Heiko W. Rupp
Hey,
some of us just had a meeting to recap parts of the switch from
Inventory.v2 to .v3, where things went less smoothly (on the Java side)
than I expected.
We identified a few areas where we could improve:
- Timeouts. Some tests were failing on local machines but not on Travis
(and we have seen the reverse direction in the past as well). We need to
be better at not assuming timing, as we cannot know the timing
characteristics of the target environments.
Similarly, the test against the live server was waiting 500 * a few
seconds for inventory(.old) to come up. Some waiting is good, but if
e.g. inventory does not come up after some reasonable time, we should
probably abort the test, as this may indicate a real issue.
- Test reliability (the above is part of this). We need more unit and
integration tests, and we need to make them more reliable. During the
merge we saw test failures on developer machines while Travis was
green; it turned out that this was due to timing. In the (RHQ) past we
saw test failures caused by test ordering. We should perhaps run our
(integration) tests in random order on purpose, as in reality the user
will not exercise the code in the order we assume in tests either
(yes, that may make setup and tear-down more complex).
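One detail that matters for random ordering is reproducibility: a
failing order is only useful if it can be replayed. A minimal sketch of
the idea (class and method names are hypothetical), using a seeded
shuffle whose seed would be logged with the test run:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch: shuffle test (method) names with an explicit seed so that a
// failing order can be replayed deterministically.
public class RandomOrderRunner {

    public static List<String> order(List<String> testNames, long seed) {
        List<String> shuffled = new ArrayList<>(testNames);
        Collections.shuffle(shuffled, new Random(seed));
        return shuffled; // log the seed alongside, e.g. "order seed=42"
    }
}
```

The same effect can usually be had from the test framework itself; the
point is that the seed must be surfaced in the build output, otherwise
an ordering-related failure cannot be reproduced.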
- Making tests more end-to-end. Right now we have no idea (from the Java
side) about the consequences of e.g. renaming a resource in the agent
for the display of that resource in ManageIQ. Luckily we already have
the ruby-gem tests that run against the live server. Perhaps we can
extend this somehow into the MiQ test suite, so that it also tests
against the latest hawkular-services master. Or we could record some
interactions of MiQ with H-services via the gem and replay those
interactions against the live server (there will be a need for
placeholders, but that is something that cassettes already support).
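The placeholder mechanism mentioned above boils down to substituting
tokens in a recorded request before replaying it. A minimal
illustration of that idea (the token format and class name are made up;
the real cassette support in the Ruby VCR ecosystem is richer):

```java
import java.util.Map;

// Sketch: fill cassette-style placeholders such as <TENANT> in a
// recorded interaction with live values at replay time.
public class CassetteReplay {

    public static String fill(String recorded, Map<String, String> placeholders) {
        String out = recorded;
        for (Map.Entry<String, String> e : placeholders.entrySet()) {
            out = out.replace("<" + e.getKey() + ">", e.getValue());
        }
        return out;
    }
}
```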
- Way of working for such sweeping changes: we discussed that in this
case it could be good to do the work in a series of feature branches
that use source dependencies, so that all feature branches applied
together give the desired new state. Only when all of that is good do we
send pull requests, merge the full stream of work into master, and get
releases of the components out.