Re: [Hawkular-dev] Hawkular Metrics + Hawkular Alerts - Phase 0

Tuesday, 10 November 2015

...
 On Nov 10, 2015, at 12:57 PM, Stefan Negrea
<snegrea(a)redhat.com> wrote:

 The goal of this phase is to explore a few possibilities, come up with a decent solution,
prototype and get feedback. I know there are flaws in all the proposals, otherwise we
would have selected one already. I would like to refocus the discussion on working towards
a concrete proposal to implement.

>> 
>> There are two proposals in the document and both can achieve this:
>> 1) Collocate one Metrics and one Alert instance in every single jvm (or
>> container) and then make use of the existing public REST API over the
>> local interface to communicate between the two components.
>> a) No changes to the code in either service, just invoke the local public
>> API interface of the other service when required to communicate.
> 
> There should be no changes to the service implementations regardless of how
> components communicate. When changes were recently made in H-Metrics to push
> data onto the bus, there was no change in the service, just changes to
> support forwarding data to other interested parties. Some sort of changes
> will be necessary to forward data, regardless of whether we forward data
> using the REST API, JMS, or something else.

 I would like to avoid as much as possible having fragmented public APIs for a component.
With so many components, that will be our nemesis. Think about publishing those APIs in a
1.0 release and having to maintain them, deprecate them, and fix them. If we can keep a
single unified public interface into a service, we will be able to spend time focused on
features, not on maintenance. A single public API is much better than having special
purpose APIs over different channels. I do not want to have to answer questions like,
"is the tenant-id required when I insert metrics over JMS?", or "what is
the request structure for querying metrics over UDP?". 

 Now, realistically we will have to optimize for performance, hence the idea of the
"firehose" as described in the document. But choosing to move something to the
"firehose" should be a conscious decision made for very specific reasons; the
default should be the published public API.

> 
>> b) Publishing of metrics insertion events will happen over the REST
>> interface too; no longer use JMS.
>> c) Distribution and load-balancing is done at the component level. For
>> example: if Alerts needs to distribute the load across all the nodes
>> deployed, it will do so with its own mechanism.
>> d) The challenge here is backpressure and availability of services. The
>> expectation is that the collocated services will be able to handle the
>> load that goes from one instance to the other.
> 
> Sorry, but I think that is a naive expectation. We are talking about a
> distributed environment in which we want to be scalable. That to me means
> expect and plan for failure not *if* but *when*, so please tell me how we
> handle load/backpressure when we do everything, including inter-component
> communication, via REST. I suspect we will end up implementing functionality
> already provided by messaging systems.

 But here is another, simpler way to look at it, based on the first proposal. Since there are
multiple instances deployed, the expectation is that there will be a proxy in front of
them. If the proxy is round-robin (or uses any other load-balancing technique), any
particular M+A instance can receive any given request. If one of them fails, the user will have a
failure returned, and then retry the request, which will be routed to a different server.
We add logic to note and count these failures in an M+A instance and shut it down after it
reaches a threshold.
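
The failure counting described above could look roughly like the sketch below. It is an illustration only, not Hawkular code: the class names `InstanceHealth` and `RoundRobinProxy`, the default threshold of 3, and the choice to count failures cumulatively rather than consecutively are all assumptions.

```python
class InstanceHealth:
    """Counts failed requests for one M+A instance (hypothetical helper)."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold  # assumed default
        self.failures = 0
        self.active = True

    def record(self, succeeded):
        """Record one request outcome; mark the instance down at the threshold."""
        if not succeeded:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.active = False
        return self.active


class RoundRobinProxy:
    """Round-robin selection that skips instances that were shut down."""

    def __init__(self, instances):
        self.instances = instances      # mapping: name -> InstanceHealth
        self._names = list(instances)
        self._next = 0

    def pick(self):
        """Return the next active instance name in round-robin order."""
        for _ in range(len(self._names)):
            name = self._names[self._next]
            self._next = (self._next + 1) % len(self._names)
            if self.instances[name].active:
                return name
        raise RuntimeError("no active M+A instance left")
```

A user retry then simply calls `pick()` again and lands on a different instance once the failing one has crossed its threshold.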

 From the discussions with the Alerts folks, there are plans to implement a load distribution
mechanism only for their purpose. Metrics is stateless at this stage and, judging from the early
designs, will most likely implement a different load distribution mechanism. So what does the
bus add? Communication redundancy? We can have the user retry a failed attempt if a
particular M+A instance is no longer viable.

 So the inconvenience here is that the user might have to retry an operation, while we gain on
the API front by using a single public API that is well known, documented, and tested.
Trade-offs, trade-offs ...

> 
>> 
>> 2) Use JMS as the communication channel and then publish the REST API over
>> JMS. And by "publishing the REST API over JMS", I mean making every single
>> public end-point available over the JMS channel so there is no need in
>> a future phase to tinker with this. If we put in place a generic solution,
>> we will never have to touch this again and we will have the side-effect of
>> never having to think what is available between the two services; we will
>> just focus on building features.
>> a) No changes to the way metric insertion events are published, they will
>> keep going over JMS. Future event types will go over JMS too.
>> b) Metrics and Alerts components talk over JMS exclusively; the
>> request-reply communication will most likely happen over a temporary queue.
>> c) The bus will facilitate some (or all) of the load distribution between
>> components.
>> d) The challenge with this approach is how to publish the public API over
>> JMS. It's actually easy and I have a prototype, so it's just a matter of
>> exploring more and finding a reasonable solution.
>> 
>> We could also do a hybrid: collocated services, keep JMS for metric
>> insertion events, everything else over local REST calls.
>> 
>> 
>> After this long introduction, there are three main reasons the current
>> solution needs improvements:
>> 
>> 1) Addressability -> the current solution does not work in a distributed
>> environment because there is no clear way to access the public API of
>> the services deployed. Let's say the installation is spread across 5
>> containers. How can I make a public API call from a Metrics instance to an
>> Alerts instance? There is no directory to know where the Alerts or Metrics
>> instances are deployed.
> 
> Addressability is provided by the messaging system. There is no need for a
> directory. You just need to communicate with the messaging server/broker.
> Beyond that there are a lot of features around addressability and routing
> such as message selectors, message grouping, hierarchical topics, and more.

 Both solutions solve this in one form or another; it's about the trade-offs and what
choices to make.

> 
>> 2) Load distribution -> there is no clear way on how the load distribution
>> works or who is responsible for it.
> 
> Again, this is largely handled by the messaging system. Multiple consumers
> take messages from a queue where each message corresponds to work to be
> done.

 JMS solves load distribution for messaging, nothing more. But as a trade-off we will need
to make the public API available over JMS if we are to go with the second proposal.
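
To make the two ideas in this exchange concrete (multiple consumers sharing one queue, and request-reply over a temporary queue), here is a minimal in-memory sketch. It uses Python's standard `queue` and `threading` modules as a stand-in for a broker; it is not JMS and not Hawkular code, and the names `worker`, `call`, `start_workers`, and `stop_workers` are made up for illustration.

```python
import queue
import threading

# Stands in for a shared request queue on the broker (assumption, not JMS).
requests = queue.Queue()


def worker(name):
    """Competing consumer: take one request at a time and send a reply."""
    while True:
        msg = requests.get()
        if msg is None:  # sentinel: stop this worker
            break
        # Reply on the temporary queue the caller attached to the request.
        msg["reply_to"].put({"handled_by": name, "result": msg["body"].upper()})


def call(body, timeout=2.0):
    """Send a request and wait on a temporary, per-request reply queue."""
    reply_queue = queue.Queue(maxsize=1)
    requests.put({"body": body, "reply_to": reply_queue})
    return reply_queue.get(timeout=timeout)


def start_workers(count):
    """Start `count` consumers that all read from the same request queue."""
    threads = []
    for i in range(count):
        t = threading.Thread(target=worker, args=(f"worker-{i}",), daemon=True)
        t.start()
        threads.append(t)
    return threads


def stop_workers(threads):
    """Send one stop sentinel per worker and wait for them to exit."""
    for _ in threads:
        requests.put(None)
    for t in threads:
        t.join()
```

Whichever worker is free picks up the next request, which is the load distribution being discussed; the caller only needs to know the request queue, not where the workers run.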

> 
>> 3) Availability of the existing public API -> There is no reason to
>> implement a new API just for the purposes of communicating between the
>> components. Given that we strive for this micro-service architecture, the
>> one and single public API should be the main method for communicating
>> between components for request-reply.
> 
> I do not think it is a given that we strive for a micro-service architecture.
> It might make more sense in an OpenShift environment, but I don’t think it
> necessarily does in general.
> 
> 
>> We might need to extend what we have but the public API should be front and
>> centre. So if we use JMS or HTTP (or both, or UDP, or any other method),
>> the public API interface should be available over all the channels. Sure
>> there might be difference on how to make a request in JMS vs HTTP (one
>> with temporary queues, and the other with direct http connections) but the
>> functionality should be identical.
> 
> I don’t know that I agree with this. Suppose we decide to offer an API for
> inserting metric data over UDP to support really high throughput situations.
> Are you suggesting for example that the operations we provide via REST for
> reading metric data should also be available via the UDP API? And since the
> motivation for a UDP API is performance/throughput, we might even want a
> more compact request format than JSON.

 As I said earlier, unless this particular API is upgraded to "firehose", it
should have a structure identical to its HTTP REST counterpart. The emphasis here is on
"structure", such as routing address, request content, and response content. For
example, if tenant-id is required on one, it should be required on both.
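
One way to read "identical structure" is that every channel funnels into the same request validation, so a rule like "tenant-id is required" cannot differ between HTTP and JMS. The sketch below illustrates that idea only; the field names and handler functions are hypothetical and do not reflect the actual Hawkular Metrics request format.

```python
# Hypothetical field names, chosen for illustration only.
REQUIRED_FIELDS = ("tenant_id", "metric_id", "data_points")


def validate_insert_request(request):
    """Apply the same structural rules regardless of the transport used."""
    missing = [field for field in REQUIRED_FIELDS if field not in request]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return request


def handle_http(json_body):
    """Entry point for the REST channel (body already parsed into a dict)."""
    return validate_insert_request(json_body)


def handle_jms(message_body):
    """Entry point for the JMS channel (body already parsed into a dict)."""
    return validate_insert_request(message_body)
```

With this shape, the answer to "is the tenant-id required when I insert metrics over JMS?" is the same as for HTTP by construction.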

> 
> Lastly and most importantly, if you want to push for an alternative
> communication mechanism between components in H-Metrics and H-Alerts, then
> you should push for the same across all of Hawkular because it does not make
> sense to have to support two different mechanisms.

 I do not understand this point of contention. The alternatives on the table are JMS and REST
over HTTP. Both of these are in Hawkular today. All that the document proposes up to this
point is to take one of those alternatives and deliver more functionality on top of that
foundation (I outlined in my previous email the differences between alternatives).

 I also mentioned in my previous email the hybrid approach: collocated services, keep JMS
for metric insertion events, keep all the other inter-component communication over local
REST calls (it's a mix between 1 and 2 above). That is exactly Hawkular with an
extension to add needed pieces, such as event filtering and implementing the features
proposed in the document. Do you see even this proposal as a departure from the current
Hawkular architecture?

Yes, I do see the proposals as a departure from the current Hawkular architecture, which I
think is the most important aspect of the whole discussion. Let’s review the current state
of Hawkular with respect to the discussion. The public APIs in Hawkular are the REST APIs.
The REST APIs are the only “supported” public APIs. Components and services in Hawkular
communicate with each other via the bus, i.e., JMS. To the best of my knowledge, the APIs
we implement over JMS are for internal communication between components running in the
Hawkular server (regardless of whether those components are co-located or in separate
processes). I am not aware of any other plans, discussions, etc., to have all of the
REST API also exposed via JMS. Based on these things, it does seem like your proposals are
a departure from the current architecture.
