On Nov 10, 2015, at 11:25 AM, Stefan Negrea
<snegrea(a)redhat.com> wrote:
>
> Can you elaborate on why what is in place today will not work with a
> distributed environment with multiple Metrics and Alerts deployments?
The user-facing feature proposed in the document will require request-reply communication
with endpoints in Alerts. Also, there is the part about introducing a filtering mechanism
for publishing only metrics of interest to Alerts. This also requires request-reply
communication between the two services; the endpoints are not yet implemented in either
service, but I see no reason not to make them part of the public REST API.
So just for Phase 0, we will need a good number (5 or 6) of request-reply operations
exposed over the communication channel. I am 100% sure in Phase 1 we will come up with
some more features and will require more request-reply operations, so we might as well
extend this to the entire public API in Phase 0. A goal for Phase 0, as stated in the
document, is to extend the public API currently in place in both services over the
communication channel that we choose.
There are two proposals in the document and both can achieve this:
1) Collocate one Metrics and one Alerts instance in every single JVM (or container) and
then make use of the existing public REST API over the local interface to communicate
between the two components.
a) No changes to the code in either service, just invoke the local public API interface
of the other service when required to communicate.
There should be no changes to the service implementations regardless of how components
communicate. When changes were recently made in H-Metrics to push data onto the bus, there
was no change in the service, just changes to support forwarding data to other interested
parties. Some sort of changes will be necessary to forward data, regardless of whether we
forward data using the REST API, JMS, or something else.
b) Publishing of metrics insertion events will happen over the REST
interface too; no longer use JMS.
c) Distribution and load-balancing are done at the component level. For example: if Alerts
needs to distribute the load across all the nodes deployed, it will do so with its own
mechanism.
d) The challenge here is backpressure and availability of services. The expectation is
that the collocated services will be able to handle the load that goes from one instance
to the other.
Sorry, but I think that is a naive expectation. We are talking about a distributed
environment in which we want to be scalable. That to me means expect and plan for failure
not *if* but *when*, so please tell me how we handle load/backpressure when we do everything,
including inter-component communication, via REST. I suspect we will end up implementing
functionality already provided by messaging systems.
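
To make the backpressure concern concrete, here is a minimal sketch of what a sender has to deal with once the receiver stops keeping up. It uses an in-process bounded queue as a stand-in for whatever sits between the two services; the `publish` helper and the capacity of 3 are assumptions made purely for illustration and are not taken from Hawkular.

```python
import queue

# Bounded buffer standing in for the channel between sender and receiver.
# The capacity is an arbitrary illustrative value.
buffer = queue.Queue(maxsize=3)

def publish(event):
    """Try to enqueue without blocking; report whether the event was accepted."""
    try:
        buffer.put_nowait(event)
        return True
    except queue.Full:
        # The receiver is behind: the caller must retry, drop, or slow down.
        return False

# No consumer is draining yet, so only the first three events fit.
accepted = [publish(i) for i in range(5)]

# Once the consumer takes one item, there is room for one more event.
buffer.get_nowait()
accepted_after_drain = publish(99)
```

Deciding what to do with the rejected events (retry, buffer to disk, shed load) is the part that would have to be written by hand if everything goes over plain REST calls.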
2) Use JMS as the communication channel and then publish the REST API over JMS. And by
"publishing the REST API over JMS", I mean making every single public end-point
available over the JMS channel so there is no need in a future phase to tinker with this. If
we put in place a generic solution, we will never have to touch this again and we will
have the side-effect of never having to think what is available between the two services;
we will just focus on building features.
a) No changes to the way metric insertion events are published; they will keep going over
JMS. Future event types will go over JMS too.
b) Metrics and Alerts components talk over JMS exclusively; the request-reply
communication will most likely happen over temporary queues.
c) The bus will facilitate some (or all) of the load distribution between components.
d) The challenge with this approach is how to publish the public API over JMS. It's
actually easy and I have a prototype, so it's just a matter of exploring more and
finding a reasonable solution.
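
For readers unfamiliar with the pattern, the request-reply exchange over a temporary queue described in proposal 2 can be sketched roughly as follows. This is not the prototype mentioned above; it is an in-process simulation in which Python queues stand in for the broker, and the names `alerts_requests`, `alerts_service`, `call`, and the `/triggers` path are all hypothetical.

```python
import queue
import threading
import uuid

# Stand-in for a broker-side request queue (hypothetical name).
alerts_requests = queue.Queue()

def alerts_service():
    """Toy Alerts-side consumer: serve requests until a None sentinel arrives."""
    while True:
        msg = alerts_requests.get()
        if msg is None:
            break
        # Echo the correlation id so the caller can match reply to request.
        reply = {
            "correlation_id": msg["correlation_id"],
            "status": 200,
            "body": {"path": msg["path"], "triggers": []},
        }
        msg["reply_to"].put(reply)

def call(method, path, timeout=2.0):
    """Send a REST-shaped request and wait on a per-call temporary reply queue."""
    reply_to = queue.Queue(maxsize=1)  # plays the role of a temporary queue
    correlation_id = str(uuid.uuid4())
    alerts_requests.put({
        "method": method,
        "path": path,
        "correlation_id": correlation_id,
        "reply_to": reply_to,
    })
    reply = reply_to.get(timeout=timeout)  # raises queue.Empty on timeout
    if reply["correlation_id"] != correlation_id:
        raise RuntimeError("reply does not match request")
    return reply

worker = threading.Thread(target=alerts_service, daemon=True)
worker.start()
response = call("GET", "/triggers")
alerts_requests.put(None)  # stop the toy service
worker.join()
```

The caller only needs to know the name of the request queue; the reply destination travels with the message, which is what makes the scheme generic across endpoints.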
We could also do a hybrid: collocated services, keep JMS for metric insertion events,
everything else over local REST calls.
After this long introduction, there are three main reasons the current solution needs
improvements:
1) Addressability -> the current solution does not work in the distributed environment
because there is no clear way to access the public API of the services deployed.
Let's say the installation is spread across 5 containers. How can I make a public API
call from a Metrics instance to an Alerts instance? There is no directory to know where
the Alerts or Metrics instances are deployed.
Addressability is provided by the messaging system. There is no need for a directory. You
just need to communicate with the messaging server/broker. Beyond that there are a lot
of features around addressability and routing such as message selectors, message grouping,
hierarchical topics, and more.
2) Load distribution -> it is not clear how the load
distribution works or who is responsible for it.
Again, this is largely handled by the messaging system. Multiple consumers take messages
from a queue where each message corresponds to work to be done.
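
A minimal sketch of that competing-consumers arrangement, again using an in-process queue as a stand-in for a broker queue. The consumer names and the message count are made up for illustration.

```python
import queue
import threading

work = queue.Queue()   # stand-in for a broker queue holding units of work
handled = {}           # consumer name -> list of messages it processed
lock = threading.Lock()

def consumer(name):
    """Take messages until a None sentinel arrives; each message goes to one consumer."""
    while True:
        msg = work.get()
        if msg is None:
            break
        with lock:
            handled.setdefault(name, []).append(msg)

names = ["alerts-1", "alerts-2", "alerts-3"]  # hypothetical instance names
threads = [threading.Thread(target=consumer, args=(n,)) for n in names]
for t in threads:
    t.start()

for i in range(30):
    work.put(i)
for _ in threads:
    work.put(None)  # one stop sentinel per consumer
for t in threads:
    t.join()

# Every message was processed exactly once, by whichever consumer was free.
processed = sorted(m for msgs in handled.values() for m in msgs)
```

The producer never decides which instance does the work; adding a consumer is enough to spread the load further.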
3) Availability of the existing public API -> There is no reason
to implement a new API just for the purposes of communicating between the components.
Given that we strive for this micro-service architecture, the one and single public API
should be the main method for communicating between components for request-reply.
I do not think it is a given that we strive for a micro-service architecture. It might
make more sense in an OpenShift environment, but I don’t think it necessarily does in
general.
We might need to extend what we have but the public API should be
front and centre. So if we use JMS or HTTP (or both, or UDP, or any other method), the
public API interface should be available over all the channels. Sure, there might be
differences in how to make a request in JMS vs HTTP (one with temporary queues, and the
other with direct HTTP connections), but the functionality should be identical.
I don’t know that I agree with this. Suppose we decide to offer an API for inserting
metric data over UDP to support really high throughput situations. Are you suggesting for
example that the operations we provide via REST for reading metric data should also be
available via the UDP API? And since the motivation for a UDP API is
performance/throughput, we might even want a more compact request format than JSON.
Lastly and most importantly, if you want to push for an alternative communication mechanism
between components in H-Metrics and H-Alerts, then you should push for the same across all of
Hawkular, because it does not make sense to have to support two different mechanisms.