https://github.com/hawkular/hawkular-apm/pull/456
Hawkular APM uses JMS as its event processing backbone, with events being passed around in
batches. When a component fails to process some or all of a batch of events, a retry
mechanism resubmits the failed events to be processed again (subject to a maximum retry
count).
The problem with the current approach is that it resubmits the events to the source topic
on which they were received. However, this topic may also be used by other processing
components that have already processed the events successfully. For example, processing
components A and B subscribe to a particular topic T. A processes all of the events in
batch B1 successfully, while B processes only half and retries the other half. When the
failed half is published back to topic T, A receives those events a second time, even
though it processed them successfully the first time.
Therefore the retry mechanism needs to be more targeted, so that the resubmitted events
are only processed by the component that failed to process them before. There are two
possible ways to do this:
1) Define a retry queue - this means doubling up on all of the subscribers, so that each
subscriber (i.e. processing capability) would have a topic subscriber to receive the
initial events, and a retry queue to handle failed attempts. It's possible a single retry
queue could be used, with a message selector to distinguish the target processing unit.
2) Use a message selector on the original topic - in this case no more destinations need
to be added. The initial message has no target property defined, so it will be received by
all topic subscribers. When a message is resubmitted, it names a target subscriber, and
each subscriber's message selector ensures that only the named subscriber receives it.
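
To illustrate the routing in (2), here is a minimal sketch that simulates it in plain Java
rather than using the JMS API. The property name "subscriber" and all class names are
hypothetical and are not taken from the PR. With real JMS the same rule would probably be
written as a selector string along the lines of "subscriber IS NULL OR subscriber = 'B'"
supplied when the consumer is created, but that is an assumption about how it might look.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of selector-based retry routing (not the Hawkular APM code).
class SelectorRetryDemo {

    /** A message with string properties, loosely modelled on JMS message properties. */
    static final class Msg {
        final String body;
        final Map<String, String> props = new HashMap<>();
        Msg(String body) { this.body = body; }
    }

    /** A topic subscriber that records every message body it accepts. */
    static final class Subscriber {
        final String name;
        final List<String> received = new ArrayList<>();
        Subscriber(String name) { this.name = name; }

        // Equivalent of the selector "subscriber IS NULL OR subscriber = '<name>'".
        // "subscriber" is a hypothetical property name chosen for this sketch.
        boolean accepts(Msg m) {
            String target = m.props.get("subscriber");
            return target == null || target.equals(name);
        }
    }

    /** A topic that offers each published message to every subscriber. */
    static final class Topic {
        final List<Subscriber> subscribers = new ArrayList<>();
        void publish(Msg m) {
            for (Subscriber s : subscribers) {
                if (s.accepts(m)) {
                    s.received.add(m.body);
                }
            }
        }
    }

    /** Runs the A/B scenario and returns what each subscriber received. */
    static Map<String, List<String>> run() {
        Topic t = new Topic();
        Subscriber a = new Subscriber("A");
        Subscriber b = new Subscriber("B");
        t.subscribers.add(a);
        t.subscribers.add(b);

        // Initial publish: no target property, so both A and B receive the events.
        t.publish(new Msg("e1"));
        t.publish(new Msg("e2"));

        // Suppose B failed to process e2: resubmit it to the same topic, naming B as target.
        Msg retry = new Msg("e2");
        retry.props.put("subscriber", "B");
        t.publish(retry);

        Map<String, List<String>> result = new HashMap<>();
        result.put("A", a.received);
        result.put("B", b.received);
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> result = run();
        System.out.println("A received: " + result.get("A"));
        System.out.println("B received: " + result.get("B"));
    }
}
```

In this sketch A sees e1 and e2 once each, while B sees e2 a second time as the retry. The
same topic is reused, so no extra destinations are needed.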
This PR implements the second approach. My initial concern was the performance hit of
using a message selector, but in initial tests it appears negligible. If it does prove to
be an issue in the future, then the single queue with message selector approach in (1)
could be used.
Anyone able to review/merge the PR for me?
Thanks in advance.
Regards
Gary