[ https://issues.jboss.org/browse/RTGOV-425?page=com.atlassian.jira.plugin.... ]
Gary Brown commented on RTGOV-425:
----------------------------------
Further requirements:
We need to make sure that local resources (e.g. disk space) are not consumed when the remote
server is down for a long period of time and the information just accumulates locally. So
some thresholds need to be defined, with a suitable notification mechanism, to ensure
administrators are kept informed of the issue.
Suitable thresholds may relate to the size of the information that has been stored for retry,
the remaining space (e.g. on the filesystem) dropping below a certain level, the length of
time the oldest information has been stored without a successful retry, etc.
We also need to make sure that, once a threshold has been passed, notifications are not sent
continuously - only at the point where the threshold is breached.
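As a rough illustration of that edge-triggered behaviour, here is a minimal sketch in Java.
The class and method names (ThresholdMonitor, notifyAdministrator) are hypothetical and not
part of any existing overlord-commons API:

{code:java}
/**
 * Minimal sketch of edge-triggered threshold notifications: a notification is
 * emitted only when the threshold is first crossed, not on every subsequent
 * check. All names here are hypothetical, for illustration only.
 */
public class ThresholdMonitor {

    private final long maxStoredBytes;
    private boolean breached = false;

    public ThresholdMonitor(long maxStoredBytes) {
        this.maxStoredBytes = maxStoredBytes;
    }

    /** Called after each store/clear operation with the current store size. */
    public synchronized void check(long currentStoredBytes) {
        boolean overLimit = currentStoredBytes > maxStoredBytes;
        if (overLimit && !breached) {
            breached = true;
            notifyAdministrator("Persistent retry store exceeded " + maxStoredBytes
                    + " bytes (currently " + currentStoredBytes + ")");
        } else if (!overLimit && breached) {
            // Reset once back under the limit, so a future breach notifies again
            breached = false;
        }
    }

    protected void notifyAdministrator(String message) {
        // Placeholder: hook into whatever alerting/logging mechanism is chosen
        System.err.println("[ALERT] " + message);
    }
}
{code}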
Persistent retry capability
---------------------------
Key: RTGOV-425
URL: https://issues.jboss.org/browse/RTGOV-425
Project: RTGov (Run Time Governance)
Issue Type: Feature Request
Reporter: Gary Brown
Assignee: Gary Brown
Fix For: 2.0.0.Final
In various parts of the rtgov architecture, information is reported to a remote service,
and if that service is not available, the information is currently discarded.
This task is to provide an infrastructure capability (as part of overlord-commons) that
can store information that cannot currently be reported to the remote service, and have it
replayed once the remote service becomes available.
Some of the characteristics required are:
* When the primary component (reporting the information to the remote service) first
starts up, it should register itself with the "persistent retry" capability,
providing a handler that can be used to retry sending information to the remote service.
* When a failure occurs while attempting to send information to the remote service, the
primary component will invoke the "persistent retry" capability to handle the
information.
* The persistent retry capability will store information in a persistent form (initially
file system, but other dbs etc should be supported in the future).
* When registering with the persistent retry capability, the primary component should be
able to register a retry strategy (see the sketch after this list). One implementation of
such a strategy could be:
** Periodically attempt to send retry information to its remote service via the
registered handler.
** If unsuccessful, then back off and wait for the next cycle.
** If successful, then schedule another task to send the next information for that
service, until either a failure occurs (in which case wait for the next retry cycle) or no
further information exists. A thread pool should be used across all retry handlers.
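The following is a rough sketch of how the registration/handler/strategy contract above
might look. It is an in-memory illustration only - the interface and class names
(RetryHandler, PersistentRetrySketch) are hypothetical, and a real implementation would
persist the pending items (file system, db, etc.) rather than queue them in memory:

{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Handler supplied by the primary component for retrying a single stored item. */
interface RetryHandler {
    /** @return true if the information was delivered to the remote service */
    boolean retry(String information);
}

/** Hypothetical in-memory sketch of the "persistent retry" capability contract. */
class PersistentRetrySketch {

    /** Shared thread pool used across all registered retry handlers. */
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
    private final ConcurrentHashMap<String, Queue<String>> pending =
            new ConcurrentHashMap<String, Queue<String>>();

    /** Register a component's handler and start its periodic retry cycle. */
    public void register(final String component, final RetryHandler handler,
            long intervalSeconds) {
        pending.putIfAbsent(component, new ConcurrentLinkedQueue<String>());
        scheduler.scheduleWithFixedDelay(new Runnable() {
            public void run() {
                Queue<String> queue = pending.get(component);
                String item;
                // Drain items one at a time until a failure occurs or the queue is
                // empty; on failure, back off until the next scheduled cycle
                while ((item = queue.peek()) != null) {
                    if (handler.retry(item)) {
                        queue.poll();   // delivered - clear the stored record
                    } else {
                        break;          // remote service still unavailable
                    }
                }
            }
        }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }

    /** Called by the primary component when a send to the remote service fails. */
    public void store(String component, String information) {
        pending.get(component).add(information);
    }
}
{code}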
Investigate whether there are existing open source solutions to this problem. If not, then
as part of building the solution, we need to ensure that it is as efficient as possible in
a multi-threaded situation - so, for example, when looking at a persistent file system
store, a suitable mechanism must be used that caters for multi-threaded updates (i.e.
writing new records and clearing successfully retried records). A file-based db may be one
option to investigate.
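For example, one file-system layout that naturally copes with concurrent updates is one
record per uniquely named file: writers never contend on the same file, and a successful
retry simply deletes that record's file. A minimal sketch (class and method names are
hypothetical, for illustration only):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Illustrative file-system store: each record gets its own uniquely named file,
 * so concurrent writers never touch the same file, and clearing a successfully
 * retried record is just a file deletion. Not the chosen design - a sketch only.
 */
class FileRecordStore {

    private final Path directory;
    private final AtomicLong counter = new AtomicLong();

    FileRecordStore(String dir) throws IOException {
        this.directory = Paths.get(dir);
        Files.createDirectories(directory);
    }

    /** Write a new record; unique file names avoid contention between writers. */
    Path write(byte[] record) throws IOException {
        Path file = directory.resolve(System.currentTimeMillis() + "-"
                + counter.incrementAndGet() + ".rec");
        return Files.write(file, record);
    }

    /** Clear a record once it has been successfully retried. */
    void clear(Path file) throws IOException {
        Files.deleteIfExists(file);
    }
}
{code}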
JUnit tests need to be written that test the solution in a multi-threaded scenario.
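A sketch of what such a multi-threaded test might look like, using JUnit 4. It exercises
the hypothetical FileRecordStore from the previous sketch; the real test would target the
actual component instead:

{code:java}
import static org.junit.Assert.assertEquals;

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;

import org.junit.Test;

/**
 * Sketch of a multi-threaded test: several threads write records concurrently
 * and the test verifies that none are lost.
 */
public class ConcurrentStoreTest {

    @Test
    public void testConcurrentWrites() throws Exception {
        Path dir = Files.createTempDirectory("retry-test");
        final FileRecordStore store = new FileRecordStore(dir.toString());
        final int threads = 10;
        final int recordsPerThread = 100;
        final CountDownLatch done = new CountDownLatch(threads);

        for (int i = 0; i < threads; i++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        for (int j = 0; j < recordsPerThread; j++) {
                            store.write("record".getBytes());
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        done.countDown();
                    }
                }
            }).start();
        }

        done.await();
        // Every record written by every thread should still be present
        assertEquals(threads * recordsPerThread, Files.list(dir).count());
    }
}
{code}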
Profiling should be performed to understand the impact of this component on normal
operations, and also when retries are being performed concurrently with new information
being reported.
Places where this mechanism may be used:
1) Activity collector
2) ElasticSearch KeyValueStore