My question here was triggered by a specific case in Hibernate Search
but it applies well to ORM's datasources, caches, and very much to OGM
as well.
When creating an index on Elasticsearch, the index is not
"instantaneously" ready.
The REST request creates the definition, but the response only tells
whether the request to create it was accepted. Elasticsearch will then
start the creation process and gradually upgrade the index status to
"yellow" and finally "green", depending on its ability to quickly
propagate the needed changes across the cluster; it might even fail
and end up in status "red".
Our current approach is:
- send a request to define the index
- start using it
This probably follows our traditional pattern with static systems, but
it becomes a naive approach in this modern world of dynamic services.
Our approach works *almost* fine on the single nodes we're using for
testing, but it's not suited for a real cluster.
Even in our integration tests it sometimes happens that we didn't give
the node enough time to boot.
Someone else asked me recently how to make Hibernate ORM not fail to
boot when he's starting VMs containing the database and the
application in parallel or in no specific order: sometimes it would
happen that ORM would attempt to connect before the RDBMS had
started; all he needed was for ORM to stall and wait a few seconds.
Often even starting the RDBMS VM first isn't good enough, as the VM
reports "started" while the DB might still need to finish some
maintenance tasks. Kubernetes provides hooks to check whether services
are actually ready, but people seem to expect that Hibernate could
deal with some basics too.
In that case AFAIR my suggestion was that this could be solved as an
implementation detail of the datasource / connection pool.
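Just to illustrate what I mean (this is not an existing ORM API; the
names below are invented), such a wait could be a small retry loop
around the connection attempt inside the pool:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.Callable;

/**
 * Sketch only: retry a connection attempt until the database becomes
 * reachable or a deadline passes. Nothing like this exists in ORM today.
 */
public class StartupConnectHelper {
    public static Connection connectWithRetry(Callable<Connection> connect,
                                              long timeoutMillis,
                                              long retryIntervalMillis)
            throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            try {
                return connect.call();
            }
            catch (SQLException e) {
                if (System.currentTimeMillis() >= deadline) {
                    throw e; // give up: the database never became reachable
                }
                Thread.sleep(retryIntervalMillis);
            }
        }
    }
}
```

The point being that the user's code never sees the transient failures:
either a connection eventually comes back, or the original exception is
rethrown once the deadline expires.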
Back to my Elasticsearch problem: I think the short term solution
would be that we actually have to check for the index state after
having it created, and keep checking in a loop until some short
timeout expires.
->
https://hibernate.atlassian.net/browse/HSEARCH-2146
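For the record, the check-in-a-loop could look roughly like the sketch
below; `fetchHealth` is a placeholder for the actual REST call
(Elasticsearch's `_cluster/health/{index}` endpoint, which even accepts
a `wait_for_status` parameter that could let the server do part of the
waiting for us):

```java
import java.util.function.Supplier;

public class IndexHealthWait {
    /**
     * Polls the given health check until it reports at least "yellow",
     * or until the timeout expires. Returns the last observed status so
     * the caller can decide whether "red" at timeout is fatal.
     * fetchHealth stands in for a GET on /_cluster/health/{indexName}.
     */
    public static String waitForYellowOrGreen(Supplier<String> fetchHealth,
                                              long timeoutMillis,
                                              long pollIntervalMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        String status = fetchHealth.get();
        while (!"yellow".equals(status) && !"green".equals(status)) {
            if (System.currentTimeMillis() >= deadline) {
                break; // timeout: return whatever status we last saw
            }
            Thread.sleep(pollIntervalMillis);
            status = fetchHealth.get();
        }
        return status;
    }
}
```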
But I don't think that's the right design to pursue in the longer
term; especially as we're not dealing with the fact that the index
might "downgrade" its state at any time after we started.
I think we need to check the "cluster health" headers regularly,
monitor acceptance of each command we send, keep track of the
cluster's health, and probably keep a persistent queue around for the
operations which couldn't be applied yet.
Our current design will throw a SearchException to the end user, which
I don't think is practical.
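To make that idea concrete, here's a very rough sketch of what I have
in mind (all names invented; a real version would need actual
persistence, bounded capacity, and failure handling on flush):

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch only: buffer index operations while the cluster is unhealthy
 * and flush them once it recovers, instead of throwing at the user.
 */
public class BufferedIndexWriter {
    public interface IndexOperation { void apply(); }

    private final Deque<IndexOperation> pending = new ArrayDeque<>();
    private boolean healthy = true;

    public synchronized void submit(IndexOperation op) {
        if (healthy) {
            op.apply();
        }
        else {
            pending.addLast(op); // don't fail the user: queue for later
        }
    }

    /** Invoked by whatever component monitors the cluster health. */
    public synchronized void onHealthChange(boolean nowHealthy) {
        this.healthy = nowHealthy;
        while (nowHealthy && !pending.isEmpty()) {
            pending.removeFirst().apply(); // replay in submission order
        }
    }

    public synchronized int pendingCount() {
        return pending.size();
    }
}
```

Replaying in submission order matters, since index operations on the
same document are not commutative.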
History might have shown that the current approach is fine with
Hibernate ORM, but:
- requirements and expectations evolve, people might soon expect more
- I suspect that with RDBMSs this is less of a need than with the
new crop of dynamic, self-healing distributed systems we're dealing
with.
Not least, I noticed that the Elasticsearch native client actually
enters the cluster as a member of it. That's similar to an Infinispan
client using zero weight vs. a REST client; Infinispan experts will
understand these are significantly different capabilities. I don't mean
to remove the JEST client approach but we might want to study more of
the consequences of that, and help me define what we will consider an
acceptable tradeoff while still using the REST client approach.
Thanks for any thoughts,
Sanne