Hi,
I think waiting until the index has become available after creation is
fine for the time being. I'd wait and see what practical experience
looks like, i.e. how long it takes in practice to create indexes
with a realistic number of shards and replicas.
Also we discussed creating a separate tool akin to "schema creator",
which users can run whenever they like. So I think we have some
options here.
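For the "wait after creation" part, Elasticsearch's cluster health
endpoint can even do the waiting server-side via its wait_for_status
and timeout parameters. A minimal sketch of building such a request
URL (the helper and all names are illustrative, not actual Hibernate
Search code; the response carries "timed_out": true when the status
was not reached in time):

```java
// Sketch: GET /_cluster/health/{index}?wait_for_status=green&timeout=30s
// blocks server-side until the index reaches the wanted status or the
// timeout expires. Only the URL construction is shown here.
final class IndexHealthUrl {

    // Builds the health-check URL for one index; names are illustrative.
    static String healthUrl(String baseUrl, String index,
                            String wantedStatus, int timeoutSeconds) {
        return baseUrl + "/_cluster/health/" + index
                + "?wait_for_status=" + wantedStatus
                + "&timeout=" + timeoutSeconds + "s";
    }

    public static void main(String[] args) {
        System.out.println(
                healthUrl("http://localhost:9200", "books", "green", 30));
    }
}
```

Issuing a plain GET against that URL with any HTTP client would then
return the current (or awaited) index status.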
Regarding the cluster coming and going at runtime, I am not so
concerned. Again, I'd wait and see how real a problem that is. I
doubt it's a huge problem in practice; otherwise it'd render the
(synchronous) API useless to begin with. Sure, one can argue that
synchronous interfaces are not appropriate for any system-to-system
communication. Actually, I have been arguing that for a long time ;)
But that's exactly why I think it's great to combine the ES backend
with the JMS worker and a persistent queue: Messages can be
reprocessed whenever the cluster is back. That's a tool we already
provide and users can make use of it if they like. Of course we can
provide workers based on different techs (Kafka, AMQP, you name it),
but I don't think another general mechanism is needed. That's one
great advantage of using our integration IMO.
Regarding REST vs. native, I'd also wait for actual experience.
Native is not an option for non-Java apps, so I have a hard time
believing REST will be prohibitively slow. If we do proper bulking,
it might be good enough.
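On bulking: the REST _bulk endpoint takes a newline-delimited body,
one action/metadata line plus one document line per operation, so the
per-request overhead is amortized over many documents. A rough sketch
of assembling such a body (index/type/id values are made up; a real
client would of course use a JSON library rather than string
concatenation):

```java
import java.util.List;

// Sketch: assemble an Elasticsearch _bulk request body (NDJSON).
// Each operation is an action line followed by the document source,
// and the whole body must end with a newline.
final class BulkBody {

    // One "index" operation: action line + source line.
    static String indexAction(String index, String type, String id,
                              String sourceJson) {
        return "{\"index\":{\"_index\":\"" + index + "\",\"_type\":\"" + type
                + "\",\"_id\":\"" + id + "\"}}\n" + sourceJson + "\n";
    }

    // Concatenates many operations into one bulk request body.
    static String bulkBody(List<String> operations) {
        StringBuilder body = new StringBuilder();
        for (String op : operations) {
            body.append(op);
        }
        return body.toString();
    }
}
```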
--Gunnar
2016-03-03 15:19 GMT+01:00 Sanne Grinovero <sanne(a)hibernate.org>:
My question here was triggered by a specific case in Hibernate Search,
but it applies equally to ORM's datasources, caches, and very much to
OGM as well.
When creating an index on Elasticsearch, the index is not
"instantaneously" ready.
The REST request creates the definition, but the response only tells
you whether the request to create it was accepted. Elasticsearch then
starts the creation process and gradually upgrades the index status to
"yellow" and finally "green", depending on its ability to quickly
propagate the needed changes across the cluster; it might even fail
and end up in status "red".
Our current approach is:
- send a request to define the index
- start using it
This probably follows our traditional pattern with static systems,
but it becomes a naive approach in this modern world of dynamic
services.
Our approach works *almost* fine on the single nodes we're using for
testing, but it's not suited for a real cluster.
Even in our integration tests, it might happen that we didn't give it
enough time to boot.
Someone else asked me recently how to make Hibernate ORM not fail to
boot when he's starting VMs containing the database and the
application in parallel, or in no specific order: sometimes ORM would
attempt to connect before the RDBMS had started; all he needed was for
ORM to stall and wait a few seconds.
Often even starting the RDBMS VM first isn't good enough, as the VM
reports "started" but the DB might still need to finish some
maintenance tasks. Kubernetes provides hooks to check for actual
services being ready, but people seem to expect that Hibernate could
deal with some basics too.
In that case AFAIR my suggestion was that this could be solved as an
implementation detail of the datasource / connection pool.
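At the pool level that could be as simple as retrying the first
connection attempt with a bounded backoff instead of failing
immediately. A sketch of the idea with a pluggable connection attempt
(nothing here is actual ORM or connection-pool API; the Supplier
stands in for DataSource.getConnection() or any other "is the service
up yet?" probe):

```java
import java.util.function.Supplier;

// Sketch: retry a startup-time connection attempt for a bounded number
// of tries instead of failing on the first refused connection.
final class StartupRetry {

    static <T> T connectWithRetry(Supplier<T> attempt, int maxAttempts,
                                  long sleepMillis) {
        RuntimeException last = new RuntimeException("no attempts made");
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.get();
            } catch (RuntimeException e) {
                last = e;                      // service not up yet
                if (i < maxAttempts - 1) {
                    try {
                        Thread.sleep(sleepMillis); // wait before retrying
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new RuntimeException(ie);
                    }
                }
            }
        }
        throw last;                            // give up: rethrow last failure
    }
}
```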
Back to my Elasticsearch problem: I think the short-term solution is
that we actually have to check the index state after creating it, and
keep checking in a loop until some short timeout expires.
->
https://hibernate.atlassian.net/browse/HSEARCH-2146
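The check-in-a-loop shape itself is simple; a generic sketch of "poll
until healthy or a short timeout expires" (the BooleanSupplier stands
in for a real health check against the cluster, e.g. one GET to the
health endpoint; none of this is actual Hibernate Search code):

```java
import java.util.function.BooleanSupplier;

// Sketch: poll a health check until it passes or a timeout expires.
final class WaitForIndex {

    static boolean waitUntil(BooleanSupplier healthy, long timeoutMillis,
                             long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() < deadline) {
            if (healthy.getAsBoolean()) {
                return true;               // index reached the wanted status
            }
            try {
                Thread.sleep(pollMillis);  // back off before the next check
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;                      // caller decides what to do now
    }
}
```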
But I don't think that's the right design to pursue in the longer
term; especially as we're not dealing with the fact that the index
might "downgrade" its state at any time after we started.
I think we need to check the "cluster health" headers regularly and
monitor acceptance of each command we send, keeping track of the
cluster's health and probably keeping a persistent queue around for
the operations which couldn't be applied yet.
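The queue half of that design could boil down to: operations go into
a durable queue, and a drainer applies them only while the cluster
reports itself healthy, leaving the rest for the next attempt. An
in-memory sketch of that shape (a real implementation would sit on the
persistent/JMS queue, and the BooleanSupplier would be a real health
check; all names are made up):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Sketch: buffer index operations and only apply them while the cluster
// is healthy; whatever cannot be applied stays queued for a later drain.
final class PendingOperations<T> {

    private final Queue<T> pending = new ArrayDeque<>();

    void enqueue(T operation) {
        pending.add(operation);
    }

    // Applies queued operations while healthy; returns how many ran.
    int drain(BooleanSupplier clusterHealthy, Consumer<T> apply) {
        int applied = 0;
        while (!pending.isEmpty() && clusterHealthy.getAsBoolean()) {
            apply.accept(pending.poll());
            applied++;
        }
        return applied;
    }

    int pendingCount() {
        return pending.size();
    }
}
```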
Our current design will throw a SearchException at the end user, which
I don't think is practical.
History might have shown that the current approach is fine with
Hibernate ORM, but:
- requirements and expectations evolve, people might soon expect more
- I suspect that with RDBMSs this is less of a need than with the
new crop of dynamic, self-healing distributed systems we're dealing
with.
Not least, I noticed that the Elasticsearch native client actually
joins the cluster as a member of it. That's similar to an Infinispan
client using zero-weight vs. a REST client; Infinispan experts will
understand there are significantly different capabilities. I don't
mean to drop the Jest client approach, but we might want to study more
of the consequences of that, and help me define what we will consider
an acceptable tradeoff while still using the REST client approach.
Thanks for any thoughts,
Sanne
_______________________________________________
hibernate-dev mailing list
hibernate-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev