On Thu 2016-03-03 14:19, Sanne Grinovero wrote:
Back to my Elasticsearch problem: I think the short term solution
would be that we actually have to check for the index state after
having it created, and keep checking in a loop until some short
timeout expires.
-> https://hibernate.atlassian.net/browse/HSEARCH-2146
Sounds like a reasonable first approach.
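To make the short-term idea concrete, here is a minimal sketch in Java of
checking the index state in a loop until a short timeout expires. This is
not the actual Hibernate Search code: `IndexStatusPoller`, `awaitStatus`
and the `statusProbe` supplier are hypothetical names, and the probe stands
in for whatever call would fetch the index health from Elasticsearch
(which reports red, yellow or green).

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical helper, for illustration only.
final class IndexStatusPoller {

    /** Health levels in increasing order, so compareTo() can rank them. */
    public enum Status { RED, YELLOW, GREEN }

    private IndexStatusPoller() {
    }

    /**
     * Polls statusProbe until it reports at least the required status or
     * the timeout expires. Returns true if the status was reached in time,
     * false on timeout or if the waiting thread was interrupted.
     */
    public static boolean awaitStatus(Supplier<Status> statusProbe, Status required,
            Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (statusProbe.get().compareTo(required) >= 0) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Never sleep past the deadline, and sleep at least 1 ms.
            long sleepMillis = Math.min(pollInterval.toMillis(),
                    Math.max(1L, remainingNanos / 1_000_000L));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

A caller would create the index, then call something like
`awaitStatus(probe, Status.YELLOW, Duration.ofSeconds(10), Duration.ofMillis(200))`
and fail startup only if it returns false. The timeout and interval values
here are placeholders, not recommendations.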
But I don't think that's the right design to pursue in the longer
term; especially as we're not dealing with the fact that the index
might "downgrade" its state at any time after we started.
I think we need to check for "cluster health" headers regularly and
monitor for acceptance for each command we send, keeping track of its
health and probably keeping a persistent queue around for the
operations which couldn't be applied yet.
Our current design will throw a SearchException to the end user, which
I don't think is practical.
Doesn't it call the error report API?
The idea of a persistent queue opens up a lot of complexity so I'm not
sure that's where we want to go - besides the existing master/slave +
JMS and whatever alternative we plan on implementing down the road.
My point is to wonder what the user expectation is when the ES cluster
goes down:
1. have HSearch be magical and keep data up in the air until the cluster
comes back up - if it even does
2. have HSearch report on indexing errors so that one can take reindexing
actions
3. do what a manual, user-implemented integration would do and pretend
distributed systems always work
I think 2 is the practical approach.
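Option 2 could look roughly like the sketch below: failures are recorded
instead of being thrown at the end user, so that the affected entities can
be reindexed later. All names (`ReportingIndexer`, `FailedWork`, the
`backend` consumer) are hypothetical and do not correspond to the existing
error report API; the backend is assumed to signal a failed operation by
throwing a RuntimeException.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical wrapper, for illustration only.
final class ReportingIndexer {

    /** One indexing operation that could not be applied. */
    public static final class FailedWork {
        public final String entityId;
        public final RuntimeException cause;

        FailedWork(String entityId, RuntimeException cause) {
            this.entityId = entityId;
            this.cause = cause;
        }
    }

    // Assumption: the backend throws a RuntimeException when it cannot index.
    private final Consumer<String> backend;
    private final List<FailedWork> failures = new ArrayList<>();

    public ReportingIndexer(Consumer<String> backend) {
        this.backend = backend;
    }

    /** Tries to index one entity; records the failure instead of propagating it. */
    public void index(String entityId) {
        try {
            backend.accept(entityId);
        } catch (RuntimeException e) {
            failures.add(new FailedWork(entityId, e));
        }
    }

    /** Read-only view of everything that failed so far. */
    public List<FailedWork> failures() {
        return Collections.unmodifiableList(failures);
    }

    /** Ids the user would need to reindex once the cluster is healthy again. */
    public List<String> idsToReindex() {
        List<String> ids = new ArrayList<>();
        for (FailedWork failure : failures) {
            ids.add(failure.entityId);
        }
        return ids;
    }
}
```

The point of the design is that nothing is kept "up in the air": the
failure report is the only state, and deciding when to reindex stays with
the user.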
History might have shown that the current approach is fine with
Hibernate ORM, but:
- requirements and expectations evolve, people might soon expect more
- I suspect that with RDBMSs this is less of a need than with the
new crop of dynamic, self-healing distributed systems we're dealing
with.
On a philosophical note, can a client be expected to tolerate a server
system whose schema changes, or which is temporarily unavailable, at any
time and for any length of time? I think that's what you expect HSearch
to do in a way.
My answer is no and the degraded mode requiring reindexing is an
acceptable trade-off.
But +1 to be more lenient at startup time and "wait a bit more than
expected".
Not least, I noticed that the Elasticsearch native client actually
enters the cluster as a member of it. That's similar to an Infinispan
client using zero-weight vs a REST client. Infinispan experts will
understand there are significantly different capabilities. I don't mean
to remove the Jest client approach but we might want to study more of
the consequences of that, and I'd like help defining what we will
consider an acceptable tradeoff while still using the REST client
approach.
What do you have in mind? As a layman, I'd say using the native Java
client will be "better" performance-wise and could be the preferred
approach.