Sanne Grinovero Sorry, the link to the Elasticsearch issue was wrong; the correct link is https://github.com/elastic/elasticsearch/issues/25951

> If we're talking about the timeout option which the end user can set in its own application, then I'm not sure I'd classify this as a bug?

In the "standard" case it's indeed what we want, but not when mass indexing. When we're mass indexing, each work will take quite some time before it can even be sent to the Elasticsearch server because we pile up requests in a queue. This delay before we send the request is a delay we expect, a delay we can't estimate, and more importantly a delay that doesn't affect the total mass indexing execution time. So taking this delay into consideration in the timeout is not a good idea.

> Some systems set a timeout because they have to provide a reply within some t. After t they have to return with a failure and it's not really relevant what happened: the point is that it could not be performed in that time limit.

Well, that's the point in general, but not always. Timeouts can be used for two reasons:
- because after the timeout, it's simply too late, the client will abort anyway and executing the work won't achieve anything.
- OR because after the timeout, we can reasonably assume that something went wrong and the work will never terminate.
When mass indexing, we won't abort just because there's a 10 second delay before a given work finishes, especially not if this delay was caused by the work waiting in a client-side queue. Note that I'm not saying we should change the way timeouts work for every situation, just that they should be made more flexible.

> Often in such cases it's also useful to cancel any further pending work to save resources from being utilized for a task which is no longer being requested.

Yeah, I tried to implement it in HSEARCH-2764, but unfortunately the Elasticsearch REST client doesn't provide APIs to cancel requests. I suppose I'll have to send them a PR someday.
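
To make the queue-delay argument concrete, here is a minimal sketch comparing a timeout counted from the moment a work is enqueued with one counted from the moment the request is actually sent. The class and method names are hypothetical and the durations are made-up illustration values; nothing here is taken from Hibernate Search or the Elasticsearch client.

```java
import java.time.Duration;

/**
 * Hypothetical sketch: these names are assumptions for illustration only,
 * not Hibernate Search or Elasticsearch client APIs.
 */
public class QueueAwareTimeout {

    /** Timeout counted from enqueue time: the client-side queue wait eats into the budget. */
    public static boolean timedOutFromEnqueue(Duration queueWait, Duration execution, Duration timeout) {
        return queueWait.plus(execution).compareTo(timeout) > 0;
    }

    /** Timeout counted from send time: the queue wait is deliberately ignored. */
    public static boolean timedOutFromSend(Duration queueWait, Duration execution, Duration timeout) {
        return execution.compareTo(timeout) > 0;
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofSeconds(10);
        Duration queueWait = Duration.ofSeconds(30); // expected delay while mass indexing
        Duration execution = Duration.ofSeconds(2);  // time the server actually needs

        // The work itself is healthy, yet the enqueue-based timeout reports a failure.
        System.out.println("from enqueue: " + timedOutFromEnqueue(queueWait, execution, timeout)); // true
        System.out.println("from send:    " + timedOutFromSend(queueWait, execution, timeout));    // false
    }
}
```

With these numbers the enqueue-based check fails a work that only needed 2 seconds on the server, while the send-based check still catches a request that genuinely hangs, which is the second of the two reasons for a timeout listed above.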