
About analyzer definitions no longer being picked up from elasticsearch.yml: the main issue is that analyzer definitions will no longer be added to new indexes automatically, so users will have to take action even when using the CREATE or MERGE index management strategies. In other words, we handle the schema automatically, but only half of it. Actually, I wonder whether schema creation will work at all when mappings reference analyzers that are not defined yet.

I was thinking we could provide a way for users to point to a JSON file on the classpath containing analyzer definitions. For instance, something like this:

hibernate.properties:

    hibernate.search.default.elasticsearch.analyzer_definitions=myanalyzers.json

myanalyzers.json:

    {
      "my_custom_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "char_filter": [
          "html_strip"
        ],
        "filter": [
          "lowercase",
          "asciifolding"
        ]
      }
    }
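To give an idea of how little would be needed on our side, here is a rough sketch of reading the raw file content from the classpath using plain JDK APIs (Java 9+ for readAllBytes). The property key is the hypothetical one from the example above, and the class name is made up for illustration. The content is returned untouched; a Gson parse could be added afterwards to reject malformed files early.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class AnalyzerDefinitionsLoader {

    // Hypothetical property key from the proposal, not an existing Hibernate Search setting.
    static final String ANALYZER_DEFINITIONS_KEY =
            "hibernate.search.default.elasticsearch.analyzer_definitions";

    private AnalyzerDefinitionsLoader() {
    }

    /**
     * Returns the raw content of the analyzer definitions file, or null if the property is not set.
     * The content is deliberately not interpreted: it is meant to be forwarded to Elasticsearch as-is.
     */
    static String load(Properties properties, ClassLoader classLoader) {
        String resourceName = properties.getProperty(ANALYZER_DEFINITIONS_KEY);
        if (resourceName == null || resourceName.trim().isEmpty()) {
            return null; // Property not set: keep the current behavior.
        }
        String name = resourceName.trim();
        // getResourceAsStream returns null when the resource does not exist;
        // try-with-resources simply skips close() for a null resource.
        try (InputStream in = classLoader.getResourceAsStream(name)) {
            if (in == null) {
                throw new IllegalArgumentException(
                        "Analyzer definitions file not found on the classpath: " + name);
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read analyzer definitions file " + name, e);
        }
    }
}
```

Failing fast on a missing file seems preferable to silently creating indexes without the analyzers their mappings refer to.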

We would send those definitions as-is whenever we create indices or update mappings, without processing their content (apart from simple Gson parsing, maybe). This would make life easier for users while still being relatively easy to implement. We could improve on that later, for instance by generating the JSON from annotations instead of relying on a user-provided file.
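A sketch of what sending the definitions as-is could look like, assuming the file holds the value of settings.analysis.analyzer (as in the example above). The class, the Request holder and the method names are hypothetical; the request shapes follow the Elasticsearch create index and update index settings APIs. Sending analyzers and mappings in the same index creation request would also sidestep the problem of mappings referencing analyzers that are not defined yet. For an existing index, Elasticsearch only accepts new analyzers on a closed index, hence the close/update/reopen sequence.

```java
import java.util.List;

final class AnalysisSettingsRequests {

    /** Hypothetical, minimal representation of an HTTP request to Elasticsearch. */
    static final class Request {
        final String method;
        final String path;
        final String body; // null when the request has no body

        Request(String method, String path, String body) {
            this.method = method;
            this.path = path;
            this.body = body;
        }

        @Override
        public String toString() {
            return method + " " + path + (body == null ? "" : "\n" + body);
        }
    }

    private AnalysisSettingsRequests() {
    }

    /**
     * Index creation: the user-provided JSON is embedded verbatim as the value of
     * settings.analysis.analyzer, next to the mappings, in a single request.
     */
    static Request createIndex(String index, String analyzerDefinitionsJson, String mappingsJson) {
        String body = "{\"settings\":{\"analysis\":{\"analyzer\":" + analyzerDefinitionsJson.trim() + "}},"
                + "\"mappings\":" + mappingsJson.trim() + "}";
        return new Request("PUT", "/" + index, body);
    }

    /**
     * Existing index: new analyzers can only be added to a closed index,
     * so the settings update is wrapped in close/reopen requests.
     */
    static List<Request> updateAnalyzers(String index, String analyzerDefinitionsJson) {
        String body = "{\"analysis\":{\"analyzer\":" + analyzerDefinitionsJson.trim() + "}}";
        return List.of(
                new Request("POST", "/" + index + "/_close", null),
                new Request("PUT", "/" + index + "/_settings", body),
                new Request("POST", "/" + index + "/_open", null));
    }
}
```

Note that closing the index makes it briefly unavailable, which may matter when the MERGE strategy runs against a live index.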

By the way, we could even extend this to all index settings, not just analysis settings. But this would require additional checks to prevent users from tweaking settings we don't want them to touch (I guess there are some, especially those about sharding and replicas, which are already handled elsewhere in HSearch).
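If we went that way, the additional check could be as simple as a list of reserved keys. A minimal sketch, assuming the user JSON has already been parsed and flattened into dotted keys (for instance with Gson plus a small helper, not shown), and assuming shard and replica counts are the only reserved settings; the actual list would have to be checked against what HSearch already manages. The class name is hypothetical.

```java
import java.util.Map;
import java.util.Set;

final class UserIndexSettingsValidator {

    // Assumed list of settings already managed elsewhere in Hibernate Search.
    private static final Set<String> RESERVED = Set.of("number_of_shards", "number_of_replicas");

    private UserIndexSettingsValidator() {
    }

    /**
     * Rejects user-provided settings that would override reserved ones.
     * Keys are expected in flattened form; Elasticsearch accepts both
     * "number_of_shards" and "index.number_of_shards", so the prefix is stripped first.
     */
    static void validate(Map<String, ?> flattenedSettings) {
        for (String key : flattenedSettings.keySet()) {
            String normalized = key.startsWith("index.") ? key.substring("index.".length()) : key;
            if (RESERVED.contains(normalized)) {
                throw new IllegalArgumentException(
                        "Setting '" + key + "' is managed by Hibernate Search and cannot be overridden");
            }
        }
    }
}
```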

I doubt it would be worth asking the ES team to change the scope of analyzer definitions. I guess they made that change for a reason, and even if they didn't and reverted it in 5.1, we would still have to support ES 5.0, and therefore analyzer definitions through the APIs, at some point.


This message was sent by Atlassian JIRA