1. We use "withTokenizer"/"withCharFilter"/etc. for Elasticsearch, but "tokenizer"/"charFilter"/etc. for Lucene.
2. For Lucene, we allow chaining multiple analyzer/normalizer definitions in the same statement (context.analyzer("foo").tokenizer(...).analyzer("bar").tokenizer(...);), but not for Elasticsearch. I think it should be disallowed in both cases.
3. Tokenizers/char filters/token filters are defined as part of the analyzer/normalizer definition for Lucene, but separately for Elasticsearch. I don't think we want to force Lucene users to name their tokenizers/char filters/token filters, though, and we do need names for Elasticsearch, so we probably shouldn't try to harmonize this.
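To make the asymmetry concrete, here is a minimal, hypothetical sketch of the two DSL shapes. The interfaces and method names below only mirror the differences described above (the "with" prefix, chained definitions, inline vs. separately named components); they are simplified stand-ins, not the real API.

```java
import java.util.ArrayList;
import java.util.List;

public class DslAsymmetry {
    // Records every DSL call so the shapes can be compared.
    static final List<String> calls = new ArrayList<>();

    // Lucene-style context: unprefixed component methods, components defined
    // inline (unnamed), and analyzer() chainable repeatedly (point 2).
    static class LuceneContext {
        LuceneContext analyzer(String name) { calls.add("analyzer:" + name); return this; }
        LuceneContext tokenizer(String type) { calls.add("tokenizer:" + type); return this; }
        LuceneContext charFilter(String type) { calls.add("charFilter:" + type); return this; }
    }

    // Elasticsearch-style context: "with"-prefixed methods that reference
    // separately defined, named components (points 1 and 3).
    static class ElasticsearchContext {
        ElasticsearchContext analyzer(String name) { calls.add("es-analyzer:" + name); return this; }
        ElasticsearchContext tokenizer(String name) { calls.add("es-tokenizer:" + name); return this; }
        ElasticsearchContext withTokenizer(String name) { calls.add("withTokenizer:" + name); return this; }
        ElasticsearchContext withCharFilter(String name) { calls.add("withCharFilter:" + name); return this; }
    }

    public static void main(String[] args) {
        // Lucene: two analyzer definitions chained in one statement,
        // tokenizer given inline with no name of its own.
        new LuceneContext()
                .analyzer("foo").tokenizer("whitespace")
                .analyzer("bar").tokenizer("keyword");

        // Elasticsearch: the tokenizer must be defined and named separately,
        // then referenced by name from the analyzer definition.
        new ElasticsearchContext().tokenizer("my-tokenizer");
        new ElasticsearchContext().analyzer("foo").withTokenizer("my-tokenizer");

        System.out.println(calls);
    }
}
```

The sketch shows why a mechanical rename ("with" prefix vs. none) would fix point 1 but not point 3: the Elasticsearch shape fundamentally requires named components, while the Lucene shape deliberately avoids them.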