
Sanne Grinovero updated an issue

HSEARCH-2028

Change By:	Sanne Grinovero

The reference guide (and the website, which copied this passage) says:

{quote}
The standard tokenizer splits words at punctuation characters and hyphens while keeping email addresses and internet hostnames intact.
{quote}

That was traditionally the case, but the behavior has changed on the Lucene side and e-mail addresses are now tokenized. In the SO answer I recommended using {{ClassicTokenizer}} (which now has the traditional behavior). We should either recommend that or show a custom tokenizer with the required behavior.
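As a possible starting point for the updated documentation, here is a minimal sketch of an analyzer definition that uses the classic tokenizer. It assumes the Hibernate Search 5.x annotation API and Lucene's {{ClassicTokenizerFactory}}; the entity {{Contact}}, the field {{notes}} and the analyzer name {{emailPreserving}} are hypothetical and only serve the illustration.

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.standard.ClassicTokenizerFactory;
import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;

// Hypothetical entity, used only to show where the analyzer definition goes.
@Entity
@Indexed
@AnalyzerDef(
    name = "emailPreserving", // hypothetical analyzer name
    // ClassicTokenizer is the tokenizer recommended in this issue for the traditional behavior
    tokenizer = @TokenizerDef(factory = ClassicTokenizerFactory.class),
    filters = @TokenFilterDef(factory = LowerCaseFilterFactory.class)
)
public class Contact {

    @Id
    private Long id;

    // Index this property with the analyzer defined above.
    @Field
    @Analyzer(definition = "emailPreserving")
    private String notes;
}
```

The lower-case filter is an assumption added for typical full-text use and is not required for the tokenizer change itself.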

