The reference guide (and the website, which copies this passage) says:
{quote} The standard tokenizer splits words at punctuation characters and hyphens while keeping email addresses and internet hostnames intact. {quote}
That used to be the case, but the behavior has changed on the Lucene side and e-mail addresses are now split into separate tokens. In the SO answer I recommended using {{ClassicTokenizer}}, which now provides the traditional behavior. We should either recommend that in the documentation or show a custom tokenizer with the required behavior.
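To make the "required behavior" concrete, here is a minimal sketch in plain Java of what the quoted sentence describes: split at punctuation and hyphens, but keep e-mail addresses and hostnames as single tokens. It is only an illustration of the expected output, not Lucene API; the class name {{TokenizerBehaviorSketch}}, the method {{tokenize}}, and the simplified regular expressions are all assumptions made for this example. A real fix would have to be expressed as a Lucene tokenizer (or simply use {{ClassicTokenizer}}).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: this is NOT a Lucene Tokenizer, just a
// plain-Java sketch of the behavior the reference guide describes.
public class TokenizerBehaviorSketch {

    // Alternatives are tried in order, so an e-mail address or a dotted
    // hostname wins over a plain word. The patterns are deliberately
    // simplified and do not cover every valid address or hostname.
    private static final Pattern TOKEN = Pattern.compile(
            "[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+" // e-mail address
            + "|[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+"                // hostname with at least one dot
            + "|[A-Za-z0-9]+");                                    // plain run of letters and digits

    // Returns the tokens in order of appearance; punctuation, hyphens and
    // whitespace between matches are dropped.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Mail john.doe@example.com or visit www.example.org for full-text search.";
        System.out.println(tokenize(text));
    }
}
```

With this sketch, {{full-text}} is split at the hyphen, while the e-mail address and the hostname stay intact, which is the behavior the documentation currently promises.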