I experimented with your example project (https://github.com/hferentschik/hibernate-search-tika/tree/tika-blob-based) and switched to a stream-based approach throughout. This way you don't have to materialize the byte arrays. The Book entity now uses java.sql.Blob, and I am using the LobHelper to create the Blob (I have to resort to Hibernate-specific APIs, though).

Another side effect of using Blobs is that I cannot use the mass indexer at the moment; I have to use either automatic indexing or the indexing API of FullTextSession.

I also tested this approach against PostgreSQL and MySQL, and in both cases the tests run much faster (6 to 8 seconds for me).

What do you think about this approach?

Another idea regarding Tika integration: we could add a TikaBridge to the Search code base. When used, it would dynamically try to discover and load the Tika classes (e.g. it could look for AutoDetectParser). The bridge could handle multiple types (Blob, byte[], and whatever else we can come up with). WDYT? Is this a good approach to integrating Tika into Search? Any better ideas or suggestions?
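
To make the TikaBridge idea a bit more concrete, here is a rough sketch of the two building blocks it would need: probing the classpath for Tika without a compile-time dependency, and turning the supported value types into a stream. It uses only JDK classes. The class name TikaSupportSketch and its methods are made up for illustration and are not part of Hibernate Search or Tika; the only real name is the fully qualified class name of Tika's AutoDetectParser.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.sql.Blob;
import java.sql.SQLException;
import java.util.Optional;

// Hypothetical helper for illustration only; not an existing Search or Tika class.
final class TikaSupportSketch {

    // Fully qualified name of the Tika parser mentioned above.
    static final String AUTO_DETECT_PARSER = "org.apache.tika.parser.AutoDetectParser";

    private TikaSupportSketch() {
    }

    // Returns the class if it can be loaded, or empty if it is not on the classpath.
    static Optional<Class<?>> tryLoad(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = TikaSupportSketch.class.getClassLoader();
        }
        try {
            // initialize = false: only probe for presence, do not run static initializers
            return Optional.of(Class.forName(className, false, loader));
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    // True if the Tika parser class is visible to the current class loader.
    static boolean isTikaAvailable() {
        return tryLoad(AUTO_DETECT_PARSER).isPresent();
    }

    // Opens a stream for the value types the bridge could support (Blob and byte[] here).
    static InputStream openStream(Object value) {
        if (value instanceof Blob) {
            try {
                // Streams the LOB content instead of materializing a byte array
                return ((Blob) value).getBinaryStream();
            } catch (SQLException e) {
                throw new IllegalStateException("Unable to open Blob stream", e);
            }
        }
        if (value instanceof byte[]) {
            return new ByteArrayInputStream((byte[]) value);
        }
        String type = value == null ? "null" : value.getClass().getName();
        throw new IllegalArgumentException("Unsupported type: " + type);
    }
}
```

A real bridge would check isTikaAvailable() once, fail with a clear error message if Tika is missing, and otherwise hand the stream from openStream() to the parser. Whether the parser should be invoked reflectively or through an optional compile-time dependency is an open design question.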
