[JIRA] (HSEARCH-4616) Rename automatic indexing to something more explicit
by Yoann Rodière (JIRA)
Yoann Rodière ( https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%... ) *updated* an issue
Hibernate Search ( https://hibernate.atlassian.net/browse/HSEARCH?atlOrigin=eyJpIjoiOWYyOThj... ) / Improvement ( https://hibernate.atlassian.net/browse/HSEARCH-4616?atlOrigin=eyJpIjoiOWY... ) HSEARCH-4616 ( https://hibernate.atlassian.net/browse/HSEARCH-4616?atlOrigin=eyJpIjoiOWY... ) Rename automatic indexing to something more explicit ( https://hibernate.atlassian.net/browse/HSEARCH-4616?atlOrigin=eyJpIjoiOWY... )
Change By: Yoann Rodière ( https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%... )
“Automatic indexing” is a problematic name because:
# It’s too generic. Mass indexing can also be called “automatic”, since it automatically loads entities from the database.
# In a way, it’s too specific. For example, we document the synchronization strategy and the coordination as affecting automatic indexing, but they also affect manual indexing through an indexing plan, which is not, technically speaking, “automatic”.
So, we need to come up with two names:
# One for indexing that relies on an indexing plan, as opposed to indexing that relies on an Indexer/MassIndexer.
# One for indexing that happens as a consequence of Hibernate ORM entities being modified, as opposed to indexing requested explicitly by the user (explicit call to an indexing plan, mass indexer, JSR-352 indexing job, …). Note that this kind of indexing is specific to the ORM mapper: the Standalone POJO mapper doesn’t have it.
When we have those two names, we will have to use them in documentation (reference, README, …) and configuration properties (taking care to keep the old ones and deprecate them).
Note that we may also need to restructure the documentation, in particular to clarify that manual indexing using indexing plans follows rules similar to those of “automatic” indexing using indexing plans (synchronization, coordination).
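To illustrate what “keep the old ones and deprecate them” could look like for a renamed configuration property, here is a minimal sketch. It is not Hibernate Search code: the class, the method and the new key are hypothetical placeholders (no new name has been chosen yet), and the old key is only modeled after the existing hibernate.search.automatic_indexing.synchronization.strategy property. The sketch assumes that the new key should win when both are set.

```java
import java.util.Map;
import java.util.Optional;

// Sketch only: resolves a renamed property while still honoring the old key.
final class RenamedPropertySketch {
    // Old key, modeled after the existing "automatic indexing" property.
    static final String OLD_KEY =
            "hibernate.search.automatic_indexing.synchronization.strategy";
    // HYPOTHETICAL new key: a placeholder, since the new name is undecided.
    static final String NEW_KEY =
            "hibernate.search.placeholder_new_name.synchronization.strategy";

    // Returns the configured value, preferring the new key over the old one.
    static Optional<String> resolve(Map<String, String> properties) {
        String newValue = properties.get(NEW_KEY);
        if (newValue != null) {
            return Optional.of(newValue);
        }
        String oldValue = properties.get(OLD_KEY);
        if (oldValue != null) {
            // The old key keeps working, but its use is reported as deprecated.
            System.err.println("WARN: '" + OLD_KEY + "' is deprecated; use '"
                    + NEW_KEY + "' instead.");
            return Optional.of(oldValue);
        }
        return Optional.empty();
    }
}
```

An alternative design would be to fail at startup when both keys are set with different values; which behavior is preferable is an open question here.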
This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100205-sha1:4a5192e)
2 years, 4 months
[JIRA] (HSEARCH-4678) select nextval ('hsearch_outbox_event_generator ') performs very slowly on CockroachDB,why not use UUID
by Yoann Rodière (JIRA)
Yoann Rodière ( https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%... ) *commented* on HSEARCH-4678 ( https://hibernate.atlassian.net/browse/HSEARCH-4678?atlOrigin=eyJpIjoiOGR... )
Re: select nextval ('hsearch_outbox_event_generator ') performs very slowly on CockroachDB,why not use UUID ( https://hibernate.atlassian.net/browse/HSEARCH-4678?atlOrigin=eyJpIjoiOGR... )
UUID support varies wildly between databases; last time I checked, UUID support in Hibernate ORM 5 had serious problems on H2 in particular (and most likely on other DBs as well). Slow sequence support, on the other hand, is specific to CockroachDB (because it’s distributed). Hence the initial choice.
That being said, UUID would indeed be interesting if it doesn’t require a database roundtrip.
There are some constraints:
* There needs to be at least decent support for all tested databases: H2, PostgreSQL, Oracle, MariaDB, MySQL, DB2, MS SQL, CockroachDB. Our CI can check that.
* We need to offer a migration strategy from the old schema to the new one (probably SQL scripts, one for each tested database): even though this feature is incubating, I’d rather be nice to people who have already started using it. I’d add a dedicated test for that.
* It would be better for UUIDs to be time-based, i.e. at least somewhat ordered according to the time of the event. While not strictly necessary, this will help make sure that no event is left unprocessed for a very long time when there are a lot of events being produced.
This means in particular using CustomVersionOneStrategy, at least by default (see https://docs.jboss.org/hibernate/orm/current/userguide/html_single/Hibern... ). I believe making the strategy configurable would be a good idea, since I’m not sure CustomVersionOneStrategy will always correctly detect the IP address of the application (it may pick 127.0.0.1).
* The implementation of UUID generation really needs to be conflict-free. We expect multiple application instances to generate events, many of them, all the time, and we really don’t want conflicts that would trigger a transaction rollback, especially for DBs other than CockroachDB where such rollbacks are not “business as usual”.
On that front, time-based UUIDs seem like a good option too. They rule out conflicts between multiple instances, because they theoretically include a unique identifier for the application instance. They also rule out conflicts within a single application instance, because they include a counter whose value is unique in a given JVM, at least as long as there are fewer than 65000 events in a single millisecond.
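The reasoning above (timestamp for ordering, per-JVM counter for uniqueness within an instance, instance identifier for uniqueness across instances) can be sketched as follows. This is a simplified illustration under stated assumptions, not the layout used by CustomVersionOneStrategy and not an RFC 4122 version 1 UUID: the class name is hypothetical, the instance identifier is simply passed in by the caller, and the 128 bits are packed as 48 bits of milliseconds plus a 16-bit counter, followed by a 64-bit instance identifier.

```java
import java.util.UUID;

// Sketch only: time-ordered 128-bit identifiers packed into java.util.UUID.
final class TimeOrderedIdSketch {
    private final long instanceId; // assumed unique per application instance
    private long lastMillis = -1L;
    private int counter = 0;

    TimeOrderedIdSketch(long instanceId) {
        this.instanceId = instanceId;
    }

    synchronized UUID next() {
        long now = System.currentTimeMillis();
        if (now < lastMillis) {
            // Clock went backwards, or we ran ahead of it: stay monotonic.
            now = lastMillis;
        }
        if (now == lastMillis) {
            counter++;
            if (counter > 0xFFFF) {
                // More than 65536 ids in one millisecond: borrow the next one.
                now = lastMillis + 1;
                counter = 0;
            }
        } else {
            counter = 0;
        }
        lastMillis = now;
        // High bits: timestamp, then counter, so ids sort by creation time.
        long mostSignificant = (now << 16) | counter;
        return new UUID(mostSignificant, instanceId);
    }
}
```

Within one instance, identifiers are strictly increasing under UUID.compareTo, so older events sort first. Across instances, uniqueness depends entirely on the instance identifiers being distinct, which is the same caveat as the IP address detection mentioned above.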
I would gladly accept a pull request that complies with all the constraints above. herzhang ( https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=62cbeb9... ) do you want to give it a shot?
2 years, 4 months