[hibernate-dev] [Search] Dynamic sharding configuration

Emmanuel Bernard emmanuel at hibernate.org
Tue Oct 8 09:59:10 EDT 2013


I don't have much stake in the specialized method vs context object
debate as indeed the interface is very specialized and prone to changes.
But as Sanne mentioned, there are memory pressure consequences if this
call is in the hot path.

It is correct that the current use of ForDeletion requires a non-remote,
non-async backend at the moment. That's something I discussed
with Sanne back when I implemented it.
It's not hard to imagine how we could transport such information in a
later version but that would require additional contracts.

The use case I designed dynamic sharding for is to:

- create one index per user (think login)
- query only by a specific index
- apply mutation and deletion on a single index
- support 100s of users (i.e. shards) per VM instance

I also had Bloom filters in mind when I designed the original sharding
strategy.

For these scenarios, a smart ForDeletion is necessary as you don't want to
open / query hundreds of indexes unnecessarily.
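
To make the point concrete, here is a minimal sketch of per-user routing
with a targeted deletion. It is not the Hibernate Search SPI: the class
PerUserShardRouter, its method names, and the "login/documentId" id format
are all assumptions made up for illustration. How the owning shard is
actually recovered at deletion time is precisely the open question in this
thread.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical per-user shard router (illustration only, not a real SPI).
 * Assumes document ids look like "login/documentId", so the owning shard
 * can be derived from the id string alone.
 */
final class PerUserShardRouter {

    // Shards seen so far; one shard per user login.
    private final Set<String> knownShards = ConcurrentHashMap.newKeySet();

    /** Returns the single shard a new or updated document is written to. */
    String shardForAddition(String idInString) {
        String shard = ownerOf(idInString);
        if (shard == null) {
            throw new IllegalArgumentException("No owner in id: " + idInString);
        }
        knownShards.add(shard);
        return shard;
    }

    /**
     * Returns the shards a deletion must be applied to: exactly one when the
     * owner can be derived and is known, every known shard otherwise.
     */
    String[] shardsForDeletion(String idInString) {
        String shard = ownerOf(idInString);
        if (shard != null && knownShards.contains(shard)) {
            return new String[] { shard };
        }
        // Fallback: the owner is unknown, so all shards have to be visited.
        return knownShards.toArray(new String[0]);
    }

    // Extracts the login prefix of "login/documentId", or null if absent.
    private static String ownerOf(String idInString) {
        int separator = idInString.indexOf('/');
        return separator > 0 ? idInString.substring(0, separator) : null;
    }
}
```

With hundreds of shards per VM, the difference between the two branches of
shardsForDeletion is the difference between touching one index and opening
all of them.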

That Hardy thinks the use case is wrong is beyond me but if that's the
general feeling, that's fine, I'll fork Hibernate Search and make it
useful for me.
For the record, I handed over a working solution 6 months ago, short of 4
days... I am sure it was not perfect, but certainly not 6 months
away from it. I know you guys wanted support for injecting a Session to
resolve shards, which has put some significant constraints on the life
cycle. But still.

Conclusion

Draw your own, I'm out of it.

Emmanuel

On Mon 2013-10-07 16:03, Sanne Grinovero wrote:
> On 2 October 2013 14:34, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
> > On Tue 2013-09-24 14:30, Sanne Grinovero wrote:
> >> On 24 September 2013 14:12, Hardy Ferentschik <hardy at hibernate.org> wrote:
> >> > 2) remove 'String[] getShardIdentifiers(Class<?> entity, Serializable id, String idInString)' from ShardIdentifierProvider
> >>
> >> +1 we're automatically assuming a deletion needs to be routed to all
> >> identifiers.
> >
> > Bad idea as I explained in my previous email. Plus we could already make
> > use of that if we reuse Hibernate ORM's tenantid value.
> 
> I've tried hard to find an agreement on this, but it seems we're
> wasting time without making progress.
> I'm not happy about ignoring a strong recommendation from any of you,
> very hard choice :-(
> 
> Hardy are you going to reconnect later? Could you reply to this email
> of Emmanuel?
> 
> I'm inclined to add the method back, so that it's the user's choice to
> pick their battle. As mentioned below, I don't think we should take
> options away from them.
> Of course our template implementation could provide a sensible default
> method, so all users looking for simplicity don't need to bother too
> much about the extra method.
> 
> @Emmanuel the last conversation we had on the subject is below:
> 
> --Sanne
> 
> [15:05] <sannegrinovero> hardy: on the dynamic shard id concerns raised
> by emmanuel, I thought you were going to propose a pair of method
> names that would suit you?
> [15:05] <sannegrinovero> I still think we need to restore the missing method.
> [15:05] <hardy> I don't think so
> [15:06] <sannegrinovero> emmanuel had quite a strong opinion about it,
> don't think it's fair to ignore that.
> [15:06] <hardy> I thought more about it and I think the use case is
> not even implementable
> [15:06] <sannegrinovero> you have a point there.
> [15:06] <hardy> well, I think he is wrong
> [15:06] <hardy> take his use case
> [15:06] <hardy> he wants to use some sort of customer id or ORM shard identifier
> [15:07] <hardy> sounds great, but there is no such context to get it from
> [15:07] <hardy> so what can you do in this case?
> [15:07] <hardy> a ThreadLocal
> [15:07] <hardy> and that's exactly the problem
> [15:07] <hardy> if the shard ids were determined at document build
> time (as we want to do it), it might be possible
> [15:07] <sebersole> amazing how often "context" is problematic :)
> [15:08] <hardy> he he
> [15:08] <sannegrinovero> no. as far as I understood, he was planning to
> get a reference to the Strategy, and then invoke setters on it to
> "program" the thing.
> [15:08] <hardy> but now the shard identifiers are "generated" when the
> changes are getting applied to the index
> [15:09] <hardy> that's happening on a different thread
> [15:09] <sannegrinovero> hardy: we're going in circles with this
> debate on abstract hypothesis. Emmanuel said he has a use case for it,
> and implemented it. that should be good enough for us?
> [15:09] <hardy> no
> [15:09] <hardy> I honestly would like to see the code first
> [15:09] <sannegrinovero> :-) let me try a proposal
> [15:10] <hardy> and how does it work in a clustered environment
> [15:10] <hardy> or JMS
> [15:10] <hardy> I seriously doubt we can implement this in a decent way atm
> [15:10] <sannegrinovero> WDYT of this plan: we re-introduce the
> method, and provide the abstract base class I've made; then the
> deletion method has a default implementation.
> [15:11] <hardy> one beauty of the new interface is that it is simpler
> [15:11] <hardy> and imo it removes something which was conceptually
> not working anyways
> [15:11] <sannegrinovero> then in future we can deprecate this method.
> [15:11] <hardy> I don't see a point of re-introducing it unless
> someone can actually provide a working example
> [15:12] <sannegrinovero> he made one already, he just couldn't show
> it, but described it.
> [15:12] <sannegrinovero> it's much easier to remove a method from an
> SPI interface.
> [15:12] <hardy> no
> [15:12] <hardy> so what was his example?
> [15:13] <sannegrinovero> I told you, he takes the reference to the
> sharding Strategy, and sets the context explicitly.
> [15:13] <hardy> again, how
> [15:14] <hardy> explain me how this is going to work on e.g. a JMS master
> [15:14] <sannegrinovero> hardy:
> org.hibernate.search.engine.spi.EntityIndexBinding.getShardIdentifierProvider()
> gives you access to it.
> [15:16] <sannegrinovero> hardy: consider that this feature is
> power-users only. Some will definitely want to control the deletion.
> Some will do it wrong, well that's not my problem.
> [15:17] <sannegrinovero> hardy: but it definitely is annoying to not
> provide *any* way for a user to hack his way into it.
> [15:17] <hardy> sannegrinovero: yes, there is
> EntityIndexBinding.getShardIdentifierProvider()
> [15:17] <hardy> but how would you use that on a JMS master
> [15:17] <hardy> you still need to know what to set
> [15:17] <sannegrinovero> hardy: I don't feel it's our right to
> consider our users dumb, some will definitely have smarter ideas.
> [15:17] <hardy> how would you determine the customer id on the JMS master
> [15:18] <hardy> I don't consider them as dumb at all
> [15:18] <hardy> but write me a test or example setup
> [15:18] <sannegrinovero> hardy: I can attach lots of custom attributes
> in a JMS message. I can do it from my custom backend, or even use
> routing options if I have something like Camel.
> [15:19] <sannegrinovero> hardy: you actually made an excellent use
> case with JMS :)
> [15:19] <sannegrinovero> hardy: but I'm not going to code a full JMS +
> Camel app to show you :-D
> [15:23] <hardy> sannegrinovero: and how do you create a custom message?
> [15:23] <hardy> how is this all wired up?
> [15:24] <sannegrinovero> hardy: that's system setup. For example, the
> shard id could be selected by the originating machine: the routing
> process of the JMS message could add this as context.
> [15:24] <hardy> but that's not how it works atm
> [15:24] <sannegrinovero> hardy: say I have an EAP6 instance per shard
> running as client, and have a single master shared across them.
> [15:25] <sannegrinovero> hardy: that works today. We're not
> controlling how the messages are sent around in a SOA environment.
> [15:25] <hardy> whatever
> [15:26] <sannegrinovero> WDYM ?
> [15:26] <sannegrinovero> it's a powerful use case, I'm not feeling
> comfortable in denying the option.
> [15:26] <hardy> apparently you want it back, but you also are not
> providing a working example
> [15:26] <hardy> it is contrived
> [15:27] <hardy> and why would you return a set of shard ids
> [15:27] <hardy> as in the original proposal
> [15:27] <hardy> at least it should be a single id as well
> [15:30] <sannegrinovero> hardy: I never disagreed with you about the
> method name not being ideal. But this JMS integration point got me
> quite excited now on the routing options it provides.
> [15:30] <sannegrinovero> I mean, the API really feels lacking without
> the method for deletions.
> [15:30] <hardy> not at all
> [15:31] <hardy> but suit yourself. I got to go anyways
> [15:31] <-- hardy (~hardy at redhat/jboss/hardy) has left this server
> (Quit: bye bye).
> [15:37] <sannegrinovero> gmorling: I'm puzzled about the dyn shard
> SPI. Don't like to take an action with hardy being so fiercely against
> it. WDYT? You know if he'll be back soon?
> [15:38] <gmorling> sannegrinovero: no, unfortunately i don't know when
> he'll be back; would be nice though to come to a commonly agreed upon
> solution
> [15:41] <sannegrinovero> gmorling: right, which makes it even more
> important to make sure that we're not dropping existing use cases: the
> existing one is able to pick a single delete index.
> [15:42] <gmorling> yes, if SIP is intended as replacement it shouldn't
> offer less functionality (given the previous functionality was
> sound)

