[hibernate-dev] [Search] Dynamic sharding configuration

Sanne Grinovero sanne at hibernate.org
Mon Oct 7 11:03:53 EDT 2013


On 2 October 2013 14:34, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
> On Tue 2013-09-24 14:30, Sanne Grinovero wrote:
>> On 24 September 2013 14:12, Hardy Ferentschik <hardy at hibernate.org> wrote:
>> > 2) remove 'String[] getShardIdentifiers(Class<?> entity, Serializable id, String idInString)' from ShardIdentifierProvider
>>
>> +1 we're automatically assuming a deletion needs to be routed to all
>> identifiers.
>
> Bad idea as I explained in my previous email. Plus we could already make
> use of that if we reuse Hibernate ORM's tenantid value.

I've tried hard to find an agreement on this, but it seems we're
wasting time without making progress.
I'm not happy about ignoring a strong recommendation from any of you;
it's a very hard choice :-(

Hardy, are you going to reconnect later? Could you reply to Emmanuel's
email?

I'm inclined to add the method back, so that it's the user's choice to
pick their battle. As mentioned below, I don't think we should take
options away from them.
Of course our template implementation could provide a sensible default
method, so all users looking for simplicity don't need to bother too
much about the extra method.
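
A minimal sketch of what such a template could look like. Only the
signature of the removed method is taken from the discussion above; the
type names (ShardRouting, ShardRoutingTemplate, ModuloShardRouting,
TargetedShardRouting), the other two methods and the modulo routing are
made up for illustration and are not the actual Hibernate Search SPI.

```java
import java.io.Serializable;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for the SPI under discussion, NOT the real ShardIdentifierProvider.
interface ShardRouting {
    // Shard that receives a write for this entity (hypothetical method).
    String getShardIdentifier(Class<?> entity, Serializable id, String idInString);

    // The method under debate: the shards a deletion has to be routed to.
    String[] getShardIdentifiers(Class<?> entity, Serializable id, String idInString);

    // Every shard seen so far (hypothetical method).
    Set<String> getAllShardIdentifiers();
}

// Template: users who want simplicity inherit "delete from every known shard".
abstract class ShardRoutingTemplate implements ShardRouting {
    private final Set<String> knownShards = new LinkedHashSet<>();

    protected synchronized void addShard(String shard) {
        knownShards.add(shard);
    }

    @Override
    public synchronized Set<String> getAllShardIdentifiers() {
        return new LinkedHashSet<>(knownShards); // defensive copy
    }

    @Override
    public String[] getShardIdentifiers(Class<?> entity, Serializable id, String idInString) {
        return getAllShardIdentifiers().toArray(new String[0]);
    }
}

// Simple user: only decides where writes go, keeps the default for deletions.
class ModuloShardRouting extends ShardRoutingTemplate {
    @Override
    public String getShardIdentifier(Class<?> entity, Serializable id, String idInString) {
        String shard = "shard-" + Math.floorMod(idInString.hashCode(), 2);
        addShard(shard);
        return shard;
    }
}

// Power user: overrides the deletion method to target a single shard.
class TargetedShardRouting extends ModuloShardRouting {
    @Override
    public String[] getShardIdentifiers(Class<?> entity, Serializable id, String idInString) {
        return new String[] { getShardIdentifier(entity, id, idInString) };
    }
}

class ShardRoutingDemo {
    public static void main(String[] args) {
        ShardRouting simple = new ModuloShardRouting();
        simple.getShardIdentifier(Object.class, Integer.valueOf(1), "1");
        simple.getShardIdentifier(Object.class, Integer.valueOf(2), "2");
        System.out.println("default deletion targets: "
                + simple.getShardIdentifiers(Object.class, Integer.valueOf(1), "1").length);

        ShardRouting targeted = new TargetedShardRouting();
        System.out.println("overridden deletion targets: "
                + targeted.getShardIdentifiers(Object.class, Integer.valueOf(1), "1").length);
    }
}
```

The point of the split is that the extra method costs nothing for users
who extend the template, while those who know which shard holds an entry
can still narrow the deletion.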

@Emmanuel the last conversation we had on the subject is below:

--Sanne

[15:05] <sannegrinovero> hardy: on the dynamic shard id concerns raised
by emmanuel, I thought you were going to propose a pair of method
names that would suit you?
[15:05] <sannegrinovero> I still think we need to restore the missing method.
[15:05] <hardy> I don't think so
[15:06] <sannegrinovero> emmanuel had quite a strong opinion about it,
don't think it's fair to ignore that.
[15:06] <hardy> I thought more about it and I think the use case is
not even implementable
[15:06] <sannegrinovero> you have a point there.
[15:06] <hardy> well, I think he is wrong
[15:06] <hardy> take his use case
[15:06] <hardy> he wants to use some sort of customer id or ORM shard identifier
[15:07] <hardy> sounds great, but there is no such context to get it from
[15:07] <hardy> so what can you do in this case?
[15:07] <hardy> a ThreadLocal
[15:07] <hardy> and that's exactly the problem
[15:07] <hardy> if the shard ids were determined at document build
time (as we want to do it), it might be possible
[15:07] <sebersole> amazing how often "context" is problematic :)
[15:08] <hardy> he he
[15:08] <sannegrinovero> no. as far as I understood, he was planning to
get a reference to the Strategy, and then invoke setters on it to
"program" the thing.
[15:08] <hardy> but now the shard identifiers are "generated" when the
changes are getting applied to the index
[15:09] <hardy> that's happening on a different thread
[15:09] <sannegrinovero> hardy: we're going in circles with this
debate on abstract hypotheses. Emmanuel said he has a use case for it,
and implemented it. That should be good enough for us?
[15:09] <hardy> no
[15:09] <hardy> I honestly would like to see the code first
[15:09] <sannegrinovero> :-) let me try a proposal
[15:10] <hardy> and how does it work in a clustered environment
[15:10] <hardy> or JMS
[15:10] <hardy> I seriously doubt we can implement this in a decent way atm
[15:10] <sannegrinovero> WDYT of this plan: we re-introduce the
method, and provide the abstract base class I've made; then the
deletion method has a default implementation.
[15:11] <hardy> one beauty of the new interface is that it is simpler
[15:11] <hardy> and imo it removes something which was conceptually
not working anyways
[15:11] <sannegrinovero> then in future we can deprecate this method.
[15:11] <hardy> I don't see a point of re-introducing it unless
someone can actually provide a working example
[15:12] <sannegrinovero> he made one already, he just couldn't show
it, but described it.
[15:12] <sannegrinovero> it's much easier to remove a method from an
SPI interface.
[15:12] <hardy> no
[15:12] <hardy> so what was his example?
[15:13] <sannegrinovero> I told you, he takes the reference to the
sharding Strategy, and sets the context explicitly.
[15:13] <hardy> again, how
[15:14] <hardy> explain to me how this is going to work on e.g. a JMS master
[15:14] <sannegrinovero> hardy:
org.hibernate.search.engine.spi.EntityIndexBinding.getShardIdentifierProvider()
gives you access to it.
[15:16] <sannegrinovero> hardy: consider that this feature is
power-users only. Some will definitely want to control the deletion.
Some will do it wrong, well that's not my problem.
[15:17] <sannegrinovero> hardy: but it definitely is annoying to not
provide *any* way for a user to hack his way into it.
[15:17] <hardy> sannegrinovero: yes, there is
EntityIndexBinding.getShardIdentifierProvider()
[15:17] <hardy> but how would you use that on a JMS master
[15:17] <hardy> you still need to know what to set
[15:17] <sannegrinovero> hardy: I don't feel it's our right to
consider our users dumb, some will definitely have smarter ideas.
[15:17] <hardy> how would you determine the customer id on the JMS master
[15:18] <hardy> I don't consider them as dumb at all
[15:18] <hardy> but write me a test or example setup
[15:18] <sannegrinovero> hardy: I can attach lots of custom attributes
in a JMS message. I can do it from my custom backend, or even use
routing options if I have something like Camel.
[15:19] <sannegrinovero> hardy: you actually made an excellent use
case with JMS :)
[15:19] <sannegrinovero> hardy: but I'm not going to code a full JMS +
Camel app to show you :-D
[15:23] <hardy> sannegrinovero: and how do you create a custom message?
[15:23] <hardy> how is this all wired up?
[15:24] <sannegrinovero> hardy: that's system setup. For example, the
shard id could be selected by the originating machine: the routing
process of the JMS message could add this as context.
[15:24] <hardy> but that's not how it works atm
[15:24] <sannegrinovero> hardy: say I have an EAP6 instance per shard
running as client, and have a single master shared across them.
[15:25] <sannegrinovero> hardy: that works today. We're not
controlling how the messages are sent around in a SOA environment.
[15:25] <hardy> whatever
[15:26] <sannegrinovero> WDYM ?
[15:26] <sannegrinovero> it's a powerful use case, I'm not feeling
comfortable in denying the option.
[15:26] <hardy> apparently you want it back, but you also are not
providing a working example
[15:26] <hardy> it is contrived
[15:27] <hardy> and why would you return a set of shard ids
[15:27] <hardy> as in the original proposal
[15:27] <hardy> at least it should be a single id as well
[15:30] <sannegrinovero> hardy: I never disagreed with you about the
method name not being ideal. But this JMS integration point got me
quite excited now on the routing options it provides.
[15:30] <sannegrinovero> I mean, the API really feels lacking without
the method for deletions.
[15:30] <hardy> not at all
[15:31] <hardy> but suit yourself. I got to go anyways
[15:31] <-- hardy (~hardy at redhat/jboss/hardy) has left this server
(Quit: bye bye).
[15:37] <sannegrinovero> gmorling: I'm puzzled about the dyn shard
SPI. Don't like to take an action with hardy being so fiercely against
it. WDYT? You know if he'll be back soon?
[15:38] <gmorling> sannegrinovero: no, unfortunately i don't know when
he'll be back; would be nice though to come to a commonly agreed upon
solution
[15:41] <sannegrinovero> gmorling: right, which makes it even more
important to make sure that we're not dropping existing use cases: the
existing one is able to pick a single delete index.
[15:42] <gmorling> yes, if SIP is intended as a replacement it shouldn't
offer less functionality (given the previous functionality was
sound)
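
The ThreadLocal concern raised by Hardy in the log above can be
reproduced in isolation. This is a plain-JDK sketch and does not touch any
Hibernate Search code: the class name, the CUSTOMER variable and the
single-thread executor standing in for "the thread applying the index
changes" are all assumptions made for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalContextDemo {
    // Hypothetical per-thread "current customer" context.
    static final ThreadLocal<String> CUSTOMER = new ThreadLocal<>();

    // Runs the task on a separate worker thread and returns its result.
    static String runOnWorkerThread(Callable<String> task) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return worker.submit(task).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    // The worker looks the context up in the ThreadLocal: nothing was set on that thread.
    static String readOnWorkerThread() {
        return runOnWorkerThread(() -> CUSTOMER.get());
    }

    // The context travels with the unit of work instead of with the thread.
    static String readExplicitly(final String customer) {
        return runOnWorkerThread(() -> customer);
    }

    public static void main(String[] args) {
        CUSTOMER.set("customer-42");
        System.out.println("calling thread sees: " + CUSTOMER.get());
        System.out.println("worker via ThreadLocal sees: " + readOnWorkerThread());
        System.out.println("worker via explicit value sees: " + readExplicitly(CUSTOMER.get()));
    }
}
```

Carrying the value explicitly with the work is the same idea as attaching
it as a custom attribute to the JMS message, as suggested in the log;
whether that is practical in a given setup is exactly the open question of
the discussion.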

