Sanne Grinovero wrote:
>> You could use the cluster view, which is the same across all nodes,
>> and pick the first element in the list. Or you could run an
>> agreement protocol, which deterministically elects a master.
> Looks simple, deterministic and elegant.
Bela Ban wrote:
Agreed, that looks like a clean solution, but it would scale better if
we could choose a master per index, to spread the load instead of always
picking the same node for all the work. Any ideas on that?
I could hash the index name (the identifier) modulo the cluster view
list size, WDYT?
Absolutely, this makes a lot of sense. However, this will require moving
indices around when the cluster view changes.
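
To make the modulo idea concrete, here is a minimal sketch. It assumes the
cluster view is available as an ordered list of member names that is
identical on every node; ModuloMasterSelector, masterFor and countMoved are
names made up for illustration, not JGroups or Infinispan API.

```java
import java.util.List;

final class ModuloMasterSelector {

    // Picks the master for an index from an ordered view that is
    // assumed to be identical on all nodes.
    static String masterFor(String indexName, List<String> view) {
        if (view.isEmpty()) {
            throw new IllegalArgumentException("empty cluster view");
        }
        // floorMod keeps the slot non-negative even for negative hash codes
        int slot = Math.floorMod(indexName.hashCode(), view.size());
        return view.get(slot);
    }

    // Counts how many indices would change master between two views.
    static int countMoved(List<String> indexNames, List<String> oldView, List<String> newView) {
        int moved = 0;
        for (String name : indexNames) {
            if (!masterFor(name, oldView).equals(masterFor(name, newView))) {
                moved++;
            }
        }
        return moved;
    }
}
```

Calling countMoved with the views before and after a membership change
gives a feel for how many indices would have to be moved.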
With a simple modulo based algorithm you'll have a lot of rebalancing
going on, so maybe something like a consistent hash could determine the
server or servers on which a given index is stored. You could even store
an index on more than one node, similar to what Infinispan does with DIST
and numOwners=2 (for example).
Even better: *if* an index can be recreated from the real data, you
could use a consistent hash and only store an index *once*, and you
could use a Cache instance for it !
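
A rough sketch of what such a consistent hash could look like, purely for
illustration: this is not how Infinispan implements DIST. IndexHashRing,
the CRC32-based hash and the number of virtual nodes are all assumptions
made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeMap;
import java.util.zip.CRC32;

final class IndexHashRing {

    // Arbitrary choice: more points per node smooth out the distribution.
    private static final int VIRTUAL_NODES = 64;

    // Ring position -> node name (hypothetical string identifiers).
    private final TreeMap<Long, String> ring = new TreeMap<>();

    IndexHashRing(Collection<String> nodes) {
        for (String node : nodes) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash(node + "#" + i), node);
            }
        }
    }

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Returns up to numOwners distinct nodes, walking the ring clockwise
    // from the position of the index name and wrapping around once.
    List<String> owners(String indexName, int numOwners) {
        List<String> result = new ArrayList<>();
        long h = hash(indexName);
        collect(ring.tailMap(h, true).values(), result, numOwners);
        collect(ring.headMap(h, false).values(), result, numOwners);
        return result;
    }

    private static void collect(Collection<String> candidates, List<String> result, int limit) {
        for (String node : candidates) {
            if (result.size() >= limit) {
                return;
            }
            if (!result.contains(node)) {
                result.add(node);
            }
        }
    }
}
```

With a ring like this, owners("books", 2) would give a primary and one
backup. When a node leaves, only the indices it owned get a new primary
owner; everything else stays where it was.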
Emmanuel Bernard wrote:
> So basically, following Bela's advice, you would:
> - use the cluster view to elect the master
> - send changes to the master using JGroups
>
> We still need one guarantee: make sure queued changes not applied on
> the master are going to be processed by the new master (that's why
> Sanne was thinking of using Infinispan as a messaging system I guess)
This is something SEQUENCER guarantees: a sender sends the message to
the sequencer (coordinator), but queues the message locally as well.
Only when the sender receives its own message back will that message be
removed from the queue. When a coordinator crashes, all queued messages
will get resent to the new coordinator.
Because this is done above NAKACK (multicast messages) or UNICAST
(unicast messages), we're also guaranteed that no duplicates will be
delivered and no message will get dropped.
Yes, that's what I thought: if I store the changes-to-be-done in
Infinispan, I expect that to be reliable enough, as much as people want
to configure. You're right about the networking inefficiency; ideally I
would need something able to select the main storing node and have
secondary (tertiary, ...) "buddies" elected by hash.
Total ordering is not needed
Yes, in this case you actually implement global ordering by always
sending everything for a given index to the same master node. This is fine.
but we need to make sure that different works applying to the same
entity instance are applied in the same order. It's also fine if they
happen to cancel each other out: by writing changes in a map I could
atomically remove work which is done, or atomically replace outdated
work with a newer version when the writer is lagging behind work
production. This lag could accumulate over the whole day and we would
still have updates to the index reflected as fast as possible (in
current design terms), possibly with a guarantee that at night all work
will be processed, and with duplicates removed as a side effect of key
collisions during storing.
When using a global queue I'll have to guarantee that all received
work is processed in the same order as it was produced: the order has
to match the database transactions Hibernate Search was listening to.
When using a map approach I'm confident that MVCC will help in "seeing"
only the most recently put work, which always is the correct ordering.
The short used to identify JGroups ordering contexts is far too limited
for this problem; the context should be defined as the unique
type+id+operation to solve this same problem, which is also a nice key
for my Map.
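
Just to illustrate the Map idea, a small sketch using a plain
ConcurrentHashMap as a stand-in for an Infinispan Cache. PendingWork, the
string key format and the String payloads are assumptions made for this
example only.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class PendingWork {

    // Local stand-in for the distributed map; one entry per type+id+operation.
    private final ConcurrentMap<String, String> work = new ConcurrentHashMap<>();

    // Hypothetical key format combining type, id and operation.
    static String key(String type, Object id, String operation) {
        return type + "#" + id + "#" + operation;
    }

    // Newer work for the same key replaces the outdated entry, so
    // duplicates collapse as a side effect of the key collision.
    void submit(String type, Object id, String operation, String payload) {
        work.put(key(type, id, operation), payload);
    }

    String peek(String key) {
        return work.get(key);
    }

    // Removes the entry only if it still holds the payload that was
    // processed; a newer version put in the meantime is left in place.
    boolean markDone(String key, String processedPayload) {
        return work.remove(key, processedPayload);
    }

    int size() {
        return work.size();
    }
}
```

The conditional remove is what keeps a slow writer from discarding work
that was replaced while it was busy with the older version.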
I don't want to push for the "Cache bus" approach, but in other cases
I would definitely need some help on the issues above, while the Map
sounds easy to implement now that I also see an easy way to know if
I'm the master or not.
I got puzzled by this SEQUENCER and just read the docs; I'll ask for
more details in an appropriate thread.
Sure, or ping me on skype or IRC (#jgroups @ irc.freenode.net)
Manik, do you have an example of how I should send a message to a
selected member by integrating with Infinispan? I think I can figure out
how to send it with some code reading, but I'm mostly lost about how
to receive it. Or do you suggest I should share the JGroupsTransport
with totally Infinispan-independent code?
--
Bela Ban
Lead JGroups / Clustering Team
JBoss