[infinispan-dev] asynchronous change manager for high load Lucene

Tue Mar 30 09:09:59 EDT 2010

Sanne Grinovero wrote:
>>> You could use the cluster view, which is the same across all nodes, 
>>> and pick the first element in the list. Or you could run an 
>>> agreement protocol, which deterministically elects a master.
>> Looks simple, deterministic and elegant.
> Bela Ban wrote:
>
> Agreed, that looks like a clean solution, but it would scale better if we 
> could elect a master per index, to spread the load instead of always 
> choosing the same node for all the work. Any ideas on that?
> I could hash the index name (the identifier) modulo the cluster view 
> list size, WDYT?

Absolutely, this makes a lot of sense. However, this will require moving 
indices around when the cluster view changes.
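
A minimal sketch of the modulo idea, assuming the cluster view is 
available as an ordered list that is identical on every node. The class 
and method names are hypothetical, not part of JGroups or Infinispan:

```java
import java.util.List;

// Hypothetical helper, for illustration only.
final class IndexMasterElection {

    // 'view' is the cluster view: same members, same order, on every node.
    static <A> A masterFor(String indexName, List<A> view) {
        if (view.isEmpty()) {
            throw new IllegalArgumentException("cluster view is empty");
        }
        // floorMod keeps the slot non-negative even for negative hash codes.
        int slot = Math.floorMod(indexName.hashCode(), view.size());
        return view.get(slot);
    }
}
```

Every node computes the same master for a given index without exchanging 
any messages, but the slot depends on view.size(), so a view change can 
remap indices whose master is still alive.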

With a simple modulo-based algorithm you'll have a lot of rebalancing 
going on, so maybe something like a consistent hash could determine the 
server or servers on which a given index is stored. You could even store 
an index on more than one node, similar to what Infinispan does with DIST 
and numOwners=2 (for example).
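
To illustrate, here is a rough sketch of a consistent hash ring with 
virtual nodes. It is not Infinispan's actual consistent hash; the class 
name, the hash mixing and the number of virtual nodes are all assumptions 
made for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical ring, for illustration only (not Infinispan's implementation).
final class IndexHashRing {
    private static final int VIRTUAL_NODES = 64; // assumed value
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    // Scrambles String.hashCode() so ring points are spread more evenly.
    private static int spread(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        return h;
    }

    void addNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.putIfAbsent(spread(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            // Two-argument remove: only drops points that belong to this node.
            ring.remove(spread(node + "#" + i), node);
        }
    }

    // Walks the ring clockwise from the index's position and returns up to
    // numOwners distinct nodes; the first one is the primary owner.
    List<String> ownersFor(String indexName, int numOwners) {
        List<String> owners = new ArrayList<>();
        int start = spread(indexName);
        collect(ring.tailMap(start, true).values(), owners, numOwners);
        collect(ring.headMap(start, false).values(), owners, numOwners);
        return owners;
    }

    private static void collect(Iterable<String> nodes, List<String> owners, int numOwners) {
        for (String node : nodes) {
            if (owners.size() >= numOwners) return;
            if (!owners.contains(node)) owners.add(node);
        }
    }
}
```

When a node leaves, only the indices whose primary owner was that node 
get a new primary; all other indices stay where they are.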

Even better: *if* an index can be recreated from the real data, you 
could use a consistent hash and only store an index *once*, and you 
could use a Cache instance for it!

> Emmanuel Bernard wrote:
>> So basically, following Bela's advice, you would:
>> - use the cluster view to elect the master
>> - send changes to the master using JGroups
>>
>> We still need one guarantee: make sure queued changes not yet applied on 
>> the master are going to be processed by the new master (that's why 
>> Sanne was thinking of using Infinispan as a messaging system, I guess)

This is something SEQUENCER guarantees: a sender sends the message to 
the sequencer (coordinator), but queues the message locally as well. 
Only when it receives its own message will it be removed from the queue. 
When a coordinator crashes, all queued messages will get resent to the 
new coordinator.

Because this is done above NAKACK (multicast messages) or UNICAST 
(unicast messages), we're also guaranteed that no duplicates will be 
delivered and no message will get dropped.
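
The sender-side bookkeeping described above can be modelled roughly as 
follows. This is a toy model of the behaviour, not the SEQUENCER code; 
the class, its methods and the use of a plain List as the coordinator's 
inbox are assumptions made for the example:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy model of a sender that forwards messages to a coordinator.
final class ForwardingSender {
    // Messages forwarded but not yet seen coming back to us.
    private final Deque<String> pending = new ArrayDeque<>();

    // Queue the message locally, then forward it to the coordinator.
    void send(String msg, List<String> coordinatorInbox) {
        pending.addLast(msg);
        coordinatorInbox.add(msg);
    }

    // Called when our own message is delivered back to us: safe to forget it.
    void onOwnMessageDelivered(String msg) {
        pending.remove(msg);
    }

    // Called when the coordinator changes: resend everything still queued,
    // in the original order.
    void onCoordinatorChange(List<String> newCoordinatorInbox) {
        newCoordinatorInbox.addAll(pending);
    }

    int pendingCount() {
        return pending.size();
    }
}
```

A message that was already delivered is not resent, and one that was 
still in flight when the coordinator crashed reaches the new coordinator.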

> Yes, that's what I thought: if I store changes-to-be-done in Infinispan
> I expect that to be reliable enough - as reliable as people want to
> configure it. You're right about networking inefficiency; ideally I
> would need something able to select the main storing node,
> and have secondary (tertiary, ...) "buddies" elected by hash.
> Total ordering is not needed

Yes, in this case you actually implement global ordering by always 
sending everything for a given index to the same master node. This is fine.

> but we need to make sure that different
> units of work applying to the same entity instance are applied in the
> same order. It's also fine if they happen to cancel each other out: by
> writing changes to a map I could atomically remove work which is done,
> or atomically replace outdated work with a newer
> version when the writer is lagging behind work production. This
> lag could accumulate the whole day, and we would still have updates to
> the index reflected as fast as possible (in current design terms) and
> possibly have guarantees that at night all work will be processed,
> with duplicates removed as a side effect of key collisions during
> storing.
> When using a global queue I'll have to guarantee that all received
> work is processed in the same order as it was produced - the order has
> to match the database transactions which Hibernate Search was
> listening to - when using a map approach I'm confident that MVCC will
> help in "seeing" only the most recently put work, which always is the
> correct ordering.
> The short used to identify JGroups ordering contexts is way too limited
> for this problem; the context should be defined as the unique
> type+id+operation to solve this same problem, which is a nice key for
> my Map.
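
A small sketch of that map, using a plain ConcurrentHashMap as a 
stand-in (an Infinispan Cache also implements ConcurrentMap). The class 
name, the String key format and the String payload are assumptions made 
for the example:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical work map keyed by type+id+operation.
final class PendingWork {
    private final ConcurrentMap<String, String> work = new ConcurrentHashMap<>();

    private static String key(String type, Object id, String operation) {
        return type + "#" + id + "#" + operation;
    }

    // Newer work for the same key replaces outdated work (key collision).
    void enqueue(String type, Object id, String operation, String payload) {
        work.put(key(type, id, operation), payload);
    }

    // Atomically removes the entry only if it still holds the payload that
    // was applied; returns false if newer work replaced it in the meantime.
    boolean markDone(String type, Object id, String operation, String appliedPayload) {
        return work.remove(key(type, id, operation), appliedPayload);
    }

    int size() {
        return work.size();
    }
}
```

If the writer lags behind, repeated updates to the same entity collapse 
into a single entry, and a completed write never discards work that 
arrived while it was being applied.
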
>
> I don't want to push for the "Cache bus" approach, but in other cases
> I would definitely need some help on the issues above, while the Map
> sounds easy to implement now that I also see an easy way to know if
> I'm the master or not.
> I got puzzled by this SEQUENCER and just read the docs; I'll ask for
> some more details in an appropriate thread

Sure, or ping me on Skype or IRC (#jgroups @ irc.freenode.net)

> Manik, do you have an example of how I should send a message to a
> selected member by integrating with Infinispan? I think I can figure
> out how to send it with some code reading, but I'm mostly lost about
> how to receive it. Or maybe you suggest I should share the
> JGroupsTransport with totally Infinispan-independent code?

-- 
Bela Ban
Lead JGroups / Clustering Team
JBoss