[hibernate-dev] Re: Pushing indexes through JGroups

Łukasz Moreń lukasz.moren at gmail.com
Sun Jun 14 15:08:16 EDT 2009


I will follow your comments and advice and fix these points.

+ JGroupsBackendQueueProcessorFactory:
Yes, I think there should be a default cluster name.
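
A minimal sketch of how such a default could be resolved, assuming the name
is read from the hibernate.search.worker.jgroups.channel_name property used
in the sample configuration quoted further down. The class name and the
default value "HSearchCluster" are placeholders made up for illustration,
not names taken from the patch:

```java
import java.util.Properties;

// Hypothetical helper, not part of the patch.
final class ClusterNameResolver {
    // Key as used in the sample configuration quoted in this thread.
    static final String CHANNEL_NAME_KEY =
            "hibernate.search.worker.jgroups.channel_name";
    // Placeholder default; the real default name is still to be chosen.
    static final String DEFAULT_CHANNEL_NAME = "HSearchCluster";

    private ClusterNameResolver() {
    }

    // Returns the configured name, or the default when the property is
    // missing or blank.
    static String resolve(Properties props) {
        String name = props.getProperty(CHANNEL_NAME_KEY);
        if (name == null || name.trim().isEmpty()) {
            return DEFAULT_CHANNEL_NAME;
        }
        return name.trim();
    }
}
```

With this, a node that does not set the property would still join a
well-known cluster instead of failing at startup.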

+ JGroupsAbstractMessageReceiver
getState / setState: these are the JGroups callbacks for getting and setting
the current cluster state. I think implementing them is not necessary in our
case.
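
To illustrate, here is a sketch of a receiver that takes no part in state
transfer. StateCallbacks is a local stand-in written so that the snippet
compiles on its own; it assumes the byte[]-based getState/setState pair of
the JGroups 2.x listener callbacks, and a real receiver would implement the
JGroups type instead. The class names are hypothetical:

```java
// Local stand-in for the two state-transfer callbacks; an assumption made
// for this sketch, not the JGroups interface itself.
interface StateCallbacks {
    byte[] getState();

    void setState(byte[] state);
}

// Hypothetical receiver. Assumption: nodes only exchange Lucene works, so
// there is no replicated in-memory state to hand over to a joining node.
class StatelessReceiver implements StateCallbacks {
    @Override
    public byte[] getState() {
        return null; // nothing to transfer
    }

    @Override
    public void setState(byte[] state) {
        // intentionally ignored: this node keeps no replicated state
    }
}
```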

+ Tests
I used plain SQL in the test for the master node to check that the master
correctly receives Lucene works (i.e. that they are not corrupted) and
performs the indexing. I didn't want to trigger indexing through Hibernate,
just to insert data. The Lucene document for the inserted data is created
separately and sent to the master, which updates the index.

+ Configuration for JGroups
Yes, that's right, the possibility of customizing JGroups should be added.
The default JG configuration works in most cases but not in all, as Sanne
has reported. I noticed that Infinispan already supports this: the
configuration can be given as an XML file, a properties file, or a string
with properties. So I suppose it is a good idea to do the same.
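
A rough sketch of how the three options could be told apart, assuming one
property per option. The property keys, the enum and the precedence order
below are all invented for illustration; neither the patch nor Infinispan
is claimed to use these names:

```java
import java.util.Properties;

// Hypothetical: where the JGroups protocol stack configuration comes from.
enum JGroupsConfigSource {
    XML_FILE, PROPERTIES_FILE, INLINE_STRING, DEFAULT_STACK
}

// Hypothetical helper, not part of the patch.
final class JGroupsConfigSelector {
    // Made-up keys, for illustration only.
    static final String PREFIX = "hibernate.search.worker.jgroups.";
    static final String XML_FILE_KEY = PREFIX + "configuration_xml";
    static final String PROPERTIES_FILE_KEY = PREFIX + "configuration_file";
    static final String INLINE_STRING_KEY = PREFIX + "configuration_string";

    private JGroupsConfigSelector() {
    }

    // Picks the first option that is set; falls back to the default stack
    // when the user configured nothing.
    static JGroupsConfigSource select(Properties props) {
        if (isSet(props, XML_FILE_KEY)) {
            return JGroupsConfigSource.XML_FILE;
        }
        if (isSet(props, PROPERTIES_FILE_KEY)) {
            return JGroupsConfigSource.PROPERTIES_FILE;
        }
        if (isSet(props, INLINE_STRING_KEY)) {
            return JGroupsConfigSource.INLINE_STRING;
        }
        return JGroupsConfigSource.DEFAULT_STACK;
    }

    private static boolean isSet(Properties props, String key) {
        String value = props.getProperty(key);
        return value != null && !value.trim().isEmpty();
    }
}
```

Keeping the default stack as the fallback means existing setups keep
working, while users who hit multicast problems can override the stack.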



2009/6/14 Emmanuel Bernard <emmanuel at hibernate.org>

> ah right.
> Contrary to JMS where the MDB is not bootstrapped by HSearch, we can do
> that with JGroups.
>
>
> On  Jun 14, 2009, at 11:48, Sanne Grinovero wrote:
>
>> About #5, I think you could avoid the need for a Hibernate Session:
>> you could forward the work list you receive from the network directly
>> to the Lucene backend.
>> This means you only need a reference to the SearchFactory; you get a
>> reference during initialize() of the backend.
>>
>>
>> 2009/6/13 Łukasz Moreń <lukasz.moren at gmail.com>:
>>
>>> Hi,
>>> #2. I am using IntelliJ 8.1. I downloaded the XML file with the code
>>> style for IntelliJ from the wiki, but it is still a little different
>>> from the style currently used in HS (e.g. import order or line length).
>>> #4. The "listenerClassName" from JGroupsBackendQueueProcessorFactory
>>> is mandatory, but only for the master node, where it specifies the class
>>> responsible for receiving messages from the slaves. I assumed that if
>>> the option exists the node is a master, otherwise a slave.
>>> Falling back to a Lucene implementation was intended to avoid a problem
>>> seen with the JMS backend: setting up the master node to also act as a
>>> slave node led to message duplication, because the MDB received a
>>> message, recreated it and put it back on the queue.
>>> JGroupsBackendQueueProcessorFactory for the master node also creates a
>>> LuceneFactory, and the Lucene works received from the slaves are
>>> processed by LuceneProcessor, so message duplication is avoided. That is
>>> the only reason why I used it.
>>> The user is supposed to configure both producers and receivers with the
>>> same JGroupsBackendQueueProcessorFactory.
>>> //receiver sample configuration
>>> <property name="hibernate.search.worker.backend" value="jgroups"/>
>>> <property name="hibernate.search.worker.jgroups.channel_name"
>>> value="hs_jg_channel"/>
>>> <property name="hibernate.search.worker.jgroups.listener_class"
>>> value="pl.lmoren.master.JGroupsMessageReceiverImpl"/>
>>>
>>> //producer sample configuration
>>> <property name="hibernate.search.worker.backend" value="jgroups"/>
>>> <property name="hibernate.search.worker.jgroups.channel_name"
>>> value="hs_jg_channel"/>
>>> However, in the receiver's case the JGroups factory creates Lucene
>>> processors, because of the problem described above.
>>> The JGroups factory for the master node just initializes the
>>> communication channel; all later work is done by the Lucene backend,
>>> the same as with JMS.
>>> This solution is not fully consistent with the JMS backend; I think
>>> that was forced by the different nature of JMS and JGroups.
>>> With JMS the receiver is configured in the app server config file;
>>> with JGroups I placed it in the Hibernate configuration.
>>> What do you think about such a design?
>>>
>>> #5. That is a question I was also thinking about, to make clustering
>>> more transparent.
>>> However, I haven't found a good idea for that. The purpose of having the
>>> user extend JGroupsAbstractMessageReceiver was to implement the
>>> getSession method, where the session used by the backend could be
>>> created.
>>> I suppose that without this method I would not know whether the
>>> Hibernate SessionFactory should come from e.g. a persistence context, a
>>> JNDI lookup or some helper class.
>>> #6. The problem with JGroups occurs when it tries to multicast traffic
>>> towards the ISP rather than the internal network; I think the packets
>>> are then dropped. I will look into that over the weekend.
>>>
>>> Lukasz
>>>
>>> 2009/6/10 Sanne Grinovero <sanne.grinovero at gmail.com>
>>>
>>>>
>>>> Hi Lukasz,
>>>> I've been looking into your code; I have some comments but please
>>>> forgive me as I don't have any real experience about JGroups, so I'll
>>>> only tell you how much I see this code fit into Hibernate Search.
>>>>
>>>> 1) The Maven dependency on JGroups should probably be of type
>>>> "optional", so please make sure Search also works fine for people
>>>> who are not interested in this work and don't want to have the
>>>> jgroups.jar around, even if others will love it.
>>>>
>>>> 2) Look out for style, especially white spacing; for example in
>>>> BatchedQueueingProcessor look at the formatting difference of line 91
>>>> compared to 82,85 or 88.
>>>> Which IDE are you using? We have some template settings ready to help
>>>> with this; you can find them on the hibernate.org wiki.
>>>>
>>>> 3) JGroupsBackendQueueProcessor is a Runnable that sets two fields in
>>>> the constructor; they should be final to make sure they are set before
>>>> the Runnable is run in another thread.
>>>> Just add the "final" modifier to lines 19 and 20.
>>>>
>>>> 4) JGroupsBackendQueueProcessorFactory's design:
>>>> It looks like "listenerClassName" is mandatory and has no default
>>>> implementation, yet the factory actually falls back to a Lucene
>>>> implementation when this option is missing. IMHO this adds some
>>>> extra complexity to the class which you don't really need. Is there
>>>> a good reason for that? Someone could forget some option in the
>>>> configuration; it would be better to throw an exception to notify
>>>> the user about the configuration inconsistency than to do something
>>>> different, or else to rely on a good default.
>>>> Maybe I'm wrong, but then some comments could help me out. Is the user
>>>> supposed to configure both message producers and receivers with the
>>>> same kind of BackendQueueProcessorFactory? That's probably not needed,
>>>> and not consistent with the way the JMS backend is configured.
>>>>
>>>> 5)JGroupsAbstractMessageReceiver's design:
>>>> This is very similar to the JMS abstract receiver, but in case of JMS
>>>> I'd expect to have to "annotate" something in my ejb classes, so that
>>>> it gets deployed by the container and associated to the queue, so in
>>>> case of JMS it's mandatory for the user to write some class.
>>>> Your solution is fine, but wouldn't it be possible to have a "no-code"
>>>> solution? The user could just configure this deployment to say
>>>> something like "this is the jgroups configuration, this is the
>>>> hibernate configuration, you know where to find the entity classes...
>>>> please listen to the channels and do your job".
>>>> It would be very cool to just have to package the Search jar with some
>>>> configuration lines (and the entities of course, to read some more
>>>> Search configuration) and be ready to start listening for messages.
>>>> Actually some future version could avoid the entities and receive the
>>>> serialized configuration.. just a thought, but that would enable us to
>>>> prepackage a whole server ready as a Search backend without even
>>>> needing to deploy any user code.
>>>>
>>>> 6) testing... I couldn't start them as JGroups was failing to bind to
>>>> ports on my machine, I'm sure I am doing something wrong, will try
>>>> again after reading some docs about it.
>>>> But anyway I got a bit confused about the notion of "Master"s and
>>>> "Receivers"; in JMS I'm used to seeing the master as the one taking
>>>> care of the index, so receiving the docs, not sending them.
>>>>
>>>> Generally speaking, add some comments and debug log statements (using
>>>> the {} placeholders instead of string concatenation).
>>>> I'll try it this weekend on remote staging servers; it looks
>>>> promising!
>>>>
>>>> Sanne
>>>>
>>>> 2009/6/10 Łukasz Moreń <lukasz.moren at gmail.com>:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've finished the task of replacing JMS with JGroups. The patch is
>>>>> attached. The general idea of pushing indexes through JG works;
>>>>> however, there are issues to improve (e.g. flexible JG protocol stack
>>>>> configuration).
>>>>> Any review or advice would be welcome to make sure that I am not
>>>>> heading into a blind alley.
>>>>>
>>>>> Thanks,
>>>>> Lukasz
>>>>>
>>>>> 2009/5/27 Emmanuel Bernard <emmanuel at hibernate.org>
>>>>>
>>>>>>
>>>>>> Lukasz,
>>>>>> I have been discussing #3 with Manik and we think that JBoss Cache /
>>>>>> Infinispan are probably a better fit than plain JGroups for that, as
>>>>>> all the plumbing will be configured for you.
>>>>>> When you reach this problem, let's revive this discussion.
>>>>>>
>>>>>> On  May 25, 2009, at 11:07, Hardy Ferentschik wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I talked with Łukasz about this last week. Definitely #1 and #3.
>>>>>>> I don't like #2 either.
>>>>>>>
>>>>>>> The benefit of #3 would also be that one could drop the requirement
>>>>>>> of having a shared file system (NFS, NAS, ...). #3 should be quite
>>>>>>> easy to implement. Maybe easy to get started with.
>>>>>>>
>>>>>>> --Hardy
>>>>>>>
>>>>>>> On Mon, 25 May 2009 10:55:52 +0200, Emmanuel Bernard
>>>>>>> <emmanuel at hibernate.org> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>> I am not sure this is where we should go, or at least, it depends.
>>>>>>>> Here are three scenarios.
>>>>>>>>
>>>>>>>>
>>>>>>>> #1 JMS replacement
>>>>>>>> If you want to use JGroups as a replacement for the JMS backend,
>>>>>>>> then I think you should write a JGroups backend. Check
>>>>>>>> org.hibernate.search.backend.impl.jms
>>>>>>>> In this case all changes are sent via JGroups to a "master". The
>>>>>>>> master could be elected by the cluster, possibly dynamically, but
>>>>>>>> that's not necessary for the first version.
>>>>>>>>
>>>>>>>> #2 apply indexing on all nodes
>>>>>>>> JGroups could send the work queue to all nodes and each node could
>>>>>>>> apply the change.
>>>>>>>> For various reasons I am not a fan of this solution, as it creates
>>>>>>>> overhead in CPU / memory usage and does not scale very well from a
>>>>>>>> theoretical PoV.
>>>>>>>>
>>>>>>>> #3 Index copy
>>>>>>>> This is what you are describing: copying the index using JGroups
>>>>>>>> instead of my file system approach. This might have merits, esp. as
>>>>>>>> we could diminish network traffic using multicast, but it also
>>>>>>>> requires rethinking the master / slave modus operandi.
>>>>>>>> Today the master copies a clean index to a shared directory on a
>>>>>>>> regular basis.
>>>>>>>> On a regular basis, the slaves go and copy the clean index from the
>>>>>>>> shared directory.
>>>>>>>> In your approach, the master would send changes to the slaves, and
>>>>>>>> the slaves would have to apply them "right away" (on their passive
>>>>>>>> version).
>>>>>>>> I think #1 is more interesting than #3, so we should probably start
>>>>>>>> with that. #3 might be interesting too; thoughts?
>>>>>>>>
>>>>>>>> Emmanuel
>>>>>>>>
>>>>>>>> PS: refactoring is a fact of life, so feel free to do so. Just don't
>>>>>>>> break public contracts.
>>>>>>>>
>>>>>>>> On  May 21, 2009, at 22:14, Łukasz Moreń wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have a few questions concerning the use of JGroups to copy index
>>>>>>>>> files. I am thinking of creating sender (for the master) and
>>>>>>>>> receiver (for the slaves) directory providers.
>>>>>>>>> The sender class, mainly based on the existing
>>>>>>>>> FSMasterDirectoryProvider, would first create a local index copy and
>>>>>>>>> send it later to the slave nodes (or send it without copying, but
>>>>>>>>> that may cause lower performance?).
>>>>>>>>> To avoid code redundancy it would be good to refactor the
>>>>>>>>> FSMasterDirectoryProvider class a little, so that I can reuse the
>>>>>>>>> copying functionality in the new DirectoryProvider and add the
>>>>>>>>> sending part; or should I rather work around it?
>>>>>>>>>
>>>>>>>>> I do not completely understand how multithreaded access to the
>>>>>>>>> index files works. Does the FileChannel class take care of that
>>>>>>>>> when the index is copied while new Lucene works are pushed?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> hibernate-dev mailing list
>>>>> hibernate-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>>>>
>>>>>
>>>>>
>>>
>>>
>