[hibernate-dev] Re: Pushing indexes through JGroups

Sun Jun 14 12:25:14 EDT 2009

Ah, right.
Unlike JMS, where the MDB is not bootstrapped by HSearch, with JGroups
we can bootstrap the receiver ourselves.
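
To make the idea concrete, here is a rough sketch in plain Java of a receiver
that the backend creates itself and that hands each incoming work list straight
to the local backend, with no Hibernate Session involved. All class names
(LocalIndexBackend, ForwardingReceiver, SketchBackendFactory) are hypothetical
stand-ins, works are modelled as plain strings, and no real JGroups or
Hibernate Search API is used; it only illustrates the wiring, not the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Small demo of the wiring; all types below are hypothetical stand-ins. */
class ForwardingSketchDemo {
    public static void main(String[] args) {
        LocalIndexBackend backend = new LocalIndexBackend();
        SketchBackendFactory factory = new SketchBackendFactory();
        factory.initialize(backend);
        factory.receiver().receive(Arrays.asList("add:book#1", "delete:book#2"));
        System.out.println(backend.appliedWork());
    }
}

/** Stand-in for the local Lucene backend; it only records the work it is given. */
class LocalIndexBackend {
    private final List<String> applied = new ArrayList<String>();

    void apply(List<String> workList) {
        applied.addAll(workList);
    }

    List<String> appliedWork() {
        return Collections.unmodifiableList(applied);
    }
}

/** Receiver created by the backend itself rather than by user code. */
class ForwardingReceiver {
    private final LocalIndexBackend backend;

    ForwardingReceiver(LocalIndexBackend backend) {
        this.backend = backend;
    }

    /** Called for each work list arriving from the network. */
    void receive(List<String> workList) {
        backend.apply(workList);
    }
}

/** Stand-in factory: initialize() is where the receiver gets bootstrapped. */
class SketchBackendFactory {
    private ForwardingReceiver receiver;

    void initialize(LocalIndexBackend backend) {
        receiver = new ForwardingReceiver(backend);
    }

    ForwardingReceiver receiver() {
        return receiver;
    }
}
```

In a real backend the receiver would be registered on the JGroups channel
during initialization; that part is left out here on purpose.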

On  Jun 14, 2009, at 11:48, Sanne Grinovero wrote:

> About #5, I think you could avoid the need for a Hibernate Session,
> you could forward the work list you receive from the network directly
> to the Lucene backend.
> This means you only need a reference to the SearchFactory; you get a
> reference during initialize() of the backend.
>
>
> 2009/6/13 Łukasz Moreń <lukasz.moren at gmail.com>:
>> Hi,
>> #2. I am using IntelliJ 8.1. I downloaded the XML code-style file for
>> IntelliJ from the wiki, but it still differs a little from the style
>> currently used in HS (e.g. import order or line length).
>> #4. The "listenerClassName" from JGroupsBackendQueueProcessorFactory
>> is mandatory only on the master node, where it specifies the class
>> responsible for receiving messages from the slaves. I assumed that if
>> the option exists the node is a master, otherwise a slave.
>> Falling back to a Lucene implementation was intended to avoid a problem
>> seen with the JMS backend: setting up the master node to also act as a
>> slave led to message duplication, because the MDB received a message,
>> recreated it and put it back on the queue.
>> JGroupsBackendQueueProcessorFactory on the master node also creates a
>> LuceneFactory, and the Lucene works received from the slaves are
>> processed by LuceneProcessor, so message duplication is avoided. That
>> is the only reason why I used it.
>> The user is supposed to create readers and receivers with the same
>> JGroupsBackendQueueProcessorFactory.
>> //receiver sample configuration
>> <property name="hibernate.search.worker.backend" value="jgroups"/>
>> <property name="hibernate.search.worker.jgroups.channel_name"
>> value="hs_jg_channel"/>
>> <property name="hibernate.search.worker.jgroups.listener_class"
>> value="pl.lmoren.master.JGroupsMessageReceiverImpl"/>
>>
>> //producers sample configuration
>> <property name="hibernate.search.worker.backend" value="jgroups"/>
>> <property name="hibernate.search.worker.jgroups.channel_name"
>> value="hs_jg_channel"/>
>> However, in the receiver's case the JGroups factory creates Lucene
>> processors, because of the problem described above.
>> On the master node the JGroups factory just initializes the
>> communication channel; all later work is done by the Lucene backend,
>> the same as in JMS.
>> This solution is not fully consistent with the JMS backend; I think it
>> was forced by the different nature of JMS and JGroups.
>> In JMS the receiver was configured in the app server config file;
>> in JGroups I placed it in the Hibernate configuration.
>> What do you think about such a design?
>>
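
The role rule described in #4 (listener class present means master, absent
means slave) can be sketched roughly as below, using the property names from
the sample configuration above. Treating a missing channel name as an error is
an assumption added for illustration, and JGroupsRoleResolver and NodeRole are
hypothetical names, not classes from the patch.

```java
import java.util.Properties;

/** Hypothetical helper that derives the node role from the configuration. */
class JGroupsRoleResolver {
    enum NodeRole { MASTER, SLAVE }

    // Property names as they appear in the sample configuration.
    static final String CHANNEL_NAME = "hibernate.search.worker.jgroups.channel_name";
    static final String LISTENER_CLASS = "hibernate.search.worker.jgroups.listener_class";

    static NodeRole resolve(Properties props) {
        // Assumption: the channel name is required on every node, so fail fast.
        if (isBlank(props.getProperty(CHANNEL_NAME))) {
            throw new IllegalArgumentException("Missing required property: " + CHANNEL_NAME);
        }
        // A listener class marks the node as master; otherwise it is a slave.
        return isBlank(props.getProperty(LISTENER_CLASS)) ? NodeRole.SLAVE : NodeRole.MASTER;
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    public static void main(String[] args) {
        Properties master = new Properties();
        master.setProperty(CHANNEL_NAME, "hs_jg_channel");
        master.setProperty(LISTENER_CLASS, "pl.lmoren.master.JGroupsMessageReceiverImpl");
        System.out.println(resolve(master));
    }
}
```

Deriving the role from one optional property keeps the configuration short,
but a forgotten listener_class silently turns a master into a slave; an
explicit role property would be one way around that.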
>> #5. That is a question I was also thinking about, to make clustering
>> more transparent.
>> However, I haven't found a good idea for that. The purpose of having the
>> user extend JGroupsAbstractMessageReceiver was to implement the
>> getSession method, where the session used by the backend could be
>> created.
>> I suppose that without this method I would not know whether the
>> Hibernate sessionFactory should come from e.g. a persistence context,
>> be looked up from JNDI, or come from some helper class.
>> #6. The problem with JGroups occurs when it tries to multicast traffic
>> towards the ISP rather than the internal network; then I think the
>> packets are dropped. I will look into that over the weekend.
>>
>> Lukasz
>>
>> 2009/6/10 Sanne Grinovero <sanne.grinovero at gmail.com>
>>>
>>> Hi Lukasz,
>>> I've been looking into your code; I have some comments, but please
>>> forgive me as I don't have any real experience with JGroups, so I'll
>>> only tell you how well I see this code fitting into Hibernate Search.
>>>
>>> 1) The Maven dependency upon JGroups should probably be of type
>>> "optional", so please make sure Search also works fine for people
>>> who are not interested in this work and don't have jgroups.jar
>>> around, even if others will love it.
>>>
>>> 2) Look out for style, especially white spacing; for example in
>>> BatchedQueueingProcessor look at the formatting difference of line 91
>>> compared to 82, 85 or 88.
>>> Which IDE are you using? We have some template settings ready to help
>>> with this; you can find them on the hibernate.org wiki.
>>>
>>> 3) JGroupsBackendQueueProcessor is a Runnable that sets two fields in
>>> the constructor; they should be final to make sure they are set before
>>> it is run in another thread:
>>> just add the "final" modifier to lines 19 and 20.
>>>
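
As a minimal illustration of point 3, a Runnable whose constructor arguments
are stored in final fields might look like the sketch below. The class and
field names are hypothetical (the two fields in the patch are not named in
this thread), and works are modelled as plain strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical processor; not the class from the patch. */
class QueueProcessorSketch implements Runnable {
    // final fields are guaranteed to be visible to any thread that sees
    // the fully constructed object.
    private final List<String> queue;
    private final List<String> sink;

    QueueProcessorSketch(List<String> queue, List<String> sink) {
        this.queue = queue;
        this.sink = sink;
    }

    @Override
    public void run() {
        // Stand-in for sending the queued works over the channel.
        sink.addAll(queue);
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> sink = new CopyOnWriteArrayList<String>();
        Thread worker = new Thread(
                new QueueProcessorSketch(Arrays.asList("work-1", "work-2"), sink));
        worker.start();
        worker.join();
        System.out.println(sink);
    }
}
```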
>>> 4) JGroupsBackendQueueProcessorFactory's design:
>>> It looks like "listenerClassName" is mandatory, with no default
>>> implementation provided; yet it actually falls back to a Lucene
>>> implementation when this option is missing. IMHO this adds some
>>> extra complexity to the class which you don't really need. Is there
>>> a good reason for that? Someone could forget some option in the
>>> configuration; it would be better to throw an exception to notify
>>> the user about the configuration inconsistency, or to rely on a good
>>> default, than to do something different.
>>> Maybe I'm wrong, but then some comments could help me out. Is the user
>>> supposed to configure both message producers and receivers with the
>>> same kind of BackendQueueProcessorFactory? That's probably not needed,
>>> and not consistent with the way the JMS backend is configured.
>>>
>>> 5) JGroupsAbstractMessageReceiver's design:
>>> This is very similar to the JMS abstract receiver, but in the case of
>>> JMS I'd expect to have to "annotate" something in my EJB classes so
>>> that it gets deployed by the container and associated with the queue,
>>> so in the case of JMS it's mandatory for the user to write some class.
>>> Your solution is fine, but wouldn't it be possible to have a "no-code"
>>> solution? The user could just configure this deployment to say
>>> something like "this is the jgroups configuration, this is the
>>> hibernate configuration, you know where to find the entity classes...
>>> please listen to the channels and do your job".
>>> It would be very cool to just have to package the Search jar with some
>>> configuration lines (and the entities, of course, to read some more
>>> Search configuration) and be ready to start listening for messages.
>>> Actually some future version could avoid the entities and receive the
>>> serialized configuration... just a thought, but that would enable us
>>> to prepackage a whole server ready as a Search backend without even
>>> needing to deploy any user code.
>>>
>>> 6) Testing... I couldn't start the tests as JGroups was failing to
>>> bind to ports on my machine; I'm sure I am doing something wrong, and
>>> will try again after reading some docs about it.
>>> But anyway I got a bit confused about the notion of "Master"s and
>>> "Receivers"; in JMS I'm used to seeing the master as the one taking
>>> care of the index, so receiving the docs, not sending them.
>>>
>>> Generally speaking, add some comments and debug log statements (using
>>> {} placeholders instead of string concatenation);
>>> I'll try it this weekend on remote staging servers. It looks
>>> promising!
>>>
>>> Sanne
>>>
>>> 2009/6/10 Łukasz Moreń <lukasz.moren at gmail.com>:
>>>> Hi,
>>>>
>>>> I've finished the task of replacing JMS with JGroups. The patch is
>>>> attached. The general idea of pushing indexes through JG is in place;
>>>> however, there are things to improve (e.g. flexible JG protocol stack
>>>> configuration).
>>>> Any review or advice would be welcome to make sure that I am not
>>>> going down a blind alley.
>>>>
>>>> Thanks,
>>>> Lukasz
>>>>
>>>> 2009/5/27 Emmanuel Bernard <emmanuel at hibernate.org>
>>>>>
>>>>> Lukasz,
>>>>> I have been discussing #3 with Manik, and we think that JBoss Cache /
>>>>> Infinispan are probably a better fit than plain JGroups for that, as
>>>>> all the plumbing will be configured for you.
>>>>> When you reach this problem, let's revive this discussion.
>>>>>
>>>>> On  May 25, 2009, at 11:07, Hardy Ferentschik wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I talked with Łukasz about this last week. Definitely #1 and #3.
>>>>>> #2 I don't like either.
>>>>>>
>>>>>> The benefit of #3 would also be that one could drop the requirement
>>>>>> of having a shared file system (NFS, NAS, ...). #3 should be quite
>>>>>> easy to implement. Maybe easy to get started with.
>>>>>>
>>>>>> --Hardy
>>>>>>
>>>>>> On Mon, 25 May 2009 10:55:52 +0200, Emmanuel Bernard
>>>>>> <emmanuel at hibernate.org> wrote:
>>>>>>
>>>>>>> Hello
>>>>>>> I am not sure this is where we should go, or at least, it depends.
>>>>>>> Here are three scenarios:
>>>>>>>
>>>>>>>
>>>>>>> #1 JMS replacement
>>>>>>> If you want to use JGroups as a replacement for the JMS backend,
>>>>>>> then I think you should write a JGroups backend. Check
>>>>>>> org.hibernate.search.backend.impl.jms
>>>>>>> In this case all changes are sent via JGroups to a "master". The
>>>>>>> master could be elected by the cluster, possibly dynamically, but
>>>>>>> that's not necessary for the first version.
>>>>>>>
>>>>>>> #2 apply indexing on all nodes
>>>>>>> JGroups could send the work queue to all nodes and each node could
>>>>>>> apply the change.
>>>>>>> For various reasons I am not a fan of this solution, as it creates
>>>>>>> overhead in CPU / memory usage and does not scale very well from a
>>>>>>> theoretical PoV.
>>>>>>>
>>>>>>> #3 Index copy
>>>>>>> This is what you are describing: copying the index using JGroups
>>>>>>> instead of my file system approach. This might have merits, esp. as
>>>>>>> we could diminish network traffic using multicast, but it also
>>>>>>> requires rethinking the master / slave modus operandi.
>>>>>>> Today the master copies a clean index to a shared directory on a
>>>>>>> regular basis.
>>>>>>> On a regular basis, the slaves go and copy the clean index from the
>>>>>>> shared directory.
>>>>>>> In your approach, the master would send changes to the slaves, and
>>>>>>> the slaves would have to apply them "right away" (on their passive
>>>>>>> version).
>>>>>>>
>>>>>>> I think #1 is more interesting than #3, so we should probably start
>>>>>>> with that. #3 might be interesting too; thoughts?
>>>>>>>
>>>>>>> Emmanuel
>>>>>>>
>>>>>>> PS: refactoring is a fact of life, so feel free to do so. Just
>>>>>>> don't break public contracts.
>>>>>>>
>>>>>>> On  May 21, 2009, at 22:14, Łukasz Moreń wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a few questions concerning the use of JGroups to copy index
>>>>>>>> files. I am thinking of creating sender (for master) and receiver
>>>>>>>> (for slave) directory providers.
>>>>>>>> The sender class, mainly based on the existing
>>>>>>>> FSMasterDirectoryProvider, would first create a local index copy
>>>>>>>> and later send it to the slave nodes
>>>>>>>> (or send without copying, but that may cause lower performance?).
>>>>>>>> To avoid code redundancy it would be good to refactor the
>>>>>>>> FSMasterDirectoryProvider class a little, so that I can use the
>>>>>>>> copying functionality in the new DirectoryProvider and add the
>>>>>>>> sending part; or should I rather work around it?
>>>>>>>>
>>>>>>>> I do not completely understand how multithreaded access to the
>>>>>>>> index files works. Does the FileChannel class ensure safe access
>>>>>>>> while the index is being copied and new Lucene works are pushed?
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>