[hibernate-dev] Re: Pushing indexes through JGroups

Łukasz Moreń lukasz.moren at gmail.com
Tue Jun 23 16:23:31 EDT 2009


I have attached an enhanced JGroups backend patch.

The only required Hibernate configuration comes down to

<property name="hibernate.search.worker.backend" value="jgroupsMaster" />
or
<property name="hibernate.search.worker.backend" value="jgroupsSlave" />

depending on whether the node is the master or a slave.

The following configuration properties are optional:

+optional name for the JGroups channel
<property name="hibernate.search.worker.backend.jgroups.clusterName"
value="HSCluster" />

+configuration for JGroups
<property name="hibernate.search.worker.backend.jgroups.configurationFile"
value="udp.xml" />
The udp.xml file must be located on the classpath.

<property name="hibernate.search.worker.backend.jgroups.configurationXml"
value="{configurationInXML}" />

<property name="hibernate.search.worker.backend.jgroups.configurationString"
value="{stringConfiguration}" />
Usage of the last two properties is shown in the test cases.

If no JGroups configuration is provided, flush-udp.xml is used by default.
That and some other stack configurations are part of JGroups.jar, so we
don't need to create any if they are sufficient.
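
To make the settings above easier to see side by side, here is a minimal
sketch in Java that collects them into a java.util.Properties object. The
class JGroupsBackendSettings and its forNode method are hypothetical helpers
made up for this illustration and are not part of the patch; only the
property keys and values are taken from the message above. How the resulting
properties are handed to Hibernate is left out.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; it is not part of the patch.
final class JGroupsBackendSettings {

    // Property keys as listed in the message above.
    static final String BACKEND = "hibernate.search.worker.backend";
    static final String CLUSTER_NAME =
            "hibernate.search.worker.backend.jgroups.clusterName";
    static final String CONFIGURATION_FILE =
            "hibernate.search.worker.backend.jgroups.configurationFile";

    // Builds the settings for one node. clusterName and configurationFile
    // are optional and are simply left out when null.
    static Properties forNode(boolean master, String clusterName, String configurationFile) {
        Properties props = new Properties();
        props.setProperty(BACKEND, master ? "jgroupsMaster" : "jgroupsSlave");
        if (clusterName != null) {
            props.setProperty(CLUSTER_NAME, clusterName);
        }
        if (configurationFile != null) {
            props.setProperty(CONFIGURATION_FILE, configurationFile);
        }
        return props;
    }

    public static void main(String[] args) {
        // Master and slave differ only in the backend value.
        System.out.println(forNode(true, "HSCluster", "udp.xml"));
        System.out.println(forNode(false, "HSCluster", "udp.xml"));
    }
}
```

Leaving both optional arguments null corresponds to the case described above
where no JGroups configuration is provided.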


A problem with running the tests (and not only the tests) on JGroups appears
on some machines; I have noticed it on Linux.
There is a mix-up between IPv4 and IPv6.
To avoid it, the VM param '-Djava.net.preferIPv4Stack=true' should be added.
Sometimes nodes cannot find each other in the cluster, probably because they
are binding to the wrong interfaces. To ensure that multicast traffic goes
through the right interface, include 'bind_addr=192.168.168.2' in the JG
configuration (or add -Djgroups.bind_addr=127.0.0.1 to the VM params), where
192.168.168.2 is the address used for multicasting.
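
For test setups where changing the JVM command line is inconvenient, the
same two system properties can in principle be set from code. The sketch
below assumes this is done at the very start of the program, before any
networking class is loaded and before a JGroups channel is created;
otherwise the values may be ignored. The class name JGroupsNetworkDefaults
is made up for this example, and passing the -D flags as described above
remains the safer route.

```java
// Hypothetical launcher helper for illustration only; not part of the patch.
final class JGroupsNetworkDefaults {

    // Sets the two system properties mentioned in the message above.
    // Assumption: this is called before any networking or JGroups code runs.
    static void apply(String bindAddress) {
        System.setProperty("java.net.preferIPv4Stack", "true");
        System.setProperty("jgroups.bind_addr", bindAddress);
    }

    public static void main(String[] args) {
        // Use the address given on the command line, or loopback as a fallback.
        apply(args.length > 0 ? args[0] : "127.0.0.1");
        System.out.println("preferIPv4Stack = "
                + System.getProperty("java.net.preferIPv4Stack"));
        System.out.println("jgroups.bind_addr = "
                + System.getProperty("jgroups.bind_addr"));
    }
}
```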

Lukasz




On 14 June 2009 at 21:08, Łukasz Moreń <lukasz.moren at gmail.com> wrote:

> I will follow your comments and advice and fix it.
>
> + JGroupsBackendQueueProcessorFactory:
> Yes, I think there should be a default cluster name.
>
> + JGroupsAbstractMessageReceiver
> getState / setState. These are JGroups methods to get/set current cluster
> state. I think their implementation is not necessary in our case.
>
> + Tests
> I used plain SQL in the test for the master node to check whether the master
> correctly receives Lucene works (i.e. that they are not corrupted) and does
> the indexing. I didn't want to trigger indexing with Hibernate, just insert
> data. The Lucene document for the inserted data is created separately and
> sent to the master, which updates the index.
>
> + Configuration for JGroups
> Yes, that's right, the possibility to customize JGroups should be added.
> The default JG configuration works in most cases but not in all, as Sanne has
> reported. I noticed that Infinispan already does this: it accepts an XML file,
> a properties file, or a string with properties. So I suppose it is a good idea.
>
>
>
> 2009/6/14 Emmanuel Bernard <emmanuel at hibernate.org>
>
>> Ah right.
>> Contrary to JMS where the MDB is not bootstrapped by HSearch, we can do
>> that with JGroups.
>>
>>
>> On  Jun 14, 2009, at 11:48, Sanne Grinovero wrote:
>>
>>> About #5, I think you could avoid the need for a Hibernate Session,
>>> you could forward the work list you receive from the network directly
>>> to the Lucene backend.
>>> This means you only need a reference to the SearchFactory; you get a
>>> reference during initialize() of the backend.
>>>
>>>
>>> 2009/6/13 Łukasz Moreń <lukasz.moren at gmail.com>:
>>>
>>>> Hi,
>>>> #2. I am using IntelliJ 8.1. I downloaded the XML file with the code
>>>> style for IntelliJ from the wiki, but it is still a little different from
>>>> what is currently used in HS (e.g. import order or line length).
>>>> #4. The "listenerClassName" in JGroupsBackendQueueProcessorFactory is
>>>> mandatory, but only for the master node, where it specifies the class
>>>> responsible for receiving messages from the slaves. I assumed that if the
>>>> option exists the node is the master, otherwise a slave.
>>>> Falling back to a Lucene implementation was intended to avoid a problem
>>>> from the JMS backend: setting up the master node to also act as a slave
>>>> node led to message duplication, because the MDB received a message,
>>>> recreated it and put it back on the queue.
>>>> JGroupsBackendQueueProcessorFactory for the master node also creates a
>>>> LuceneFactory, and Lucene works received from the slaves are processed by
>>>> LuceneProcessor, so message duplication is avoided. That is the only
>>>> reason why I used it.
>>>> The user is supposed to create readers and receivers with the same
>>>> JGroupsBackendQueueProcessorFactory.
>>>> //receiver sample configuration
>>>> <property name="hibernate.search.worker.backend" value="jgroups"/>
>>>> <property name="hibernate.search.worker.jgroups.channel_name"
>>>> value="hs_jg_channel"/>
>>>> <property name="hibernate.search.worker.jgroups.listener_class"
>>>> value="pl.lmoren.master.JGroupsMessageReceiverImpl"/>
>>>>
>>>> //producers sample configuration
>>>> <property name="hibernate.search.worker.backend" value="jgroups"/>
>>>> <property name="hibernate.search.worker.jgroups.channel_name"
>>>> value="hs_jg_channel"/>
>>>> However, in the receiver's case the JGroups factory creates Lucene
>>>> processors, in the context of the problem described above.
>>>> The JGroups factory for the master node just initializes the communication
>>>> channel; all later work is done by the Lucene backend, the same as in JMS.
>>>> This solution is not fully consistent with the JMS backend; I think it was
>>>> forced by the different nature of JMS and JGroups.
>>>> In JMS the receiver configuration was done in the app server config file;
>>>> in JGroups I placed it in the Hibernate configuration.
>>>> What do you think about such a design?
>>>>
>>>> #5. That is a question I was also thinking about, to make clustering more
>>>> transparent.
>>>> However, I haven't found a good idea for that. The purpose of having the
>>>> user extend JGroupsAbstractMessageReceiver was to implement the getSession
>>>> method, where the session used by the backend could be created.
>>>> I suppose that without this method I would not know whether the Hibernate
>>>> sessionFactory should come from e.g. a persistence context, a JNDI lookup
>>>> or some helper class.
>>>> #6. The problem with JGroups occurs when it tries to multicast traffic
>>>> towards the ISP rather than the internal network; I think the packets are
>>>> then dropped. I will look into that over the weekend.
>>>>
>>>> Lukasz
>>>>
>>>> 2009/6/10 Sanne Grinovero <sanne.grinovero at gmail.com>
>>>>
>>>>>
>>>>> Hi Lukasz,
>>>>> I've been looking into your code; I have some comments but please
>>>>> forgive me as I don't have any real experience about JGroups, so I'll
>>>>> only tell you how much I see this code fit into Hibernate Search.
>>>>>
>>>>> 1) The maven dependency upon JGroups should probably be of type
>>>>> "optional", so please make sure search is also going to work fine for
>>>>> people which are not really interested in this work and having the
>>>>> jgroups.jar around, even if others will love it.
>>>>>
>>>>> 2) Look out for style, especially white spacing; for example in
>>>>> BatchedQueueingProcessor look at the formatting difference of line 91
>>>>> compared to 82,85 or 88.
>>>>> Which IDE are you using? we have some template settings ready to help
>>>>> with this, you can find them on the hibernate.org wiki's.
>>>>>
>>>>> 3) JGroupsBackendQueueProcessor is a Runnable setting two arguments in
>>>>> the constructor, they should be final to make sure they are set before
>>>>> they are run in another thread;
>>>>> just add "final" modifier to lines 19 and 20.
>>>>>
>>>>> 4)JGroupsBackendQueueProcessorFactory's design:
>>>>> It looks like a "listenerClassName" is mandatory and not providing a
>>>>> default implementation; it actually falls back to a Lucene
>>>>> implementation when this option is missing. This looks like IMHO
>>>>> adding some extra complexity into the class which you don't really
>>>>> need. Is there a good reason for that? Someone could forget some
>>>>> option in the configuration, it would be better to throw an exception
>>>>> to notify the user about the configuration inconsistency than to do
>>>>> something differently, or rely on a good default.
>>>>> Maybe I'm wrong, but then some comments could help me out. Is the user
>>>>> supposed to configure both message producers and receivers with the
>>>>> same kind of BackendQueueProcessorFactory? That's probably not needed,
>>>>> and not consistent with the way the JMS backend is configured.
>>>>>
>>>>> 5)JGroupsAbstractMessageReceiver's design:
>>>>> This is very similar to the JMS abstract receiver, but in case of JMS
>>>>> I'd expect to have to "annotate" something in my ejb classes, so that
>>>>> it gets deployed by the container and associated to the queue, so in
>>>>> case of JMS it's mandatory for the user to write some class.
>>>>> Your solution is fine, but wouldn't it be possible to have a "no-code"
>>>>> solution? The user could just configure this deployment to say
>>>>> something like "this is the jgroups configuration, this is the
>>>>> hibernate configuration, you know where to find the entity classes...
>>>>> please listen to the channels and do your job".
>>>>> It would be very cool to have just to package the search jar with some
>>>>> configuration lines (and the entities of course to read some more
>>>>> Search configuration) and be ready to start listening for messages.
>>>>> Actually some future version could avoid the entities and receive the
>>>>> serialized configuration.. just a thought, but that would enable us to
>>>>> prepackage a whole server ready as a Search backend without even
>>>>> needing to deploy any user code.
>>>>>
>>>>> 6) testing... I couldn't start them as JGroups was failing to bind to
>>>>> ports on my machine, I'm sure I am doing something wrong, will try
>>>>> again after reading some docs about it.
>>>>> But anyway I got a bit confused about the notion of "Master"s and
>>>>> "Receivers"; I'm used in the JMS to see the master as the one taking
>>>>> care of the index, so receiving the docs not sending them.
>>>>>
>>>>> Generally speaking, add some comments and debug log statements (using
>>>>> the {} instead of string + concatenation);
>>>>> I'll try this weekend to try it on remote staging servers, it looks
>>>>> promising!
>>>>>
>>>>> Sanne
>>>>>
>>>>> 2009/6/10 Łukasz Moreń <lukasz.moren at gmail.com>:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've finished the task of replacing JMS with JGroups. The patch is
>>>>>> attached. The general idea of pushing indexes through JG is in place;
>>>>>> however, there are issues to improve (e.g. flexible JG protocol stack
>>>>>> configuration).
>>>>>> Any review or advice would be welcome, to make sure that I am not going
>>>>>> down a blind alley.
>>>>>>
>>>>>> Thanks,
>>>>>> Lukasz
>>>>>>
>>>>>> 2009/5/27 Emmanuel Bernard <emmanuel at hibernate.org>
>>>>>>
>>>>>>>
>>>>>>> Lukasz,
>>>>>>> I have been discussing #3 with Manik and we think that JBoss Cache /
>>>>>>> Infinispan are probably a better fit than plain JGroups for that, as all
>>>>>>> the plumbing will be configured for you.
>>>>>>> When you reach this problem, let's revive this discussion.
>>>>>>>
>>>>>>> On  May 25, 2009, at 11:07, Hardy Ferentschik wrote:
>>>>>>>
>>>>>>>  Hi,
>>>>>>>>
>>>>>>>> I talked with Łukasz about this last week. Definitely, #1 and #3.
>>>>>>>> #2 I don't like either.
>>>>>>>>
>>>>>>>> The benefit of #3 would also be that one could drop the requirement of
>>>>>>>> having a shared file system (NFS, NAS, ...). #3 should be quite easy to
>>>>>>>> implement. Maybe easy to get started with.
>>>>>>>>
>>>>>>>> --Hardy
>>>>>>>>
>>>>>>>> On Mon, 25 May 2009 10:55:52 +0200, Emmanuel Bernard
>>>>>>>> <emmanuel at hibernate.org> wrote:
>>>>>>>>
>>>>>>>>  Hello
>>>>>>>>> I am not sure this is where we should go, or at least, it depends.
>>>>>>>>> Here are three scenarios:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> #1 JMS replacement
>>>>>>>>> If you want to use JGroups as a replacement for the JMS backend, then I
>>>>>>>>> think you should write a JGroups backend. Check
>>>>>>>>> org.hibernate.search.backend.impl.jms
>>>>>>>>> In this case all changes are sent via JGroups to a "master". The master
>>>>>>>>> could be elected by the cluster, possibly dynamically, but that's not
>>>>>>>>> necessary for the first version.
>>>>>>>>>
>>>>>>>>> #2 apply indexing on all nodes
>>>>>>>>> JGroups could send the work queue to all nodes and each node could apply
>>>>>>>>> the change.
>>>>>>>>> For various reasons I am not a fan of this solution, as it creates
>>>>>>>>> overhead in CPU / memory usage and does not scale very well from a
>>>>>>>>> theoretical PoV.
>>>>>>>>>
>>>>>>>>> #3 Index copy
>>>>>>>>> This is what you are describing: copying the index using JGroups instead
>>>>>>>>> of my file system approach. This might have merits, especially as we
>>>>>>>>> could reduce network traffic using multicast, but it also requires
>>>>>>>>> rethinking the master / slave modus operandi.
>>>>>>>>> Today the master copies a clean index to a shared directory on a regular
>>>>>>>>> basis.
>>>>>>>>> On a regular basis, the slaves copy the clean index from the shared
>>>>>>>>> directory.
>>>>>>>>> In your approach, the master would send changes to the slaves and the
>>>>>>>>> slaves would have to apply them "right away" (on their passive version).
>>>>>>>>>
>>>>>>>>> I think #1 is more interesting than #3; we should probably start with
>>>>>>>>> that. #3 might be interesting too, thoughts?
>>>>>>>>>
>>>>>>>>> Emmanuel
>>>>>>>>>
>>>>>>>>> PS: refactoring is a fact of life, so feel free to do so. Just don't
>>>>>>>>> break public contracts.
>>>>>>>>>
>>>>>>>>> On  May 21, 2009, at 22:14, Łukasz Moreń wrote:
>>>>>>>>>
>>>>>>>>>  Hi,
>>>>>>>>>>
>>>>>>>>>> I have a few questions concerning the use of JGroups to copy index
>>>>>>>>>> files. I am thinking of creating sender (for the master) and receiver
>>>>>>>>>> (for the slave) directory providers.
>>>>>>>>>> The sender class, mainly based on the existing
>>>>>>>>>> FSMasterDirectoryProvider, would first create a local index copy and
>>>>>>>>>> later send it to the slave nodes
>>>>>>>>>> (or send without copying, but that may cause lower performance?).
>>>>>>>>>> To avoid code redundancy it would be good to refactor the
>>>>>>>>>> FSMasterDirectoryProvider class a little, so that I can use the copying
>>>>>>>>>> functionality in the new DirectoryProvider and add the sending part; or
>>>>>>>>>> should I rather work around it?
>>>>>>>>>>
>>>>>>>>>> I do not completely understand how multithreaded access to the index
>>>>>>>>>> files works. Does the FileChannel class ensure safe access when the
>>>>>>>>>> index is copied and new Lucene works are pushed?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> hibernate-dev mailing list
>>>>>> hibernate-dev at lists.jboss.org
>>>>>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: JGroups_Backend_patch_-_23_06_2009.patch
Type: application/octet-stream
Size: 48994 bytes
Desc: not available
Url : http://lists.jboss.org/pipermail/hibernate-dev/attachments/20090623/9d320c5e/attachment.obj 

