I have attached an enhanced JGroups backend patch.
The only necessary Hibernate configuration comes down to
<property name="hibernate.search.worker.backend"
value="jgroupsMaster" />
or
<property name="hibernate.search.worker.backend"
value="jgroupsSlave" />
depending on whether it is a master or a slave node.
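For illustration only, the same choice can be expressed programmatically. This is a minimal sketch: the property name and the two values are the ones quoted above, while the class name JGroupsBackendSettings and the method forRole are hypothetical helpers invented for this example and are not part of the patch.

```java
import java.util.Properties;

// Hypothetical helper, not part of the patch: builds the minimal
// Hibernate Search settings for a node depending on its role.
final class JGroupsBackendSettings {

    // Property name as quoted in this mail.
    static final String BACKEND_KEY = "hibernate.search.worker.backend";

    /** Returns the settings for a master node (true) or a slave node (false). */
    static Properties forRole(boolean master) {
        Properties props = new Properties();
        props.setProperty(BACKEND_KEY, master ? "jgroupsMaster" : "jgroupsSlave");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(forRole(true).getProperty(BACKEND_KEY));
        System.out.println(forRole(false).getProperty(BACKEND_KEY));
    }
}
```

The resulting Properties object could then be merged into whatever mechanism is used to pass settings to Hibernate.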
Optional configuration properties:
+ optional name for the JGroups channel
<property name="hibernate.search.worker.backend.jgroups.clusterName"
value="HSCluster" />
+ configuration for JGroups
<property name="hibernate.search.worker.backend.jgroups.configurationFile"
value="udp.xml" />
The udp.xml file must be located on the classpath.
<property name="hibernate.search.worker.backend.jgroups.configurationXml"
value="{configurationInXML}" />
<property name="hibernate.search.worker.backend.jgroups.configurationString"
value="{stringConfiguration}" />
The usage of the last two properties is shown in the test cases.
If no JGroups configuration is provided, flush-udp.xml is used by default.
That and some other stack configurations are part of JGroups.jar, so we
don't need to create any if they are sufficient.
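The fallback described above could be sketched roughly as follows. The property names and the flush-udp.xml default come from this mail; the class JGroupsConfigResolver, the resolve method, and the order in which the three properties are checked are assumptions made for this example, since the mail does not say which one wins when several are set.

```java
import java.util.Properties;

// Hypothetical sketch, not the patch's code: decides which JGroups
// configuration source applies, falling back to flush-udp.xml.
final class JGroupsConfigResolver {

    static final String PREFIX = "hibernate.search.worker.backend.jgroups.";
    static final String DEFAULT_FILE = "flush-udp.xml";

    /** Returns "kind:value" naming the configuration source that would be used. */
    static String resolve(Properties props) {
        // Assumed precedence: file, then XML, then string.
        String file = props.getProperty(PREFIX + "configurationFile");
        if (file != null) {
            return "file:" + file;
        }
        String xml = props.getProperty(PREFIX + "configurationXml");
        if (xml != null) {
            return "xml:" + xml;
        }
        String str = props.getProperty(PREFIX + "configurationString");
        if (str != null) {
            return "string:" + str;
        }
        // Nothing configured: use the stack shipped inside JGroups.jar.
        return "file:" + DEFAULT_FILE;
    }

    public static void main(String[] args) {
        System.out.println(resolve(new Properties()));
    }
}
```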
A problem with running the tests (and not only the tests) on JGroups appears
on some machines; I have noticed it on Linux.
It is caused by an IPv4 vs. IPv6 mismatch.
To avoid it, the VM param '-Djava.net.preferIPv4Stack=true' should be added.
Sometimes nodes cannot find each other in the cluster,
probably because they bind to the wrong interfaces. To ensure that
multicast traffic goes through the right interface, include
'bind_addr=192.168.168.2' (or -Djgroups.bind_addr=127.0.0.1 in the VM params)
in the JG configuration, where 192.168.168.2 is the address used for multicasting.
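As a rough sketch, the two VM params can also be set from code, for example in a test setup. The system property names are the ones mentioned above; the class JGroupsNetworkSettings and its apply method are hypothetical. Note that java.net.preferIPv4Stack is read when the JVM networking classes are initialized, so setting it from code only helps if it happens before any networking is touched; passing -D on the command line is the more reliable option.

```java
// Hypothetical helper, not part of the patch: sets the two system
// properties discussed above before any JGroups channel is created.
final class JGroupsNetworkSettings {

    /** bindAddress is the address of the interface used for multicasting. */
    static void apply(String bindAddress) {
        // Same effect as -Djava.net.preferIPv4Stack=true, if set early enough.
        System.setProperty("java.net.preferIPv4Stack", "true");
        // Same effect as -Djgroups.bind_addr=<address>.
        System.setProperty("jgroups.bind_addr", bindAddress);
    }

    public static void main(String[] args) {
        apply("192.168.168.2");
        System.out.println(System.getProperty("jgroups.bind_addr"));
    }
}
```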
Lukasz
On 14 June 2009 21:08, Łukasz Moreń <lukasz.moren(a)gmail.com> wrote:
I will follow your comments and advice and fix it.
+ JGroupsBackendQueueProcessorFactory:
Yes, I think there should be a default cluster name.
+ JGroupsAbstractMessageReceiver
getState / setState: these are the JGroups methods to get/set the current
cluster state. I think their implementation is not necessary in our case.
+ Tests
I used plain SQL in the test for the master node, to check whether the master
can correctly receive Lucene works (i.e. that they are not corrupted) and do
the indexing. I didn't want to trigger indexing with Hibernate, just insert
data. The Lucene document for the inserted data is created separately and sent
to the master, which updates the index.
+ Configuration for JGroups
Yes, that's right, the possibility to customize JGroups should be added.
The default JG configuration works in most cases but not in all, as Sanne has
reported. I noticed that Infinispan already does this: an XML file, a
properties file and a string with properties are the options. So I suppose it
is a good idea.
2009/6/14 Emmanuel Bernard <emmanuel(a)hibernate.org>
ah right.
> Contrary to JMS where the MDB is not bootstrapped by HSearch, we can do
> that with JGroups.
>
>
> On Jun 14, 2009, at 11:48, Sanne Grinovero wrote:
>
> About 5#, I think you could avoid the need for an hibernate Session,
>> you could forward the work list you receive from the network directly
>> to the Lucene backend.
>> This means you only need a reference to the SearchFactory; you get a
>> reference during initialize() of the backend.
>>
>>
>> 2009/6/13 Łukasz Moreń <lukasz.moren(a)gmail.com>:
>>
>>> Hi,
>>> #2. I am using IntelliJ 8.1. I have downloaded the XML file with the
>>> code style for IntelliJ from the wiki, but it is still a little bit
>>> different from the style currently used in HS (i.e. import order or
>>> line length).
>>> #4. The "listenerClassName" from JGroupsBackendQueueProcessorFactory
>>> is mandatory, but only for a master node, where it specifies the class
>>> responsible for receiving messages from the slaves. I assumed that if
>>> the option exists it is a master node, otherwise a slave.
>>> Falling back to a Lucene implementation was intended to avoid a problem
>>> from the JMS backend: setting up the master node to also act as a slave
>>> node led to message duplication - the MDB received a message, recreated
>>> it and put it back on the queue. JGroupsBackendQueueProcessorFactory
>>> for the master node also creates a LuceneFactory, and the Lucene works
>>> received from the slaves are processed by LuceneProcessor, so message
>>> duplication is avoided. That is the only reason why I used it.
>>> The user is supposed to create readers and receivers with the same
>>> JGroupsBackendQueueProcessorFactory.
>>> //receiver sample configuration
>>> <property name="hibernate.search.worker.backend"
>>> value="jgroups"/>
>>> <property name="hibernate.search.worker.jgroups.channel_name"
>>> value="hs_jg_channel"/>
>>> <property name="hibernate.search.worker.jgroups.listener_class"
>>> value="pl.lmoren.master.JGroupsMessageReceiverImpl"/>
>>>
>>> //producers sample configuration
>>> <property name="hibernate.search.worker.backend"
>>> value="jgroups"/>
>>> <property name="hibernate.search.worker.jgroups.channel_name"
>>> value="hs_jg_channel"/>
>>> However, in the receiver's case the JGroups factory creates Lucene
>>> processors because of the problem described above.
>>> The JGroups factory for the master node just initializes the
>>> communication channel, but later all work is done by the Lucene backend,
>>> the same as in JMS.
>>> That solution is not fully consistent with the JMS backend; I think it
>>> was forced by the different nature of JMS and JGroups.
>>> In JMS the receiver configuration was done in the app server config file;
>>> in JGroups I placed it in the Hibernate configuration.
>>> What do you think about such a design?
>>>
>>> #5. That is a question I was also thinking about, to make clustering
>>> more transparent.
>>> However, I haven't found a good idea for that. The purpose of having the
>>> user extend JGroupsAbstractMessageReceiver was to implement the
>>> getSession method, where the session used by the backend could be
>>> created.
>>> I suppose that without this method I would not know whether the
>>> Hibernate sessionFactory should come from e.g. a persistence context, be
>>> looked up from JNDI or come from some helper class.
>>> #6. The problem with JGroups occurs if it tries to multicast traffic
>>> towards the ISP instead of the internal network; then I think packets
>>> are dropped. I will look into that at the weekend.
>>>
>>> Lukasz
>>>
>>> 2009/6/10 Sanne Grinovero <sanne.grinovero(a)gmail.com>
>>>
>>>>
>>>> Hi Lukasz,
>>>> I've been looking into your code; I have some comments but please
>>>> forgive me as I don't have any real experience about JGroups, so I'll
>>>> only tell you how much I see this code fit into Hibernate Search.
>>>>
>>>> 1) The maven dependency upon JGroups should probably be of type
>>>> "optional", so please make sure search is also going to work fine for
>>>> people who are not really interested in this work and in having the
>>>> jgroups.jar around, even if others will love it.
>>>>
>>>> 2) Look out for style, especially white spacing; for example in
>>>> BatchedQueueingProcessor look at the formatting difference of line 91
>>>> compared to 82,85 or 88.
>>>> Which IDE are you using? We have some template settings ready to help
>>>> with this; you can find them on the hibernate.org wiki.
>>>>
>>>> 3) JGroupsBackendQueueProcessor is a Runnable setting two arguments in
>>>> the constructor, they should be final to make sure they are set before
>>>> they are run in another thread;
>>>> just add "final" modifier to lines 19 and 20.
>>>>
>>>> 4) JGroupsBackendQueueProcessorFactory's design:
>>>> It looks like a "listenerClassName" is mandatory and not providing a
>>>> default implementation; it actually falls back to a Lucene
>>>> implementation when this option is missing. This looks like IMHO
>>>> adding some extra complexity into the class which you don't really
>>>> need. Is there a good reason for that? Someone could forget some
>>>> option in the configuration, it would be better to throw an exception
>>>> to notify the user about the configuration inconsistency than to do
>>>> something differently, or rely on a good default.
>>>> Maybe I'm wrong, but then some comments could help me out. Is the user
>>>> supposed to configure both message producers and receivers with the
>>>> same kind of BackendQueueProcessorFactory? That's probably not needed,
>>>> and not consistent with the way the JMS backend is configured.
>>>>
>>>> 5) JGroupsAbstractMessageReceiver's design:
>>>> This is very similar to the JMS abstract receiver, but in case of JMS
>>>> I'd expect to have to "annotate" something in my ejb classes, so that
>>>> it gets deployed by the container and associated to the queue, so in
>>>> case of JMS it's mandatory for the user to write some class.
>>>> Your solution is fine, but wouldn't it be possible to have a "no-code"
>>>> solution? The user could just configure this deployment to say
>>>> something like "this is the jgroups configuration, this is the
>>>> hibernate configuration, you know where to find the entity classes...
>>>> please listen to the channels and do your job".
>>>> It would be very cool to have just to package the search jar with some
>>>> configuration lines (and the entities of course to read some more
>>>> Search configuration) and be ready to start listening for messages.
>>>> Actually some future version could avoid the entities and receive the
>>>> serialized configuration.. just a thought, but that would enable us to
>>>> prepackage a whole server ready as a Search backend without even
>>>> needing to deploy any user code.
>>>>
>>>> 6) testing... I couldn't start them as JGroups was failing to bind to
>>>> ports on my machine, I'm sure I am doing something wrong, will try
>>>> again after reading some docs about it.
>>>> But anyway I got a bit confused about the notion of "Master"s and
>>>> "Receivers"; I'm used in the JMS to see the master as the one taking
>>>> care of the index, so receiving the docs not sending them.
>>>>
>>>> Generally speaking, add some comments and debug log statements (using
>>>> the {} instead of string + concatenation);
>>>> I'll try this weekend to try it on remote staging servers, it looks
>>>> promising!
>>>>
>>>> Sanne
>>>>
>>>> 2009/6/10 Łukasz Moreń <lukasz.moren(a)gmail.com>:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've finished the task concerning JMS replacement with JGroups. The
>>>>> patch is attached. The general idea of pushing indexes through JG is
>>>>> working; however, there are issues to improve (e.g. flexible JG
>>>>> protocol stack configuration).
>>>>> Any review or advice would be welcome to make sure that I am not
>>>>> going down a blind alley.
>>>>>
>>>>> Thanks,
>>>>> Lukasz
>>>>>
>>>>> 2009/5/27 Emmanuel Bernard <emmanuel(a)hibernate.org>
>>>>>
>>>>>>
>>>>>> Lukasz,
>>>>>> I have been discussing with Manik on #3 and we think that JBoss Cache /
>>>>>> Infinispan are probably a better fit than plain JGroups for that, as
>>>>>> all the plumbing will be configured for you.
>>>>>> When you reach this problem, let's revive this discussion.
>>>>>>
>>>>>> On May 25, 2009, at 11:07, Hardy Ferentschik wrote:
>>>>>>
>>>>>> Hi,
>>>>>>>
>>>>>>> I talked with Łukasz about this last week. Definitely, #1 and #3.
>>>>>>> #2 I don't like either.
>>>>>>>
>>>>>>> The benefit of #3 would also be that one could drop the requirement
>>>>>>> of having a shared file system (NFS, NAS, ...). #3 should be quite
>>>>>>> easy to implement. Maybe easy to get started with.
>>>>>>>
>>>>>>> --Hardy
>>>>>>>
>>>>>>> On Mon, 25 May 2009 10:55:52 +0200, Emmanuel Bernard
>>>>>>> <emmanuel(a)hibernate.org> wrote:
>>>>>>>
>>>>>>> Hello
>>>>>>>> I am not sure this is where we should go, or at least, it depends.
>>>>>>>> Here are three scenarios:
>>>>>>>>
>>>>>>>>
>>>>>>>> #1 JMS replacement
>>>>>>>> If you want to use JGroups as a replacement for the JMS backend,
>>>>>>>> then I think you should write a jgroups backend. Check
>>>>>>>> org.hibernate.search.backend.impl.jms
>>>>>>>> In this case all changes are sent via JGroups to a "master". The
>>>>>>>> master could be voted by the cluster possibly dynamically but
>>>>>>>> that's not necessary for the first version.
>>>>>>>>
>>>>>>>> #2 apply indexing on all nodes
>>>>>>>> JGroups could send the work queue to all nodes and each node could
>>>>>>>> apply the change.
>>>>>>>> For various reasons I am not a fan of this solution as it creates
>>>>>>>> overhead in CPU / memory usage and does not scale very well from a
>>>>>>>> theoretical PoV.
>>>>>>>>
>>>>>>>> #3 Index copy
>>>>>>>> This is what you are describing, copying the index using JGroups
>>>>>>>> instead of my file system approach. This might have merits, esp. as
>>>>>>>> we could diminish network traffic using multicast, but it also
>>>>>>>> requires rethinking the master / slave modus operandi.
>>>>>>>> Today the master copies a clean index to a shared directory on a
>>>>>>>> regular basis.
>>>>>>>> On a regular basis, the slaves go and copy the clean index from the
>>>>>>>> shared directory.
>>>>>>>> In your approach, the master would send changes to the slaves and
>>>>>>>> slaves would have to apply them "right away" (on their passive
>>>>>>>> version)
>>>>>>>>
>>>>>>>> I think #1 is more interesting than #3, we probably should start
>>>>>>>> with that. #3 might be interesting too, thoughts?
>>>>>>>>
>>>>>>>> Emmanuel
>>>>>>>>
>>>>>>>> PS: refactoring is a fact of life, so feel free to do so. Just
>>>>>>>> don't break public contracts.
>>>>>>>>
>>>>>>>> On May 21, 2009, at 22:14, Łukasz Moreń wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have a few questions concerning the use of JGroups to copy
>>>>>>>>> index files. I am thinking of creating sender (for the master)
>>>>>>>>> and receiver (for the slave) directory providers.
>>>>>>>>> The sender class, mainly based on the existing
>>>>>>>>> FSMasterDirectoryProvider, would first create a local index copy
>>>>>>>>> and send it later to the slave nodes
>>>>>>>>> (or send without copying, but that may cause lower performance?).
>>>>>>>>> To avoid code redundancy it would be good to refactor the
>>>>>>>>> FSMasterDirectoryProvider class a little, so that I can use the
>>>>>>>>> copying functionality in the new DirectoryProvider and add the
>>>>>>>>> sending one; or should I rather work around it?
>>>>>>>>>
>>>>>>>>> I do not completely understand how multithreaded access to the
>>>>>>>>> index files works. Does the FileChannel class ensure it when the
>>>>>>>>> index is copied and new Lucene works are pushed?
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>