I have attached the enhanced JGroups backend patch.

The only necessary Hibernate configuration comes down to

<property name="hibernate.search.worker.backend" value="jgroupsMaster" />
or
<property name="hibernate.search.worker.backend" value="jgroupsSlave" />

depending on whether it is a master or a slave node.

The optional configuration properties are:

+optional name for JGroups channel
<property name="hibernate.search.worker.backend.jgroups.clusterName" value="HSCluster" />

+configuration for JGroups
<property name="hibernate.search.worker.backend.jgroups.configurationFile" value="udp.xml" />
the udp.xml file must be located on the classpath

<property name="hibernate.search.worker.backend.jgroups.configurationXml" value="{configurationInXML}" />

<property name="hibernate.search.worker.backend.jgroups.configurationString" value="{stringConfiguration}" />
The last two properties are used mainly in test cases.

If no JGroups configuration is provided, flush-udp.xml is the default. That
and some other stack configurations are part of the JGroups jar, so we don't
need to create our own if those are sufficient.
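As a rough sketch of how the factory could resolve which stack source to use
(the class and method names here are invented for illustration, not taken from
the patch; only the property keys and the flush-udp.xml default come from the
text above):

```java
import java.util.Properties;

// Sketch of a possible precedence order when picking the JGroups protocol
// stack source. The precedence itself is an assumption; the property keys
// and the flush-udp.xml fallback are from the configuration described above.
public class JGroupsConfigResolver {

    static final String PREFIX = "hibernate.search.worker.backend.jgroups.";
    static final String DEFAULT_STACK = "flush-udp.xml"; // bundled in the JGroups jar

    /** Returns a short description of the stack source that would be used. */
    public static String resolveStack(Properties props) {
        String file = props.getProperty(PREFIX + "configurationFile");
        if (file != null) {
            return "file:" + file;          // XML file on the classpath
        }
        String xml = props.getProperty(PREFIX + "configurationXml");
        if (xml != null) {
            return "xml";                   // inline XML configuration
        }
        String str = props.getProperty(PREFIX + "configurationString");
        if (str != null) {
            return "string";                // plain string configuration
        }
        return "file:" + DEFAULT_STACK;     // fall back to the JGroups default
    }
}
```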


A problem with running the tests (and not only the tests) on JGroups appears
on some machines; I have noticed it on Linux.
It is caused by confusion between IPv4 and IPv6.
To avoid it, the VM parameter '-Djava.net.preferIPv4Stack=true' should be added.
Sometimes nodes cannot find each other in the cluster, probably because they
bind to the wrong interfaces. To make sure multicast traffic goes to the right
interface, include 'bind_addr=192.168.168.2' in the JGroups configuration (or
add -Djgroups.bind_addr=127.0.0.1 to the VM params), where 192.168.168.2 is the
address used for multicasting.
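For example, a node could be started with both flags like this (the main class,
jar name and address are placeholders for illustration only):

```shell
# Force IPv4 and pin JGroups to the multicast-capable interface.
# Replace 192.168.168.2 with the address of the interface you multicast on.
java -Djava.net.preferIPv4Stack=true \
     -Djgroups.bind_addr=192.168.168.2 \
     -cp myapp.jar com.example.MyApp
```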

Lukasz




On 14 June 2009 at 21:08, Łukasz Moreń <lukasz.moren@gmail.com> wrote:
I will follow your comments and advice and fix it.

+ JGroupsBackendQueueProcessorFactory:

Yes, I think there should be default cluster name.

+ JGroupsAbstractMessageReceiver
getState / setState: these are the JGroups methods for getting/setting the current cluster state. I think implementing them is not necessary in our case.

+ Tests
I used plain SQL in the test for the master node, to check whether the master can correctly receive Lucene works (i.e. that they are not corrupted) and do the indexing. I didn't want to trigger indexing with Hibernate, just insert data. The Lucene document for the inserted data is created separately and sent to the master, which updates the index.
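The round trip the test verifies can be sketched as follows; the Work class
and the method names are stand-ins for illustration, not the real LuceneWork
API from Hibernate Search:

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the round trip the master test exercises: a slave
// serializes a list of works into a message payload, and the master
// deserializes it unchanged before indexing.
public class WorkRoundTrip {

    static class Work implements Serializable {
        final String entityId;
        Work(String entityId) { this.entityId = entityId; }
    }

    // Slave side: turn the work list into the bytes sent over the channel.
    static byte[] toPayload(List<Work> works) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(new ArrayList<>(works));
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Master side: recover the work list from the received bytes.
    @SuppressWarnings("unchecked")
    static List<Work> fromPayload(byte[] payload) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(payload))) {
            return (List<Work>) ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```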

+ Configuration for JGroups
Yes, that's right, the possibility of JGroups customization should be added. The default JGroups configuration works in most cases but not in all, as Sanne has reported. I noticed that Infinispan has this already: an XML file, a properties file and a string with properties are all supported. So I suppose it is a good idea.



2009/6/14 Emmanuel Bernard <emmanuel@hibernate.org>

ah right.
Contrary to JMS where the MDB is not bootstrapped by HSearch, we can do that with JGroups.


On  Jun 14, 2009, at 11:48, Sanne Grinovero wrote:

About #5, I think you could avoid the need for a Hibernate Session:
you could forward the work list you receive from the network directly
to the Lucene backend.
This means you only need a reference to the SearchFactory; you get a
reference during initialize() of the backend.
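Sanne's suggestion could look roughly like this; the interfaces below are
made-up minimal stand-ins, not the real Hibernate Search types:

```java
import java.util.List;

// Hypothetical sketch: the receiver keeps only a SearchFactory reference
// obtained in initialize() and hands incoming work straight to the Lucene
// backend, with no Hibernate Session involved anywhere.
public class NoSessionReceiver {

    interface LuceneBackend { void applyWork(List<String> workList); }
    interface SearchFactory { LuceneBackend getLuceneBackend(); }

    private SearchFactory searchFactory;

    /** Called once when the backend is bootstrapped. */
    public void initialize(SearchFactory factory) {
        this.searchFactory = factory;
    }

    /** Called for every work list received from the network. */
    public void receive(List<String> workList) {
        // No Session needed: forward directly to the Lucene backend.
        searchFactory.getLuceneBackend().applyWork(workList);
    }
}
```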


2009/6/13 Łukasz Moreń <lukasz.moren@gmail.com>:
Hi,
#2. I am using IntelliJ 8.1. I downloaded the XML file with the
code style for IntelliJ from the wiki, but it is still
a little different from what currently exists in HS (e.g. import order
or line length).
#4. The "listenerClassName" from JGroupsBackendQueueProcessorFactory
is mandatory, but only for a master node, where it specifies the class
responsible for receiving messages from the slaves. I assumed that if
that option exists it is a master node, otherwise a slave.
Falling back to a Lucene implementation was intended to avoid a problem
from the JMS backend: setting up a master node to also act as a slave
node led to message duplication - the MDB received a message, recreated
it and put it back on the queue. JGroupsBackendQueueProcessorFactory
for the master node also creates a Lucene factory, and the Lucene works
received from the slaves are processed by the Lucene processor - so
message duplication is avoided. That is the only reason why I used it.
The user is supposed to configure both producers and receivers with the
same JGroupsBackendQueueProcessorFactory.
//receiver sample configuration
<property name="hibernate.search.worker.backend" value="jgroups"/>
<property name="hibernate.search.worker.jgroups.channel_name"
value="hs_jg_channel"/>
<property name="hibernate.search.worker.jgroups.listener_class"
value="pl.lmoren.master.JGroupsMessageReceiverImpl"/>

//producer sample configuration
<property name="hibernate.search.worker.backend" value="jgroups"/>
<property name="hibernate.search.worker.jgroups.channel_name"
value="hs_jg_channel"/>
However, in the receiver's case the JGroups factory creates Lucene
processors, in the context of the problem described above.
The JGroups factory for the master node just initializes the
communication channel; all later work is done by the Lucene backend,
the same as in JMS.
That solution is not fully consistent with the JMS backend one; it was
forced, I think, by the different natures of JMS and JGroups.
In JMS the receiver configuration was done in the app server config
file; in JGroups I placed it in the Hibernate configuration.
What do you think about such a design?

#5. That is a question I was also thinking about, to make clustering
more transparent.
However, I haven't found a good idea for that. The purpose of having
the user extend JGroupsAbstractMessageReceiver was to implement the
getSession method, where the session used by the backend could be created.
I suppose that without this method I would not know whether the
Hibernate SessionFactory should come from e.g. a persistence context,
be looked up from JNDI, or come from some helper class.
#6. The problem with JGroups appears when it tries to multicast traffic
towards the ISP instead of the internal network; then, I think, the
packets are dropped. I will look into that over the weekend.

Lukasz

2009/6/10 Sanne Grinovero <sanne.grinovero@gmail.com>

Hi Lukasz,
I've been looking into your code; I have some comments, but please
forgive me as I don't have any real experience with JGroups, so I'll
only tell you how well I see this code fitting into Hibernate Search.

1) The Maven dependency on JGroups should probably be of type
"optional", so please make sure Search is also going to work fine for
people who are not really interested in this work and don't have
jgroups.jar around, even if others will love it.

2) Look out for style, especially white space; for example, in
BatchedQueueingProcessor look at the formatting difference of line 91
compared to 82, 85 or 88.
Which IDE are you using? We have some template settings ready to help
with this; you can find them on the hibernate.org wiki.

3) JGroupsBackendQueueProcessor is a Runnable setting two arguments in
the constructor; they should be final to make sure they are set before
the Runnable is run in another thread.
Just add the "final" modifier to lines 19 and 20.
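A self-contained illustration of the point (all names here are invented, not
taken from the patch):

```java
// Fields assigned in the constructor and read from another thread should be
// final: the Java memory model then guarantees they are fully initialized
// and visible once the constructor returns, whichever thread calls run().
public class QueueProcessorSketch implements Runnable {

    private final Object queue;        // final => safe publication
    private final String channelName;  // even when run() executes elsewhere

    public QueueProcessorSketch(Object queue, String channelName) {
        this.queue = queue;
        this.channelName = channelName;
    }

    String describe() {
        return channelName + ":" + queue;
    }

    @Override
    public void run() {
        // Both final fields are guaranteed visible here, regardless of
        // which thread the Runnable was handed to.
        System.out.println("processing " + describe());
    }
}
```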

4) JGroupsBackendQueueProcessorFactory's design:
It looks like "listenerClassName" is mandatory with no default
implementation provided; it actually falls back to a Lucene
implementation when this option is missing. IMHO this adds some extra
complexity to the class which you don't really need. Is there a good
reason for that? Someone could forget an option in the configuration;
it would be better to throw an exception notifying the user about the
configuration inconsistency than to silently do something different,
or to rely on a good default.
Maybe I'm wrong, but then some comments could help me out. Is the user
supposed to configure both message producers and receivers with the
same kind of BackendQueueProcessorFactory? That's probably not needed,
and not consistent with the way the JMS backend is configured.
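The fail-fast alternative described here could be sketched like this; the
helper name, exception type and message are illustrative only, not the actual
Hibernate Search validation code:

```java
import java.util.Properties;

// Sketch of fail-fast validation: if a mandatory option is missing, raise
// an explicit error instead of silently falling back to other behaviour.
public class ConfigValidation {

    public static String requireProperty(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "Missing mandatory configuration property: " + key);
        }
        return value;
    }
}
```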

5) JGroupsAbstractMessageReceiver's design:
This is very similar to the JMS abstract receiver, but in the case of
JMS I'd expect to have to "annotate" something in my EJB classes so
that they get deployed by the container and associated with the queue;
so in the case of JMS it's mandatory for the user to write some class.
Your solution is fine, but wouldn't it be possible to have a "no-code"
solution? The user could just configure this deployment to say
something like "this is the jgroups configuration, this is the
hibernate configuration, you know where to find the entity classes...
please listen to the channels and do your job".
It would be very cool to just have to package the Search jar with some
configuration lines (and the entities of course, to read some more
Search configuration) and be ready to start listening for messages.
Actually, some future version could avoid the entities and receive the
serialized configuration... just a thought, but that would enable us to
prepackage a whole server, ready as a Search backend, without even
needing to deploy any user code.

6) Testing... I couldn't start the tests as JGroups was failing to bind
to ports on my machine; I'm sure I am doing something wrong and will
try again after reading some docs about it.
But anyway, I got a bit confused by the notions of "master" and
"receiver"; in JMS I'm used to seeing the master as the one taking
care of the index, so receiving the docs, not sending them.

Generally speaking, add some comments and debug log statements (using
{} placeholders instead of string concatenation).
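A toy illustration of the {} placeholder style (slf4j-style loggers provide
this out of the box; the implementation below is a simplified stand-in, not
the real one):

```java
// "{}"-style parameterized logging defers message assembly: the logger only
// builds the string when the log level is enabled, instead of paying for
// concatenation on every call site.
public class PlaceholderFormat {

    public static String format(String template, Object... args) {
        StringBuilder sb = new StringBuilder();
        int argIndex = 0;
        int from = 0;
        int at;
        // Substitute each "{}" in order with the next argument.
        while ((at = template.indexOf("{}", from)) >= 0 && argIndex < args.length) {
            sb.append(template, from, at).append(args[argIndex++]);
            from = at + 2;
        }
        sb.append(template.substring(from));
        return sb.toString();
    }
}
```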
I'll try this weekend to run it on remote staging servers; it looks
promising!

Sanne

2009/6/10 Łukasz Moreń <lukasz.moren@gmail.com>:
Hi,

I've finished the task concerning the JMS replacement with JGroups. The
patch is attached. The general idea of pushing indexes through JGroups
is covered; however, there are issues to improve (e.g. flexible JGroups
protocol stack configuration).
Any review or advice would be welcome, to make sure that I am not going
down a blind alley.

Thanks,
Lukasz

2009/5/27 Emmanuel Bernard <emmanuel@hibernate.org>

Lukasz,
I have been discussing #3 with Manik, and we think that JBoss Cache /
Infinispan are probably a better fit than plain JGroups for that, as
all the plumbing will be configured for you.
When you reach this problem, let's revive this discussion.

On  May 25, 2009, at 11:07, Hardy Ferentschik wrote:

Hi,

I talked with Łukasz about this last week. Definitely #1 and #3.
#2 I don't like either.

The benefit of #3 would also be that one could drop the requirement of
having a shared file system (NFS, NAS, ...). #3 should be quite easy to
implement; maybe easy to get started with.

--Hardy

On Mon, 25 May 2009 10:55:52 +0200, Emmanuel Bernard
<emmanuel@hibernate.org> wrote:

Hello
I am not sure this is where we should go; or at least, it depends. Here
are three scenarios:


#1 JMS replacement
If you want to use JGroups as a replacement for the JMS backend, then I
think you should write a JGroups backend. Check
org.hibernate.search.backend.impl.jms.
In this case all changes are sent via JGroups to a "master". The master
could be elected by the cluster, possibly dynamically, but that's not
necessary for the first version.

#2 apply indexing on all nodes
JGroups could send the work queue to all nodes and each node could
apply the change.
For various reasons I am not a fan of this solution, as it creates
overhead in CPU / memory usage and does not scale very well from a
theoretical PoV.

#3 Index copy
This is what you are describing: copying the index using JGroups
instead of my file system approach. This might have merit, especially
as we could reduce network traffic using multicast, but it also
requires rethinking the master / slave modus operandi.
Today the master copies, on a regular basis, a clean index to a shared
directory.
On a regular basis, the slaves go and copy the clean index from the
shared directory.
In your approach, the master would send changes to the slaves, and the
slaves would have to apply them "right away" (on their passive version).
I think #1 is more interesting than #3; we should probably start with
that. #3 might be interesting too. Thoughts?

Emmanuel

PS: refactoring is a fact of life, so feel free to do so. Just don't
break public contracts.

On  May 21, 2009, at 22:14, Łukasz Moreń wrote:

Hi,

I have a few questions concerning the use of JGroups to copy index
files. I am thinking of creating sender (for the master) and receiver
(for the slave) directory providers.
The sender class would be mainly based on the existing
FSMasterDirectoryProvider: it would first create a local index copy and
later send it to the slave nodes
(or send without copying, but that may cause lower performance?).
To avoid code redundancy it would be good to refactor the
FSMasterDirectoryProvider class a little, so that I can reuse the
copying functionality in the new DirectoryProvider and add the sending
part; or should I rather work around it?

I do not completely understand how multithreaded access to the index
files works. Does the FileChannel class ensure it, when the index is
being copied and new Lucene works are pushed?






_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev