Re: [hibernate-dev] Re: Pushing indexes through JGroups

Sunday, 14 June 2009

I will follow your comments and advice and fix the patch.

+ JGroupsBackendQueueProcessorFactory:
Yes, I think there should be a default cluster name.
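
A minimal sketch of such a fallback, assuming the channel name is read from the "hibernate.search.worker.jgroups.channel_name" property used in the sample configuration quoted below. The class name JGroupsChannelName and the default value "HSearchCluster" are hypothetical, chosen only for illustration; the thread does not settle on a default.

```java
import java.util.Properties;

/** Hypothetical helper: resolves the JGroups channel (cluster) name with a fallback. */
final class JGroupsChannelName {
    // Property key taken from the sample configuration in this thread.
    static final String CHANNEL_NAME_KEY = "hibernate.search.worker.jgroups.channel_name";
    // Assumed default; the real default name is not decided in this thread.
    static final String DEFAULT_CHANNEL_NAME = "HSearchCluster";

    private JGroupsChannelName() {
    }

    /** Returns the configured channel name, or the default when it is missing or blank. */
    static String resolve(Properties props) {
        String configured = props.getProperty(CHANNEL_NAME_KEY);
        if (configured == null || configured.trim().isEmpty()) {
            return DEFAULT_CHANNEL_NAME;
        }
        return configured.trim();
    }
}
```

With this, a node that omits the property would still join a cluster, while a node that sets it (for example to "hs_jg_channel") keeps its explicit name.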

+ JGroupsAbstractMessageReceiver
getState / setState: these are the JGroups methods to get/set the current
cluster state. I think implementing them is not necessary in our case.

+ Tests
I used plain SQL in the test for the master node, to check that the master
correctly receives Lucene works (i.e. that they are not corrupted) and does
the indexing. I didn't want to trigger indexing with Hibernate, just insert
data. The Lucene document for the inserted data is created separately and
sent to the master, which updates the index.

+ Configuration for JGroups
Yes, that's right, the possibility to customize JGroups should be added.
The default JG configuration works in most cases but not in all, as Sanne has
reported. I noticed that Infinispan already supports this: an XML file, a
properties file, or a string with properties are all accepted. So I suppose it
is a good idea.
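
One way the three options could be told apart is sketched below. This is only an illustration under stated assumptions: the three property keys and the class JGroupsStackConfig are hypothetical, the precedence order (XML file, then properties file, then inline string) is an arbitrary choice, and it is not how Infinispan actually implements it.

```java
import java.util.Properties;

/** Hypothetical sketch: decides where the JGroups protocol stack configuration comes from. */
final class JGroupsStackConfig {
    enum Source { XML_FILE, PROPERTIES_FILE, INLINE_STRING, DEFAULT }

    // Hypothetical property keys, invented for this sketch.
    static final String XML_KEY = "hibernate.search.worker.jgroups.configuration_xml";
    static final String FILE_KEY = "hibernate.search.worker.jgroups.configuration_file";
    static final String STRING_KEY = "hibernate.search.worker.jgroups.configuration_string";

    final Source source;
    final String value; // file name or inline stack description; null for DEFAULT

    private JGroupsStackConfig(Source source, String value) {
        this.source = source;
        this.value = value;
    }

    /** Picks the first configured option in a fixed order, else the default stack. */
    static JGroupsStackConfig resolve(Properties props) {
        String xml = trimmed(props, XML_KEY);
        if (xml != null) {
            return new JGroupsStackConfig(Source.XML_FILE, xml);
        }
        String file = trimmed(props, FILE_KEY);
        if (file != null) {
            return new JGroupsStackConfig(Source.PROPERTIES_FILE, file);
        }
        String inline = trimmed(props, STRING_KEY);
        if (inline != null) {
            return new JGroupsStackConfig(Source.INLINE_STRING, inline);
        }
        return new JGroupsStackConfig(Source.DEFAULT, null);
    }

    // Returns the trimmed property value, or null when it is missing or blank.
    private static String trimmed(Properties props, String key) {
        String v = props.getProperty(key);
        if (v == null) {
            return null;
        }
        v = v.trim();
        return v.isEmpty() ? null : v;
    }
}
```

The backend factory could then open the channel with whatever the resolved source points to, and keep using the default JG stack when nothing is configured.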

2009/6/14 Emmanuel Bernard <emmanuel(a)hibernate.org>

...
 ah right.
 Contrary to JMS where the MDB is not bootstrapped by HSearch, we can do
 that with JGroups.

 On  Jun 14, 2009, at 11:48, Sanne Grinovero wrote:

  About 5#, I think you could avoid the need for an hibernate Session,
> you could forward the work list you receive from the network directly
> to the Lucene backend.
> This means you only need a reference to the SearchFactory; you get a
> reference during initialize() of the backend.
>
>
> 2009/6/13 Łukasz Moreń <lukasz.moren(a)gmail.com>:
>
>> Hi,
>> #2. I am using IntelliJ 8.1. I have downloaded the XML code style file
>> for IntelliJ from the wiki, but it is still a little bit different from
>> the style currently used in HS (e.g. import order or line length).
>> #4. The "listenerClassName" from JGroupsBackendQueueProcessorFactory
>> is mandatory, but only for the master node, where it specifies the class
>> responsible for receiving messages from the slaves. I assumed that if the
>> option exists the node is the master, otherwise it is a slave.
>> Falling back to a Lucene implementation was intended to avoid a problem
>> from the JMS backend: setting up the master node to also act as a slave
>> node led to message duplication, because the MDB received a message,
>> recreated it and put it back on the queue.
>> JGroupsBackendQueueProcessorFactory for the master node also creates a
>> LuceneFactory, and the Lucene works received from the slaves are
>> processed by LuceneProcessor, so message duplication is avoided. That is
>> the only reason why I used it.
>> The user is supposed to create producers and receivers with the same
>> JGroupsBackendQueueProcessorFactory.
>> //receiver sample configuration
>> <property name="hibernate.search.worker.backend"
>> value="jgroups"/>
>> <property name="hibernate.search.worker.jgroups.channel_name"
>> value="hs_jg_channel"/>
>> <property name="hibernate.search.worker.jgroups.listener_class"
>> value="pl.lmoren.master.JGroupsMessageReceiverImpl"/>
>>
>> //producers sample configuration
>> <property name="hibernate.search.worker.backend"
>> value="jgroups"/>
>> <property name="hibernate.search.worker.jgroups.channel_name"
>> value="hs_jg_channel"/>
>> However, in the receiver's case the JGroups factory creates Lucene
>> processors, because of the problem described above.
>> The JGroups factory for the master node just initializes the
>> communication channel; all later work is done by the Lucene backend, the
>> same as in JMS.
>> This solution is not fully consistent with the JMS backend one; I think
>> it was forced by the different nature of JMS and JGroups.
>> In JMS the receiver configuration was done in the app server config file;
>> in JGroups I placed it in the Hibernate configuration.
>> What do you think about such a design?
>>
>> #5. That is a question I was also thinking about, to make clustering more
>> transparent.
>> However, I haven't found a good idea for that. The purpose of having the
>> user extend JGroupsAbstractMessageReceiver was to implement the
>> getSession method, where the session used by the backend could be
>> created.
>> I suppose that without this method I would not know whether the
>> Hibernate sessionFactory should come from e.g. a persistence context, be
>> looked up from JNDI, or come from some helper class.
>> #6. The problem with JGroups occurs if it tries to multicast traffic
>> towards the ISP instead of the internal network; then I think the packets
>> are dropped. I will look into that over the weekend.
>>
>> Lukasz
>>
>> 2009/6/10 Sanne Grinovero <sanne.grinovero(a)gmail.com>
>>
>>>
>>> Hi Lukasz,
>>> I've been looking into your code; I have some comments but please
>>> forgive me as I don't have any real experience about JGroups, so I'll
>>> only tell you how much I see this code fit into Hibernate Search.
>>>
>>> 1) The maven dependency upon JGroups should probably be of type
>>> "optional", so please make sure search is also going to work fine for
>>> people who are not really interested in this work and in having the
>>> jgroups.jar around, even if others will love it.
>>>
>>> 2) Look out for style, especially white spacing; for example in
>>> BatchedQueueingProcessor look at the formatting difference of line 91
>>> compared to 82,85 or 88.
>>> Which IDE are you using? we have some template settings ready to help
>>> with this, you can find them on the hibernate.org wiki's.
>>>
>>> 3) JGroupsBackendQueueProcessor is a Runnable setting two arguments in
>>> the constructor, they should be final to make sure they are set before
>>> they are run in another thread;
>>> just add "final" modifier to lines 19 and 20.
>>>
>>> 4) JGroupsBackendQueueProcessorFactory's design:
>>> It looks like a "listenerClassName" is mandatory and not providing a
>>> default implementation; it actually falls back to a Lucene
>>> implementation when this option is missing. This looks like IMHO
>>> adding some extra complexity into the class which you don't really
>>> need. Is there a good reason for that? Someone could forget some
>>> option in the configuration, it would be better to throw an exception
>>> to notify the user about the configuration inconsistency than to do
>>> something differently, or rely on a good default.
>>> Maybe I'm wrong, but then some comments could help me out. Is the user
>>> supposed to configure both message producers and receivers with the
>>> same kind of BackendQueueProcessorFactory? That's probably not needed,
>>> and not consistent with the way the JMS backend is configured.
>>>
>>> 5)JGroupsAbstractMessageReceiver's design:
>>> This is very similar to the JMS abstract receiver, but in case of JMS
>>> I'd expect to have to "annotate" something in my ejb classes, so that
>>> it gets deployed by the container and associated to the queue, so in
>>> case of JMS it's mandatory for the user to write some class.
>>> Your solution is fine, but wouldn't it be possible to have a "no-code"
>>> solution? The user could just configure this deployment to say
>>> something like "this is the jgroups configuration, this is the
>>> hibernate configuration, you know where to find the entity classes...
>>> please listen to the channels and do your job".
>>> It would be very cool to have just to package the search jar with some
>>> configuration lines (and the entities of course to read some more
>>> Search configuration) and be ready to start listening for messages.
>>> Actually some future version could avoid the entities and receive the
>>> serialized configuration.. just a thought, but that would enable us to
>>> prepackage a whole server ready as a Search backend without even
>>> needing to deploy any user code.
>>>
>>> 6) testing... I couldn't start them as JGroups was failing to bind to
>>> ports on my machine, I'm sure I am doing something wrong, will try
>>> again after reading some docs about it.
>>> But anyway I got a bit confused about the notion of "Master"s and
>>> "Receivers"; I'm used in the JMS to see the master as the one taking
>>> care of the index, so receiving the docs not sending them.
>>>
>>> Generally speaking, add some comments and debug log statements (using
>>> the {} instead of string + concatenation);
>>> I'll try this weekend to try it on remote staging servers, it looks
>>> promising!
>>>
>>> Sanne
>>>
>>> 2009/6/10 Łukasz Moreń <lukasz.moren(a)gmail.com>:
>>>
>>>> Hi,
>>>>
>>>> I've finished the task concerning the replacement of JMS with JGroups.
>>>> The patch is attached. The general idea of pushing indexes through JG
>>>> is in place; however, there are issues to improve (e.g. flexible JG
>>>> protocol stack configuration).
>>>> Any review or advice would be welcome, to make sure that I am not going
>>>> down a blind alley.
>>>>
>>>> Thanks,
>>>> Lukasz
>>>>
>>>> 2009/5/27 Emmanuel Bernard <emmanuel(a)hibernate.org>
>>>>
>>>>>
>>>>> Lukasz,
>>>>> I have been discussing with Manik on #3 and we think that JBoss Cache /
>>>>> Infinispan are probably a better fit than plain JGroups for that as all
>>>>> the plumbing will be configured for you.
>>>>> When you reach this problem, let's revive this discussion.
>>>>>
>>>>> On  May 25, 2009, at 11:07, Hardy Ferentschik wrote:
>>>>>
>>>>>  Hi,
>>>>>>
>>>>>> I talked with Łukasz about this last week. Definitely, #1 and #3.
>>>>>> #2 I don't like either.
>>>>>>
>>>>>> The benefit of #3 would also be that one could drop the requirement of
>>>>>> having a shared file system (NFS, NAS, ...). #3 should be quite easy to
>>>>>> implement. Maybe easy to get started with.
>>>>>>
>>>>>> --Hardy
>>>>>>
>>>>>> On Mon, 25 May 2009 10:55:52 +0200, Emmanuel Bernard
>>>>>> <emmanuel(a)hibernate.org> wrote:
>>>>>>
>>>>>>  Hello
>>>>>>> I am not sure this is where we should go, or at least, it depends.
>>>>>>> Here are three scenarios:
>>>>>>>
>>>>>>>
>>>>>>> #1 JMS replacement
>>>>>>> If you want to use JGroups as a replacement for the JMS backend, then I
>>>>>>> think you should write a jgroups backend. Check
>>>>>>> org.hibernate.search.backend.impl.jms
>>>>>>> In this case all changes are sent via JGroups to a "master". The master
>>>>>>> could be voted by the cluster possibly dynamically but that's not
>>>>>>> necessary for the first version.
>>>>>>>
>>>>>>> #2 apply indexing on all nodes
>>>>>>> JGroups could send the work queue to all nodes and each node could
>>>>>>> apply the change.
>>>>>>> For various reasons I am not a fan of this solution as it creates
>>>>>>> overhead in CPU / memory usage and does not scale very well from a
>>>>>>> theoretical PoV.
>>>>>>>
>>>>>>> #3 Index copy
>>>>>>> This is what you are describing, copying the index using JGroups
>>>>>>> instead of my file system approach. This might have merits, esp. as we
>>>>>>> could diminish network traffic using multicast, but it also requires
>>>>>>> rethinking the master / slave modus operandi.
>>>>>>> Today the master copies a clean index to a shared directory on a
>>>>>>> regular basis.
>>>>>>> On a regular basis, the slaves go and copy the clean index from the
>>>>>>> shared directory.
>>>>>>> In your approach, the master would send changes to the slaves and
>>>>>>> slaves would have to apply them "right away" (on their passive version).
>>>>>>>
>>>>>>> I think #1 is more interesting than #3, we probably should start with
>>>>>>> that. #3 might be interesting too, thoughts?
>>>>>>>
>>>>>>> Emmanuel
>>>>>>>
>>>>>>> PS: refactoring is a fact of life, so feel free to do so. Just don't
>>>>>>> break public contracts.
>>>>>>>
>>>>>>> On  May 21, 2009, at 22:14, Łukasz Moreń wrote:
>>>>>>>
>>>>>>>  Hi,
>>>>>>>>
>>>>>>>> I have a few questions concerning the use of JGroups to copy index
>>>>>>>> files. I am thinking of creating sender (for master) and receiver
>>>>>>>> (slave) directory providers.
>>>>>>>> The sender class, mainly based on the existing
>>>>>>>> FSMasterDirectoryProvider, would first create a local index copy and
>>>>>>>> send it later to the slave nodes (or send without copying, but that
>>>>>>>> may cause lower performance?).
>>>>>>>> To avoid code redundancy it would be good to refactor the
>>>>>>>> FSMasterDirectoryProvider class a little, so that I can use the
>>>>>>>> copying functionality in the new DirectoryProvider and add the sending
>>>>>>>> one; or should I rather work around it?
>>>>>>>>
>>>>>>>> I do not completely understand how multithreaded access to the index
>>>>>>>> files works. Does the FileChannel class ensure that, when the index is
>>>>>>>> copied and new Lucene works are pushed?
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> hibernate-dev mailing list
>>>> hibernate-dev(a)lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>>>
>>>>
>>>>
>>
>>
