Re: Pushing indexes through JGroups
by Emmanuel Bernard
Hello
I am not sure this is where we should go, or at least, it depends.
Here are three scenarios:
#1 JMS replacement
If you want to use JGroups as a replacement for the JMS backend, then
I think you should write a jgroups backend. Check
org.hibernate.search.backend.impl.jms
In this case all changes are sent via JGroups to a "master". The
master could be voted by the cluster possibly dynamically but that's
not necessary for the first version.
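
Scenario #1 can be pictured roughly as follows. This is only a minimal sketch under stated assumptions: `Transport`, `electMaster` and `process` are hypothetical names, not part of Hibernate Search or JGroups, and the "first member of the view is the master" rule is just one simple way to pick a master without voting.

```java
import java.util.ArrayList;
import java.util.List;

class MasterForwardingSketch {

    /** Hypothetical transport; a real backend would wrap a JGroups channel here. */
    interface Transport {
        void send(String destinationNode, List<String> work);
    }

    /** Picks the master: simply the first member of the cluster view (an assumption). */
    static String electMaster(List<String> clusterView) {
        if (clusterView.isEmpty()) {
            throw new IllegalStateException("empty cluster view");
        }
        return clusterView.get(0);
    }

    /** Applies work locally when this node is the master, otherwise forwards it. */
    static void process(String self, List<String> clusterView, List<String> work,
                        Transport transport, List<String> localIndex) {
        String master = electMaster(clusterView);
        if (master.equals(self)) {
            localIndex.addAll(work);      // stand-in for applying the Lucene works
        } else {
            transport.send(master, work); // all changes are sent to the master
        }
    }

    public static void main(String[] args) {
        List<String> view = List.of("nodeA", "nodeB", "nodeC");
        List<String> masterIndex = new ArrayList<>();
        // In-memory transport that delivers straight into the master's index.
        Transport inMemory = (destination, work) -> masterIndex.addAll(work);

        process("nodeB", view, List.of("add:doc1"), inMemory, new ArrayList<>());
        process("nodeA", view, List.of("delete:doc2"), inMemory, masterIndex);
        System.out.println(masterIndex);
    }
}
```

Strings stand in for the real work objects, so the sketch shows only the routing decision, not serialization or failure handling.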
#2 apply indexing on all nodes
JGroups could send the work queue to all nodes and each node could
apply the change.
For various reasons I am not a fan of this solution, as it creates
overhead in CPU / memory usage and does not scale very well from a
theoretical PoV.
#3 Index copy
This is what you are describing: copying the index using JGroups
instead of my file system approach. This might have merits, especially
as we could reduce network traffic using multicast, but it also
requires rethinking the master / slave modus operandi.
Today the master copies a clean index to a shared directory on a
regular basis.
On a regular basis, the slaves copy the clean index from the shared
directory.
In your approach, the master would send changes to the slaves and
slaves would have to apply them "right away" (on their passive version)
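
The file-system cycle described above (master to shared directory, shared directory to slave) can be sketched with plain `java.nio.file` calls. This is an illustration only, assuming a flat index directory; the class and method names are hypothetical and it is not the code of the existing directory providers, which also have to deal with timing and with readers using the index during a copy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class IndexCopySketch {

    /** Copies every regular file of a flat index directory into the target directory. */
    static void copyIndex(Path source, Path target) {
        try {
            Files.createDirectories(target);
            List<Path> files;
            try (Stream<Path> listing = Files.list(source)) {
                files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            for (Path file : files) {
                Files.copy(file, target.resolve(file.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs one master -> shared -> slave cycle on temporary directories. */
    static boolean roundTrip() {
        try {
            Path master = Files.createTempDirectory("master-index");
            Path shared = Files.createTempDirectory("shared-index");
            Path slave = Files.createTempDirectory("slave-index");
            Files.write(master.resolve("segments_1"), "demo".getBytes());

            copyIndex(master, shared); // master side, would run on a timer
            copyIndex(shared, slave);  // slave side, would run on its own timer
            return Files.exists(slave.resolve("segments_1"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```

In a JGroups-based variant of #3, the second `copyIndex` call is the part that would be replaced by changes pushed from the master.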
I think #1 is more interesting than #3, we probably should start with
that. #3 might be interesting too, thoughts?
Emmanuel
PS: refactoring is a fact of life, so feel free to do so. Just don't
break public contracts.
On May 21, 2009, at 22:14, Łukasz Moreń wrote:
> Hi,
>
> I have a few questions concerning the use of JGroups to copy index files.
> I am thinking of creating sender (for master) and receiver (for slave)
> directory providers.
> The sender class would be based mainly on the existing
> FSMasterDirectoryProvider: first create a local index copy, then send
> it to the slave nodes
> (or send without copying, but that may cause lower performance?).
> To avoid code redundancy it would be good to refactor the
> FSMasterDirectoryProvider class a little, so that I can reuse the
> copying functionality in the new DirectoryProvider and add the sending
> part; or should I rather work around it?
>
> I do not completely understand how multithreaded access to the
> index files works. Does the FileChannel class guarantee safe access
> when the index is being copied while new Lucene works are pushed?
>
>
> I hope you had great holidays:)
>
> Lukas
15 years, 6 months
EntityPersister.initializeLazyProperty: ask for one, initialize all?
by Nikita Tovstoles
Our app fails to scale sufficiently, and I've traced our problems to eager loading of all OneToOne relations when any single one is accessed. I would like to fix that but wanted to get feedback first. I'm referring to Hibernate Core 3.3.1.GA below:
Currently in AbstractFieldInterceptor.intercept():
"uninitializedFields = null; //let's assume that there is only one lazy fetch group, for now!"
proposed fix:
-after 'result' is returned call uninitializedFields.remove(fieldname). Question: should this only be done if result != null?
And then AbstractEntityPersister.initializeLazyProperties() calls methods that initialize *all* properties even though a specific fieldname is supplied:
* initializeLazyPropertiesFromDatastore or
* initializeLazyPropertiesFromCache
Proposed fix:
-In both cases, determine appropriate 'j' value by searching lazyPropertyNames for 'fieldName'
-only call nullSafeGet, and initializeLazyProperty( fieldName, entity, session, snapshot, j, propValue ) once.
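
The proposal can be illustrated in isolation from Hibernate. The sketch below is hypothetical throughout (`LazyFieldSketch`, `read` and the `loader` callback are invented for illustration and are not the interceptor or persister API): it looks up the index `j` of the requested field, loads only that value, and removes only that field from the uninitialized set.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.IntFunction;

class LazyFieldSketch {
    private final String[] lazyPropertyNames;
    private final Set<String> uninitializedFields;
    private final Map<String, Object> values = new HashMap<>();
    private final IntFunction<Object> loader; // hypothetical: loads lazy property j only
    int loads = 0;                            // number of single-property loads performed

    LazyFieldSketch(String[] lazyPropertyNames, IntFunction<Object> loader) {
        this.lazyPropertyNames = lazyPropertyNames;
        this.uninitializedFields = new LinkedHashSet<>(Arrays.asList(lazyPropertyNames));
        this.loader = loader;
    }

    /** Initializes only the requested field instead of the whole lazy group. */
    Object read(String fieldName) {
        if (uninitializedFields.contains(fieldName)) {
            int j = Arrays.asList(lazyPropertyNames).indexOf(fieldName);
            values.put(fieldName, loader.apply(j));
            loads++;
            uninitializedFields.remove(fieldName); // the others stay uninitialized
        }
        return values.get(fieldName);
    }

    boolean isInitialized(String fieldName) {
        return !uninitializedFields.contains(fieldName);
    }

    public static void main(String[] args) {
        LazyFieldSketch entity =
                new LazyFieldSketch(new String[] {"profile", "settings"}, j -> "value" + j);
        System.out.println(entity.read("settings"));
        System.out.println(entity.isInitialized("profile"));
    }
}
```

Because the field is removed from the set whether or not the loaded value is null, this sketch answers the "only if result != null?" question one particular way; that choice is an assumption, not a recommendation.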
What do folks think?
Thanks,
-nikita
15 years, 7 months
Antlr3 / JDK 1.5
by Steve Ebersole
I am working with Alexandre Porcelli on some major changes to the HQL
translators. Since the changes are drastic we decided to move from
Antlr2 to Antlr3 in the process as Antlr3 offers many benefits over
Antlr2.
The only concern I have with the move to Antlr3 is the fact that Antlr3
only works with JDK 1.5+. I had planned on incorporating this work into
Hibernate 3.5. The obvious question being the corresponding JDK move
for Hibernate users migrating to 3.5.
Anyone have strong reasons to not do this move in the 3.5 timeframe?
--
Steve Ebersole <steve(a)hibernate.org>
Hibernate.org
15 years, 7 months
Re: [seam-dev] 2.1.2 release
by Sanne Grinovero
Sure, thanks.
I've opened JBSEAM-4201, please edit it about the Fix version and give
me some directions about the logging libraries.
I'll try to attach a patch next weekend.
Sanne
2009/5/29 Pete Muir <pmuir(a)redhat.com>:
> Right, we should do this update?
>
> Sanne, do you want to submit a patch to a JIRA?
>
> On 28 May 2009, at 22:38, Sanne Grinovero wrote:
>
>> Hi,
>> about library versions, I'm wondering why Seam doesn't bundle the
>> latest Hibernate 3.3.x?
>>
>> I guess it was a good thing to keep the same version as JBoss AS 4.2,
>> but since 5 is out this reasoning could be reconsidered.
>>
>> Hibernate Search 3.1.0 has been available for months and has many
>> improvements over 3.0.1, but it requires an up-to-date Hibernate core.
>>
>> Sanne
>>
>> 2009/5/28 Dan Allen <dan.j.allen(a)gmail.com>:
>>>
>>>>
>>>>> There was also
>>>>> question, if we can upgrade the spring integration to 2.5.x instead of
>>>>> 2.0.x.
>>>
>>> I've been using Seam together with Spring 2.5 for a long time. There
>>> really are no conflicts...and what is nice is that you get the Spring
>>> ELResolver (instead of the VariableResolver) in your hands for tighter
>>> integration.
>>>
>>> -Dan
>>>
>>> --
>>> Dan Allen
>>> Senior Software Engineer, Red Hat | Author of Seam in Action
>>>
>>> http://mojavelinux.com
>>> http://mojavelinux.com/seaminaction
>>> http://in.relation.to/Bloggers/Dan
>>>
>>> NOTE: While I make a strong effort to keep up with my email on a daily
>>> basis, personal or other work matters can sometimes keep me away
>>> from my email. If you contact me, but don't hear back for more than a
>>> week,
>>> it is very likely that I am excessively backlogged or the message was
>>> caught in the spam filters. Please don't hesitate to resend a message if
>>> you feel that it did not reach my attention.
>>>
>>> _______________________________________________
>>> seam-dev mailing list
>>> seam-dev(a)lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/seam-dev
>>>
>>>
>
>
15 years, 7 months
Re: [seam-dev] 2.1.2 release
by Sanne Grinovero
Hi,
about library versions, I'm wondering why Seam doesn't bundle the
latest Hibernate 3.3.x?
I guess it was a good thing to keep the same version as JBoss AS 4.2,
but since 5 is out this reasoning could be reconsidered.
Hibernate Search 3.1.0 has been available for months and has many
improvements over 3.0.1, but it requires an up-to-date Hibernate core.
Sanne
2009/5/28 Dan Allen <dan.j.allen(a)gmail.com>:
>
>>
>>> There was also
>>> question, if we can upgrade the spring integration to 2.5.x instead of
>>> 2.0.x.
>
> I've been using Seam together with Spring 2.5 for a long time. There really
> are no conflicts...and what is nice is that you get the Spring ELResolver
> (instead of the VariableResolver) in your hands for tighter integration.
>
> -Dan
>
> --
> Dan Allen
> Senior Software Engineer, Red Hat | Author of Seam in Action
>
> http://mojavelinux.com
> http://mojavelinux.com/seaminaction
> http://in.relation.to/Bloggers/Dan
>
> NOTE: While I make a strong effort to keep up with my email on a daily
> basis, personal or other work matters can sometimes keep me away
> from my email. If you contact me, but don't hear back for more than a week,
> it is very likely that I am excessively backlogged or the message was
> caught in the spam filters. Please don't hesitate to resend a message if
> you feel that it did not reach my attention.
>
> _______________________________________________
> seam-dev mailing list
> seam-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/seam-dev
>
>
15 years, 7 months
[BV] Support for constraints by groups
by Emmanuel Bernard
Please review. Also, we need to implement parallel (//) group validation
in HV to avoid too many passes.
Begin forwarded message:
> From: Emmanuel Bernard <emmanuel.bernard(a)JBOSS.COM>
> Date: May 26, 2009 16:43:53 CEDT
> To: JSR-303-EG(a)JCP.ORG
> Subject: Support for constraints by groups
> Reply-To: Java Community Process JSR #303 Expert List <JSR-303-EG(a)JCP.ORG>
>
> Hello,
> following a subject dear to Gerhard's heart :)
> I could think of two approaches, let's beat them to death and if
> possible find a better alternative :)
>
>
> # background
> When validating a specific group(s), Bean Validation expands the
> group into a sequence of groups to be validated (because a group can
> really be a group sequence). Depending on how clever the
> implementation is, it could sequentialize all group execution or try
> to validate several non-sequenced groups in parallel.
>
> The concept of discovering which group is validated based on the
> requested groups is complex enough that an API at the metadata level
> is required.
>
> examples
> public interface Minimal {}
>
> @GroupSequence({Minimal.class, Default.class})
> public interface Complete {}
>
>
> #1 define the API at the ElementDescriptor level
>
> BeanDescriptor bd = validator.getConstraintsForClass(Address.class);
> PropertyDescriptor pd = bd.getConstraintsForProperty("street1");
> List<Set<ConstraintDescriptor>> sequencedConstraints =
>     pd.getConstraintDescriptorsForGroups(Complete.class);
> // all ConstraintDescriptors in a given set can be validated in the same pass
> // two sets of constraints must be validated sequentially. If one
> // constraint fails, the following sets must be ignored
>
> for (Set<ConstraintDescriptor> subConstraints : sequencedConstraints) {
>     if ( validate(subConstraints) == true ) break;
> }
>
> The API added is
> ElementDescriptor#getConstraintDescriptorsForGroups(Class<?>... groups)
>
> Optionally we can add
> ElementDescriptor#hasConstraints(Class<?>... groups)
>
> Pro: The metadata is never "contextual" hence has less risk of error.
> Con: Needs to pass the groups at several levels
>
> #2 define the API at the validator level (more aligned with
> Gerhard's proposal but supporting sequences)
>
> List<BeanDescriptor> sequencedDescriptors =
>     validator.getConstraintsForClassAndGroups(Address.class, Complete.class);
> for (BeanDescriptor bd : sequencedDescriptors) {
>     PropertyDescriptor pd = bd.getConstraintsForProperty("street1");
>     Set<ConstraintDescriptor> subConstraints = pd.getConstraintDescriptors();
>     if ( validate(subConstraints) == true ) break;
> }
>
> Pro: once contextualized, the API looks the same as the non
> contextual one
> Con: The metadata is "contextual" hence people can become confused
>
> Questions:
> - should we use List to represent a sequence or should we have a
> dedicated object implementing Iterable?
> - we need to ensure that a given group always returns the same
> sequencing regardless of the element (bean or property). This is
> necessary to ensure that one can validate all bean and property
> level constraints (and potentially the associated object) before
> going to the next sequence. How can we formalize that?
>
> WDYT of all that?
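
The sequencing rule shared by both proposals (validate every constraint of one set in the same pass, and ignore the following sets as soon as one set reports a failure) can be sketched without the Bean Validation API. Everything here is hypothetical: constraints are modelled as named `Predicate<String>` instances rather than `ConstraintDescriptor`s, and `validate` returns the names of the failed constraints.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class GroupSequenceSketch {

    /**
     * Validates the value against each group in order and returns the failures
     * of the first group that fails; later groups are not evaluated.
     */
    static List<String> validate(String value,
                                 List<Map<String, Predicate<String>>> sequence) {
        for (Map<String, Predicate<String>> group : sequence) {
            List<String> failures = new ArrayList<>();
            // All constraints of one group are checked in the same pass.
            for (Map.Entry<String, Predicate<String>> constraint : group.entrySet()) {
                if (!constraint.getValue().test(value)) {
                    failures.add(constraint.getKey());
                }
            }
            if (!failures.isEmpty()) {
                return failures; // the following groups are ignored
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        Predicate<String> notEmpty = s -> !s.isEmpty();
        Predicate<String> maxLength30 = s -> s.length() <= 30;
        Map<String, Predicate<String>> minimal = Map.of("notEmpty", notEmpty);
        Map<String, Predicate<String>> defaults = Map.of("maxLength30", maxLength30);

        // Mirrors the Complete sequence from the examples: Minimal, then Default.
        System.out.println(validate("", List.of(minimal, defaults)));
    }
}
```

A `List` of groups is used for the sequence here purely for brevity; it does not settle the open question of `List` versus a dedicated `Iterable` type.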
15 years, 7 months
DefaultLoadEventListener.assembleCacheEntry several times slower than fetching state from 2nd level cache
by Nikita Tovstoles
I am about to file a bug but wanted to run this by the experts first.
Let's say there is a User entity class that declares 20 lazy collections much like so:
@OneToMany( mappedBy = "owner")
@Cascade({CascadeType.REMOVE, CascadeType.DELETE_ORPHAN})
@LazyCollection(LazyCollectionOption.TRUE)
@BatchSize(size = 1000)
@Cache(usage = CacheConcurrencyStrategy.TRANSACTIONAL)
public Set<CustomSpace> getCustomSpaces()
In Hibernate 3.3.1.GA, if I load ~1000 User entities from the second level cache (w/o accessing any of the collections), I end up spending 5x as much time assembling cache entries as actually loading the de-hydrated state from the cache provider (profiler snapshot attached). Most of the time is spent eagerly assembling CollectionTypes, all of which may or may not be needed during a given Session.
So, basically, I'm being penalized for merely declaring OneToMany's. Currently, the only way to avoid this penalty - as far as I can tell - is to complicate the domain model by introducing some number of new entities that will be OneToOne to User; and redistributing collection declarations amongst those. But even that may not help: what if services using the domain model access a variety of collections at different times? IMO a much more scalable approach would be to:
* stash serialized states representing collection properties during assembleCacheEntry
* call CollectionType.assemble lazily when collection property field is actually accessed (a la lazy properties)
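
The two bullet points above amount to a "stash now, assemble on first access" holder. A minimal sketch, assuming a hypothetical `LazyAssembleSketch` class in which a `Function` stands in for the assemble call; it does not use Hibernate's types or cache entry format.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class LazyAssembleSketch<S, T> {
    private S serializedState;        // stashed when the cache entry is read
    private Function<S, T> assembler; // stand-in for the assemble step
    private T assembled;
    private boolean done;

    LazyAssembleSketch(S serializedState, Function<S, T> assembler) {
        this.serializedState = serializedState;
        this.assembler = assembler;
    }

    /** Assembles on first access only; later calls return the same result. */
    synchronized T get() {
        if (!done) {
            assembled = assembler.apply(serializedState);
            done = true;
            serializedState = null; // the stashed state is no longer needed
            assembler = null;
        }
        return assembled;
    }

    synchronized boolean isAssembled() {
        return done;
    }

    public static void main(String[] args) {
        AtomicInteger assembleCalls = new AtomicInteger();
        LazyAssembleSketch<String, Integer> property = new LazyAssembleSketch<>("42", s -> {
            assembleCalls.incrementAndGet();
            return Integer.parseInt(s);
        });
        System.out.println(property.isAssembled()); // nothing assembled yet
        property.get();
        property.get();
        System.out.println(assembleCalls.get());    // assembled exactly once
    }
}
```

Entities whose collections are never touched during a Session would then pay only for holding the stashed state, not for assembling it.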
Fwiw, a post covering the above in a little more detail can be found here:
https://forum.hibernate.org/viewtopic.php?p=2412263#p2412263
-nikita
15 years, 7 months
Re: [jbosscache-dev] JBoss Cache Lucene Directory
by Sanne Grinovero
Hello,
I'm forwarding this email to Emmanuel and Hibernate Search dev, as I
believe we should join the discussion.
Could we keep both dev-lists (jbosscache-dev(a)lists.jboss.org,
hibernate-dev(a)lists.jboss.org ) on CC ?
Sanne
2009/4/29 Manik Surtani <manik(a)jboss.org>:
>
> On 27 Apr 2009, at 05:18, Andrew Duckworth wrote:
>
>> Hello,
>>
>> I have been working on a Lucene Directory provider based on JBoss Cache,
>> my starting point was an implementation Manik had already written which
>> pretty much worked with a few minor tweaks. Our use case was to cluster a
>> Lucene index being used with Hibernate Search in our application, with the
>> requirements that searching needed to be fast, there was no shared file
>> system and it was important that the index was consistent across the cluster
>> in a relatively short time frame.
>>
>> Manik's code used a token node in the cache to implement the distributed
>> lock. During my testing I set up multiple cache copies with multiple threads
>> reading/writing to each cache copy. I was finding that a lot of transactions
>> to acquire or release this lock were timing out. Not understanding JBC well, I
>> modified the distributed lock to use JGroups' DistributedLockManager. This
>> worked quite well; however, the time taken to acquire/release the lock (~100
>> ms for both) dwarfed the time to process the index update, lowering
>> throughput. Even using Hibernate Search with an async worker thread, there
>> was still a lot of contention for the single lock, which seemed to limit the
>> scalability of the solution. I think part of the problem was that our use
>> of HB Search generates a lot of small units of work (remove index entry, add
>> index entry) and each of these UOW acquires a new IndexWriter and a new write
>> lock on the underlying Lucene Directory implementation.
>>
>>
>> Out of curiosity, I created an alternative implementation based on the
>> Hibernate Search JMS clustering strategy. Inside JBoss Cache I created a
>> queue node and each slave node in the cluster creates a separate queue
>> underneath where indexing work is written:
>>
>> /queue/slave1/[work0, work1, work2 ....]
>> /slave2
>> /slave3
>>
>> etc
>>
>> In each cluster member a background thread runs continuously. When it wakes
>> up, it decides if it is the master node or not (currently checks if it is
>> the view coordinator, but I'm considering changing it to use a longer lived
>> distributed lock). If it is the master it merges the tasks from each slave
>> queue, and updates the JBCDirectory in one go, it can safely do this with
>> only local VM locking. This approach means that in all the slave nodes they
>> can write to their queue without needing a global lock that any other slave
>> or the master would be using. On the master, it can perform multiple updates
>> in the context of a single Lucene index writer. With a cache loader
>> configured, work that is written into the slave queue is persistent, so it
>> can survive the master node crashing with automatic fail over to a new
>> master meaning that eventually all updates should be applied to the index.
>> Each work element in the queue is time stamped to allow them to be processed
>> in order (requires time synchronisation across the cluster) by the master.
>> For our workload
>> the master/slave pattern seems to improve the throughput of the system.
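
The master's merge step described in this paragraph can be sketched as follows. It is an illustration under assumptions: `Work`, `merge` and the in-memory map of slave queues are hypothetical stand-ins for the cache nodes under /queue, and timestamps are taken at face value, which is only valid if clocks are synchronised across the cluster.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class QueueMergeSketch {

    /** One unit of indexing work, time stamped by the slave that queued it. */
    static final class Work {
        final long timestamp;
        final String description;

        Work(long timestamp, String description) {
            this.timestamp = timestamp;
            this.description = description;
        }
    }

    /** Merges all slave queues into one list ordered by timestamp. */
    static List<Work> merge(Map<String, List<Work>> slaveQueues) {
        List<Work> merged = new ArrayList<>();
        for (List<Work> queue : slaveQueues.values()) {
            merged.addAll(queue);
        }
        // List.sort is stable, so works with equal timestamps keep their queue order.
        merged.sort(Comparator.comparingLong((Work w) -> w.timestamp));
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<Work>> queues = new LinkedHashMap<>();
        queues.put("slave1", List.of(new Work(10, "remove doc1"), new Work(30, "add doc1")));
        queues.put("slave2", List.of(new Work(20, "add doc2")));

        // The master would apply the merged list within a single index writer session.
        for (Work work : merge(queues)) {
            System.out.println(work.timestamp + " " + work.description);
        }
    }
}
```

Each slave appends only to its own list, which is what lets slaves write without a lock shared with other slaves or the master.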
>>
>>
>> Currently I'm refining the code and I have a few JBoss Cache questions
>> which I hope you can help me with:
>>
>> 1) I have noticed that under high load I get LockTimeoutExceptions writing
>> to /queue/slave0 when the lock owner is a transaction working on
>> /queue/slave1 , i.e. the same lock seems to be used for 2 unrelated nodes in
>> the cache. I'm assuming this is a result of the lock striping algorithm, if
>> you could give me some insight into how this works that would be very
>> helpful. Bumping up the cache concurrency level from 500 to 2000 seemed to
>> reduce this problem, however I'm not sure if it just reduces the probability
>> of a random event or if there is some level that will be sufficient to
>> eliminate the issue.
>
> It could well be the lock striping at work. As of JBoss Cache 3.1.0 you can
> disable lock striping and have one lock per node. While this is expensive
> in that if you have a lot of nodes, you end up with a lot of locks, if you
> have a finite number of nodes this may help you a lot.
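
Why two unrelated nodes can end up behind the same lock is easiest to see with a generic striped lock. The sketch below is not JBoss Cache's implementation; `LockStripingSketch` and its hash-modulo mapping are assumptions chosen to show the general technique of sharing a fixed pool of locks among many keys.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockStripingSketch {
    private final ReentrantLock[] stripes;

    LockStripingSketch(int concurrencyLevel) {
        stripes = new ReentrantLock[concurrencyLevel];
        for (int i = 0; i < stripes.length; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    /** Maps a node path to a stripe; distinct paths may map to the same stripe. */
    int stripeIndex(String nodePath) {
        return Math.floorMod(nodePath.hashCode(), stripes.length);
    }

    ReentrantLock lockFor(String nodePath) {
        return stripes[stripeIndex(nodePath)];
    }

    public static void main(String[] args) {
        LockStripingSketch striped = new LockStripingSketch(2);
        String[] paths = {"/queue/slave0", "/queue/slave1", "/queue/slave2"};
        // Three paths over two stripes: at least two paths must share a lock.
        for (String path : paths) {
            System.out.println(path + " -> stripe " + striped.stripeIndex(path));
        }
    }
}
```

With a scheme like this, raising the number of stripes lowers the chance that two paths collide but cannot rule it out, which matches the observation that going from 500 to 2000 reduced the timeouts; one lock per node removes the sharing entirely.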
>
>> 2) Is there a reason to use separate nodes for each slave queue ? Will it
>> help with locking, or can each slave safely insert to the same parent node
>> in separate transactions without interfering or blocking each other ? If I
>> can reduce it to a single queue I think that would be a more elegant
>> solution. I am setting the lockParentForChildInsertRemove to false for the
>> queue nodes.
>
> It depends. Are the work objects attributes in /queue/slaveN ? Remember
> that the granularity for all locks is the node itself so if all slaves write
> to a single node, they will all compete for the same lock.
>
>> 3) Similarly, is there any reason why the master should/shouldn't take
>> responsibility for removing work nodes that have been processed ?
>
> Not quite sure I understand your design - so this distributes the work
> objects and each cluster member maintains indexes locally? If so, you need
> to know when all members have processed the work objects before removing
> these.
>
>> Thanks in advance for help, I hope to make this solution general purpose
>> enough to be able to contribute back to Hibernate Search and JBC teams.
>
> Thanks for offering to contribute. :-) One other thing that may be of
> interest is that I just launched Infinispan [1] [2] - a new data grid
> product. You could implement a directory provider on Infinispan too - it is
> a lot more efficient than JBC at many things, including concurrency. Also,
> Infinispan's lock granularity is per-key/value pair. So a single
> distributed cache would be all you need for work objects. Also, another
> thing that could help is the eager locking we have on the roadmap [3] which
> may make a more traditional approach of locking + writing indexes to the
> cache more feasible. I'd encourage you to check it out.
>
> [1] http://www.infinispan.org
> [2]
> http://infinispan.blogspot.com/2009/04/infinispan-start-of-new-era-in-ope...
> [3] https://jira.jboss.org/jira/browse/ISPN-48
> --
> Manik Surtani
> manik(a)jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>
>
>
>
> _______________________________________________
> jbosscache-dev mailing list
> jbosscache-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/jbosscache-dev
>
15 years, 7 months