[hibernate-issues] [Hibernate-JIRA] Commented: (HHH-5608) Performance problem with cascadeOnFlush

Thu Sep 30 10:59:57 EDT 2010

    [ http://opensource.atlassian.com/projects/hibernate/browse/HHH-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=38566#action_38566 ] 

mikhailfranco commented on HHH-5608:
------------------------------------

It is not a pure batch process; it is a combination of streaming in event data to be stored and various derived calculations. At the moment the operations are interleaved through callbacks, but I suppose they could be separated in some way.

My current approach is to use detach() to evict individual entities from the session cache after they are created, but it involves touching a lot of code.
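
One way the detach calls might be kept in a single place is a small wrapper that persists entities, flushes once per batch, and only then detaches the batch. This is a rough sketch, not anything provided by JPA or Hibernate: the class name DetachingWriter, its three callbacks, and the batch size are all hypothetical. The flush comes before the detach because JPA 2.0 documents that unflushed changes to a detached entity are not synchronized to the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of JPA or Hibernate. The three callbacks
// stand in for whatever persist/flush/detach operations the application uses.
final class DetachingWriter<T> {
    private final Consumer<T> persist;
    private final Runnable flush;
    private final Consumer<T> detach;
    private final int batchSize;
    private final List<T> pending = new ArrayList<T>();

    DetachingWriter(Consumer<T> persist, Runnable flush, Consumer<T> detach, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.persist = persist;
        this.flush = flush;
        this.detach = detach;
        this.batchSize = batchSize;
    }

    // Persist one entity; once batchSize entities are pending, drain them.
    void write(T entity) {
        persist.accept(entity);
        pending.add(entity);
        if (pending.size() >= batchSize) {
            drain();
        }
    }

    // Flush first so pending inserts reach the database, then detach
    // every entity written since the previous drain.
    void drain() {
        if (pending.isEmpty()) {
            return;
        }
        flush.run();
        for (T entity : pending) {
            detach.accept(entity);
        }
        pending.clear();
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<String>();
        DetachingWriter<String> writer = new DetachingWriter<String>(
                e -> log.add("persist " + e),
                () -> log.add("flush"),
                e -> log.add("detach " + e),
                2);
        writer.write("a");
        writer.write("b");
        writer.write("c");
        writer.drain();
        System.out.println(log);
    }
}
```

With an EntityManager named em, the callbacks could presumably be em::persist, em::flush and em::detach. The sketch uses java.util.function, so it needs Java 8 or later; on JDK 1.6 the callbacks would have to be hand-written interfaces. It also only helps for entities the application no longer touches after writing them.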

How can I create a stateless session through JPA?
I assume I could cast the result of EntityManager.getDelegate() back to a Hibernate Session if I need to.

I still think the session cache should be bounded in some way, but I realize this would detach application objects at some arbitrary time, behind the application's back, and cause spurious 'detached object' errors, which would force the application to merge() all over the place. Which brings us back to everyone's favorite JPA issue: why does JPA throw a 'detached object' error and not just call merge itself? Because merge creates a whole new object ... etc. etc.

> Performance problem with cascadeOnFlush 
> ----------------------------------------
>
>                 Key: HHH-5608
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HHH-5608
>             Project: Hibernate Core
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.5.2
>         Environment: Hibernate 3.5.2 JPA 2.0 
> HSQLDB 2.0.0 and PostgreSQL 8.4
> Windows XP 2002
> JDK 1.6.0_11
>            Reporter: mikhailfranco
>
> Performance of adding entities to the database degrades as O(n^2) or worse.
> The problem seems to be in AbstractFlushingEventListener.
> The method prepareEntityFlushes() calls cascadeOnFlush() for every object in the session cache, so cascadeOnFlush ends up being called n^2 times.
> The cascadeOnFlush method seems to be very slow, even when there is nothing to do, i.e. no actual cascading. It appears to create a new Cascade object and do lots of work in cascade() for every invocation.
> Also see the description given in this thread:
> http://www.mail-archive.com/nhusers@googlegroups.com/msg14727.html
> > I think there is a problem with our mapping or something with how
> > we're using NHibernate.  Whenever we've done performance testing we've
> > always noticed that NH takes up a large amount of time and usually
> > it's a ridiculous amount of time based on how little data should be
> > saved.
> > 
> > I started doing some profiling to try to see what we're doing wrong
> > and I noticed during one test that there were 50 calls to
> > PrepareEntityFlushes which is an acceptable amount of calls but then I
> > noticed it resulted in 584,441 calls to CascadeOnFlush.
> >
> > So my question is does this number seem a bit excessive 
> > or is this normal?
> Clearly this is excessive!
> He goes on to say:
> > I've made another interesting discovery.  The problem was actually
> > caused by the fact we were calling a Flush() after every SaveOrUpdate
> > (silly way to try to make sure we had an Id -- we're handling things
> > differently now).  By switching to AutoFlush we were able to take
> > something that took an operation that took 2 hours (and spent +90% of
> > the time in NH code) and lower it to 4 minutes. 
> In my case, I do not call flush(), but add lots of entities in transactions. The profiler shows 800 calls to commit, and 85,000 calls to cascadeOnFlush. There are no cascade relationships declared for the entities being persisted. 
> Not sure what the solution could be, probably some combination of:
>  - making cascade check more efficient, don't create objects, 
>    truncate cascade quickly if there is no work to do, etc.
>  - limit the size of the cache, use bounded cache, maybe LRU algorithm, etc.
>  - give some kind of diagnostic warning if a large cache is configured, 
>    with lots of cascade relationships in the ORM
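
The growth described in the report can be illustrated with a toy count. This is a model only, not Hibernate code, and it does not try to reproduce the profiler figures above: it simply assumes that each flush performs one cascade check per entity currently in the session cache, and that nothing is ever evicted. The class and method names are made up for the illustration.

```java
// Toy model only: this is NOT Hibernate code. It assumes every flush performs
// one cascade check per entity currently held in the session cache.
final class CascadeCountModel {

    // Total cascade checks when `flushes` flushes occur, `perFlush` new
    // entities are added before each flush, and nothing is ever evicted.
    static long unboundedCache(int flushes, int perFlush) {
        long cached = 0;
        long checks = 0;
        for (int i = 0; i < flushes; i++) {
            cached += perFlush; // new entities join the cache
            checks += cached;   // the flush visits every cached entity
        }
        return checks;
    }

    // Same workload, but the cache is emptied after each flush, so each
    // flush only visits the entities added since the previous one.
    static long clearedCache(int flushes, int perFlush) {
        return (long) flushes * perFlush;
    }

    public static void main(String[] args) {
        System.out.println(unboundedCache(800, 1)); // prints 320400
        System.out.println(clearedCache(800, 1));   // prints 800
    }
}
```

Under these assumptions the unbounded case totals perFlush * flushes * (flushes + 1) / 2 checks, which is quadratic in the number of flushes, while the cleared case stays linear.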
