[
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-598?pag...
]
Sanne Grinovero edited comment on HSEARCH-598 at 4/1/11 4:48 AM:
-----------------------------------------------------------------
Hi Adrien, thanks for all the insight.
I don't think it's safe to open the connection in a different thread; as you say,
some containers might have issues with that. I'm not sure, and I could ask for opinions on
this, but it doesn't feel right, so let's explore alternatives first.
Also remember that opening a Session doesn't open a connection, and that a Session
isn't designed to be passed to a different thread - so even if we hack in an eager
connection acquisition, there might be other issues.
About your case: do you have 34 different indexed entities? I think that reducing each worker
to a single thread is likely to hurt overall performance. Did you consider launching
the MassIndexer on each type sequentially? You can have it reindex a single type only.
I'm wondering if we should rather add an option to have it perform the different indexations
in sequence, or with a configurable maximum parallelism. A patch would be awesome.
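A rough sketch of what "in sequence, or with a configurable maximum parallelism" could
look like from the caller's side. Everything here is an assumption for illustration: the
class BoundedReindex, the TypeIndexer callback and the maxParallelTypes parameter are
hypothetical names, not part of Hibernate Search. In a real application the callback would
presumably wrap a blocking per-type run such as
fullTextSession.createIndexer(type).startAndWait().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BoundedReindex {

    /** Hypothetical callback: reindexes one entity type and blocks until it is done. */
    interface TypeIndexer {
        void reindex(Class<?> entityType) throws Exception;
    }

    /**
     * Runs the per-type reindexing jobs with at most maxParallelTypes of them
     * active at any time. With maxParallelTypes == 1 the types are processed
     * strictly one after another, in list order.
     */
    static void reindexAll(List<Class<?>> types, int maxParallelTypes, TypeIndexer indexer)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallelTypes);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Class<?> type : types) {
                // Returning null makes this a Callable, so checked exceptions are allowed.
                pending.add(pool.submit(() -> {
                    indexer.reindex(type);
                    return null;
                }));
            }
            // Wait for every job; a failure surfaces here as an ExecutionException.
            for (Future<?> job : pending) {
                job.get();
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Since the number of types running at once is capped, the number of connections needed at
any moment should scale with maxParallelTypes rather than with the total number of
indexed entities.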
Other possible approaches:
* Have a background thread monitor progress, possibly killing off threads when a lock
is spotted, thereby dynamically reducing parallelism to find a sweet spot.
* Have the jobs use a common Executor, to control the global number of threads (requires some
care to keep the same pipeline fully active)
* Revert the design in such a way that each stage is able to handle all different
entities - shouldn't be hard, and my original design was like that but it had a small
performance overhead compared to the current pipelines.
It could be fun to play with a work-stealing pattern, in which each thread starts working
on the first part of the pipeline and switches to stealing work from the subsequent phases when
the output queue of the first stage is full. This way we could have a single pool of
threads working on the first part only, each doing some of the subsequent work when it
can't make progress in the first stage -> no deadlock possible, and likely even better
performance, assuming the context switch decision can be written with efficient code.
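To make the work-stealing idea a bit more concrete, here is a minimal two-stage sketch
using only java.util.concurrent. It is not the MassIndexer's actual design: the class
StealingPipeline and the stage1/stage2 parameters are hypothetical, and the real pipeline
has more stages. The point is only that no thread ever blocks on a full queue, so a
single pool cannot deadlock on it.

```java
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Function;

class StealingPipeline {

    /**
     * Runs stage1 over all inputs and stage2 over all stage1 results, using one
     * pool of threads. Assumes the source yields no null elements and stage1
     * never returns null (ArrayBlockingQueue rejects nulls).
     */
    static <I, O> void run(Iterator<I> source, Function<I, O> stage1, Consumer<O> stage2,
                           int threads, int queueCapacity) throws InterruptedException {
        BlockingQueue<O> handoff = new ArrayBlockingQueue<>(queueCapacity);

        Runnable worker = () -> {
            while (true) {
                I input;
                synchronized (source) { // the iterator itself is not thread-safe
                    input = source.hasNext() ? source.next() : null;
                }
                if (input == null) {
                    break; // first stage has no more work
                }
                O produced = stage1.apply(input);
                // Output queue full: do some second-stage work instead of blocking.
                while (!handoff.offer(produced)) {
                    O stolen = handoff.poll();
                    if (stolen != null) {
                        stage2.accept(stolen);
                    }
                }
            }
            // First stage exhausted: help drain whatever is still queued.
            O rest;
            while ((rest = handoff.poll()) != null) {
                stage2.accept(rest);
            }
        };

        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(worker);
            pool[i].start();
        }
        for (Thread t : pool) {
            t.join();
        }
    }
}
```

Every worker drains the queue after its last offer, so nothing is left behind once all
threads have finished. Error handling inside the workers is left out of the sketch.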
MassIndexer freezes when pool size is too low
---------------------------------------------
Key: HSEARCH-598
URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-598
Project: Hibernate Search
Issue Type: Bug
Components: documentation, massindexer
Affects Versions: 3.2.1
Environment: Hibernate Core 3.5.4, PostgreSQL 8.4.4
Reporter: I D
Assignee: Sanne Grinovero
Fix For: 3.2.2, 3.4.0
In our application we use db connection pooling at the servlet container level - Jetty
instantiates a com.mchange.v2.c3p0.ComboPooledDataSource. We've disabled
Hibernate's connection pooling to avoid multiple connection pools.
Soon after starting to use MassIndexer we noticed that it SOMETIMES freezes during
operation - startAndWait() just hangs indefinitely. After some experimentation, we
realized that during this freeze the connection pool is maxed out and all the 15
connections (c3p0's default value for maxPoolSize is 15) are active.
We therefore experimented with various values for maxPoolSize and found that 10 or less
always seems to cause freezes, whereas 20 or more seems to work fine consistently. In
between is a grey area, where the freeze occurs inconsistently (this grey area may of
course extend to maxPoolSize<=10 and/or maxPoolSize>=20, since our tests only
provide a partial statistical sample).
If this is expected behavior, the minimal pool size / number of required connections
should be well documented.