2 is a strange default. If 1 is not optimal, that proves I was wrong to theorize that 1 would consistently beat every other value; then again, which value you want in each specific situation is a cryptic puzzle.
For example, I could challenge you to repeat the above tests with:
Batch size: 25
Fetching threads: 8
Load threads: 8
RootIndexing threads: 1
I would bet that you'd beat the 18-minute record with some combination of these. Very likely, actually, since you're supposed to run significantly more threads than you have CPUs to compensate for the JDBC loading delay (unless it's a local RDBMS?).
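To put a rough number on "more threads than CPUs", a common rule of thumb (from Goetz et al., Java Concurrency in Practice) sizes a pool as cores x target utilization x (1 + wait time / compute time). The sketch below applies that general heuristic; it is not something the MassIndexer computes, and the wait and compute timings are hypothetical inputs you would have to measure yourself, for example with a profiler.

```java
/**
 * Rough thread-count heuristic (Goetz, "Java Concurrency in Practice", 8.2):
 * threads = cpus * targetUtilization * (1 + waitTime / computeTime).
 * The timing inputs are hypothetical and must be measured for your own setup.
 */
public final class ThreadSizing {

    private ThreadSizing() {}

    /**
     * @param cpus              number of available cores
     * @param targetUtilization desired CPU utilization, in (0, 1]
     * @param waitMillis        average time a task spends blocked (e.g. on JDBC)
     * @param computeMillis     average time a task spends on the CPU
     * @return suggested number of threads, never less than 1
     */
    public static int suggestedThreads(int cpus, double targetUtilization,
                                       double waitMillis, double computeMillis) {
        if (cpus < 1 || targetUtilization <= 0 || targetUtilization > 1
                || waitMillis < 0 || computeMillis <= 0) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        // The more time a task spends waiting, the more threads it takes
        // to keep every core busy.
        double n = cpus * targetUtilization * (1 + waitMillis / computeMillis);
        return Math.max(1, (int) Math.round(n));
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Hypothetical measurement: each entity waits 30 ms on JDBC
        // and needs 10 ms of CPU, so the wait/compute ratio is 3.
        System.out.println(suggestedThreads(cpus, 1.0, 30, 10));
    }
}
```

With 2 cores and a task that waits on JDBC three times as long as it computes, this suggests 8 threads; a local RDBMS with a smaller wait/compute ratio would push the number down.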
(BTW, to tune the different options I use a profiler and look at which threads are starving for work, which are waiting on JDBC loads, and how much CPU is in use: if you don't have 100% CPU usage, you know you can still improve something, or that you have too many threads.)
My reasoning for suggesting 1 as a default is to be conservative with JDBC connections. Granted, it might not be the fastest of all options, but I suspect it's quite close to reasonable performance... and there is so much that needs to be set right anyway that I don't think we can achieve the best performance with defaults alone (unless you have a good idea about that; I'd love to hear it).