Each time we have to fetch a new block of data, we sit idle waiting on I/O, and that's a waste of CPU. +1 for aggressive parallelism by default. I assume we'll have to trust the implementation to know what's best and to define a "within reason" threshold?

Assuming that the default (as mandated by the spec) automatically gets capped to "within reason", I think that's actually good. A smart implementation won't actually spawn that many threads, or will at least try not to violate the system security limits. If it's not so smart, people will have to configure it.
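To make the "within reason" idea concrete, here is a minimal sketch of how an implementation might cap a requested partition count. Everything in it is an assumption for illustration: the class name `ParallelismCap`, the `cap` method and the factor of 2 are made up and come from neither the spec nor any existing implementation.

```java
// Hypothetical helper: these names are illustrative only and are not
// taken from the batch spec or from any real implementation.
final class ParallelismCap {

    // Assumed ceiling: a small multiple of the CPU count, on the grounds
    // that workers spend a good part of their time waiting on fetches.
    static final int IO_WAIT_FACTOR = 2;

    private ParallelismCap() {
    }

    // Returns the number of threads to actually start for the requested
    // number of partitions, never more than availableCpus * IO_WAIT_FACTOR.
    static int cap(int requestedPartitions, int availableCpus) {
        if (requestedPartitions < 1 || availableCpus < 1) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        int ceiling = availableCpus * IO_WAIT_FACTOR;
        return Math.min(requestedPartitions, ceiling);
    }

    // Convenience variant that asks the JVM how many CPUs it can use.
    static int capForThisMachine(int requestedPartitions) {
        return cap(requestedPartitions, Runtime.getRuntime().availableProcessors());
    }
}
```

With 8 CPUs, a request for 1000 partitions would run on 16 threads, while a request for 4 partitions would be left alone. A less smart implementation would skip this step, which is where the explicit configuration option comes in.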

About fetch sizes & partitions: yes, I think they should match. The batch standard seems to suggest that we treat each input document as "one element", aka "one partition", since they are indeed independent and can potentially be processed in parallel. The only reason we go for a coarser granularity is to optimise for larger fetch sizes.

Remember that this tunable is meant primarily to match the preferred payload sizes of the network infrastructure and, in extreme cases, to stay within the buffer pages of the RDBMS so that it can perform join operations efficiently. It has little to do with our ability to process in parallel, other than, say, not exceeding the maximum number of connections allowed by the data source. Unfortunately connection pools don't provide a standard API to let us know such parameters, so we need to expose the option. In terms of defaults, I'm afraid we'll need to experiment and put down a wild guess. `10` is the traditional default in Hibernate; it's totally made up, but it will do until someone can do better testing.
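As a rough sketch of what exposing these options could look like, the class below keeps the fetch size and the connection limit as two separate settings. The class `FetchSettings` and its methods are hypothetical names, not an existing API; the only value taken from this comment is the default of `10`.

```java
// Hypothetical settings holder: not an existing Hibernate or batch API.
final class FetchSettings {

    // The placeholder default of 10 quoted in the comment above.
    static final int DEFAULT_FETCH_SIZE = 10;

    final int fetchSize;
    final int maxConnections;

    FetchSettings(int fetchSize, int maxConnections) {
        if (fetchSize < 1 || maxConnections < 1) {
            throw new IllegalArgumentException("settings must be positive");
        }
        this.fetchSize = fetchSize;
        this.maxConnections = maxConnections;
    }

    // The pool cannot tell us its connection limit, so the user supplies it.
    static FetchSettings withDefaultFetchSize(int maxConnections) {
        return new FetchSettings(DEFAULT_FETCH_SIZE, maxConnections);
    }

    // Number of fetches needed to read rowCount rows (rounded up).
    long roundTrips(long rowCount) {
        return (rowCount + fetchSize - 1) / fetchSize;
    }

    // Partitions that may run at once without exceeding the connection limit.
    int concurrentPartitions(int partitions) {
        return Math.min(partitions, maxConnections);
    }
}
```

For example, `FetchSettings.withDefaultFetchSize(5)` would need 10 fetches to read 95 rows and would let at most 5 of 8 partitions hold a connection at the same time. Keeping the two settings apart reflects the point that fetch size is about payload size, while the connection limit is the only part that constrains parallelism.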

