[teiid-issues] [JBoss JIRA] (TEIID-2429) Large sort performance

Thu Mar 14 19:23:41 EDT 2013

    [ https://issues.jboss.org/browse/TEIID-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761287#comment-12761287 ] 

Steven Hawkins commented on TEIID-2429:
---------------------------------------

The initial commit addressed quite a few issues/improvements:
1. Improved the size utility for object and LOB estimates
2. Improved TupleBuffer to expose a better estimate of batch memory size, which is used by SortUtility
3. Eliminated the needless trace log produced by tuple sources during SortUtility merge
4. Effectively doubled the default maxProcessingKb to be in line with the 7.4 sizing of ~16 batches (7.5 switched to a default 256-row batch, but left the max processing default at approximately 8 batches). Will add a note to the release notes.
5. Changed the initial SortUtility processing to better distinguish between all relevant cases: pre-buffer by varying amounts, except when the incoming tuple source has already been buffered.
6. Changed the SortUtility buffer reservations to match the number of merge passes that may need to be performed, and enforced a hard limit of 3 passes.
7. Corrected the reserve-additional-buffer logic in BufferManager. The existing logic used an incorrect calculation, so additional buffer space was typically not granted. Coupled with the lack of pre-buffering and the creation of extra intermediate sort TupleBuffers, this meant that more passes were performed than needed. Also reintroduced the concept of a blocked-on-memory exception, with some protections on spin/aging, to schedule work when there is memory pressure.
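
Items 6 and 7 turn on the relationship between the number of sorted runs, how many runs can be merged at once (bounded by the buffer space reserved), and the resulting number of merge passes. As a rough illustration only, and not Teiid's actual SortUtility or BufferManager code, the Java sketch below assumes a simple k-way external merge in which each pass merges up to fanIn runs into one. The class and method names (MergePassSketch, mergePasses, minFanIn) are hypothetical.

```java
public class MergePassSketch {

    /** Hypothetical helper: k-way merge passes needed to reduce initialRuns sorted runs to one. */
    public static int mergePasses(long initialRuns, int fanIn) {
        if (fanIn < 2) {
            throw new IllegalArgumentException("fanIn must be at least 2");
        }
        int passes = 0;
        long runs = initialRuns;
        while (runs > 1) {
            // Each pass merges up to fanIn runs into one, so the run count is divided with ceiling.
            runs = (runs + fanIn - 1) / fanIn;
            passes++;
        }
        return passes;
    }

    /** Hypothetical helper: smallest fan-in (i.e. merge buffers to reserve) that finishes within maxPasses. */
    public static int minFanIn(long initialRuns, int maxPasses) {
        if (maxPasses < 1) {
            throw new IllegalArgumentException("maxPasses must be at least 1");
        }
        int fanIn = 2;
        // Terminates: once fanIn >= initialRuns, a single pass suffices.
        while (mergePasses(initialRuns, fanIn) > maxPasses) {
            fanIn++;
        }
        return fanIn;
    }

    public static void main(String[] args) {
        long runs = 1000; // assumed number of initial sorted runs, for illustration only
        System.out.println(mergePasses(runs, 4));  // 5
        System.out.println(mergePasses(runs, 32)); // 2
        System.out.println(minFanIn(runs, 3));     // 10
    }
}
```

Under these assumptions, 1000 initial runs need 5 passes with a 4-way merge but only 2 with a 32-way merge, and staying within a 3-pass limit requires a fan-in of at least 10. This is the kind of trade-off that sizing the buffer reservation to the expected pass count is meant to address; under-granting buffer space shrinks the fan-in and adds passes.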

A performance test will be rounded out as well in the integration test suite.

> Large sort performance
> ----------------------
>
>                 Key: TEIID-2429
>                 URL: https://issues.jboss.org/browse/TEIID-2429
>             Project: Teiid
>          Issue Type: Quality Risk
>          Components: Query Engine
>    Affects Versions: 7.4
>            Reporter: Steven Hawkins
>            Assignee: Steven Hawkins
>             Fix For: 8.4
>
>
> Large sorts (high data volume, above several hundred thousand rows) experience disproportionate performance degradation as the data set grows.
> This is due to the SortUtility default collection strategy, which creates intermediate sort buffers too proactively.
