[teiid-issues] [JBoss JIRA] (TEIID-5680) Improve performance of odata expand operations

Steven Hawkins (Jira) issues at jboss.org
Mon Mar 11 08:35:00 EDT 2019


    [ https://issues.jboss.org/browse/TEIID-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706440#comment-13706440 ] 

Steven Hawkins commented on TEIID-5680:
---------------------------------------

This isn't really an expand issue; it's a general planning question.  At least on master I see this planned as expected once cardinality metadata is added to FDBProducts and Diary.

The odata layer generates the query:

/*+ cache(ttl:300000 scope:USER) */ SELECT g0.idDiaryEntry, g0.AmountInG, /*+ MJ */ (SELECT ARRAY_AGG((g1.idCode, g1.product_name, g1.brands, g1.energy_100g) ORDER BY g1.idCode) FROM my_nutri_diary.FDBProducts AS g1 WHERE g1.idCode = g0.fkIdProductCode) FROM my_nutri_diary.Diary AS g0 WHERE ((g0.AddedDateTime >= CONVERT(?, timestamp)) AND (g0.AddedDateTime <= CONVERT(?, timestamp))) AND (g0.MealNumber = CONVERT(?, string)) ORDER BY g0.idDiaryEntry LIMIT 0, 100 /* gKjgnveX1UoB */

For 12.0 (TEIID-4498) that is rewritten as a join:

/*+ cache(ttl:300000 scope:USER) */ SELECT g0.idDiaryEntry, g0.AmountInG, X__1.expr1 AS expr3 FROM my_nutri_diary.Diary AS g0 LEFT OUTER JOIN (SELECT ARRAY_AGG((g1.idCode, g1.product_name, g1.brands, g1.energy_100g) ORDER BY g1.idCode) AS expr1, g1.idCode FROM my_nutri_diary.FDBProducts AS g1 GROUP BY g1.idCode) AS X__1 ON g0.fkIdProductCode = X__1.idCode WHERE (g0.AddedDateTime >= ?) AND (g0.AddedDateTime <= ?) AND (g0.MealNumber = ?) ORDER BY g0.idDiaryEntry LIMIT 100

Note that the aggregation is still being performed in the inline view rather than above the join.  This is because the logic allows for multiple such aggregations (such as what would be used with sibling expands) rather than checking whether the aggregation can be raised higher with additional grouping - there is room for improvement there, as raising it would allow the join to be pushed.
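
Just to illustrate what raising the aggregation could look like (a rough sketch only, not actual planner output - null handling for Diary rows without a matching product is glossed over), the grouping would move above the join onto the Diary columns:

/*+ cache(ttl:300000 scope:USER) */ SELECT g0.idDiaryEntry, g0.AmountInG, ARRAY_AGG((g1.idCode, g1.product_name, g1.brands, g1.energy_100g) ORDER BY g1.idCode) FROM my_nutri_diary.Diary AS g0 LEFT OUTER JOIN my_nutri_diary.FDBProducts AS g1 ON g0.fkIdProductCode = g1.idCode WHERE (g0.AddedDateTime >= ?) AND (g0.AddedDateTime <= ?) AND (g0.MealNumber = ?) GROUP BY g0.idDiaryEntry, g0.AmountInG ORDER BY g0.idDiaryEntry LIMIT 100

In that shape the join itself is a straightforward candidate for pushdown to the source.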

With cardinality metadata on both Diary and FDBProducts, for example:

CREATE FOREIGN TABLE FDBProducts (
...
) OPTIONS(NAMEINSOURCE '`FDBProducts`', UPDATABLE 'TRUE', CARDINALITY 600000);
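
and similarly on Diary (the column list is again omitted, and the OPTIONS shown here are only illustrative - CARDINALITY is what matters for the planning decision; roughly 20 rows matches the description below):

CREATE FOREIGN TABLE Diary (
...
) OPTIONS(UPDATABLE 'TRUE', CARDINALITY 20);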

The value doesn't have to be exact; it just needs to convey an approximate row count.  With that in place I do see a good plan being created - it will use a dependent join to take the key values from Diary and supply them as a predicate to FDBProducts:

 SELECT g_0.idCode, (g_0.idCode, g_0.product_name, g_0.brands, g_0.energy_100g) FROM my_nutri_diary.FDBProducts AS g_0 WHERE g_0.idCode IN (<dependent values>)

There is still an improvement to be had there, though.  The rule to push limit values isn't run until after the join type has been determined - we should instead see that Diary is limited to 100 rows and choose a dependent join regardless of the FDBProducts size.
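
As a side note (purely illustrative - the odata layer won't generate this, and the table names are just the ones from your scenario), a hand-written query can request a dependent join explicitly with a MAKEDEP hint, which bypasses the cardinality-based decision:

SELECT g0.idDiaryEntry, g0.AmountInG, g1.product_name FROM my_nutri_diary.Diary AS g0 LEFT OUTER JOIN /*+ MAKEDEP */ my_nutri_diary.FDBProducts AS g1 ON g0.fkIdProductCode = g1.idCode ORDER BY g0.idDiaryEntry LIMIT 100

The same thing can be expressed with OPTION MAKEDEP my_nutri_diary.FDBProducts at the end of the query.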

I'd first rerun your scenario with the cardinality metadata, then we can determine if you need additional changes.

> Improve performance of odata expand operations
> ----------------------------------------------
>
>                 Key: TEIID-5680
>                 URL: https://issues.jboss.org/browse/TEIID-5680
>             Project: Teiid
>          Issue Type: Enhancement
>          Components: OData
>            Reporter: Christoph John
>            Assignee: Steven Hawkins
>            Priority: Major
>         Attachments: test2.txt
>
>
> Hello Ramesh and Steven,
> this is a follow-up regarding an observation from the discussion in TEIID-5643. I thought I would open a separate issue for the topic, as it does not seem to be related to TEIID-5500.
> As you already know, I am using SAPUI5 as the frontend for OData requests. SAPUI5 supports binding a user interface control group (like a list with its list items) to only a single OData path at a time. If the control group items require additional information that is stored in a different table in the database, I have to expand those parameters in the OData query.
> When doing so, I am running into a serious performance issue with Teiid, which would render the approach of using SAPUI5 with Teiid infeasible if we cannot find a way to speed it up. At the moment I have a small table (Diary, with about 20 records) for which the query extracts several items (just a single one in the example given below). The filtered item is then expanded with data from a larger table in the database (FDBProducts, with about 680,000 records). The whole query takes about 15s to be processed. The query is given as:
> https://morpheus.fritz.box/odata4/svc/my_nutri_diary/Diary?$select=AmountInG,idDiaryEntry&$expand=fkDiaryToFDBProducts($select=brands,energy_100g,idCode,product_name)&$filter=AddedDateTime%20ge%202019-03-06T00:00:00%2B01:00%20and%20AddedDateTime%20le%202019-03-07T00:00:00%2B01:00%20and%20MealNumber%20eq%20%270%27&$skip=0&$top=100
> I checked the output when using
>  <logger category="org.teiid.CONNECTOR"><level name="TRACE"/></logger>
> This shows the problem. It seems the join operation is not pushed down to the database; instead the data are joined within Teiid. Teiid therefore downloads the entire dataset of the large FDBProducts table, which makes the expand approach infeasible for real-world datasets of any significant size. So my question is whether you can modify Teiid to push the entire join operation down to the underlying database (I assume this would be the most efficient approach), or, if that is not possible, to query only those items from the joined table that were filtered from the first table?
> Thanks for your help.
>  Christoph



--
This message was sent by Atlassian Jira
(v7.12.1#712002)

