[teiid-issues] [JBoss JIRA] (TEIID-5680) Improve performance of odata expand operations

Mon Mar 11 14:02:00 EDT 2019

    [ https://issues.jboss.org/browse/TEIID-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706781#comment-13706781 ] 

Steven Hawkins commented on TEIID-5680:
---------------------------------------

With 11.2.x the rewritten OData query is also expressed as a join:

/*+ cache(ttl:300000 scope:USER) */ SELECT g0.idDiaryEntry, g0.AmountInG, X__1.expr1 AS expr3 FROM my_nutri_diary.Diary AS g0 LEFT OUTER JOIN (SELECT ARRAY_AGG((g1.idCode, g1.product_name, g1.brands, g1.energy_100g) ORDER BY g1.idCode) AS expr1, g1.idCode FROM my_nutri_diary.FDBProducts AS g1 GROUP BY g1.idCode) AS X__1 ON g0.fkIdProductCode = X__1.idCode WHERE (g0.AddedDateTime >= ?) AND (g0.AddedDateTime <= ?) AND (g0.MealNumber = ?) ORDER BY g0.idDiaryEntry LIMIT 100

With the cardinality hints in place, it does produce an acceptable plan.

>  I mean does Teiid behave differently with different orders of magnitude for the cardinality or is there just something like small or large table depending on a given threshold? 

There is some behavioral difference for "small" sizes, meaning less than a single batch, typically 256 rows.  Beyond that, approximate relative sizes are all that is needed.  Costing routines above the source node level are not based upon full column-level histograms, but you can refine things further by setting the column-level DISTINCT_VALUES, NULL_VALUE_COUNT, MAX_VALUE, and MIN_VALUE values as well.  Generally, setting just the table cardinality is sufficient to correctly influence join planning.
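
As a rough sketch of where those values go when the metadata is specified as DDL: the table and column names below come from the query above and the row counts (about 20 and about 680,000) from the issue description, while the column types and the column-level statistics on idCode are placeholders assumed purely for illustration.

```sql
-- Sketch only: column types and the idCode statistics are assumed,
-- not taken from the real schema.
CREATE FOREIGN TABLE FDBProducts (
    idCode string OPTIONS (DISTINCT_VALUES 680000, NULL_VALUE_COUNT 0),
    product_name string,
    brands string,
    energy_100g double
) OPTIONS (CARDINALITY 680000);

CREATE FOREIGN TABLE Diary (
    idDiaryEntry long,
    AmountInG double,
    fkIdProductCode string,
    AddedDateTime timestamp,
    MealNumber string
) OPTIONS (CARDINALITY 20);
```

Only the two CARDINALITY options should matter for the join planning discussed here; the column-level options are optional refinements.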

>  Is this something I need to update in the lifecycle when tables grow larger or do I just make an educated guess about what the future will look like?

If the VDB imports the metadata itself, then it will pick up fresh estimates at import time - which can be triggered either by not caching the source metadata or by deleting the metadata and reloading the VDB.

If the VDB specifies the metadata, there is not yet a built-in facility that will attempt to update its costing statistics at runtime.  There are several facilities you can use for that, including a custom metadata repository and the alter statement, which can be run on an ephemeral basis without a metadata repository to set the cardinality of a table.  In our OpenShift environment we will likely implement runtime update of costing metadata from source and from query results, as we'll have a well-defined persistent store handy.

So if you fully specify the metadata as DDL, then it may need to be updated if the relative sizes are no longer representative.
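
For a table defined in DDL, refreshing the estimate could look roughly like the following.  This is a sketch that assumes the DDL form of the alter statement and a CARDINALITY option that is already present on the table (use ADD rather than SET if it is not); the new value is made up for illustration.

```sql
-- Sketch only: 1000000 is a hypothetical new row-count estimate.
ALTER FOREIGN TABLE "FDBProducts" OPTIONS (SET CARDINALITY 1000000);
```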

> Improve performance of odata expand operations
> ----------------------------------------------
>
>                 Key: TEIID-5680
>                 URL: https://issues.jboss.org/browse/TEIID-5680
>             Project: Teiid
>          Issue Type: Enhancement
>          Components: OData
>            Reporter: Christoph John
>            Assignee: Steven Hawkins
>            Priority: Major
>         Attachments: test2.txt
>
>
> Hello Ramesh and Steven,
> this is a follow-up to an observation in the discussion of TEIID-5643. I thought I would open a separate issue for the topic, as it does not seem to be related to TEIID-5500. 
> As you already know, I am using SAPUI5 as the frontend for OData requests. SAPUI5 supports binding a user interface control group (like a list with its list items) to only a single OData path at a time. If the control group items require additional information that is stored in a different table in the database, I have to expand those parameters in the OData query.
> When doing so, I am running into a serious performance issue with Teiid, which would render the approach of using SAPUI5 with Teiid infeasible if we cannot find a way to speed things up. At the moment I have a small table (Diary, with about 20 records) from which the query extracts several items (just a single one in the example given below). The filtered item is then expanded with data from a larger table in the database (FDBProducts, with about 680,000 records). The whole query takes about 15s to process. The query is:
> https://morpheus.fritz.box/odata4/svc/my_nutri_diary/Diary?$select=AmountInG,idDiaryEntry&$expand=fkDiaryToFDBProducts($select=brands,energy_100g,idCode,product_name)&$filter=AddedDateTime%20ge%202019-03-06T00:00:00%2B01:00%20and%20AddedDateTime%20le%202019-03-07T00:00:00%2B01:00%20and%20MealNumber%20eq%20%270%27&$skip=0&$top=100
> I checked the output when using
>  <logger category="org.teiid.CONNECTOR"><level name="TRACE"/></logger>
> This shows the problem. It seems the join operation is not pushed down to the database; instead, the data are joined within Teiid. Teiid therefore downloads the entire dataset of the large FDBProducts table, which makes the expand approach infeasible for real-world datasets of a certain size. So my question is whether you can modify Teiid to push down the entire join operation to the underlying database (I assume this would be the most efficient approach) or, if that is not possible, to query only those items from the joined table that match the rows filtered from the first table.
> Thanks for your help.
>  Christoph
