[teiid-issues] [JBoss JIRA] (TEIID-5783) OUTER JOIN Yields Wrong Results When Used With GROUP BY, CASE and COUNT DISTINCT

Dmitrii Pogorelov (Jira) issues at jboss.org
Mon Jun 24 05:37:00 EDT 2019


     [ https://issues.jboss.org/browse/TEIID-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitrii Pogorelov updated TEIID-5783:
-------------------------------------
    Steps to Reproduce: 
1. Add file resource adapter in standalone-teiid.xml:
{code:xml}
                <resource-adapter id="file">
                    <module slot="main" id="org.jboss.teiid.resource-adapter.file"/>
                    <connection-definitions>
                        <connection-definition class-name="org.teiid.resource.adapter.file.FileManagedConnectionFactory" jndi-name="java:/files" enabled="true" pool-name="files">
                            <config-property name="AllowParentPaths">
                                true
                            </config-property>
                            <config-property name="ParentDirectory">
                                C:/testdata
                            </config-property>
                        </connection-definition>
                    </connection-definitions>
                </resource-adapter>
{code}

2. Add MySQL database configuration in standalone-teiid.xml:
{code:xml}
                <datasource jndi-name="java:/test_dwh_my" pool-name="test_dwh_my" enabled="true" use-java-context="true">
                    <connection-url>jdbc:mysql://localhost:3306/dwh?zeroDateTimeBehavior=convertToNull</connection-url>
                    <driver>mysql</driver>
                    <new-connection-sql>set SESSION sql_mode = 'ANSI'</new-connection-sql>
                    <pool>
                        <min-pool-size>2</min-pool-size>
                        <max-pool-size>70</max-pool-size>
                    </pool>
                    <security>
                        <user-name>root</user-name>
                        <password>XXXXXX</password>
                    </security>
                    <validation>
                        <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLValidConnectionChecker"/>
                        <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLExceptionSorter"/>
                    </validation>
                    <timeout>
                        <blocking-timeout-millis>120000</blocking-timeout-millis>
                        <idle-timeout-minutes>5</idle-timeout-minutes>
                    </timeout>
                </datasource>
{code}

3. Add in test-vdb.xml java:/test_dwh_my configured in previous step as datasource:
{code:xml}
    <model name="dwh_my">
        <property name="importer.useFullSchemaName" value="false"/>
        <property name="importer.tableTypes" value="TABLE,VIEW"/>
        <property name="importer.importKeys" value="false"/>
        <source name="test_dwh_my" translator-name="mylobs" connection-jndi-name="java:/test_dwh_my"/>
    </model>

    <translator name="mylobs" type="mysql5">
        <property name="CopyLobs" value="true" />
        <property name="SupportsNativeQueries" value="true"/>
    </translator>
{code}

4. Configure in the test-vdb.xml the following virtual view:
{code:xml}
    <model visible = "true" type = "VIRTUAL" name = "views">
        <metadata type = "DDL"><![CDATA[
          create view vLeft as
SELECT
"csv_table"."created_at",
"csv_table"."order_id",
"csv_table"."store_id" 
FROM
(call files.getFiles('test1.csv')) f,
	TEXTTABLE(to_chars(f.file,'UTF-8') 
		COLUMNS 
		"created_at" STRING ,
		"order_id" STRING ,
		"store_id" STRING 
		DELIMITER ';' 
		QUOTE '"' 
		HEADER 1 
	)
"csv_table"
        ]]>
        </metadata>
    </model>
{code}

5. Load the data from the other file (test2.csv - attached to this issue) into a MySQL table or a MS SQL table (vRight table name in my case, all fields have varchar types).

6. Run the statement below (rebuy_check column contains no NULLs, that's correct):
{code:sql}
select 
count( a."order_id") anzahl_orders, 
(case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) rebuy_check
FROM views.vLeft a
left JOIN dwh_my.vRight b on b.order_id=a.order_id
group by
(case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) ;;
{code}

7. Add a DISTINCT to the COUNT (rebuy_check column now contains NULLs, that's incorrect):
{code:sql}
select 
count(distinct a."order_id") anzahl_orders, 
(case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) rebuy_check
FROM views.vLeft a
left JOIN dwh_my.vRight b on b.order_id=a.order_id
group by
(case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) ;;
{code}

8. Remove the COUNT entirely (rebuy_check column now contains NULLs, that's incorrect):
{code:sql}
select 
(case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) rebuy_check
FROM views.vLeft a
left JOIN dwh_my.vRight b on b.order_id=a.order_id
group by
(case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) ;;
{code}

  was:
1. Add file resource adapter in standalone-teiid.xml:
{code:xml}
                <resource-adapter id="file">
                    <module slot="main" id="org.jboss.teiid.resource-adapter.file"/>
                    <connection-definitions>
                        <connection-definition class-name="org.teiid.resource.adapter.file.FileManagedConnectionFactory" jndi-name="java:/files" enabled="true" pool-name="files">
                            <config-property name="AllowParentPaths">
                                true
                            </config-property>
                            <config-property name="ParentDirectory">
                                C:/testdata
                            </config-property>
                        </connection-definition>
                    </connection-definitions>
                </resource-adapter>
{code}

2. Add MySQL database configuration in standalone-teiid.xml:
{code:xml}
                <datasource jndi-name="java:/test_dwh_my" pool-name="test_dwh_my" enabled="true" use-java-context="true">
                    <connection-url>jdbc:mysql://localhost:3306/dwh?zeroDateTimeBehavior=convertToNull</connection-url>
                    <driver>mysql</driver>
                    <new-connection-sql>set SESSION sql_mode = 'ANSI'</new-connection-sql>
                    <pool>
                        <min-pool-size>2</min-pool-size>
                        <max-pool-size>70</max-pool-size>
                    </pool>
                    <security>
                        <user-name>root</user-name>
                        <password>XXXXXX</password>
                    </security>
                    <validation>
                        <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLValidConnectionChecker"/>
                        <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLExceptionSorter"/>
                    </validation>
                    <timeout>
                        <blocking-timeout-millis>120000</blocking-timeout-millis>
                        <idle-timeout-minutes>5</idle-timeout-minutes>
                    </timeout>
                </datasource>
                <datasource jndi-name="java:/test" pool-name="test" enabled="true" use-java-context="true">
                    <connection-url>jdbc:mysql://localhost:3306/test_tables?zeroDateTimeBehavior=convertToNull</connection-url>
                    <driver>mysql</driver>
                    <new-connection-sql>set SESSION sql_mode = 'ANSI'</new-connection-sql>
                    <pool>
                        <min-pool-size>2</min-pool-size>
                        <max-pool-size>70</max-pool-size>
                    </pool>
                    <security>
                        <user-name>root</user-name>
                        <password>XXXXXX</password>
                    </security>
                    <validation>
                        <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLValidConnectionChecker"/>
                        <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLExceptionSorter"/>
                    </validation>
                    <timeout>
                        <blocking-timeout-millis>120000</blocking-timeout-millis>
                        <idle-timeout-minutes>5</idle-timeout-minutes>
                    </timeout>
                </datasource>
{code}

3. Add in test-vdb.xml java:/test_dwh_my configured in previous step as datasource:
{code:xml}
    <model name="dwh_my">
        <property name="importer.useFullSchemaName" value="false"/>
        <property name="importer.tableTypes" value="TABLE,VIEW"/>
        <property name="importer.importKeys" value="false"/>
        <source name="test_dwh_my" translator-name="mylobs" connection-jndi-name="java:/test_dwh_my"/>
    </model>

    <translator name="mylobs" type="mysql5">
        <property name="CopyLobs" value="true" />
        <property name="SupportsNativeQueries" value="true"/>
    </translator>
{code}

4. Configure in the test-vdb.xml the following virtual view:
{code:xml}
    <model visible = "true" type = "VIRTUAL" name = "views">
        <metadata type = "DDL"><![CDATA[
          create view vLeft as
SELECT
"csv_table"."created_at",
"csv_table"."order_id",
"csv_table"."store_id" 
FROM
(call files.getFiles('test1.csv')) f,
	TEXTTABLE(to_chars(f.file,'UTF-8') 
		COLUMNS 
		"created_at" STRING ,
		"order_id" STRING ,
		"store_id" STRING 
		DELIMITER ';' 
		QUOTE '"' 
		HEADER 1 
	)
"csv_table"
        ]]>
        </metadata>
    </model>
{code}

5. Load the data from the other file (test2.csv - attached to this issue) into a MySQL table or a MS SQL table (vRight table name in my case, all fields have varchar types).

6. Run the statement below (rebuy_check column contains no NULLs, that's correct):
{code:sql}
select 
count( a."order_id") anzahl_orders, 
(case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) rebuy_check
FROM views.vLeft a
left JOIN dwh_my.vRight b on b.order_id=a.order_id
group by
(case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) ;;
{code}

7. Add a DISTINCT to the COUNT (rebuy_check column now contains NULLs, that's incorrect):
{code:sql}
select 
count(distinct a."order_id") anzahl_orders, 
(case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) rebuy_check
FROM views.vLeft a
left JOIN dwh_my.vRight b on b.order_id=a.order_id
group by
(case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) ;;
{code}

8. Remove the COUNT entirely (rebuy_check column now contains NULLs, that's incorrect):
{code:sql}
select 
(case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) rebuy_check
FROM views.vLeft a
left JOIN dwh_my.vRight b on b.order_id=a.order_id
group by
(case when b."order_id" is not null then 'Direct Rebuy' else 'Standard' end) ;;
{code}



> OUTER JOIN Yields Wrong Results When Used With GROUP BY, CASE and COUNT DISTINCT
> --------------------------------------------------------------------------------
>
>                 Key: TEIID-5783
>                 URL: https://issues.jboss.org/browse/TEIID-5783
>             Project: Teiid
>          Issue Type: Bug
>          Components: Query Engine
>    Affects Versions: 12.0
>         Environment: teiid-12.0.0 on WildFly Full 14.0.1.Final (WildFly Core 6.0.2.Final)
>            Reporter: Dmitrii Pogorelov
>            Assignee: Steven Hawkins
>            Priority: Blocker
>         Attachments: test1.csv, test2.csv
>
>
> Some queries that involve an OUTER JOIN between objects from different data sources, a GROUP BY with a CASE statement and a COUNT DISTINCT yield inconsistent results on a SELECT expression which is identical to the GROUP BY expression with CASE.
> * Reproduced in combination of <File> LEFT JOIN <MySQL> and <File> LEFT JOIN <MS SQL> and the same for the other OUTER JOIN variants.
> * screenshot for the different queries are attached
> * sample data for reproduction is attached
> The same problem happens with LEFT OUTER JOIN, RIGHT OUTER JOIN and FULL OUTER JOIN.



--
This message was sent by Atlassian Jira
(v7.12.1#712002)


More information about the teiid-issues mailing list