[teiid-issues] [JBoss JIRA] (TEIID-4553) COUNT returns inconsistent results when column statistics are not gathered

Salvatore R (JIRA) issues at jboss.org
Wed Nov 2 10:15:00 EDT 2016


Salvatore R created TEIID-4553:
----------------------------------

             Summary: COUNT returns inconsistent results when column statistics are not gathered
                 Key: TEIID-4553
                 URL: https://issues.jboss.org/browse/TEIID-4553
             Project: Teiid
          Issue Type: Bug
            Reporter: Salvatore R
            Assignee: Steven Hawkins


I am experiencing a weird behavior in 9.1.0-Final. Depending if column statistics are gathered or not, a query with COUNT(1) generates different query plans and returns different results.

For example, I defined two simple tables ds1.test_count_1 and ds2.test_count_2 having just a single column and one row with value 1.

The following query:
{code:sql}
select count(1) from ds1.test_count_1 t1 join ds2.test_count_2 t2 on t1.a=t2.a group by t1.a
{code}

returns 4 with statistics and 2 without.

This is the plan generated when statistics are gathered (the correct one):
{code:sql}
ProjectNode
  + Relational Node ID:1
  + Output Columns:expr1 (integer)
  + Statistics:
    0: Node Output Rows: 1
    1: Node Next Batch Process Time: 0
    2: Node Cumulative Next Batch Process Time: 1
    3: Node Cumulative Process Time: 3
    4: Node Next Batch Calls: 2
    5: Node Blocks: 1
  + Cost Estimates:Estimated Node Cardinality: 1.5
  + Child 0:
    GroupingNode
      + Relational Node ID:2
      + Output Columns:agg0 (integer)
      + Statistics:
        0: Node Output Rows: 1
        1: Node Next Batch Process Time: 0
        2: Node Cumulative Next Batch Process Time: 1
        3: Node Cumulative Process Time: 3
        4: Node Next Batch Calls: 2
        5: Node Blocks: 1
      + Cost Estimates:Estimated Node Cardinality: 1.5
      + Child 0:
        JoinNode
          + Relational Node ID:3
          + Output Columns:a (string)
          + Statistics:
            0: Node Output Rows: 4
            1: Node Next Batch Process Time: 0
            2: Node Cumulative Next Batch Process Time: 1
            3: Node Cumulative Process Time: 3
            4: Node Next Batch Calls: 2
            5: Node Blocks: 1
          + Cost Estimates:Estimated Node Cardinality: 2.8284268
          + Child 0:
            AccessNode
              + Relational Node ID:4
              + Output Columns:a (string)
              + Statistics:
                0: Node Output Rows: 2
                1: Node Next Batch Process Time: 0
                2: Node Cumulative Next Batch Process Time: 0
                3: Node Cumulative Process Time: 2
                4: Node Next Batch Calls: 3
                5: Node Blocks: 2
              + Cost Estimates:Estimated Node Cardinality: 2.0
              + Query:SELECT g_0.a FROM ds1.test_count_1 AS g_0
              + Model Name:ds1
          + Child 1:
            AccessNode
              + Relational Node ID:5
              + Output Columns:a (string)
              + Statistics:
                0: Node Output Rows: 2
                1: Node Next Batch Process Time: 1
                2: Node Cumulative Next Batch Process Time: 1
                3: Node Cumulative Process Time: 3
                4: Node Next Batch Calls: 2
                5: Node Blocks: 1
              + Cost Estimates:Estimated Node Cardinality: 2.0
              + Query:SELECT g_0.a FROM ds2.test_count_2 AS g_0
              + Model Name:ds2
          + Join Strategy:ENHANCED SORT JOIN (SORT/SORT)
          + Join Type:INNER JOIN
          + Join Criteria:t1.a=t2.a
      + Grouping Columns:t1.a
      + Grouping Mapping:
        0: anon_grp0.gcol0=t1.a
        1: anon_grp0.agg0=COUNT(1)
      + Sort Mode:false
  + Select Columns:anon_grp0.agg0 AS expr1
  + Data Bytes Sent:17
  + Planning Time:3
{code}

and this is the one without statistics:
{code:sql}
ProjectNode
  + Relational Node ID:2
  + Output Columns:expr1 (integer)
  + Statistics:
    0: Node Output Rows: 1
    1: Node Next Batch Process Time: 0
    2: Node Cumulative Next Batch Process Time: 1
    3: Node Cumulative Process Time: 3
    4: Node Next Batch Calls: 2
    5: Node Blocks: 1
  + Cost Estimates:Estimated Node Cardinality: -1.0
  + Child 0:
    GroupingNode
      + Relational Node ID:3
      + Output Columns:agg0 (integer)
      + Statistics:
        0: Node Output Rows: 1
        1: Node Next Batch Process Time: 0
        2: Node Cumulative Next Batch Process Time: 1
        3: Node Cumulative Process Time: 3
        4: Node Next Batch Calls: 2
        5: Node Blocks: 1
      + Cost Estimates:Estimated Node Cardinality: -1.0
      + Child 0:
        JoinNode
          + Relational Node ID:4
          + Output Columns:gcol0 (string)
          + Statistics:
            0: Node Output Rows: 2
            1: Node Next Batch Process Time: 0
            2: Node Cumulative Next Batch Process Time: 1
            3: Node Cumulative Process Time: 3
            4: Node Next Batch Calls: 2
            5: Node Blocks: 1
          + Cost Estimates:Estimated Node Cardinality: -1.0
          + Child 0:
            AccessNode
              + Relational Node ID:5
              + Output Columns:gcol0 (string)
              + Statistics:
                0: Node Output Rows: 1
                1: Node Next Batch Process Time: 1
                2: Node Cumulative Next Batch Process Time: 1
                3: Node Cumulative Process Time: 3
                4: Node Next Batch Calls: 3
                5: Node Blocks: 2
              + Cost Estimates:Estimated Node Cardinality: -1.0
              + Query:SELECT g_0.a FROM ds1.test_count_1 AS g_0 GROUP BY g_0.a
              + Model Name:ds1
          + Child 1:
            AccessNode
              + Relational Node ID:6
              + Output Columns:a (string)
              + Statistics:
                0: Node Output Rows: 2
                1: Node Next Batch Process Time: 0
                2: Node Cumulative Next Batch Process Time: 0
                3: Node Cumulative Process Time: 3
                4: Node Next Batch Calls: 2
                5: Node Blocks: 1
              + Cost Estimates:Estimated Node Cardinality: -1.0
              + Query:SELECT g_0.a FROM ds2.test_count_2 AS g_0
              + Model Name:ds2
          + Join Strategy:ENHANCED SORT JOIN (SORT/SORT)
          + Join Type:INNER JOIN
          + Join Criteria:anon_grp1.gcol0=t2.a
      + Grouping Columns:anon_grp1.gcol0
      + Grouping Mapping:
        0: anon_grp0.gcol0=anon_grp1.gcol0
        1: anon_grp0.agg0=COUNT(1)
      + Sort Mode:false
  + Select Columns:anon_grp0.agg0 AS expr1
  + Data Bytes Sent:17
  + Planning Time:4
{code}





--
This message was sent by Atlassian JIRA
(v7.2.3#72005)


More information about the teiid-issues mailing list