[JBoss JIRA] (TEIID-3219) Excel Translator Wrong Result From Columns With Interleaved Null Values
by Kylin Soong (JIRA)
[ https://issues.jboss.org/browse/TEIID-3219?page=com.atlassian.jira.plugin... ]
Kylin Soong edited comment on TEIID-3219 at 11/20/14 4:45 AM:
--------------------------------------------------------------
I have done some investigation, [1] are the sample sheets I used for testing. The poi parse the excel sheet will trim all null rows both bottom and top, only keep non-null rows as sheet data, if .xlsx file contains any of the formatting for the blank cells, the poi reading is not treating it as null, like test.xlsx in [1].
> take any document, just scroll empty doc and save and test.
I have tested both in windows with Excel 2007 and linux with OpenOffice, only if you scroll cell has style then it may cause empty iterations, like test.xlsx in [1], I have add style left-justified horizontal alignment, the poi reading isn't thinking it not null, all 20 rows be loaded, the cell with null value but have style as below:
~~~
<main:sheetData>
<main:row r="1" spans="1:4">
<main:c r="A1" s="1"/>
<main:c r="B1" s="1"/>
<main:c r="C1" s="1"/>
<main:c r="D1" s="1"/>
</main:row>
~~~
If you scroll a fresh empty sheet, you can't see that behavior.
> the tricky thing in this test document is, there are many cells, but table is picking up that single cell as column
Assuming a sheet have 10 rows, each row have multiple cells, but all cells in 5th row are null, in the condition, the current excel tranlator only retun the first 4 rows, start from 5th row be discarded, actually we expected result is all 10 rows be returned and all colmuns in row 5 are null, this is what we should resolve in this issue.
So far, I think current fix is fine, no need submitting another patch for regression concern. The current tranlator use poi usermodel api, all rows be loaded to heap then tranlator interator all rows, from tranlator side, the last row should be sheet.getLastRowNum(), no need to determine which is the last.
[1] https://github.com/kylinsoong/data/tree/master/poi-examples-teiid/file
was (Author: kylin):
I have done some investigation, [1] are the sample sheets I used for testing. The poi parse the excel sheet will trim all null rows both bottom and top, only keep non-null rows as sheet data, if .xlsx file contains any of the formatting for the blank cells, the poi reading is not treating it as null, like test.xlsx in [1].
> take any document, just scroll empty doc and save and test.
I have tested both in windows with Excel 2007 and linux with OpenOffice, only if you scroll cell has style then it may cause empty iterations, like test.xlsx in [1], I have add style left-justified horizontal alignment, the poi reading isn't thinking it not null, all 20 rows be loaded, the cell with null value but have style as below:
~~~
<main:sheetData>
<main:row r="1" spans="1:4">
<main:c r="A1" s="1"/>
<main:c r="B1" s="1"/>
<main:c r="C1" s="1"/>
<main:c r="D1" s="1"/>
</main:row>
~~~
If you scroll a fresh empty sheet, you can't see that behavior.
> I have done some investigation, [1] are the sample sheets I used for testing. The poi parse the excel sheet will trim all null rows both bottom and top, only keep non-null rows as sheet data, if .xlsx file contains any of the formatting for the blank cells, the poi reading is not treating it as null, like test.xlsx in [1].
> take any document, just scroll empty doc and save and test.
I have tested both in windows with Excel 2007 and linux with OpenOffice, only if you scroll cell has style then it may cause empty iterations, like test.xlsx in [1], I have add style left-justified horizontal alignment, the poi reading isn't thinking it not null, all 20 rows be loaded, the cell with null value but have style as below:
~~~
<main:sheetData>
<main:row r="1" spans="1:4">
<main:c r="A1" s="1"/>
<main:c r="B1" s="1"/>
<main:c r="C1" s="1"/>
<main:c r="D1" s="1"/>
</main:row>
~~~
If you scroll a fresh empty sheet, you can't see that behavior.
> the tricky thing in this test document is, there are many cells, but table is picking up that single cell as column
Assuming a sheet have 10 rows, each row have multiple cells, but all cells in 5th row are null, in the condition, the current excel tranlator only retun the first 4 rows, start from 5th row be discarded, actually we expected result is all 10 rows be returned and all colmuns in row 5 are null, this is what we should resolve in this issue.
[1] https://github.com/kylinsoong/data/tree/master/poi-examples-teiid/file
> Excel Translator Wrong Result From Columns With Interleaved Null Values
> -----------------------------------------------------------------------
>
> Key: TEIID-3219
> URL: https://issues.jboss.org/browse/TEIID-3219
> Project: Teiid
> Issue Type: Bug
> Components: Misc. Connectors
> Affects Versions: 8.7.1
> Reporter: Jan Stastny
> Assignee: Kylin Soong
> Fix For: 8.7.1, 8.10
>
> Attachments: artifacts.zip
>
>
> When querying a column which has a null value somewhere in it, the result set is missing all the rows under the null value in the xlsx file. When querying more columns, the result set is missing some values only if there is a row, where both columns contains null value.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
10 years, 1 month
[JBoss JIRA] (TEIID-3219) Excel Translator Wrong Result From Columns With Interleaved Null Values
by Kylin Soong (JIRA)
[ https://issues.jboss.org/browse/TEIID-3219?page=com.atlassian.jira.plugin... ]
Kylin Soong edited comment on TEIID-3219 at 11/20/14 4:23 AM:
--------------------------------------------------------------
I have done some investigation, [1] are the sample sheets I used for testing. The poi parse the excel sheet will trim all null rows both bottom and top, only keep non-null rows as sheet data, if .xlsx file contains any of the formatting for the blank cells, the poi reading is not treating it as null, like test.xlsx in [1].
> take any document, just scroll empty doc and save and test.
I have tested both in windows with Excel 2007 and linux with OpenOffice, only if you scroll cell has style then it may cause empty iterations, like test.xlsx in [1], I have add style left-justified horizontal alignment, the poi reading isn't thinking it not null, all 20 rows be loaded, the cell with null value but have style as below:
~~~
<main:sheetData>
<main:row r="1" spans="1:4">
<main:c r="A1" s="1"/>
<main:c r="B1" s="1"/>
<main:c r="C1" s="1"/>
<main:c r="D1" s="1"/>
</main:row>
~~~
If you scroll a fresh empty sheet, you can't see that behavior.
> I have done some investigation, [1] are the sample sheets I used for testing. The poi parse the excel sheet will trim all null rows both bottom and top, only keep non-null rows as sheet data, if .xlsx file contains any of the formatting for the blank cells, the poi reading is not treating it as null, like test.xlsx in [1].
> take any document, just scroll empty doc and save and test.
I have tested both in windows with Excel 2007 and linux with OpenOffice, only if you scroll cell has style then it may cause empty iterations, like test.xlsx in [1], I have add style left-justified horizontal alignment, the poi reading isn't thinking it not null, all 20 rows be loaded, the cell with null value but have style as below:
~~~
<main:sheetData>
<main:row r="1" spans="1:4">
<main:c r="A1" s="1"/>
<main:c r="B1" s="1"/>
<main:c r="C1" s="1"/>
<main:c r="D1" s="1"/>
</main:row>
~~~
If you scroll a fresh empty sheet, you can't see that behavior.
> the tricky thing in this test document is, there are many cells, but table is picking up that single cell as column
Assuming a sheet have 10 rows, each row have multiple cells, but all cells in 5th row are null, in the condition, the current excel tranlator only retun the first 4 rows, start from 5th row be discarded, actually we expected result is all 10 rows be returned and all colmuns in row 5 are null, this is what we should resolve in this issue.
[1] https://github.com/kylinsoong/data/tree/master/poi-examples-teiid/file
was (Author: kylin):
I have done some investigation, [1] are the sample sheets I used for testing. The poi parse the excel sheet will trim all null rows both bottom and top, only keep non-null rows as sheet data, if .xlsx file contains any of the formatting for the blank cells, the poi reading is not treating it as null, like test.xlsx in [1].
> take any document, just scroll empty doc and save and test.
I have tested both in windows with Excel 2007 and linux with OpenOffice, only if you scroll cell has style then it may cause empty iterations, like test.xlsx in [1], I have add style left-justified horizontal alignment, the poi reading isn't thinking it not null, all 20 rows be loaded, the cell with null value but have style as below:
~~~
<main:sheetData>
<main:row r="1" spans="1:4">
<main:c r="A1" s="1"/>
<main:c r="B1" s="1"/>
<main:c r="C1" s="1"/>
<main:c r="D1" s="1"/>
</main:row>
~~~
If you scroll a fresh empty sheet, you can't see that behavior.
[1] https://github.com/kylinsoong/data/tree/master/poi-examples-teiid/file
> Excel Translator Wrong Result From Columns With Interleaved Null Values
> -----------------------------------------------------------------------
>
> Key: TEIID-3219
> URL: https://issues.jboss.org/browse/TEIID-3219
> Project: Teiid
> Issue Type: Bug
> Components: Misc. Connectors
> Affects Versions: 8.7.1
> Reporter: Jan Stastny
> Assignee: Kylin Soong
> Fix For: 8.7.1, 8.10
>
> Attachments: artifacts.zip
>
>
> When querying a column which has a null value somewhere in it, the result set is missing all the rows under the null value in the xlsx file. When querying more columns, the result set is missing some values only if there is a row, where both columns contains null value.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
10 years, 1 month
[JBoss JIRA] (TEIID-3219) Excel Translator Wrong Result From Columns With Interleaved Null Values
by Kylin Soong (JIRA)
[ https://issues.jboss.org/browse/TEIID-3219?page=com.atlassian.jira.plugin... ]
Kylin Soong commented on TEIID-3219:
------------------------------------
I have done some investigation, [1] are the sample sheets I used for testing. The poi parse the excel sheet will trim all null rows both bottom and top, only keep non-null rows as sheet data, if .xlsx file contains any of the formatting for the blank cells, the poi reading is not treating it as null, like test.xlsx in [1].
> take any document, just scroll empty doc and save and test.
I have tested both in windows with Excel 2007 and linux with OpenOffice, only if you scroll cell has style then it may cause empty iterations, like test.xlsx in [1], I have add style left-justified horizontal alignment, the poi reading isn't thinking it not null, all 20 rows be loaded, the cell with null value but have style as below:
~~~
<main:sheetData>
<main:row r="1" spans="1:4">
<main:c r="A1" s="1"/>
<main:c r="B1" s="1"/>
<main:c r="C1" s="1"/>
<main:c r="D1" s="1"/>
</main:row>
~~~
If you scroll a fresh empty sheet, you can't see that behavior.
[1] https://github.com/kylinsoong/data/tree/master/poi-examples-teiid/file
> Excel Translator Wrong Result From Columns With Interleaved Null Values
> -----------------------------------------------------------------------
>
> Key: TEIID-3219
> URL: https://issues.jboss.org/browse/TEIID-3219
> Project: Teiid
> Issue Type: Bug
> Components: Misc. Connectors
> Affects Versions: 8.7.1
> Reporter: Jan Stastny
> Assignee: Kylin Soong
> Fix For: 8.7.1, 8.10
>
> Attachments: artifacts.zip
>
>
> When querying a column which has a null value somewhere in it, the result set is missing all the rows under the null value in the xlsx file. When querying more columns, the result set is missing some values only if there is a row, where both columns contains null value.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
10 years, 1 month
[JBoss JIRA] (TEIID-3219) Excel Translator Wrong Result From Columns With Interleaved Null Values
by Ramesh Reddy (JIRA)
[ https://issues.jboss.org/browse/TEIID-3219?page=com.atlassian.jira.plugin... ]
Ramesh Reddy commented on TEIID-3219:
-------------------------------------
I believe any of the test documents we have should show the behavior. Otherwise they may have had some fixes in 3.10 version in regards to that.
> I thought by look ahead you meant more rows, then we are saying the same thing.
the tricky thing in this test document is, there are many cells, but table is picking up that single cell as column. So, for this case we have information only about that cell, so we can iterate only on that.
> Excel Translator Wrong Result From Columns With Interleaved Null Values
> -----------------------------------------------------------------------
>
> Key: TEIID-3219
> URL: https://issues.jboss.org/browse/TEIID-3219
> Project: Teiid
> Issue Type: Bug
> Components: Misc. Connectors
> Affects Versions: 8.7.1
> Reporter: Jan Stastny
> Assignee: Kylin Soong
> Fix For: 8.7.1, 8.10
>
> Attachments: artifacts.zip
>
>
> When querying a column which has a null value somewhere in it, the result set is missing all the rows under the null value in the xlsx file. When querying more columns, the result set is missing some values only if there is a row, where both columns contains null value.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
10 years, 1 month
[JBoss JIRA] (TEIID-3219) Excel Translator Wrong Result From Columns With Interleaved Null Values
by Steven Hawkins (JIRA)
[ https://issues.jboss.org/browse/TEIID-3219?page=com.atlassian.jira.plugin... ]
Steven Hawkins commented on TEIID-3219:
---------------------------------------
> take any document, just scroll empty doc and save and test.
I don't see that behavior with OpenOffice. Did you see that with Excel?
> I did mean entire row, not just one column. In this usecase he had only single column, that constitutes as entire row.
Sorry I thought by look ahead you meant more rows, then we are saying the same thing.
> Excel Translator Wrong Result From Columns With Interleaved Null Values
> -----------------------------------------------------------------------
>
> Key: TEIID-3219
> URL: https://issues.jboss.org/browse/TEIID-3219
> Project: Teiid
> Issue Type: Bug
> Components: Misc. Connectors
> Affects Versions: 8.7.1
> Reporter: Jan Stastny
> Assignee: Kylin Soong
> Fix For: 8.7.1, 8.10
>
> Attachments: artifacts.zip
>
>
> When querying a column which has a null value somewhere in it, the result set is missing all the rows under the null value in the xlsx file. When querying more columns, the result set is missing some values only if there is a row, where both columns contains null value.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
10 years, 1 month
[JBoss JIRA] (TEIID-3219) Excel Translator Wrong Result From Columns With Interleaved Null Values
by Ramesh Reddy (JIRA)
[ https://issues.jboss.org/browse/TEIID-3219?page=com.atlassian.jira.plugin... ]
Ramesh Reddy commented on TEIID-3219:
-------------------------------------
>Do you have an example of such a document?
take any document, just scroll empty doc and sane and test.
>That may not be appropriate either.
I did mean entire row, not just one column. In this usecase he had only single column, that constitutes as entire row.
> Excel Translator Wrong Result From Columns With Interleaved Null Values
> -----------------------------------------------------------------------
>
> Key: TEIID-3219
> URL: https://issues.jboss.org/browse/TEIID-3219
> Project: Teiid
> Issue Type: Bug
> Components: Misc. Connectors
> Affects Versions: 8.7.1
> Reporter: Jan Stastny
> Assignee: Kylin Soong
> Fix For: 8.7.1, 8.10
>
> Attachments: artifacts.zip
>
>
> When querying a column which has a null value somewhere in it, the result set is missing all the rows under the null value in the xlsx file. When querying more columns, the result set is missing some values only if there is a row, where both columns contains null value.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
10 years, 1 month
[JBoss JIRA] (TEIID-3219) Excel Translator Wrong Result From Columns With Interleaved Null Values
by Ramesh Reddy (JIRA)
[ https://issues.jboss.org/browse/TEIID-3219?page=com.atlassian.jira.plugin... ]
Ramesh Reddy edited comment on TEIID-3219 at 11/19/14 7:40 PM:
---------------------------------------------------------------
>Do you have an example of such a document?
take any document, just scroll empty doc and save and test.
>That may not be appropriate either.
I did mean entire row, not just one column. In this usecase he had only single column, that constitutes as entire row.
was (Author: rareddy):
>Do you have an example of such a document?
take any document, just scroll empty doc and sane and test.
>That may not be appropriate either.
I did mean entire row, not just one column. In this usecase he had only single column, that constitutes as entire row.
> Excel Translator Wrong Result From Columns With Interleaved Null Values
> -----------------------------------------------------------------------
>
> Key: TEIID-3219
> URL: https://issues.jboss.org/browse/TEIID-3219
> Project: Teiid
> Issue Type: Bug
> Components: Misc. Connectors
> Affects Versions: 8.7.1
> Reporter: Jan Stastny
> Assignee: Kylin Soong
> Fix For: 8.7.1, 8.10
>
> Attachments: artifacts.zip
>
>
> When querying a column which has a null value somewhere in it, the result set is missing all the rows under the null value in the xlsx file. When querying more columns, the result set is missing some values only if there is a row, where both columns contains null value.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
10 years, 1 month
[JBoss JIRA] (TEIID-3219) Excel Translator Wrong Result From Columns With Interleaved Null Values
by Steven Hawkins (JIRA)
[ https://issues.jboss.org/browse/TEIID-3219?page=com.atlassian.jira.plugin... ]
Steven Hawkins commented on TEIID-3219:
---------------------------------------
> When I tested, I easily went into millions of iterations with out a valid row.
Do you have an example of such a document? From my reading of the api the rows are stored in treemaps and should represent the intended rows of the spreadsheet.
> I suggested, look ahead on the row cursor for follow on column values to check if they are null or not to define a value is null vs it is end of file. I asked him configure the look-ahead number to be configurable.
That may not be appropriate either. If this kind of logic is needed, then it seems like we could be checking the entire row rather than just what is projected to determine if it's terminating (and based upon the above it may be optional behavior, such as "terminate on all blank").
> Excel Translator Wrong Result From Columns With Interleaved Null Values
> -----------------------------------------------------------------------
>
> Key: TEIID-3219
> URL: https://issues.jboss.org/browse/TEIID-3219
> Project: Teiid
> Issue Type: Bug
> Components: Misc. Connectors
> Affects Versions: 8.7.1
> Reporter: Jan Stastny
> Assignee: Kylin Soong
> Fix For: 8.7.1, 8.10
>
> Attachments: artifacts.zip
>
>
> When querying a column which has a null value somewhere in it, the result set is missing all the rows under the null value in the xlsx file. When querying more columns, the result set is missing some values only if there is a row, where both columns contains null value.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
10 years, 1 month
[JBoss JIRA] (TEIID-3219) Excel Translator Wrong Result From Columns With Interleaved Null Values
by Ramesh Reddy (JIRA)
[ https://issues.jboss.org/browse/TEIID-3219?page=com.atlassian.jira.plugin... ]
Ramesh Reddy commented on TEIID-3219:
-------------------------------------
It will allow all empty lines to be treated as valid lines with all null column values, that means for example a excel sheet can have 10 valid rows values and can have any number of rows that are nothing but nulls after that. The translator can keep looping forever as if it is reading valid values. When I tested, I easily went into millions of iterations with out a valid row. The previous code treated the very first line with a all column rows as null as the end of the sheet. That worked, but the current scenario is there are multiple columns in excel sheet with arbitrary column values as nulls, the table defined on the sheet is only extracting the single column back. Whenever it is seeing the null on the column value it is treating it as the end of file.
Kylin removed whole end of file concept, which is problematic. I suggested, look ahead on the row cursor for follow on column values to check if they are null or not to define a value is null vs it is end of file. I asked him configure the look-ahead number to be configurable.
> Excel Translator Wrong Result From Columns With Interleaved Null Values
> -----------------------------------------------------------------------
>
> Key: TEIID-3219
> URL: https://issues.jboss.org/browse/TEIID-3219
> Project: Teiid
> Issue Type: Bug
> Components: Misc. Connectors
> Affects Versions: 8.7.1
> Reporter: Jan Stastny
> Assignee: Kylin Soong
> Fix For: 8.7.1, 8.10
>
> Attachments: artifacts.zip
>
>
> When querying a column which has a null value somewhere in it, the result set is missing all the rows under the null value in the xlsx file. When querying more columns, the result set is missing some values only if there is a row, where both columns contains null value.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
10 years, 1 month