[teiid-issues] [JBoss JIRA] (TEIID-1819) Reading multi entity data from a single data file

Wed Nov 9 23:18:45 EST 2011

    [ https://issues.jboss.org/browse/TEIID-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641625#comment-12641625 ] 

Peter Larsen commented on TEIID-1819:
-------------------------------------

Yes - the selector can be done in a lot of ways. I've even had cases where records had special characters at the end as a "continuation marker".

I think the secret to success is focusing on performance. I don't know if regex is the best way, although it offers the most flexibility. The systems I've dealt with over 25 years have usually used a simple string or character to indicate the record type - a simple "equal" or "not equal" comparison is usually enough.
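
A minimal sketch of the two selector styles mentioned above, assuming the selector is the first comma-separated field of each record. The function names and the pattern are hypothetical and only illustrate the trade-off; they are not part of Teiid.

```python
import re

def selector_equals(line, selector):
    """Plain "equal" test: compare only the first field with the selector."""
    return line.split(",", 1)[0] == selector

def selector_not_equals(line, selector):
    """Plain "not equal" test on the same first field."""
    return not selector_equals(line, selector)

# Regex variant: more flexible, but every record goes through the regex engine.
# The pattern is an assumption chosen to match the same records as selector "B".
ORDER_LINE_PATTERN = re.compile(r"B,")

def selector_matches(line, pattern=ORDER_LINE_PATTERN):
    """Regex test anchored at the start of the record."""
    return pattern.match(line) is not None

record = "B,1,123,Sprockets Black,30,50,1500"
print(selector_equals(record, "B"), selector_matches(record))
```

For a single-character record type both tests select the same records; the regex form only pays off when the selector needs more than a fixed string.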

Other generic issues with comma-separated files are quoted vs. non-quoted fields and date/time masks. Finally, being able to derive values as expressions from other columns is sometimes necessary to ensure values don't exceed subscript ranges etc. (i.e. setting the value to a substring of the value read to ensure a maximum length is kept). I've also used expressions to refer to values in parent records.
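
The three issues above (quoting, date masks, derived substrings) can be sketched as follows. The mask, the maximum length, and all names are assumptions made for illustration, and the quoted sample record is invented to show a comma inside a field.

```python
import csv
from datetime import datetime

DATE_MASK = "%d-%b-%Y"   # assumed mask for values such as 10-dec-2011
MAX_DESCRIPTION = 10     # hypothetical maximum length for the description column

def split_record(line):
    """Split one record; csv.reader accepts quoted and unquoted fields alike."""
    return next(csv.reader([line]))

def parse_order_date(text):
    """Apply the date mask to the raw text (month names follow the current locale)."""
    return datetime.strptime(text, DATE_MASK).date()

def derive_description(text):
    """Derived value: keep at most MAX_DESCRIPTION characters of the value read."""
    return text[:MAX_DESCRIPTION]

fields = split_record('B,1,123,"Sprockets, Black",30,50,1500')
print(derive_description(fields[3]))
print(parse_order_date("10-dec-2011"))
```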

I want to be sure not to overdo the data format "extensions", since we can simply use the VDB model to create derived values later, which is much easier and faster.

> Reading multi entity data from a single data file
> -------------------------------------------------
>
>                 Key: TEIID-1819
>                 URL: https://issues.jboss.org/browse/TEIID-1819
>             Project: Teiid
>          Issue Type: Feature Request
>          Components: Query Engine
>    Affects Versions: 7.6
>         Environment: Any
>            Reporter: Peter Larsen
>            Assignee: Steven Hawkins
>
> A common problem for data files is the concept of multiple data sets enclosed in the same file. An example is a data file of accounts receivable orders. You'll export at least two logical entities: Orders and OrderLines. Each of the two entities has a very different data set; they relate (OrderLines belong to a particular Order) and there is a dynamic number of OrderLines per Order.
> A common way to differentiate is to put a special "record type" selector as the first field in each record, e.g. A and B. Based on this selector, the load program will apply different templates to map the columns; it will also know that the OrderLines are associated with the Order above them and create that relation column ID in the output.
> Example:
> ;selector=A,orderdate,ordernumber,customernumber,ordertotal,ordertax
> ;selector=B,lineno,itemno,description,quantity,priceach,pricetotal
> A,10-dec-2011,12345,3322,3000,222
> B,1,123,Sprockets Black,30,50,1500
> B,2,333,Sprockets Blue,300,5,1500
> A,11-dec-2011,12346,3311,.....
> etc. 
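
A rough sketch of how a load program might read the quoted example, not how Teiid implements it. It assumes the ";selector=" lines are present in the file and declare the column template for each record type, that A records are Orders and B records are OrderLines, and that ordernumber serves as the relation column. Only the complete rows of the example are used as sample data.

```python
import csv
import io

SAMPLE = """\
;selector=A,orderdate,ordernumber,customernumber,ordertotal,ordertax
;selector=B,lineno,itemno,description,quantity,priceach,pricetotal
A,10-dec-2011,12345,3322,3000,222
B,1,123,Sprockets Black,30,50,1500
B,2,333,Sprockets Blue,300,5,1500
"""

TEMPLATE_PREFIX = ";selector="  # assumed marker for template declarations

def read_entities(text):
    """Split a mixed file into Orders and OrderLines, linking each line to its Order."""
    templates = {}             # selector -> column names
    orders, order_lines = [], []
    current_order = None       # ordernumber of the most recent A record
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue           # skip blank lines
        first = fields[0]
        if first.startswith(TEMPLATE_PREFIX):
            templates[first[len(TEMPLATE_PREFIX):]] = fields[1:]
            continue
        # An undeclared selector raises KeyError here; a real loader might skip it.
        row = dict(zip(templates[first], fields[1:]))
        if first == "A":
            current_order = row["ordernumber"]
            orders.append(row)
        elif first == "B":
            row["ordernumber"] = current_order  # relation to the Order above it
            order_lines.append(row)
    return orders, order_lines

orders, order_lines = read_entities(SAMPLE)
print(len(orders), len(order_lines))
```

Carrying the parent key forward in a single pass keeps the read streaming, so the file never has to be loaded or scanned twice to associate OrderLines with their Order.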
