[teiid-issues] [JBoss JIRA] (TEIID-1819) Reading multi entity data from a single data file

Wednesday, 9 November 2011

    [ https://issues.jboss.org/browse/TEIID-1819?page=com.atlassian.jira.plugin... ]

Peter Larsen commented on TEIID-1819:
-------------------------------------

Yes - the selector can be done in a lot of ways. I've even had cases where records
had special characters at the end as a "continuation marker".

I think the secret to success is focusing on performance. I don't know if regex is the
best way, although it gives the most flexibility. The systems I've dealt with over
25 years have usually used a simple string or character to indicate the record type - a
simple "equal" or "not equal" comparison is usually enough.

Other generic issues with comma-separated files are quoted vs. non-quoted fields and
date/time masks. Finally, being able to derive values as expressions from other columns is
sometimes necessary to ensure values don't exceed subscript ranges etc. (i.e. setting the
value as a substring of the value read to ensure a maximum length is kept). I've also used
expressions to refer to values in parent records.
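
To make the three issues above concrete, here is a minimal Python sketch, not
anything Teiid provides. It reads quoted and unquoted fields, applies a date mask,
and derives a length-limited value with a substring. The function name, the
two-column layout, and the 10-character limit are assumptions made up for
illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical limit, chosen only for illustration.
MAX_DESCRIPTION_LEN = 10


def read_rows(text, date_mask="%d-%b-%Y"):
    """Yield (orderdate, description) pairs from two-column CSV text."""
    # csv.reader accepts both quoted and unquoted fields.
    for orderdate, description in csv.reader(io.StringIO(text)):
        # The date mask turns the raw text into a real date value.
        parsed = datetime.strptime(orderdate, date_mask).date()
        # Derived value: keep only the first MAX_DESCRIPTION_LEN characters.
        yield parsed, description[:MAX_DESCRIPTION_LEN]


sample = '10-dec-2011,"Sprockets, Black"\n11-dec-2011,Sprockets Blue\n'
rows = list(read_rows(sample))
```

The quoted first description contains a comma, which a plain split on "," would
break into two fields.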

I want to be sure not to overdo the data format "extensions", since we can simply
use the VDB model to create derived values later, much more easily and quickly.

...
 Reading multi entity data from a single data file
 -------------------------------------------------

                 Key: TEIID-1819
                 URL: https://issues.jboss.org/browse/TEIID-1819
             Project: Teiid
          Issue Type: Feature Request
          Components: Query Engine
    Affects Versions: 7.6
         Environment: Any
            Reporter: Peter Larsen
            Assignee: Steven Hawkins

 A common problem for data files is the concept of multiple data sets enclosed in the same
file. An example is a data file of accounts receivable orders. You'll export at least
two logical entities: Orders and OrderLines. Each of the two entities has a very different
data set; they relate (OrderLines belong to a particular Order) and there is a dynamic
number of OrderLines per Order.
 A common way to differentiate is to put a special "record type" selector as the
first field in each record, e.g. A and B. Based on this selector, the load program will
apply different templates to map the columns. It will also know that the OrderLines
are associated with the Order above them and create that relation column ID in the output.
 Example:
 ;selector=A,orderdate,ordernumber,customernumber,ordertotal,ordertax
 ;selector=B,lineno,itemno,description,quantity,priceach,pricetotal
 A,10-dec-2011,12345,3322,3000,222
 B,1,123,Sprockets Black,30,50,1500
 B,2,333,Sprockets Blue,300,5,1500
 A,11-dec-2011,12346,3311,.....
 etc.  
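
 The load logic described above can be sketched in Python as follows. This is only
an illustration of the selector idea, not a proposed Teiid implementation: the
function name, the TEMPLATES dictionary, and the choice of ordernumber as the
relation column are assumptions. The column names come from the example's
";selector=" lines.

```python
import csv
import io

# Column templates keyed by record-type selector, copied from the example's
# ";selector=" header lines.
TEMPLATES = {
    "A": ["orderdate", "ordernumber", "customernumber", "ordertotal", "ordertax"],
    "B": ["lineno", "itemno", "description", "quantity", "priceach", "pricetotal"],
}


def load_orders(text):
    """Return a list of order dicts, each holding its child rows under 'lines'."""
    orders = []
    for record in csv.reader(io.StringIO(text)):
        if not record or record[0].startswith(";"):
            continue  # skip blank lines and ";selector=..." header lines
        selector, values = record[0].strip(), record[1:]
        # Plain equality lookup on the selector; no regex involved.
        row = dict(zip(TEMPLATES[selector], values))
        if selector == "A":
            row["lines"] = []
            orders.append(row)
        else:
            if not orders:
                raise ValueError("order line found before any order")
            # Relation column: copy the key of the order above into the child row.
            row["ordernumber"] = orders[-1]["ordernumber"]
            orders[-1]["lines"].append(row)
    return orders


sample = """\
;selector=A,orderdate,ordernumber,customernumber,ordertotal,ordertax
;selector=B,lineno,itemno,description,quantity,priceach,pricetotal
A,10-dec-2011,12345,3322,3000,222
B,1,123,Sprockets Black,30,50,1500
B,2,333,Sprockets Blue,300,5,1500
"""
orders = load_orders(sample)
```

 Here orders holds one Order with two OrderLines, each carrying the parent's
ordernumber. An unknown selector raises a KeyError rather than being silently skipped.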
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.jboss.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
