[teiid-issues] [JBoss JIRA] (TEIID-1819) Reading multi entity data from a single data file

Fri Nov 11 21:48:45 EST 2011

    [ https://issues.jboss.org/browse/TEIID-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642269#comment-12642269 ] 

Peter Larsen commented on TEIID-1819:
-------------------------------------

Steven - sorry for the late reply. Fridays are full of meetings for me.

I think I may be misunderstanding things here, because I thought what I was talking about was pretty basic relational theory. My assumption will always be that the data file can be read independently of other files. This means I will not need to refer to other schemas or even tables from within the load descriptor. When I create a record in a relational manner - and after all that's what we're doing with EDS - I need to be able to build the data so the query you wrote above can be run. To do joins, I need a primary key and a referring foreign key. The data set being imported was NOT generated by a relational database dump, so we cannot assume it will contain those foreign key columns, in particular because the relationship between selector A and B matters.

All I need to be able to refer to when I read B records is the current values in the A record. Why couldn't I use the column name as a reference (like :orderid) to specify that the value for a given column is given by its parent? The result is that, once inside EDS, I have the primary and foreign keys available so I can do the required join.

The assumption for the generated file is that there's always going to be an A record first, followed by 0 or more B records, followed by 0 or more A records (and so on). In other words, the file is a file of A records with 0 to many child B records.

I've also had (rare) occasions where the A and B records were unrelated. Basically it's one file with content for more than one table which you define in one operation. Seed data for instance has often been given to me this way - 10s or 100s of tables with fixed content, given to me in a single file.
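To illustrate the unrelated-records case, here is a minimal sketch in Python (not Teiid code). The function name `split_by_selector` and the seed rows are made up for illustration; the only assumption taken from this issue is that the first field of each row names the table it belongs to.

```python
import csv
import io
from collections import defaultdict


def split_by_selector(stream):
    """One pass over the file; rows are grouped by their first field."""
    tables = defaultdict(list)
    for row in csv.reader(stream):
        if row:  # ignore blank lines
            tables[row[0]].append(row[1:])
    return dict(tables)


# Made-up seed data: two unrelated tables sharing one file.
seed = io.StringIO(
    "COUNTRY,US,United States\n"
    "CURRENCY,USD,US Dollar\n"
    "COUNTRY,DK,Denmark\n"
)
tables = split_by_selector(seed)
```

Here no record refers to any other, so the selector only decides which table a row lands in.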

Nesting could go deeper than two levels - but I cannot remember ever seeing that.

I'm not assuming the presence of header lines at all. They're a great help, as the demo I've seen shows, for pre-populating column names.

The challenge is that you cannot simply create two filters and process the file twice - once with each filter. The file needs to be read ONCE only. The way I would program the referral between the records is to simply refer to the current in-memory record of A as I create B records one after the other.
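As a rough sketch of that single-pass idea (plain Python, not Teiid code): the reader keeps the most recent A record in memory and copies its key onto each B record as it goes. The function name `read_orders` and the choice of `ordernumber` as the parent key are assumptions for illustration; the column names come from the example in the issue description.

```python
import csv
import io

# Column templates per selector, taken from the header lines in the issue description.
TEMPLATES = {
    "A": ["orderdate", "ordernumber", "customernumber", "ordertotal", "ordertax"],
    "B": ["lineno", "itemno", "description", "quantity", "priceach", "pricetotal"],
}


def read_orders(stream):
    """Read the file once; stamp each B row with the key of the A row above it."""
    orders, lines = [], []
    current_a = None  # the in-memory "current" A record
    for row in csv.reader(stream):
        if not row or row[0].startswith(";"):
            continue  # skip blank lines and ';selector=' header lines
        selector, values = row[0], row[1:]
        record = dict(zip(TEMPLATES[selector], values))
        if selector == "A":
            current_a = record
            orders.append(record)
        else:
            if current_a is None:
                raise ValueError("B record found before any A record")
            # Assumed derived column: the foreign key is copied from the parent.
            record["ordernumber"] = current_a["ordernumber"]
            lines.append(record)
    return orders, lines


sample = io.StringIO(
    ";selector=A,orderdate,ordernumber,customernumber,ordertotal,ordertax\n"
    "A,10-dec-2011,12345,3322,3000,222\n"
    "B,1,123,Sprockets Black,30,50,1500\n"
    "B,2,333,Sprockets Blue,300,5,1500\n"
)
orders, lines = read_orders(sample)
```

After this pass both result sets carry `ordernumber`, so the join can be done on that column without reading the file a second time.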

When data content is specified, you have 3 ways of getting values:

1) Position inside file (either fixed columns, or column number given the separator)
2) Derived - a calculation/function based on existing values
3) Constant

I'm talking about case #2 here, where I want to be able to refer to the existing values of selector A. I would propose the design be something that relates the filter to a table or set name. When you refer to other values you can write :table.column to read the value, where the table part is optional so simple loads don't have to be complicated.
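To make the proposal a bit more concrete, here is a minimal sketch of how the three value sources and a `:table.column` reference could be resolved. Everything in it is hypothetical: the spec layout, the names `resolve` and `build_record`, and the table names `orders` and `orderlines` are mine, not an existing Teiid syntax. The "derived" case is reduced to a plain reference with no functions.

```python
def resolve(ref, current, own_table):
    """Resolve ':column' or ':table.column' against the current records.

    `current` maps a table name to the record most recently read for it.
    """
    name = ref.lstrip(":")
    table, _, column = name.rpartition(".")
    # An empty table part means "the table this record belongs to".
    return current[table or own_table][column]


def build_record(own_table, spec, fields, current):
    """Build one record from a column spec of (kind, argument) pairs."""
    record = {}
    current[own_table] = record  # lets unqualified references see earlier columns
    for column, (kind, arg) in spec.items():
        if kind == "position":    # 1) field number in the delimited line
            record[column] = fields[arg]
        elif kind == "derived":   # 2) here only a plain reference
            record[column] = resolve(arg, current, own_table)
        elif kind == "constant":  # 3) fixed value
            record[column] = arg
        else:
            raise ValueError(f"unknown value kind: {kind}")
    return record


current = {}
order_spec = {"ordernumber": ("position", 2), "source": ("constant", "file")}
line_spec = {
    "lineno": ("position", 1),
    "ordernumber": ("derived", ":orders.ordernumber"),  # qualified reference
    "linekey": ("derived", ":lineno"),                  # table part omitted
}
build_record("orders", order_spec,
             "A,10-dec-2011,12345,3322,3000,222".split(","), current)
line = build_record("orderlines", line_spec,
                    "B,1,123,Sprockets Black,30,50,1500".split(","), current)
```

The B record picks up `ordernumber` from the current `orders` record, while `:lineno` resolves within the record being built, which is what keeps simple loads simple.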

> Reading multi entity data from a single data file
> -------------------------------------------------
>
>                 Key: TEIID-1819
>                 URL: https://issues.jboss.org/browse/TEIID-1819
>             Project: Teiid
>          Issue Type: Feature Request
>          Components: Query Engine
>    Affects Versions: 7.6
>         Environment: Any
>            Reporter: Peter Larsen
>            Assignee: Steven Hawkins
>
> A common problem for data files is the concept of multiple data sets enclosed in the same file. An example is a data file of accounts receivable orders. You'll export at least two logical entities: Orders and OrderLines. Each of the two entities has a very different data set; they relate (OrderLines belong to a particular Order) and there is a dynamic number of OrderLines per Order.
> A common way to differentiate is to put a special "record type" selector as the first field in each record, i.e. A and B. Based on this selector, the load program will apply different templates to map the columns, and it will also know that the OrderLines are associated with the Order above them and create that relation column ID in the output.
> Example:
> ;selector=A,orderdate,ordernumber,customernumber,ordertotal,ordertax
> ;selector=B,lineno,itemno,description,quantity,priceach,pricetotal
> A,10-dec-2011,12345,3322,3000,222
> B,1,123,Sprockets Black,30,50,1500
> B,2,333,Sprockets Blue,300,5,1500
> A,11-dec-2011,12346,3311,.....
> etc. 
