[teiid-issues] [JBoss JIRA] (TEIID-1819) Reading multi entity data from a single data file

Friday, 11 November 2011

    [ https://issues.jboss.org/browse/TEIID-1819?page=com.atlassian.jira.plugin... ]

Peter Larsen commented on TEIID-1819:
-------------------------------------

Steven - sorry for the late reply. Fridays are full of meetings for me.

I think I may be misunderstanding things here, because I thought what I was talking about
was pretty basic relational theory. My assumption will always be that the data file
being read can be read independently of other files. This means I will not need to refer to
other schemas or even tables from within the load descriptor. When I create a record in a
relational manner - and after all that's what we're doing with EDS - I need to be
able to build the data so the query you wrote above can be done. To do joins, I need a
primary key and a referring foreign key. The data set being imported was NOT generated
by a relational database dump - we cannot assume it will contain those foreign key
columns, particularly because the relationship between selector A and selector B records
matters.

All I need to be able to refer to when I read B records are the current values in the A
record. Why couldn't I use the column name as a reference (like :orderid) to specify that
the value for a given column is taken from its parent? The result is that, once inside
EDS, I have the primary and foreign keys available so I can do the required join.

The assumption for the generated file is that there's always going to be an A record
first, followed by 0 or more B records, followed by the next A record and its B records
(and so on). In other words, the file is a file of A records with 0 to many child B
records.

I've also had (rare) occasions where the A and B records were unrelated. Basically
it's one file with content for more than one table which you define in one operation.
Seed data for instance has often been given to me this way - 10s or 100s of tables with
fixed content, given to me in a single file.

The nesting could go deeper than two levels - but I cannot remember ever seeing that.

I'm not assuming the presence of header lines at all. They're a great help for
pre-populating column names, as the demo I've seen shows.

The challenge is that you cannot simply create two filters and process the file twice -
once with each filter. The file needs to be read ONCE only. The way I would program the
reference between the records is simply to refer to the current in-memory A record as I
create B records one after the other.
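
A minimal sketch of that single-pass idea, in Python rather than anything Teiid-specific.
It uses the first order from the sample records in the issue description; the function
name load_single_pass and the dictionary layout are hypothetical, chosen only for
illustration. Each B record copies ordernumber from the A record currently held in memory.

```python
import csv
import io

# First order from the sample in the issue description.
SAMPLE = """\
A,10-dec-2011,12345,3322,3000,222
B,1,123,Sprockets Black,30,50,1500
B,2,333,Sprockets Blue,300,5,1500
"""

def load_single_pass(lines):
    """Read A/B records in one pass, copying the parent key onto each B record."""
    orders, order_lines = [], []
    current_a = None  # the in-memory A record that following B records refer to
    for row in csv.reader(lines):
        if not row or row[0].startswith(";"):
            continue  # skip blank lines and optional header lines
        selector, fields = row[0], row[1:]
        if selector == "A":
            current_a = {
                "orderdate": fields[0],
                "ordernumber": fields[1],
                "customernumber": fields[2],
            }
            orders.append(current_a)
        elif selector == "B":
            if current_a is None:
                raise ValueError("B record found before any A record")
            order_lines.append({
                "ordernumber": current_a["ordernumber"],  # derived from the parent
                "lineno": fields[0],
                "itemno": fields[1],
                "description": fields[2],
            })
    return orders, order_lines

orders, order_lines = load_single_pass(io.StringIO(SAMPLE))
```

After the single read, order_lines carries the foreign key needed to join back to orders.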

When data content is specified, you have 3 ways of getting values:

1) Position inside file (either fixed columns, or column number given the separator)
2) Derived - a calculation/function based on existing values
3) Constant

I'm talking about case #2 here, where I want to be able to refer to the existing
values of selector A. I would propose a design that relates the filter to a
table or set name. When you refer to other values you can then write :table.column to read
the value, where the table part is optional so simple loads don't have to be
complicated.
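
One way such a reference could be resolved, sketched in Python. This is not an existing
Teiid feature; the function resolve, the `current` mapping of set names to their most
recently read record, and the rule that an unqualified reference falls back to a
caller-supplied default set are all assumptions made for illustration.

```python
def resolve(ref, current, default_table):
    """Resolve ':table.column' or ':column' against the current in-memory records.

    current maps a table (set) name to the most recently read record of that set.
    default_table is used when the reference has no table part (an assumption).
    """
    if not ref.startswith(":"):
        raise ValueError(f"not a reference: {ref!r}")
    table, sep, column = ref[1:].rpartition(".")
    if not sep:
        table = default_table  # table part omitted, as allowed for simple loads
    return current[table][column]

# Hypothetical state while reading the first B record of an order.
current = {
    "orders": {"ordernumber": "12345", "customernumber": "3322"},
    "orderlines": {"lineno": "1", "itemno": "123"},
}

qualified = resolve(":orders.ordernumber", current, default_table="orders")
unqualified = resolve(":ordernumber", current, default_table="orders")
```

Here both references read the same value from the current orders record.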

...
 Reading multi entity data from a single data file
 -------------------------------------------------

                 Key: TEIID-1819
                 URL: https://issues.jboss.org/browse/TEIID-1819
             Project: Teiid
          Issue Type: Feature Request
          Components: Query Engine
    Affects Versions: 7.6
         Environment: Any
            Reporter: Peter Larsen
            Assignee: Steven Hawkins

 A common problem for data files is the concept of multiple data sets enclosed in the same
file. An example is a data file of accounts receivable orders. You'll export at least
two logical entities: Orders and OrderLines. The two entities have very different
data sets; they relate (OrderLines belong to a particular Order) and there is a variable
number of OrderLines per Order.
 A common way to differentiate is to put a special "record type" selector as the
first field in each record, e.g. A and B. Based on this selector, the load program will
apply different templates to map the columns, and it will also know that the OrderLines
are associated with the Order above them and create that relation column ID in the output.
 Example:
 ;selector=A,orderdate,ordernumber,customernumber,ordertotal,ordertax
 ;selector=B,lineno,itemno,description,quantity,priceach,pricetotal
 A,10-dec-2011,12345,3322,3000,222
 B,1,123,Sprockets Black,30,50,1500
 B,2,333,Sprockets Blue,300,5,1500
 A,11-dec-2011,12346,3311,.....
 etc.  
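
 A rough Python sketch of how the two ";selector=" header lines in the example could be
turned into per-selector column templates. The helper names parse_header and map_record
are hypothetical, and the sketch assumes plain comma-separated values with no quoting.

```python
HEADERS = [
    ";selector=A,orderdate,ordernumber,customernumber,ordertotal,ordertax",
    ";selector=B,lineno,itemno,description,quantity,priceach,pricetotal",
]

def parse_header(line):
    """Turn ';selector=A,col1,col2' into ('A', ['col1', 'col2'])."""
    prefix = ";selector="
    if not line.startswith(prefix):
        raise ValueError(f"not a header line: {line!r}")
    selector, *columns = line[len(prefix):].strip().split(",")
    return selector, columns

def map_record(line, templates):
    """Pick the template by the record's selector and name its values."""
    selector, *values = line.strip().split(",")
    return selector, dict(zip(templates[selector], values))

# Build one template per selector, then map a data record with it.
templates = dict(parse_header(h) for h in HEADERS)
selector, record = map_record("B,1,123,Sprockets Black,30,50,1500", templates)
```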
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.jboss.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
