Re: [teiid-dev] alternative to the text connector / translator

Monday, 24 May 2010

Yes, we would still support wild cards.

A more detailed text example:

There would still be a file resource adapter and an associated text translator exposing
procedures, such as ResultSet(file clob) getTextFiles(string path) - note that this
assumes an encoding.

To return all text files in the parent directory configured for the file resource adapter,
you would execute:

call modelname.getTextFiles(*.txt) 

By having the parent directory configured as a resource adapter, the translator/vdb do not
need to know about absolute paths.  However if you do not use the wildcard, you may still
need to specify the relative file name.

Today I'm adding nested table functionality that is an alternative to our current
procedural relational logic and that's more inline with features in oracle/db2.  This
feature allows for the use of correlated references to entries in the from clause
proceeding the nested table.

For example:

select proc.* from t, table(call proc(t.col1 as param1)) proc where t.col2 =
'foo'

Note that t.col1 is being used as a named parameter to the procedure rather than our
current approach, which would look something like:

select proc.* from t, proc where t.col2 = 'foo' and t.col1 = proc.param1

There are 3 major drawbacks to the second approach, first the use of defaults is not
allowed (since the expectation is that values are being supplied through criteria), second
null values cannot be "passed" through equality predicates, and lastly since we
don't support tuple in clauses ( t.col1, t.col2 in (...) ) repeated executions are
created off of a Cartesian product of the inputs.  None of these issues exist with nested
tables.

So what I propose adding for text extraction functionality is a texttable table function
that can implicitly be a nested table with the use of the table keyword:

texttable(file COLUMNS column [, column]* [DELIMITER string] [ESCAPE string] [HEADER
[int]] [SKIP int])

column := name datatype [ WIDTH int ]

If a width is specified for each column, then we'll use fixed width parsing and assume
that the columns are listed positionally.  Specifying a delimiter/escape/header would
raise an error.

DELIMITER defaults to ','
ESCAPE defaults to '"'

If HEADER is specified, then it's argument (default of 1) indicates which line
contains the column names.  It is not necessary to specify SKIP lines if the header is the
row preceding the data.
If SKIP is specified, then that number of rows will be skipped prior to parsing data. 
Should be greater than HEADER.

As a correlated nested table:

select t.* from (call modelname.getTextFiles(*.txt)) f, texttable(f.file COLUMNS string x,
int y SKIP 1) t

This could be used as a user query, put in a transformation, etc.

It will process all files returned by the procedure and look for 2 columns separated by
the default delimiter.

----- Original Message -----
From: "Ken Johnson" <kejohnso(a)redhat.com&gt;
To: "Charles Mosher" <cmosher(a)redhat.com&gt;, "teiid-dev"
<teiid-dev(a)lists.jboss.org&gt;
Sent: Monday, May 24, 2010 12:10:05 PM GMT -06:00 US/Canada Central
Subject: Re: [teiid-dev] alternative to the text connector / translator

+1 on some kind of dynamic path/filename capability.

Charles Mosher wrote:
...
 And would this still support the multiple-files-in-a-directory
use-case?  I.e., can wild-cards be incorporated into these filenames (wherever they are
stored)?  Perhaps using a function as suggested would be better for this example, where
logic could be used to determine the file/pathnames at query time.

 Best regards,

 Chuck

 ----- Original Message -----
 From: "John Doyle" <jdoyle(a)redhat.com&gt;
 To: "Steven Hawkins" <shawkins(a)redhat.com&gt;
 Cc: "teiid-dev" <teiid-dev(a)lists.jboss.org&gt;
 Sent: Friday, May 21, 2010 11:11:43 AM GMT -05:00 US/Canada Eastern
 Subject: Re: [teiid-dev] alternative to the text connector / translator

 ----- "Steven Hawkins" <shawkins(a)redhat.com&gt; wrote:

> Assuming the use of a simple file translator/ resource adapter
> procedure to get files, the downside to either approach is that
> relative file names would now be stored on either the model or in a
> function.
>
> I support the second, since it completely removes the need for a
> heavy-weight text translator and is feasible before RC1.
>
>     

 I like the second solution as well.  It's consistent with the move made with XML. 
But why relative paths, why not absolute paths?    

> Steve
>
>
> _______________________________________________
> teiid-dev mailing list
> teiid-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/teiid-dev
>     
 _______________________________________________
 teiid-dev mailing list
 teiid-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/teiid-dev
 _______________________________________________
 teiid-dev mailing list
 teiid-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/teiid-dev

-- 
Ken Johnson
Sr. Product Manager
JBoss Middleware Business Unit
Red Hat, Inc
978.392.3917
ken.johnson(a)redhat.com

_______________________________________________
teiid-dev mailing list
teiid-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/teiid-dev

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009