[teiid-dev] alternative to the text connector / translator

Mon May 24 15:51:25 EDT 2010

Yes, we would still support wild cards.

A more detailed text example:

There would still be a file resource adapter and an associated text translator exposing procedures, such as ResultSet(file clob) getTextFiles(string path) - note that this assumes an encoding.

To return all text files in the parent directory configured for the file resource adapter, you would execute:

call modelname.getTextFiles(*.txt) 

By having the parent directory configured as a resource adapter, the translator/vdb do not need to know about absolute paths.  However if you do not use the wildcard, you may still need to specify the relative file name.

Today I'm adding nested table functionality that is an alternative to our current procedural relational logic and that's more inline with features in oracle/db2.  This feature allows for the use of correlated references to entries in the from clause proceeding the nested table.

For example:

select proc.* from t, table(call proc(t.col1 as param1)) proc where t.col2 = 'foo'

Note that t.col1 is being used as a named parameter to the procedure rather than our current approach, which would look something like:

select proc.* from t, proc where t.col2 = 'foo' and t.col1 = proc.param1

There are 3 major drawbacks to the second approach, first the use of defaults is not allowed (since the expectation is that values are being supplied through criteria), second null values cannot be "passed" through equality predicates, and lastly since we don't support tuple in clauses ( t.col1, t.col2 in (...) ) repeated executions are created off of a Cartesian product of the inputs.  None of these issues exist with nested tables.

So what I propose adding for text extraction functionality is a texttable table function that can implicitly be a nested table with the use of the table keyword:

texttable(file COLUMNS column [, column]* [DELIMITER string] [ESCAPE string] [HEADER [int]] [SKIP int])

column := name datatype [ WIDTH int ]

If a width is specified for each column, then we'll use fixed width parsing and assume that the columns are listed positionally.  Specifying a delimiter/escape/header would raise an error.

DELIMITER defaults to ','
ESCAPE defaults to '"'

If HEADER is specified, then it's argument (default of 1) indicates which line contains the column names.  It is not necessary to specify SKIP lines if the header is the row preceding the data.
If SKIP is specified, then that number of rows will be skipped prior to parsing data.  Should be greater than HEADER.

As a correlated nested table:

select t.* from (call modelname.getTextFiles(*.txt)) f, texttable(f.file COLUMNS string x, int y SKIP 1) t

This could be used as a user query, put in a transformation, etc.

It will process all files returned by the procedure and look for 2 columns separated by the default delimiter.

----- Original Message -----
From: "Ken Johnson" <kejohnso at redhat.com>
To: "Charles Mosher" <cmosher at redhat.com>, "teiid-dev" <teiid-dev at lists.jboss.org>
Sent: Monday, May 24, 2010 12:10:05 PM GMT -06:00 US/Canada Central
Subject: Re: [teiid-dev] alternative to the text connector / translator

+1 on some kind of dynamic path/filename capability.

Charles Mosher wrote:
> And would this still support the multiple-files-in-a-directory use-case?  I.e., can wild-cards be incorporated into these filenames (wherever they are stored)?  Perhaps using a function as suggested would be better for this example, where logic could be used to determine the file/pathnames at query time.
>
> Best regards,
>
> Chuck
>
> ----- Original Message -----
> From: "John Doyle" <jdoyle at redhat.com>
> To: "Steven Hawkins" <shawkins at redhat.com>
> Cc: "teiid-dev" <teiid-dev at lists.jboss.org>
> Sent: Friday, May 21, 2010 11:11:43 AM GMT -05:00 US/Canada Eastern
> Subject: Re: [teiid-dev] alternative to the text connector / translator
>
>
> ----- "Steven Hawkins" <shawkins at redhat.com> wrote:
>
>   
>> Assuming the use of a simple file translator/ resource adapter
>> procedure to get files, the downside to either approach is that
>> relative file names would now be stored on either the model or in a
>> function.
>>
>> I support the second, since it completely removes the need for a
>> heavy-weight text translator and is feasible before RC1.
>>
>>     
>
> I like the second solution as well.  It's consistent with the move made with XML.  But why relative paths, why not absolute paths?    
>
>   
>> Steve
>>
>>
>> _______________________________________________
>> teiid-dev mailing list
>> teiid-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/teiid-dev
>>     
> _______________________________________________
> teiid-dev mailing list
> teiid-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/teiid-dev
> _______________________________________________
> teiid-dev mailing list
> teiid-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/teiid-dev
>   

-- 
Ken Johnson
Sr. Product Manager
JBoss Middleware Business Unit
Red Hat, Inc
978.392.3917
ken.johnson at redhat.com

_______________________________________________
teiid-dev mailing list
teiid-dev at lists.jboss.org
https://lists.jboss.org/mailman/listinfo/teiid-dev