[teiid-issues] [JBoss JIRA] (TEIIDSB-206) Create native connector to interact with HDFS as a datasource

Steven Hawkins (Jira) issues at jboss.org
Tue Jun 23 12:59:06 EDT 2020


    [ https://issues.redhat.com/browse/TEIIDSB-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173047#comment-14173047 ] 

Steven Hawkins commented on TEIIDSB-206:
----------------------------------------

> File and HDFS and S3 all becoming synonymous to me, I am a little confused to see `file-hdfs` kind of naming.

They are all virtual file sources, but they hit different backends.  I know we talked about this in the past, but I deferred deciding on a strategy.

To explain this more, what we currently have:
CREATE SERVER x1 FOREIGN DATA WRAPPER file;  -- currently means file source with a file translator
CREATE SERVER x2 FOREIGN DATA WRAPPER excel; -- currently means file source with an excel translator
CREATE SERVER x3 FOREIGN DATA WRAPPER ftp; -- currently means an ftp source with an ftp translator, which is a bug because there is no ftp translator

What has just been added:
CREATE SERVER x1 FOREIGN DATA WRAPPER hdfs;  -- currently means an hdfs source with an hdfs translator, which is a bug because there is no hdfs translator

So the last two (ftp and hdfs) need to point to the file translator instead.

But the real issue is that this approach bakes in a single translator for a given source name.  The excel translator can work against file, ftp, hdfs, and, once it's added, s3.

>  I see that if you are going to design above `avro` as a language feature then we do not need this?

The rationale for doing the hdfs and s3 work first was actually so that avro and parquet support would make sense at a translator level.

> If not we can see how we can abstract all these types under `file` type and figure out dynamically as you suggest.

Option 1: use the create server type information.  You can default an alias to a given translator, but also specify it:

CREATE SERVER x2 FOREIGN DATA WRAPPER excel
could be fully expressed as: CREATE SERVER x2 TYPE file FOREIGN DATA WRAPPER excel
and would allow for the other combinations:  CREATE SERVER x2 TYPE hdfs FOREIGN DATA WRAPPER excel

The only downside is explicitly introducing the source type construct.
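
As a rough illustration of Option 1, the ftp and hdfs cases could then be spelled out as below. This is only a sketch: it assumes the TYPE clause is adopted in the unquoted form written above, and x3 through x5 are placeholder server names.

```sql
-- Assumed Option 1 form: TYPE names the source, FOREIGN DATA WRAPPER names the translator.
CREATE SERVER x3 TYPE ftp FOREIGN DATA WRAPPER file;    -- ftp source read through the file translator
CREATE SERVER x4 TYPE hdfs FOREIGN DATA WRAPPER file;   -- hdfs source read through the file translator
CREATE SERVER x5 TYPE ftp FOREIGN DATA WRAPPER excel;   -- ftp source read through the excel translator
```

With this form the source and the translator vary independently, so no source name is tied to one translator.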

Option 2: explicit and verbose expansion of alias names / connection factories.  create excel-file, excel-ftp, excel-hdfs, avro-hdfs, etc.
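
For comparison, a sketch of what Option 2 might look like in DDL. The alias names are the ones listed above. Quoting them as identifiers because of the hyphen is an assumption, the server names are placeholders, and avro-hdfs presumes an avro translator that does not exist yet.

```sql
-- Assumed Option 2 form: one alias per translator/source pair.
CREATE SERVER x2 FOREIGN DATA WRAPPER "excel-file";  -- excel translator over a file source
CREATE SERVER x3 FOREIGN DATA WRAPPER "excel-ftp";   -- excel translator over an ftp source
CREATE SERVER x4 FOREIGN DATA WRAPPER "excel-hdfs";  -- excel translator over an hdfs source
CREATE SERVER x5 FOREIGN DATA WRAPPER "avro-hdfs";   -- hypothetical avro translator over an hdfs source
```

Each new translator or source adds another row of aliases to register, which is where the verbosity comes from.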

Option 3: assume some kind of convention in the foreign data wrapper name.

> Create native connector to interact with HDFS as a datasource
> -------------------------------------------------------------
>
>                 Key: TEIIDSB-206
>                 URL: https://issues.redhat.com/browse/TEIIDSB-206
>             Project: Teiid Spring Boot
>          Issue Type: Feature Request
>          Components: datasource
>            Reporter: Van Halbert
>            Assignee: Aditya Manglam Sharma
>            Priority: Major
>             Fix For: 1.6.0
>
>   Original Estimate: 2 days, 4 hours
>          Time Spent: 3 hours, 30 minutes
>  Remaining Estimate: 2 days, 30 minutes
>
> We would like to have a native connector.
> To interact with HDFS as a datasource from a DV perspective, there are at least two options:
> 1 - HttpFS
> 2 - WebHDFS
> Both options seem to be very slow in practice; I have heard 2x slower.
> It would be nice if we can create a native HDFS connector.


