[teiid-issues] [JBoss JIRA] (TEIIDSB-206) Create native connector to interact with HDFS as a datasource
Steven Hawkins (Jira)
issues at jboss.org
Tue Jun 23 12:59:06 EDT 2020
[ https://issues.redhat.com/browse/TEIIDSB-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173047#comment-14173047 ]
Steven Hawkins commented on TEIIDSB-206:
----------------------------------------
> File and HDFS and S3 all becoming synonymous to me, I am a little confused to see `file-hdfs` kind of naming.
They are all virtual file sources, but they obviously hit different backends. I know we talked about this in the past, but I deferred on a strategy.
To explain further, this is what we currently have:
CREATE SERVER x1 FOREIGN DATA WRAPPER file; -- currently means file source with a file translator
CREATE SERVER x2 FOREIGN DATA WRAPPER excel; -- currently means file source with an excel translator
CREATE SERVER x3 FOREIGN DATA WRAPPER ftp; -- currently means an ftp source with an ftp translator, which is a bug because there is no ftp translator
What has just been added:
CREATE SERVER x1 FOREIGN DATA WRAPPER hdfs; -- currently means an hdfs source with an hdfs translator, which is a bug because there is no hdfs translator
So obviously the last two (ftp and hdfs) need to point to the file translator instead.
But the real issue is that this approach bakes in a single translator for a given source name. The excel translator, for example, can work against file, ftp, hdfs, and (once it's added) s3.
> I see that if you are going to design above `avro` as a language feature then we do not need this?
The rationale for doing the hdfs and s3 work first was actually so that avro and parquet support would make sense at a translator level.
> If not we can see how we can abstract all these types under `file` type and figure out dynamically as you suggest.
Option 1: use the CREATE SERVER TYPE information. An alias can default to a given source type and translator, but the type can also be specified explicitly:
CREATE SERVER x2 FOREIGN DATA WRAPPER excel
could be fully expressed as: CREATE SERVER x2 TYPE file FOREIGN DATA WRAPPER excel
and would allow for the other combinations: CREATE SERVER x2 TYPE hdfs FOREIGN DATA WRAPPER excel
The only downside is explicitly introducing the source type construct.
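A minimal sketch of how Option 1 could resolve a server definition, assuming a lookup table of per-alias defaults. The class name, the `resolve` method, and the table contents are hypothetical and are not taken from the Teiid code base; the ftp and hdfs entries assume the fix described above (both fall back to the file translator).

```java
import java.util.Map;

// Hypothetical sketch of Option 1; none of these names come from Teiid itself.
final class ServerResolutionSketch {

    /** Resolved pair: where the files live, and which translator reads them. */
    static final class Resolved {
        final String sourceType;
        final String translator;

        Resolved(String sourceType, String translator) {
            this.sourceType = sourceType;
            this.translator = translator;
        }

        @Override
        public String toString() {
            return sourceType + "/" + translator;
        }
    }

    // Assumed defaults: wrapper name -> {source type, translator}.
    // ftp and hdfs map to the file translator, since neither has its own.
    private static final Map<String, String[]> DEFAULTS = Map.of(
            "file", new String[] {"file", "file"},
            "excel", new String[] {"file", "excel"},
            "ftp", new String[] {"ftp", "file"},
            "hdfs", new String[] {"hdfs", "file"});

    /**
     * @param type    value of the optional TYPE clause, or null when omitted
     * @param wrapper value of FOREIGN DATA WRAPPER
     */
    static Resolved resolve(String type, String wrapper) {
        String[] defaults = DEFAULTS.get(wrapper);
        if (defaults == null) {
            throw new IllegalArgumentException("Unknown foreign data wrapper: " + wrapper);
        }
        // An explicit TYPE overrides only the source; the translator stays the wrapper's default.
        String sourceType = (type != null) ? type : defaults[0];
        return new Resolved(sourceType, defaults[1]);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "excel"));   // CREATE SERVER x2 FOREIGN DATA WRAPPER excel
        System.out.println(resolve("hdfs", "excel")); // CREATE SERVER x2 TYPE hdfs FOREIGN DATA WRAPPER excel
        System.out.println(resolve(null, "hdfs"));    // CREATE SERVER x1 FOREIGN DATA WRAPPER hdfs
    }
}
```

With this shape the short form keeps working unchanged, and only definitions that need a non-default source have to spell out TYPE.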
Option 2: explicit and verbose expansion of alias names / connection factories, i.e. create excel-file, excel-ftp, excel-hdfs, avro-hdfs, etc.
Option 3: assume some kind of convention in the foreign data wrapper name.
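To make Option 3 concrete, here is a sketch of one possible convention, assuming names of the form `translator-source` (as in `excel-hdfs` or `file-hdfs`) and treating a bare name as a translator over a plain file source. The convention, the class name, and the `parse` method are all hypothetical; nothing here has been decided. Bare `ftp` and `hdfs` would still need separate handling under this rule, since neither is a translator.

```java
// Hypothetical sketch of an Option 3 naming convention; not an existing Teiid rule.
final class WrapperNameConventionSketch {

    /** Returns {translator, sourceType} for a foreign data wrapper name. */
    static String[] parse(String wrapperName) {
        int dash = wrapperName.indexOf('-');
        if (dash < 0) {
            // Assumed default: a bare name is a translator over a plain file source.
            return new String[] {wrapperName, "file"};
        }
        if (dash == 0 || dash == wrapperName.length() - 1) {
            throw new IllegalArgumentException("Malformed wrapper name: " + wrapperName);
        }
        // Split on the first dash: left side is the translator, right side the source.
        return new String[] {wrapperName.substring(0, dash), wrapperName.substring(dash + 1)};
    }

    public static void main(String[] args) {
        for (String name : new String[] {"excel", "excel-hdfs", "file-hdfs"}) {
            String[] parts = parse(name);
            System.out.println(name + " -> translator=" + parts[0] + ", source=" + parts[1]);
        }
    }
}
```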
> Create native connector to interact with HDFS as a datasource
> -------------------------------------------------------------
>
> Key: TEIIDSB-206
> URL: https://issues.redhat.com/browse/TEIIDSB-206
> Project: Teiid Spring Boot
> Issue Type: Feature Request
> Components: datasource
> Reporter: Van Halbert
> Assignee: Aditya Manglam Sharma
> Priority: Major
> Fix For: 1.6.0
>
> Original Estimate: 2 days, 4 hours
> Time Spent: 3 hours, 30 minutes
> Remaining Estimate: 2 days, 30 minutes
>
> Would like to have native connector
> To interact with HDFS as a datasource from a DV perspective there are at least two options:
> 1 - HttpFS
> 2 - WebHDFS
> Both options seem to be very slow in practice. I heard 2x slower.
> It would be nice if we can create a native HDFS connector.
--
This message was sent by Atlassian Jira
(v7.13.8#713008)