[JBoss JIRA] (TEIIDSB-206) Create native connector to interact with HDFS as a datasource
by Steven Hawkins (Jira)
[ https://issues.redhat.com/browse/TEIIDSB-206?page=com.atlassian.jira.plug... ]
Steven Hawkins commented on TEIIDSB-206:
----------------------------------------
> File, HDFS, and S3 are all becoming synonymous to me; I am a little confused to see the `file-hdfs` kind of naming.
They are all virtual file sources, but obviously hit different things. I know we talked about this in the past, but I deferred on a strategy.
To explain this more, what we currently have:
CREATE SERVER x1 FOREIGN DATA WRAPPER file; -- currently means file source with a file translator
CREATE SERVER x2 FOREIGN DATA WRAPPER excel; -- currently means file source with an excel translator
CREATE SERVER x3 FOREIGN DATA WRAPPER ftp; -- currently means an ftp source with an ftp translator, which is a bug because there is no ftp translator
What has just been added:
CREATE SERVER x1 FOREIGN DATA WRAPPER hdfs; -- currently means an hdfs source with an hdfs translator, which is a bug because there is no hdfs translator
So obviously the last two need to point to the file translator instead.
But the real issue is that this approach bakes a single translator into a given source name. The excel translator can work against file, ftp, hdfs, and s3 sources once s3 support is added.
> I see that if you are going to design `avro` as a language feature as described above, then we do not need this?
The rationale for doing the hdfs and s3 work first was actually so that avro and parquet support would make sense at a translator level.
> If not, we can see how we can abstract all these types under the `file` type and figure it out dynamically as you suggest.
Option 1: use the create server type information. You can default an alias to a given translator, but also specify it:
CREATE SERVER x2 FOREIGN DATA WRAPPER excel
could be fully expressed as: CREATE SERVER x2 TYPE file FOREIGN DATA WRAPPER excel
and would allow for the other combinations: CREATE SERVER x2 TYPE hdfs FOREIGN DATA WRAPPER excel
The only downside is explicitly introducing the source type construct.
Option 2: explicit and verbose expansion of alias names / connection factories: create excel-file, excel-ftp, excel-hdfs, avro-hdfs, etc.
Option 3: assume some kind of convention in the foreign data wrapper name.
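To make the trade-offs concrete, here is a hypothetical sketch of the same excel-over-hdfs combination under each option. The TYPE clause and the excel-hdfs wrapper name are assumptions for illustration, not existing constructs:

```sql
-- Option 1: the source type becomes explicit DDL metadata (proposed TYPE clause)
CREATE SERVER s1 TYPE hdfs FOREIGN DATA WRAPPER excel;

-- Option 2: a dedicated connection factory is registered up front for each
-- translator/source combination, so the name alone identifies both
CREATE SERVER s2 FOREIGN DATA WRAPPER "excel-hdfs";

-- Option 3: the same statement as option 2, but no factory named excel-hdfs
-- exists; the engine would split the wrapper name on '-' by convention
-- into translator (excel) and source type (hdfs)
CREATE SERVER s3 FOREIGN DATA WRAPPER "excel-hdfs";
```

Options 2 and 3 look identical in the DDL; they differ only in whether each combined name must be registered ahead of time or is interpreted dynamically.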
> Create native connector to interact with HDFS as a datasource
> -------------------------------------------------------------
>
> Key: TEIIDSB-206
> URL: https://issues.redhat.com/browse/TEIIDSB-206
> Project: Teiid Spring Boot
> Issue Type: Feature Request
> Components: datasource
> Reporter: Van Halbert
> Assignee: Aditya Manglam Sharma
> Priority: Major
> Fix For: 1.6.0
>
> Original Estimate: 2 days, 4 hours
> Time Spent: 3 hours, 30 minutes
> Remaining Estimate: 2 days, 30 minutes
>
> Would like to have a native connector.
> To interact with HDFS as a datasource from a DV perspective, there are at least two options:
> 1 - HttpFS
> 2 - WebHDFS
> Both options seem to be very slow in practice; I have heard 2x slower.
> It would be nice if we could create a native HDFS connector.
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (TEIIDSB-206) Create native connector to interact with HDFS as a datasource
by Ramesh Reddy (Jira)
[ https://issues.redhat.com/browse/TEIIDSB-206?page=com.atlassian.jira.plug... ]
Ramesh Reddy commented on TEIIDSB-206:
--------------------------------------
File, HDFS, and S3 are all becoming synonymous to me, so I am a little confused to see the `file-hdfs` kind of naming. I do not yet understand the file consumer side you mention, such as `avro`, as to how we are going to provide that. I understand that for `excel` we did a translator and tied it directly to `file`; are we thinking along similar lines with `avro`?
Creating avro-ftp, avro-hdfs, avro-file, avro-s3 style connection factories seems to be the easiest option, but verbose. I see that if you are going to design `avro` as a language feature as described above, then we do not need this? If not, we can see how we can abstract all these types under the `file` type and figure it out dynamically as you suggest.
[JBoss JIRA] (TEIID-5940) How to connect GCP bucket using Teiid
by nayan Bijagare (Jira)
[ https://issues.redhat.com/browse/TEIID-5940?page=com.atlassian.jira.plugi... ]
nayan Bijagare commented on TEIID-5940:
---------------------------------------
Hi [~shawkins]
Could you please provide the steps to read CSV file data with SQL queries from a GCP [bucket|https://cloud.google.com/storage/docs/key-terms#buckets] (in Google Cloud Storage) using Teiid?
My requirement is as follows:
1. There will be a CSV file in the GCP bucket.
2. I need to read this CSV file data using simple SQL queries, just like we read data over a JDBC connection.
I hope my concern is clear.
Could you please help me with it?
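For reference, the standard Teiid pattern for querying CSV content is the file translator's getTextFiles procedure combined with TEXTTABLE. The sketch below assumes a hypothetical source named gcsSource and hypothetical columns id and name; no GCS-specific source exists today, so the file would first need to be reachable by an existing file-type source:

```sql
-- getTextFiles and TEXTTABLE are the usual Teiid file-translator pattern
-- for CSV; the gcsSource name and the column list are assumptions
SELECT t.id, t.name
FROM (EXEC gcsSource.getTextFiles('data.csv')) f,
     TEXTTABLE(f.file COLUMNS id integer, name string HEADER) t;
```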
> How to connect GCP bucket using Teiid
> -------------------------------------
>
> Key: TEIID-5940
> URL: https://issues.redhat.com/browse/TEIID-5940
> Project: Teiid
> Issue Type: Feature Request
> Components: JDBC Connector
> Reporter: nayan Bijagare
> Assignee: Steven Hawkins
> Priority: Major
> Labels: GCP_BUC, jdbc-connector, team-service-2
> Original Estimate: 3 days
> Remaining Estimate: 3 days
>
> We need to read flat file or CSV tabular data using Teiid.
> We couldn't find any JDBC driver with which we could establish a connection to the GCP bucket.
> Could you please suggest a solution for connecting to a GCP bucket using Teiid?
[JBoss JIRA] (TEIID-3647) Create native connector to interact with HDFS as a datasource
by Steven Hawkins (Jira)
[ https://issues.redhat.com/browse/TEIID-3647?focusedWorklogId=12451562&pag... ]
Steven Hawkins logged work on TEIID-3647:
-----------------------------------------
Author: Steven Hawkins
Created on: 22/Jun/20 9:41 PM
Start Date: 22/Jun/20 9:40 PM
Worklog Time Spent: 1 hour
Issue Time Tracking
-------------------
Time Spent: 1 day, 3 hours, 30 minutes (was: 1 day, 2 hours, 30 minutes)
Worklog Id: (was: 12451562)
> Create native connector to interact with HDFS as a datasource
> -------------------------------------------------------------
>
> Key: TEIID-3647
> URL: https://issues.redhat.com/browse/TEIID-3647
> Project: Teiid
> Issue Type: Feature Request
> Components: Misc. Connectors
> Reporter: Van Halbert
> Assignee: Steven Hawkins
> Priority: Major
> Fix For: 15.0
>
> Original Estimate: 6 hours
> Time Spent: 1 day, 3 hours, 30 minutes
> Remaining Estimate: 0 minutes
>
> Would like to have a native connector.
> To interact with HDFS as a datasource from a DV perspective, there are at least two options:
> 1 - HttpFS
> 2 - WebHDFS
> Both options seem to be very slow in practice; I have heard 2x slower.
> It would be nice if we could create a native HDFS connector.
[JBoss JIRA] (TEIID-3647) Create native connector to interact with HDFS as a datasource
by Steven Hawkins (Jira)
[ https://issues.redhat.com/browse/TEIID-3647?page=com.atlassian.jira.plugi... ]
Steven Hawkins resolved TEIID-3647.
-----------------------------------
Resolution: Done
Adapted the initial implementation from TEIIDSB-206 and updated it with an integration test and a switch to the hadoop client jars to avoid a lot of the dependency management. This also required upgrading our Accumulo support.
[JBoss JIRA] (TEIIDSB-206) Create native connector to interact with HDFS as a datasource
by Steven Hawkins (Jira)
[ https://issues.redhat.com/browse/TEIIDSB-206?page=com.atlassian.jira.plug... ]
Steven Hawkins commented on TEIIDSB-206:
----------------------------------------
The PR for aligning these changes with the updates from Teiid has been opened.
[~rareddy] How do you want to handle operator support for this? Namely, we have excel and file sources that presume a file connection. Do you want to also add named excel-hdfs and file-hdfs sources (likewise, we are currently missing the ftp variants), or should we look to make things a bit more flexible / abstract and look toward using a convention, DDL, or datasource properties to know the source type? This will continue to fan out with more file sources, such as s3, and more file consumers, such as avro (that could mean creating avro-ftp, avro-hdfs, avro-file, avro-s3).