[teiid-issues] [JBoss JIRA] (TEIID-2884) Support for Amazon Elastic MapReduce

Mon Aug 3 05:30:03 EDT 2015

    [ https://issues.jboss.org/browse/TEIID-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094936#comment-13094936 ] 

Kylin Soong commented on TEIID-2884:
------------------------------------

Amazon EMR is a platform that integrates Hadoop ecosystem products into Amazon's cloud for big data processing, analysis, ETL, etc.

For Hadoop ecosystem products such as HBase, Hive, Spark and Impala we already have relevant translators, and Amazon EMR also supplies compatible drivers, downloadable images, documentation, etc.:
	http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/HiveJDBCDriver.html
	http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/impala-jdbc.html
I think the existing Teiid translators should work against these products in Amazon EMR (not tested so far).

Besides integrating Hadoop ecosystem products, Amazon EMR adds some further features. For example, EMR can use Amazon S3 to store input data, log files, and output data.

For S3, I wonder whether we need to develop a translator at all. What is mainly stored in S3 is large image files, video files and big text files, so there seems to be little value in developing a JDBC-based translator on top of it. Furthermore, IMO, the work of integrating S3 into enterprise applications belongs to an ESB, and there are already some implementations:

* Camel S3 Component (http://camel.apache.org/aws-s3.html)
* Mulesoft S3 Connector (https://www.mulesoft.org/connectors/amazon-simple-storage-service-s3-connector-3.5.0-Everest)
* RSSBus/CData S3 (https://www.rssbus.com/solutions/s3/)

Any ideas or advice on this issue?

> Support for Amazon Elastic MapReduce
> ------------------------------------
>
>                 Key: TEIID-2884
>                 URL: https://issues.jboss.org/browse/TEIID-2884
>             Project: Teiid
>          Issue Type: Feature Request
>          Components: Misc. Connectors
>            Reporter: Van Halbert
>            Assignee: Kylin Soong
>             Fix For: 8.12
>
>
> Amazon Elastic MapReduce
> from http://en.wikipedia.org/wiki/Amazon_Elastic_MapReduce#Amazon_Elastic_MapReduce
> Elastic MapReduce (EMR) was introduced by Amazon in April 2009. Provisioning of the Hadoop cluster, running and terminating jobs, and handling data transfer between EC2 and S3 are automated by Elastic MapReduce. Apache Hive, which is built on top of Hadoop for providing data warehouse services, is also offered in Elastic MapReduce.
> ...
> In June 2012, premium options for EMR were added that replace ordinary Hadoop with MapR's M3 and M5 versions. These options provide additional capabilities over and above what the default EMR offering provides.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)