[JBoss JIRA] (TEIID-3733) Add support for web scraping
by Steven Hawkins (JIRA)
[ https://issues.jboss.org/browse/TEIID-3733?page=com.atlassian.jira.plugin... ]
Steven Hawkins commented on TEIID-3733:
---------------------------------------
Note that the jsoup example extracts based upon the jsoup selector (http://jsoup.org/apidocs/org/jsoup/select/Selector.html), which is a CSS-like selector syntax. This is somewhat idiomatic to jsoup, and the results are simply the set of selected elements; component information such as inner_text, tag name, id, etc. is returned in the result. For any usage scenario more logic would be needed to transform the result, and this would not handle tabular data well. At best, assuming that you could somewhat easily identify a single HTML table to extract, you would read the rows, then for each row use the jsoup extraction again to extract the columns, and then a pivot would be needed. However, that may not work well in practice unless the table is regular; missing or spanning values would likely be an issue.
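To illustrate the pivot step and why irregular tables are awkward, here is a minimal sketch in plain Java. It assumes an earlier selector pass has already produced one flat record per table cell. The names TablePivot, Cell, and pivot are hypothetical and are not part of jsoup or Teiid. Missing cells are padded with null, and spanning cells are not handled at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TablePivot {
    /** Hypothetical flat record: one extracted cell plus its row and column position. */
    static final class Cell {
        final int row;
        final int col;
        final String text;

        Cell(int row, int col, String text) {
            this.row = row;
            this.col = col;
            this.text = text;
        }
    }

    /** Pivots flat cell records into rows, padding missing cells with null. */
    static List<List<String>> pivot(List<Cell> cells) {
        int width = 0;
        // TreeMap keeps rows and columns in ascending index order.
        Map<Integer, Map<Integer, String>> byRow = new TreeMap<>();
        for (Cell c : cells) {
            byRow.computeIfAbsent(c.row, r -> new TreeMap<>()).put(c.col, c.text);
            width = Math.max(width, c.col + 1);
        }
        List<List<String>> table = new ArrayList<>();
        for (Map<Integer, String> rowCells : byRow.values()) {
            List<String> row = new ArrayList<>();
            for (int col = 0; col < width; col++) {
                row.add(rowCells.get(col)); // null when the source table had no cell here
            }
            table.add(row);
        }
        return table;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
                new Cell(0, 0, "a"), new Cell(0, 1, "b"),
                new Cell(1, 1, "d")); // row 1 has no cell in column 0
        System.out.println(pivot(cells));
    }
}
```

Even in this small case the second row comes back with a null in its first column, which is the kind of gap a consumer of the pivoted result would have to deal with.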
> Add support for web scraping
> ----------------------------
>
> Key: TEIID-3733
> URL: https://issues.jboss.org/browse/TEIID-3733
> Project: Teiid
> Issue Type: Feature Request
> Components: Misc. Connectors
> Reporter: Van Halbert
> Assignee: Steven Hawkins
>
> Add support for web scraping.
> Here's one from CA using JSoup - https://github.com/rokhmanov/teiid-translators/blob/master/translator-scr...
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
9 years, 2 months
[JBoss JIRA] (TEIID-3734) Add data source support for reading DOC documents
by Van Halbert (JIRA)
[ https://issues.jboss.org/browse/TEIID-3734?page=com.atlassian.jira.plugin... ]
Van Halbert updated TEIID-3734:
-------------------------------
Description:
Add data source support for reading DOC documents.
The Apache POI could be an option, as it has support for OLE2 document format.
was:
Add data source support for reading POC documents.
The Apache POI could be an option, as it has support for OLE2 document format.
> Add data source support for reading DOC documents
> -------------------------------------------------
>
> Key: TEIID-3734
> URL: https://issues.jboss.org/browse/TEIID-3734
> Project: Teiid
> Issue Type: Feature Request
> Reporter: Van Halbert
> Assignee: Steven Hawkins
>
> Add data source support for reading DOC documents.
> The Apache POI could be an option, as it has support for OLE2 document format.
[JBoss JIRA] (TEIID-3581) Add a system function to return Teiid Server Name
by Steven Hawkins (JIRA)
[ https://issues.jboss.org/browse/TEIID-3581?page=com.atlassian.jira.plugin... ]
Steven Hawkins commented on TEIID-3581:
---------------------------------------
I don't think I was explicit enough. The main question was whether this seems worth a system function. If so, then for the embedded case, are we saying that we'll add an EmbeddedConfiguration field for the node id, or would we expect the system property to be set?
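To make the two embedded options concrete, here is a rough sketch of how a lookup might combine them. Everything here is an assumption for illustration: NodeNameResolver, resolve, and the property name "example.node.name" are hypothetical, and the choice to let an explicitly configured value take precedence over the system property is only one possible answer to the question above.

```java
import java.util.Optional;

class NodeNameResolver {
    /** Hypothetical property name; the actual property is not specified in this thread. */
    static final String NODE_PROPERTY = "example.node.name";

    /**
     * Returns the configured node id if one was supplied (for example from an
     * embedded configuration field), otherwise falls back to the system property.
     */
    static Optional<String> resolve(String configuredNodeId) {
        if (configuredNodeId != null && !configuredNodeId.isEmpty()) {
            return Optional.of(configuredNodeId);
        }
        return Optional.ofNullable(System.getProperty(NODE_PROPERTY))
                .filter(name -> !name.isEmpty());
    }

    public static void main(String[] args) {
        // Prints the configured value, then whatever the system property holds (if anything).
        System.out.println(resolve("node-a").orElse("<unknown>"));
        System.out.println(resolve(null).orElse("<unknown>"));
    }
}
```

Returning an Optional leaves it to the caller, such as a system function, to decide what to report when neither source is set.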
> Add a system function to return Teiid Server Name
> -------------------------------------------------
>
> Key: TEIID-3581
> URL: https://issues.jboss.org/browse/TEIID-3581
> Project: Teiid
> Issue Type: Enhancement
> Components: Server
> Reporter: Van Halbert
> Fix For: 8.12.1
>
>
> A user has a 4-node cluster and would like to get not only the sessionid, vdb and user but also the node of the cluster that the user is connected to.