[
https://issues.jboss.org/browse/TEIID-3733?page=com.atlassian.jira.plugin...
]
Steven Hawkins commented on TEIID-3733:
---------------------------------------
Note that the jsoup example extracts elements based upon a jsoup selector -
http://jsoup.org/apidocs/org/jsoup/select/Selector.html - which is a CSS-like selector
syntax. This is somewhat specific to jsoup, and the result is simply the set of
selected elements, with component information such as inner_text, tag name, id, etc.
returned for each element. For any real usage scenario more logic would be needed to
transform the result, and this would not handle tabular data well. At best, assuming that
you could fairly easily identify a single HTML table to extract, you would read the rows,
then for each row use the jsoup extraction again to extract the columns, and then a pivot
would be needed. However, that may not work well in practice unless the table is regular;
missing or spanning values would likely be an issue.
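
To make the rows-then-columns concern concrete, here is a minimal sketch of that two-step extraction. It is not the linked translator's code: it uses the JDK's built-in XML parser on a well-formed fragment so that it runs without extra dependencies, and the class name TableRows and its extract helper are hypothetical. With jsoup the analogous steps would be doc.select("tr") followed by row.select("td") on each row.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TableRows {
    // Hypothetical helper: returns one list of cell texts per <tr>.
    // Assumes the input is well-formed XML, which real-world HTML often is not.
    static List<List<String>> extract(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        List<List<String>> rows = new ArrayList<>();
        NodeList trs = doc.getElementsByTagName("tr");      // step 1: select the rows
        for (int i = 0; i < trs.getLength(); i++) {
            Element tr = (Element) trs.item(i);
            NodeList tds = tr.getElementsByTagName("td");   // step 2: select the cells in this row
            List<String> cells = new ArrayList<>();
            for (int j = 0; j < tds.getLength(); j++) {
                cells.add(tds.item(j).getTextContent().trim());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // The second row spans two columns, so it yields fewer cells than the first.
        String table = "<table>"
                + "<tr><td>1</td><td>alpha</td><td>x</td></tr>"
                + "<tr><td>2</td><td colspan=\"2\">beta</td></tr>"
                + "</table>";
        for (List<String> row : extract(table)) {
            System.out.println(row.size() + " cells: " + row);
        }
    }
}
```

The two rows come back with different cell counts, so a positional mapping of cells to columns breaks as soon as a cell spans or is missing. Any pivot step would need to account for that explicitly.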
Add support for web scraping
----------------------------
Key: TEIID-3733
URL: https://issues.jboss.org/browse/TEIID-3733
Project: Teiid
Issue Type: Feature Request
Components: Misc. Connectors
Reporter: Van Halbert
Assignee: Steven Hawkins
Add support for web scraping.
Here's one from CA using JSoup -
https://github.com/rokhmanov/teiid-translators/blob/master/translator-scr...