[JBoss JIRA] (TEIID-3733) Add support for web scraping

Wednesday, 30 September 2015

    [ https://issues.jboss.org/browse/TEIID-3733?page=com.atlassian.jira.plugin... ]

Steven Hawkins commented on TEIID-3733:
---------------------------------------

Note that the jsoup example extracts based upon the jsoup selector -
http://jsoup.org/apidocs/org/jsoup/select/Selector.html - which is a CSS-like selector
syntax.  This is somewhat specific to jsoup, and the results are simply the set of
selected elements, with component information such as inner_text, tag name, id, etc.
returned in the result.  For any usage scenario more logic would be needed to transform
the result, and this would not handle tabular data well.  At best, assuming that you could
somewhat easily identify a single html table to extract, you would read the rows, then for
each row use the jsoup extraction again to extract the columns, and then a pivot would be
needed.  However, that may not work well in practice unless the table is regular; missing
or spanning values would likely be an issue.
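
To make the pivot step concrete, here is a minimal sketch in plain Java.  It assumes
the cells have already been extracted (for example by the per-row selection described
above) into flat (row, column, text) records.  The names CellPivot, Cell and pivot are
hypothetical, for illustration only, and are not part of Teiid, jsoup or the linked
translator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only: not part of Teiid or jsoup.
class CellPivot {

    // One extracted cell: its row index, column index and inner text.
    static final class Cell {
        final int row;
        final int col;
        final String text;

        Cell(int row, int col, String text) {
            this.row = row;
            this.col = col;
            this.text = text;
        }
    }

    // Pivots flat cell records into one String[] per table row, ordered by row index.
    // A position with no extracted cell stays null, which is exactly where an
    // irregular table (missing or spanning values) shows up in the result.
    static List<String[]> pivot(List<Cell> cells, int columnCount) {
        Map<Integer, String[]> byRow = new TreeMap<>();
        for (Cell c : cells) {
            if (c.col < 0 || c.col >= columnCount) {
                continue; // ignore cells outside the expected table width
            }
            String[] values = byRow.computeIfAbsent(c.row, k -> new String[columnCount]);
            values[c.col] = c.text;
        }
        return new ArrayList<>(byRow.values());
    }
}
```

Note that the sketch does not try to interpret colspan or rowspan.  A spanning cell
simply leaves its neighbouring positions null, so any real usage would still need
logic to decide how such gaps should be filled.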

...
 Add support for web scraping
 ----------------------------

                 Key: TEIID-3733
                 URL: https://issues.jboss.org/browse/TEIID-3733
             Project: Teiid
          Issue Type: Feature Request
          Components: Misc. Connectors
            Reporter: Van Halbert
            Assignee: Steven Hawkins

 Add support for web scraping.
 Here's one from CA using JSoup -
https://github.com/rokhmanov/teiid-translators/blob/master/translator-scr...

--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
