[teiid-dev] XML as a source

Steven Hawkins shawkins at redhat.com
Thu Sep 24 16:47:44 EDT 2009


John,

Option 2 is essentially what the xml connector does - joining information from a parent to a child element would cause two passes over the response document, correct?  Option 3 is more work and would be less expressive that option 2 given that it's using sql constructs and not xpath.  

The next question is what would document retrieval look like?  

----- Original Message -----
From: "John Doyle" <jdoyle at redhat.com>
To: "Steven Hawkins" <shawkins at redhat.com>
Cc: "teiid-dev" <teiid-dev at lists.jboss.org>, "Ramesh Reddy" <rareddy at redhat.com>
Sent: Thursday, September 17, 2009 3:52:14 PM GMT -06:00 US/Canada Central
Subject: Re: [teiid-dev] XML as a source

Option 2 seems like an attractive solution, I'm of the opinion that it should be as easy as possible to create something that looks relational.  I assume option 3 would me much more work than option 2.

Thanks for the additional detail.

~jd


----- "Steven Hawkins" <shawkins at redhat.com> wrote:

> Option 1 : don't leave the xml world
> 
> i.e. extract(xml, '/po', '/item/quantity/text() > 10',
> '/number/text()' as Number integer)
> 
> Here the second argument specifies the nodes to consider, the third
> argument is the node test, and the remainder perform the typed scalar
> extractions.  That's probably not very satisfactory for ad hoc
> querying, but it's very clear how it operates and what it should
> return. 
> 
> Option 2 : emulate the xml relational logic of adding synthetic
> primary keys
> 
> i.e. select PO.Number from (extract(xml, '/po', 'position()' as key,
> '/number/text()' as Number integer)) AS PO,
> (extract(xml, '/po/item', '../position()' as ParentKey,
> '/quantity/text()' as Number integer)) AS Item where PO.key =
> Item.ParentKey and Item.Quantity > 10
> 
> I'm omitting the node test parameter shown in option 1.  The downside
> here is that you will perform multiple extractions and then join the
> results rather than option 1, which would perform everything in one
> step.  This form does however make it a bit simpler to create
> procedures and the corresponding proc relational queries that map to
> something that looks relational.  For example there would be a PO
> stored procedure that takes a doc as a parameter and returns the PO
> rowset, the procedure logic would be nothing more than extract(xmlin,
> '/po', 'position()' as key, '/number/text()' as Number integer);. 
> There would also be one for Item.  Then I could write a query like:
> 
> select PO.Number from PO, Item where PO.xmlin = xml and Item.xmlin =
> xml and PO.key = Item.ParentKey and Item.Quantity > 10
> 
> Option 3 : natively understand hierarchical xml structures similar to
> oql/hsql and our document model queries, in which the join between
> parent and child is implicit.
> 
> i.e. Select PO.Number from xml where PO.Item.Quantity > 10
> 
> however our xml document model query syntax in this case would project
> a document that only contains the Number element, not just the scalar
> number value in a rowset.  We would also have to know the document
> model / schema used in the from clause - not just that it's typed as
> XML.
> 
> ----- Original Message -----
> From: "John Doyle" <jdoyle at redhat.com>
> To: "Ramesh Reddy" <rareddy at redhat.com>
> Cc: "teiid-dev" <teiid-dev at lists.jboss.org>
> Sent: Thursday, September 17, 2009 1:49:43 PM GMT -06:00 US/Canada
> Central
> Subject: Re: [teiid-dev] XML as a source
> 
> I'm not really arguing against XPath as the syntax to express the
> mapping, it's a good solution.  I'm describing a higher level problem
> that I don't see the solution for.  
> 
> Lets say we have a virtual view we have created for a purchase order
> document and we have defined three tables with the proposed new
> functions: PO, Address, and Item.  POs have both BillTo and ShipTo
> Addresses as well as a list of Items.  Now we query this view with the
> following:
> 
> Select PO.Number from PO, Item where Item.Quantity > 10;
> 
> If our virtual tables defined by our functions are just returning row
> sets, how does this join get resolved?  How does the engine know?  In
> the XML-Relational implementation we create keyed relations in the
> model and then insert keys into the document as it is parsed so that
> we engine can resolve the join across the result sets we produce.
> 
> ~jd
> 
> ----- "Ramesh Reddy" <rareddy at redhat.com> wrote:
> 
> > I see that in the current source model's table and column property
> > "NameInSource" already contains a xpath expression to get to column
> > information from a given XML document. How is proposed function
> based
> > model is different, than just making a what is source model now to
> a
> > virtual model in future. However, this forces the XML --> Table
> > mappings
> > will be done at the engine level, thus pushes relational issues to
> > engine. Am I interpreting the scenario correctly?
> > 
> > On Thu, 2009-09-17 at 12:05 -0400, John Doyle wrote:
> > > I wanted to pick up the discussion started in TEIID_817 around
> > improvements/alternatives to the current implementation of the XML
> > Connectors (https://jira.jboss.org/jira/browse/TEIID-817). 
> > Specifically I want to get a better understanding of the idea Steve
> > presented.
> > > ___
> > > ...
> > > The alternative we've talked about before is to treat the source
> > access and table extraction as primary and separate functions of
> > Teiid. Executing a web service could have a procedure like xml
> > getDoc(xml param) - which is essentially the approach of the XML
> > Source logic. On top of that there can be helper SQL/XML like
> > functions to map a rowset to xml, such as xmlforest, and the result
> > xml to "tables" - something like extract(xml param, xpath as col1,
> > xpath as col2....). At the very least adding a rowset or table
> type.
> > > __
> > > 
> > > 
> > > How do you envision exposing the functions, specifically
> functions
> > that turn XML into a rowset or table, to users?  Would they be
> > available in Transformations that define virtual tables?  This
> would
> > retain the central value of the XML-Relational connectors, and
> remove
> > much of the complexity of how it does it.
> > > 
> > > One of the challenges in this is that we need to be able to
> extract
> > multiple virtual tables from a single XML document, and retain
> > (create) the relationships between them.  For instance, if we
> extract
> > PO, Address and Item tables from a Purchase Order xml document, we
> > need to be able to map the item to the purchase order that it came
> > from.  We do this currently by adding id's to the xml document and
> > exposing them in the model we generate.
> > > 
> > > One thing I'll note is that passing around XML has it's limits. 
> > Some of our customers are handling very large XML.
> > > 
> > > ~jd
> > 
> > _______________________________________________
> > teiid-dev mailing list
> > teiid-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/teiid-dev
> _______________________________________________
> teiid-dev mailing list
> teiid-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/teiid-dev


More information about the teiid-dev mailing list