[JBoss JIRA] (TEIID-2636) Stream corruption errors when doing big query
by Steven Hawkins (JIRA)
[ https://issues.jboss.org/browse/TEIID-2636?page=com.atlassian.jira.plugin... ]
Steven Hawkins updated TEIID-2636:
----------------------------------
Fix Version/s: 8.4.1
8.5
Affects Version/s: 8.4
(was: 8.5)
This is happening when intermediate batches for the TEXTAGG are read from the memory/storage layer. This was effectively introduced by https://issues.jboss.org/browse/TEIID-2159
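For context, a minimal sketch of the failing pattern - any aggregation whose TEXTAGG intermediate batches are large enough to spill out of memory and be read back can hit this path (the table and column names below are illustrative, not from the reporter's vdb):
{code}
-- hypothetical source table; assumes enough rows/width that the
-- intermediate TEXTAGG batches are written to and read back from storage
SELECT t.bug_id, TEXTAGG(FOR(t."value")) AS "value"
FROM some_source.big_table t
GROUP BY t.bug_id
{code}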
> Stream corruption errors when doing big query
> ---------------------------------------------
>
> Key: TEIID-2636
> URL: https://issues.jboss.org/browse/TEIID-2636
> Project: Teiid
> Issue Type: Bug
> Affects Versions: 8.4
> Environment: Teiid 8.5 Beta2 Running in JBoss EAP 6.1 on RHEL6
> Java 7 openjdk
> Reporter: Graeme Gillies
> Assignee: Steven Hawkins
> Fix For: 8.4.1, 8.5
>
> Attachments: server.log
>
>
> Hi,
> When performing the following big query against a virtual db we have set up in this environment, we get an error.
> The query is:
> {noformat}
> SELECT
> bugs.bug_id, classification.name, bugs.cf_internal_whiteboard,
> dev_cond_nak_grouped."value", partner_grouped."value",
> bugs.cf_last_closed, bugs.bug_severity, bugs.cf_qa_whiteboard,
> bugs.short_desc, qe_cond_nak_grouped."value", bugs.priority,
> bugs.version, bugs.cf_pm_score, bugs.bug_status, product.name,
> blocks_grouped.blocked, qa_contact.login_name, reporter.login_name,
> component.name, flag_grouped.flag_full, bugs.delta_ts,
> dependson_grouped.dependson, verified_grouped."value",
> bugs.creation_ts, bugs.cf_devel_whiteboard, keyword_grouped.name,
> target_release_grouped."value", bugs.target_milestone, assigned_to.login_name,
> bugs.resolution
> FROM Bugzilla_raw.bugs
> LEFT JOIN Bugzilla_raw.products ON products.id = bugs.product_id
> LEFT JOIN Bugzilla_raw.classifications classification ON classification.id = products.classification_id
> LEFT JOIN (SELECT bugs.bug_id, TEXTAGG(FOR(dev_cond_nak."value")) AS "value" FROM Bugzilla_raw.bugs JOIN Bugzilla_raw.bug_cf_conditional_nak dev_cond_nak ON dev_cond_nak.bug_id = bugs.bug_id GROUP BY bugs.bug_id) dev_cond_nak_grouped ON bugs.bug_id = dev_cond_nak_grouped.bug_id
> LEFT JOIN (SELECT bugs.bug_id, TEXTAGG(FOR(partner."value")) AS "value" FROM Bugzilla_raw.bugs JOIN Bugzilla_raw.bug_cf_partner partner ON partner.bug_id = bugs.bug_id GROUP BY bugs.bug_id) partner_grouped ON bugs.bug_id = partner_grouped.bug_id
> LEFT JOIN (SELECT bugs.bug_id, TEXTAGG(FOR(qe_cond_nak."value")) AS "value" FROM Bugzilla_raw.bugs JOIN Bugzilla_raw.bug_cf_qe_conditional_nak qe_cond_nak ON qe_cond_nak.bug_id = bugs.bug_id GROUP BY bugs.bug_id) qe_cond_nak_grouped ON bugs.bug_id = qe_cond_nak_grouped.bug_id
> LEFT JOIN Bugzilla_raw.products product ON product.id = bugs.product_id LEFT JOIN (SELECT bugs.bug_id, TEXTAGG(FOR(blocks.blocked)) AS blocked FROM Bugzilla_raw.bugs JOIN Bugzilla_raw.dependencies blocks ON blocks.dependson = bugs.bug_id GROUP BY bugs.bug_id) blocks_grouped ON bugs.bug_id = blocks_grouped.bug_id
> LEFT JOIN Bugzilla_raw.profiles qa_contact ON qa_contact.userid = bugs.qa_contact LEFT JOIN Bugzilla_raw.profiles reporter ON reporter.userid = bugs.reporter LEFT JOIN Bugzilla_raw.components component ON component.id = bugs.component_id LEFT JOIN ( SELECT flags.bug_id, TEXTAGG(FOR(concat(ft.name, flags.status))) AS flag_full FROM Bugzilla_raw.flags
> LEFT JOIN Bugzilla_raw.flagtypes ft ON ft.id = flags.type_id GROUP BY flags.bug_id) flag_grouped ON bugs.bug_id = flag_grouped.bug_id
> LEFT JOIN (SELECT bugs.bug_id, TEXTAGG(FOR(dependson.dependson)) AS dependson FROM Bugzilla_raw.bugs JOIN Bugzilla_raw.dependencies dependson ON dependson.blocked = bugs.bug_id GROUP BY bugs.bug_id) dependson_grouped ON bugs.bug_id = dependson_grouped.bug_id
> LEFT JOIN (SELECT bugs.bug_id, TEXTAGG(FOR(verified."value")) AS "value" FROM Bugzilla_raw.bugs JOIN Bugzilla_raw.bug_cf_verified verified ON verified.bug_id = bugs.bug_id GROUP BY bugs.bug_id) verified_grouped ON bugs.bug_id = verified_grouped.bug_id
> LEFT JOIN ( SELECT keywords.bug_id, TEXTAGG(FOR(kw.name)) AS name FROM Bugzilla_raw.keywords LEFT JOIN Bugzilla_raw.keyworddefs kw ON kw.id = keywords.keywordid GROUP BY keywords.bug_id) keyword_grouped ON bugs.bug_id = keyword_grouped.bug_id
> LEFT JOIN (SELECT bugs.bug_id, TEXTAGG(FOR(target_release."value")) AS "value" FROM Bugzilla_raw.bugs JOIN Bugzilla_raw.bugs_release target_release ON target_release.bug_id = bugs.bug_id GROUP BY bugs.bug_id) target_release_grouped ON bugs.bug_id = target_release_grouped.bug_id
> LEFT JOIN Bugzilla_raw.profiles assigned_to ON assigned_to.userid = bugs.assigned_to
> WHERE bugs.bug_id > 990410 OR bugs.delta_ts > parseTimestamp('2013-07-31 06:57:31 +0000', 'yyyy-MM-dd HH:mm:ss z')
> LIMIT 0,1000000
> {noformat}
> The error the client sees is:
> ERROR: TEIID30048 Error reading 24,668
> DETAIL: org.teiid.jdbc.TeiidSQLException: TEIID30048 Error reading 24,668
> This happens using either the PostgreSQL ODBC interface or the JDBC interface.
> The errors in the log file are quite long, so I will attach the whole log file.
> Regards,
> Graeme
[JBoss JIRA] (TEIID-2638) CLI Scripts are incorrectly referencing the cache within the cache container, should reference cache container
by Van Halbert (JIRA)
[ https://issues.jboss.org/browse/TEIID-2638?page=com.atlassian.jira.plugin... ]
Work on TEIID-2638 started by Van Halbert.
> CLI Scripts are incorrectly referencing the cache within the cache container, should reference cache container
> --------------------------------------------------------------------------------------------------------------
>
> Key: TEIID-2638
> URL: https://issues.jboss.org/browse/TEIID-2638
> Project: Teiid
> Issue Type: Bug
> Components: Build/Kits
> Affects Versions: 8.4, 8.4.1
> Reporter: Van Halbert
> Assignee: Van Halbert
>
> The domain and standalone cli scripts need to have the following lines corrected:
> /profile=ha/subsystem=teiid:add(async-thread-pool=teiid-async, distributed-cache-jgroups-stack=udp, resultset-cache-infinispan-container=resultset-repl, preparedplan-cache-infinispan-container=preparedplan, policy-decider-module=org.jboss.teiid)
> /subsystem=teiid:add(async-thread-pool=teiid-async, resultset-cache-infinispan-container=resultset, preparedplan-cache-infinispan-container=preparedplan, policy-decider-module=org.jboss.teiid)
> so that the cache container attributes map to teiid-cache, and not to the cache within the container. This isn't seen in the standalone-teiid.xml, because those settings do point to the "teiid" cache container.
[JBoss JIRA] (TEIID-2638) CLI Scripts are incorrectly referencing the cache within the cache container, should reference cache container
by Van Halbert (JIRA)
Van Halbert created TEIID-2638:
----------------------------------
Summary: CLI Scripts are incorrectly referencing the cache within the cache container, should reference cache container
Key: TEIID-2638
URL: https://issues.jboss.org/browse/TEIID-2638
Project: Teiid
Issue Type: Bug
Components: Build/Kits
Affects Versions: 8.4, 8.4.1
Reporter: Van Halbert
Assignee: Van Halbert
The domain and standalone cli scripts need to have the following lines corrected:
/profile=ha/subsystem=teiid:add(async-thread-pool=teiid-async, distributed-cache-jgroups-stack=udp, resultset-cache-infinispan-container=resultset-repl, preparedplan-cache-infinispan-container=preparedplan, policy-decider-module=org.jboss.teiid)
/subsystem=teiid:add(async-thread-pool=teiid-async, resultset-cache-infinispan-container=resultset, preparedplan-cache-infinispan-container=preparedplan, policy-decider-module=org.jboss.teiid)
so that the cache container attributes map to teiid-cache, and not to the cache within the container. This isn't seen in the standalone-teiid.xml, because those settings do point to the "teiid" cache container.
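For illustration, here is how the corrected operations might read, with the container attributes pointing at the teiid-cache container as described above (the exact container name should be taken from the shipped configuration):
{code}
/profile=ha/subsystem=teiid:add(async-thread-pool=teiid-async, distributed-cache-jgroups-stack=udp, resultset-cache-infinispan-container=teiid-cache, preparedplan-cache-infinispan-container=teiid-cache, policy-decider-module=org.jboss.teiid)
/subsystem=teiid:add(async-thread-pool=teiid-async, resultset-cache-infinispan-container=teiid-cache, preparedplan-cache-infinispan-container=teiid-cache, policy-decider-module=org.jboss.teiid)
{code}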
[JBoss JIRA] (TEIID-2584) Add management features to materialization
by Ramesh Reddy (JIRA)
[ https://issues.jboss.org/browse/TEIID-2584?page=com.atlassian.jira.plugin... ]
Ramesh Reddy commented on TEIID-2584:
-------------------------------------
Will make suggested changes. Also will add exception handling for insert call.
>And in theory the management of the materialized tables can be quite complicated. We have logic in the internal materialization stuff about sharing imported matviews; something like that may be needed here as well (although that complicates the notion of which vdb/version a materialization table is from). Also, people may want materialization tables to be shared among vdb versions and selectively destroyed with a new deployment.
For this, how about adding a metadata flag on the view that can specify the sharing scope? Valid values could be {NONE, VDB, SCHEMA}, where
NONE = no sharing (default)
VDB = sharing across all versions of the vdb
SCHEMA = sharing across multiple VDBs, if the SCHEMA names match
I would have to change the selection criteria on the status table based on this flag; that seems like it should work.
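For illustration, a hypothetical DDL sketch of the proposed flag (the property name and the target table option are assumptions, nothing is settled here):
{code}
CREATE VIEW bug_summary (
    bug_id integer,
    short_desc string
) OPTIONS (
    MATERIALIZED 'TRUE',
    MATERIALIZED_TABLE 'mat.BUG_SUMMARY', /* external target table, assumed */
    "teiid_rel:MATVIEW_SHARE_SCOPE" 'VDB' /* proposed flag: NONE | VDB | SCHEMA */
) AS
    SELECT bugs.bug_id, bugs.short_desc FROM Bugzilla_raw.bugs;
{code}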
>What would a partial load script do?
It is another script, provided by the user, to incrementally load matview updates, like the load script which does the full load.
I will read the PDF now.
> Add management features to materialization
> ------------------------------------------
>
> Key: TEIID-2584
> URL: https://issues.jboss.org/browse/TEIID-2584
> Project: Teiid
> Issue Type: Feature Request
> Reporter: Ramesh Reddy
> Assignee: Ramesh Reddy
> Labels: Beta3
> Fix For: 8.5
>
>
> Currently Teiid supports internal and external materialization features. The internal is managed completely by the Teiid query engine along with the infinispan cache.
> External materialization is completely unmanaged and left to the user to manage externally. The goals for this feature are to unify the materialization logic for internal and external materialization, such that both are managed similarly irrespective of the type of materialization chosen.
[JBoss JIRA] (TEIID-2475) The RelationalNode.collectNodeStats is only subtracting out the last node
by Steven Hawkins (JIRA)
[ https://issues.jboss.org/browse/TEIID-2475?page=com.atlassian.jira.plugin... ]
Steven Hawkins reopened TEIID-2475:
-----------------------------------
The stat logic was actually a little more convoluted than first thought, as the process time includes the total wall clock time and thus may result in negative values for unions (which is why the workaround was added to begin with). In general, the computation of the current notion of the node's "process time" needs to be tweaked.
> The RelationalNode.collectNodeStats is only subtracting out the last node
> -------------------------------------------------------------------------
>
> Key: TEIID-2475
> URL: https://issues.jboss.org/browse/TEIID-2475
> Project: Teiid
> Issue Type: Bug
> Components: Query Engine
> Affects Versions: 6.0.0
> Reporter: Debbie Steigner
> Assignee: Steven Hawkins
> Fix For: 8.4, 7.7.8
>
> Attachments: Query_Plan.txt
>
>
> The "Node Cumulative Next Batch Process Time" Statistic
> If you have a look at the explain plan (attached). Following the definition,
> the JoinNode Node Cumulative Next Batch Process Time: 565471. So, it is the summation of AccessNode (Node Process Time: 892860) + DependentAccessNode (Node Process Time: 439) + JoinNode (Node Process Time: 2203120). The summation does not agree.
> The RelationalNode.collectNodeStats looks wrong (it's effectively only subtracting out the last node)
[JBoss JIRA] (TEIID-2584) Add management features to materialization
by Steven Hawkins (JIRA)
[ https://issues.jboss.org/browse/TEIID-2584?page=com.atlassian.jira.plugin... ]
Steven Hawkins commented on TEIID-2584:
---------------------------------------
> I really do not need those properties for this feature.
If they're not needed, I would vote for keeping them out for now.
> From the privacy aspect, would we be better off tying into some role to access any system tables in general?
SYS is considered public regardless of whether roles are enabled; SYSADMIN is not. So any system information that may have a security concern should go in SYSADMIN instead - for example, the procedure to grab vdb contents, which can include the raw xmi files, is in SYSADMIN.
> It is returning the number of rows loaded in that "load" call, not the status of the call.
I just mean that you can easily be more informational, so that a call to load can give someone the status (e.g. -1 for loading, -2 for loaded) rather than just returning 0. Also, 0 is a valid number of rows loaded for an empty materialized view, so you wouldn't know what the call actually did.
> I made the status table's "valid" column not null; since it does a null check, it will never get into the "insert" if there is an entry already.
Yes, but simultaneous load calls will both attempt to do an insert.
> I did not choose VDB startup time to avoid adding another procedure
Yes, I suppose that just adds another complication. And in theory the management of the materialized tables can be quite complicated. We have logic in the internal materialization stuff about sharing imported matviews; something like that may be needed here as well (although that complicates the notion of which vdb/version a materialization table is from). Also, people may want materialization tables to be shared among vdb versions and selectively destroyed with a new deployment.
> Thought about it, but was not sure if the current model of vdb.xml is a good vehicle for it. Also, we need to worry about distinguishing between vdb reloads and first-time loads to expose this as a general-purpose feature. Could be worked as a separate issue.
It seems odd, though, to have a hook that is on a public api but not expose it. If it's not intended for general use or has issues, then we need to document what those are.
> How about this proposal? Add two additional metadata properties on View
I'm not sure that will be sufficient. What would a partial load script do?
See http://wiki.postgresql.org/wiki/Materialized_Views, especially the pdf link on http://www.pgcon.org/2008/schedule/events/69.en.html, covering the use of a dirty column flag.
We also do capture the update events going through Teiid, but not at a row level (since we don't try to determine the full effect of the source update). That could be done through a source FOR EACH ROW trigger if we want to go that route (at the cost of performance).
Another possibility is to expand the semantics of the nocache hint (or introduce something new) such that it instructs the engine to refresh just a range of the materialization table - rather than the simplistic approach of the internal logic, which just updates 1 row at a time.
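For reference, the current hint applies to whole views; a minimal sketch of existing usage (the view name is assumed), with the range-scoped refresh left as a comment since no syntax was proposed:
{code}
-- existing semantics: bypass/refresh the entire materialized view
SELECT * FROM bug_summary OPTION NOCACHE bug_summary
-- a range-scoped refresh as floated above would need new syntax, e.g. limiting
-- the nocache effect to rows matching bug_id > 990410 (hypothetical)
{code}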
> Add management features to materialization
> ------------------------------------------
>
> Key: TEIID-2584
> URL: https://issues.jboss.org/browse/TEIID-2584
> Project: Teiid
> Issue Type: Feature Request
> Reporter: Ramesh Reddy
> Assignee: Ramesh Reddy
> Labels: Beta3
> Fix For: 8.5
>
>
> Currently Teiid supports internal and external materialization features. The internal is managed completely by the Teiid query engine along with the infinispan cache.
> External materialization is completely unmanaged and left to the user to manage externally. The goals for this feature are to unify the materialization logic for internal and external materialization, such that both are managed similarly irrespective of the type of materialization chosen.
[JBoss JIRA] (TEIID-2584) Add management features to materialization
by Ramesh Reddy (JIRA)
[ https://issues.jboss.org/browse/TEIID-2584?page=com.atlassian.jira.plugin... ]
Ramesh Reddy commented on TEIID-2584:
-------------------------------------
> I think we'll want to introduce a new system table to expose the transformation sql rather than going through the properties table. We can use a SYSADMIN table so as not to inadvertently expose transformation logic that may be intended to be more private, and we can ensure that we aren't colliding with any existing property key.
We can add the "Teiid relational URI" namespace prefix to avoid any namespace collisions. I really do not need those properties for this feature; as I was adding the Materialization table, I added the others for completeness. From the privacy aspect, would we be better off tying into some role to access any system tables in general?
> With the load procedure shouldn't we return a negative status code to indicate loading/loaded rather than 0?
It is returning the number of rows loaded in that "load" call, not the status of the call.
>Should there be exception handling for inserting a status entry? Alternatively could this be done at vdb startup time?
I made the status table's "valid" column not null; since it does a null check, it will never get into the "insert" if there is an entry already. But I see your point: if the user did not create the table as specified, then that could be an issue. Maybe I should change it to one of the PK columns. I did not choose VDB startup time to avoid adding another procedure, as I would need to do an "if" check.
>How would a long load or an unexpectedly failed load be handled - in the former case it seems like it's possible to have concurrent loads and in the latter it seems like some manual process would be needed to clear the loading status.
The "load" procedure is designed such that, (see begin atomic) if any process is already "loading" the matview table based on "status" table, they simply exit the load procedure. But all schedulers at cluster nodes compete to load. Right now there is no-fairness algorithm as to who gets to load. "failed-load" is same way, once the load has failed, the scheduler should try back in ttl/4 or 1 minute again to load. I was thinking with "begin atomic" concurrent loads are not possible, but I could add version based optimistic locking. I am expecting there should not be any manual clearing, that would not be ideal.
> Should the start/shutdown hooks also be definable via the vdb.xml?
Thought about it, but was not sure if the current model of vdb.xml is a good vehicle for it. Also, we need to worry about distinguishing between vdb reloads and first-time loads to expose this as a general-purpose feature. Could be worked as a separate issue.
> From above #5 - How does this work in a cluster?
All nodes compete to load, but the first one that grabs the status gets to load, assuming the status table is common among the nodes.
> What tie-ins are you thinking of exposing for incremental updates?
How about this proposal? Add two additional metadata properties on the View:
{code}
teiid_rel:ALLOW_MATVIEW_PARTIAL_LOADS = allow partial loads on the matview table
teiid_rel:ALLOW_MATVIEW_PARTIAL_LOAD_SCRIPT = script to load partial table
{code}
Also, I will add a "LoadCount" column (which could double as a version for optimistic locking) to the status table. Then modify the load procedure such that, when "LoadCount > 1" and partial loads are allowed, instead of executing
{code}
beforeLoadScript
loadScript
afterLoadScript
{code}
will execute
{code}
partialLoadScript
{code}
It is then up to the user to provide the script needed for this partial-load callback.
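For illustration, the branch inside the load procedure might look like the following Teiid procedure-language sketch (the variable names come from the proposal above; the exact wiring is assumed):
{code}
IF (loadCount > 1 AND allowPartialLoads)
BEGIN
    EXECUTE IMMEDIATE partialLoadScript;
END
ELSE
BEGIN
    EXECUTE IMMEDIATE beforeLoadScript;
    EXECUTE IMMEDIATE loadScript;
    EXECUTE IMMEDIATE afterLoadScript;
END
{code}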
> Add management features to materialization
> ------------------------------------------
>
> Key: TEIID-2584
> URL: https://issues.jboss.org/browse/TEIID-2584
> Project: Teiid
> Issue Type: Feature Request
> Reporter: Ramesh Reddy
> Assignee: Ramesh Reddy
> Labels: Beta3
> Fix For: 8.5
>
>
> Currently Teiid supports internal and external materialization features. The internal is managed completely by the Teiid query engine along with the infinispan cache.
> External materialization is completely unmanaged and left to the user to manage externally. The goals for this feature are to unify the materialization logic for internal and external materialization, such that both are managed similarly irrespective of the type of materialization chosen.
[JBoss JIRA] (TEIID-2584) Add management features to materialization
by Steven Hawkins (JIRA)
[ https://issues.jboss.org/browse/TEIID-2584?page=com.atlassian.jira.plugin... ]
Steven Hawkins commented on TEIID-2584:
---------------------------------------
Looking good. A couple of thoughts and questions:
I think we'll want to introduce a new system table to expose the transformation sql rather than going through the properties table. We can use a SYSADMIN table so as not to inadvertently expose transformation logic that may be intended to be more private, and we can ensure that we aren't colliding with any existing property key.
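For illustration only, a hypothetical query against such a SYSADMIN table (the table name and columns are placeholders; nothing here is settled):
{code}
SELECT VDBName, SchemaName, Name, TransformationSQL
FROM SYSADMIN.ViewTransformations
WHERE Name = 'bug_summary'
{code}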
With the load procedure shouldn't we return a negative status code to indicate loading/loaded rather than 0?
Should there be exception handling for inserting a status entry? Alternatively could this be done at vdb startup time?
How would a long load or an unexpectedly failed load be handled - in the former case it seems like it's possible to have concurrent loads and in the latter it seems like some manual process would be needed to clear the loading status.
Should the start/shutdown hooks also be definable via the vdb.xml?
From above #5 - How does this work in a cluster?
What tie-ins are you thinking of exposing for incremental updates?
> Add management features to materialization
> ------------------------------------------
>
> Key: TEIID-2584
> URL: https://issues.jboss.org/browse/TEIID-2584
> Project: Teiid
> Issue Type: Feature Request
> Reporter: Ramesh Reddy
> Assignee: Ramesh Reddy
> Labels: Beta3
> Fix For: 8.5
>
>
> Currently Teiid supports internal and external materialization features. The internal is managed completely by the Teiid query engine along with the infinispan cache.
> External materialization is completely unmanaged and left to the user to manage externally. The goals for this feature are to unify the materialization logic for internal and external materialization, such that both are managed similarly irrespective of the type of materialization chosen.