SessionCleanupThread causes deadlocks when server is clustered
--------------------------------------------------------------
Key: TEIID-1058
URL: https://jira.jboss.org/jira/browse/TEIID-1058
Project: Teiid
Issue Type: Bug
Components: Server
Reporter: Jerry Helbling
Assignee: Steven Hawkins
Priority: Minor
Fix For: 8.0
Note: The following problem exists in MetaMatrix 5.5.x. I am creating this Teiid JIRA so
that a fix for it (if necessary) can be considered in Teiid. There are no plans to
fix it in MetaMatrix 5.5.x.
When SessionID cleanup is done, either at a scheduled interval or at server shutdown,
there will sometimes be deadlocks if the MetaMatrix servers are clustered and/or have
multiple processes.
This is because every process performs the full cleanup of whatever inactive and aged
sessionids are in the database, and each one uses a single transaction for all of the
deletes, no matter how many sessionids need to be deleted.
This is hugely redundant and subject to deadlocks. If we had a concept of a 'lead'
host in a cluster, it could be assigned this task, but we don't.
Possible solutions:
1) Add an ORDER BY to the SELECT we use to retrieve the old sessionids. This forces
the processes to delete the sessionids in the same order, which should reduce the
likelihood of true deadlocks. In testing, however, it did not prevent them.
2) Refactor the cleanup code so that it does a smaller amount of work per transaction.
Currently each host will attempt to do the entire cleanup process in a single
transaction. With a default TTL of 10 hours, that could be hundreds or thousands of
deletable sessionids in a busy environment. The process could be rewritten to
discover/delete/commit a few (say 10, or even just 1) at a time.
3) Add "FOR UPDATE NOWAIT" to the SELECT that is used to retrieve the sessionids that
are ready to delete. But this syntax is only legal for Oracle and PostgreSQL, so it may
not be such a good solution.
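As a rough illustration of option 2, the discover/delete/commit loop might look like the
sketch below. The SessionStore interface, its method names, and the in-memory stand-in
are hypothetical and exist only for this example; they are not MetaMatrix or Teiid APIs.
In a real implementation each deleteAndCommit call would presumably map to one short
database transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class BatchedCleanupSketch {

    /** Hypothetical view of the session table; not an actual MetaMatrix or Teiid API. */
    interface SessionStore {
        /** Returns at most 'limit' expired sessionids, lowest first. */
        List<Long> findExpired(int limit);

        /** Deletes the given sessionids and commits, as one short transaction. */
        void deleteAndCommit(List<Long> ids);
    }

    /**
     * Repeats discover/delete/commit until no expired sessionids remain.
     * Returns the number of transactions that were committed.
     */
    static int cleanup(SessionStore store, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int transactions = 0;
        while (true) {
            // Re-query every round: another process may have removed rows meanwhile.
            List<Long> batch = store.findExpired(batchSize);
            if (batch.isEmpty()) {
                return transactions;
            }
            store.deleteAndCommit(batch);
            transactions++;
        }
    }

    /** In-memory stand-in for the database, used only to exercise the loop. */
    static class InMemoryStore implements SessionStore {
        final TreeSet<Long> expired = new TreeSet<>();

        @Override
        public List<Long> findExpired(int limit) {
            List<Long> result = new ArrayList<>();
            for (Long id : expired) {
                if (result.size() == limit) {
                    break;
                }
                result.add(id);
            }
            return result;
        }

        @Override
        public void deleteAndCommit(List<Long> ids) {
            expired.removeAll(ids);
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        for (long id = 1; id <= 25; id++) {
            store.expired.add(id);
        }
        // 25 ids in batches of 10 take three transactions (10 + 10 + 5).
        System.out.println(cleanup(store, 10));
    }
}
```

A smaller batch size shortens the time each transaction holds its locks, at the cost of
more round trips to the database. If one batch fails, only that batch needs to be rolled
back and retried.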
We have decided not to pursue this fix in 5.5.4. Whether the conditions will be the same
in Teiid is not clear, but if they are, this note is offered as a contribution toward
solving it in Teiid.