[hibernate-issues] [Hibernate-JIRA] Commented: (HHH-5371) Add support for REVEND_TSTMP which will enable SQL table partitioning by time

Thu Sep 30 03:55:59 EDT 2010

    [ http://opensource.atlassian.com/projects/hibernate/browse/HHH-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=38555#action_38555 ] 

Erik-Berndt Scheper commented on HHH-5371:
------------------------------------------

With regard to the rationale why the timestamp is held for the end revision, I'd like to add the following.

As a starting point: I believe that audit tables should be partionioned on 'increasing level of interestingness'.
I.e. there should (or could) be a partition with audit data that is not very interesting, another for audit data that might be interesting, and one for audit data that is very interesting. Uninteresting audit data should be stored on slow media, interesting audit data should be stored on fast media.

To illustrate how this would work, I've prepared a simple example of a salary registration, e.g. for a government agency. 
The salary table currently contains the following rows for person x:
||year||salary||
|2006|EUR 3300|
|2007|EUR 3500|
|2008|EUR 4000|
|2009|EUR 4500|

The salary for the current fiscal year (2010) is unknown. The agency requires that all changes in registered salaries for a fiscal year are recorded (i.e. the audit trail). The rationale behind this is that decisions made at a certain date are based on the registered salary at that time. And at any time it must be possible reproduce the reason why a certain decision was made at a certain date. 

The following audit information is available, sorted on in order of occurrence:

||year||revtype ||revtimestamp||salary||revend_timestamp||
|2006|ADD|2007-04-01|3300|null|

||year||revtype ||revtimestamp||salary||revend_timestamp||
|2007|ADD|2008-04-01|35|2008-04-02|
|2007|MOD|2008-04-02|3500|null|

||year||revtype ||revtimestamp||salary||revend_timestamp||
|2008|ADD|2009-04-01|3700|2009-07-01|
|2008|MOD|2009-07-01|4100|2010-02-01|
|2008|MOD|2010-02-01|4000|null|

||year||revtype ||revtimestamp||salary||revend_timestamp||
|2009|ADD|2010-04-01|4500|null|

To partition this data, the 'level of interestingness' must be defined. Unfortunately, the revtimestamp cannot help here. For example the only revision for year 2006, which is the oldest of all, is still interesting. 
E.g. it would be very unfortunate if in 2011 there would be an update of the salary for fiscal year 2006 (which is possible in until at least 10 years after the fiscal year) and the audit information would have been moved to a slow disk (based on the oldness of the audit row), because the revend_timestamp must now be updated. 
Also, look at the typo-error in the salary of fiscal year 2007. Both have nearly the same revtimestamp, but the only interesting row has revend_timestamp= null.

I would say that the only column that defines a sensible 'level of interestingness' is the revend_timestamp. 

A possible division of audit table partitions would be as follows:
* revend_timestamp year = 2008 (might not be interesting)
* revend_timestamp year = 2009 (potentially interesting)
* revend_timestamp year >= 2010 or null (definitely interesting)

This also covers the 'problem' of the revend_timestamp update. Despite updating of the revend_timestamp, the audit row remains in what I tend to call the 'extension bucket'.
And sometime in 2011, the 'extension bucket' is split into 
* a partition for all audit rows with revend_timestamp year = 2010
* a partition to contain all rows with year >= 2011 or null

I hope this answers the question why the timestamp is held for the end revision.

> Add support for REVEND_TSTMP which will enable SQL table partitioning by time
> -----------------------------------------------------------------------------
>
>                 Key: HHH-5371
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HHH-5371
>             Project: Hibernate Core
>          Issue Type: New Feature
>          Components: envers
>    Affects Versions: 3.6.0.Beta1
>            Reporter: Matthew B. Jones
>            Assignee: Erik-Berndt Scheper
>         Attachments: HHH-5371-revend_tstmp-documentation-20100908.patch, HHH-5371-revend_tstmp-revised-20100908.patch, revend_tstmp.txt
>
>
> There is strong desire to be able to purge old records out of a relational database by using table partitioning. In order to partition a table, you need to partition it on a column that exists within the table. So, in order to partition the _AUD tables by expired time, the timestamp of REVEND must be present within the table.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://opensource.atlassian.com/projects/hibernate/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira