[jboss-jira] [JBoss JIRA] Updated: (JBMESSAGING-1504) Full Journal Compactation

Wed Jun 10 08:23:56 EDT 2009

     [ https://jira.jboss.org/jira/browse/JBMESSAGING-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Fox updated JBMESSAGING-1504:
---------------------------------

               Issue Type: Task  (was: Feature Request)
    JBoss Forum Reference: http://www.jboss.org/index.html?module=bb&op=viewtopic&t=156771
              Description: 
Let me describe how I think the compacting should work.

Let's say we have a set of files in the journal:

F0, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10

min files is set to 10

The current file is F6, and F7, F8, F9, F10 are unused files waiting to be used.

F0, F1, F2, F3, F4, F5 are files full of data and no longer open; however, data about their contents is held in memory.

We need a compacting thread that periodically (say every 30 seconds) scans the in-memory data for the set of journal files.

It scans F0..F5, and when it has completed that scan, it can decide which records are no longer needed in each file.

A record is no longer needed if it has been deleted in any of the files F0..F5.

Once the scanner thread has computed which records are needed and which are not, it can compute what percentage of the total data is dead space.

E.g. the scanner might compute that 72% of the data in files F0..F5 is dead space.
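
A minimal sketch of the dead-space computation described above. RecordInfo and DeadSpaceScanner are hypothetical names used for illustration, not existing JBoss Messaging classes, and the sketch assumes that a delete marker itself counts as dead space once the record it deletes is in the scanned set:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory summary of one journal entry (not a JBoss Messaging class).
final class RecordInfo {
    final long id;
    final int sizeBytes;
    final boolean deleteMarker; // true if this entry records the deletion of `id`

    RecordInfo(long id, int sizeBytes, boolean deleteMarker) {
        this.id = id;
        this.sizeBytes = sizeBytes;
        this.deleteMarker = deleteMarker;
    }
}

final class DeadSpaceScanner {
    /** Returns the percentage (0 to 100) of bytes in the scanned files that are dead space. */
    static double deadSpacePercentage(List<List<RecordInfo>> files) {
        // Pass 1: collect every id deleted anywhere in the scanned files.
        Set<Long> deletedIds = new HashSet<>();
        long totalBytes = 0;
        for (List<RecordInfo> file : files) {
            for (RecordInfo r : file) {
                totalBytes += r.sizeBytes;
                if (r.deleteMarker) {
                    deletedIds.add(r.id);
                }
            }
        }
        if (totalBytes == 0) {
            return 0.0;
        }
        // Pass 2: an entry is dead if its id was deleted (this also covers the
        // delete markers themselves, which is an assumption of this sketch).
        long deadBytes = 0;
        for (List<RecordInfo> file : files) {
            for (RecordInfo r : file) {
                if (deletedIds.contains(r.id)) {
                    deadBytes += r.sizeBytes;
                }
            }
        }
        return 100.0 * deadBytes / totalBytes;
    }
}
```

Two passes are used because a delete for a record in F0 may only show up in F5, so the full set of deleted ids has to be known before any record can be classified.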

We then have a parameter that the user can configure, compactorDeadSpaceThresholdPercentage, which might have a default of 75%.

If the amount of dead space computed by the scanner >= compactorDeadSpaceThresholdPercentage, then the scanner will compact those files.
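
The periodic scan and threshold check could be sketched roughly as follows. CompactorScheduler and its constructor arguments are hypothetical; only the threshold parameter and the >= comparison come from the description above. The scan and the compaction are passed in as callbacks so the sketch stays independent of the journal internals:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleSupplier;

// Hypothetical wrapper around the compacting thread (illustration only).
final class CompactorScheduler {
    private final double compactorDeadSpaceThresholdPercentage;
    private final DoubleSupplier deadSpaceScan; // returns dead space as 0 to 100
    private final Runnable compactAction;       // performs the actual compaction

    CompactorScheduler(double compactorDeadSpaceThresholdPercentage,
                       DoubleSupplier deadSpaceScan,
                       Runnable compactAction) {
        this.compactorDeadSpaceThresholdPercentage = compactorDeadSpaceThresholdPercentage;
        this.deadSpaceScan = deadSpaceScan;
        this.compactAction = compactAction;
    }

    /** Runs one scan cycle; returns true if compaction was triggered. */
    boolean runOnce() {
        double deadSpace = deadSpaceScan.getAsDouble();
        if (deadSpace >= compactorDeadSpaceThresholdPercentage) {
            compactAction.run();
            return true;
        }
        return false;
    }

    /** Starts a single background thread that repeats the scan with a fixed delay. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        // Fixed delay (not fixed rate) so a slow compaction never overlaps the next scan.
        executor.scheduleWithFixedDelay(this::runOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

Note that with the example numbers above, a scan result of 72% against a threshold of 75% would not trigger a compaction; the scanner would simply try again on the next cycle.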

The actual compacting approach goes as follows:

The scanner knows how many new files it will need for the compacted records. Let's say it needs two new files: it can take these from the unused files (e.g. F7, F8, F9 or F10) if they are available.

It then opens the old files F0..F5, loads the wanted records into memory in blocks, and copies them to the new files.

When this process is finished it will end up with, say, two new files containing the compacted records.
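
As an illustration of how the wanted records might be laid out in the new files, here is a small sketch that packs records, in their original order, into fixed-size files. RecordPacker is a hypothetical name, records are represented only by their sizes, and the sketch assumes a record is never split across two files:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (illustration only): decides which live records go into which new file.
final class RecordPacker {
    /**
     * Packs live record sizes, in order, into files of a fixed capacity.
     * Returns one list of record sizes per new file.
     */
    static List<List<Integer>> pack(List<Integer> liveRecordSizes, int fileCapacityBytes) {
        List<List<Integer>> files = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int usedBytes = 0;
        for (int size : liveRecordSizes) {
            if (size > fileCapacityBytes) {
                throw new IllegalArgumentException("record larger than a journal file: " + size);
            }
            // Roll over to the next new file when this record would not fit.
            if (usedBytes + size > fileCapacityBytes) {
                files.add(current);
                current = new ArrayList<>();
                usedBytes = 0;
            }
            current.add(size);
            usedBytes += size;
        }
        if (!current.isEmpty()) {
            files.add(current);
        }
        return files;
    }
}
```

The number of lists returned is the number of new files the compaction needs, which is how the scanner could know the count up front.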

We can then save a marker file (an empty file in the journal directory) which says we are starting the rename.

The old files F0..F5 then need to be renamed so they are no longer used by the journal (but can be recovered in case of a crash).

Then the two new files need to be renamed so they will be picked up by the journal.

When that process completes the marker file can be deleted.

If the server crashes after renaming the old files but before renaming the new files, then the marker file will still exist in the journal directory. The journal can detect this on startup and finish the process.

This ensures the journal will still start up after such a crash.
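
The rename protocol and its recovery step could look roughly like this. The marker file name and the ".old" / ".cmp" suffixes are assumptions made for the sketch, as is the CompactionRename class itself. The recovery method covers only the crash window described above (old files already renamed, new files not yet renamed); a crash in the middle of renaming the old files would need extra information that an empty marker file does not carry:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the marker-file rename protocol (names are assumptions).
final class CompactionRename {
    static final String MARKER = "compacting.marker"; // assumed marker file name
    static final String OLD_SUFFIX = ".old";          // assumed suffix for retired files
    static final String NEW_SUFFIX = ".cmp";          // assumed suffix for compacted files

    /** Retires the old files and activates the compacted ones, guarded by the marker. */
    static void swap(Path dir, List<String> oldNames, List<String> newNames) throws IOException {
        Files.createFile(dir.resolve(MARKER));
        for (String name : oldNames) {
            Files.move(dir.resolve(name), dir.resolve(name + OLD_SUFFIX), StandardCopyOption.ATOMIC_MOVE);
        }
        for (String name : newNames) {
            Files.move(dir.resolve(name + NEW_SUFFIX), dir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
        }
        Files.delete(dir.resolve(MARKER));
    }

    /** Called on startup; finishes an interrupted swap. Returns true if recovery ran. */
    static boolean recover(Path dir) throws IOException {
        Path marker = dir.resolve(MARKER);
        if (!Files.exists(marker)) {
            return false; // no swap was in progress
        }
        // Collect first, then rename, to avoid changing the directory while iterating it.
        List<Path> pending = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*" + NEW_SUFFIX)) {
            for (Path p : stream) {
                pending.add(p);
            }
        }
        for (Path p : pending) {
            String name = p.getFileName().toString();
            String finalName = name.substring(0, name.length() - NEW_SUFFIX.length());
            Files.move(p, dir.resolve(finalName), StandardCopyOption.ATOMIC_MOVE);
        }
        Files.delete(marker);
        return true;
    }
}
```

Deleting the marker last is what makes recovery safe to repeat: if the server crashes again during recovery, the marker is still there on the next startup and the remaining renames are simply retried.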

The JournalFile objects in memory also need to be updated when the compact is complete.
                 Priority: Critical  (was: Major)

> Full Journal Compactation
> -------------------------
>
>                 Key: JBMESSAGING-1504
>                 URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1504
>             Project: JBoss Messaging
>          Issue Type: Task
>            Reporter: Clebert Suconic
>            Assignee: Clebert Suconic
>            Priority: Critical
>             Fix For: 2.0.0
