[jboss-jira] [JBoss JIRA] (JGRP-1448) FILE_PING: Fail to read node file

Sun May 20 13:13:17 EDT 2012

    [ https://issues.jboss.org/browse/JGRP-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694174#comment-12694174 ] 

Peter Nerg edited comment on JGRP-1448 at 5/20/12 1:12 PM:
-----------------------------------------------------------

Patch for the JGRP-1448 tracker.
Contains modifications to the FILE_PING file.

I did the patch against the master branch.
This is my first official patch using Git (I'm more familiar with SVN, so there is a learning curve), so hopefully I managed to create something useful.

      was (Author: peter.nerg):
    Patch for the JGRP-1448 tracker.
Contains modifications to the FILE_PING file.

> FILE_PING: Fail to read node file
> ---------------------------------
>
>                 Key: JGRP-1448
>                 URL: https://issues.jboss.org/browse/JGRP-1448
>             Project: JGroups
>          Issue Type: Patch
>    Affects Versions: 2.12.3
>         Environment: Any O/S with a NFS or other type of shared file system
>            Reporter: Peter Nerg
>            Assignee: Bela Ban
>              Labels: FILE_PING, jgroups
>             Fix For: 3.0.11, 3.1
>
>         Attachments: FILE_PING.java, FILE_PING.java, JGRP-1448.patch
>
>
> When using the FILE_PING protocol it will periodically print the following in the log:
> 2012-03-19 16:20:41,057 [ Timer-5,<ADDR>] WARN  [org.jgroups.protocols.FILE_PING] failed reading 83dc9dfe-8dd4-eff2-4474-d57dbaa96143.node: removing it 
> This is most likely because all members write to the same directory at arbitrary times, and reads are not synchronized with those writes.
> Hence, if the cluster runs long enough, a read will eventually hit a partially written file and treat it as corrupt.
> This occurs more often the slower the shared file system is (e.g. a slow NFS mount).
> I will upload a patch with two modifications to the FILE_PING class.
> 1) Writing to a file is done in two steps.
> First we write to a temporary file, so that the "readAll" method cannot pick up a half-written file.
> Then we do a semi-atomic move of the tmp file to the proper node file.
> 2) Reading all node files will make a few re-attempts should it fail to read a file.
> This provides a simple retry mechanism should the file be half written and therefore not readable.
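
For illustration only, the two changes described above could look roughly like the sketch below. It is not the attached patch. The class name NodeFileIo, the ".tmp" suffix, the length-prefixed file layout, the MAX_PAYLOAD limit and the retry parameters are all assumptions made for this example, and it uses java.nio.file (Java 7+), which the actual FILE_PING code may not use.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of JGroups.
final class NodeFileIo {

    // Assumed upper bound so a garbage length prefix cannot trigger a huge allocation.
    static final int MAX_PAYLOAD = 1 << 20;

    /** Step 1: write to a temp file next to the target, then move it into place. */
    static void writeNodeFile(Path target, byte[] payload) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(tmp))) {
            out.writeInt(payload.length); // length prefix lets a reader detect truncation
            out.write(payload);
        }
        try {
            // Whether this replaces an existing target is implementation specific.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback is not atomic: readers may still see a partial file here.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Step 2: read the file, retrying a few times if it looks half written. */
    static byte[] readNodeFile(Path file, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
                int len = in.readInt();
                if (len < 0 || len > MAX_PAYLOAD) {
                    throw new IOException("corrupt length prefix in " + file);
                }
                byte[] payload = new byte[len];
                in.readFully(payload); // throws EOFException on a truncated file
                return payload;
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                Thread.sleep(pauseMillis);
            }
        }
        throw last;
    }
}
```

A caller would write with NodeFileIo.writeNodeFile(path, bytes) and read with something like NodeFileIo.readNodeFile(path, 3, 50). The temp file is created in the same directory as the target because a rename is generally only atomic within one file system, and even then the guarantee on NFS is weaker than on a local disk, which matches the "semi-atomic" wording above.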
