[jboss-jira] [JBoss JIRA] (JGRP-1448) FILE_PING: Fail to read node file

Peter Nerg (JIRA) jira-events at lists.jboss.org
Fri Apr 13 08:32:52 EDT 2012


Peter Nerg created JGRP-1448:
--------------------------------

             Summary: FILE_PING: Fail to read node file
                 Key: JGRP-1448
                 URL: https://issues.jboss.org/browse/JGRP-1448
             Project: JGroups
          Issue Type: Patch
    Affects Versions: 2.12.3
         Environment: Any O/S with a NFS or other type of shared file system
            Reporter: Peter Nerg
            Assignee: Bela Ban
         Attachments: FILE_PING.java

When the FILE_PING protocol is in use, it periodically logs the following warning:
2012-03-19 16:20:41,057 [ Timer-5,<ADDR>] WARN  [org.jgroups.protocols.FILE_PING] failed reading 83dc9dfe-8dd4-eff2-4474-d57dbaa96143.node: removing it 

This is most likely because all members write to the same directory at arbitrary times, and reads are not synchronized with those writes.
Hence, if the cluster runs for long enough, a reader will at some point pick up a partially written file and treat it as corrupt.
The slower the shared file system is (e.g. a slow NFS mount), the more often this occurs.

I have uploaded a patch with two modifications to the FILE_PING class.
1) Writing to a file is done in two steps.
First we write to a temporary file, so that the "readAll" method cannot pick up a half-written file.
Then we do a semi-atomic move of the tmp file to the proper node file.

2) Reading all node files will make a few further attempts if it fails to read a file.
This provides a simple retry mechanism in case the file is half written and therefore not yet readable.
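
The two modifications can be illustrated with a minimal sketch. This is not the attached FILE_PING.java patch: the class name NodeFileSketch, the methods writeNodeFile and readWithRetry, and the Parser interface are all hypothetical, and the sketch assumes java.nio.file (Java 7 or later) rather than whatever file API the patch itself uses.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; not the code from the attached patch.
class NodeFileSketch {

    // Hypothetical callback that turns raw bytes into a value and
    // throws IOException when the content is incomplete or corrupt.
    interface Parser<T> {
        T parse(byte[] raw) throws IOException;
    }

    // Modification 1: write to a temporary file first, then move it onto
    // the final "<name>.node" path so readers never see a half-written file.
    static void writeNodeFile(Path dir, String nodeName, byte[] data) throws IOException {
        Path tmp = Files.createTempFile(dir, nodeName, ".tmp");
        Path target = dir.resolve(nodeName + ".node");
        try {
            Files.write(tmp, data);
            try {
                // Whether an existing target is replaced by an atomic move
                // is implementation specific (it is on POSIX rename).
                Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                // Fall back to a plain (non-atomic) replace.
                Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            // No-op after a successful move; cleans up if something failed.
            Files.deleteIfExists(tmp);
        }
    }

    // Modification 2: retry a failed read a few times before giving up.
    static <T> T readWithRetry(Path file, Parser<T> parser, int attempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        IOException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return parser.parse(Files.readAllBytes(file));
            } catch (IOException e) {
                last = e;
                if (i < attempts - 1) {
                    Thread.sleep(pauseMillis); // give the writer time to finish
                }
            }
        }
        throw last;
    }
}
```

A reader that only lists files ending in ".node" ignores the ".tmp" files entirely, so the retry loop mainly covers file systems on which the move is not truly atomic.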


