[jboss-jira] [JBoss JIRA] (JGRP-1448) FILE_PING: Fail to read node file
Peter Nerg (JIRA)
jira-events at lists.jboss.org
Fri Apr 13 08:32:52 EDT 2012
Peter Nerg created JGRP-1448:
--------------------------------
Summary: FILE_PING: Fail to read node file
Key: JGRP-1448
URL: https://issues.jboss.org/browse/JGRP-1448
Project: JGroups
Issue Type: Patch
Affects Versions: 2.12.3
Environment: Any O/S with an NFS or other type of shared file system
Reporter: Peter Nerg
Assignee: Bela Ban
Attachments: FILE_PING.java
When the FILE_PING protocol is in use, it periodically prints the following warning in the log:
2012-03-19 16:20:41,057 [ Timer-5,<ADDR>] WARN [org.jgroups.protocols.FILE_PING] failed reading 83dc9dfe-8dd4-eff2-4474-d57dbaa96143.node: removing it
This is most likely because all members write to the same directory at arbitrary times, and the files are read without any synchronization with those writes.
Hence, if the cluster runs long enough, at some point a file will be read while it is only partially written and will appear corrupt.
This occurs more often the slower the shared file system is (e.g. a slow NFS mount).
I will upload a patch containing two modifications to the FILE_PING class.
1) Writing to files is done in two steps.
First we write to a temporary file so that the "readAll" method cannot pick up a half-written file.
Then we do a semi-atomic move of the tmp file to the proper node file.
2) Reading the node files now makes a few more attempts if it fails to read a file.
This provides a simple retry mechanism for the case where a file is half written and therefore not yet readable.
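
For readers without the attachment, the two modifications can be sketched roughly as follows. This is not the attached FILE_PING.java; it is a minimal illustration that assumes Java 7+ (java.nio.file), and the class name NodeFileSketch, the methods writeNodeFile, readNodeFile and looksComplete, and the attempt count and pause are all hypothetical. The real protocol would deserialize the node data, whereas the completeness check here is only a stand-in.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class NodeFileSketch {

    // Modification 1: write to a temporary file in the same directory,
    // then move it onto the final ".node" name in a single step.
    public static void writeNodeFile(Path dir, String name, byte[] data) throws IOException {
        // The ".tmp" suffix keeps the file out of any scan for "*.node" files.
        Path tmp = Files.createTempFile(dir, name, ".tmp");
        Files.write(tmp, data);
        Path target = dir.resolve(name + ".node");
        try {
            // Whether an existing target is replaced by an atomic move is
            // implementation specific; on POSIX systems it maps to rename().
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback when the file system cannot move atomically.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Hypothetical stand-in for "the file could be parsed"; the real check
    // would be whether the stored node data deserializes successfully.
    static boolean looksComplete(byte[] data) {
        return data.length > 0;
    }

    // Modification 2: retry a failed read a few times before giving up.
    public static byte[] readNodeFile(Path file, int attempts, long pauseMillis)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                byte[] data = Files.readAllBytes(file);
                if (looksComplete(data)) {
                    return data;
                }
                last = new IOException("incomplete node file: " + file);
            } catch (IOException e) {
                last = e;
            }
            if (i < attempts - 1) {
                Thread.sleep(pauseMillis); // give the writer time to finish
            }
        }
        throw last != null ? last : new IOException("no read attempted: " + file);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("file_ping_sketch");
        writeNodeFile(dir, "node-a", "addr-data".getBytes(StandardCharsets.UTF_8));
        byte[] read = readNodeFile(dir.resolve("node-a.node"), 3, 50L);
        System.out.println(new String(read, StandardCharsets.UTF_8));
    }
}
```

The temporary file is created in the target directory on purpose: a rename across file systems is generally not atomic. On NFS even a same-directory rename gives weaker guarantees than on a local disk, which is presumably why the move is described above as only semi-atomic and why the retry on the read side is still useful.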