[jboss-jira] [JBoss JIRA] (JGRP-1448) FILE_PING: Fail to read node file

Wed May 16 07:09:18 EDT 2012

    [ https://issues.jboss.org/browse/JGRP-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693232#comment-12693232 ] 

Peter Nerg commented on JGRP-1448:
----------------------------------

It's ok, I've been feeling a bit guilty over this.
I've been quite busy with other stuff and we are already running the patched version in our systems.
Since we won't be moving to JGroups v3+ for now, I had to patch the 2.x version we use and go with that.

I also realized that porting the patch will take more effort than I expected, as the FILE_PING class seems to have changed in later versions.
I'm also having trouble getting the current master branch to compile; I seem to be missing the class JUnitXMLReporter.

I'll try to port the patch as soon as possible.

> FILE_PING: Fail to read node file
> ---------------------------------
>
>                 Key: JGRP-1448
>                 URL: https://issues.jboss.org/browse/JGRP-1448
>             Project: JGroups
>          Issue Type: Patch
>    Affects Versions: 2.12.3
>         Environment: Any O/S with an NFS or other type of shared file system
>            Reporter: Peter Nerg
>            Assignee: Bela Ban
>              Labels: FILE_PING, jgroups
>             Fix For: 3.0.11, 3.1
>
>         Attachments: FILE_PING.java, FILE_PING.java
>
>
> When using the FILE_PING protocol it will periodically print the following in the log:
> 2012-03-19 16:20:41,057 [ Timer-5,<ADDR>] WARN  [org.jgroups.protocols.FILE_PING] failed reading 83dc9dfe-8dd4-eff2-4474-d57dbaa96143.node: removing it 
> This is most likely because all members write to the same directory at arbitrary times, and reads are done without any synchronization with the writes.
> Hence, if the cluster runs long enough, a reader will at some point pick up a partially written (corrupt) file.
> This occurs more often the slower the shared file system is (e.g. a slow NFS mount).
> I will upload a patch with two modifications to the FILE_PING class.
> 1) Writing to a file is done in two steps.
> First we write to a temporary file, so that the "readAll" method cannot pick up a half-written file.
> Then we do a semi-atomic move of the tmp file to the proper node file.
> 2) Reading the node files will retry a few times should it fail to read a file.
> This provides a simple retry mechanism in case the file is half-written and therefore not readable.
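
The two modifications described above can be sketched roughly as follows. This is not the attached patch and not the JGroups code: the class name NodeFileSketch, the ".tmp" suffix, the retry parameters and the isComplete check are all assumptions made for illustration. It also assumes Java 7+ (java.nio.file), which the 2.x code base may not be able to rely on.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch: write via a temp file, read with a few retries. */
public class NodeFileSketch {

    /** Step 1: write to a temporary sibling file, then move it over the node file. */
    static void writeViaTempFile(Path target, byte[] data) throws IOException {
        // The ".tmp" suffix is an assumption; readers must ignore such files.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, data);
        try {
            // Ask for an atomic rename so readers see the old or the new file.
            Files.move(tmp, target,
                    StandardCopyOption.REPLACE_EXISTING,
                    StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback for file systems without atomic moves (hence "semi-atomic").
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Stand-in validity check; real code would try to deserialize the data. */
    static boolean isComplete(byte[] data) {
        return data.length > 0;
    }

    /** Step 2: retry the read a few times before giving up on the file. */
    static byte[] readWithRetry(Path file, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                byte[] data = Files.readAllBytes(file);
                if (isComplete(data)) {
                    return data;
                }
                last = new IOException("incomplete data in " + file);
            } catch (IOException e) {
                last = e;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(pauseMillis); // give the writer time to finish
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("file_ping_sketch");
        Path node = dir.resolve("example.node");
        writeViaTempFile(node, "hello".getBytes(StandardCharsets.UTF_8));
        byte[] read = readWithRetry(node, 3, 50L);
        System.out.println(new String(read, StandardCharsets.UTF_8));
    }
}
```

Whether the move is truly atomic depends on the underlying file system, NFS in particular, so the retry on the reading side is still worth keeping as a safety net.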
