[jbosstools-issues] [JBoss JIRA] Commented: (JBIDE-9357) Using JBoss Tools Properties Editor, some (but not all) non-ASCII characters are changed to their equivalent \u escaped version

Viacheslav Kabanovich (JIRA) jira-events at lists.jboss.org
Tue Jul 19 19:47:23 EDT 2011


    [ https://issues.jboss.org/browse/JBIDE-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615295#comment-12615295 ] 

Viacheslav Kabanovich commented on JBIDE-9357:
----------------------------------------------

I have studied the description of UTF-8. Only the ASCII characters 0..127 are encoded with 1 byte; all other characters are encoded as sequences of 2, 3, or 4 bytes, and every byte in such a sequence is in the range 128..255.
The Unicode Standard requires decoders to "...treat any ill-formed code unit sequence as an error condition. This guarantees that it will neither interpret nor emit an ill-formed code unit sequence."
However, some decoder implementations try to resolve invalid byte sequences. For example, in 'fédéral' a lone byte 'é' can be treated as a one-byte character, because it cannot occur on its own in well-formed UTF-8 text. But there must be many words that cannot be resolved unambiguously, and I think we should not support any such resolution algorithm, since none of them is recommended by the Unicode Standard.
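
To illustrate the difference between strict and lenient decoding, here is a minimal Java sketch. It assumes the single-byte 'é' comes from an ISO-8859-1 encoded file; the class and method names are made up for this example and are not taken from JBoss Tools.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JBoss Tools.
final class StrictUtf8 {

    /** Returns true only if the bytes are well-formed UTF-8. */
    static boolean isWellFormedUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder fail instead of guessing.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String word = "f\u00E9d\u00E9ral";
        // Assumption: the lone 0xE9 bytes come from an ISO-8859-1 file.
        byte[] latin1 = word.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes well-formed as UTF-8: " + isWellFormedUtf8(latin1));
        System.out.println("UTF-8 bytes well-formed as UTF-8: " + isWellFormedUtf8(utf8));

        // The String constructor is lenient: it substitutes U+FFFD
        // for malformed input rather than reporting an error.
        String lenient = new String(latin1, StandardCharsets.UTF_8);
        System.out.println("Lenient decode contains U+FFFD: " + (lenient.indexOf('\uFFFD') >= 0));
    }
}
```

The strict decoder rejects the lone 0xE9 byte because it is a lead byte that is not followed by continuation bytes, which is the "error condition" the Unicode Standard describes.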

We read the file content with standard Java readers and parse the properties with the algorithm implemented in java.util.Properties. If it understands 'é' in 'fédéral', very good, but we should not do anything to 'improve' on that. In the same way, we serialize properties with the algorithm implemented in java.util.Properties; if it saves 'é' not as 'é' but as '\u00E9', we should not 'improve' it to generate invalid byte sequences.
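
The standard behaviour can be seen with a small round trip. This is only a sketch: it assumes the byte-stream overload Properties.store(OutputStream, String) is the one in play (the Writer overload does not escape), and the class name is invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, not part of JBoss Tools.
final class PropertiesEscapeDemo {

    /** Parses the text with Properties.load(Reader), then serializes it with Properties.store(OutputStream, String). */
    static String roundTrip(String text) {
        try {
            Properties props = new Properties();
            props.load(new StringReader(text));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Assumption: the byte-stream overload, which writes ISO-8859-1
            // and escapes every character above 0x7E.
            props.store(out, null);
            return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The output starts with a generated date comment line,
        // followed by the entry with the accented letter escaped.
        System.out.print(roundTrip("foobar=Num\u00E9ro\n"));
    }
}
```

Loading the escaped output again with java.util.Properties yields the original value, so the escaped form is lossless for readers that follow the same format.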

All we can do, and are doing now, is check whether a property has been touched in the UI editor; if it has not, it can be left as is. Our algorithm for checking whether a property has been touched had a bug, which is what the description covers. With that bug fixed, our editor
- will not modify the file when it is opened and closed;
- will not interfere with any changes made in the Source tab;
- will only rewrite entries explicitly edited in the Properties tab, but in this case it will use the standard algorithm, so that any 'é' will turn into '\u00E9'.
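
The behaviour in the list above can be sketched roughly as follows. This is not the JBoss Tools implementation: the class, the method names, the line-based key lookup and the simplified escaping are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not the actual editor code.
final class UntouchedEntriesSketch {

    /** Copies untouched lines verbatim and rewrites only the entries whose keys were edited. */
    static List<String> rewrite(List<String> originalLines, Map<String, String> editedValues) {
        List<String> result = new ArrayList<>();
        for (String line : originalLines) {
            int eq = line.indexOf('=');
            String key = eq < 0 ? null : line.substring(0, eq).trim();
            if (key != null && editedValues.containsKey(key)) {
                // Edited entry: serialize the new value in escaped form.
                result.add(key + "=" + escapeNonAscii(editedValues.get(key)));
            } else {
                // Untouched entry, comment or blank line: keep it as is.
                result.add(line);
            }
        }
        return result;
    }

    /** Simplified stand-in for the escaping done by java.util.Properties (ignores backslashes and leading spaces). */
    static String escapeNonAscii(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c < 0x20 || c > 0x7E) {
                sb.append(String.format(Locale.ROOT, "\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("barbar=f\u00E9d\u00E9ral", "foobar=Num\u00E9ro");
        // Only foobar was edited, so barbar keeps its accented letters.
        Map<String, String> edited = Collections.singletonMap("foobar", "Num\u00E9ros");
        for (String line : rewrite(original, edited)) {
            System.out.println(line);
        }
    }
}
```

In this sketch an entry counts as touched only if its key appears in the map of edited values, so opening and saving a file with no edits returns the original lines unchanged.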

> Using JBoss Tools Properties Editor, some (but not all) non-ASCII characters are changed to their equivalent \u escaped version
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: JBIDE-9357
>                 URL: https://issues.jboss.org/browse/JBIDE-9357
>             Project: Tools (JBoss Tools)
>          Issue Type: Bug
>          Components: common
>    Affects Versions: 3.1.0.GA
>         Environment: Windows 7, Eclipse Galileo, JBoss Properties 
>            Reporter: Matthew Farwell
>            Assignee: Viacheslav Kabanovich
>              Labels: eclipse, properties
>             Fix For: 3.3.0.M3, 3.3.0.Beta1
>
>         Attachments: foo.properties, jboss-foo.properties-properties.png, jboss-plugins.png
>
>
> In Eclipse Galileo
> 1) Create a file foo.properties
> 2) Change the project & file properties to be UTF-8 (See jboss-foo.properties-properties.png)
> 3) Enter the following lines into foo.properties
> barbar=fédéral
> foobar=Numéro 
> 4) Close foo.properties
> 5) Reopen foo.properties, the file now looks like:
> barbar=fédéral
> foobar=Num\u00E9ro
> If I save this file, the file gets saved in this form. This screws up the way that these strings are displayed on the site. Note that only the Numéro is transformed.
> Additional info:
> The files are read using org.springframework.context.support.ReloadableResourceBundleMessageSource, so the \u00E9 isn't interpreted correctly.
> For exact version number (3.1.0.v200910281724M-H247-M4), see jboss-plugins.png.
> The workaround for this problem is 1) To have a unit test to find strings which have been badly transformed. 2) Don't use JBoss Tools Properties Editor :-)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       


