[gatein-issues] [JBoss JIRA] (GTNCOMMON-24) FastURLDecoder cannot decode surrogate pair characters

Peter Palaga (JIRA) issues at jboss.org
Fri Feb 27 06:08:49 EST 2015


    [ https://issues.jboss.org/browse/GTNCOMMON-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044621#comment-13044621 ] 

Peter Palaga edited comment on GTNCOMMON-24 at 2/27/15 6:08 AM:
----------------------------------------------------------------

[~tkonishi], the idea behind the attached test case is to compare the output of {{java.net.URLEncoder}} with the output of {{org.gatein.common.text.FastURLDecoder}}, right? If that was your intention, the test does something different: it compares {{URLEncoder.encode(s, "UTF8")}} with {{FastURLDecoder.getUTF8Instance().encode(URLEncoder.encode(s, "UTF8"), out)}}. Note that the second expression encodes twice: the string passed to {{FastURLDecoder}} has already been encoded by {{URLEncoder}}. Did you perhaps want something like this?
{code}
   public void testEncodeSurrogatePair() throws Exception
   {
      FastURLDecoder encoder = FastURLDecoder.getUTF8Instance();
      CharBuffer out = new CharBuffer();
      // U+2000B lies outside the BMP, so it is built from its two surrogates
      StringBuilder sb = new StringBuilder(2);
      sb.append((char) 0xD840); // high surrogate
      sb.append((char) 0xDC0B); // low surrogate
      String hanU2000B = sb.toString(); // U+2000B
      // Encode the raw string exactly once with each encoder, then compare
      String encodedWithURLEncoder = URLEncoder.encode(hanU2000B, "UTF8");
      encoder.encode(hanU2000B, out);
      assertEquals(encodedWithURLEncoder, out.asString());
   }
{code}
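
For reference, the effect of encoding twice can be seen with the JDK alone. The following is only a sketch: it leaves {{FastURLDecoder}} out entirely, and the class name {{DoubleEncodingDemo}} and its helper methods are hypothetical names made up for this illustration. It shows that a second pass escapes the {{%}} characters produced by the first pass, so the two strings compared in the attached test could not be equal even with a correct encoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; uses only java.net.URLEncoder, not FastURLDecoder.
class DoubleEncodingDemo {

    // U+2000B as a Java string, i.e. the surrogate pair 0xD840 0xDC0B.
    static String hanU2000B() {
        return new String(Character.toChars(0x2000B));
    }

    // Percent-encode the UTF-8 bytes of s once.
    static String encodeOnce(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Encode the already encoded string again; each '%' becomes "%25".
    static String encodeTwice(String s) {
        return encodeOnce(encodeOnce(s));
    }

    public static void main(String[] args) {
        String s = hanU2000B();
        System.out.println(encodeOnce(s));  // %F0%A0%80%8B
        System.out.println(encodeTwice(s)); // %25F0%25A0%2580%258B
    }
}
```

The single-pass result, {{%F0%A0%80%8B}}, is the value the corrected test above expects {{FastURLDecoder}} to produce for the raw string.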


> FastURLDecoder cannot decode surrogate pair characters
> ------------------------------------------------------
>
>                 Key: GTNCOMMON-24
>                 URL: https://issues.jboss.org/browse/GTNCOMMON-24
>             Project: GateIn Common
>          Issue Type: Bug
>            Reporter: Takayuki Konishi
>         Attachments: surrogatepairtest.patch
>
>
> FastURLDecoder cannot decode surrogate pair characters.
> When I decoded [U+2000B|http://www.fileformat.info/info/unicode/char/2000B/index.htm], I got a MalformedInputException:
> {code}
> org.gatein.common.text.MalformedInputException: Cannot decode char 'A0'
> 	at org.gatein.common.text.FastURLDecoder.safeEncode(FastURLDecoder.java:217)
> 	at org.gatein.common.text.AbstractCharEncoder.encode(AbstractCharEncoder.java:45)
> 	at org.gatein.common.text.AbstractCharEncoder.encode(AbstractCharEncoder.java:62)
> 	at org.gatein.common.text.FastURLDecoderTestCase.testEncodeSurrogatePair(FastURLDecoderTestCase.java:159)
> {code}
> I have also attached a patch containing a test case.



--
This message was sent by Atlassian JIRA
(v6.3.11#6341)

