[jboss-jira] [JBoss JIRA] (ELY-525) Support our own Unicode normalizer

Wed May 4 13:51:00 EDT 2016

     [ https://issues.jboss.org/browse/ELY-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Lloyd updated ELY-525:
----------------------------
    Description: 
We should implement our own Unicode normalizer: the JDK one performs poorly in both speed and memory use, and it doesn't integrate well with the authentication mechanisms that require normalization.

It would be exposed on {{CodePointIterator}} as a few methods:

* {{decomposeCanonical}}
* {{decomposeCompatibility}}
* {{composeCanonical}}

These methods could be chained in various ways to produce the standard normalization forms:

* NFD = {{decomposeCanonical}}
* NFC = {{decomposeCanonical}} + {{composeCanonical}}
* NFKD = {{decomposeCompatibility}}
* NFKC = {{decomposeCompatibility}} + {{composeCanonical}}
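
The chaining above can be sketched as follows. This is only an illustration of how the four forms compose from the three operations, not a proposed implementation: the class name {{NormalizationChains}} is hypothetical, the methods work on {{String}} rather than {{CodePointIterator}}, and {{java.text.Normalizer}} is used purely as a stand-in for each step (the point of this issue is to replace it).

```java
import java.text.Normalizer;

// Hypothetical sketch: shows how the four normalization forms are built
// from three operations. The JDK normalizer is only a stand-in here.
final class NormalizationChains {

    // Canonical decomposition (stand-in: JDK NFD).
    static String decomposeCanonical(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Compatibility decomposition (stand-in: JDK NFKD).
    static String decomposeCompatibility(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD);
    }

    // Canonical composition. Assumes the input is already fully decomposed,
    // in which case JDK NFC only has composition work left to do.
    static String composeCanonical(String decomposed) {
        return Normalizer.normalize(decomposed, Normalizer.Form.NFC);
    }

    static String nfd(String s)  { return decomposeCanonical(s); }
    static String nfc(String s)  { return composeCanonical(decomposeCanonical(s)); }
    static String nfkd(String s) { return decomposeCompatibility(s); }
    static String nfkc(String s) { return composeCanonical(decomposeCompatibility(s)); }

    public static void main(String[] args) {
        String ligature = "\uFB01"; // LATIN SMALL LIGATURE FI
        // The chained NFKC agrees with the JDK's direct NFKC for this input.
        System.out.println(nfkc(ligature).equals(
                Normalizer.normalize(ligature, Normalizer.Form.NFKC)));
    }
}
```

In a real implementation each step would presumably wrap the previous iterator instead of materializing an intermediate string.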

The forms and their behaviors are defined in UAX #15: http://www.unicode.org/reports/tr15/tr15-43.html

The implementations should be lazy.  If possible they should be implemented in code as opposed to data tables, possibly one class per operation type per Unicode version so that only the necessary transformations are loaded/initialized.  The code could potentially be generated from tables and rules by a Maven plugin or annotation processor (see https://github.com/jdeparser/jdeparser2 for one option).
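
As a rough sketch of what "lazy" and "in code as opposed to data tables" might look like, the following wraps a code point iterator and decomposes on demand using a {{switch}} statement. Everything here is an assumption for illustration: {{LazyDecomposer}} is a hypothetical name, it builds on the JDK's {{PrimitiveIterator.OfInt}} rather than Elytron's {{CodePointIterator}}, and it hard-codes only three mappings. It deliberately omits canonical reordering of combining marks, Hangul, and every other mapping, so it is not a conforming decomposition.

```java
import java.util.PrimitiveIterator;

// Hypothetical sketch: lazily decomposes a tiny hard-coded subset of code
// points. No canonical reordering, no Hangul, no full mapping coverage.
final class LazyDecomposer implements PrimitiveIterator.OfInt {
    private final PrimitiveIterator.OfInt source;
    // Second code point of a two-element mapping still to be emitted, or -1.
    private int pending = -1;

    LazyDecomposer(PrimitiveIterator.OfInt source) {
        this.source = source;
    }

    static LazyDecomposer of(String s) {
        return new LazyDecomposer(s.codePoints().iterator());
    }

    @Override
    public boolean hasNext() {
        return pending != -1 || source.hasNext();
    }

    @Override
    public int nextInt() {
        if (pending != -1) {
            int cp = pending;
            pending = -1;
            return cp;
        }
        // Only one source code point is consumed per call; nothing is buffered
        // beyond the single pending combining mark.
        int cp = source.nextInt();
        switch (cp) {
            case 0x00C5: // LATIN CAPITAL LETTER A WITH RING ABOVE
            case 0x212B: // ANGSTROM SIGN (decomposes via U+00C5)
                pending = 0x030A; // COMBINING RING ABOVE
                return 0x0041;    // LATIN CAPITAL LETTER A
            case 0x00E9: // LATIN SMALL LETTER E WITH ACUTE
                pending = 0x0301; // COMBINING ACUTE ACCENT
                return 0x0065;    // LATIN SMALL LETTER E
            default:
                return cp;        // everything else passes through unchanged
        }
    }

    // Helper for demonstration: collects the remaining code points.
    static String drain(PrimitiveIterator.OfInt it) {
        StringBuilder sb = new StringBuilder();
        while (it.hasNext()) {
            sb.appendCodePoint(it.nextInt());
        }
        return sb.toString();
    }
}
```

A generated implementation would presumably emit a much larger {{switch}} (or several, split by range) from the Unicode data files, which is where a code generator would come in.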

  was:
We should do our own Unicode normalizer, because the JDK one is not good performance-wise or memory-wise, and doesn't integrate well with the authentication mechanisms which require normalization.

It would be accessible off of {CodePointIterator} as a few methods:
* {decomposeCanonical}
* {decomposeCompatibility}
* {composeCanonical}

These methods could be chained in various ways to achieve the standard defined normalization types:
* NFD = {decomposeCanonical}
* NFC = {decomposeCanonical} + {composeCanonical}
* NFKD = {decomposeCompatibility}
* NFKC = {decomposeCompatibility} + {composeCanonical}

The types and behaviors are defined here: http://www.unicode.org/reports/tr15/tr15-43.html

The implementations should be lazy.  If possible they should be implemented in code as opposed to data tables, possibly one class per operation type per Unicode version so that only the necessary transformations are loaded/initialized.  The code could potentially be generated from tables and rules by a Maven plugin or annotation processor (see https://github.com/jdeparser/jdeparser2 for one option).

> Support our own Unicode normalizer
> ----------------------------------
>
>                 Key: ELY-525
>                 URL: https://issues.jboss.org/browse/ELY-525
>             Project: WildFly Elytron
>          Issue Type: Feature Request
>            Reporter: David Lloyd
>            Priority: Minor
