[ https://jira.jboss.org/jira/browse/DNA-278?page=com.atlassian.jira.plugin... ]
Randall Hauch resolved DNA-278.
-------------------------------
Resolution: Done
Created the implementation class using a single structure that holds, for each character,
a bitmask describing the set of character classes to which that character belongs. For
example, 'a' belongs to multiple character classes, so its bitmask ORs together the bits of
each of those classes. A simple array stores all the bitmasks, indexed by the integer value
of the character. Each method (e.g., the check for whether a character is a valid NCName
start character) then only has to look up the mask and perform a single bit operation.
Some methods also take shortcuts: several of the classes include a very large range of
characters at the upper end that are either all in or all out of the class, so those
characters can be checked simply by comparing the character's integer value with the lower
end of that range.
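A minimal Java sketch of the per-character bitmask table described above follows. It is an illustration under stated assumptions, not the DNA class: the name `NcNameCharsSketch`, the flag constants, and the method names are hypothetical, and the character ranges are assumed to be those of the XML 1.0 (Fifth Edition) NameStartChar and NameChar productions, with ':' removed for NCNames. Only BMP characters are stored in the array; code points above U+FFFF are handled with a range-comparison shortcut of the kind mentioned above.

```java
// Hypothetical sketch, not the DNA implementation. Ranges assume the XML 1.0
// (Fifth Edition) NameStartChar/NameChar productions; NCNames exclude ':'.
final class NcNameCharsSketch {
    // One bit per character class; a character's mask ORs together every class it is in.
    private static final int NAME_START = 1;        // NameStartChar
    private static final int NAME = 1 << 1;         // NameChar
    private static final int NCNAME_START = 1 << 2; // NameStartChar minus ':'
    private static final int NCNAME = 1 << 3;       // NameChar minus ':'

    // Only BMP characters are stored; code points above U+FFFF use a range shortcut.
    private static final int BMP_SIZE = 1 << 16;
    private static final byte[] MASKS = new byte[BMP_SIZE];

    private static void mark(int from, int to, int flags) {
        for (int c = from; c <= to; c++) {
            MASKS[c] |= flags;
        }
    }

    static {
        // Characters that may start a name belong to all four classes.
        int startFlags = NAME_START | NAME | NCNAME_START | NCNAME;
        int[][] startRanges = {
            {'A', 'Z'}, {'_', '_'}, {'a', 'z'}, {0xC0, 0xD6}, {0xD8, 0xF6},
            {0xF8, 0x2FF}, {0x370, 0x37D}, {0x37F, 0x1FFF}, {0x200C, 0x200D},
            {0x2070, 0x218F}, {0x2C00, 0x2FEF}, {0x3001, 0xD7FF},
            {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}
        };
        for (int[] r : startRanges) {
            mark(r[0], r[1], startFlags);
        }
        // ':' may start a Name but is not allowed anywhere in an NCName.
        mark(':', ':', NAME_START | NAME);
        // Characters that may follow the first character but cannot start a name.
        int followFlags = NAME | NCNAME;
        int[][] followRanges = {
            {'-', '-'}, {'.', '.'}, {'0', '9'}, {0xB7, 0xB7},
            {0x300, 0x36F}, {0x203F, 0x2040}
        };
        for (int[] r : followRanges) {
            mark(r[0], r[1], followFlags);
        }
    }

    // Each check is one array index plus one bitwise AND.
    private static boolean has(int c, int flag) {
        if (c < 0) {
            return false;
        }
        if (c >= BMP_SIZE) {
            // Shortcut: every code point in [#x10000-#xEFFFF] is in all four classes.
            return c <= 0xEFFFF;
        }
        return (MASKS[c] & flag) != 0;
    }

    static boolean isValidNameStart(int c) { return has(c, NAME_START); }
    static boolean isValidNcNameStart(int c) { return has(c, NCNAME_START); }
    static boolean isValidNcNameChar(int c) { return has(c, NCNAME); }

    // Walks the string by code point so surrogate pairs are checked as one character.
    static boolean isValidNcName(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isValidNcNameStart(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isValidNcNameChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

With this sketch, `isValidNcName("dna")` returns true, while `isValidNcName("jcr:primaryType")` returns false because ':' carries only the Name bits; `isValidNameStart(':')` is still true, which is why each character's mask has to combine several classes.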
Create new utility class for determining validity of XML NCNames
----------------------------------------------------------------
Key: DNA-278
URL: https://jira.jboss.org/jira/browse/DNA-278
Project: DNA
Issue Type: Task
Components: Common
Affects Versions: 0.3
Reporter: Randall Hauch
Assignee: Randall Hauch
Fix For: 0.4
A namespace prefix in JCR must be a valid NCName, as defined by the XML specification.
We need a utility to perform this check. Currently, 'dna-common' depends upon only a
minimal set of libraries (e.g., logging and the optional JCIP annotations), and none of
these provides such a utility. A few other libraries do, but then we'd be dependent upon
an entire library just for one class.
Implementation-wise, it doesn't seem to be too tough a nut to crack. The only trick is
how to quickly identify whether a character matches one (or more) of the character
classes defined by the spec. One approach would be to check ranges within each method,
but that would require several comparisons on each lookup. The other approach would be
to precompute the information so that lookups are fast. Even with this latter approach
there are multiple options for the structure: one is to have a bit set for each class,
keyed by character; the other is to have one bitmask for each character (in a map or
array). The latter seems fastest (a single index and a few bit operations) and smallest
in memory footprint (only one data structure, with one entry per character).
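To make the trade-off concrete, here is a hedged Java sketch of the two alternatives weighed against the single per-character bitmask: checking ranges inside the method, and one precomputed `java.util.BitSet` per character class. It covers only NCNameStartChar over the BMP; the class and method names are hypothetical, and the ranges are assumed to be the XML 1.0 (Fifth Edition) NameStartChar ranges minus ':'.

```java
import java.util.BitSet;

// Hypothetical comparison of two lookup strategies (not DNA code), for
// NCNameStartChar only and BMP code points only.
final class LookupApproachesSketch {
    // Assumed XML 1.0 (Fifth Edition) NameStartChar ranges within the BMP, minus ':'.
    static final int[][] NCNAME_START_RANGES = {
        {'A', 'Z'}, {'_', '_'}, {'a', 'z'}, {0xC0, 0xD6}, {0xD8, 0xF6},
        {0xF8, 0x2FF}, {0x370, 0x37D}, {0x37F, 0x1FFF}, {0x200C, 0x200D},
        {0x2070, 0x218F}, {0x2C00, 0x2FEF}, {0x3001, 0xD7FF},
        {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}
    };

    // Option 1: test the ranges inside the method, up to two comparisons per range per call.
    static boolean isNcNameStartByRanges(int c) {
        for (int[] r : NCNAME_START_RANGES) {
            if (c >= r[0] && c <= r[1]) {
                return true;
            }
        }
        return false;
    }

    // Option 2: precompute one BitSet per character class, indexed by character.
    static final BitSet NCNAME_START_SET = new BitSet(1 << 16);

    static {
        for (int[] r : NCNAME_START_RANGES) {
            NCNAME_START_SET.set(r[0], r[1] + 1); // toIndex is exclusive
        }
    }

    static boolean isNcNameStartByBitSet(int c) {
        // BitSet.get throws for negative indexes and returns false for bits never set.
        return c >= 0 && NCNAME_START_SET.get(c);
    }
}
```

For a character above the last range, the range version performs 28 comparisons (two for each of the 14 ranges), while the BitSet version does a single lookup but keeps a separate structure per class; the per-character bitmask folds all classes into one array.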
--
This message is automatically generated by JIRA.