[
https://jira.jboss.org/jira/browse/DNA-522?page=com.atlassian.jira.plugin...
]
Randall Hauch commented on DNA-522:
-----------------------------------
By default, ANTLR 3.0 treats keywords used in the parser as special tokens in the lexer,
meaning that the lexer always generates a keyword token whenever such a word appears
in the input (regardless of whether the word is actually being used as that keyword or is
just a coincidental match). Fixing this requires a lot of nasty workarounds, which make
the generated parser really large.
So instead of using ANTLR, the parser was changed to use the TokenStream class added late
in the last release. This is a very simple framework that makes it possible to write a
parser that is fairly efficient, yet extremely easy to read and extremely easy to debug.
Plus, the total lines of code for this CND importer/parser are actually reduced, even
when the generated code is excluded from the stats. The same CndImporter interface was
kept, so there were no changes outside of the 'dna-cnd' project.
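To illustrate why a hand-written parser sidesteps the keyword problem, here is a minimal
sketch in which a word such as 'mandatory' is only treated as a keyword when it appears in
an attribute position. Everything in it is an assumption made for illustration: the Cursor
class is a hypothetical stand-in and NOT the actual TokenStream API, and the grammar is a
heavily simplified property definition rather than the real CND grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContextualKeywordSketch {
    // Attribute keywords recognised by this sketch (a simplified, assumed subset).
    static final Set<String> ATTRIBUTES =
            Set.of("mandatory", "autocreated", "protected", "multiple");

    // Hypothetical token cursor; NOT the DNA TokenStream API.
    static final class Cursor {
        private final List<String> tokens = new ArrayList<>();
        private int pos = 0;

        Cursor(String input) {
            // Tokens are either runs of word characters or single punctuation characters.
            Matcher m = Pattern.compile("[A-Za-z0-9:_]+|\\S").matcher(input);
            while (m.find()) tokens.add(m.group());
        }

        boolean hasNext() { return pos < tokens.size(); }

        String next() {
            if (!hasNext()) throw new IllegalStateException("Unexpected end of input");
            return tokens.get(pos++);
        }

        // Consumes the next token only if it matches; the tokens themselves carry no type.
        boolean canConsume(String expected) {
            if (hasNext() && tokens.get(pos).equalsIgnoreCase(expected)) {
                pos++;
                return true;
            }
            return false;
        }

        void consume(String expected) {
            if (!canConsume(expected)) {
                throw new IllegalStateException("Expected '" + expected + "' at token " + pos);
            }
        }
    }

    // Parses the simplified form: '-' name [ '(' type ')' ] { attribute }
    static String parseProperty(String line) {
        Cursor c = new Cursor(line);
        c.consume("-");
        String name = c.next(); // name position: any word is accepted, even "mandatory"
        String type = "STRING"; // default used by this sketch when no type is given
        if (c.canConsume("(")) {
            type = c.next();
            c.consume(")");
        }
        List<String> attrs = new ArrayList<>();
        while (c.hasNext()) {
            // Attribute position: only here are words checked against the keyword set.
            String word = c.next().toLowerCase();
            if (!ATTRIBUTES.contains(word)) {
                throw new IllegalStateException("Unknown attribute: " + word);
            }
            attrs.add(word);
        }
        return name + " (" + type + ") " + attrs;
    }

    public static void main(String[] args) {
        System.out.println(parseProperty("- mandatory (string) mandatory"));
    }
}
```

Because the cursor hands back plain words, the parser decides from position alone whether a
word is a name or a keyword, so a property named 'mandatory' parses without any workaround.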
Also, one of the test cases used to this point attempted to import a namespace mapping
where the URI was not quoted. Technically this is not possible according to the JCR 2.0
Public Final Draft specification, and the reference implementation does not appear to
support it. Actually, there are several errors in the section of the JCR 2.0 PFD
specification that covers the CND grammar. In particular, the definition of a string is
unclear at the very least and likely in error, since it attempts to define a string as a
sequence of one or more XmlChar, where XmlChar appears to be defined as any of the
characters allowed in the Char production of the XML specification
(http://www.w3.org/TR/xml/#NT-Char) and therefore would allow nearly any Unicode
character (including whitespace, newlines, etc).
This is clearly NOT the behavior of the reference implementation, which treats a CND
unquoted string simply as '[A-Za-z0-9:_]+'. Therefore, the string behavior of the
parser was cleaned up a bit, though it is still more lenient than the reference
implementation. Basically, our unquoted string is any sequence of non-whitespace
characters that contains none of the following: 
following: []<>=-+(),\"'/{*|
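The two unquoted-string definitions above can be compared with a small sketch. The class
and method names are hypothetical and this is not the dna-cnd code; it also assumes the
backslash in the excluded list is a literal character rather than an escape for the
double quote.

```java
import java.util.regex.Pattern;

class UnquotedStrings {
    // Pattern that the comment above attributes to the reference implementation.
    static final Pattern REFERENCE = Pattern.compile("[A-Za-z0-9:_]+");

    // Characters excluded from unquoted strings, taken from the list above.
    // Assumption: the backslash is itself one of the excluded characters.
    static final String EXCLUDED = "[]<>=-+(),\\\"'/{*|";

    // Stricter rule: letters, digits, ':' and '_' only.
    static boolean isReferenceUnquoted(String s) {
        return REFERENCE.matcher(s).matches();
    }

    // More lenient rule: non-empty, no whitespace, no excluded character.
    static boolean isLenientUnquoted(String s) {
        if (s.isEmpty()) return false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c) || EXCLUDED.indexOf(c) >= 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"nt:base", "my.prop", "http://example.com/ns", "a b"}) {
            System.out.println(s + " reference=" + isReferenceUnquoted(s)
                    + " lenient=" + isLenientUnquoted(s));
        }
    }
}
```

Under these assumptions, 'my.prop' is accepted only by the lenient rule, while an unquoted
URI such as http://example.com/ns is rejected by both because '/' is excluded.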
As a result of these changes, all existing unit tests and integration tests pass (though
some of the dna-cnd tests were changed and new ones were added).
CND files that use keywords as names cannot be read in
------------------------------------------------------
Key: DNA-522
URL: https://jira.jboss.org/jira/browse/DNA-522
Project: DNA
Issue Type: Bug
Components: JCR, Sequencers
Affects Versions: 0.6
Reporter: Randall Hauch
Assignee: Randall Hauch
Fix For: 0.7
The CND parser fails to load CND files that use keywords as unquoted names. Apparently
this is a 'feature' of ANTLR, since the generated lexer always returns a keyword
token whenever a keyword appears in the file (regardless of its position).