[
https://issues.jboss.org/browse/JBRULES-2936?page=com.atlassian.jira.plug...
]
Jesper S. Møller commented on JBRULES-2936:
-------------------------------------------
I was bitten by this too, but a bit of digging solved the mystery:
JXL gets this wrong, at least for BIFF8 files (binary Excel 97 files). BIFF8 stores all
strings in the Shared String Table (SST), and always as Unicode. For strings where all the
high-order bytes are 0x00, it uses a compressed format that leaves the high-order bytes
out. JXL wrongly treats those strings as though they were MBCS text in the system default
encoding ("file.encoding", or "jxl.encoding" if set). This is very wrong. POI gets it right.
Hardcoding jxl.encoding to ISO-8859-1 fixes JXL for you, for BIFF8 and up.
This works because code points U+0000 to U+00FF correspond exactly to the ISO-8859-1 byte
values. (Windows-1252 is close but not identical: it differs in the range 0x80-0x9F.)
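To illustrate why ISO-8859-1 is the right choice for the compressed format, here is a minimal
sketch. It is not JXL or POI code: the class and method names are made up for this example, and
the byte array is a hand-written stand-in for a compressed string as described above.

```java
import java.nio.charset.StandardCharsets;

public class CompressedStringDemo {
    // Expands a "compressed" string: each stored byte is taken as the low-order
    // byte of a UTF-16 code unit whose high-order byte is 0x00.
    static String expandCompressed(byte[] stored) {
        StringBuilder sb = new StringBuilder(stored.length);
        for (byte b : stored) {
            sb.append((char) (b & 0xFF)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample: "Moller" with U+00F8 (o with stroke) as second letter.
        byte[] stored = {0x4D, (byte) 0xF8, 0x6C, 0x6C, 0x65, 0x72};

        String expanded = expandCompressed(stored);
        String latin1 = new String(stored, StandardCharsets.ISO_8859_1);
        String utf8 = new String(stored, StandardCharsets.UTF_8);

        // Decoding as ISO-8859-1 gives the same result as expanding by hand.
        System.out.println(expanded.equals(latin1)); // true
        // Decoding as a multi-byte encoding (UTF-8 here) corrupts the non-ASCII character.
        System.out.println(expanded.equals(utf8));   // false
    }
}
```

UTF-8 is only one example of a default encoding that breaks the text; any "file.encoding"
that disagrees with ISO-8859-1 above 0x7F would corrupt such strings in a similar way.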
For versions prior to Excel 97 (BIFF8), string values were stored in LABEL records, encoded
according to the CODEPAGE record stored in the file. I don't see either POI or JXL doing this
right when reading an Excel 95 file, but it requires a "codepage number -> Java
encoding" table to get right, which is hard work.
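A rough sketch of what such a "codepage number -> Java encoding" table could look like follows.
Everything in it is an assumption made for illustration: the class name, the handful of entries,
and the idea that the number in the file matches the usual Windows/DOS codepage number. A real
table would need many more entries and would have to be checked against the file format
documentation.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.Optional;

public class CodepageTable {
    // Hypothetical and deliberately incomplete: codepage number -> Java charset name.
    static final Map<Integer, String> CODEPAGE_TO_CHARSET = Map.of(
        437, "IBM437",
        850, "IBM850",
        1250, "windows-1250",
        1251, "windows-1251",
        1252, "windows-1252"
    );

    // Returns the charset for a codepage, or empty if the codepage is unknown
    // or the running JVM does not support the mapped charset.
    static Optional<Charset> charsetFor(int codepage) {
        String name = CODEPAGE_TO_CHARSET.get(codepage);
        if (name == null || !Charset.isSupported(name)) {
            return Optional.empty();
        }
        return Optional.of(Charset.forName(name));
    }

    public static void main(String[] args) {
        System.out.println(charsetFor(1252).isPresent());
        System.out.println(charsetFor(99999).isPresent());
    }
}
```

Returning an empty Optional for unknown codepages leaves it to the caller to pick a fallback
rather than silently guessing an encoding.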
But the JXL version I tried had a different bug in reading Excel95 files (containing
WRITEACCESS records, which I guess are common), so that's likely not a problem at
all.
Yeah, yeah, one day I'll do "The OSS Right Thing" and produce a proper patch
for JXL and POI.
For now, set the system property (System.setProperty("jxl.encoding",
"ISO-8859-1");) and enjoy your decision tables!
Note how this problem is unlikely to hit you if you
A) Primarily use English text
B) Use an 8-bit "file.encoding" which has all the characters you need in the
same place as ISO-8859-1 (i.e. Windows for most Europeans)
Diversity matters! (I say from Denmark running Mac OS X)
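For reference, the jxl.encoding workaround mentioned above could be wrapped like this. It is
only a sketch: the class and method names are invented, and it assumes the property has to be
set before JXL opens the spreadsheet, which I have not verified for every JXL version.

```java
public class JxlEncodingWorkaround {
    static final String PROPERTY = "jxl.encoding";
    static final String ENCODING = "ISO-8859-1";

    // Sets jxl.encoding and returns the previous value (null if it was unset),
    // so the caller can restore it afterwards if needed.
    static String apply() {
        return System.setProperty(PROPERTY, ENCODING);
    }

    public static void main(String[] args) {
        apply();
        // ... load the decision table spreadsheet here ...
        System.out.println(System.getProperty(PROPERTY));
    }
}
```

Passing -Djxl.encoding=ISO-8859-1 on the JVM command line sets the same property without
touching any code.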
Importing decision table from Excel: Non Ascii chars should not be
corrupted
----------------------------------------------------------------------------
Key: JBRULES-2936
URL: https://issues.jboss.org/browse/JBRULES-2936
Project: Drools
Issue Type: Bug
Security Level: Public(Everyone can see)
Reporter: Geoffrey De Smet
Fix For: 5.4.0.Beta2
See http://stackoverflow.com/questions/5298748/guvnor-rules-encoding
Excel (like Windows) probably has crappy encoding standardization (as in none at all), so
I suspect that we'll need to ask the Excel document what encoding (or even what
locale) it uses and read the data in that encoding.