[ https://issues.jboss.org/browse/TEIID-2383?page=com.atlassian.jira.plugin... ]
Hisanobu Okuda commented on TEIID-2383:
---------------------------------------
The File translator uses a buffered reader to read the file. In the sample, the Japanese character '道' = {0xe9 0x81 0x93} is split across two buffer fills: {0xe9} ends one fill and {0x81 0x93} begin the next.
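For reference, the byte split itself can be reproduced with plain JDK code, independent of Teiid. This is a minimal sketch: the helper `buildData` and class `SplitCharDemo` are hypothetical, and the input (8191 ASCII bytes followed by '道', U+9053) is an assumed stand-in for the attached sample. The 8192-byte chunk size matches the buffer capacity shown in the debugger output.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SplitCharDemo {
    // Hypothetical test input: prefixLen ASCII bytes followed by '道' (U+9053).
    static byte[] buildData(int prefixLen) {
        byte[] dao = "\u9053".getBytes(StandardCharsets.UTF_8); // e9 81 93
        byte[] data = new byte[prefixLen + dao.length];
        Arrays.fill(data, 0, prefixLen, (byte) 'a');
        System.arraycopy(dao, 0, data, prefixLen, dao.length);
        return data;
    }

    public static void main(String[] args) {
        int bufferSize = 8192;
        byte[] data = buildData(bufferSize - 1);
        // Simulate two successive fills of an 8192-byte buffer.
        byte[] first = Arrays.copyOfRange(data, 0, bufferSize);
        byte[] second = Arrays.copyOfRange(data, bufferSize, data.length);
        System.out.printf("last byte of first fill: %02x%n", first[first.length - 1] & 0xff);
        System.out.printf("second fill: %02x %02x%n", second[0] & 0xff, second[1] & 0xff);
    }
}
```

Any input whose multi-byte character starts at byte offset 8191 splits the same way.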
{code}
[hokuda@localhost 00769999]$ rlwrap -l jdb.log -pRED -C jdb_teiid /opt/jdk1.6.0_37/bin/jdb -attach localhost:8787
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
Initializing jdb ...
stop at org.teiid.core.util.InputStreamReader:93
Set breakpoint org.teiid.core.util.InputStreamReader:93
Breakpoint hit: "thread=Worker2_QueryProcessorQueue16", org.teiid.core.util.InputStreamReader.read(), line=93 bci=178
Worker2_QueryProcessorQueue16[1] cont
Breakpoint hit: "thread=Worker2_QueryProcessorQueue16", org.teiid.core.util.InputStreamReader.read(), line=93 bci=178
Worker2_QueryProcessorQueue16[1] cont
Breakpoint hit: "thread=Worker2_QueryProcessorQueue16", org.teiid.core.util.InputStreamReader.read(), line=93 bci=178
Worker2_QueryProcessorQueue16[1] cont
Breakpoint hit: "thread=Worker2_QueryProcessorQueue16", org.teiid.core.util.InputStreamReader.read(), line=93 bci=178
Worker2_QueryProcessorQueue16[1] cont
Breakpoint hit: "thread=Worker3_QueryProcessorQueue18", org.teiid.core.util.InputStreamReader.read(), line=93 bci=178
Worker3_QueryProcessorQueue18[1] print this.bb
this.bb = "java.nio.HeapByteBuffer[pos=8191 lim=8192 cap=8192]"
Worker3_QueryProcessorQueue18[1] print this.bb.hb[8191]
this.bb.hb[8191] = -23 (=0xe9)
{code}
Before line 93 executes, this.bb has position=8191 and limit=8192, which means one byte remains to be decoded on the next pass. That byte is -23 (0xe9), the first byte of the Japanese character '道' = {0xe9 0x81 0x93}.
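The `cr` local of type `java.nio.charset.CoderResult` suggests that Teiid's reader drives a JDK `CharsetDecoder` (an assumption; the reader's source is not shown here). The sketch below, with hypothetical names `TrailingByteDemo` and `decodeChunk`, shows how such a decoder behaves on a non-final chunk: with `endOfInput=false` it stops in front of the incomplete sequence and leaves the lead byte in the buffer, the same one-byte-left state (pos = lim - 1) seen above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class TrailingByteDemo {
    // Hypothetical helper: decodes bytes as a non-final chunk into out
    // and returns the byte buffer so its position/limit can be inspected.
    static ByteBuffer decodeChunk(byte[] bytes, CharBuffer out) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        // endOfInput=false: more bytes may follow, so an incomplete sequence
        // at the end is left unconsumed instead of being reported as malformed.
        CoderResult cr = decoder.decode(bb, out, false);
        System.out.println("result=" + cr + " pos=" + bb.position() + " lim=" + bb.limit());
        return bb;
    }

    public static void main(String[] args) {
        CharBuffer out = CharBuffer.allocate(16);
        // 'a', 'b', then only the lead byte 0xe9 of '道'.
        ByteBuffer bb = decodeChunk(new byte[] {'a', 'b', (byte) 0xe9}, out);
        out.flip();
        System.out.println("decoded=" + out + ", bytes left=" + bb.remaining());
    }
}
```

The leftover byte is the caller's responsibility: whatever happens to the buffer between calls decides whether 0xe9 survives.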
{code}
Worker3_QueryProcessorQueue18[1] locals
Method arguments:
cbuf = instance of char[8192] (id=10433)
off = 0
len = 8192
Local variables:
read = 8191
cr = instance of java.nio.charset.CoderResult(id=10434)
{code}
Since bb.position()=8191 and read=8191, bb.clear() is invoked here, although bb.compact() should be invoked instead so that the undecoded byte is kept.
{code}
Worker3_QueryProcessorQueue18[1] next
Step completed: "thread=Worker3_QueryProcessorQueue18", org.teiid.core.util.InputStreamReader.read(), line=96 bci=201
Worker3_QueryProcessorQueue18[1] next
Step completed: "thread=Worker3_QueryProcessorQueue18", org.teiid.core.util.InputStreamReader.read(), line=98 bci=209
Worker3_QueryProcessorQueue18[1] print this.bb
this.bb = "java.nio.HeapByteBuffer[pos=0 lim=8192 cap=8192]"
{code}
Because read=8191 and bb.position()=8191, bb.clear() is invoked: the position of bb is reset to 0, and the last byte, this.bb.hb[8191] = -23 (0xe9), is discarded before it has been decoded. On the next blocking read, 0x81 and 0x93 are stored at the start of bb.
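A sketch of the bb.compact() fix, under stated assumptions: a decode loop that keeps leftover bytes with `ByteBuffer.compact()` rather than dropping them with `clear()`. `CompactDecodeSketch.decodeAll` is a hypothetical helper written against the JDK `java.nio` API, not Teiid's actual `InputStreamReader.read()`. It assumes a buffer of at least 4 bytes, so a carried-over partial UTF-8 sequence (at most 3 bytes) always leaves room for the next read. The test input mirrors the debugger output: 8191 ASCII bytes, then '道', `'` (39) and `\n` (10).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CompactDecodeSketch {
    // Hypothetical helper: decodes a UTF-8 stream through a fixed-size byte buffer.
    // Assumes bufferSize >= 4 so a carried-over partial sequence never fills it.
    static String decodeAll(InputStream in, int bufferSize) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bb = ByteBuffer.allocate(bufferSize); // starts in fill mode
        CharBuffer cb = CharBuffer.allocate(bufferSize);
        StringBuilder sb = new StringBuilder();
        boolean eof = false;
        while (!eof) {
            // Read into the free space after any bytes carried over from the last pass.
            int n = in.read(bb.array(), bb.arrayOffset() + bb.position(), bb.remaining());
            if (n < 0) {
                eof = true;
            } else {
                bb.position(bb.position() + n);
            }
            bb.flip(); // switch to drain mode for the decoder
            CoderResult cr;
            do {
                cr = decoder.decode(bb, cb, eof);
                cb.flip();
                sb.append(cb);
                cb.clear();
            } while (cr.isOverflow());
            if (cr.isError()) {
                cr.throwException(); // e.g. MalformedInputException
            }
            // compact() moves any undecoded tail (such as a lone 0xe9) to index 0;
            // clear() would reset the position and lose it.
            bb.compact();
        }
        decoder.flush(cb);
        cb.flip();
        sb.append(cb);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        char[] prefix = new char[8191];
        Arrays.fill(prefix, 'a');
        String expected = new String(prefix) + "\u9053'\n";
        byte[] data = expected.getBytes(StandardCharsets.UTF_8);
        String decoded = decodeAll(new ByteArrayInputStream(data), 8192);
        System.out.println(decoded.equals(expected));
    }
}
```

If `bb.compact()` is replaced with `bb.clear()`, the same input loses 0xe9 at the 8192-byte boundary and the decoder then reports 0x81 as malformed, matching the behaviour traced in the debugger.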
{code}
Worker3_QueryProcessorQueue18[1] stop at org.teiid.core.util.InputStreamReader:82
Set breakpoint org.teiid.core.util.InputStreamReader:82
Worker3_QueryProcessorQueue18[1] cont
Breakpoint hit: "thread=Worker3_QueryProcessorQueue18", org.teiid.core.util.InputStreamReader.read(), line=82 bci=100
Worker3_QueryProcessorQueue18[1] print this.bb
this.bb = "java.nio.HeapByteBuffer[pos=0 lim=4 cap=8192]"
Worker3_QueryProcessorQueue18[1] print this.bb.hb[0]
this.bb.hb[0] = -127 (=0x81)
Worker3_QueryProcessorQueue18[1] print this.bb.hb[1]
this.bb.hb[1] = -109 (=0x93)
Worker3_QueryProcessorQueue18[1] print this.bb.hb[2]
this.bb.hb[2] = 39
Worker3_QueryProcessorQueue18[1] print this.bb.hb[3]
this.bb.hb[3] = 10
Worker3_QueryProcessorQueue18[1]
{code}
Because the first byte 0xe9 is missing, decoding of 0x81 0x93 fails.
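The failure can be confirmed with the JDK's UTF-8 decoder, whose default malformed-input action (`CodingErrorAction.REPORT`) surfaces a stray continuation byte as `MalformedInputException`. The byte values below are the ones printed in the last debugger session; `OrphanContinuationDemo` and `malformedLength` are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

public class OrphanContinuationDemo {
    // Decodes bytes with a fresh UTF-8 decoder (default action: REPORT) and
    // returns the length of the malformed sequence, or -1 if decoding succeeds.
    static int malformedLength(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return -1;
        } catch (MalformedInputException e) {
            return e.getInputLength();
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // What the second fill holds once 0xe9 has been dropped: 0x81 0x93 ' \n
        byte[] secondFill = {(byte) 0x81, (byte) 0x93, '\'', '\n'};
        System.out.println(malformedLength(secondFill));
        // With the lead byte restored, the same bytes decode cleanly.
        byte[] whole = {(byte) 0xe9, (byte) 0x81, (byte) 0x93, '\'', '\n'};
        System.out.println(malformedLength(whole));
    }
}
```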
InputStreamReader throws MalformedInputException when handling multi-byte characters
------------------------------------------------------------------------------------
Key: TEIID-2383
URL: https://issues.jboss.org/browse/TEIID-2383
Project: Teiid
Issue Type: Bug
Components: Common
Affects Versions: 7.7.1
Reporter: Hisanobu Okuda
Assignee: Steven Hawkins
Attachments: sample.zip, simple_client.tar
My VDB uses the File translator to read a CSV file that contains multi-byte characters. When certain CSV files are read through the VDB, a MalformedInputException always occurs.