[teiid-issues] [JBoss JIRA] (TEIID-5092) Joining tables from S3 occasionally fails with NumberFormatException

Andrej Šmigala (JIRA) issues at jboss.org
Tue Oct 10 03:11:00 EDT 2017


    [ https://issues.jboss.org/browse/TEIID-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474293#comment-13474293 ] 

Andrej Šmigala commented on TEIID-5092:
---------------------------------------

There is no bad data involved here.

This exception comes from the SimpleDateFormat.parse method. On [line 488|https://github.com/teiid/teiid/blob/64-8.12.x/connectors/translator-amazon-s3/src/main/java/org/teiid/translator/amazon/s3/S3ProcedureExecution.java#L488] in S3ProcedureExecution, that method is called with the value of the "Last-Modified" header returned by S3 from the HTTP call. I verified that the headers returned by S3 are correct, even in the cases where the exception occurs.

My assumption is that a join creates multiple S3ProcedureExecution instances whose next() methods are called from different threads, which in turn means the parse() method is called on the same SimpleDateFormat instance from multiple threads. SimpleDateFormat is [not thread safe|https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html#synchronization], so its internal state becomes corrupted and a NumberFormatException is thrown from deep within the parser.
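
For anyone who wants to see the race outside of Teiid, here is a minimal standalone sketch. It is not the translator code: the class name, the pattern string and the sample header value are my own assumptions, chosen to resemble an HTTP "Last-Modified" header. It parses the same valid string from several threads, once through one shared SimpleDateFormat and once through a fresh instance per call, and counts parses that throw or return the wrong instant.

```java
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical harness, not part of Teiid.
class SharedDateFormatRace {
    // Assumed pattern and sample value; the real translator may differ.
    static final String PATTERN = "EEE, dd MMM yyyy HH:mm:ss zzz";
    static final String HEADER = "Tue, 10 Oct 2017 07:11:00 GMT";
    static final long EXPECTED_MILLIS = 1507619460000L;

    static SimpleDateFormat newFormat() {
        SimpleDateFormat f = new SimpleDateFormat(PATTERN, Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("GMT"));
        return f;
    }

    // Parses HEADER threads * iterations times and returns how many parses
    // either threw or produced a value other than EXPECTED_MILLIS.
    static int countBadParses(Supplier<SimpleDateFormat> formats, int threads, int iterations) {
        AtomicInteger bad = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < iterations; i++) {
                    try {
                        if (formats.get().parse(HEADER).getTime() != EXPECTED_MILLIS) {
                            bad.incrementAndGet();
                        }
                    } catch (Exception e) {
                        // ParseException, NumberFormatException and similar
                        bad.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return bad.get();
    }

    public static void main(String[] args) {
        SimpleDateFormat shared = newFormat();
        // Shared instance: the count is timing dependent and may or may not be zero.
        System.out.println("shared instance:   " + countBadParses(() -> shared, 8, 10_000));
        // Fresh instance per call: no shared state, so no bad parses.
        System.out.println("instance per call: " + countBadParses(SharedDateFormatRace::newFormat, 8, 10_000));
    }
}
```

The shared-instance count varies from run to run, which matches the "occasional" nature of the failure; only the per-call variant is deterministic.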

I verified this assumption by removing the static modifier from the df field on [line 73|https://github.com/teiid/teiid/blob/64-8.12.x/connectors/translator-amazon-s3/src/main/java/org/teiid/translator/amazon/s3/S3ProcedureExecution.java#L73], after which the exception never occurred. (I did not create a PR with this change because instantiating a SimpleDateFormat is reportedly an expensive operation, so a proper fix will require synchronization instead.)
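
As a rough sketch of what a fix could look like without paying the instantiation cost on every call, here are two options. Neither is the actual Teiid change. The LastModifiedParser class, its method names and the pattern string are hypothetical, and both variants assume Java 8 (the stack trace shows 1.8.0_131, but the branch may need to build against something older). The first keeps one SimpleDateFormat per thread; the second uses the immutable, thread-safe DateTimeFormatter.RFC_1123_DATE_TIME from java.time.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not the translator's actual code.
final class LastModifiedParser {
    // Option 1: one SimpleDateFormat per thread, created lazily on first use.
    // The pattern is an assumption about the header format.
    private static final ThreadLocal<SimpleDateFormat> DF = ThreadLocal.withInitial(() -> {
        SimpleDateFormat f = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("GMT"));
        return f;
    });

    static Date parse(String header) {
        try {
            return DF.get().parse(header);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable Last-Modified value: " + header, e);
        }
    }

    // Option 2: java.time formatters are immutable, so a single shared
    // constant can be used from any number of threads.
    static Date parseWithJavaTime(String header) {
        return Date.from(ZonedDateTime.parse(header, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant());
    }

    private LastModifiedParser() {
    }
}
```

A third option is to keep the static field and wrap each parse() call in a synchronized block on it, which is the smallest change but serializes all parsing. The ThreadLocal variant avoids that contention at the price of one formatter per worker thread.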

> Joining tables from S3 occasionally fails with NumberFormatException
> --------------------------------------------------------------------
>
>                 Key: TEIID-5092
>                 URL: https://issues.jboss.org/browse/TEIID-5092
>             Project: Teiid
>          Issue Type: Bug
>          Components: Misc. Connectors
>    Affects Versions: 8.12.x-6.4
>            Reporter: Andrej Šmigala
>            Assignee: Steven Hawkins
>            Priority: Critical
>
> Running any query that reads multiple tables from an Amazon S3 datasource sometimes fails with the following exception:
> {noformat}
> [org.teiid.CONNECTOR] (Worker0_QueryProcessorQueue3373) Connector worker process failed for atomic-request=ZwjBqkPjJdw2.386.4.689: java.lang.NumberFormatException: For input string: ""
>   at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) [rt.jar:1.8.0_131]
>   at java.lang.Long.parseLong(Long.java:601) [rt.jar:1.8.0_131]
>   at java.lang.Long.parseLong(Long.java:631) [rt.jar:1.8.0_131]
>   at java.text.DigitList.getLong(DigitList.java:195) [rt.jar:1.8.0_131]
>   at java.text.DecimalFormat.parse(DecimalFormat.java:2051) [rt.jar:1.8.0_131]
>   at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:1869) [rt.jar:1.8.0_131]
>   at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1514) [rt.jar:1.8.0_131]
>   at java.text.DateFormat.parse(DateFormat.java:364) [rt.jar:1.8.0_131]
>   at org.teiid.translator.amazon.s3.S3ProcedureExecution.next(S3ProcedureExecution.java:488) [translator-amazon-s3-8.12.11.6_4-redhat-64-6.jar:8.12.11.6_4-redhat-64-6]
>   at org.teiid.dqp.internal.datamgr.ConnectorWorkItem.handleBatch(ConnectorWorkItem.java:433)
>   at org.teiid.dqp.internal.datamgr.ConnectorWorkItem.more(ConnectorWorkItem.java:236)
>   at sun.reflect.GeneratedMethodAccessor211.invoke(Unknown Source) [:1.8.0_131]
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_131]
>   at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_131]
>   at org.teiid.dqp.internal.datamgr.ConnectorManager$1.invoke(ConnectorManager.java:211)
>   at com.sun.proxy.$Proxy79.more(Unknown Source)
>   at org.teiid.dqp.internal.process.DataTierTupleSource.getResults(DataTierTupleSource.java:309)
>   at org.teiid.dqp.internal.process.DataTierTupleSource$1.call(DataTierTupleSource.java:112)
>   at org.teiid.dqp.internal.process.DataTierTupleSource$1.call(DataTierTupleSource.java:108)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131]
>   at org.teiid.dqp.internal.process.FutureWork.run(FutureWork.java:65)
>   at org.teiid.dqp.internal.process.DQPWorkContext.runInContext(DQPWorkContext.java:280)
>   at org.teiid.dqp.internal.process.ThreadReuseExecutor$RunnableWrapper.run(ThreadReuseExecutor.java:119)
>   at org.teiid.dqp.internal.process.ThreadReuseExecutor$3.run(ThreadReuseExecutor.java:210)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_131]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_131]
>   at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131]
> {noformat}.
> The actual input string referred to in the stack trace occasionally contains garbage data.
> This is caused by unsynchronized access to a static SimpleDateFormat field here: https://github.com/teiid/teiid/blob/64-8.12.x/connectors/translator-amazon-s3/src/main/java/org/teiid/translator/amazon/s3/S3ProcedureExecution.java#L488 


