Ike,
I got a chance to dig into the issue this weekend and I believe I have
found the cause (or causes, as the case may be). First, the code currently
in the HTTP Server does not assume that there is only one body part in the
multipart POST (it assumes there could be a number of fields with one of
them being the actual deployment being uploaded). This was something that I
did not include originally, but added later after Jason and I discussed how
the upload API may need to be used in the future (i.e. what would happen if
the form post needs to provide other information in addition to the
deployment as part of the upload). Therefore, the code keeps reading
header/body pairs from the stream until it finds one of content-type
"application/octet-stream". However, upon further investigation (and using
different browsers), it appears that different browsers provide different
content types for the input type="file" input element in a
"multipart/form-data" form (for instance, Firefox 3.x reports
application/octet-stream, while IE 9 reports application/x-zip-compressed
for a .war file and I suspect Safari is doing something different as well).
There is definitely a defect in the logic that reads through the stream
where it could never exit the outer while loop in the extractPostContent
method if it never finds a part of type application/octet-stream. I have
posted a patch to fix the issue here:
https://github.com/jdpgrailsdev/jboss-as/commit/83f0d293e018a550562cd0381....
Note that this patch does NOT fix the issue with the different content-types
-- it merely fixes the OOM and returns a HTTP 500 if we do not find a part
of type application/octet-stream in the POST (see below for more thoughts on
what to do with this). I have added a comment to this effect on JBAS-9268.
On a related note, it appears that your patch does not handle the case
described above (mutliple parts in the form post that need to be inspected
to find the actual deployment -- please correct me if I am reading the code
wrong!). From what I can tell, the proposed patch consumes the header (i.e.
the part that contains the content-disposition and content-type
declarations) and then reads the body of the part, writing it to the output
stream point to the temporary file. It repeats this process until there are
no more parts to read (this means that in the case of multiple parts, the
bodies will be concatenated together in the output file). It *seems* as
though the new patch assumes only one part for the deployment or that the
deployment itself will be split across multiple boundaries. This is not the
assumption that the original code was produced under (I will defer to Jason
on this as regards to whether or not we need the ability to handle upload
requests with multiple parts with only one of those parts being the actual
deployment).
I guess this leads us to a decision point as to how to modify the HTTP
server:
1) Patch the existing code to a) not produce OOM when no deployment is found
in the incoming POST and b) modify the code to look for the appropriate
content-types that we need to support (instead of just
application/octet-stream). This would mean that we want to keep the logic
to handle inspecting each part of the multipart form post to find the
deployment.
2) Use Ike's patch as-is.
3) Modify Ike's patch to include the logic for handling inspecting each part
of the multipart form post (hybrid of 1 and 2, basically).
Options 1 and 3 assume that we still need the logic to "find" the deployment
in the upload POST request by interpreting the content-type of each part.
Thinking about this some more, it may make sense to standardize on the name
of the field instead of the type, as we can enforce it from an API
perspective what people have to name the part for the API to deploy it,
instead of relying on all browsers to use the same content-type for a
deployment.
Please let me know what you think and/or if you need me to make any of the
changes!
--Jonathan
On Fri, Apr 15, 2011 at 7:50 AM, Heiko Braun <hbraun(a)redhat.com> wrote:
I've run into a OOM issue when uploading content through the HTTP API
again.
(
https://issues.jboss.org/browse/JBAS-9268)
I did take a look at the current implementation and propose that we change
it in the following way:
- read multipart contents from mime boundary
- replace the custom multipart stream implementation with a mature one
(leans on apache commons fileupload)
I've got this changes tested and verified in a custom branch:
https://github.com/heiko-braun/jboss-as/commits/out_of_memory
However before going ahead, I would like get some feedback from
a) the original author (Jonathan Pearlin. Welcome onboard btw)
b) Jason wrt the Apache License sources (See
org.jboss.as.domain.http.server.multipart.asf.*)
Regards, Ike
_______________________________________________
jboss-as7-dev mailing list
jboss-as7-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/jboss-as7-dev