[Repost] Re: A question about your HttpClient example and beyond

Trustin Lee (이희승) trustin at gmail.com
Sun Sep 20 22:17:28 EDT 2009


Thanks, Frederic, for the excellent guidance.  Happy to see the great conversation!

— Trustin Lee, http://gleamynode.net/



On Sat, Sep 12, 2009 at 2:44 PM, J. Mi <jmi258 at gmail.com> wrote:
> Hi Frederic,
>
> You read my application exactly right. Thanks so much for the help.
> Following your suggestion, I was able to maximize concurrency by adding a
> ChannelFutureListener to each connect operation. In its operationComplete(),
> the channel sends the HTTP request and counts down a CountDownLatch, since I
> do need to 'join' all the responses.
>
> So now I use 3 loops (down from 4 previously), plus more concurrency, like
> this:
>
> - Set up the first CountDownLatch
> - Loop 1: connect n times, with a listener on each connect. In the
>   listener's operationComplete(), the request is sent, the handler is added
>   to the list, and the first CountDownLatch is counted down.
> - Wait on the first CountDownLatch
> - Loop 2: use the handler list to retrieve each response.
> - Set up the second CountDownLatch
> - Loop 3: call channel.getCloseFuture().addListener(). In the listener's
>   operationComplete(), simply count down the second CountDownLatch.
> - Wait on the second CountDownLatch
> - Call bootstrap.releaseExternalResources()
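> The latch bookkeeping above can be sketched with plain java.util.concurrent
> (a thread pool stands in for Netty's asynchronous connect listeners, and all
> names here are hypothetical):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LatchSketch {
    public static void main(String[] args) throws Exception {
        int n = 5;
        ExecutorService pool = Executors.newFixedThreadPool(n); // stands in for Netty's I/O threads
        final CountDownLatch firstLatch = new CountDownLatch(n);
        final List<String> responses = new CopyOnWriteArrayList<String>();

        // Loop 1: "connect" n times; each listener body runs asynchronously.
        for (int i = 0; i < n; i++) {
            final int id = i;
            pool.execute(new Runnable() {
                public void run() {
                    // operationComplete(): send the request, record the
                    // response, and count the latch down.
                    responses.add("response-" + id);
                    firstLatch.countDown();
                }
            });
        }

        firstLatch.await(); // block until all n listeners have fired
        // Loop 2 would walk the handler list here; we just report the count.
        System.out.println(responses.size());
        pool.shutdown();
    }
}
```

> The second latch follows the same shape, counted down from each channel's
> close-future listener instead.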
>
> I'm now observing better performance than before, but only past a
> threshold. In my case, serving fewer than 100 requests, multi-threading
> plus the synchronous Apache HttpClient does better. At 100 requests they
> break even. At 200 and 300 requests, Netty does better. My VMware
> Workstation 6 VM with CentOS 5.3 and 5 GB of memory on a 4-core desktop
> cannot handle more than 300 requests in my simple test application.
>
> Next, I'm going to work with HttpChunkAggregator as you suggested. I'm not
> sure what else I need to do beyond uncommenting that line in the snoop
> example; I'll look into it.
>
> Again, thanks so much for guiding a newbie through this framework.
>
> Jason
>
> On Fri, Sep 11, 2009 at 12:30 AM, Frederic Bregier <fredbregier at free.fr>
> wrote:
>>
>> Hi Jason,
>>
>> Again, I don't feel able to answer everything, but I'll start by answering
>> some of it...
>>
>> One of the main interests of the NIO model is its asynchronous nature.
>> In your example, if I understand it correctly, you do something like this:
>> For each host/port
>>    connect
>> For each connect
>>    wait for the connection to finish, then send the request
>> For each connected channel
>>    wait for the answer to each request, in order
>>
>> So you are implementing something halfway between synchronous and
>> asynchronous.
>> I would suggest the following instead (using the full ChannelFuture
>> capability of Netty):
>>
>> For each host/port
>>    connect and add a ChannelFutureListener fired when the connection is done
>>
>> In the ChannelFutureListener
>>    send the request => each response will be added (not necessarily in
>> order) to your ArrayList
>>
>> Wait for the list to be full (n requests => n answers, or use a
>> CountDownLatch from the concurrent package)
>>
>> Then connecting, sending requests, and receiving responses can all overlap
>> across the several requests.
>>
>> What you have done tends toward synchronous. Not completely, since you
>> overlap the connection attempts across all channels, but you are still
>> waiting for all of them to finish...
>> You can get it from this "picture":
>> you create n tasks (connection);
>> you wait for all n tasks to be done (connected), so a global
>> synchronisation of all threads;
>> then you create n tasks (request);
>> you wait for all n tasks to be done (answered), so again a global
>> synchronisation.
>>
>> What I suggest is:
>> you create n tasks (connection), and each continues by sending its request
>> (no synchronisation);
>> you wait for all n tasks to be done (connected and answered, in any
>> order), so a single global synchronisation bounded only by the slowest
>> answer from a remote host.
>>
>> Of course, if you can avoid waiting for all the answers and instead work
>> with each answer as it arrives, you can avoid even that global wait on the
>> slowest answer. But that depends on your business logic...
>>
>> Reusing connections is not really possible in Netty, but there are
>> handlers/code that allow reconnection (Trustin posted an example on the ML
>> a few days ago).
>>
>> Now for the chunked part: yes, chunking should be supported by any HTTP
>> server. The reason is that when a message is bigger than 8 KB, it is
>> supposed to be chunked.
>> However, Netty provides a handler (HttpChunkAggregator) that lets you
>> receive the full body (only the body is affected by chunking) in one
>> ChannelBuffer. This handler accumulates all the chunks up to the last one
>> and passes the completed message on to the next handler.
>> That is obviously simplest for a standard program.
>> However, take care of one thing: if you have 100 requests and every
>> response carries a 1 MB body, you will hold (at least) 100 MB in memory,
>> since all the bodies are buffered until their chunks are fully decoded.
>> In my work, I use the HTTP codec chunk by chunk, since my business model
>> lets me handle the data chunk by chunk and so keep memory usage as low as
>> possible.
>> But if that is not your case, just use the HttpChunkAggregator handler; it
>> works perfectly, and you can then ignore whether the answer is chunked or
>> not. The snoop example shows how to use it.
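>> To make that concrete, a pipeline sketch against the Netty 3.x API (the
>> 1 MB cap and the snoop-style HttpResponseHandler are my assumptions, not
>> something fixed by Netty):

```java
import org.jboss.netty.channel.ChannelPipeline;
import org.jboss.netty.channel.ChannelPipelineFactory;
import org.jboss.netty.channel.Channels;
import org.jboss.netty.handler.codec.http.HttpChunkAggregator;
import org.jboss.netty.handler.codec.http.HttpRequestEncoder;
import org.jboss.netty.handler.codec.http.HttpResponseDecoder;

public class ClientPipelineFactory implements ChannelPipelineFactory {
    public ChannelPipeline getPipeline() throws Exception {
        ChannelPipeline pipeline = Channels.pipeline();
        pipeline.addLast("decoder", new HttpResponseDecoder());
        // Accumulates every HttpChunk and hands the next handler a single
        // complete message; bodies above the cap raise TooLongFrameException.
        pipeline.addLast("aggregator", new HttpChunkAggregator(1048576));
        pipeline.addLast("encoder", new HttpRequestEncoder());
        pipeline.addLast("handler", new HttpResponseHandler()); // your handler
        return pipeline;
    }
}
```

>> With the aggregator in place, messageReceived() sees one response whose
>> content is the whole body, chunked or not.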
>>
>> HTH,
>> Frederic
>>
>>
>> J. Mi wrote:
>> >
>> > Thanks to Frederic for the overview. It's very helpful for me.
>> >
>> > I have come up with an approach to replace my multi-thread model with
>> > Netty's HttpClient. It's pretty much based on the snoop example. I just
>> > added 3 loops to achieve concurrency (multiple HTTP requests at the same
>> > time). The first loop wraps the call to bootstrap.connect(new
>> > InetSocketAddress(host, port)). The second loop waits for each connection
>> > attempt to succeed and then sends the request. The third loop uses the
>> > handler to retrieve each HTTP response via a LinkedBlockingQueue. I used
>> > ArrayLists to maintain a list of ChannelFutures, a list of Channels, and
>> > a list of HttpResponseHandlers across these 3 loops.
>> >
>> > The approach worked well for me. However, my test results didn't show it
>> > out-performing my multi-thread model, i.e. one thread
>> > (java.util.concurrent) per HTTP request, each request made with Apache
>> > Commons HttpClient (a synchronous model). Performance was measured as
>> > the total end-to-end time spent making n HTTP requests and retrieving
>> > the n HTTP responses.
>> >
>> > With fewer than 50 requests, the multi-thread model performed a little
>> > better. I was hoping Netty's way would catch up and scale better,
>> > because I was concerned that the current multi-thread model may not
>> > scale well when getting hundreds of requests at the same time. But I
>> > still failed to observe any performance gain over the multi-thread model
>> > when serving 50, 100, 200...800 concurrent requests.
>> >
>> > One thing I need to understand better (Frederic already touched on the
>> > basics here) is connection management. Apache Commons HttpClient seems
>> > to manage connections with possible reuse; I'm not exactly sure how
>> > Netty does that.
>> >
>> > One more question about Netty's HttpClient. In its
>> > HttpResponseHandler.java, messageReceived() only receives a portion of
>> > the response at a time and depends on the server responding with a
>> > "chunked" Transfer-Encoding header and content to detect the end of the
>> > response. This raises 2 questions: (1) is there a way to receive the
>> > response in one shot, like Apache's HttpClient does; and (2) are all
>> > HTTP servers required to respond with "chunked" content? In my case, I
>> > need to retrieve live responses from different web sites.
>> >
>> > Cheers,
>> > Jason
>> >
>> >
>> >
>> > On Thu, Sep 10, 2009 at 6:45 AM, Frederic Bregier
>> > <fredbregier at free.fr>wrote:
>> >
>> >>
>> >> Hi,
>> >>
>> >> I will not talk about the HTTP-specific part of Netty, but about its
>> >> main interest: Netty's NIO.
>> >> Of course, Trustin or others can be more precise than me. This is just
>> >> my general understanding (I'm neither an NIO expert nor a Netty expert,
>> >> so take it simply as an end user's view).
>> >>
>> >> Compared to standard blocking IO, Netty uses fewer threads to achieve
>> >> the same behaviour.
>> >> For instance, with Apache or Tomcat, one connection is handled by at
>> >> least one thread for the full life of the connection. So if you have
>> >> 1000 connections, you will have at least 1000 threads.
>> >> In Netty, a thread becomes active when data arrives at the server (the
>> >> general idea is greatly simplified here; don't take it as the literal
>> >> reality). For those 1000 connections, maybe at most 100 are really
>> >> sending something at the same time, so around 100 threads will be used.
>> >> Netty reuses threads regardless of which connection they serve.
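>> >> The thread-reuse point can be illustrated with plain
>> >> java.util.concurrent (a deliberately crude model, not Netty's actual
>> >> internals): a small fixed pool serves "data arrived" events from many
>> >> logical connections.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadReuseSketch {
    public static void main(String[] args) throws Exception {
        final int connections = 1000;
        ExecutorService pool = Executors.newFixedThreadPool(8); // like a small NIO worker pool
        final Set<String> threadsUsed =
                Collections.synchronizedSet(new HashSet<String>());
        final CountDownLatch done = new CountDownLatch(connections);

        // One event per connection; whichever pooled thread is free handles
        // it, so threads are reused across connections.
        for (int i = 0; i < connections; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    threadsUsed.add(Thread.currentThread().getName());
                    done.countDown();
                }
            });
        }
        done.await();
        pool.shutdown();
        System.out.println("events=" + connections
                + " smallPoolReused=" + (threadsUsed.size() <= 8));
    }
}
```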
>> >>
>> >> Another point, of course, is the non-blocking model. Once you send
>> >> something, you can choose to continue working without waiting for the
>> >> data to actually be sent (though you do have to take care of it, for
>> >> instance, before closing the channel). So you can overlap sending data
>> >> with other computation (for instance, preparing the next packet to
>> >> send).
>> >> With blocking IO, by contrast, you wait until the data is really sent
>> >> (or at least buffered).
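>> >> That overlap can be sketched with a Future (again plain
>> >> java.util.concurrent, not Netty's real write path):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OverlapSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService io = Executors.newSingleThreadExecutor();
        // Hand the "send" off to another thread, as an asynchronous write does...
        Future<?> sendDone = io.submit(new Runnable() {
            public void run() {
                // pretend to flush bytes to the network here
            }
        });
        // ...and keep computing while the send is in flight.
        String nextPacket = "packet-2";
        sendDone.get(); // e.g. before closing the channel, wait for the write
        System.out.println("prepared " + nextPacket + "; send complete");
        io.shutdown();
    }
}
```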
>> >>
>> >> So on many points, Netty's approach should out-perform blocking IO. I
>> >> say "should" because there are counter-examples where blocking IO is
>> >> faster, since NIO adds some extra computation compared with blocking
>> >> IO. Most of the time, though, that overhead is masked by Netty's
>> >> implementation and NIO ends up quicker; but I do recall a few
>> >> counter-examples.
>> >>
>> >> Also, Netty offers different kinds of transport (nio, oio, ...), so
>> >> the behaviour can differ according to the underlying network transport.
>> >>
>> >> This is not the full picture of Netty, but a start of an answer to
>> >> your question. For more information, others can continue this thread
>> >> (or correct me where I am wrong), and of course you can read the
>> >> examples that ship with Netty (even the non-HTTP ones) and the Netty
>> >> documentation.
>> >>
>> >> HTH,
>> >> Cheers,
>> >> Frederic
>> >>
>> >> J. Mi wrote:
>> >> >
>> >> > Hi all,
>> >> >
>> >> > I guess my fundamental question here is whether, in theory at least,
>> >> > Netty provides a better asynchronous mechanism, performance-wise,
>> >> > than java.util.concurrent.*. Internally, does Netty use
>> >> > multi-threading, java.nio, both, or neither?
>> >> >
>> >> > If Netty does perform better than java.util.concurrent.*, is there
>> >> > any example or tutorial that can guide me in replacing the current
>> >> > multi-threading process I described in my previous email?
>> >> >
>> >> > Many thanks to you for sharing your expertise,
>> >> > Jason
>> >> >
>> >> > On Wed, Sep 2, 2009 at 12:11 PM, J. Mi <jmi258 at gmail.com> wrote:
>> >> >
>> >> >> Hi folks,
>> >> >>
>> >> >> Currently, my application's process flow logic is like this:
>> >> >>
>> >> >> -> A controlling process receives one request for data, which will
>> >> >> be fetched from multiple online sources.
>> >> >> -> The controlling process spawns multiple threads. Each thread will
>> >> >> (1) use the synchronous Apache Commons HttpClient to fetch the data;
>> >> >> (2) parse the data; and (3) return the data to the controlling
>> >> >> process.
>> >> >> -> The controlling process joins all the threads and returns the
>> >> >> combined data to the requestor.
>> >> >>
>> >> >> So basically, each thread uses a synchronous HttpClient to fetch the
>> >> >> data and then parses it.
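>> >> >> That fan-out/join shape can be sketched with java.util.concurrent
>> >> >> alone (the fetch-and-parse step here is a stand-in, not a real
>> >> >> HttpClient call):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FanOutSketch {
    // Stand-in for "fetch with HttpClient, then parse"; hypothetical, no real I/O.
    static String fetchAndParse(String source) {
        return "parsed:" + source;
    }

    public static void main(String[] args) throws Exception {
        List<String> sources = Arrays.asList("a", "b", "c");
        ExecutorService pool = Executors.newFixedThreadPool(sources.size()); // one thread per request
        List<Callable<String>> tasks = new ArrayList<Callable<String>>();
        for (final String s : sources) {
            tasks.add(new Callable<String>() {
                public String call() {
                    return fetchAndParse(s);
                }
            });
        }
        // invokeAll blocks until every task completes -- the "join" step.
        List<String> combined = new ArrayList<String>();
        for (Future<String> f : pool.invokeAll(tasks)) {
            combined.add(f.get());
        }
        pool.shutdown();
        System.out.println(combined);
    }
}
```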
>> >> >>
>> >> >> In reading the org.jboss.netty.example.http.snoop package, I have
>> >> >> the following question:
>> >> >> If I just replace Apache's synchronous HttpClient with Netty's
>> >> >> org.jboss.netty.handler.codec.http.* as the example does, will I
>> >> >> benefit performance-wise? I have heard that blocking I/O hurts
>> >> >> multi-threading. If so, should Netty's package work better for me?
>> >> >>
>> >> >> Or should I actually get rid of the existing multi-threading by
>> >> >> using Netty's framework? If so, which of your examples is the best
>> >> >> reference for my purpose?
>> >> >>
>> >> >> Thanks for your help in advance,
>> >> >> Jason
>> >> >>
>> >> >>
>> >> >>
>> >> >
>> >> > _______________________________________________
>> >> > netty-dev mailing list
>> >> > netty-dev at lists.jboss.org
>> >> > https://lists.jboss.org/mailman/listinfo/netty-dev
>> >> >
>> >> >
>> >>
>> >>
>> >> -----
>> >> Hardware/Software Architect
>> >> --
>> >> View this message in context:
>> >>
>> >> http://n2.nabble.com/A-question-about-your-HttpClient-example-and-beyond-tp3568879p3617420.html
>> >> Sent from the Netty Developer Group mailing list archive at Nabble.com.
>> >>
>> >
>> >
>> >
>>
>>
>> -----
>> Hardware/Software Architect
>> --
>> View this message in context:
>> http://n2.nabble.com/A-question-about-your-HttpClient-example-and-beyond-tp3568879p3624150.html
>> Sent from the Netty Developer Group mailing list archive at Nabble.com.
>
>
>
>


