How to properly handle decoder chain in netty?

Wed Jun 3 15:51:18 EDT 2009

Hi,

comments inline.

On Wed, Jun 3, 2009 at 9:28 PM, Frederic Bregier <fredbregier at free.fr> wrote:
>
> However, just to keep an open mind (no argument, please ;-), corruption
> does happen on the network.

Of course it can happen, but there are checksums at the data link
layer (2) and the transport layer (4), so it is quite unlikely.
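
To give an idea of what the layer 4 check amounts to, here is a minimal
sketch of a 16-bit ones' complement checksum in the style of RFC 1071.
The class and method names are made up for illustration; this is not how
any particular TCP stack implements it.

```java
final class InternetChecksum {

    /** 16-bit ones' complement checksum over the given bytes (RFC 1071 style). */
    static int checksum(byte[] data) {
        long sum = 0;
        int i = 0;
        // Add the data as big-endian 16-bit words.
        for (; i + 1 < data.length; i += 2) {
            sum += ((data[i] & 0xFF) << 8) | (data[i + 1] & 0xFF);
        }
        // An odd trailing byte is padded with a zero byte on the right.
        if (i < data.length) {
            sum += (data[i] & 0xFF) << 8;
        }
        // Fold the carries back into the low 16 bits.
        while ((sum >>> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >>> 16);
        }
        return (int) (~sum & 0xFFFF);
    }

    public static void main(String[] args) {
        byte[] original = {0x12, 0x34, 0x56, 0x78};
        byte[] flipped  = {0x12, 0x35, 0x56, 0x78}; // one bit changed
        byte[] swapped  = {0x56, 0x78, 0x12, 0x34}; // two 16-bit words reordered

        // A single flipped bit changes the checksum: prints true
        System.out.println(checksum(original) != checksum(flipped));
        // Reordered words give the same sum, so this goes unnoticed: prints true
        System.out.println(checksum(original) == checksum(swapped));
    }
}
```

So the check catches the common cases, but it is a simple sum; that is
why I say "unlikely" and not "impossible".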

> Personally, I remember a few times when I downloaded a file over HTTP
> or FTP and, at the end, the file was not OK even though its size was.
> Of course this is not frequent (very rare, I should say).
> Another point: if this never occurs, then why provide an MD5 checksum
> to verify at the end of the download?

To detect a malicious replacement of the downloadable item. Of
course, many "do it wrong" by providing the file and the MD5 hash in
the same place, but the main idea is that you can verify that the file
you just downloaded is really the file the "author" provided and
has not been replaced by someone who had access to the file server or
similar.
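
For what it's worth, checking a download against a published MD5 is only
a few lines with the JDK. A rough sketch (the class name `Md5Verify` and
its methods are made up for this example; the expected hash would come
from wherever the author publishes it):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Verify {

    /** Streams the file through MD5 and returns the digest as lowercase hex. */
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** True if the file's MD5 equals the published hex string. */
    static boolean matches(Path file, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return md5Hex(file).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: java Md5Verify <file> <expected md5 hex>");
            return;
        }
        System.out.println(matches(Paths.get(args[0]), args[1]) ? "OK" : "MISMATCH");
    }
}
```

This only helps if the expected hash comes from a place the attacker
could not also modify.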

> So perhaps it is not so useful..., but one might want to secure one's
> transfer.
> Again, I'm perfectly OK with what you said. I was only looking at the
> software level, guessing that for some reason the physical layer was
> doing something wrong...

Like I said, corruption is very unlikely. TCP does a lot for you to
make the transfer reliable and simply fails if it cannot recover. You
know, the WWW is based on TCP; if corruption occurred at a really
noticeable rate, this would have been addressed a long time ago.
You can add additional checks if you need to be 100% sure that nobody
modified the data you just received, but it is not very likely that
such a modification happened because of the network itself.
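
If you do want such an additional check, the per-chunk scheme from the
quoted messages below (each chunk sent with its rank and its MD5, and
re-requested on a mismatch) could look roughly like this on the
receiving side. This is only a sketch: `ChunkReceiver`, `Reply` and the
method names are invented for the example and have nothing to do with
the Netty API or with Jiang's actual decoder.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ChunkReceiver {

    /** What the server answers for each chunk. */
    enum Reply { ACK, RESEND }

    // Rank of the next chunk we are willing to accept.
    private int expectedIndex = 0;

    /** MD5 of a chunk payload. */
    static byte[] md5(byte[] payload) {
        try {
            return MessageDigest.getInstance("MD5").digest(payload);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Accepts a chunk only if it has the expected rank and its payload
     * hashes to the digest that was sent along with it.
     */
    Reply receive(int index, byte[] payload, byte[] digest) {
        boolean inOrder = index == expectedIndex;
        boolean intact = MessageDigest.isEqual(md5(payload), digest);
        if (!inOrder || !intact) {
            return Reply.RESEND; // the client resends from expectedIndex
        }
        expectedIndex++;
        return Reply.ACK;
    }

    int expectedIndex() {
        return expectedIndex;
    }
}
```

Usage would be something like `receiver.receive(rank, payload, digest)`
for every decoded chunk, answering the client with the returned `Reply`.
If this reports mismatches on a plain TCP connection, I would look at
the code that frames the chunks before blaming the network.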

regards,
christian!

>
> christian wrote:
>>
>> just a short remark:
>>
>> TCP is all about guaranteed delivery, so although it is theoretically
>> possible that the content of TCP packets is changed during
>> transmission, it is very unlikely to happen in practice because each
>> TCP packet is sequenced and individually acknowledged at the TCP
>> level. It can be turned off, but normally TCP packets are checksummed,
>> so corruption is even more unlikely.
>>
>> Frederic, your assumption about HTTP and FTP transfers is not correct.
>> This is easy to see, since FTP is also based on TCP and does not do
>> any checksums on the data transfer. The difference between HTTP and
>> FTP transfers of huge files is that most FTP server implementations
>> support resuming aborted transfers, i.e. if the connection aborts you
>> don't have to start the whole transfer over again, whereas most HTTP
>> servers do not support resuming and just start over again.
>>
>> If you use TCP you can be reasonably sure that the data you transfer
>> is correct and in the same sequence the remote peer sent it. Except
>> for things where higher security is mandatory (like doing financial
>> transactions) you normally don't have to do extra checksum checks with
>> TCP (of course you are free to do so).
>>
>> I still think that there is something wrong with Jiang's decoder
>> when more chunks are sent. You can do a network trace on the server
>> and see what was received; if it looks correct, there is a problem
>> with the server.
>>
>>
>> regards,
>> christian
>>
>>
>> On Wed, Jun 3, 2009 at 8:13 PM, Frederic Bregier <fredbregier at free.fr>
>> wrote:
>>>
>>> Jiang,
>>>
>>> Happy to help someone there... ;-)
>>>
>>> Just to be precise about what I have in mind:
>>> If you do have such an MD5, do you use it on the full file or only
>>> per chunk? If per chunk, then only the bad chunk has to be resent.
>>> However, you will then perhaps have a protocol that does something
>>> like:
>>> - client sends 1 chunk with its MD5 and its rank
>>> - server acquires the chunk, verifies the rank and the MD5 of the chunk
>>> - if something goes wrong (bad rank, bad MD5), ask the client to resend
>>> the last chunk
>>> - if OK, then just ack the chunk
>>> The problem is that your send will then be somewhat "synchronous", since
>>> each chunk has to be validated by the server. The good part is that
>>> resending the last bad chunk is easy. The bad part is that you depend on
>>> network latency (send a chunk, then wait for the acknowledgement from the
>>> server before sending the next chunk). But you will have a very secure
>>> file transfer then...
>>> To have fewer acknowledgements (depending on the chunk size), you could
>>> perhaps acknowledge them in batches (for instance every 10 chunks, except
>>> if one goes wrong, in which case you immediately send the acknowledgement
>>> for the good ones and a negative acknowledgement for the bad one,
>>> restarting the transfer from that chunk).
>>>
>>> In fact, I have plenty of ideas on this since I'm currently writing such
>>> a file transfer monitor for production on "secure" IT... In my case,
>>> efficiency is not the most important thing; security and restarting
>>> without resending the full file are.
>>>
>>> HTH,
>>> Frederic
>>>
>>> Jiang Bian wrote:
>>>>
>>>> Frederic -
>>>>
>>>> Always appreciate your quick response.
>>>>
>>>> Yes, I do have those features implemented already: the MD5 checksum
>>>> (it might be overkill, but I just don't want to worry about it), the
>>>> index number of each chunk, etc.
>>>>
>>>> The protocol works, and the file did go through eventually (of course
>>>> the server requests that the corrupted data be sent again). I just feel
>>>> annoyed by the corruption. Now, it seems that I have to live with it.
>>>>
>>>> Thanks again for your help!
>>>>
>>>> Jiang
>>>>
>>>>
>>>> Frederic Bregier wrote:
>>>>>
>>>>> Hi Jiang,
>>>>>
>>>>> I think that corrupted data on a huge transfer is probably normal.
>>>>>
>>>>> My view is based on the following.
>>>>> When you want to download a huge file (say an ISO file of your favorite
>>>>> Linux distribution, about 4 GB), you often have three pointers:
>>>>> - one HTTP download link with a "warning" that says "you might prefer
>>>>> the FTP protocol since an HTTP download can corrupt data"
>>>>> - one FTP download link
>>>>> - one link to an MD5 file with a "warning" saying "for such a huge file,
>>>>> you should verify the correctness of your download with the following
>>>>> MD5 key"
>>>>>
>>>>> So based on this example, my guess is that sending a huge file (by chunk
>>>>> or not) can lead to some bad transfers that TCP/IP (or, even more so,
>>>>> UDP) cannot address.
>>>>> That's why most "financial" file transfer software integrates a
>>>>> checksum during the transfer to validate each file (or even each
>>>>> chunk).
>>>>>
>>>>> My suggestion could be the following (but take it only as a
>>>>> suggestion): you could perhaps integrate something like a checksum in
>>>>> your protocol, in one of two ways:
>>>>> - either at the end of the transfer, you send the checksum (MD5 for
>>>>> instance) and compare it on the target host (each checksum probably
>>>>> being computed on both sides during the transfer, to avoid reading
>>>>> the file a second time)
>>>>> - or for each chunk, you also send a checksum of this chunk to be
>>>>> compared on the remote host
>>>>>
>>>>> You can even include a counter that orders the chunks (for instance
>>>>> to detect a transmission error where chunk number 1024 arrives after
>>>>> chunk number 1021, meaning that chunks 1022 and 1023 are missing, and
>>>>> to enable a restart from chunk 1022).
>>>>>
>>>>> But don't go too fast, because including such behaviour in your
>>>>> protocol will greatly increase the computation (checksum, chunk order,
>>>>> retry, ...) and so decrease the efficiency. With such a problem, it is
>>>>> always a choice between efficiency and security, depending on your
>>>>> final goals...
>>>>>
>>>>> HTH,
>>>>> Frederic
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> -----
>>> Hardware/Software Architect
>>> --
>>> View this message in context:
>>> http://n2.nabble.com/How-to-properly-handle-decoder-chain-in-netty--tp3015408p3020051.html
>>> Sent from the Netty User Group mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> netty-users mailing list
>>> netty-users at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/netty-users
>>>
>>
>>
>
>
> -----
> Hardware/Software Architect
> --
> View this message in context: http://n2.nabble.com/How-to-properly-handle-decoder-chain-in-netty--tp3015408p3020513.html
> Sent from the Netty User Group mailing list archive at Nabble.com.
>
>