How to properly handle decoder chain in netty?

Jiang Bian timobj at gmail.com
Wed Jun 3 14:46:41 EDT 2009


Frederic,

I actually compute MD5 on each chunk and SHA-256 on the whole file. If only a chunk is
corrupted, I request just that chunk. Also, the server doesn't ack
each chunk, only the whole file. At the end of the cycle, if any chunks are
missing, the server sends a request for all missing segments.
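
For reference, the digest side of that can be sketched like this (a simplified
illustration only, not my actual code; the class and method names are made up,
and it uses nothing beyond java.security.MessageDigest from the JDK):

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustration only: per-chunk MD5 plus a running SHA-256 over the whole file.
public class ChunkDigests {

    private final MessageDigest fileDigest; // SHA-256 over the whole file

    public ChunkDigests() throws NoSuchAlgorithmException {
        this.fileDigest = MessageDigest.getInstance("SHA-256");
    }

    // Called for every chunk, in order, as it is read from the file.
    public byte[] digestChunk(byte[] chunk, int length) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        md5.update(chunk, 0, length);
        fileDigest.update(chunk, 0, length); // keep the whole-file digest up to date
        return md5.digest();                 // sent along with the chunk and its rank
    }

    // Called once, after the last chunk, to get the whole-file SHA-256.
    public byte[] finishFile() {
        return fileDigest.digest();
    }
}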

So, my point is, I do agree with you on the dilemma between efficiency and
reliability. Normally, I tend to think corrupted data is worth nothing,
and in the end it reduces efficiency as well.

The goal of my project is to build a secure distributed file system that provides
a certain level of deniability. I am more of a security guy than a networking guy, so I
am glad that someone is really taking security issues seriously.

Thanks a bunch,

Jiang


Frederic Bregier wrote:
> 
> Jiang,
> 
> Happy to help someone there... ;-)
> 
> Just to clarify what I have in mind:
> If you do have such an MD5, do you use it on the full file or per chunk?
> If per chunk, then only that chunk has to be resent.
> However, your protocol will then probably do something
> like:
> - the client sends one chunk with its MD5 and its rank
> - the server receives the chunk and verifies the rank and the MD5 of the chunk
> - if something goes wrong (bad rank, bad MD5), the server asks the client to resend
> the last chunk
> - if OK, the server just acks the chunk
> The problem is that your send then becomes somewhat "synchronous", since each
> chunk has to be validated by the server. The good part is that
> resending the last bad chunk is easy. The bad part is that you become
> sensitive to network latency (send a chunk, then wait for the acknowledgment from
> the server before sending the next chunk). But you will have a very secure
> file transfer...
> To have fewer acknowledgments (depending on the chunk size), you
> could perhaps acknowledge chunks in batches (for instance every 10 chunks,
> except when one goes wrong, in which case you immediately acknowledge
> the good ones, send a negative acknowledgment for the bad one, and restart the
> transfer from that chunk).
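
For what it's worth, the server-side check described above could be sketched
roughly like this (an illustration only; the ChunkVerifier class, its method,
and the field names are made up, and the Netty wiring is left out):

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustration only: the server-side check for one incoming chunk.
// The caller would ack on 'true' and ask the client to resend from
// 'expectedRank' on 'false'.
public final class ChunkVerifier {

    private int expectedRank = 0;

    public boolean verify(int rank, byte[] data, byte[] declaredMd5)
            throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] computed = md5.digest(data);

        boolean ok = rank == expectedRank
                && MessageDigest.isEqual(computed, declaredMd5);
        if (ok) {
            expectedRank++;   // accept the chunk and expect the next one
        }
        return ok;            // false => request a resend from expectedRank
    }

    public int getExpectedRank() {
        return expectedRank;
    }
}
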
> 
> In fact, I'm full of ideas on this, since I'm currently writing such a file
> transfer monitor for production on "secure" IT... In my case, efficiency
> is not what matters most; security and restarting without resending the
> full file are the most important.
> 
> HTH,
> Frederic
> 
> Jiang Bian wrote:
>> 
>> Frederic -
>> 
>> Always appreciate your quick response.
>> 
>> Yes, I already have those features implemented: the MD5 checksum (it
>> might be overkill, but I just don't want to worry about it), an index
>> number for each chunk, etc.
>> 
>> The protocol works, and the file does go through eventually (of course the
>> server requests the corrupted data again). I was just
>> annoyed by the corruption. Now it seems that I have to live with it.
>> 
>> Thanks again for your help!
>> 
>> Jiang
>> 
>> 
>> Frederic Bregier wrote:
>>> 
>>> Hi Jiang,
>>> 
>>> I think that corrupted data on huge transfers is probably normal.
>>> 
>>> My view is based on the following.
>>> When you want to download a huge file (say an ISO image of your favorite
>>> Linux distribution, about 4 GB), you often have three links:
>>> - an HTTP download link with a warning that says "you
>>> might prefer to use the FTP protocol, since HTTP downloads can
>>> corrupt data"
>>> - an FTP download link
>>> - a link to an MD5 file with a warning saying "for such a huge file,
>>> you should verify the correctness of your download with the following
>>> MD5 key"
>>> 
>>> So based on this example, my guess is that sending a huge file (chunked
>>> or not) can lead to transfer errors that TCP/IP (and even more so UDP)
>>> cannot catch.
>>> That's why most "financial" file transfer software integrates a
>>> checksum during the transfer to validate each file (or even each chunk).
>>> 
>>> My suggestion would be the following (but take it only as a suggestion):
>>> you could perhaps integrate a checksum into your protocol,
>>> in one of two ways:
>>> - either at the end of the transfer, you send the checksum (MD5 for
>>> instance) and compare it on the target host (each side preferably computing
>>> its checksum incrementally during the transfer, to avoid re-reading
>>> the file)
>>> - or, for each chunk, you also send a checksum of that chunk to be
>>> compared on the remote host
>>> 
>>> You can even include a sequence number that orders the chunks (for instance
>>> to detect a transmission error where chunk number
>>> 1024 arrives after chunk number 1021, meaning chunks
>>> 1022 and 1023 are missing, and to enable a restart from chunk 1022).
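
Tracking the received ranks and listing the gaps at the end of a cycle can be
as simple as the sketch below (again only an illustration with made-up names,
using java.util.BitSet from the JDK):

import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Illustration only: record which chunk ranks have arrived and,
// at the end of a cycle, list the missing ones to request again.
public final class MissingChunkTracker {

    private final int totalChunks;
    private final BitSet received;

    public MissingChunkTracker(int totalChunks) {
        this.totalChunks = totalChunks;
        this.received = new BitSet(totalChunks);
    }

    public void markReceived(int rank) {
        received.set(rank);
    }

    // Ranks that never arrived, e.g. 1022 and 1023 in the example above.
    public List<Integer> missingRanks() {
        List<Integer> missing = new ArrayList<Integer>();
        for (int i = received.nextClearBit(0); i < totalChunks;
                 i = received.nextClearBit(i + 1)) {
            missing.add(i);
        }
        return missing;
    }
}
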
>>> 
>>> But don't go too fast, because including such behaviour in your protocol
>>> will greatly increase the computation (checksums, chunk ordering, retries,
>>> ...) and so decrease efficiency. In such problems, it is always a
>>> trade-off between efficiency and security, depending on your final goals...
>>> 
>>> HTH,
>>> Frederic
>>> 
>>> 
>> 
>> 
> 
> 
