From: JR Conlin <jrconlin(a)mozilla.com>
Subject: Re: packet truncating for wire exchange.
Date: November 5, 2013 3:27:21 PM EST
To: Doug Turner <doug.turner(a)gmail.com>
Cc: Jeremiah Orem <oremj(a)mozilla.com>, dev-simplepush(a)mozilla.org
First off, this was merely a proposal, not a demand or request. I was
simply noting that some efficiency could be gained with minimal impact.
That said, I can also offer the following points:
TCP segments are not a fixed length. The TCP header carries a "window
size" field, which advertises how much data the receiver will accept.
http://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_segment_st...
The window size field is 16 bits, so by default it can advertise up to
65,535 bytes, or a scaling factor may be used (see RFC 1323) to expand
the window to roughly 1GB. Windows that large are generally not
recommended for most systems due to the increasing chance of corruption
on non-dedicated links.
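As a quick sanity check on the "roughly 1GB" figure: the largest window
a peer can advertise is the 16-bit field value shifted left by the
negotiated scale factor, which RFC 1323 caps at 14. A minimal Go sketch
of that arithmetic (the function name maxWindow is hypothetical):

```go
package main

import "fmt"

// maxWindow returns the largest receive window a TCP peer can advertise:
// the 16-bit window field shifted left by the window-scale shift count.
// RFC 1323 limits the shift count to 14, so larger values are clamped.
func maxWindow(shift uint) uint64 {
	const field = 65535 // largest value of the 16-bit window field
	if shift > 14 {
		shift = 14
	}
	return uint64(field) << shift
}

func main() {
	fmt.Println(maxWindow(0))  // no scaling: 65535
	fmt.Println(maxWindow(14)) // maximum scaling: 1073725440, just under 2^30
}
```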
There are other TCP tuning considerations that can affect retransmission
or optimization of data packets, but our messages are generally far, far
below those sizes.
Where things become more of a concern is that metering is generally done
not by packets but by bits. (This is why carriers tend to refer to
connections using the small "b", indicating, for example, that a given
link can exchange 56Kbps, or 56,000 bits per second.) In this case,
sending {"messageType":"hello"} (using ASCII) requires 23 bytes of
information, or 184 bits. Sending {"m":"h"} requires 9 bytes, or 72
bits. While this is purely an example, the following shows the
cumulative bandwidth consumed by repeated transmission:
messages   long form (bits)   short form (bits)
1          184                72
2          368                144
3          552                216
4          736                288
5          920                360
6          1104               432
So after six of these exchanges, we're seeing 672 bits of savings,
enough to send another 9 short form messages.
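The arithmetic above can be checked with a short Go program. This is a
sketch that counts payload bytes only (ASCII, one byte per character);
WebSocket framing and TCP/IP header overhead are deliberately ignored,
and the helper name sizeBits is hypothetical.

```go
package main

import "fmt"

// sizeBits returns the size of an ASCII-encoded message in bits.
// len() on a Go string counts bytes, and each ASCII character is one byte.
func sizeBits(msg string) int {
	return len(msg) * 8
}

func main() {
	long := `{"messageType":"hello"}`
	short := `{"m":"h"}`
	longBits, shortBits := sizeBits(long), sizeBits(short)

	fmt.Printf("long form:  %d bytes, %d bits\n", len(long), longBits)
	fmt.Printf("short form: %d bytes, %d bits\n", len(short), shortBits)

	// Cumulative bits sent after n exchanges of each form.
	for n := 1; n <= 6; n++ {
		fmt.Printf("%d: %5d %5d\n", n, n*longBits, n*shortBits)
	}

	// Bits saved after six exchanges, and how many whole short
	// messages would fit in that saved budget.
	saved := 6 * (longBits - shortBits)
	fmt.Printf("saved after 6 exchanges: %d bits (room for %d more short messages)\n",
		saved, saved/shortBits)
}
```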
This change could be made without impacting existing code. We're using
the same data format for exchange (string-encoded JavaScript Object
Notation). JSON provides a very loose binding for data (in that the data
exchanged may have optional components or definitions). This is useful
in cases where the function that processes the data depends on a single
value within it. In this case, we are using the value of the key
"messageType". These values, while descriptive, are a bit like me
calling you Douglas Turner of Mozilla Incorporated located in Mountain
View, California, in the United States of America on planet Earth of the
Sol System contained in the Orion sub-arm of the Milky Way Galaxy of the
Local Group... Since no one else in this group is called "Doug", that's
a viable, unambiguous way of identifying the person I'm addressing.
Instead of setting aside a good deal of memory and instructions to loop
over a string, we can simply tell the program to match the first
character and move on.
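To illustrate the first-character match, here is a minimal Go sketch. It
reads the type from either the long "messageType" key or the proposed
short "m" key and resolves it by its first byte, which doubles as the
"convert back to long form" wrapper idea from the original proposal. The
letter-to-type table (h, r, u, p, n) is an assumption inferred from the
short forms proposed in this thread, it only covers those five types,
and the function names are hypothetical, not existing SimplePush code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// canonicalType maps a messageType value to its long-form name by looking
// only at the first byte, so "hello" and "h" resolve the same way.
// The table below is an assumed mapping, not a published one.
func canonicalType(messageType string) (string, error) {
	if messageType == "" {
		return "", fmt.Errorf("empty messageType")
	}
	switch messageType[0] {
	case 'h':
		return "hello", nil
	case 'r':
		return "register", nil
	case 'u':
		return "unregister", nil
	case 'p':
		return "ping", nil
	case 'n':
		return "notification", nil
	}
	return "", fmt.Errorf("unknown messageType %q", messageType)
}

// messageKind decodes only the type field, accepting either the long key
// "messageType" or the short key "m"; all other keys are ignored.
func messageKind(raw []byte) (string, error) {
	var env struct {
		Long  string `json:"messageType"`
		Short string `json:"m"`
	}
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", err
	}
	t := env.Long
	if t == "" {
		t = env.Short
	}
	return canonicalType(t)
}

func main() {
	for _, msg := range []string{
		`{"messageType":"hello","uaid":"example-uaid"}`,
		`{"m":"h","u":"example-uaid"}`,
		`{"m":"n"}`,
	} {
		kind, err := messageKind([]byte(msg))
		fmt.Println(kind, err)
	}
}
```

The trade-off is the one noted in the original proposal: single-letter
dispatch is cheap, but two message types can never share a first letter.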
Does this help clarify my thinking?
On 2013/11/5 11:50 AM, Doug Turner wrote:
> No change without data. We need to understand exactly the bandwidth savings (in terms
> of TCP packets). Hearing about random reports from people deploying isn't good enough.
> Let's get the hard data, analyze it, then figure out what we should do.
>
>
>
> On Nov 5, 2013, at 11:48 AM, JR Conlin <jrconlin(a)mozilla.com> wrote:
>
>> True, but that would also mean scrapping SimplePush as it stands
>> currently, since there's no provision for versions. (How does a client
>> or server know before data exchange which protocol is being used?)
>>
>> So far, we've not experienced any problems with SimplePush, mostly
>> because it has yet to get wide traction.
>>
>> While tossing out the protocol may have minimal impact on ourselves, it
>> does mean that we've wasted a good deal of time and marketing from both
>> ourselves and groups like Aerogear. I don't think they'd appreciate us
>> tossing things, and might not be as welcoming to the next version of
>> whatever we release.
>>
>> (On a somewhat personal note, this is why I tend to advocate for
>> quieter, internal releases and iterations of projects like this before
>> we seek wide adoption.)
>>
>> On 2013/11/5 11:39 AM, Doug Turner wrote:
>>> If we make any change to the wire protocol, we should start from the
>>> beginning and design a binary protocol (not what we have now). We may
>>> also want to move away from WebSockets.
>>>
>>> Has this become any significant problem?
>>>
>>>
>>>
>>>
>>> On Nov 5, 2013, at 11:35 AM, Jeremiah Orem <oremj(a)mozilla.com> wrote:
>>>
>>>> If we are thinking about changing the format of the messages, I'd
>>>> like to throw an additional idea into the ring.
>>>>
>>>> Right now, with Go, we have to decode the message into a map instead
>>>> of a struct, because we don't know the message fields ahead of time.
>>>> If we changed the structure to {"m":"h", "d":{"u":"123abcd...","c":[...]}},
>>>> we could use json.RawMessage to defer decoding the payload. For example:
>>>>
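>>>> import (
>>>>     "encoding/json"
>>>>     "fmt"
>>>> )
>>>>
>>>> // Message is the outer envelope; Data keeps the payload undecoded.
>>>> type Message struct {
>>>>     MessageType string          `json:"m"`
>>>>     Data        json.RawMessage `json:"d"`
>>>> }
>>>>
>>>> type HelloMessage struct {
>>>>     Uaid       string   `json:"u"`
>>>>     ChannelIDs []string `json:"c"`
>>>> }
>>>>
>>>> func handle(msg []byte) error {
>>>>     var d Message
>>>>     if err := json.Unmarshal(msg, &d); err != nil {
>>>>         return err
>>>>     }
>>>>     switch d.MessageType {
>>>>     case "h":
>>>>         // The type is known now, so decode the deferred payload.
>>>>         var h HelloMessage
>>>>         if err := json.Unmarshal(d.Data, &h); err != nil {
>>>>             return err
>>>>         }
>>>>         fmt.Println(h.Uaid, h.ChannelIDs)
>>>>     // ... other message types
>>>>     }
>>>>     return nil
>>>> }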
>>>>
>>>> Any reason not to do this?
>>>>
>>>> -Jeremy
>>>>
>>>> JR Conlin wrote:
>>>>> Looking at the packets being exchanged between server and client, it
>>>>> strikes me that they're a bit... wordy. That's great for docs and demo
>>>>> code, not so hot for user bandwidth costs.
>>>>>
>>>>> Fortunately, it looks like we can trim things down quite a bit by just
>>>>> doing lead character checks. For example, the "hello" packet is:
>>>>>
>>>>> {"messageType":"hello", "uaid":"123abcd...", "channelIDs":[...]}
>>>>>
>>>>> This could be trimmed down to just:
>>>>> {"m":"h", "u":"123abcd...","c":[...]}
>>>>>
>>>>> We could do the same trick with most of the other packets as well. I'm
>>>>> betting you can probably figure out which these are:
>>>>>
>>>>> {"m":"r","u":"234cdef..."}
>>>>> {"m":"p"}
>>>>> {"m":"u","u":"234cdef..."}
>>>>> {"m":"n","u":[{"c":"123abcd...","v":1},{"c":"234cdef...","v":2}]}
>>>>>
>>>>> Since the context of the message is already determined by the message
>>>>> type, and there are only a few message types, it's easy to determine
>>>>> what the data means.
>>>>>
>>>>> Granted, we're not saving a ton per message, but a fair bit in
>>>>> aggregate. In addition, key matches would obviously be a fair bit
>>>>> faster, since you're only matching single characters. Or you could add
>>>>> a wrapper that detects the short "m" key and converts things back to
>>>>> "long form".
>>>>>
>>>>> Granted, this is somewhat complicated because there's no version.
>>>>> While not strictly required for this (short keys are a pretty good
>>>>> indicator of the change being used), it does mean that any future
>>>>> change might be a bit difficult to do, and single-letter types limit
>>>>> us to 52 potential message types (a-z and A-Z).
>>>>>
>>>>> Thoughts?
>>>>> _______________________________________________
>>>>> Dev-simplepush mailing list
>>>>> Dev-simplepush(a)mozilla.org
>>>>> https://mail.mozilla.org/listinfo/dev-simplepush
>>>
>>
>