Interesting discussion on simplepush happening

Begin forwarded message:

From: JR Conlin <jrconlin@mozilla.com>
Subject: Re: packet truncating for wire exchange.
Date: November 5, 2013 3:27:21 PM EST
To: Doug Turner <doug.turner@gmail.com>
Cc: Jeremiah Orem <oremj@mozilla.com>, dev-simplepush@mozilla.org

First off, this was merely a proposal, not a demand or request. I simply
was noting that some efficiency could be introduced with minimal impact.

That said, I can also offer the following helpful points:

TCP segments are not set to a static length. The TCP header carries a
"window size" field that advertises how much data the receiver will accept.
http://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_segment_structure

By default, TCP window sizes can be between 2 and 65,535 bytes, or a
scaling factor may be used (see RFC 1323), which can expand a window size
up to 1GB. Packets of that size are generally not recommended for most
systems due to the increasing chance of corruption on non-dedicated links.

There are other considerations regarding TCP tuning that can affect
retransmission or optimization of data packets, but our data is
generally far, far below that.

Where things become more of a concern is that metering is generally done
not by packets but by bits. (This is why carriers tend to refer to
connections using the small "b", indicating that a given link can exchange
56Kbps, or 56,000 bits per second.) In this case, sending
{"messageType":"hello"} (using ASCII) would require 23 bytes of
information, or 184 bits. Sending {"m":"h"} would require 9 bytes of
information, or 72 bits. While this is purely an example, the following
table shows the cumulative bandwidth consumption, in bits, of repeated
transmission:

long form                  short form
 1:  184                     1:  72
 2:  368                     2: 144
 3:  552                     3: 216
 4:  736                     4: 288
 5:  920                     5: 360
 6: 1104                     6: 432

So after six of these exchanges, we're seeing 672 bits of savings.
Enough to send another 9 short form messages.
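
The arithmetic can be checked mechanically. What follows is a minimal sketch in Go that counts the two literal strings directly, assuming plain ASCII and no framing, headers, or trailing delimiter. The name cumulativeBits is invented here for illustration.

```go
package main

import "fmt"

// cumulativeBits returns the total bits sent after n copies of msg,
// counting only the message bytes themselves (8 bits per ASCII byte).
func cumulativeBits(msg string, n int) int {
	return len(msg) * 8 * n
}

func main() {
	long := `{"messageType":"hello"}`
	short := `{"m":"h"}`
	for n := 1; n <= 6; n++ {
		l, s := cumulativeBits(long, n), cumulativeBits(short, n)
		fmt.Printf("%d: long=%4d short=%4d saved=%4d\n", n, l, s, l-s)
	}
}
```

On a real link the per-message overhead of the WebSocket frame and the TCP/IP headers is added to both forms, so the relative saving would be smaller than these raw figures suggest.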

This implementation could be performed without impacting existing code.
We're using the same data format for exchange (string encoded JavaScript
Object Notation). JSON provides a very loose binding for data (in that
data exchanged may have optional components or definitions). This is
useful in cases where the actual function to process the data may be
different depending on a sole value within the data being defined. In
this case, we are using the value of the key "messageType". These
values, while descriptive, are a bit like me calling you Douglas Turner
of Mozilla Incorporated located in Mountain View California in the
United States of America on planet Earth of the Sol System contained in
the Orion sub-arm of the Milky Way Galaxy of the Local Galaxy
Group... Since no one else in this group is called "Doug", it's a
viable, non-confusing means of identifying the individual I wish to
address. Instead of prepping a good deal of memory and instructions to
cycle over a string, we can simply tell the program to match the first
character and move on.
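
As a rough illustration of matching on the first character, here is a small Go sketch. The function name kind is hypothetical, and only the "hello" type is handled because it is the only long-form value spelled out in this thread.

```go
package main

import "fmt"

// kind reports the message kind from the first byte of the type value,
// so "hello" and "h" are treated alike. It returns 0 for an empty value.
func kind(messageType string) byte {
	if messageType == "" {
		return 0
	}
	return messageType[0]
}

func main() {
	for _, t := range []string{"hello", "h"} {
		switch kind(t) {
		case 'h':
			fmt.Printf("%q: hello message\n", t)
		default:
			fmt.Printf("%q: unknown message\n", t)
		}
	}
}
```

This only works while no two message types share a first letter, which is an assumption about the type names rather than something the protocol guarantees.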

Does this help clarify my thinking?

On 2013/11/5 11:50 AM, Doug Turner wrote:
No change without data.  We need to understand exactly the bandwidth saving (in terms of TCP packets).  Hearing about random reports from people deploying isn’t good enough.  Let’s get the hard data, analyze it, then figure out what we should do.



On Nov 5, 2013, at 11:48 AM, JR Conlin <jrconlin@mozilla.com> wrote:

True, but that would also mean scrapping SimplePush as it stands
currently, since there's no provision for versions. (How does a client
or server know before data exchange which protocol is being used?)

So far, we've not experienced any problems with SimplePush, mostly
because it has yet to get wide traction.

While tossing out the protocol may have minimal impact to ourselves, it
does mean that we've wasted a good deal of time and marketing from both
ourselves and groups like Aerogear. I don't think they'd appreciate us
tossing things, and might not be as welcoming to the next version of
whatever we release.

(on a somewhat personal note, this is why I tend to advocate for
quieter, internal releases and iterations of projects like this before
we try to seek wide adoption.)

On 2013/11/5 11:39 AM, Doug Turner wrote:
If we make any change to the wire protocol, we should start from the beginning and design a binary protocol (not what we have now).  We also may want to move away from WebSockets.

Has this become any significant problem?




On Nov 5, 2013, at 11:35 AM, Jeremiah Orem <oremj@mozilla.com> wrote:

If we are thinking about changing the format of the messages, I'd like to throw an additional idea into the ring.

Right now, with Go, we have to decode the message to a map instead of a struct, because we don't know the message fields ahead of time. If we changed the structure to {"m":"h", "d": {"u":"123abcd...","c":[...]}}, we could use the json.RawMessage type to defer decoding of the payload. For example:

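// Requires the encoding/json and fmt packages.

// Message is the envelope: a short type tag plus an undecoded payload.
type Message struct {
	MessageType string           `json:"m"`
	Data        *json.RawMessage `json:"d"`
}

// HelloMessage is the payload of an "h" (hello) message.
type HelloMessage struct {
	Uaid string   `json:"u"`
	C    []string `json:"c"`
}

var d Message
json.Unmarshal(msg, &d)
switch d.MessageType {
case "h":
	var h HelloMessage
	// RawMessage has no Unmarshal method; decode its bytes with json.Unmarshal.
	json.Unmarshal(*d.Data, &h)
	fmt.Println(h.Uaid, h.C)
....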
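
For anyone who wants to try this, here is a self-contained sketch of the same two-stage decode with error handling added. The function decodeHello and the sample identifier values are placeholders invented for the example, and Data is declared as a plain json.RawMessage rather than a pointer purely to keep the nil handling simple.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is the envelope: a type tag plus the still-encoded payload.
type Message struct {
	MessageType string          `json:"m"`
	Data        json.RawMessage `json:"d"`
}

// HelloMessage is the payload carried by an "h" message.
type HelloMessage struct {
	Uaid string   `json:"u"`
	C    []string `json:"c"`
}

// decodeHello parses the envelope first, then decodes the payload
// only if the type tag says it is a hello.
func decodeHello(msg []byte) (HelloMessage, error) {
	var d Message
	var h HelloMessage
	if err := json.Unmarshal(msg, &d); err != nil {
		return h, err
	}
	if d.MessageType != "h" {
		return h, fmt.Errorf("unexpected message type %q", d.MessageType)
	}
	err := json.Unmarshal(d.Data, &h)
	return h, err
}

func main() {
	// Sample values are placeholders, not real identifiers.
	msg := []byte(`{"m":"h","d":{"u":"123abcd","c":["234cdef"]}}`)
	h, err := decodeHello(msg)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(h.Uaid, h.C)
}
```

The envelope is decoded once into a fixed struct, and the payload bytes are only parsed after the type is known, which is what removes the need for a generic map.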

Any reason not to do this?

-Jeremy

JR Conlin wrote:
Looking at the packets being exchanged between server and client, it
strikes me that they're a bit... wordy. That's great for docs and demo
code, not so hot for user bandwidth costs.

Fortunately, it looks like we can trim things down quite a bit by just
doing lead character checks. For example: the "hello" packet is:

{"messageType":"hello", "uaid":"123abcd...", "channelIDs":[...]}

This could be trimmed down to just:
{"m":"h", "u":"123abcd...","c":[...]}

We could do the same trick with most of the other packets as well. I'm
betting you can probably figure out which these are:

{"m":"r","u":"234cdef..."}
{"m":"p"}
{"m":"u","u":"234cdef..."}
{"m":"n","u":[{"c":"123abcd...","v":1},{"c":"234cdef...","v":2}]}

Since the context of the message is already determined by the message
type, and there are only a few message types presented, it's easy to
determine the data.

Granted, we're not saving a ton per message, but a fair bit in
aggregate. In addition, key matches would obviously be a fair bit
faster, since you're only matching single characters. Or you could add a
wrapper that detects the short "m" key, and converts things back to
"long form".

Granted, this is somewhat complicated because there's no version. While
not strictly required for this (short keys are a pretty good indicator
of the change being used), it does mean that any future change might be
a bit difficult to do, and limits us to 52 potential message types.

Thoughts?
_______________________________________________
Dev-simplepush mailing list
Dev-simplepush@mozilla.org
https://mail.mozilla.org/listinfo/dev-simplepush