[aerogear-dev] sync spec strawman
Summers Pittman
supittma at redhat.com
Wed Jan 8 09:13:36 EST 2014
On 01/08/2014 04:56 AM, Bruno Oliveira wrote:
> Good morning Summers, I was reviewing the whole document and I think it is a great start. Some parts looked a bit confusing to me, so here is my attempt to reorganize it (just a suggestion): https://github.com/aerogear/aerogear.org/pull/227
>
> Another change that I would suggest is the inclusion of the field “revision”, as qmx already suggested, with your idea of the checksum. And also the inclusion of the field “signature” to make sure that data is not corrupted or tampered with.
Would signature be an optional field?
And is revision a checksum now?
>
> Thoughts?
I feel like we are going forward now.
>
> --
> abstractj
>
> On January 7, 2014 at 12:21:35 PM, Summers Pittman (supittma at redhat.com) wrote:
>>
>> I've updated the sync spec on the data-sync branch on aerogear.org with
>> what qmx posted yesterday and some ideas I had as well. If I don't get
>> any tomatoes I will try to see what a POC on Android looks like this
>> afternoon.
>>
>> Sync doc follows.
>>
>> # Status: Experimental
>>
>> # AeroGear Data Sync
>>
>>
>> ## basics?
>>
>> Since we've been catering to the enterprise market, this essentially means
>> we need to get the __boring__ stuff right first, then move over to the
>> __shiny__ stuff, like realtime data sync, update policies & friends.
>>
>> ### data model
>>
>> For starters, I think that the most important thing that needs to be
>> agreed upon is the data model and the atomic operations around it. As
>> previously discussed, I really like CouchDB's data model -- and
>> hate Erlang ;)
>>
>> `{_id:, content:, rev:}`
>>
>> #### JS
>>
>> Well, it's JSON, it _Just Works_™
>>
>> #### Java
>>
>> I didn't want to pick on Java, but its fame forces me to. First
>> stab (courtesy of our friend Dan Bevenius):
>>
>>     public interface Document {
>>         String getId();
>>         String getContent();
>>         String getRev();
>>     }
>>
>> We naturally want to kick this up a notch, and use objects instead of plain
>> strings:
>>
>>     public interface Document<ID, T> {
>>         ID getId();
>>         T getContent();
>>         String getRev();
>>     }
>>
>> In this case, we can use the convention requiring that `T` is any
>> **object serializable to JSON**. `ID` is a convenience shorthand since
>> it's a **GUID/UUID**. I think this key doesn't need to be a natural key;
>> a surrogate key works instead.
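A minimal sketch of the `{_id, content, rev}` shape as an immutable Java class, in case it helps the discussion. The class name `SyncDocument`, the choice of `java.util.UUID` for the id, and the `update` helper are all assumptions made for illustration; none of them come from the spec.

```java
import java.util.UUID;

// Hypothetical immutable document mirroring {_id, content, rev}.
final class SyncDocument<T> {
    private final UUID id;     // surrogate key, not derived from the content
    private final T content;   // assumed JSON-serializable, by convention only
    private final String rev;  // opaque revision marker

    SyncDocument(UUID id, T content, String rev) {
        this.id = id;
        this.content = content;
        this.rev = rev;
    }

    UUID getId() { return id; }
    T getContent() { return content; }
    String getRev() { return rev; }

    // Returns a copy with new content and a new revision; the id never changes.
    SyncDocument<T> update(T newContent, String newRev) {
        return new SyncDocument<>(id, newContent, newRev);
    }
}
```

Keeping the object immutable means an update always yields a new (content, rev) pair, which makes it harder to change content without also changing the revision.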
>>
>> #### Objective-C
>>
>> volunteers needed ;)
>>
>> ### Transactions
>>
>> These are the most basic parts of sync I can think of that our system
>> should be able to do/manage. Our internal representation of the client
>> documents and collections should make implementing this automatically
>> and without user intervention as simple as possible.
>>
>> * Detect Change
>>
>> When a user changes her local data, the system should note the
>> change and generate a sync message to send to the server. This can be
>> done automatically or manually but SHOULD be done automatically.
>>
>> * Send update
>>
>> When a sync message is ready to be sent, and the system allows for
>> it to be sent (network available, not in blackout window from
>> exponential backoff, etc.), then the sync message should be sent. Doing
>> this automatically should be the default, but the developer can override
>> this behavior.
>>
>> * Receive Update
>>
>> When a client updates its data and successfully syncs to the remote
>> server, the remote server will notify all of the relevant clients. The
>> client must automatically and without user intervention receive this
>> update and either act on it or store it for later processing.
>>
>> * Apply Update
>>
>> Once a client application has an update message from the server, it
>> can apply the message to its local data. This should be done
>> automatically as part of receiving the update, but it may be done
>> manually or may be delayed and automatically executed later.
>>
>> * Detect Conflict
>>
>> When applying an update fails, the system must detect this. The
>> system will provide state to the application and/or the user to handle
>> the conflict. The user MUST NOT have to check for conflicts on her own.
>>
>> * Resolve Conflict
>>
>> There must be a mechanism for resolving a conflict. This CAN be
>> done automatically using default resolvers provided by AeroGear, by
>> using a resolver provided by the developer/user, or by the app user
>> selecting the correct merge. This will possibly generate a new sync
>> message.
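To make the Detect Conflict and Resolve Conflict steps concrete, here is a rough sketch in Java. It assumes an integer revision counter and a pluggable resolver function; `RevisionedStore` and its methods are hypothetical names, and the spec does not say revisions are integers.

```java
import java.util.function.BinaryOperator;

// Hypothetical in-memory stand-in for one server-side document.
final class RevisionedStore {
    private String content;
    private int rev; // integer revision counter, an assumption of this sketch

    RevisionedStore(String content) {
        this.content = content;
        this.rev = 1;
    }

    String content() { return content; }
    int rev() { return rev; }

    // Applies an update that was made against baseRev. If the store has
    // moved on since then, that is a conflict and the resolver merges
    // (current, incoming). Returns true when the update applied cleanly,
    // false when the resolver had to be used.
    boolean apply(int baseRev, String incoming, BinaryOperator<String> resolver) {
        boolean clean = (baseRev == rev);
        content = clean ? incoming : resolver.apply(content, incoming);
        rev++;
        return clean;
    }
}
```

The resolver argument is where a default AeroGear resolver, a developer-supplied one, or a choice made by the app user would plug in. The caller never has to compare revisions itself.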
>>
>>
>> ### API levels
>>
>> As soon as we have a rough data-model defined, we can start dabbling
>> around different API levels to be served:
>>
>> (parts **I think** are potentially deliverable for a 1.0)
>>
>> - level 0: explodes when there's a conflict
>> - level 1: semi-automatic conflict resolution via something like
>> Google's diff-match-patch
>> - level 2: business rules determine who wins a conflicting update
>> (supervisor wins over normal user)
>>
>> (parts **I think** are potentially deliverable for a 2.0)
>>
>> - level 3: real-time updates via diff-match-patch
>> - level 4: real-time updates via OT/EC
>>
>> All those proposed API operations should be serializable, meaning I can
>> potentially keep making changes offline, then just replay them to the
>> server when online.
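One way to picture the "keep changing offline, replay when online" idea is a simple outbox. This is only a sketch: `Outbox` is a hypothetical name, changes are assumed to be already serialized to strings, and the `send` callback stands in for whatever transport ends up being used.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical outbox: changes made offline are queued in serialized form
// and replayed in order once a connection is available.
final class Outbox {
    private final Deque<String> pending = new ArrayDeque<>();

    void record(String serializedChange) {
        pending.addLast(serializedChange);
    }

    int size() { return pending.size(); }

    // Replays queued changes oldest-first. `send` returns false when the
    // change could not be delivered; replay then stops and that change
    // stays at the head of the queue. Returns the number of changes sent.
    int replay(Predicate<String> send) {
        int sent = 0;
        while (!pending.isEmpty() && send.test(pending.peekFirst())) {
            pending.removeFirst();
            sent++;
        }
        return sent;
    }
}
```

Stopping at the first failure keeps the original order intact, so a later change is never delivered before an earlier one it may depend on.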
>>
>> ### transport
>>
>> Since we know about the future-looking ideas on v2.0, it would be really
>> nice for us to specify a very simple/dumb JSON-based protocol for those
>> change messages. Something that could accommodate both the full document
>> updates and the OT/EC incremental bits too. I have no ideas on this, tbh.
>>
>> #### Strawman - Summers
>>
>> {id : Object, data : String, checksum: long}
>>
>> **id** :
>> This is the global identifier for the object. This field is optional.
>>
>> **data** :
>> This is the sync data for the application. It may be a diff, a whole
>> object, etc. This field is required.
>>
>> **checksum** :
>> This is the client's idea of what a known good sync will look like.
>> If, post merge, the server's checksum and client's checksum do not
>> match then the client is out of sync and must resync and handle the
>> conflict.
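A small sketch of how the checksum comparison could work. The strawman does not name an algorithm, so CRC32 here is purely an assumption (picked because `java.util.zip.CRC32` returns a `long`, matching `checksum: long`), and `SyncChecksum` is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper: CRC32 stands in for whatever checksum the spec adopts.
final class SyncChecksum {
    private SyncChecksum() {}

    // Checksum of the post-merge content, as an unsigned 32-bit value
    // carried in a long.
    static long of(String mergedContent) {
        CRC32 crc = new CRC32();
        crc.update(mergedContent.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // True when the client's expected result matches what the server holds.
    static boolean inSync(long clientChecksum, String serverContent) {
        return clientChecksum == of(serverContent);
    }
}
```

A CRC only catches accidental divergence. It gives no protection against tampering, which is what the proposed "signature" field would be for.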
>>
>>
>>
>> ## Appendix Use Cases:
>>
>>
>> Here are a few contrived use cases that we may want to keep in mind.
>>
>> 1. Legacy Bug Trackers From Hell
>>
>> a. It is a webapp written in COBOL, no one will ever EVER update or
>> change the code
>>
>> b. It has TONS of legacy but important data
>>
>> c. It has TONS of users
>>
>> d. It only has a few transactions per day, all creating and updating
>> bug reports
>>
>> e. Multiple users can edit the same report
>>
>>
>> 2. Slacker Gallery
>>
>> a. Each user has multiple galleries, each gallery has multiple
>> photos
>>
>> b. A Gallery has only one user, but the user may be on multiple devices
>>
>> c. Galleries may be renamed, created, and deleted
>>
>> d. Photos may only be created or deleted. Photos also have metadata
>> which may be updated, but its creation and deletion is tied to the Photo
>> object.
>>
>>
>> 3. Dropbox clone
>>
>> a. A folder of files may be shared among users
>>
>> b. There is a size limit to files and how much storage may be used
>> per folder
>>
>> c. Files are not updated. If there is a new file, there is an atomic
>> delete and create operation
>>
>>
>> 4. Email client
>>
>> a. This is an AG-controller which accesses a mail account.
>>
>> b. There are mobile offline and sync enabled clients which connect
>> to this controller.
>>
>>
>> 5. Google Docs clone
>>
>> a. Operational Transform out the wazoo
>>
>> b. What would the server need?
>>
>> c. What would the client need?
>>
>>
>> 6. Building Inspector app
>>
>> Building inspector system - we have mobile apps that store relevant info
>> and are bound to be accessed in places where we won't have any kind of
>> connection, or very poor signal.
>>
>> You can have several inspectors screening the same building
>> simultaneously.
>>
>> Let's say Agnes and Joe are doing the fire extinguisher
>> inspection in a new hospital building. Technically each fire
>> extinguisher has its own identifier and can be an independent
>> document.
>> In this case we would have no conflict happening.
>>
>> Now they start finding expired fire extinguishers and start to add them
>> to the report. This report could potentially have two divergent lists of
>> fire extinguishers to be replenished/revalidated, and a divergent
>> compliance status for the building.
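For the divergent lists in the inspector case, one possible resolver is a plain set union. This assumes entries are only ever added to the report and that each extinguisher identifier is unique. `ExpiredListMerge` is a hypothetical name, and the compliance status would still need its own rule.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical resolver for two divergent add-only lists of identifiers.
final class ExpiredListMerge {
    private ExpiredListMerge() {}

    // Union of both inspectors' lists; duplicates collapse because each
    // extinguisher is identified by a unique string. Inputs are not modified.
    static Set<String> merge(Set<String> fromAgnes, Set<String> fromJoe) {
        Set<String> merged = new TreeSet<>(fromAgnes);
        merged.addAll(fromJoe);
        return merged;
    }
}
```

A union is order-independent, so it gives the same result whichever inspector syncs first. It stops being safe as soon as entries can also be removed from the list.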
>>
>> 7. Census App
>>
>> Census system - we have mobile apps focused on offline data collection.
>> We have the previous year's info that needs to be updated on the server.
>> The interviewee needs to take a call, then asks the interviewer to come
>> back later. This results in two sets of changes for the same document,
>> stacked together, which should work flawlessly.
>>
>>
>> ## Appendix Reference (Open Source) Products:
>>
>>
>> - Wave-in-a-box
>>
>> - CouchDB
>>
>> - Google Drive RealtimeAPI
>>
>> - [diff-match-patch algorithm](http://code.google.com/p/google-diff-match-patch/)
>>
>> - [Summers' Realtime Sync Demo](http://www.youtube.com/watch?v=WEkZGbVk4Lc)
>>
>> - [Summers' Devnexus Sync Demo](https://plus.google.com/103442292643366117394/posts/HGVHwtPArPW)
>>
>> - Google Android Sync Architecture
>>
>>
>> _______________________________________________
>> aerogear-dev mailing list
>> aerogear-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/aerogear-dev
>>