[aerogear-dev] sync spec strawman

Summers Pittman supittma at redhat.com
Tue Jan 7 09:21:28 EST 2014


I've updated the sync spec on the data-sync branch on aerogear.org with 
what qmx posted yesterday and some ideas I had as well.  If I don't get 
any tomatoes I will try to see what a POC on Android looks like this 
afternoon.

Sync doc follows.

# Status: Experimental

# AeroGear Data Sync


## basics?

Since we've been catering the enterprise market, this essentially means 
we need to get the __boring__ stuff right first, then move over to the 
__shiny__ stuff, like realtime data sync, update policies & friends.

### data model

For starters, I think that the most important thing that needs to be 
agreed upon is the data model and the atomic operations around it. As 
previous discussed, I really like CouchDB's datamodel -- and hate erlang ;)

`{_id:<guid>, content:<arbitrary json>, rev:<last revision>}`

#### JS

Well, it's JSON, it _Just Works_&trade;

#### Java

I didn't want to pick on Java, but since its fame forces me to it. First 
stab (a courtesy of our friend Dan Bevenius):

     public interface Document {
         public String id;
         public String content;
         public String rev;
     }

We naturally want to kick this a notch, and use objects instead of plain 
strings:

     public interface Document<T, ID> {
         public ID id;
         public T content;
         public String rev;
     }

In this case, we can use the convention requiring that `T` is any 
**object serializable to  JSON**. `ID` is a convenience shorthand since 
it's a **GUID/UUID**. I think this key isn't necessarily a natural key 
(a surrogate key instead).

#### Objective-C

volunteers needed ;)

### Transactions

These are the most basic parts of sync I can think of that our system 
should be able to do/manage.  Our internal representation of the client 
documents and collections should make implementing this automatically 
and without user intervention as simple as possible

* Detect Change

     When a user changes her local data, the system should note the 
change and generate a sync message to send to the server.  This can be 
done automatically or manually but SHOULD be done automatically.

* Send update

      When a sync message is ready to be sent, and the system allows for 
it to be sent (network available, not in blackout window from 
exponential backoff, etc) then sync message should be sent. This being 
done automatically should be the default, but the developer can override 
this behavior.

* Receive Update

     When a client updates it data and successfully syncs to the remote 
server, the remote server will notify all of the relevant clients. The 
client must automatically and without user intervention receive this 
update and either act on it or store it for later processing.

* Apply Update

     Once a client application has an update message from the server, it 
can apply the message correctly to the data on it.  This should be done 
automatically as part of receiving the update, but it may be done 
manually or may be delayed and automatically executed later.

* Detect Conflict

     When applying an update fails, the system must detect this.  The 
system will provide state to the application and/or the user to handle 
the conflict.  The user MUST NOT have to check for conflicts on her own.

* Resolve Conflict

     There must be a mechanism for resolving a conflict.  The CAN be 
done automatically using default resolvers provided by AeroGear, by 
using a resolver provided by the developer/user, or by the app user 
selecting the correct merge.  This will possibly generate a new sync 
message.


### API levels

As soon as we have a rough data-model defined, we can start dabbling 
around different API levels to be served:

(parts **I think** are potentially deliverable for a 1.0)

- level 0: explodes when there's a conflict
- level 1: semi-automatic conflict resolution via something like 
google's diff-match-patch
- level 2: business rules determine who wins a conflicting update 
(supervisor wins over normal user)

(parts **I think** are potentially deliverable for a 2.0)

- level 3: real-time updates via diff-match-patch
- level 4: real-time updates via OT/EC

All those proposed API operations should be serializable, meaning I can 
potentially keep doing changes offline then just replying them to the 
server when online.

### transport

Since we know about the future-looking ideas on v2.0, it would be really 
nice for us to specify a very simple/dumb JSON-based protocol for those 
change messages. Something that could accomodate both the full document 
updates and the OT/EC incremental bits too. I have no ideas on this, tbh.

#### Strawman - Summers

{id : Object, data : String, checksum: long}

**id** :
  This is the global identifier for the object.  This field is optional.

**data** :
  This is the sync data for the application.  It may be a diff, a whole 
object, etc.  This field is required.

**checksum** :
  This is the client's idea of what a known good sync will look like.  
If, post merge, the server's checksum and client's check sum do not 
match then the client is out of sync and must resync and handle the 
conflict.



## Appendix Use Cases:


Here are a few contrived use cases that we may want to keep in mind.

1.  Legacy Bug Trackers From Hell

     a.  It is a webapp written in COBOL, no one will ever EVER update or
change the code

       b.  It has TONS of legacy but important data

       c.  It has TONS of users

       d.  It only has a few transactions per day, all creating and 
updating
bug reports

       e. Multiple users can edit the same report


2.  Slacker Gallery

       a.  Each User has a multiple galleries, each gallery has multiple 
photos

       b.  A Gallery has only one user, but the user may be on multiple 
devices

       c.  Galleries may be renamed, created, and deleted

       d.  Photos may only be created or deleted.  Photos also have meta 
data
which may be updated, but its creation and deletion is tied to the Photo
object.


3.  Dropbox clone

       a.  A folder of files may be shared among users

       b.  There is a size limit to files and how much storage may be used
per folder

       c.  Files are not updated.  If there is a new file, there is an 
atomic
delete and create operation


4.  Email client

        a.  This is an AG-controller which accesses a mail account.

        b.  There are mobile offline and sync enabled clients which connect
to this controller.


5.  Google Docs clone

        a. Operational Transform out the wazzoo

        b.  What would the server need?

        c.  What would the client need?


6.  Building Inspector app

Building inspector system - we have mobile apps that store relevant info 
and are bound to be accessed on places where we won't have any kind of 
connection, or very poor signal.

You can have several inspectors screening the same building simultaneously.

Let's say we have Agnes and Joe are doing the fire extinguisher 
inspection in a new hospital building. Technically each fire 
extinguisher has its own identifier and can be an independent document. 
In this case we would have no conflict happening.

Now they start finding expired fire extinguishers and start to add them 
to the report. This report could potentially have two divergent lists of 
fire extinguishers to be replenished/revalidated, as the building's 
compliance status.

7. Census App

Census system - we have mobile apps focused on offline data collection. 
We have the previous year's info that needs to be updated on the server. 
The interviewee needs to take a call, then asks the interviewer to come 
back later. This results in two sets of changes for the same document, 
stacked together, which should work flawlessly.


# Appendix Reference (Open Source) Products:


- Wave-in-a-box

- CouchDB

- Google Drive RealtimeAPI

- [diff-merge-patch 
algorithm](http://code.google.com/p/google-diff-match-patch/)

- [Summers' Realtime Sync Demo](http://www.youtube.com/watch?v=WEkZGbVk4Lc)

- [Summers' Devnexus Sync 
Demo](https://plus.google.com/103442292643366117394/posts/HGVHwtPArPW)

- Google Android Sync Architecture




More information about the aerogear-dev mailing list