Hi Randall,
regarding the choice of taking a closer look at DS: the reason for this was
that Operational Transformation (OT) had previously been implemented at Red
Hat by the Errai team. Instead of doing something with OT again, we decided
to try DS and compare the two. Erik Jan de Wit on the AeroGear team used to
work on the Errai team, and hopefully he can help us compare the two
approaches. DS is just one suggestion, and I don't know enough about OT to
compare them myself. I'd like to take a closer look at OT if time permits,
though.
I think you have raised interesting concerns/issues, some of which I had
not thought about before. Let me think them over; while I don't have
answers yet, I might be able to reply to some of them next week.
Thanks,
/Dan
On 1 August 2014 21:02, Randall Hauch <rhauch(a)redhat.com> wrote:
I’ve really enjoyed learning about what AeroGear has been doing with data
sync. This is a tough problem, but finding a solution is really important.
Both data sync POCs appear to use Differential Synchronization, or DS [1].
I was not familiar with the paper until today, but after reading it I do
have a few questions/comments. Bear with me; this is a long post.
DS is clearly targeted for use within a collaborative document editor,
where there are multiple clients concurrently editing the same document,
and at any one time there are a relatively small number of documents being
edited; you can get a feel for this by looking at figures 5 and 7 in the
paper [1] — look at the amount of server memory and CPU required to perform
DS on just one document being edited by a half-dozen clients. Also, in a
collaborative document editor, clients are often continually making changes
even as they attempt to synchronize with the server.
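To make the diff/patch cycle concrete, here is a rough sketch of one
client-to-server half-cycle: the client diffs its edited text against its
shadow copy, sends the edit script, and the server applies it to both its
copy and its shadow. This is purely illustrative (the function names are
mine); a real DS implementation applies patches fuzzily, e.g. with
diff-match-patch, so edits still apply when the server copy has diverged.

```python
import difflib

def make_edits(shadow, current):
    """Diff the edited text against the shadow copy, yielding an edit script."""
    matcher = difflib.SequenceMatcher(a=shadow, b=current)
    return [(tag, i1, i2, current[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]

def apply_edits(text, edits):
    """Apply an edit script, walking right-to-left so earlier offsets stay valid."""
    for _tag, i1, i2, replacement in reversed(edits):
        text = text[:i1] + replacement + text[i2:]
    return text

# One half-cycle: diff against the shadow, then both sides patch their
# copies and advance their shadows to the new text.
client_shadow = server_shadow = server_text = "the quick fox"
client_text = "the quick brown fox"            # local edit
edits = make_edits(client_shadow, client_text)
server_text = apply_edits(server_text, edits)
server_shadow = apply_edits(server_shadow, edits)
client_shadow = client_text
```

The point of the shadow copies is that each side always diffs and patches
against a text both sides have agreed on, which is exactly the per-client
state the server has to keep in memory.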
(It’s interesting that Google Docs, and Google Wave before it, appear to
use Operational Transformation [2] rather than DS. OT might also make it
easier to implement undo/redo, which works really well in Google Docs.)
An MBaaS or any other database-like service is very different. It has to
host multiple applications (i.e., databases), each with multiple
collections containing potentially millions of entities (e.g., JSON
documents). The entities themselves are more fine-grained and smaller than
collaborative documents (though probably a bit coarser-grained and larger
than a single record in a RDBMS). Many clients might be reading and
updating lots of documents at once, and the data service has to coordinate
those changes. A single batch update from one client might request changes
to dozens of entities. And the clients can/will always wait for
confirmation that the server made the requested changes before continuing
(unless the client is offline); or at a minimum can enqueue the requested
changes.
Given these characteristics, using DS within the data service might be
extremely expensive in terms of CPU and memory, and it might be difficult
for a DS-based service to implement all of the necessary features. First,
the data service doesn’t really know which entities are being “edited”;
instead,
connected clients read entities, make changes locally, then request the
service make those changes. Secondly, every time a change comes in, the
service would have to read the persisted entity in order to compute the
diff; this is not only inefficient, but also makes it more difficult to
scale and to uphold concurrency, consistency, atomicity, and
serializability guarantees. Thirdly, what would the data service need to do
when a client
connects and asks for the changes since it was last connected? The data
service might be able to quickly find out which entities were modified
since then, but computing the diffs (relative to the time the client last
connected) for all of those changed entities would be very complicated. It
may be easier and better for the data service to record the individual
changes (edits) made by each transaction, and then to use that information
to compute the effective diffs over some period of time. In fact, these
recorded edits might also be useful to implement other features within the
data service; see CQRS [3] and [4].
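As a sketch of what recording edits could look like (the class name, event
shape, and versioning scheme here are my own invention, not an existing
AeroGear or service API): a change log that collapses per-transaction
events into an effective per-entity delta since some version.

```python
from collections import defaultdict

class ChangeLog:
    """Record field-level edits per transaction; replay to get effective diffs."""

    def __init__(self):
        self.events = []      # (version, entity_id, field, new_value)
        self.version = 0

    def record(self, entity_id, changes):
        """Record one transaction's changes; returns the new version number."""
        self.version += 1
        for field, value in changes.items():
            self.events.append((self.version, entity_id, field, value))
        return self.version

    def effective_diff_since(self, version):
        """Collapse all events after `version` into one delta per entity."""
        diff = defaultdict(dict)
        for v, entity_id, field, value in self.events:
            if v > version:
                diff[entity_id][field] = value
        return dict(diff)
```

A reconnecting client would then ask for `effective_diff_since(last_seen)`
instead of forcing the service to diff every changed entity on demand.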
What is really required by the client when trying to synchronize its data
after being disconnected? Assuming the client can say which subset of
entities it’s interested in when it reconnects (via some criteria in a
subscription), does the client want:
1. the new versions of those entities that changed;
2. the deltas in the entities; and/or
3. all of the events describing the individual changes made to all of
those entities?
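To fix ideas about how options #2 and #3 differ in shape (the function
names and event fields below are hypothetical, just for illustration):

```python
def delta(before, after):
    """Option #2: a field-level delta between two versions of an entity."""
    return {k: v for k, v in after.items() if before.get(k) != v}

def change_events(entity_id, before, after, version):
    """Option #3: individual change events, one per modified field."""
    return [{"entity": entity_id, "version": version,
             "op": "set", "field": k, "value": v}
            for k, v in after.items() if before.get(k) != v]

before = {"name": "Bob", "age": 30}
after = {"name": "Bob", "age": 31}   # option #1 would simply ship `after` whole
```

Option #3 carries strictly more information than #2 (ordering and
provenance of each change), which matters for the merge question below.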
It may not matter for clients that don’t allow local offline changes, but
what might the preferred approach be for clients that do allow offline
changes? Option 1 is clearly the easiest from the perspective of the data
service, but options #2 and #3 can certainly be handled. With option #1,
can the client do something like DS and maintain copies of each original
(unmodified) entity so that it can compute the differences? Does this
(perhaps with a journal of edits made while offline) provide enough info
for the client to properly merge the local changes, or does the client
really need the individual events in #3 so that it can, for example, know
that some local changes were made to now-out-of-date data?
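One way the client could use a saved original copy plus its offline edits
is a field-wise three-way merge. This sketch (my own, with a naive
server-wins default on conflicts) also shows the limitation: it can detect
that both sides touched a field, but without the event stream of option #3
it cannot tell why, or in what order, the remote changes happened.

```python
def three_way_merge(base, local, remote):
    """Field-wise three-way merge of flat, JSON-like entity dicts.

    base:   the pristine copy saved at last sync
    local:  the entity after offline edits
    remote: the latest version from the server
    Missing keys are treated as None, so deletes are handled only crudely.
    """
    merged, conflicts = {}, {}
    for key in set(base) | set(local) | set(remote):
        b, l, r = base.get(key), local.get(key), remote.get(key)
        if l == b:                  # unchanged locally -> take remote
            merged[key] = r
        elif r == b:                # unchanged remotely -> take local
            merged[key] = l
        elif l == r:                # both sides made the same change
            merged[key] = l
        else:                       # true conflict
            conflicts[key] = (l, r)
            merged[key] = r         # server-wins default, for illustration
    return merged, conflicts
```

With option #1 plus a saved base copy this is about the best the client can
do; resolving the conflicts it reports is where the extra detail of options
#2 or #3 would earn its keep.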
Will the same option work for online notifications? After all, it’d be
great if the same mechanism were used for data sync, offline (push)
notifications, and online notifications (events).
Finally, the data sync APIs of the data service should support the use of
local client storage, but it should not require it.
Best regards,
Randall
[1] http://research.google.com/pubs/pub35605.html
[2] http://en.wikipedia.org/wiki/Operational_transformation
[3] http://www.infoq.com/presentations/Events-Are-Not-Just-for-Notifications
[4] http://martinfowler.com/bliki/CQRS.html
_______________________________________________
aerogear-dev mailing list
aerogear-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/aerogear-dev