The world is in conflict:

my app is online

Speakers


Web application

Client/server

Server

... mostly offline

Server

Synchronization

[Diagram: Actor A (state 1) and Actor B (state 2) exchange messages over the network until both converge on state 3]

As a client,

when I come back online,

how do I synchronize with the server?

Common

No(t only) push!

[Diagram: the server pushes to clients through a cloud notification service]

Supported operations

Performed by myself and others

  • Creating an item
  • Modifying an existing item
  • Deleting an item
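The three operations can be pictured as entries of a change log; the field names below are made up for the example, not taken from any library:

```javascript
// Illustrative shape of the three operations as log entries.
var operations = [
  { op: "create", type: "user", id: 42, data: { firstName: "John" } },
  { op: "update", type: "user", id: 42, data: { lastName: "Smith" } },
  { op: "delete", type: "user", id: 42 }
];

// Replaying the log against an empty store rebuilds the current state.
function replay(log) {
  var store = new Map();
  log.forEach(function (entry) {
    if (entry.op === "create") store.set(entry.id, Object.assign({}, entry.data));
    else if (entry.op === "update") Object.assign(store.get(entry.id), entry.data);
    else if (entry.op === "delete") store.delete(entry.id);
  });
  return store;
}
```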

Selective data model

[Diagram: each user only synchronizes a selected subset { ... } of the data model]

Conflicts

[Diagram: client A syncs with the server, then changes an item while offline; client B changes the same item and syncs; client A's next sync produces a conflict]

Initiated by the client

  1. client logs offline edits
  2. client pushes edits
  3. client pulls changes which happened on the server
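A minimal in-memory sketch of those three steps; every name here is illustrative, including the fake server (a real server would typically filter the client's own edits out of the pull):

```javascript
function makeServer() {
  var log = [];
  return {
    pushEdits: function (edits) { edits.forEach(function (e) { log.push(e); }); },
    pullChangesSince: function (checkpoint) {
      return log.slice(checkpoint).map(function (change, i) {
        return { seq: checkpoint + i + 1, change: change };
      });
    }
  };
}

function synchronize(client, server) {
  server.pushEdits(client.pendingEdits);   // 2. push edits logged offline
  client.pendingEdits = [];                // 1. the offline log is drained
  server.pullChangesSince(client.checkpoint).forEach(function (entry) {
    client.store.push(entry.change);       // 3. pull server-side changes
    client.checkpoint = entry.seq;
  });
}
```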

Evaluation

  • Bandwidth
  • Roundtrips
  • Computational cost
  • Recovery
  • Setup

Wholesale

Transfer everything!

Evaluation

  • Bandwidth: O(dataset size)
  • Roundtrips: 1
  • Computational cost: low
  • Recovery: yes
  • Setup: does anyone have a simpler idea?
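Wholesale transfer, sketched: one call fetches the full dataset and the local copy is replaced outright. `fetchAll` stands for any "GET /everything" endpoint and is an assumption of this sketch:

```javascript
function wholesaleSync(localStore, fetchAll) {
  var everything = fetchAll();        // O(dataset size) bandwidth, 1 roundtrip
  localStore.clear();                 // local errors are wiped along with everything else...
  everything.forEach(function (item) {
    localStore.set(item.id, item);    // ...which is exactly why recovery works
  });
}
```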

With versioned entities

{
  type: "user",
  id: 42,
  version: 5,
  firstName: "John",
  lastName: "Smith",
  /* more */
}
[
  { id: 42, version: 5 },
  /* more */
]

Rsync

[Sequence diagram:]
requester → requested: gimme your state!
requested → requester: MD5(file) { file1: 1A2B, file2: 3C4D, ... }
requester compares with its own MD5(file) { file1: 1A2B, file2: 5EF6, ... }
requested → requester: file2 = {...}

Algorithm

[Sequence diagram:]
client → server: gimme your state!
server → client: entity.version { entity1: 2, entity2: 3, ... }
client compares with its own entity.version { entity1: 2, entity2: 5, ... }
client → server: gimme entity2
server → client: entity2 = {...}
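The comparison step can be sketched as follows: given both `{ entity: version }` maps, keep the ids whose remote version is ahead; each of those then costs one more "gimme entityN" roundtrip. The function name is illustrative:

```javascript
function staleEntities(localVersions, remoteVersions) {
  return Object.keys(remoteVersions).filter(function (id) {
    // an entity never seen locally counts as version 0
    return (localVersions[id] || 0) < remoteVersions[id];
  });
}
```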

Versioned entities evaluation

  • Bandwidth: lower but still O(dataset size)
  • Roundtrips: O(changeset size)
  • Computational cost: low
  • Recovery: not from rogue edits (an entity modified without a version bump goes undetected)
  • Setup: remains ok

Timestamps

Log changes!

Checkpoints

[Diagram: the server stores documents (/user/1: { ... }, /user/2: { ... }, ...) plus a change log of time - identity pairs:]

  100 - /user/1
  101 - /user/2
  103 - /user/8
  104 - /user/3
  105 - /user/2
  106 - /user/5
  ...
  242 - /user/5

[The client holds checkpoint 104 and asks for the changes since 104; the server answers with /user/2: { ... }, /user/5: { ... } and the new checkpoint 242, which the client stores.]
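The server side of that checkpoint exchange can be sketched over an in-memory change log of `{ time, identity }` entries (the "time - identity" pairs above); all names are illustrative:

```javascript
function changesSince(log, checkpoint) {
  var touched = {};
  var latest = checkpoint;
  log.forEach(function (entry) {
    if (entry.time > checkpoint) {
      touched[entry.identity] = true;          // each identity is sent at most once
      if (entry.time > latest) latest = entry.time;
    }
  });
  return { identities: Object.keys(touched), checkpoint: latest };
}
```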

Version identifier

  • Changes whenever anything in dataset changes
  • Retrieve operations between two versions
[Diagram, git-style: commits (first f34cx4e, second e324xz, third w34sxd) each point to a tree (12qaws, hf75es, qa23fc), and trees point to file versions: main.js version 1 (w34rd2), new.js (qa24zx2), main.js version 2 (wa234f), new.js version 2 (334eds)]

Clocks

Errors

[Diagram: server and client both hold checkpoint 103, so a sync transfers nothing; meanwhile a rogue edit changed /user/8 on the server without appending to the log, so the client never hears about it]

Filters


  function (doc, req) {
    return doc.type && doc.type === "foo";
  }

  • Iterate on the whole log
  • Soft deletes
  • Data used in filters must be immutable
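Applying such a filter amounts to pushing every logged document through it, which is why deletes must be soft (a tombstone document still has to match the filter) and filtered fields must stay immutable. A sketch, with an illustrative helper name:

```javascript
function filteredChanges(docs, filter) {
  return docs.filter(function (doc) {
    return !!filter(doc, {} /* req */);
  });
}
```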

Evaluation

  • Bandwidth: efficient (with deduplication of repeated changes to the same entity)
  • Roundtrips: just 1
  • Computational cost: low for CPU / up to double database footprint
  • Recovery: none, errors remain forever (a rogue edit never re-enters the log)
  • Setup: logging infrastructure / extra field to store on clients

Couch/PouchDB

  • ID
  • Sequence
  • Revision
  • Document
  • Database
  • Source
  • Target
  • Checkpoint

Couch/PouchDB Algorithm

[Sequence diagram:]
web ↔ DB: read source checkpoint: 1A2B
web → source DB: gimme changes since 1A2B
source DB: collect id/revision → { id1: revision3, id2: revision4, ... }
web → source DB: gimme id2 at revision4
source DB → web: { _id: id2, _rev: revision4, ... }
web: store new checkpoint: 3C4D
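Those replication steps can be sketched with in-memory objects; this mirrors the shape of the protocol (checkpoint, id/revision listing, fetching missing revisions, new checkpoint), not CouchDB's actual wire format:

```javascript
function makeDb(name) {
  return {
    name: name, seq: 0, docs: new Map(), checkpoints: {},
    put: function (doc) {
      this.seq++;
      this.docs.set(doc._id, Object.assign({}, doc, { seq: this.seq }));
    },
    changesSince: function (since) {
      var changes = [];
      this.docs.forEach(function (doc) {
        if (doc.seq > since) changes.push({ id: doc._id, rev: doc._rev });
      });
      return changes;
    }
  };
}

function replicate(source, target) {
  var since = target.checkpoints[source.name] || 0;      // last known checkpoint
  source.changesSince(since).forEach(function (change) { // collect id/revision
    var local = target.docs.get(change.id);
    if (!local || local._rev !== change.rev) {           // fetch only missing revisions
      target.put(source.docs.get(change.id));
    }
  });
  target.checkpoints[source.name] = source.seq;          // store the new checkpoint
}
```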

Mathematical approach

let's play!

given n,

the server builds a data structure of size O(n),

the client computes its delta from that structure,

and the round succeeds if the number of differences is less than n

Algorithm

var server, i = 0, diff;

while (!diff) {
  // double the requested structure size until the difference can be computed
  server = getFromServer(Math.pow(2, i));
  diff = localStore.computeDifference(server);
  i++;
}

localStore.applyDifference(diff);

Performances

  • Bandwidth: O(changeset size)
  • Roundtrips: O(log(changeset size))
  • Computational cost: ?
  • Error correction: ?
  • Setup: ?

Mathsync

Mathsync data structure

{ name: "Alice" } → content: [1A2B], hash: [123], buckets: 1, 2
{ name: "Bob" }   → content: [3C4D], hash: [456], buckets: 2, 3

[
  { items: 1, content: [1A2B],            hash: [123] },
  { items: 2, content: [1A2B] XOR [3C4D], hash: [123] XOR [456] },
  { items: 1, content: [3C4D],            hash: [456] }
]

Mathsync: data structure

var ibf = emptyBuckets(n);
// ibf: [{ hash: [], content: [], items: 0 }, ... ]

items.forEach(function(item) {
  var bytes = serialize(item);

  buckets(bytes).forEach(function (id) {
    var bucket = ibf[id % n];

    bucket.items++;
    bucket.content = xor(bucket.content, bytes);
    bucket.hash = xor(bucket.hash, hash(bytes));
  });
});
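The decoding side, as a self-contained toy: items are small integers, the hash is a naive multiplicative one, and we assume the server's set is a superset of the client's. Subtract the client's structure from the server's, then repeatedly "peel" buckets holding exactly one item. Everything here is illustrative, not Mathsync's real implementation:

```javascript
function toyHash(x) { return (x * 2654435761 % 4294967296) | 0; }

function bucketIds(x, n, k) {
  var ids = {};
  for (var i = 0; i < k; i++) ids[Math.abs(toyHash(x + i * 7919)) % n] = true;
  return Object.keys(ids).map(Number);     // deduplicated bucket indices
}

function makeIbf(items, n, k) {
  var ibf = [];
  for (var i = 0; i < n; i++) ibf.push({ items: 0, content: 0, hash: 0 });
  items.forEach(function (x) {
    bucketIds(x, n, k).forEach(function (id) {
      ibf[id].items++;
      ibf[id].content ^= x;
      ibf[id].hash ^= toyHash(x);
    });
  });
  return ibf;
}

function subtract(a, b) {
  return a.map(function (bucket, i) {
    return {
      items: bucket.items - b[i].items,    // shared items cancel out entirely
      content: bucket.content ^ b[i].content,
      hash: bucket.hash ^ b[i].hash
    };
  });
}

function decode(diff, k) {
  var recovered = [];
  var progress = true;
  while (progress) {
    progress = false;
    diff.forEach(function (bucket) {
      // a "pure" bucket holds a single item, confirmed by its hash
      if (bucket.items === 1 && bucket.hash === toyHash(bucket.content)) {
        var x = bucket.content;
        recovered.push(x);
        bucketIds(x, diff.length, k).forEach(function (id) {
          diff[id].items--;
          diff[id].content ^= x;
          diff[id].hash ^= toyHash(x);
        });
        progress = true;
      }
    });
  }
  return recovered;
}
```

The key property is that the structure's size depends on the number of differences, not on the dataset: the shared items cancel during subtraction, whatever their count.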

Mathsync setup - server

var data = [/* ... */];

function serialize(item) {
  var buffer = new Buffer(item.key + ':' + item.value, 'utf-8');
  return new Uint8Array(buffer).buffer;
}

var summarizer = require('mathsync').summarizer.fromItems(data, serialize);

app.use(route.get('/summary/:level', function* (level) {
  this.body = yield summarizer(level | 0);
}));

Mathsync setup - client

var ms = require('mathsync');
var data = [/* ... */];

// local items
function serialize(item) { /* ... */ }
var local = ms.summarizer.fromItems(data, serialize);

// server items
function fetchSummary(level) {
  return http.getJson('/summary/' + level);
}
var remote = ms.summarizer.fromJSON(fetchSummary);

// delta
function deserialize(buffer) {
  var arr = new Buffer(new Uint8Array(buffer)).toString('utf-8').split(':');
  return { key: arr[0], value: arr[1] };
}
var resolve = ms.resolver.fromSummarizers(local, remote, deserialize);

resolve().then(function (difference) { /* ... */ });

Ahead of time computation

Errors get corrected!

[Diagram: a bug leaves inconsistent data on the application server and clients synchronize it; once the bug is fixed and the caches are cleaned, the next synchronization corrects everyone]

Selective data model

[Diagram: each data subset { ... } has its own summary, and a user's summaries are merged into one combined summary]

Evaluation

  • Bandwidth: O(changeset size)
  • Roundtrips: O(log(changeset size))
  • Computational cost: high
  • Error correction: fix on next synchronization
  • Setup: mathsync?

Conclusion

  • Push is not sufficient
  • Start simple with wholesale transfer!
  • When needed, move to a log-based solution, if its assumptions hold
  • Otherwise use the complex mathematical approach (and our library!)

Thanks!