Lecture 11: Data Synchronization Techniques for Mobile Devices
© Dimitre Trendafilov 2003
Modified by T. Suel 2004
CS623, 4/20/2004
Problem Definition
Given two versions of a data set on different machines, say an outdated and a current one, how can we update the outdated one with minimum communication cost?
Related Problem: What if the data has been changed on several machines? (Reconciling the data is difficult and application dependent.)
Obvious Solutions
Send all of the current data.
Compress the current data and then send it.
Send only the compressed difference between the two data sets.
    If the sender has both versions, use a suitable delta compression tool.
    What if the sender has no access to the outdated version?
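For the case where the sender holds both versions, the idea can be sketched as follows. This is only an illustration: it uses zlib's preset-dictionary feature as a stand-in for a dedicated delta compression tool, and the function names and sample data are made up for the example.

```python
import zlib

def make_delta(old: bytes, new: bytes) -> bytes:
    """Compress `new` using `old` as a preset dictionary (a crude delta)."""
    comp = zlib.compressobj(level=9, zdict=old)
    return comp.compress(new) + comp.flush()

def apply_delta(old: bytes, delta: bytes) -> bytes:
    """Rebuild `new` on the receiver, which already holds `old`."""
    decomp = zlib.decompressobj(zdict=old)
    return decomp.decompress(delta) + decomp.flush()

# Hypothetical data: the current version differs from the outdated one in one field.
old_version = b"name=Ann;phone=555-0100;city=Oslo;" * 20
new_version = old_version.replace(b"555-0100", b"555-0199", 1)

delta = make_delta(old_version, new_version)
restored = apply_delta(old_version, delta)
print(len(new_version), len(zlib.compress(new_version, 9)), len(delta))
```

The printed sizes compare sending the raw data, the compressed data, and the dictionary-based delta. zlib only looks back 32 KB into the dictionary, so this trick is limited to small inputs; real delta compressors do not have that restriction.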
Two Aspects of the Problem
File Synchronization (rsync)
    Update an outdated file so that it becomes identical to a current one.
Set Reconciliation (today)
    Assume you have many small data records, but you only want to send modified records.
    E.g., a database with a set of 100-byte records.
    Unordered: the order of records is not important.
    Find which records need to be transmitted, then send the entire record.
    Each record is identified by a number (hash, record ID).
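A naive baseline for set reconciliation can be sketched as follows, assuming each record is identified by a 64-bit number taken from a truncated SHA-256 digest. The helper names and sample records are hypothetical.

```python
import hashlib

def record_id(record: bytes) -> int:
    """Map a record to a 64-bit number using a truncated SHA-256 digest."""
    return int.from_bytes(hashlib.sha256(record).digest()[:8], "big")

def records_to_send(current: list[bytes], outdated_ids: set[int]) -> list[bytes]:
    """Return the whole records whose IDs are unknown on the outdated side."""
    return [r for r in current if record_id(r) not in outdated_ids]

# Hypothetical data: the outdated machine still holds the old entry for Bob.
outdated = [b"Ann,555-0100", b"Bob,555-0101", b"Cy,555-0102"]
current = [b"Ann,555-0100", b"Bob,555-0199", b"Cy,555-0102"]

outdated_ids = {record_id(r) for r in outdated}
to_send = records_to_send(current, outdated_ids)
```

In this baseline the outdated side has to ship one ID per record, so the communication cost grows with the size of the set and not with the number of changes.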
Applications for Data Synchronization
Synchronizing data between a PDA and a PC.
Microsoft Briefcase etc.
Synchronizing databases over a network.
Synchronizing a file system in two stages:
    find which files have changed (MD5 of files);
    use rsync on those that have changed.
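The first stage of the file system case can be sketched as follows. To keep the example self-contained, files are modeled as an in-memory mapping from name to contents, which is an assumption of this sketch; the second stage (running rsync on each changed file) is not shown.

```python
import hashlib

def md5_of(data: bytes) -> str:
    """Hex MD5 digest of a file's contents."""
    return hashlib.md5(data).hexdigest()

def changed_files(local: dict[str, bytes], remote_digests: dict[str, str]) -> list[str]:
    """Stage 1: names whose MD5 differs from, or is missing on, the remote side."""
    return sorted(
        name for name, data in local.items()
        if remote_digests.get(name) != md5_of(data)
    )

# Hypothetical file trees on the two machines.
remote = {"a.txt": b"alpha", "b.txt": b"beta"}
local = {"a.txt": b"alpha", "b.txt": b"beta v2", "c.txt": b"gamma"}

remote_digests = {name: md5_of(data) for name, data in remote.items()}
stale = changed_files(local, remote_digests)
```

Only the 16-byte digests cross the network in this stage, and the more expensive file-level synchronization is limited to the names in `stale`.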
Palm Hot Sync
Relies on metadata maintained on both machines.
The metadata is stored in the Palm DB.
There is one Palm DB for each application (Date Book, To Do, Address Book, etc.).
A record in the Palm DB consists of a unique ID, a pointer to the object, and a status flag.
Palm Hot Sync
Preferred mode of operation: Fast Sync
    Exchange only the modified records.
    Works only if the synchronization is always done between the same two machines.
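The role of the status flag in Fast Sync can be illustrated with a minimal sketch. The `Record` layout and the one-directional `fast_sync` function are hypothetical and are not Palm's actual record format or API.

```python
from dataclasses import dataclass

@dataclass
class Record:
    uid: int             # unique ID of the record
    data: str            # stands in for the pointer to the object
    dirty: bool = False  # status flag: modified since the last sync

def fast_sync(device: dict[int, Record], desktop: dict[int, str]) -> int:
    """Copy only flagged records to the desktop, clear the flags, return the count."""
    sent = 0
    for rec in device.values():
        if rec.dirty:
            desktop[rec.uid] = rec.data
            rec.dirty = False
            sent += 1
    return sent

device = {
    1: Record(1, "Dentist 3pm"),
    2: Record(2, "Lunch with Bob", dirty=True),
}
desktop = {1: "Dentist 3pm", 2: "Lunch"}
sent = fast_sync(device, desktop)
```

Because the flags are cleared after the sync, a second desktop would never learn about the change to record 2, which is why this mode only works between the same pair of machines.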
Palm Hot Sync
“Backup” mode of operation: Slow Sync
    Copy all of the data.
    Used when the last synchronization was done with a different machine.
Timestamps
Maintain a timestamp for each record.
Send only the records with a timestamp greater than the timestamp of the last synchronization.
Good for synchronization between two machines, but inefficient for more than two.
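The timestamp rule can be sketched in a few lines. The record layout (key mapped to a timestamp and a value) and the integer timestamps are assumptions made for this example.

```python
def records_since(records: dict[str, tuple[int, str]], last_sync: int) -> dict[str, str]:
    """Select the records whose timestamp is greater than the last sync time."""
    return {
        key: value
        for key, (stamp, value) in records.items()
        if stamp > last_sync
    }

# Hypothetical store: key -> (modification timestamp, value).
store = {
    "ann": (100, "555-0100"),
    "bob": (250, "555-0199"),
    "cy": (300, "555-0102"),
}
to_send = records_since(store, last_sync=200)
```

With more than two machines, a separate last-sync timestamp is needed for every partner, which is where this approach becomes inefficient.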
SyncML (www.syncml.org, now part of Open Mobile Alliance)
Fairly large initiative funded by Ericsson, IBM, Lotus, Matsushita, Motorola, Nokia.
Seeks to provide an open standard for synchronization between different platforms and devices.
Uses XML.
Based on timestamps.
A device stores a timestamp for each record and each device it communicates with.
N records and M devices result in N*M timestamps. Not scalable!
Intellisync Anywhere
Developed by Puma Technologies.
Relies on a central server.
Similar to Fast Sync, but each device synchronizes only with the central server.
It has a single point of failure.
The central server can get congested.
Characteristic Polynomial Interpolation Synchronization (CPISync)
Time/bandwidth complexity depends on the number of differences.
Computationally expensive (cubic in the number of differences), but this can be improved.
Computations could be done on only one of the two devices (the faster one).
Works in a general peer-to-peer environment.
CPISync Preliminaries
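Each data set can be represented as a set of numbers [using hash functions].
The characteristic polynomial of a set $S = \{x_1, x_2, \ldots, x_n\}$ is:
    $\chi_S(Z) = (Z - x_1)(Z - x_2) \cdots (Z - x_n)$
Note that for two sets $S_A$ and $S_B$, with $\Delta_A = S_A \setminus S_B$ and $\Delta_B = S_B \setminus S_A$, the factors of the common elements cancel:
    $\chi_{S_A}(Z) / \chi_{S_B}(Z) = \chi_{\Delta_A}(Z) / \chi_{\Delta_B}(Z)$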
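The key property here is that elements common to both sets contribute the same factors to both characteristic polynomials, so the ratio of the two polynomials depends only on the differences. The sketch below checks this numerically at one point, assuming arithmetic modulo a small prime (257) and made-up sets whose elements are smaller than the evaluation point.

```python
P = 257  # small prime modulus, assumed for illustration

def char_poly(elements, z):
    """Evaluate the product of (z - x) over all x in `elements`, modulo P."""
    result = 1
    for x in elements:
        result = result * (z - x) % P
    return result

def ratio(num, den):
    """Divide in the field of integers modulo P."""
    return num * pow(den, -1, P) % P

set_a = {1, 2, 9, 12, 33}
set_b = {1, 2, 9, 10, 12, 28}
only_a = set_a - set_b   # {33}
only_b = set_b - set_a   # {10, 28}

z = 200  # evaluation point that is not an element of either set
full_ratio = ratio(char_poly(set_a, z), char_poly(set_b, z))
diff_ratio = ratio(char_poly(only_a, z), char_poly(only_b, z))
print(full_ratio == diff_ratio)
```

The evaluation point must not be an element of either set, otherwise a characteristic polynomial evaluates to zero and the division is undefined.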
CPISync
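Hosts A and B evaluate their characteristic polynomials $\chi_{S_A}$ and $\chi_{S_B}$ at the same sample points $z_1, \ldots, z_m$.
Host B sends its evaluations to host A.
The evaluations are combined at host A to compute the ratios $\chi_{S_A}(z_i) / \chi_{S_B}(z_i)$, which are interpolated to recover the rational function $\chi_{\Delta_A}(Z) / \chi_{\Delta_B}(Z)$, where $\Delta_A$ and $\Delta_B$ are the elements held only by A and only by B.
The zeroes of $\chi_{\Delta_A}$ and $\chi_{\Delta_B}$ are determined.
Those are the differences!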
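The steps on this slide can be sketched end to end under simplifying assumptions that are not part of the lecture: arithmetic is done modulo the small prime 257, set elements lie in 0..199, sample points are taken from 200..256, the exact numbers of differences `m_a` and `m_b` are known in advance, and roots are found by brute force. It is a minimal illustration, not the authors' implementation.

```python
P = 257          # small prime modulus (assumed for illustration)
UNIVERSE = 200   # set elements lie in 0..199; sample points come from 200..256

def char_poly(elements, z):
    """Evaluate the product of (z - x) over all x in `elements`, modulo P."""
    result = 1
    for x in elements:
        result = result * (z - x) % P
    return result

def solve_mod(matrix, rhs):
    """Solve a square linear system modulo P by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(v - factor * w) % P for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def roots(coeffs):
    """Brute-force the roots in 0..UNIVERSE-1 (coefficients, lowest degree first)."""
    return {x for x in range(UNIVERSE)
            if sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P == 0}

def cpisync(evals_a, evals_b, points, m_a, m_b):
    """Recover (A-only, B-only) elements from evaluations at m_a + m_b points."""
    matrix, rhs = [], []
    for z, ea, eb in zip(points, evals_a, evals_b):
        f = ea * pow(eb, -1, P) % P                     # chi_A(z) / chi_B(z)
        # Unknowns: lower coefficients of the monic numerator and denominator.
        row = [pow(z, j, P) for j in range(m_a)]
        row += [-f * pow(z, j, P) % P for j in range(m_b)]
        matrix.append(row)
        rhs.append((f * pow(z, m_b, P) - pow(z, m_a, P)) % P)
    coeffs = solve_mod(matrix, rhs)
    numerator = coeffs[:m_a] + [1]     # monic, degree m_a
    denominator = coeffs[m_a:] + [1]   # monic, degree m_b
    return roots(numerator), roots(denominator)

set_a = {1, 2, 9, 12, 33}
set_b = {1, 2, 9, 10, 12, 28}
points = [200, 201, 202]
evals_a = [char_poly(set_a, z) for z in points]
evals_b = [char_poly(set_b, z) for z in points]   # the only data host B sends
only_a, only_b = cpisync(evals_a, evals_b, points, m_a=1, m_b=2)
print(only_a, only_b)
```

Host B transmits three field elements here, one per difference, regardless of how large the sets are. The Gaussian elimination step is what makes this basic version cubic in the number of differences.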
CPISync – Finding the Number of Differences
Guess a bound.
Send evaluations at k random points.
Verify at k points.
Repeat with another bound if needed.
The probability of error decreases exponentially with k.
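The verification step can be sketched as follows: given a guessed pair of difference sets, host A checks at k random points that its own evaluations, combined with the guess, agree with the evaluations received from host B. The field size, the element limit, the fixed random seed, and the function names are assumptions of this sketch; the interpolation that produces the guess is not repeated here.

```python
import random

P = 2_147_483_647   # the Mersenne prime 2**31 - 1, assumed field size
LIMIT = 1_000_000   # set elements are assumed to be below this value

def char_poly(elements, z):
    """Evaluate the product of (z - x) over all x in `elements`, modulo P."""
    result = 1
    for x in elements:
        result = result * (z - x) % P
    return result

def guess_is_consistent(set_a, evals_b, points, guess_a, guess_b):
    """Check chi_A(z) * chi_guessB(z) == chi_B(z) * chi_guessA(z) at every point."""
    return all(
        char_poly(set_a, z) * char_poly(guess_b, z) % P
        == eb * char_poly(guess_a, z) % P
        for z, eb in zip(points, evals_b)
    )

rng = random.Random(2004)   # fixed seed so the example is repeatable
k = 5
points = [rng.randrange(LIMIT, P) for _ in range(k)]   # points outside the element range

set_a = {1, 2, 9, 12, 33}
set_b = {1, 2, 9, 10, 12, 28}
evals_b = [char_poly(set_b, z) for z in points]   # sent by host B

good = guess_is_consistent(set_a, evals_b, points, {33}, {10, 28})
bad = guess_is_consistent(set_a, evals_b, points, {33}, {10})
```

A correct guess always passes. A wrong guess, such as one produced from a bound that was too small, can pass only if every one of the k random points happens to be a root of a nonzero low-degree polynomial, so each extra point makes an undetected error less likely.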
More Techniques: Bloom Filters
Get a Bloom filter for the receiver's data set.
Send only elements that are not found in the Bloom filter.
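A minimal sketch of this exchange, assuming a hand-rolled Bloom filter with hypothetical parameters (1024 bits, 3 hash functions derived from SHA-256) and made-up element names:

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        """Derive num_hashes bit positions for an item from salted SHA-256 digests."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        return all(self.bits >> pos & 1 for pos in self._positions(item))

# The receiver builds the filter over its (outdated) set and sends it to the sender.
receiver_set = {"ann", "bob", "cy"}
bloom = BloomFilter()
for item in receiver_set:
    bloom.add(item)

# The sender transmits only the elements that the filter does not contain.
sender_set = {"ann", "bob", "cy", "dave"}
to_send = [item for item in sorted(sender_set) if item not in bloom]
```

A Bloom filter has no false negatives, so nothing the receiver already holds is sent. It can have false positives, so an element the receiver lacks may occasionally be skipped; the filter size trades bandwidth against that risk.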
More Techniques: Using Error Correction Codes
Send an error correction code for the data set.
The receiver “corrects the errors” in its outdated data set.
Reed-Solomon codes.
Decoding time depends only on the number of differences between the sets (almost linear, not cubic).
But an extra factor of 2 in transmission.