Date post: | 13-Jan-2016 |
Category: |
Documents |
Upload: | sibyl-hodges |
View: | 220 times |
Download: | 0 times |
EM416 Choosing the Correct Data Movement Technology
Chris KleisathDirector of EngineeringiAnywhere [email protected]
Overview
When choosing a data movement technology you need to consider:The business requirements for the distributed database.
The technological limitations of your environment.
The development and administrative resources available.
Central Datastore
Mobile Computing
EmbeddedComputing
Workgroup Computing
Enterprise Data Movement
Wired and WirelessCommunication Links
Distributed Systems
Using any data movement technology means we have a distributed database.
This almost always implies some form of distributed application.
What is a Distributed System?
C.J. Date’s working definition:
“A distributed database system consists of a collection of sites, connected together via some sort of network, in which:Each site is a database system in its own right
Sites have agreed to work together (if necessary), so that a user at any site can access data anywhere in the network exactly as if the data were all stored at the user’s own site.”
Distributed SystemsPractical Factors
Not all systems require that all data be available to all sites.
Not all systems require that all data be consistent between all sites all of the time.
The degree to which your system must meet the ideal definition is the single biggest factor in choosing your data movement technology.
Issues when distributing data
• Local autonomy• Data partitioning (fragmentation)• Consistency• Transaction control• Accessibility (connection)• Topology
Local autonomy
Each site should operate independently of the other sites.
No site should depend on another site for its successful functioning.
A centralized database provides the lowest level of local autonomy.
Decentralized systems provide the highest level of local autonomy.
Data Partitioning
Also known as fragmentation.Only the data needed by a site is present at
the site.The database at a site is a “complete” subset
of the data.Some data will need to be duplicated
between sites.
Data PartitioningUpdate Anywhere
Primary keys must be unique across the entire distributed system.If multiple sites insert into the same table.
Requires a conflict detection and resolution mechanism.If multiple sites are able to change the same row.
Tight vs. Loose Consistency
Which version of the data is being used?
Waterloo
Acct200 1000
Account
AcctNo Bal
Paris
Acct200 1000
Account
AcctNo Bal
Consistency
Tight consistency requires all data to be in a consistent state.
Loose consistency permits data to be “out-of-date”.
Latency is the measure of how long it takes the data to become consistent.
In some cases it is never consistent since there are always changes that have not been moved.
Transaction Control
Your chosen technology must pass the ACID testAtomicity, Consistency, Isolation, Durability
Only committed data should move.Committed data must move.Failure to successfully move committed data
must be detectable.Changes must be applied in the same order
on all databases.
Accessibility
What kind of “network” do you have between the sites?High-speed LAN/WAN
Low-speed Dial-up (RAS)
Wireless
Indirect (email, ftp)
Internet (HTTP)
Sneaker-net
Topology
What kind of relationship exists between the sites.
Peer-to-peerEach site can transfer data to any other site.
No centralized master copy can exist.
Conflict resolution is extremely difficult
• There is no place to detect and resolve the conflict
Topology
HierarchicalEach site passes data up and down the hierarchy.
A central master copy (consolidated database) exists.
Data must pass through the consolidated to move to another site.
Conflict detection and resolution is implemented on the consolidated.
Other Issues
Number of sitesSome technologies are better suited to mass deployment.
VendorsAre the databases at each site the same product?
Is the technology commercially available and supported?
FactorsSummary
Each of the following factors will influence your choice of data movement technologies:Local autonomy
Data partitioning (fragmentation)
Consistency
Transaction control
Accessibility (connection)
Topology
Types of Data Movement
All technologies can be categorized as one of:Online
Synchronization
Replication
Online
Changes are made “simultaneously” on all databases.
Waterloo
Acct200 1000
Account
AcctNo Bal
Paris
Acct200 1000
Account
AcctNo Bal
Withdraw $100
Please wait while your account is being updated…
Online
In its “simplest” form the application updates all of the databases directly.
Underlying technology is normally “Two Phase Commits”
Sybase products:EAServer
Not really data movement but is appropriate in some systems.
Online Characteristics:Local Autonomy
Very low level of local autonomy.If one site is down the entire system is down.
Waterloo
Acct200 1000
Account
AcctNo Bal
Paris
Acct200 1000
Account
AcctNo Bal
Withdraw $100
Sorry the System is Unavailable X
Online Characteristics:Data Partitioning
Data can be partitioned as required.If the data is partitioned the application must
update the row(s) everywhere.Since transactions are applied at all
databases simultaneously no primary key or conflict issues arise.
Online Characteristics:Consistency
Use when tight consistency is an absolute requirement.
Transactions will succeed or fail on all databases.
Online Characteristics:Transaction Control
A Distributed Transaction Server (DTS) should be used.Ensures the transaction is applied on all sites or not at all.
Very expensive to code yourself.
Both ASA and ASE provide support for a DTS
Online Characteristics:Accessibility
Requires a reliable network connection between sites.Transactions will fail if one database is unavailable.
Application speed will be affected by network speed.
Online Characteristics
Typically a peer-to-peer topology.Since all databases are updated at once no master copy is
required.
Online CharacteristicsOther Issues
“Simple” to understandLooks just like a centralized database.
Very few sites can be supportedConsider the cost of updating many databases at once.
VendorsHeterogeneous environments are “easily” supported.
Synchronization
Current state of the data is moved between databases.
Can be a complete refresh or only the rows that have changed.
Sybase products:MobiLink
Synchronization
1234 10
Product
sku_key qty_oh
1234 10
Product
sku_key qty_oh
109X1098
XX
108X
UPDATE Product SET qty_oh = 8WHERE sku_key = 1234
Synchronization CharacteristicsLocal Autonomy
High local autonomySite database must have all of the data
required for the application to run.
Synchronization CharacteristicsData Partitioning
Data is usually partitioned.Each site has common data and site specific
data.Update anywhere requires:
Unique primary keys.
Conflict detection and resolution mechanism.
Synchronization CharacteristicsConsistency
Low to high consistency.Data is only consistent immediately after
synchronization.Frequency of synchronization affects level of
consistency but in all cases there is some latency.
Synchronization CharacteristicsTransaction Control
Transaction boundaries are not maintained.• Some operation sequences can not be
synchronized. (i.e. insert then delete of a row with the same primary key value)
Most synchronization technologies “batch” the operations.
• e.g. all deletes, then inserts, then updates
Synchronization CharacteristicsAccessibility
Requires a stable network connection during the synchronization process.
Connection speed affects the amount of data that can be reasonably synchronized.
Synchronization CharacteristicsTopology
Both peer-to-peer and hierarchical topologies are possible.
Peer-to-peer is difficult if update anywhere is permitted.Which copy of the data is correct?
Who resolves an update conflict?
Synchronization CharacteristicsOther Issues
Heterogeneous environments can be supported.
• Be aware of compatibility issues. E.g. Oracle allows 1 varchar column/table, ASA has no limit.
Because each site synchronizes independently many sites can be supported.
MobiLink
ASA, ASE, Microsoft, Oracle, IBM
ASA, PalmOS, CE, Pagers, Phones
HTTP, TCPIP HotSync, WirelessSerial
MobiLink Characteristics
Complete local autonomy.Complete control over data partitioning on
the consolidated through the use of scripts.Uses the consolidated database’s scripting language or
Java.
No partitioning allowed on the remote.
MobiLink Characteristics
Session based.Only changed records are synchronized.
Connection only required while synchronizing.
Bi-directional by default.Medium to high latency.Low to medium data volume.
MobiLink Characteristics
Hierarchical topology.Consolidated can be any ODBC-based database
• Sybase, Microsoft, Oracle, IBM
ASA and/or UltraLite remotes.
Optimized for thousands of remotes.Scalable based on consolidated database’s
capabilities.
MobiLink Synchronization Components
TCP/IP
ODBC
MobiLink Client(ASA or UltraLite)
Consolidated Data Store
Consolidated Database Server
Remote Data Store
Remote Database Server (ASA or UltraLite)
MobiLinkServer
TCP/IP
MobiLink Synchronization Server
Provides interface between consolidated database and remote server.
Works with ODBC-based host databases.Responsible for ensuring the synchronization
process completes. Supports multiple simultaneous
synchronizations.
MobiLink Consolidated Synchronization Logic
SQL statements executed against the consolidated database.Written in language of consolidated database or Java.
Guides the synchronization server.Controls the flow of data in both directions.
Handles conflicts.
MobiLink Remote Synchronization Logic
ASA and UltraLite keep track of changes to the data.
A synchronization component is provided to:Scan for changes to create the upload stream
Receive the download stream and apply the changes to the remote
Replication
Transactions (changes) are moved between the databases.
Uses store and forward mechanism.Site(s) must have a common starting point.Sybase Products:
SQL Remote
Replication Server
Replication
1234 10
Product
sku_key qty_oh
Product
sku_key qty_oh
109X1098
XX
UPDATE Product SET qty_oh = 8WHERE sku_key = 1234
UPDATE Product SET qty_oh = 9WHERE sku_key = 1234
1234 10109X1098
XX
Replication CharacteristicsLocal Autonomy
High local autonomy.Database must have all of the data required
for the application to run.
Replication CharacteristicsData Partitioning
Data is usually partitioned.Each site has common data and site specific
data.Update anywhere requires:
Unique primary keys.
Conflict detection and resolution mechanism.
Replication CharacteristicsConsistency
Low to high consistency is possible.Speed of store and forward messaging
system determines how consistent the database is.
Some latency is always present.
Replication CharacteristicsTransaction Control
Mechanism must exist to guarantee transactions are:Sent and applied in the correct order.
No transactions are skipped
Replication CharacteristicsAccessibility
Whether a direct connection is required or not is dependant on the latency requirements.Not required in high latency implementations.
Replication CharacteristicsTopology
Both peer-to-peer and hierarchical topologies can be used.
Conflict resolution normally requires a hierarchical model.
Replication CharacteristicsOther Issues
Only transactions are moved therefore:• It is possible to support many sites.• Throughput is usually independent of
database size.
Replication Server
Adaptive Server
Adaptive Server
Replication Agent
Replication Agent
Replication Server
Replication Server
DirectCONNECT(Native drivers)DirectCONNECT(Native drivers)
Adaptive Server/Enterprise Adaptive Server/Anywhere Oracle Informix OS/390 DB2 Replication Toolkit for MVS
Adaptive Server/Enterprise Adaptive Server/Anywhere Oracle Informix OS/390 DB2 Replication Toolkit for MVS
Replicate SitesReplicate SitesReplicate SitesReplicate SitesPrimary SitesPrimary SitesPrimary SitesPrimary Sites
DirectCONNECT/Anywhere (ODBC)DirectCONNECT/Anywhere (ODBC)
Replication Server Characteristics
Transactions are sent to Replication Server which stores and forwards them to the interested sites.
Assumes there is normally a high speed connection.Near real time (low latency).High data volumes.Moderate number of sites.Heterogeneous databases supported.Uni-directional by default.
Replication Server
Replication Agent
Replication Agent
Replication Server
Replication Server
Replicate SitesReplicate SitesReplicate SitesReplicate SitesPrimary SitesPrimary SitesPrimary SitesPrimary Sites
Replication Server ComponentsPrimary Site
Origin of the data being moved.Multiple vendors RDBMS supported.Keeps a record of all transactions. Normally
this is in the transaction log but it depends on the RDBMS.
Replication Server ComponentsReplication Agent
Scans the primary site’s record of transactions.
Passes the committed transactions, in the order they were applied, to Replication Server.
Replication Server ComponentsReplication Server
Receives transactions from the Replication Agents.Stores the transactions until they are successfully
applied on all replicate sites.Maintains a connection to all replicate sites.
Automatically recovers when a connection is dropped and restored
Determines which site(s) require the transaction and applies them in the correct order.
Replication Server ComponentsReplication Server
Prevents “circular” transactions.Provides user programmable “function
strings” to allow manipulation of the transaction.Data conversions (e.g. date formats)
Conversion of SQL in heterogeneous environments.
Detects SQL errors.
Replication Server ComponentsReplicate Site
Applies SQL sent by Replication Server.A replicate site can also be defined as a
primary site if bi-directional replication is required.
SQL Remote
ASE
MAPI VIM
FILE
FTP SMTP
ASA
ASA
MAPI VIM
FILE
FTP SMTP
ASA
OR
SQL Remote Characteristics
Complete local autonomy.Partitioning based on:
Column values
Subqueries
Where clauses
Message based (no connection)MAPI (Microsoft), VIM (Lotus), SMTP, FTP and File
Very loose consistency.
SQL Remote Characteristics
Built in guaranteed message delivery.Hierarchical
Consolidated is either ASA or ASE
Remotes are ASA
Homogeneous.Many (thousands) of remotes.Low to medium data volumes.
SQL Remote Components
Message Agent
Remote Data Store
Remote DatabaseServer
Message System
Message Agent
Consolidated Data Store
Consolidated DatabaseServer
SQL Remote ComponentsConsolidated Database
Contains a copy of all data that is replicating.Performs conflict detection and resolution.Transactions are recorded in the transaction
log.Maintains additional data in the transaction
log about what transactions are eligible to replicate and how they are partitioned.
SQL Remote ComponentsMessage Agent
Scans the transaction log for committed transactions that are eligible to replicate.
Builds messages for the sites that have subscribed to the transactions.
Interfaces with the message system.Guarantees that transactions are:
Sent in the correct order.Applied in the correct order and only applied once.No transactions are skipped.
SQL Remote ComponentsMessage Agent
Receives transactions from the message system.
Applies the transactions.Prevents “circular” transactions.Detects update conflicts.Detects SQL errors.
SQL Remote ComponentsMessage System
Provides the store and forward technology.Support for:
MAPI
SMTP
VIM
File
FTP
SQL Remote ComponentsRemote Database
Contains data the site is subscribed to. Transactions are recorded in the transaction
log.Maintains additional data in the transaction
log about what transactions are eligible to replicate and how they are partitioned.
Which Technology should I Choose?
Depends on the business requirements and technological infrastructure available.
Consistency and latency are the biggest factors.
Use EAServer When …
Absolute consistency is required (Zero latency).
Transactions must fail when one of the site databases is unavailable.
There are very few sites.
Use MobiLink When …
Latency is permitted.Local autonomy is required.A reliable connection exists.You have low to medium data volumes.There are heterogeneous databases.You do not require transaction boundaries to be
maintained.You have a hierarchical topology.You have many remotes.You must know when your changes have been
synchronized.
Use Replication Server When …
Near real-time consistency is required.You have high data volumes.Local autonomy is required.There are heterogeneous databases.You require transaction boundaries to be
maintained.You have a peer-to-peer topology.
You can implement a hierarchical topologyA small number of sites.
Use SQL Remote when …
Latency is not a factor.No direct connection exists (or is not permitted) or
the connection is unreliable,.Local autonomy is required.You have low to medium data volumes.Homogeneous (ASA & ASE) databases.Hierarchical topology.You have many remotes.You require transaction boundaries to be maintained.
Or Combine them
All four products are compatible.Use the strengths of each to solve your
business problems.Replication Server or EAServer between main
geographical databases.
MobiLink or SQL Remote for mass deployed devices.
Summary
Sybase has many different methods of maintaining data in distributed databases.
Your business requirements dictate which method is best.
All the technologies can be used together on the same database.
EM 416 Choosing the Correct Data Movement Technology
Chris KleisathDirector of EngineeringiAnywhere [email protected]