Page 1: 1 Fundamentals of Distributed Systems. Jim Gray Researcher Microsoft Corp. Gray@Microsoft.com Prof. Andreas Reuter Professor U. Stuttgart Reuter@Informatik.uni-stuttgart.de.



Outline
- Concepts and Terminology
- Why Distributed
- Distributed data & objects
- Distributed execution
- Three tier architectures
- Transaction concepts

Goal: what you need to know to understand Microsoft Transaction Server (or CORBA or …)


What’s a Distributed System?

Centralized: everything in one place
- stand-alone PC or mainframe

Distributed: some parts remote
- distributed users
- distributed execution
- distributed data


Why Distribute?

No best organization: companies constantly swing between
- Centralized: focus, control, economy
- Decentralized: adaptive, responsive, competitive

Why distribute?
- reflect organization or application structure
- empower users / producers
- improve service (response / availability)
- distribute load
- use PC technology (economics)


What Should Be Distributed?
- Users and user interface: thin client
- Processing: trim client
- Data: fat client

Will discuss tradeoffs later.

[Figure: the application tiers Presentation, Workflow, Business Objects, and Database]


Transparency in Distributed Systems

Make a distributed system as easy to use and manage as a centralized system: give a Single-System Image.

Location transparency:
- hide the fact that an object is remote
- hide the fact that an object has moved
- hide the fact that an object is partitioned or replicated

The name doesn’t change if the object is replicated, partitioned, or moved.


Naming: The Basics

Objects have:
- a Globally Unique Identifier (GUID)
- location(s) = address(es)
- name(s)

Addresses can change, and objects can have many names.

Names are context dependent: (Jim @ KGB ≠ Jim @ CIA)

Many naming systems:
- UNC: \\node\device\dir\dir\dir\object
- Internet: http://node.domain.root/dir/dir/dir/object
- LDAP: ldap://ldap.domain.root/o=org,c=US,cn=dir

[Figure: one object with a GUID, an address, and two names, Jim and James]


Name Servers in Distributed Systems
- Name servers translate names + context to address (+ GUID)
- Name servers are partitioned (subtrees of the name space)
- Name servers replicate the root of the name tree
- Name servers form a hierarchy
- Distributed data from hell:
  - high read traffic
  - high reliability & availability
  - autonomy

[Figure: a replicated root server delegating to a North server that holds northern names and a South server that holds southern names]
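The name-server hierarchy can be sketched as follows, assuming a two-level tree like the figure: a root that only knows which partition owns each top-level subtree (the part that is read constantly and so gets replicated), and partitioned servers that map full names to an address and GUID. Here the first `/`-separated component plays the role of the context. `Binding`, `LeafNameServer`, `RootNameServer`, and the name format are hypothetical.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Result of a lookup: where the object lives, plus its immutable identity.
struct Binding {
    std::string address;
    std::string guid;
};

// A partitioned name server: owns one subtree of the name space.
class LeafNameServer {
public:
    void bind(const std::string& name, Binding b) { table_[name] = std::move(b); }
    std::optional<Binding> lookup(const std::string& name) const {
        auto it = table_.find(name);
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }
private:
    std::map<std::string, Binding> table_;
};

// The root maps only the first name component to the owning partition.
// It changes rarely and is read on every lookup, so it is the part to replicate.
class RootNameServer {
public:
    void delegate(const std::string& subtree, const LeafNameServer* server) {
        children_[subtree] = server;
    }
    // Resolve "subtree/rest" by forwarding to the partition that owns the subtree.
    std::optional<Binding> resolve(const std::string& name) const {
        auto slash = name.find('/');
        if (slash == std::string::npos) return std::nullopt;
        auto it = children_.find(name.substr(0, slash));
        if (it == children_.end()) return std::nullopt;
        return it->second->lookup(name);
    }
private:
    std::map<std::string, const LeafNameServer*> children_;
};
```

The same short name ("jim") under different subtrees resolves to different objects, which is the context dependence the previous slide describes.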


Autonomy in Distributed Systems

The owner of a site (or node, or application, or database) wants to control it.

If my part is working, I must be able to access & manage it (reorganize, upgrade, add users, …).

Autonomy is:
- essential
- difficult to implement
- in conflict with global consistency (examples: naming, authentication, admin, …)


Security: The Basics

Authentication server: subject + authenticator => (Yes + token) | No

Security matrix: who can do what to whom
- An access control list is a column of the matrix
- “who” is an authenticated ID

In a distributed system, “who”, “what”, and “whom” are distributed objects.

[Figure: security matrix with subjects on one axis, objects on the other, and permissions in the cells]
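A minimal sketch of the security matrix, assuming permissions fit in a small bit set. The matrix is stored by object, so each object's entry is exactly its access control list (one column of the matrix), and the subject key is the ID the authentication server vouched for. `Permission`, `SecurityMatrix`, and the example names are hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <string>

enum Permission : std::uint8_t { Read = 1, Write = 2, Execute = 4 };

// One column of the security matrix: for a single object ("whom"),
// which authenticated subjects ("who") hold which permissions ("what").
using AccessControlList = std::map<std::string, std::uint8_t>;

class SecurityMatrix {
public:
    void grant(const std::string& object, const std::string& subject, std::uint8_t perms) {
        acls_[object][subject] |= perms;
    }
    // "Can <who> do <what> to <whom>?" A missing entry means no access.
    bool allowed(const std::string& subject, std::uint8_t perm, const std::string& object) const {
        auto acl = acls_.find(object);
        if (acl == acls_.end()) return false;
        auto entry = acl->second.find(subject);
        return entry != acl->second.end() && (entry->second & perm) == perm;
    }
private:
    std::map<std::string, AccessControlList> acls_;  // object -> its ACL
};
```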


Security in Distributed Systems

Security domain: nodes with a shared security server.

Security domains can have trust relationships:
- A trusts B: A “believes” B when B says “this is Jim@B”

Security domains form a hierarchy.

Delegation: passing authority to a server. When A asks B to do something (e.g. print a file, read a database), B may need A’s authority.

Autonomy requires:
- each node is an authenticator
- each node does its own security checks

Internet today:
- no trust among domains (firewalls, many passwords)
- trust based on digital signatures
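The trust rule can be sketched as the check a receiving domain runs on an asserted identity such as Jim@B. This sketch assumes trust is one-directional and not transitive (A trusting B and B trusting C does not make A trust C); the slide does not say which policy applies. It also only inspects the domain part of the name: verifying that the named domain really issued the assertion is out of scope. `SecurityDomain` is hypothetical.

```cpp
#include <set>
#include <string>
#include <utility>

class SecurityDomain {
public:
    explicit SecurityDomain(std::string name) : name_(std::move(name)) {}
    void trust(const std::string& other) { trusted_.insert(other); }

    // Accept an identity of the form user@domain only if the domain is this
    // one or one this domain explicitly trusts.
    bool accepts(const std::string& identity) const {
        auto at = identity.rfind('@');
        if (at == std::string::npos) return false;
        std::string domain = identity.substr(at + 1);
        return domain == name_ || trusted_.count(domain) > 0;
    }
private:
    std::string name_;
    std::set<std::string> trusted_;  // domains whose authentication we believe
};
```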


Clusters: The Ideal Distributed System

A cluster is a distributed system BUT with a single
- location
- manager
- security policy

It is relatively homogeneous, and its communication has
- high bandwidth
- low latency
- low error rate

Clusters use distributed system techniques for
- load distribution (storage, execution)
- growth
- fault tolerance


Cluster: Shared What?

Shared Memory Multiprocessor
- multiple processors, one memory
- all devices are local
- DEC or SGI or Sequent 16x nodes

Shared Disk Cluster
- an array of nodes
- all share common disks
- VAXcluster + Oracle

Shared Nothing Cluster
- each device local to a node
- ownership may change
- Tandem, SP2, Wolfpack


Outline
- Concepts and Terminology
- Why Distribute
- Distributed data & objects
  - Partitioned
  - Replicated
- Distributed execution
- Three tier architectures
- Transaction concepts


Partitioned Data
Break a file into disjoint groups.

Exploit data access locality:
- put data near its consumer
- less network traffic
- better response time
- better availability
- owner controls data (autonomy)

Spread load:
- data or traffic may exceed a single store

[Figure: an Orders file partitioned into N.A., S.A., Europe, and Asia]


How to Partition Data?

How to partition:
- by attribute, or
- random, or
- by source, or
- by use

Problem: to find the data, you must have a
- directory (replicated), or
- algorithm

This encourages attribute-based partitioning.

[Figure: partitions for N.A., S.A., Europe, and Asia]
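The "directory or algorithm" point can be illustrated with two routing functions, assuming four partitions named as in the figure. Attribute-based routing sends an order to the partition for its region, which keeps data near its consumers and lets anyone who knows the region find it. Hash routing spreads load evenly, but finding an order again requires its id. Both functions and the `Order` type are hypothetical, not a specific product's scheme.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <string>

const std::array<std::string, 4> kPartitions = {"N.A.", "S.A.", "Europe", "Asia"};

struct Order {
    long id;
    std::string region;  // the attribute used for attribute-based partitioning
};

// Attribute-based: the partition is a function of a meaningful field.
// Returns -1 for a region no partition owns.
int partitionByAttribute(const Order& o) {
    for (std::size_t i = 0; i < kPartitions.size(); ++i)
        if (kPartitions[i] == o.region) return static_cast<int>(i);
    return -1;
}

// Algorithmic (hash) placement: even load but no locality.
int partitionByHash(const Order& o) {
    return static_cast<int>(std::hash<long>{}(o.id) % kPartitions.size());
}
```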


Replicated Data
Place a fragment at many sites.

Pros:
+ Improves availability
+ Disconnected (mobile) operation
+ Distributes load
+ Reads are cheaper

Cons:
- N times more updates
- N times more storage

Placement strategies:
- Dynamic: cache on demand
- Static: place specific

[Figure: a Catalog replicated at several sites]


Updating Replicated Data
When a replica is updated, how do changes propagate?

Master copy, many slave copies (SQL Server):
- always know the correct value (master)
- change propagation can be
  - transactional
  - as soon as possible
  - periodic
  - on demand

Symmetric, and anytime (Access):
- allows mobile (disconnected) updates
- updates propagated ASAP, periodically, or on demand
- non-serializable
- colliding updates must be reconciled
- hard to know the “real” value
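A minimal sketch of the master-copy scheme, assuming an in-memory key/value store. Every update goes to the master, which stamps it with a sequence number and appends it to a change log; a slave copy catches up by pulling the entries it has not yet applied, which covers the periodic and on-demand styles. Transactional propagation and the symmetric scheme's reconciliation are not shown. `Master`, `Slave`, and `Change` are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Change {
    std::size_t seq;  // 1-based position in the master's log
    std::string key;
    std::string value;
};

class Master {
public:
    // The master always holds the correct value.
    void update(const std::string& key, const std::string& value) {
        data_[key] = value;
        log_.push_back({log_.size() + 1, key, value});
    }
    // Changes a replica has not seen yet (those with seq > since).
    std::vector<Change> changesSince(std::size_t since) const {
        return {log_.begin() + static_cast<std::ptrdiff_t>(since), log_.end()};
    }
    const std::string& get(const std::string& key) const { return data_.at(key); }
private:
    std::map<std::string, std::string> data_;
    std::vector<Change> log_;
};

class Slave {
public:
    // Periodic or on-demand pull: apply every change after the last one seen.
    void refresh(const Master& m) {
        for (const Change& c : m.changesSince(applied_)) {
            data_[c.key] = c.value;
            applied_ = c.seq;
        }
    }
    std::string get(const std::string& key) const {
        auto it = data_.find(key);
        return it == data_.end() ? "" : it->second;
    }
private:
    std::map<std::string, std::string> data_;
    std::size_t applied_ = 0;  // sequence number of the last applied change
};
```

Between refreshes the slave serves stale values, which is why only the master is guaranteed to know the correct value.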


Replication and Partitioning Compared

- Base case: a 1 TPS system (one 1 TPS server, 100 users)
- Central scaleup to a 2 TPS centralized system (one 2 TPS server, 200 users): 2x more work
- Partition scaleup to two 1 TPS systems (each a 1 TPS server with 100 users, 0 tps between them): 2x more work
- Replication scaleup to two 2 TPS systems (each a 2 TPS server with 100 users, 1 tps of replicated updates flowing each way): 4x more work
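The figure's numbers follow from a simple count, assuming (as the figure does) that every transaction is an update, each site's 100 users generate 1 tps, and a replicated update must be applied at every one of the $N$ sites:

```latex
\begin{aligned}
W_{\text{central}}(N) &= N \cdot 1\ \text{tps} && \text{(one server, } 100N \text{ users)}\\
W_{\text{partition}}(N) &= N \cdot 1\ \text{tps} && \text{(each site serves only its own users; } 0\ \text{tps between sites)}\\
W_{\text{replication}}(N) &= N \cdot \bigl(1 + (N-1)\bigr)\ \text{tps} = N^{2}\ \text{tps} && \text{(each site also applies the other sites' updates)}\\
N = 2 &:\quad W_{\text{central}} = W_{\text{partition}} = 2\ \text{tps}, \quad W_{\text{replication}} = 4\ \text{tps}
\end{aligned}
```

Relative to the 1 tps base case this gives the 2x, 2x, and 4x of the figure; under these assumptions replication work grows quadratically with the number of replicas.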


Outline
- Concepts and Terminology
- Why Distribute
- Distributed data & objects
  - Partitioned
  - Replicated
- Distributed execution
  - remote procedure call
  - queues
- Three tier architectures
- Transaction concepts


Distributed Execution: Threads and Messages
- A thread is the execution unit (the software analog of cpu+memory)
- Threads execute at a node
- Threads communicate via
  - shared memory (local)
  - messages (local and remote)

[Figure: threads communicating through shared memory and through messages]
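Both communication styles can be sketched with standard C++ threads: two threads on one node increment a shared counter under a mutex (shared memory), and a producer hands work to a consumer through a small blocking queue (messages). `Mailbox` and the demo functions are hypothetical; between nodes the in-memory queue would be replaced by network messages, while the send/receive shape stays the same.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Shared memory: only possible for threads on the same node.
int sharedCounterDemo(int perThread) {
    int counter = 0;
    std::mutex m;
    auto work = [&] {
        for (int i = 0; i < perThread; ++i) {
            std::lock_guard<std::mutex> lock(m);
            ++counter;
        }
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return counter;
}

// Messages: a send/receive interface that also fits remote communication.
class Mailbox {
public:
    void send(int msg) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(msg); }
        cv_.notify_one();
    }
    int receive() {  // blocks until a message arrives
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        int msg = q_.front();
        q_.pop();
        return msg;
    }
private:
    std::queue<int> q_;
    std::mutex m_;
    std::condition_variable cv_;
};

int messageSumDemo(int n) {
    Mailbox box;
    int sum = 0;
    std::thread consumer([&] {
        for (int i = 0; i < n; ++i) sum += box.receive();
    });
    for (int i = 1; i <= n; ++i) box.send(i);
    consumer.join();
    return sum;
}
```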


Peer-to-Peer or Client-Server

Peer-to-peer is symmetric:
- either side can send

Client-server:
- client sends requests
- server sends responses
- a simple subset of peer-to-peer

[Figure: a client sends a request and the server sends back a response]


Connection-less or Connected

Connection-less:
- request contains client id, client context, and work request
- client authenticated on each message
- only a single response message
- e.g. HTTP, NFS v1

Connected (sessions):
- open, request/reply, close
- client authenticated once
- messages arrive in order
- can send many replies (e.g. FTP)
- server has client context (context sensitive)
- e.g. Winsock and ODBC
- HTTP adding connections
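The difference can be sketched as two server interfaces for a toy service that appends words to a per-client text. The connection-less server keeps nothing between calls, so every request carries the client id, a credential, and the client context, and is re-authenticated. The session server authenticates once at open and then holds the context itself. The password rule and all names are hypothetical placeholders.

```cpp
#include <map>
#include <optional>
#include <string>

bool authenticate(const std::string& client, const std::string& password) {
    return password == client + "-secret";  // placeholder credential check
}

// Connection-less: the request carries everything; the reply returns the
// updated context, which the client must send back next time.
struct StatelessRequest {
    std::string client, password, context, word;
};

std::optional<std::string> statelessAppend(const StatelessRequest& r) {
    if (!authenticate(r.client, r.password)) return std::nullopt;  // on every message
    return r.context + r.word;
}

// Connected: open() authenticates once; the server keeps the client context.
class SessionServer {
public:
    std::optional<int> open(const std::string& client, const std::string& password) {
        if (!authenticate(client, password)) return std::nullopt;
        sessions_[next_] = "";
        return next_++;
    }
    bool append(int session, const std::string& word) {
        auto it = sessions_.find(session);
        if (it == sessions_.end()) return false;
        it->second += word;
        return true;
    }
    std::string context(int session) const { return sessions_.at(session); }
    void close(int session) { sessions_.erase(session); }
private:
    std::map<int, std::string> sessions_;  // session id -> client context
    int next_ = 1;
};
```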


Remote Procedure Call: The Key to Transparency
- An object may be local or remote
- Methods on the object work wherever it is

Local invocation: the caller executes y = pObj->f(x); x is passed directly to f(), which executes return val; and the caller gets y = val.


Remote Procedure Call: The Key to Transparency

Remote invocation:
[Figure: the caller executes y = pObj->f(x); and the runtime checks "Obj local?". If the object is remote, a client-side proxy marshals x and sends it; a server-side stub unmarshals x and calls pObj->f(x); f() executes return val; the stub marshals val, the proxy unmarshals it, and the caller gets y = val.]
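The remote-invocation figure can be sketched in a single process, assuming a decimal string is an adequate wire format and a direct function call stands in for the network. The caller holds a pointer to an interface and cannot tell whether it points at the real object or at a proxy; the proxy marshals `x`, the stub unmarshals it and calls the real `f()`, and `val` travels back the same way. `IObj`, `RealObj`, `Proxy`, `Stub`, and the doubling body of `f()` are hypothetical.

```cpp
#include <string>

// The interface both the real object and its proxy implement.
struct IObj {
    virtual ~IObj() = default;
    virtual int f(int x) = 0;
};

struct RealObj : IObj {
    int f(int x) override { return 2 * x; }  // "return val;"
};

std::string marshal(int v) { return std::to_string(v); }
int unmarshal(const std::string& msg) { return std::stoi(msg); }

// Server side: unmarshal the request, invoke the object, marshal the reply.
class Stub {
public:
    explicit Stub(IObj& target) : target_(target) {}
    std::string dispatch(const std::string& request) {
        int x = unmarshal(request);
        return marshal(target_.f(x));
    }
private:
    IObj& target_;
};

// Client side: looks like the object, but ships each call to the stub.
class Proxy : public IObj {
public:
    explicit Proxy(Stub& stub) : stub_(stub) {}
    int f(int x) override {
        std::string reply = stub_.dispatch(marshal(x));  // the "network" hop
        return unmarshal(reply);
    }
private:
    Stub& stub_;
};
```

The call site `y = pObj->f(x);` is identical in both cases; only the object behind `pObj` changes, which is the transparency the slide claims for RPC.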


Object Request Broker (ORB)
Orchestrates RPC:
- registers servers
- manages pools of servers
- connects clients to servers
- does naming and request-level authorization
- provides transaction coordination (new feature)

Old names:
- Transaction Processing Monitor
- Web server
- NetWare

[Figure: an Object-Request Broker that includes transaction coordination]


History and Alphabet Soup

[Figure: timeline from 1985 through 1990 to 1995 of standards groups and technologies: Solaris / UNIX International; Open Software Foundation (OSF) DCE with RPC, GUIDs, IDL, DNS, and Kerberos; X/Open with XA / TX; Object Management Group (OMG) with CORBA; the Open Group; and NT, ODBC, and COM]

Microsoft DCOM is based on OSF-DCE technology; DCOM and ActiveX extend it.


Using RPC for Transparency: Partition Transparency

Send updates to the correct partition.

[Figure: the caller executes y = pfile->write(x); and the runtime checks "part local?". If not, x is sent to the correct partition, where a stub unmarshals x and calls pObj->write(x); write() executes return val; and val is marshaled back to the caller.]
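Partition transparency can be sketched as a file proxy that hides where each record lives, assuming records are routed by key modulo the number of partitions and each partition is reached through an object pointer (a local stand-in for the RPC proxy of the earlier slides). The caller only ever writes `pfile->write(x)`. `Record`, `Partition`, `PartitionedFile`, and the routing rule are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Record {
    unsigned key;
    std::string data;
};

// One partition; in a real system this would be a remote object behind an RPC proxy.
class Partition {
public:
    int write(const Record& r) {  // returns 0 for success, standing in for "val"
        rows_[r.key] = r.data;
        return 0;
    }
    std::size_t size() const { return rows_.size(); }
private:
    std::map<unsigned, std::string> rows_;
};

// The caller sees one file; write() sends each record to the correct partition.
class PartitionedFile {
public:
    explicit PartitionedFile(std::vector<Partition*> parts) : parts_(std::move(parts)) {}
    int write(const Record& x) {
        Partition* target = parts_[x.key % parts_.size()];  // routing rule: key modulo N
        return target->write(x);
    }
private:
    std::vector<Partition*> parts_;
};
```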


Using RPC for Transparency: Replication Transparency

Send updates to EACH node.

[Figure: the caller executes y = pfile->write(x); x is sent to each replica, and val is returned to the caller.]
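Replication transparency can be sketched the same way, assuming every write is sent to every replica and the proxy reports how many replicas applied it; the slide does not say what should happen when some replicas are down (retry, queue, or abort), so that decision is left to the caller. `Replica` and `ReplicatedFile` are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

class Replica {
public:
    bool write(const std::string& key, const std::string& value) {
        if (!up) return false;
        rows_[key] = value;
        return true;
    }
    std::string read(const std::string& key) const {
        auto it = rows_.find(key);
        return it == rows_.end() ? "" : it->second;
    }
    bool up = true;  // simulated availability
private:
    std::map<std::string, std::string> rows_;
};

// The caller sees one file; write() is sent to each replica.
class ReplicatedFile {
public:
    explicit ReplicatedFile(std::vector<Replica*> replicas) : replicas_(std::move(replicas)) {}
    // Returns how many replicas applied the update: N replicas cost N updates.
    std::size_t write(const std::string& key, const std::string& value) {
        std::size_t applied = 0;
        for (Replica* r : replicas_)
            if (r->write(key, value)) ++applied;
        return applied;
    }
    // A read can be served by any one replica (here, the first).
    std::string read(const std::string& key) const { return replicas_.front()->read(key); }
private:
    std::vector<Replica*> replicas_;
};
```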


Client/Server Interactions
All can be done with RPC:
- Request-Response: the response may be many messages
- Conversational: the server keeps client context
- Dispatcher: three-tier; complex operation at the server
- Queued: de-couples client from server; allows disconnected operation

[Figure: client (C) and server (S) diagrams for each style, with the dispatcher and queued styles fanning out to several servers]


Queued Request/Response
Time-decouples client and server.
- Three transactions
- Almost real time, ASAP processing
- Communicate at each other’s convenience; allows mobile (disconnected) operation
- Disk queues survive client & server failures

[Figure: the client submits a request to a queue, the server performs it, and the response is queued back to the client]
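The three transactions can be sketched with two queues, assuming in-memory queues stand in for the recoverable disk queues on the slide (so, unlike the real thing, this sketch does not survive failures). Transaction 1: the client enqueues a request and can go away. Transaction 2: when the server is ready, it dequeues one request, performs it, and enqueues the response. Transaction 3: the client later dequeues its response. The names and the toy uppercase "service" are hypothetical.

```cpp
#include <cctype>
#include <deque>
#include <optional>
#include <string>

struct Message {
    int requestId;
    std::string body;
};

struct QueueSystem {
    std::deque<Message> requests;
    std::deque<Message> responses;
};

// Transaction 1: the client submits and does not wait for the server.
void submit(QueueSystem& qs, int id, const std::string& body) {
    qs.requests.push_back({id, body});
}

// Transaction 2: the server performs one request at its own convenience.
bool performOne(QueueSystem& qs) {
    if (qs.requests.empty()) return false;
    Message m = qs.requests.front();
    qs.requests.pop_front();
    for (char& ch : m.body) ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    qs.responses.push_back(m);
    return true;
}

// Transaction 3: the client collects a response whenever it reconnects.
std::optional<Message> collect(QueueSystem& qs) {
    if (qs.responses.empty()) return std::nullopt;
    Message m = qs.responses.front();
    qs.responses.pop_front();
    return m;
}
```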


Why Queued Processing?
- Prioritize requests: an ambulance dispatcher favors high-priority calls
- Manage workflows
- Deferred processing in mobile apps
- Interface heterogeneous systems: EDI, MOM (Message-Oriented Middleware), DAD (Direct Access to Data)

[Figure: a workflow of queued steps: Order, Build, Ship, Invoice, Pay]
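The ambulance-dispatcher case is a priority queue: requests are served by urgency rather than arrival order. A minimal sketch using `std::priority_queue`, assuming a larger number means more urgent and ties go to the earlier caller; `Call`, `LowerUrgency`, and `Dispatcher` are hypothetical.

```cpp
#include <queue>
#include <string>
#include <vector>

struct Call {
    int priority;   // larger = more urgent
    long arrival;   // earlier arrival wins among equal priorities
    std::string caller;
};

struct LowerUrgency {
    // std::priority_queue pops the "largest" element, so return true
    // when a should be served after b.
    bool operator()(const Call& a, const Call& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.arrival > b.arrival;
    }
};

class Dispatcher {
public:
    void enqueue(int priority, const std::string& caller) {
        calls_.push({priority, next_++, caller});
    }
    std::string nextCall() {  // precondition: !empty()
        Call c = calls_.top();
        calls_.pop();
        return c.caller;
    }
    bool empty() const { return calls_.empty(); }
private:
    std::priority_queue<Call, std::vector<Call>, LowerUrgency> calls_;
    long next_ = 0;
};
```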


Outline
- Concepts and Terminology
- Why Distributed
- Distributed data & objects
- Distributed execution
  - remote procedure call
  - queues
- Three tier architectures
  - what
  - why
- Transaction concepts


Work Distribution Spectrum
- Presentation and plug-ins
- Workflow manages session & invokes objects
- Business objects
- Database

[Figure: the Presentation, workflow, Business Objects, and Database tiers, with the client/server split ranging from fat to thin]


Transaction Processing Evolution to Three Tier
Intelligence migrated to clients.
- Mainframe batch processing (centralized)
- Dumb terminals & Remote Job Entry
- Intelligent terminals, database backends
- Workflow systems, Object Request Brokers, application generators

[Figure: a mainframe fed by cards; green-screen 3270 terminals; a server with a TP Monitor; active clients with an ORB]


Web Evolution to Three Tier
Intelligence migrated to clients (like TP).
- Character-mode clients, smart servers
- GUI browsers, Web file servers
- GUI plug-ins, Web dispatchers, CGI
- Smart clients, Web dispatcher (ORB)
  - pools of app servers (ISAPI, Viper)
  - workflow scripts at client & server

[Figure: archie, gopher, and WAIS on green screens; Mosaic with a Web server; NS & IE; Active clients]


PC Evolution to Three Tier
Intelligence migrated to the server.
- Stand-alone PC (centralized)
- PC + file & print server: a message per I/O
- PC + database server: a message per SQL statement
- PC + app server: a message per transaction
- ActiveX client, ORB, ActiveX server, Xscript

[Figure: the unit exchanged at each stage: a disk I/O, an I/O request and reply, an SQL statement, a transaction]


The Pattern: Three Tier Computing
- Clients do presentation and gather input
- Clients do some workflow (Xscript)
- Clients send high-level requests to the ORB (Object Request Broker)
- The ORB dispatches workflows and business objects: proxies for the client that orchestrate flows & queues
- Server-side workflow scripts call on distributed business objects to execute the task

[Figure: the Presentation, workflow, Business Objects, and Database tiers]


The Three Tiers
- Web client: HTML; VB or Java script engine (VBScript, JavaScript); VB or Java virtual machine (VB and Java plug-ins)
- Internet ORB: HTTP + DCOM
- Middleware ORB: object server pool; TP Monitor, Web Server, ...
- Object & data servers, reached via DCOM (OLE DB, ODBC, ...)
- IBM legacy gateways, reached via LU6.2


Why Did Everyone Go To Three-Tier?

Manageability
- business rules must be with the data
- middleware operations tools

Performance (scalability)
- server resources are precious
- the ORB dispatches requests to server pools

Technology & physics
- put UI processing near the user
- put shared data processing near the shared data

[Figure: the Presentation, workflow, Business Objects, and Database tiers]


Why Put Business Objects at Server?

DAD’s raw data:
- Customer comes to store, takes what he wants, fills out invoice, leaves money for goods
- Easy to build; no clerks

MOM’s business objects:
- Customer comes to store with list and gives list to clerk; clerk gets goods and makes invoice; customer pays clerk and gets goods
- Easy to manage; clerk controls access; encapsulation
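The store analogy can be sketched in code for a toy inventory. With direct access to data, any client can change stock and invoices independently, and nothing stops an oversell; with a business object, the "clerk" owns both and exposes only a `sell` request, so the rule lives in one place next to the data. `RawStore`, `Store`, and the sell rule are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// DAD: raw data; every client manipulates it directly and must get the rules right.
struct RawStore {
    std::map<std::string, int> stock;
    std::vector<std::string> invoices;
};

// MOM: the clerk is the only code that touches stock and invoices.
class Store {
public:
    explicit Store(std::map<std::string, int> stock) : stock_(std::move(stock)) {}

    // Business rule: never sell more than is on the shelf, and every sale
    // leaves exactly one invoice. Returns false if the order is refused.
    bool sell(const std::string& item, int qty) {
        auto it = stock_.find(item);
        if (qty <= 0 || it == stock_.end() || it->second < qty) return false;
        it->second -= qty;
        invoices_.push_back(item + " x" + std::to_string(qty));
        return true;
    }
    int onHand(const std::string& item) const {
        auto it = stock_.find(item);
        return it == stock_.end() ? 0 : it->second;
    }
    std::size_t invoiceCount() const { return invoices_.size(); }
private:
    std::map<std::string, int> stock_;
    std::vector<std::string> invoices_;
};
```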


What Middleware Does
ORB, TP Monitor, Workflow Mgr, Web Server:
- Registers transaction programs: workflow and business objects (DLLs)
- Pre-allocates server pools
- Provides server execution environment
- Dynamically checks authority (request-level security)
- Does parameter binding
- Dispatches requests to servers (parameter binding, load balancing)
- Provides queues
- Operator interface
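Three of these duties (register programs, check authority on each request, dispatch to a pool with load balancing) can be sketched in one small class. This is a single-process illustration under stated assumptions: a "server" is just a function, authority is a set of (user, program) pairs, and load balancing is round-robin. `Middleware`, `Handler`, and the method names are hypothetical, not a real ORB or TP monitor API.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Handler = std::function<std::string(const std::string&)>;

class Middleware {
public:
    // Register a transaction program with its pre-allocated pool of servers.
    void registerProgram(const std::string& name, std::vector<Handler> pool) {
        pools_[name] = Pool{std::move(pool), 0};
    }
    void authorize(const std::string& user, const std::string& program) {
        grants_.insert({user, program});
    }
    // Request-level security check, then round-robin dispatch within the pool.
    std::optional<std::string> call(const std::string& user, const std::string& program,
                                    const std::string& args) {
        if (!grants_.count({user, program})) return std::nullopt;
        auto it = pools_.find(program);
        if (it == pools_.end() || it->second.servers.empty()) return std::nullopt;
        Pool& p = it->second;
        Handler& server = p.servers[p.next];
        p.next = (p.next + 1) % p.servers.size();
        return server(args);
    }
private:
    struct Pool {
        std::vector<Handler> servers;
        std::size_t next;  // next server to receive a request
    };
    std::map<std::string, Pool> pools_;
    std::set<std::pair<std::string, std::string>> grants_;  // (user, program)
};
```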


Server Side Objects: Easy Server-Side Execution
- The ORB gives a simple execution environment
- The object gets start, invoke, shutdown
- Everything else is automatic
- Drag & drop business objects

[Figure: A Server: service logic surrounded by the parts the ORB supplies: network receiver, queue, thread pool, connections, context, security, synchronization, shared data, configuration, and management]
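The contract on this slide (the object supplies only start, invoke, and shutdown; the container supplies the rest) can be sketched as an abstract class plus a container that drives it. This container only sequences the three calls and counts requests; threads, queues, security, and the other boxes in the figure are omitted. `ServerObject`, `Container`, and the example `Greeter` are hypothetical.

```cpp
#include <memory>
#include <string>
#include <utility>

// What the business-object author writes: service logic only.
class ServerObject {
public:
    virtual ~ServerObject() = default;
    virtual void start() = 0;
    virtual std::string invoke(const std::string& request) = 0;
    virtual void shutdown() = 0;
};

// What the ORB provides: lifecycle management around the service logic.
class Container {
public:
    explicit Container(std::unique_ptr<ServerObject> obj) : obj_(std::move(obj)) { obj_->start(); }
    ~Container() { obj_->shutdown(); }
    Container(const Container&) = delete;
    Container& operator=(const Container&) = delete;

    std::string handle(const std::string& request) {
        ++served_;  // stands in for the bookkeeping the container does for the object
        return obj_->invoke(request);
    }
    long served() const { return served_; }
private:
    std::unique_ptr<ServerObject> obj_;
    long served_ = 0;
};

// Example business object: records whether it is running and greets callers.
class Greeter : public ServerObject {
public:
    explicit Greeter(bool* running) : running_(running) {}
    void start() override { *running_ = true; }
    std::string invoke(const std::string& name) override { return "hello " + name; }
    void shutdown() override { *running_ = false; }
private:
    bool* running_;
};
```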


Why Server Pools?
Server resources are precious: clients have 100x more power than the server.

Pre-allocate everything on the server:
- pre-allocate memory
- pre-open files
- pre-allocate threads
- pre-open and authenticate clients

Keep a high duty cycle on objects (re-use them).
Pool threads, not one per client.

Classic example: the TPC-C benchmark
- 2 processes
- everything pre-allocated

[Figure: 7,000 IE clients connect over HTTP to IIS, which reaches SQL through a pool of ODBC links. Without pooling: N clients x N servers x F files = N x N x F file opens!!!]
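"Pool threads, not one per client" can be sketched as a minimal fixed-size thread pool, assuming tasks are independent `void()` functions. The threads are created once, up front, and reused for every request, so thousands of clients do not cost thousands of threads. `ThreadPool` is a generic, hypothetical sketch, not how IIS or SQL Server implement their pools.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });  // pre-allocate threads
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& t : workers_) t.join();  // workers drain queued tasks first
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```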


Classic Three-Tier Example: TPC-C
Transaction Processing Performance Council (TPC): standard performance benchmarks.

5 transaction types:
- order entry, payment, status (OLTP)
- delivery (mini-batch)
- restock (mini-DSS)

Metrics: throughput, price/performance

Shows best practices:
- everyone is three tier
- 2 processes at the server
- everything pre-allocated

[Figure: 7,000 Web clients connect over HTTP to IIS (= Web), which uses a pool of ODBC links to SQL]


46

Classic Mistakes

Thread per terminal
  fix: DB server thread pools
  fix: server pools

Process per request (CGI)
  fix: ISAPI & NSAPI DLLs
  fix: connection pools

Many messages per operation
  fix: stored procedures
  fix: server-side objects

File open per request
  fix: cache hot files


47

Outline
  Why Distributed
  Distributed data & objects
  Distributed execution
  Three tier architectures
    why: manageability & performance
    what: server side workflows & objects
  Transaction concepts
    Why transactions?
    Using transactions
    Two Phase Commit
    How transactions?


48

Thesis

Transactions are key to structuring distributed applications.

ACID properties ease exception handling:
  Atomic: all or nothing
  Consistent: state transformation
  Isolated: no concurrency anomalies
  Durable: committed transaction effects persist


49

What Is A Transaction?

Programmer’s view: bracket a collection of actions.

A simple failure model. Only two outcomes:
  Begin() action action action action Commit(): Success!
  Begin() action action action action Rollback(): Failure!
  Begin() action action action action, then Fail!, then Rollback(): Failure!
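Python's built-in `sqlite3` module can illustrate the two outcomes. A minimal sketch, assuming SQLite stands in for the transactional database and a `CHECK` constraint plays the role of a failing action; the `account` table and the `transfer` helper are hypothetical.

```python
import sqlite3

# isolation_level=None: we issue BEGIN / COMMIT / ROLLBACK ourselves.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
           "bal INTEGER NOT NULL CHECK (bal >= 0))")
db.execute("INSERT INTO account VALUES ('a', 100), ('b', 0)")

def transfer(amount):
    """Credit b, then debit a: both actions happen or neither does."""
    db.execute("BEGIN")
    try:
        db.execute("UPDATE account SET bal = bal + ? WHERE name = 'b'", (amount,))
        db.execute("UPDATE account SET bal = bal - ? WHERE name = 'a'", (amount,))
        db.execute("COMMIT")        # Success!
        return True
    except sqlite3.IntegrityError:  # the debit would make a's balance negative
        db.execute("ROLLBACK")      # Failure! The credit already made is undone too
        return False

def balances():
    return dict(db.execute("SELECT name, bal FROM account ORDER BY name"))

print(transfer(30), balances())    # True {'a': 70, 'b': 30}
print(transfer(500), balances())   # False {'a': 70, 'b': 30}
```

The second transfer credits `b` before the debit violates the constraint, and `ROLLBACK` removes that credit as well: the group of actions has no partial outcome.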


50

Why Bother: Atomicity?

RPC semantics:
  At most once: try one time
  At least once: keep trying until acknowledged
  Exactly once: keep trying until acknowledged, and the server discards duplicate requests

???
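Exactly-once behavior can be sketched as an at-least-once retry loop plus a server that remembers request IDs. This is a hypothetical, single-process illustration: the `Server` class, client-chosen request IDs, and the simulated lost acknowledgments are assumptions, and a real server would also have to keep its reply cache durable and eventually discard old entries.

```python
import random

class Server:
    """Executes each request at most once by remembering request IDs."""

    def __init__(self):
        self.records = []
        self.replies = {}                # request_id -> cached reply

    def insert(self, request_id, record):
        if request_id in self.replies:   # duplicate: do not redo the work
            return self.replies[request_id]
        self.records.append(record)
        reply = f"inserted {record}"
        self.replies[request_id] = reply
        return reply

def call_exactly_once(server, request_id, record, rng, loss_rate=0.5):
    """At-least-once retry loop; the server's duplicate check makes it exactly once."""
    while True:
        reply = server.insert(request_id, record)
        if rng.random() >= loss_rate:    # the acknowledgment arrived
            return reply
        # acknowledgment lost: a time-out means "maybe", so try again

rng = random.Random(7)
server = Server()
for i, rec in enumerate(["x", "y", "z"]):
    call_exactly_once(server, request_id=i, record=rec, rng=rng)
print(server.records)   # ['x', 'y', 'z'] however many retries happened
```

This only covers a single call; a group of such calls still needs all-or-nothing transactions.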


51

Why Bother: Atomicity?

Example: insert a record in a file
  At most once: a time-out means “maybe”
  At least once: a retry may get a “duplicate” error, or a retry may do a second insert
  Exactly once: you do not have to worry

What if the operation involves inserting several records? Sending several messages?

Want ALL or NOTHING for a group of actions.


52

Why Bother: Consistency

Begin-Commit brackets a set of operations.
You can violate consistency inside the brackets:
  Debit but not credit (destroys money)
  Delete old file before creating new file in a copy
  Print document before deleting it from the spool queue

Begin and commit are points of consistency.

[Figure: state transformations. Each Begin ... Commit bracket takes the system from one consistent state to the next; the new state is under construction in between.]


53

Why Bother: Isolation

Running programs concurrently on the same data can create concurrency anomalies.
The shared checking account example:
  Program 1: Begin() read BAL, add 10, write BAL, Commit()
  Program 2: Begin() read BAL, subtract 30, write BAL, Commit()
  Both programs read Bal = 100; one writes Bal = 110, the other writes Bal = 70. One update is lost: the final balance is 110 or 70, not 80.

Programming is hard enough without having to worry about concurrency.


54

Isolation

It is as though programs run one at a time: no concurrency anomalies.

The system automatically protects applications:
  Locking (DB2, Informix, Microsoft® SQL Server™, Sybase…)
  Versioned databases (Oracle, Interbase…)

[Figure: the same two programs, serialized. Bal = 100; the add-10 program writes Bal = 110; the subtract-30 program then reads Bal = 110 and writes Bal = 80.]
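The lost update on the shared checking account, and the locking fix, can be replayed in Python. A minimal sketch under stated assumptions: a single `threading.Lock` (hypothetically named `bal_lock`) stands in for the database's lock on BAL, and the unprotected case replays the bad interleaving deterministically rather than relying on thread timing.

```python
import threading

balance = 100

def interleaved_without_isolation():
    """Replay the bad schedule: both programs read BAL before either writes it."""
    global balance
    balance = 100
    t1_read = balance          # T1: read BAL  (100)
    t2_read = balance          # T2: read BAL  (100)
    balance = t2_read - 30     # T2: write BAL (70)
    balance = t1_read + 10     # T1: write BAL (110); T2's update is lost
    return balance

bal_lock = threading.Lock()    # stands in for the database's lock on BAL

def with_locking(deltas):
    """Each program holds the lock from its read to its write, so they serialize."""
    global balance
    balance = 100

    def program(delta):
        global balance
        with bal_lock:                 # Begin(): lock BAL
            current = balance          # read BAL
            balance = current + delta  # write BAL
        # leaving the with-block releases the lock, as Commit() would

    threads = [threading.Thread(target=program, args=(d,)) for d in deltas]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balance

print(interleaved_without_isolation())   # 110, but 100 + 10 - 30 should be 80
print(with_locking([+10, -30]))          # 80
```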


55

Why Bother: Durability

Once a transaction commits, want its effects to survive failures.

Fault tolerance:
  Old master-new master won’t work:
    Can’t do daily dumps: would lose recent work
    Want “continuous” dumps
  Redo “lost” transactions in case of failure
  Resend unacknowledged messages


56

Why ACID For Client/Server And Distributed

ACID is important for centralized systems.
Failures in centralized systems are simpler.
In distributed systems:
  More and more-independent failures
  ACID is harder to implement
That makes it even MORE IMPORTANT:
  Simple failure model
  Simple repair model


57

ACID Generalizations

Taxonomy of actions:
  Unprotected: not undone or redone (temp files)
  Transactional: can be undone before commit (database and message operations)
  Real: cannot be undone (drill a hole in a piece of metal, print a check)

Nested transactions: subtransactions
Workflow: long-lived transactions


58

Outline
  Why Distributed
  Distributed data & objects
  Distributed execution
  Three tier architectures
  Transaction concepts
    Why transactions?
      ACID: atomic, consistent, isolated, durable
    Using transactions
      programming
      save points
      nested, chained
      workflow
    Two Phase Commit
    How transactions?


59

Programming & Transactions: The Application View

You start (e.g., in Transact-SQL):
  Begin [Distributed] Transaction <name>
  Perform actions
  Optional: Save Transaction <name>
  Commit or Rollback

You inherit an XID:
  Caller passes you a transaction
  You return or Rollback
  You can Begin / Commit sub-transactions
  You can use save points

[Figure: a program that starts a transaction with Begin ends it with Commit or RollBack; a called program that inherits the XID ends with Return or RollBack.]


60

Transaction Save Points: Backtracking within a transaction

Allows app to cancel parts of a transaction prior to commit.

This is in most SQL products (save transaction in MS SQL Server).

[Figure: a transaction with save points. BEGIN WORK:1, actions, SAVE WORK:2, actions, SAVE WORK:3, actions, SAVE WORK:4, actions, ROLLBACK WORK(2), SAVE WORK:5, actions, SAVE WORK:6, action, SAVE WORK:7, actions, ROLLBACK WORK(7), SAVE WORK:8, actions, COMMIT WORK.]
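SQLite supports save points through `SAVEPOINT name`, `ROLLBACK TO name`, and `RELEASE name`, so the ROLLBACK WORK(2) step can be tried directly. A minimal sketch: SQLite names save points instead of numbering them, the `work` table and `do` helper are hypothetical, and MS SQL Server spells the same idea `SAVE TRANSACTION`.

```python
import sqlite3

db = sqlite3.connect(":memory:", isolation_level=None)   # we issue BEGIN/COMMIT ourselves
db.execute("CREATE TABLE work (step TEXT)")

def do(step):
    """Hypothetical 'action': record one unit of work."""
    db.execute("INSERT INTO work VALUES (?)", (step,))

db.execute("BEGIN")              # BEGIN WORK:1
do("a1")
db.execute("SAVEPOINT s2")       # SAVE WORK:2
do("a2")
db.execute("SAVEPOINT s3")       # SAVE WORK:3
do("a3")
db.execute("ROLLBACK TO s2")     # ROLLBACK WORK(2): undo a2 and a3, keep a1
do("a4")                         # the transaction is still open
db.execute("COMMIT")             # COMMIT WORK

print([row[0] for row in db.execute("SELECT step FROM work ORDER BY rowid")])   # ['a1', 'a4']
```

`ROLLBACK TO s2` undoes everything after save point 2, including the nested save point 3, but keeps `a1`, and the transaction continues to its commit.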


61

Chained Transactions

Commit of T1 implicitly begins T2.
Carries context forward to the next transaction:
  cursors
  locks
  other state

[Figure: Transaction #1 establishes the processing context; at Commit/Begin it passes to Transaction #2, which uses it.]


62

Nested Transactions: Going Beyond Flat Transactions

Need transactions within transactions.
Sub-transactions commit only if the root does.
Only the root commit is durable.
Subtransactions may roll back; if one does, all of its subtransactions roll back.
Parallel version of nested transactions.

[Figure: transaction tree. Root T1 has subtransactions T11, T12, T13; T11 has T111, T112, T113, T114; T12 has T121, T122, T123; T13 has T131, T132, T133.]


63

Workflow: A Sequence of Transactions

Application transactions are multi-step:
  order, build, ship & invoice, reconcile
Each step is an ACID unit.
Workflow is a script describing the steps.
Workflow systems:
  Instantiate the scripts
  Drive the scripts
  Allow query against scripts
Examples:
  Manufacturing Work In Process (WIP)
  Queued processing
  Loan application & approval
  Hospital admissions…

[Figure: tiers. Presentation, workflow, Business Objects, Database.]


64

Workflow Scripts

Workflow scripts are programs (could use VBScript or JavaScript).
If a step fails, a compensation action handles the error.
Events, messages, time, or other steps cause a step to run.
The workflow controller drives flows.

[Figure: building blocks of a workflow script: Source, Step, branch/case, fork, join, loop, Compensation Action.]
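One way to sketch a workflow controller with compensation actions, in the style usually called a saga: run the steps in order and, if one fails, run the compensation of every completed step, newest first. Everything here (`Step`, `Flow`, the undo-in-reverse policy) is a hypothetical illustration; a real controller would also keep each flow's context durable so flows survive restarts.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Step:
    name: str
    action: Callable[[dict], None]       # one ACID unit of work
    compensate: Callable[[dict], None]   # semantic undo of a completed step

@dataclass
class Flow:
    steps: List[Step]
    context: dict = field(default_factory=dict)   # per-flow state kept by the controller
    status: str = "running"

    def run(self):
        done = []
        for step in self.steps:
            try:
                step.action(self.context)
            except Exception as err:
                self.context["error"] = f"{step.name}: {err}"
                for prior in reversed(done):       # compensate, newest step first
                    prior.compensate(self.context)
                self.status = "compensated"
                return self.status
            done.append(step)
            self.context["last_step"] = step.name
        self.status = "done"
        return self.status

log = []

def ok(label):
    """Hypothetical step body that just records that it ran."""
    return lambda ctx: log.append(label)

def ship_fails(ctx):
    raise RuntimeError("truck unavailable")

flow = Flow([
    Step("order", ok("order"), ok("cancel order")),
    Step("build", ok("build"), ok("scrap build")),
    Step("ship & invoice", ship_fails, ok("void invoice")),
    Step("reconcile", ok("reconcile"), ok("undo reconcile")),
])
print(flow.run(), log)   # compensated ['order', 'build', 'scrap build', 'cancel order']
```

Because workflow is not isolated, other flows could already have seen the committed `order` and `build` steps before their compensations ran.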


65

Workflow and ACID

Workflow is not Atomic or Isolated.
Results of a step are visible to all.
Workflow is Consistent and Durable.
Each flow may take hours, weeks, months.
Workflow controller:
  keeps flows moving
  maintains context (state) for each flow
  provides a query and operator interface,
    e.g.: “what is the status of Job # 72149?”


66

ACID Objects Using ACID DBs: The easy way to build transactional objects

Application uses transactional objects (objects have ACID properties).

If an object is built on top of ACID objects, then the object is ACID.
Example: New, EnQueue, DeQueue on top of SQL.

SQL provides ACID.

[Figure: Business Object: Customer and Business Object Mgr: CustomerMgr, both built on SQL.]

    dim c as Customer
    dim CM as CustomerMgr
    ...
    set C = CM.get(CustID)
    ...
    C.credit_limit = 1000
    ...
    CM.update(C, CustID)
    ..

Persistent programming languages automate this.
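The New / EnQueue / DeQueue example can be sketched as a Python class whose ACID properties come entirely from SQL. A minimal sketch, assuming SQLite in place of SQL Server; `SqlQueue`, the `queue_items` table, and the `handler` parameter are hypothetical.

```python
import sqlite3

class SqlQueue:
    """Hypothetical queue object built on a SQL table; SQLite supplies the ACID part."""

    def __init__(self, db, name):
        self.db, self.name = db, name

    @classmethod
    def new(cls, db, name):
        db.execute("CREATE TABLE IF NOT EXISTS queue_items ("
                   "id INTEGER PRIMARY KEY AUTOINCREMENT, queue TEXT, body TEXT)")
        return cls(db, name)

    def _transaction(self, work):
        self.db.execute("BEGIN IMMEDIATE")   # take SQLite's write lock before reading
        try:
            result = work()
            self.db.execute("COMMIT")
            return result
        except BaseException:
            self.db.execute("ROLLBACK")      # all or nothing
            raise

    def enqueue(self, body):
        self._transaction(lambda: self.db.execute(
            "INSERT INTO queue_items (queue, body) VALUES (?, ?)", (self.name, body)))

    def dequeue(self, handler=lambda body: None):
        """Remove the oldest item and run handler on it, in one transaction."""
        def work():
            row = self.db.execute(
                "SELECT id, body FROM queue_items WHERE queue = ? ORDER BY id LIMIT 1",
                (self.name,)).fetchone()
            if row is None:
                return None
            self.db.execute("DELETE FROM queue_items WHERE id = ?", (row[0],))
            handler(row[1])          # if this raises, the DELETE is rolled back
            return row[1]
        return self._transaction(work)

    def __len__(self):
        return self.db.execute("SELECT COUNT(*) FROM queue_items WHERE queue = ?",
                               (self.name,)).fetchone()[0]

db = sqlite3.connect(":memory:", isolation_level=None)
q = SqlQueue.new(db, "orders")
q.enqueue("order-1")
q.enqueue("order-2")

def flaky_handler(body):
    raise RuntimeError("printer jammed")

try:
    q.dequeue(flaky_handler)
except RuntimeError:
    pass
print(len(q), q.dequeue())   # 2 order-1  (the failed dequeue was undone)
```

Because the dequeue and the handler run in one transaction, a failed handler leaves the item on the queue.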


67

ACID Objects From Bare Metal: The Hard Way to Build Transactional Objects

Object class is a Resource Manager (RM):
  Provides ACID objects from persistent storage
  Provides Undo (on rollback)
  Provides Redo (on restart or media failure)
  Provides Isolation for concurrent ops

Microsoft SQL Server, IBM DB2, Oracle,… are resource managers.
Many more coming.
RM implementation techniques described later.


68

Outline
  Why Distributed
  Distributed data & objects
  Distributed execution
  Three tier architectures
  Transaction concepts
    Why transactions?
    Using transactions
      programming
      save points
      nested, chained
      workflow
    Two Phase Commit
      Prepare and commit phases
      Transaction & Resource Managers
    How transactions?


69

Transaction Manager

Transaction Manager (TM): manages transaction objects:
  XID factory
  tracks them
  coordinates them
App gets XID from TM.
Transactional RPC:
  passes XID on all calls
  manages XID inheritance
TM manages commit & rollback.

[Figure: App calls begin on the TM and receives an XID; App then calls the RM with call(..XID); the RM enlists with the TM.]


70

TM Two-Phase Commit: Dealing with multiple RMs

If all use one RM, then all or none commit.
If multiple RMs, then need coordination.
Standard technique:
  Marriage: Do you? I do. I pronounce… Kiss
  Theater: Ready on the set? Ready! Action! Act
  Sailing: Ready about? Ready! Helm’s a-lee! Tack
  Contract law: Escrow agent

Two-phase commit:
  1. Voting phase: can you do it?
  2. If all vote yes, then commit phase: do it!


71

Two-Phase Commit In Pictures

Transactions managed by TM.
App gets unique ID (XID) from TM at Begin().
XID passed on Transactional RPC.
RMs Enlist when they first do work on an XID.

[Figure: App calls Begin on the TM and receives an XID; App sends Call(..XID..) to RM1 and to RM2; each RM enlists with the TM.]


72

When App Requests Commit: Two Phase Commit in Pictures

TM tracks all RMs enlisted on an XID.
TM calls each enlisted RM’s Prepared() callback.
If all vote yes, TM calls each RM’s Commit().
If any vote no, TM calls each RM’s Rollback().

[Figure: App, TM, RM1 and RM2 exchange messages in six steps:]
  1. Application requests Commit
  2. TM broadcasts prepared?
  3. RMs all vote Yes
  4. TM decides Yes, broadcasts Commit
  5. RMs acknowledge
  6. TM says yes
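The message flow above can be sketched as an in-memory TM and RMs. This is a hypothetical, single-process illustration of the protocol's control flow only: the class and method names are not the OLE Transactions or XA interfaces, and it omits the forced log writes, time-outs, and recovery that make real two-phase commit durable.

```python
from dataclasses import dataclass, field

@dataclass
class ResourceManager:
    """Hypothetical RM: holds work per XID, votes in phase 1, applies it in phase 2."""
    name: str
    can_commit: bool = True
    pending: dict = field(default_factory=dict)
    committed: list = field(default_factory=list)

    def do_work(self, tm, xid, item):
        if xid not in self.pending:      # first work on this XID: enlist with the TM
            tm.enlist(xid, self)
            self.pending[xid] = []
        self.pending[xid].append(item)

    def prepare(self, xid):
        # A real RM forces its log here before voting yes.
        return self.can_commit

    def commit(self, xid):
        self.committed.extend(self.pending.pop(xid, []))

    def rollback(self, xid):
        self.pending.pop(xid, None)

class TransactionManager:
    def __init__(self):
        self.next_xid = 1
        self.enlisted = {}               # XID -> RMs enlisted on it

    def begin(self):
        xid = self.next_xid              # XID factory
        self.next_xid += 1
        self.enlisted[xid] = []
        return xid

    def enlist(self, xid, rm):
        self.enlisted[xid].append(rm)

    def commit(self, xid):
        rms = self.enlisted.pop(xid)
        votes = [rm.prepare(xid) for rm in rms]   # phase 1: "prepared?"
        decision = all(votes)
        # A real TM forces its commit/abort decision to its own log here.
        for rm in rms:                            # phase 2: broadcast the decision
            if decision:
                rm.commit(xid)
            else:
                rm.rollback(xid)
        return decision                           # step 6: tell the application

tm = TransactionManager()
rm1, rm2 = ResourceManager("RM1"), ResourceManager("RM2")

xid = tm.begin()
rm1.do_work(tm, xid, "debit A")
rm2.do_work(tm, xid, "credit B")
print(tm.commit(xid))            # True: both RMs voted yes

rm2.can_commit = False           # RM2 will vote no next time
xid = tm.begin()
rm1.do_work(tm, xid, "debit C")
rm2.do_work(tm, xid, "credit D")
print(tm.commit(xid))            # False: one "no" vote rolls back both RMs
print(rm1.committed, rm2.committed)   # ['debit A'] ['credit B']
```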


73

X/Open Standardizes Two-Phase Commit

[Figure: Client and Server nodes, each with a TM, RMs, and a Comm mgr. Interfaces: TX (begin, commit, rollback) between the application and its TM; SQL or MTS or .. between the application and its RMs; XA (enlist, Prepare, Commit) between the TM and its RMs; XA+ (outgoing, incoming) between the TM and the Comm mgr.]

Standardized APIs for apps and to RMs.
Points to OSI/TP for interoperation.


74

How Does This Relate To Microsoft?

SQL Server is transactional (so are Oracle, DB2, Informix, Sybase).
MS Distributed Transaction Coordinator (DTC) is packaged with SQL Server, MTS, and other RMs.
Connects to CICS, Encina, Topend, Tuxedo.
Any RM (SNA LU6.2, DB2, Oracle, Sybase, Informix, …) can participate in transactions.


75

OLE Transactions: the Movie

[Figure: the Client calls DtcGetTransactionManager() to reach the TM, then calls BeginTransaction on the TM's ITransactionDispenser interface. The resulting Transaction exposes ITransaction (GetTransactionInfo, Commit, Abort). The client's begin, commit, and rollback go to the TM; its work goes to a Resource Manager (aka sql, viper,…), and the TM drives Commit / Abort at the RM. A CommMgr is also shown.]

Two styles:
(1) Bind an RM connection to the transaction. All work on that connection is now part of that transaction.
(2) Pass the transaction object on every RM call.

Not shown: the client can get async notification of the transaction outcome.


76

OLE Transactions RM Enlist

[Figure: the Resource Manager (aka sql, viper,…) calls DtcGetTransactionManager() and calls Create on the TM's IResourceManagerFactory to obtain an IResourceManager (Enlist, ReEnlist, ReEnlistmentComplete). It then calls Enlist() for a Transaction, supplying its ITransactionResourceAsync callbacks (PrepareRequest, CommitRequest, AbortRequest, TMDown), and receives an Enlistment exposing ITransactionEnlistmentAsync (PrepareReqDone, CommitReqDone, AbortReqDone).]

RM registers with TM.
RM enlists in transaction (provides callbacks).


77

OLE Transactions RM Commit

[Figure: the application (begin, commit, rollback) sends COMMIT to the Transaction; the TM runs two phase commit against the Resource Manager (aka sql, viper,…), calling its ITransactionResourceAsync (PrepareRequest, CommitRequest, AbortRequest, TMDown); the RM answers through the Enlistment's ITransactionEnlistmentAsync (PrepareReqDone, CommitReqDone, AbortReqDone).]

Enlisted RMs get prepare & commit callbacks.
Abort callbacks are similar.


78

Outline
  Why Distributed
  Distributed data & objects
  Distributed execution
  Three tier architectures
  Transaction concepts
    Why transactions?
    Using transactions
    Two Phase Commit
      Prepare and commit phases
      Transaction and Resource Managers
    How transactions?
      logging
      locking or versioning


79

Implementing Transactions

Atomicity:
  The DO/UNDO/REDO protocol
  Idempotence
  Two-phase commit
Durability:
  Durable logs
  Force at commit
Isolation:
  Locking or versioning


80

DO/UNDO/REDO

Each action generates a log record:
  Has an UNDO action
  Has a REDO action

[Figure: DO takes the old state to the new state and writes a log record; UNDO uses the log record to take the new state back to the old state; REDO uses the log record to take the old state to the new state.]


81

What Does A Log Record Look Like?

Log record has:
  Header (transaction ID, timestamp…)
  Item ID
  Old value
  New value
For messages: just message text and sequence #
For records: old and new value on update
Keep records small
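A log record with the fields listed above, plus DO, UNDO, and REDO over an in-memory store, might look like this. A minimal sketch, assuming value (old/new) logging of records; the names are hypothetical, and `None` stands for "item did not exist" in this simplification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogRecord:
    xid: int      # header: transaction ID (a real header also carries a timestamp, ...)
    item: str     # item ID
    old: object   # old value: what UNDO restores (None = item did not exist)
    new: object   # new value: what REDO reinstalls

def do(state, log, xid, item, new):
    """DO: append a log record describing the change, then make it."""
    rec = LogRecord(xid, item, state.get(item), new)
    log.append(rec)
    state[item] = new
    return rec

def undo(state, rec):
    """UNDO: new state + log record -> old state."""
    if rec.old is None:
        state.pop(rec.item, None)
    else:
        state[rec.item] = rec.old

def redo(state, rec):
    """REDO: old state + log record -> new state."""
    state[rec.item] = rec.new

state, log = {"BAL": 100}, []
r = do(state, log, xid=1, item="BAL", new=110)
undo(state, r)
print(state)   # {'BAL': 100}
redo(state, r)
print(state)   # {'BAL': 110}
```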


82

Transaction Is A Sequence Of Actions

Each action changes state:
  Changes database
  Sends messages
  Operates a display/printer/drill press
Leaves a log trail.

[Figure: a chain of DO steps; each takes an old state to a new state and appends a record to the log.]


83

Transaction UNDO Is Easy

Read log backwards.
UNDO one step at a time.
Can go half-way back to get nested transactions.

[Figure: UNDO steps applied to the log in reverse order, each taking a new state back to its old state.]
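Reading the log backwards and stopping half-way can be sketched with a Python list as the log. A hypothetical illustration for a single transaction: `rollback_to` pops records from the in-memory log, whereas a real system keeps the log and writes compensation records for what it undoes.

```python
def do(state, log, item, new):
    """DO: log (item ID, old value, new value), then update; items assumed to exist."""
    log.append((item, state[item], new))
    state[item] = new

def rollback_to(state, log, save_point=0):
    """Read the log backwards, UNDOing one step at a time, until it is back at save_point."""
    while len(log) > save_point:
        item, old, _new = log.pop()   # newest record first
        state[item] = old

state, log = {"x": 0, "y": 0}, []
do(state, log, "x", 1)
save_point = len(log)                 # remember the log position
do(state, log, "y", 2)
do(state, log, "x", 3)
rollback_to(state, log, save_point)   # half-way back
print(state)                          # {'x': 1, 'y': 0}
rollback_to(state, log)               # all the way back: transaction rollback
print(state)                          # {'x': 0, 'y': 0}
```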


84

Durability: Protecting The Log

When a transaction commits:
  Put its log in a durable place (duplexed disk)
  Need log to redo transaction in case of failure:
    System failure: lost in-memory updates
    Media failure (lost disk)
This makes the transaction durable.
Log is a sequential file:
  Converts random IO to single sequential IO
  See NTFS or newer UNIX file systems

[Figure: log records written sequentially to the end of the log.]
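Force at commit can be sketched with an append-only file: records are buffered as they are written, and the log is flushed and `os.fsync`ed when the commit record goes out. A minimal sketch: `DurableLog` and the JSON record layout are assumptions, duplexing onto a second disk is not shown, and `fsync` is only as durable as the disk's handling of flush requests.

```python
import json
import os
import tempfile

class DurableLog:
    """Hypothetical append-only log file: cheap sequential appends, forced at commit."""

    def __init__(self, path):
        self._f = open(path, "a", encoding="utf-8")

    def append(self, record):
        # Sequential append into a buffer: fast, but NOT yet durable.
        self._f.write(json.dumps(record) + "\n")

    def force(self):
        self._f.flush()              # Python's buffer -> operating system
        os.fsync(self._f.fileno())   # operating system cache -> disk

    def close(self):
        self._f.close()

def commit(log, xid):
    """Write the commit record and force the log before reporting success."""
    log.append({"xid": xid, "kind": "commit"})
    log.force()
    return True

path = os.path.join(tempfile.mkdtemp(), "txn.log")
log = DurableLog(path)
log.append({"xid": 1, "kind": "update", "item": "BAL", "old": 100, "new": 110})
commit(log, 1)      # the update record is forced along with the commit record
log.close()

with open(path, encoding="utf-8") as f:
    print([json.loads(line)["kind"] for line in f])   # ['update', 'commit']
```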


85

Recovery After A System Failure

During normal processing, write checkpoints on non-volatile storage.
When recovering from a system failure…
  return to the checkpoint state
  reapply the log of all committed transactions
  force-at-commit ensures the log will survive restart
Then UNDO all uncommitted transactions.

[Figure: REDO steps replayed forward along the log, each taking an old state to a new state.]
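The restart procedure can be sketched as: copy the checkpoint, redo committed updates logged after it, then undo uncommitted updates the checkpoint captured. A simplified sketch under stated assumptions: the checkpoint is a snapshot reflecting exactly the log records before `checkpoint_pos`, strict locking prevents two live transactions from updating the same item, the log has already survived thanks to force at commit, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LogRecord:
    xid: int
    kind: str                  # "update" or "commit"
    item: Optional[str] = None
    old: object = None
    new: object = None

def recover(checkpoint_state, checkpoint_pos, log):
    """Simplified restart: assumes a snapshot checkpoint and strict locking."""
    state = dict(checkpoint_state)                      # return to the checkpoint state
    committed = {r.xid for r in log if r.kind == "commit"}
    for rec in log[checkpoint_pos:]:                    # REDO committed work, oldest first
        if rec.kind == "update" and rec.xid in committed:
            state[rec.item] = rec.new
    for rec in reversed(log[:checkpoint_pos]):          # UNDO uncommitted work in the snapshot
        if rec.kind == "update" and rec.xid not in committed:
            state[rec.item] = rec.old
    return state

log = [
    LogRecord(1, "update", "A", 0, 10),
    LogRecord(2, "update", "B", 0, 20),   # T2 never commits
    # checkpoint taken here: snapshot {"A": 10, "B": 20}, log position 2
    LogRecord(1, "commit"),
    LogRecord(3, "update", "A", 10, 15),
    LogRecord(3, "commit"),
    LogRecord(2, "update", "C", None, 30),
]                                          # crash: in-memory state is lost
checkpoint = {"A": 10, "B": 20}
print(recover(checkpoint, 2, log))        # {'A': 15, 'B': 0}
```

Because recovery always starts from the checkpoint copy, running it again after a failure during restart gives the same result.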


86

Idempotence: Dealing with failure

What if we fail during restart? REDO many times.
What if the new state is not around at restart? UNDO something not done.

[Figure: REDO applied repeatedly (old state to new state, then new state to new state); UNDO applied repeatedly (new state to old state, then old state to old state).]


87

Idempotence: Dealing with Failure

Solution: make F(F(x)) = F(x) (idempotence).

Discard duplicates:
use message sequence numbers to discard duplicate messages;
use sequence numbers on pages to detect their state.

(Or) make operations idempotent:
move to position x, write value V to byte B…
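Both duplicate-detection techniques can be sketched in a few lines, assuming each page stores the sequence number (LSN) of the last log record applied to it, and each sender numbers its messages 1, 2, 3 in the order it sends them. With the page LSN, even the non-idempotent "+delta" operation is applied at most once. `Page`, `redo`, and `Receiver` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    value: int = 0
    lsn: int = 0  # sequence number of the last log record applied to this page


def redo(page, record_lsn, delta):
    """Apply a logical '+delta' log record at most once, using the page LSN."""
    if record_lsn <= page.lsn:
        return False  # already reflected in the page: skip
    page.value += delta
    page.lsn = record_lsn
    return True


@dataclass
class Receiver:
    last_seq: dict = field(default_factory=dict)  # sender -> highest seq seen
    delivered: list = field(default_factory=list)

    def receive(self, sender, seq, body):
        """Discard duplicate or replayed messages via per-sender sequence numbers."""
        if seq <= self.last_seq.get(sender, 0):
            return False
        self.last_seq[sender] = seq
        self.delivered.append(body)
        return True


if __name__ == "__main__":
    page = Page(value=100)
    for _ in range(3):  # the same REDO replayed by repeated restarts
        redo(page, record_lsn=42, delta=10)
    print(page.value, page.lsn)  # 110 42

    rx = Receiver()
    for seq, body in [(1, "debit"), (2, "credit"), (2, "credit")]:
        rx.receive("bank", seq, body)
    print(rx.delivered)  # ['debit', 'credit']
```

The receiver assumes messages arrive in sending order; one that may see reordered messages would need to remember more than the highest number per sender.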

Recap

ACID makes it easy to program distributed applications.

DO/UNDO/REDO + log allows atomicity.

Multiple logs need two-phase commit.

A persistent log gives durability:
recover from system failure;
recover from media failure.
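A minimal sketch of two-phase commit across several logs, assuming in-memory lists stand in for logs that a real resource manager would force to disk, and ignoring timeouts and coordinator crashes. `Participant` and `two_phase_commit` are hypothetical names; `xid` plays the role of the transaction identifier.

```python
class Participant:
    """A resource manager with its own log (hypothetical, in memory)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []

    def prepare(self, xid):
        # Phase 1: record 'prepared' in the local log before voting yes.
        if not self.can_commit:
            self.log.append(("abort", xid))  # unilateral abort, votes no
            return False
        self.log.append(("prepared", xid))
        return True

    def finish(self, xid, outcome):
        # Phase 2: record the coordinator's decision.
        self.log.append((outcome, xid))


def two_phase_commit(xid, coordinator_log, participants):
    votes = [p.prepare(xid) for p in participants]  # phase 1: collect votes
    outcome = "commit" if all(votes) else "abort"
    coordinator_log.append((outcome, xid))          # the decision point
    for p, vote in zip(participants, votes):        # phase 2: send decision
        if vote:  # participants that voted no have already aborted
            p.finish(xid, outcome)
    return outcome


if __name__ == "__main__":
    coord = []
    a, b = Participant("db"), Participant("queue")
    print(two_phase_commit("T1", coord, [a, b]))  # commit
    c = Participant("mail", can_commit=False)
    print(two_phase_commit("T2", coord, [a, c]))  # abort
    print(coord)  # [('commit', 'T1'), ('abort', 'T2')]
```

The commit record in the coordinator's log is the single point that decides the outcome; every participant that voted yes must wait for it.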

Outline

Why Distributed
Distributed data & objects
Distributed execution
Three tier architectures
Transaction concepts
  Why transactions?
  Using transactions
  Two Phase Commit
  How transactions?
    logging
    locking or versioning

Concurrency Control: Locking

How can concurrency bugs be prevented automatically?

Serialization theorem: if you lock all you touch and hold the locks until commit, there are no concurrency bugs. If you do not follow these rules, you may see bugs.

Automatic locking:
locks are set automatically (well-formed);
locks are released at commit/rollback (two-phase locking).

Greater concurrency for locks:
granularity: objects or containers or server;
mode: shared or exclusive or…
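The locking rules can be sketched as a toy lock manager, assuming only shared (S) and exclusive (X) modes and lock names that are plain strings, so the same code can lock an object, a container, or a whole server. Locks are acquired as the transaction touches data and released all together at commit or rollback, which is strict two-phase locking. A conflicting request simply returns `False` here; a real system would make it wait and detect deadlocks. `LockManager` is a hypothetical name.

```python
from collections import defaultdict


class LockManager:
    """Shared/exclusive locks held until commit or rollback (strict 2PL)."""

    def __init__(self):
        self.holders = defaultdict(dict)  # lock name -> {txid: "S" or "X"}

    def acquire(self, txid, obj, mode):
        held = self.holders[obj]
        others = {t: m for t, m in held.items() if t != txid}
        if mode == "S":
            compatible = all(m == "S" for m in others.values())
        else:  # "X" conflicts with any lock held by another transaction
            compatible = not others
        if not compatible:
            return False  # caller would wait in a real system
        if held.get(txid) != "X":  # upgrade S -> X; never downgrade X -> S
            held[txid] = mode
        return True

    def release_all(self, txid):
        """Called only at commit/rollback: the shrinking phase is one step."""
        for held in self.holders.values():
            held.pop(txid, None)


if __name__ == "__main__":
    lm = LockManager()
    print(lm.acquire("T1", "acct:7", "S"))  # True
    print(lm.acquire("T2", "acct:7", "S"))  # True  (shared locks are compatible)
    print(lm.acquire("T2", "acct:7", "X"))  # False (T1 still holds S)
    lm.release_all("T1")                    # T1 commits
    print(lm.acquire("T2", "acct:7", "X"))  # True
```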

Reduced Isolation Levels

It is possible to lock less and risk fuzzy data. Example: you want a statistical summary of the database but do not want to lock the whole database.

Reduced levels:
Repeatable Read: may see fuzzy inserts/deletes (phantoms), but will serialize all updates.
Read Committed: sees only committed data.
Read Uncommitted: may see uncommitted updates.
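The difference between the two weakest levels can be shown with a toy store, assuming uncommitted writes are kept aside per transaction until commit, and ignoring locking entirely. `Store` and the level strings are hypothetical.

```python
class Store:
    """Toy store contrasting READ UNCOMMITTED and READ COMMITTED reads.

    Writers keep pending (uncommitted) values until commit; the reader
    chooses what it is allowed to see. Locking is deliberately left out.
    """

    def __init__(self, data):
        self.committed = dict(data)
        self.pending = {}  # txid -> {key: uncommitted value}

    def write(self, txid, key, value):
        self.pending.setdefault(txid, {})[key] = value

    def commit(self, txid):
        self.committed.update(self.pending.pop(txid, {}))

    def rollback(self, txid):
        self.pending.pop(txid, None)

    def read(self, key, level):
        if level == "READ UNCOMMITTED":
            for writes in self.pending.values():  # dirty read: any pending value
                if key in writes:
                    return writes[key]
        return self.committed[key]  # READ COMMITTED: committed data only


if __name__ == "__main__":
    s = Store({"total": 100})
    s.write("T1", "total", 999)                  # not yet committed
    print(s.read("total", "READ UNCOMMITTED"))   # 999 (dirty read)
    print(s.read("total", "READ COMMITTED"))     # 100
    s.rollback("T1")                             # the 999 never existed
    print(s.read("total", "READ UNCOMMITTED"))   # 100
```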

Multiversion Concurrency Control

Run the transaction at some timestamp in the past. No locking is needed: reconstruct the “old” state from the log, then add in your transaction’s updates. At commit, ensure your updates do not collide with those of other committed transactions.

Almost as good as serializable (only obscure bugs).
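A minimal multiversion sketch, assuming each key keeps a list of committed versions tagged with a logical commit timestamp (instead of reconstructing old states from the log, as the slide describes). A transaction reads the newest version no later than its snapshot, buffers its own writes without locking, and at commit is refused if another transaction committed a newer version of any key it wrote, a "first committer wins" rule. `MVStore` is a hypothetical name. This rule gives snapshot isolation, whose known anomaly, write skew, is one example of the kind of obscure bug that can remain.

```python
class MVStore:
    """Multiversion store: each key keeps (commit_ts, value) versions."""

    def __init__(self):
        self.versions = {}  # key -> [(commit_ts, value), ...] in ts order
        self.now = 0        # last commit timestamp handed out

    def begin(self):
        return {"snapshot": self.now, "writes": {}}

    def read(self, tx, key):
        if key in tx["writes"]:  # a transaction sees its own updates
            return tx["writes"][key]
        value = None
        for ts, v in self.versions.get(key, []):
            if ts <= tx["snapshot"]:
                value = v  # newest version visible to this snapshot
        return value

    def write(self, tx, key, value):
        tx["writes"][key] = value  # no locks: buffer until commit

    def commit(self, tx):
        # Collision check: did anyone commit a newer version of a key we wrote?
        for key in tx["writes"]:
            for ts, _ in self.versions.get(key, []):
                if ts > tx["snapshot"]:
                    return False  # abort; the caller may retry
        self.now += 1
        for key, value in tx["writes"].items():
            self.versions.setdefault(key, []).append((self.now, value))
        return True


if __name__ == "__main__":
    db = MVStore()
    setup = db.begin()
    db.write(setup, "x", 1)
    db.commit(setup)
    t1, t2 = db.begin(), db.begin()  # both snapshots see x == 1
    db.write(t1, "x", db.read(t1, "x") + 1)
    db.write(t2, "x", db.read(t2, "x") + 10)
    print(db.commit(t1))             # True
    print(db.commit(t2))             # False: t1 wrote x after t2's snapshot
    print(db.read(db.begin(), "x"))  # 2
```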

Summary

ACID eases error handling:
Atomic: all or nothing.
Consistent: correct transformation.
Isolated: no concurrency bugs.
Durable: survives failures.

ACID allows you to build robust distributed applications.

ACID is becoming a standard part of systems. It’s real.

Outline

Why Distributed
Distributed data & objects
Distributed execution
Three tier architectures
Transaction concepts

2-Tier

3-Tier

Acid

Atomic

Autonomy

Commit

Consistent

Delegation

Durable

Fat Client

Idempotent

Isolated

Lock

Log

ORB

Partitioned Data

Queue

Queued or Direct

Replicated Data

Resource Manager

Rollback (Abort)

RPC

Serializable

Server Pool

Thin Client

Transaction Manager

Two Phase Commit

Undo/Redo

Update Anywhere

Workflow

XID

