Introduction to Distributed Programming

Per Brand

Introduction

• Global distributed computing needs an infrastructure.

• The Internet provides the first steps towards global distributed applications:

– a global namespace (URLs)

– a global communications protocol (TCP/IP).

• Platforms such as Java and CORBA that take advantage of this infrastructure have become widely used.

• Distributed programming is still difficult.

• Writing efficient, open, and robust distributed applications remains much harder than writing centralized applications.

• Making them secure increases the difficulty by another quantum leap.

What are the properties of global distributed systems?

• A distributed system is a set of processes linked by a network

• No global information, no global time

• Unpredictable communication delays

• Concurrency and nondeterminism

• Large probability of localized faults

• Easy access by unauthorized users
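Unpredictable communication delays together with the absence of global time already make a trivial program nondeterministic. The Python sketch below is a toy simulation, not a network model: the function `deliver` and the uniform delay distribution are hypothetical choices for illustration. Two senders each send one message, and the order in which the receiver sees them depends only on the delays of that particular run.

```python
import random

def deliver(messages, seed):
    """Simulate sending (sender, payload) messages over a network with
    unpredictable delays; return the senders in order of arrival."""
    rng = random.Random(seed)  # one seed = one possible run of the network
    in_flight = [(rng.uniform(0.0, 1.0), sender, payload)
                 for sender, payload in messages]
    in_flight.sort()  # the receiver sees messages in order of arrival time
    return [sender for _, sender, _ in in_flight]

messages = [("A", "debit"), ("B", "credit")]
# Same program, same inputs, different delays: collect the observed orders.
observed_orders = {tuple(deliver(messages, seed)) for seed in range(100)}
print(sorted(observed_orders))
```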

Additional Properties of the Internet

• A global network that is partitioned into several protection domains (Firewalls)

• Private subnetworks, with the same IP addresses reused across networks

• Dynamic reassignment of IP addresses: ISPs reuse a pool of IP addresses among customers

The issues in distributed programming

(Diagram: Functionality, Fault tolerance, Distribution, Openness, Resource Control, Security, and Scalability, shown interacting; each is part of the problem)

Classical problems of software engineering, such as code reuse and maintainability, are all here too

Distributed Programming

• Centralized programming

– difficult enough

– research & development for 50 years

– still ongoing

• Distributed programming

– in general much more difficult: why?

Adding/changing distribution

(Diagram: Functionality, Fault tolerance, Distribution, Openness, Resource Control, Security, Scalability)

• E.g. RMI semantics

• E.g. new kinds of failure

• E.g. new security considerations

Adding/changing distribution -2

(Diagram: Functionality, Fault tolerance, Distribution, Openness, Resource Control, Security, Scalability)

• E.g. recovery changes


(Diagram: Functionality, Fault tolerance, Distribution, Openness, Resource Control, Security, Scalability)

• E.g. persistence/error recovery consume resources

• E.g. security in recovery

• E.g. functional operations on entities mixed with error recovery


(Diagram: Functionality, Fault tolerance, Distribution, Openness, Resource Control, Security, Scalability)

• E.g. further subdivision of tasks

Largest problem: having to keep coming back here

Adding/incrementing openness

(Diagram: Functionality, Fault tolerance, Distribution, Openness, Resource Control, Security, Scalability)

• E.g. more potential security problems

• E.g. resource use more unpredictable

• E.g. more kinds of failure

Example: allow users to share programs and games with their buddies in a virtual community

Adding/incrementing openness - 2

(Diagram: Functionality, Fault tolerance, Distribution, Openness, Resource Control, Security, Scalability)

• E.g. resource control consumes resources

• E.g. resource-control code mixed with functional code

• E.g. resource overuse: new kinds of faults

Levels of Difficulty - 1

• Client-server applications

– Most Internet applications are still of this type

– Client/server interface very limited and controlled

• http

• forms

– Little fault-tolerance beyond classical database transactions on server-side

– In the controlled server environment, issues of openness, security, and resource-control hardly apply

– Fixed and simple distribution

– Scalability is an issue, so if you can’t buy a bigger server, then ...

(Diagram: Functionality, Fault tolerance, Distribution, Openness, Resource Control, Security, Scalability)

Levels of Difficulty - 2

• Client side

– Security (mobile code)

– Resource control

• memory/CPU

– Aspects orthogonal to those on the server side

(Diagram: Functionality, Fault tolerance, Distribution, Openness, Security, Scalability, Resource Control)

Levels of Difficulty - 3

• Server clusters

– Distribution and Fault-tolerance within the cluster

– Fault-tolerance simplified by the fact that there is no network partitioning within the cluster

– Distribution simplified by uniformity of cluster - latencies can almost be ignored.

– In the controlled server environment, issues of openness, security, and resource-control hardly apply.

(Diagram: Functionality, Fault tolerance, Distribution, Openness, Security, Scalability, Resource Control)

Levels of Difficulty - 4

• Multi-tier server architectures

– Fault-tolerance between tiers/clusters, i.e. distributed transactions

– Latencies important, alternative service providers

– In the controlled server environment, issues of openness and resource-control hardly apply.

– Security considerations are lesser because of the lack of openness

(Diagram: Functionality, Fault tolerance, Distribution, Openness, Security, Scalability, Resource Control)
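Distributed transactions between tiers or clusters are classically implemented with the two-phase commit protocol. The sketch below is a minimal in-memory version under simplifying assumptions: the `Participant` class, its `reachable` flag, and the `two_phase_commit` function are hypothetical names, there is no durable log, and a coordinator crash is not modeled. Phase 1 collects votes, and any NO vote or unreachable participant forces an abort; phase 2 broadcasts the decision.

```python
from enum import Enum

class Vote(Enum):
    YES = "yes"
    NO = "no"

class Participant:
    """Hypothetical resource manager in one tier or cluster."""

    def __init__(self, name, can_commit=True, reachable=True):
        self.name = name
        self.can_commit = can_commit  # would this participant vote YES?
        self.reachable = reachable    # simulates a crashed or partitioned site
        self.state = "working"

    def prepare(self):
        """Phase 1: promise to commit (YES) or refuse (NO)."""
        if not self.reachable:
            raise ConnectionError(f"{self.name} is unreachable")
        if self.can_commit:
            self.state = "prepared"
            return Vote.YES
        self.state = "aborted"
        return Vote.NO

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes YES."""
    decision = "commit"
    # Phase 1: collect votes; a NO vote or an unreachable site forces abort.
    for p in participants:
        try:
            vote = p.prepare()
        except ConnectionError:
            vote = Vote.NO
        if vote is Vote.NO:
            decision = "abort"
            break
    # Phase 2: tell every reachable participant the outcome. Unreachable ones
    # would have to learn it on recovery, which this sketch does not model.
    for p in participants:
        if not p.reachable:
            continue
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision

tiers = [Participant("orders-db"), Participant("billing", can_commit=False)]
print(two_phase_commit(tiers))  # abort
```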

Levels of Difficulty - 5

• Virtual community

– End-users add services to a shared environment

– Openness with security is essential

– Resource control important - mobile code

(Diagram: Functionality, Fault tolerance, Distribution, Openness, Security, Scalability, Resource Control)

Distributed Programming Platform - DPP

• DPPs

– language/tools/implementation aimed at providing developers of distributed applications with what they need

– a general-purpose programming system

– more than just a centralized programming system

– subsumes a centralized programming system

Groping for DPPs

• RPC

• Java and offshoots

– Original and pure Java: sharing code across the net

– RMI (based on RPC)

– Enterprise JavaBeans (within a cluster)

– Object Voyager

– Continually evolving

• often because of shortcomings in previous versions (e.g. the security manager in Java 1.1 vs. 1.2)

• CORBA (for interoperability too)

• Erlang

• E-language (system)

• Mozart

• What is the common element?

• What is missing?

How to answer these questions

• Present a vision of what a DPP should be

– A DPP provides 3 basic properties

– The 3 basic properties are not new, only the context is; analogies with programming languages are used

• Examine current tools

– See how they partly fulfill these goals

– Show they fall short.

• Our view: we are at the beginning of DPP development

DPP for distributed global applications

• The DPP abstracts the complexity of the underlying system of connected computers

• Provides transparency/hiding (network and location) as much as possible, or as much as is desirable.

• Provides awareness, i.e. models the aspects of distribution that affect

– performance (e.g. latency)

– reliability (e.g. partial failure)

• Provides control for tuning the application with respect to fundamental tradeoffs in distributed systems

– e.g. the consistency protocol for state
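The three properties can be illustrated on a single remote call. In the Python sketch below, which is a toy under stated assumptions rather than how any particular DPP works, `RemoteProxy` makes a method on another site look like a local call (transparency), reports partial failure as a distinct `PartialFailure` exception and records the observed latency (awareness), and exposes a retry count as a tuning knob (control). `SimulatedLink`, `RemoteProxy`, `PartialFailure`, and `Tally` are hypothetical names; the network is simulated with `time.sleep` and a seeded random number generator.

```python
import random
import time

class PartialFailure(Exception):
    """Awareness: a failure of the remote site gets its own exception type."""

class SimulatedLink:
    """Hypothetical network link that adds latency and may lose calls."""

    def __init__(self, latency_s, failure_rate, seed=0):
        self.latency_s = latency_s
        self.failure_rate = failure_rate
        self._rng = random.Random(seed)

    def send(self, method, *args):
        time.sleep(self.latency_s)  # communication delay
        if self._rng.random() < self.failure_rate:
            raise ConnectionError("link failed")  # lost before the method ran
        return method(*args)

class RemoteProxy:
    """Transparency: methods of a remote object are called like local ones."""

    def __init__(self, target, link, retries=0):
        self._target = target
        self._link = link
        self.retries = retries      # control: attempts beyond the first
        self.last_latency_s = None  # awareness: cost of the last successful call

    def __getattr__(self, name):
        method = getattr(self._target, name)

        def call(*args):
            for _ in range(self.retries + 1):
                start = time.perf_counter()
                try:
                    result = self._link.send(method, *args)
                except ConnectionError:
                    continue  # try again, up to the configured limit
                self.last_latency_s = time.perf_counter() - start
                return result
            raise PartialFailure(f"{name}() failed after {self.retries + 1} attempt(s)")

        return call

class Tally:
    """Ordinary local class used as the remote object."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n
        return self.value

tally = RemoteProxy(Tally(), SimulatedLink(latency_s=0.01, failure_rate=0.0))
print(tally.add(5))  # 5
```

Retrying is only safe here because a simulated failure happens before the method runs. On a real network a lost reply cannot be told apart from a lost request, which is exactly the kind of new failure semantics that awareness has to expose.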

Connected Computers

(Diagram with the labels The applications, The DPP runtime, The Network)

Transparent View

(Diagram: a single application over three machines, the DPP, and the communication medium)

The network and individual computers are abstracted away

Programmer sees a global computation space

Awareness View

Fundamental aspects of distribution are presented to the programmer as abstractly and simply as possible, without losing necessary information.

(Diagram: a single application over three machines, each with its own DPP, connected by middleware)

Control View

(Diagram: a single application over three machines, each with its own DPP, connected by middleware)

The control necessary to tune performance is available.

Litmus test:

It should not be possible to improve performance much by removing the middleware and implementing at a lower level.

Compare: high-level languages and assembler

The Three Principles in Programming Languages

• Transparency/hiding

– Program constructs hide or make transparent

• memory locations

• actual machine instructions

• hardware architecture

– E.g. iteration and recursion in C++

• Awareness

– Programmers have a mental model of performance for logically equivalent program constructs

– E.g. iteration gives better performance by orders of magnitude

• Control

– So basic that we forget this.

– Consider C++, compiled as it is today, but providing only recursion.

– Slower by many orders of magnitude (memory consumption increases)

– Litmus test fails: the programmer would program in assembler instead
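The awareness point can be checked directly. This Python sketch (Python rather than C++, chosen for brevity; CPython performs no tail-call elimination, so every recursive call keeps a stack frame) computes the same sum with recursion and with iteration. The two functions are logically equivalent, but only the iterative one runs in constant stack space; the recursive one fails with `RecursionError` once the input exceeds the interpreter's recursion limit. The function names are illustrative.

```python
import sys

def sum_recursive(n):
    """1 + 2 + ... + n using recursion: one stack frame per call."""
    if n == 0:
        return 0
    return n + sum_recursive(n - 1)

def sum_iterative(n):
    """1 + 2 + ... + n using a loop: constant stack usage."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def recursion_fails(n):
    """Return True if the recursive version runs out of stack for n."""
    try:
        sum_recursive(n)
    except RecursionError:
        return True
    return False

big = sys.getrecursionlimit() * 10
print(sum_iterative(big))    # fine: the loop needs no extra frames
print(recursion_fails(big))  # the recursive version exceeds the limit
```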

DPP in the broadest sense

• Across the entire network, i.e. not just for server-cluster architectures

– Clients, between clusters, between clusters that cross administrative boundaries, even devices

• General-purpose

– For all types of applications

– Compare general-purpose programming languages with domain-specific ones

DPPs and programming languages

• What is the relationship between DPPs and programming languages?

– A DPP is not another word for a programming language

– A DPP subsumes, extends, and adds a new dimension to programming languages

• Traditionally, programming languages are an abstraction of a single machine.

• A DPP abstracts over a set of connected machines

– still includes a set of one, i.e. a single machine

– still includes basic computation, for functionality

– it is natural to base DPPs on an existing programming language (no reinventing the wheel)

Extension

• DPPs introduce many more abstractions that are not needed in centralized programming languages, e.g.

• Failure: a shared object may fail due to network partitioning, a crash of another site, etc.

– At the very least, new exceptions

– For sophisticated fault tolerance, error recovery needs to be coupled to the object.

• Resource control: imported code

– Execute a procedure with specified resource limits

• Scalability: moving computations
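One way to picture executing imported code with specified resource limits is a step budget. The Python sketch below assumes, purely for illustration, that imported code is written as a generator that yields once per unit of work; `run_with_limit`, `imported_sum`, and `ResourceLimitExceeded` are hypothetical names. This is cooperative accounting only: a real platform would have to enforce CPU and memory limits without trusting the imported code.

```python
class ResourceLimitExceeded(Exception):
    """Raised when imported code uses more steps than it was granted."""

def run_with_limit(task, max_steps):
    """Drive a generator-based task, allowing at most max_steps yields.

    Returns the task's return value, or raises ResourceLimitExceeded.
    """
    steps = 0
    while True:
        try:
            next(task)  # let the task do one unit of work
        except StopIteration as done:
            return done.value
        steps += 1
        if steps > max_steps:
            task.close()  # stop the imported code; it gets no more steps
            raise ResourceLimitExceeded(f"exceeded {max_steps} steps")

def imported_sum(n):
    """Stand-in for imported code: sums 0..n-1, yielding once per addition."""
    total = 0
    for i in range(n):
        total += i
        yield  # one unit of work
    return total

print(run_with_limit(imported_sum(10), max_steps=100))  # 45
```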

New Dimension

• For awareness and control, DPPs may need to make distinctions between program constructs that the programmer may find

– new

– artificial

– unnatural and burdensome

• Example: object (shared object)

– Choice of consistency protocol: the best choice for performance is application-dependent.

– Three fundamental types, as developed in distributed systems:

• stationary

• mobile, with token protocol

• mobile, with invalidation protocol

– To fulfill the control goals, all 3 kinds are needed.
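The claim that the best consistency protocol is application-dependent can be made concrete by counting messages. The Python sketch below uses a deliberately simplified, hypothetical cost model (no acknowledgements, no lookup of the current owner, every message costs the same): a stationary object answers each remote operation with a request/reply pair; a mobile object with a token protocol moves its single copy to whichever site uses it; a mobile object with an invalidation protocol replicates on reads and invalidates the other copies on writes. The function names and the three workloads are illustrative assumptions.

```python
# An operation is (site, kind) with kind "read" or "write"; home is the
# site where the object starts.

def stationary_cost(ops, home):
    """State never moves: each remote operation is request + reply."""
    return sum(0 if site == home else 2 for site, _ in ops)

def token_cost(ops, home):
    """Mobile, token protocol: the single copy migrates to the accessing
    site (request + state transfer = 2 messages)."""
    holder, cost = home, 0
    for site, _ in ops:
        if site != holder:
            cost += 2
            holder = site
    return cost

def invalidation_cost(ops, home):
    """Mobile, invalidation protocol: reads fetch a copy (2 messages);
    a write fetches if needed and invalidates every other copy (1 each)."""
    copies, cost = {home}, 0
    for site, kind in ops:
        if kind == "read":
            if site not in copies:
                cost += 2
                copies.add(site)
        elif kind == "write":
            if site not in copies:
                cost += 2
            cost += len(copies - {site})
            copies = {site}
        else:
            raise ValueError(f"unknown operation kind: {kind!r}")
    return cost

PROTOCOLS = {
    "stationary": stationary_cost,
    "mobile/token": token_cost,
    "mobile/invalidation": invalidation_cost,
}

def best_protocol(ops, home=0):
    costs = {name: cost_fn(ops, home) for name, cost_fn in PROTOCOLS.items()}
    return min(costs, key=costs.get), costs

HOME = 0
single_site = [(1, "read")] * 5 + [(1, "write")] * 5
read_mostly = [(site, "read") for _ in range(3) for site in (1, 2, 3)]
alternating_with_home = [(site, "write") for _ in range(3) for site in (0, 1)]

for label, ops in [("single site", single_site),
                   ("read-mostly", read_mostly),
                   ("alternating with home", alternating_with_home)]:
    winner, costs = best_protocol(ops, HOME)
    print(f"{label}: {costs} -> best: {winner}")
```

Under this model, one remote site working intensively favors the token protocol, read-mostly sharing favors invalidation, and fine-grained alternation with the home site favors the stationary object, which is the argument for offering all three kinds.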

New Dimension -2

• The burden of the new distinctions depends on the programming language on which the middleware is based.

• Example: object (shared object)

– Stateful vs. stateless (in pure object-oriented languages): for efficiency across the network, the platform needs to know which information is stateless.

• Stateless information can be replicated across the net

• No consistency protocol

• No infrastructure for consistency protocol.

– Synchronous vs. asynchronous

• New dimension: latency.
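Latency is where synchronous and asynchronous calls differ most. In the Python sketch below, `remote_call` is a hypothetical remote operation whose round trip is simulated with `asyncio.sleep`, and the 50 ms latency is an arbitrary assumption. Issuing three calls one after another costs roughly three round trips, while issuing them concurrently with `asyncio.gather` overlaps the waits.

```python
import asyncio
import time

LATENCY_S = 0.05  # assumed round-trip latency of one remote call

async def remote_call(x):
    """Hypothetical remote procedure: one round trip, then returns x * x."""
    await asyncio.sleep(LATENCY_S)
    return x * x

async def synchronous_style(xs):
    # Each call waits for its reply before the next is sent: latencies add up.
    return [await remote_call(x) for x in xs]

async def asynchronous_style(xs):
    # All calls are issued at once; their latencies overlap.
    return list(await asyncio.gather(*(remote_call(x) for x in xs)))

def timed(style, xs):
    """Run one calling style to completion; return (results, seconds)."""
    start = time.perf_counter()
    results = asyncio.run(style(xs))
    return results, time.perf_counter() - start

for style in (synchronous_style, asynchronous_style):
    results, elapsed = timed(style, [1, 2, 3])
    print(f"{style.__name__}: {results} in {elapsed:.3f} s")
```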

Minimality

• A distributed programming language should also be as similar to a programming language as possible

– without losing awareness and control!

• Minimal extensions, and minimal new dimensions.

The goal of a DPP: separation of aspects

(Two diagrams, each showing Functionality, Fault tolerance, Distribution, Security, Openness, Resource Control, Scalability)
