Distributed systems [Fall 2014] G22.3033-002 Lec 1: Course Introduction.


Waitlist status

• Course admittance priority: Ph.D., M.S.

• If you are not going to take the class, drop early to let others in

Class staff

• Instructor: Prof. Jinyang Li (me)
  – [email protected]
  – Office Hour: Wed 4-5pm (715 Bway Rm 708)

• Instructional Assistant: Yang Cui
  – [email protected]
  – Office Hour: Thu 4-5pm (715 Bway Rm 707)




Background

• What I assume you already know:
  – OS organization
  – Programming experience in C or C++
  – Concurrency and threading
  – Programming w/ sockets, TCP/IP

Course readings

• No official textbook
• Lectures are based on research papers
  – Check webpage for schedules
• Useful reference books
  – Principles of Computer System Design (Saltzer and Kaashoek)
  – Distributed Systems (Tanenbaum and van Steen)
  – Advanced Programming in the UNIX Environment (Stevens)
  – UNIX Network Programming (Stevens)

Meeting times & Lecture structure

• Tuesdays 5:10-7pm – With a 10-minute break in the middle

• Each lecture covers basic concepts, followed by paper discussion
  – Read assigned papers before lecture

• Sometimes the instructional assistant will lead a 30-minute discussion on labs

Important addresses

• URL: http://www.news.cs.nyu.edu/~jinyang/fa14-ds
  – Check regularly for schedule

• We’ll use Piazza.com for making announcements and conducting discussion

How are you evaluated?

• Participation 10%
• Labs 40%
• Quizzes 50%
  – mid-term and final (90 minutes each)

Using Piazza

• Please post all questions on Piazza instead of emailing course staff

• You can make your post either private (only staff can see it) or public (visible to the whole class)

• We encourage you to make public posts
  – The whole class benefits from seeing your question and its answer

Participation

• Participation is 10% of your final grade

1. Paper summary submitted (before lecture) via Piazza
  • Summarize the assigned paper before class
    – 3 things you’ve learnt from the paper
    – 1 weakness of the paper
    – Answer the assigned question (if there is one)
2. In-class participation
3. Piazza discussion
  • Asking questions and answering others’ questions

Questions?

What are distributed systems?

• Examples?

Multiple hosts

A local or wide area network

Machines communicate to provide some service for applications

Why distributed systems? For ease of use

• Handle geographic separation

• Provide users (or applications) with location transparency:
  – Web: access information with a few “clicks”
  – Network file system: access files on remote servers as if they are on a local disk, share files among multiple computers

Why distributed systems? For availability

• Build a reliable system out of unreliable parts
  – Hardware can fail: power outage, disk failures, memory corruption, network switch failures…
  – Software can fail: bugs, misconfiguration, upgrade…
  – How to achieve 0.99999 availability?
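To put the 0.99999 figure in perspective, here is a minimal sketch of the arithmetic. It assumes replicas fail independently and that a single live replica is enough to serve requests; real failures are often correlated, so treat the result as an optimistic bound. The function names and the 0.99 per-machine availability are illustrative assumptions, not values from the lecture.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_per_year(availability):
    """Seconds per year a service may be down at the given availability."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def replicated_availability(single, replicas):
    """Availability of `replicas` copies.

    Assumption: failures are independent and one live copy suffices,
    so the service is down only when every copy is down at once.
    """
    return 1.0 - (1.0 - single) ** replicas


def replicas_needed(single, target):
    """Smallest replica count that reaches `target` under the same assumption."""
    n = 1
    while replicated_availability(single, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Five nines leaves roughly five minutes of downtime per year.
    print(downtime_per_year(0.99999) / 60, "minutes per year")
    # Hypothetical machines that are each up 99% of the time.
    print(replicas_needed(0.99, 0.99999), "replicas needed")
```

Under these assumptions, adding replicas improves availability quickly, which is why replication is the usual starting point; the hard part is keeping the replicas consistent.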

Why distributed systems? For scalable capacity

• Aggregate resources of many computers
  – CPU: MapReduce, Spark, Grid computing
  – Bandwidth: Akamai CDN, BitTorrent
  – Disk: Google File System, Hadoop File System

Why distributed systems? For modular functionality

• Each service only needs to accomplish a single task well
  – Authentication server
  – Backup server

• Compose multiple simple services to achieve sophisticated functionality
  – A distributed file system: a block service + a meta-data lookup service
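The file system composition can be sketched in a few lines. This is a single-process toy, not a real distributed file system: the class names, method names, and the tiny block size are all hypothetical, and each "service" is just an in-memory object standing in for a remote server.

```python
class BlockService:
    """Stores opaque blocks by id; knows nothing about files."""

    def __init__(self):
        self._blocks = {}
        self._next_id = 0

    def put(self, data):
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = bytes(data)
        return block_id

    def get(self, block_id):
        return self._blocks[block_id]


class MetadataService:
    """Maps a file name to the ordered list of block ids holding its data."""

    def __init__(self):
        self._files = {}

    def set_blocks(self, name, block_ids):
        self._files[name] = list(block_ids)

    def lookup(self, name):
        return list(self._files[name])


class SimpleFileSystem:
    """Composes the two services into whole-file read and write."""

    def __init__(self, blocks, meta, block_size=4):
        self.blocks = blocks
        self.meta = meta
        self.block_size = block_size  # hypothetical, tiny for illustration

    def write_file(self, name, data):
        # Split the data into blocks, store each, then record the layout.
        ids = [self.blocks.put(data[i:i + self.block_size])
               for i in range(0, len(data), self.block_size)]
        self.meta.set_blocks(name, ids)

    def read_file(self, name):
        # Ask the metadata service where the file lives, then fetch blocks.
        return b"".join(self.blocks.get(b) for b in self.meta.lookup(name))


if __name__ == "__main__":
    fs = SimpleFileSystem(BlockService(), MetadataService())
    fs.write_file("notes.txt", b"hello world")
    print(fs.read_file("notes.txt"))
```

Neither service knows about the other; only the thin file system layer ties them together, which is the point of the modular design.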

The downside

“A distributed system is a system in which I can’t do my work because some computer that I’ve never even heard of has failed.”

-- Leslie Lamport

• Much more complex

The important things in distributed systems design

#1: Abstraction & Interface

• Application users access your service via some interface

• An example, a storage service’s API:
  – File system (mkdir, readdir, write, read)
  – Database (create tables, SQL queries)
  – Disk (read block, write block)

• Conflicting goals:
  – simple vs. efficient to implement
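One way to see the tension between a simple interface and an efficient one is to layer byte-granularity writes on a disk-style block interface. The sketch below is illustrative only: `BlockDisk`, `write_bytes`, and the 8-byte block size are hypothetical names and values, and the disk is an in-memory list.

```python
BLOCK_SIZE = 8  # hypothetical, tiny so the example is easy to follow


class BlockDisk:
    """The simplest storage interface: read or write one whole block."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.reads = 0
        self.writes = 0

    def read_block(self, n):
        self.reads += 1
        return self.blocks[n]

    def write_block(self, n, data):
        assert len(data) == BLOCK_SIZE
        self.writes += 1
        self.blocks[n] = bytes(data)


def write_bytes(disk, offset, data):
    """Byte-granularity write built on the block interface.

    Every touched block costs one read and one write (read-modify-write),
    even if only a single byte in it changes.
    """
    pos = 0
    while pos < len(data):
        block_no, start = divmod(offset + pos, BLOCK_SIZE)
        n = min(BLOCK_SIZE - start, len(data) - pos)
        block = bytearray(disk.read_block(block_no))
        block[start:start + n] = data[pos:pos + n]
        disk.write_block(block_no, block)
        pos += n


if __name__ == "__main__":
    disk = BlockDisk(4)
    write_bytes(disk, 6, b"abcd")  # straddles blocks 0 and 1
    print(disk.reads, disk.writes)
```

The block interface is trivial to specify and implement, but a client that makes small updates pays for whole-block transfers. A richer interface, such as a file system's, can hide that cost at the price of a more complex implementation.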

#2: Fault Tolerance

• How to keep the system running when some machine is down?

• Does the system still give “correct” service?

• How to incorporate recovered machine correctly?

#3: Consistency

• Contract with apps/users about the meaning of operations. Difficult due to:
  – Failure, multiple copies of data, concurrency

• E.g. how to keep 2 replicas “identical”
  – If one is down, it will miss updates
  – If the network is broken, both might process different updates
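Both failure cases can be reproduced with a deliberately naive replication scheme. The sketch below is a toy under stated assumptions: `Replica`, `write_all`, and the two demo functions are hypothetical names, replicas are in-memory objects, and a broken network is modeled simply by each client reaching only one replica.

```python
class Replica:
    """An in-memory key-value copy that can be marked down."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

    def apply(self, key, value):
        # A replica that is down silently misses the update.
        if self.up:
            self.data[key] = value
            return True
        return False


def write_all(replicas, key, value):
    """Naive replication: send to every reachable replica, ignore the rest."""
    return [r.name for r in replicas if r.apply(key, value)]


def demo_missed_update():
    a, b = Replica("a"), Replica("b")
    write_all([a, b], "x", 1)
    b.up = False               # b crashes
    write_all([a, b], "x", 2)  # only a sees this update
    b.up = True                # b recovers with stale data
    return a.data, b.data


def demo_partition():
    a, b = Replica("a"), Replica("b")
    # The network is broken: each client can reach only one replica.
    write_all([a], "y", "client-1")
    write_all([b], "y", "client-2")
    return a.data, b.data


if __name__ == "__main__":
    print(demo_missed_update())
    print(demo_partition())
```

In both runs the replicas end up with different values for the same key, and nothing in this scheme detects or repairs the divergence. Deciding what the system should promise in these situations is the consistency contract.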

#4: Performance

• Latency & Throughput

• To increase throughput, exploit parallelism
  – Many resources exist in multiples
    • CPU cores, I/O and CPU

• To reduce latency,
  – Figure out what takes time: queuing, network, storage, some expensive algorithm, many serial steps?

• How much performance is enough?
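A back-of-the-envelope model makes both ideas concrete. The sketch below assumes a request is a chain of serial steps and that parallel requests do not contend with each other, so the throughput figure is an upper bound. The function names and all millisecond values are made up for illustration.

```python
def latency_breakdown(steps):
    """Total latency of serial steps and each step's share, largest first.

    `steps` maps a step name to its duration in milliseconds.
    """
    total = sum(steps.values())
    shares = sorted(((name, ms / total) for name, ms in steps.items()),
                    key=lambda pair: pair[1], reverse=True)
    return total, shares


def max_throughput(latency_ms, parallelism):
    """Upper bound on requests per second.

    Assumption: `parallelism` requests proceed at once without contention,
    each taking `latency_ms` from start to finish.
    """
    return parallelism * 1000.0 / latency_ms


if __name__ == "__main__":
    # Hypothetical per-request costs in milliseconds.
    steps = {"queuing": 2, "network": 1, "storage": 10, "compute": 2}
    total, shares = latency_breakdown(steps)
    print(total, shares[0])           # the largest share is the place to look first
    print(max_throughput(total, 8))   # more parallelism raises throughput, not latency
```

Note the asymmetry: parallelism multiplies throughput but leaves per-request latency unchanged, so reducing latency means shrinking or removing the dominant serial step.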

