Distributed systems [Fall 2014]
G22.3033-002
Lec 1: Course Introduction
Waitlist status
• Course admittance priority: Ph.D., M.S.
• If you are not going to take the class, drop early to let others in
Class staff
• Instructor: Prof. Jinyang Li (me)
– [email protected]
– Office hour: Wed 4-5pm (715 Bway Rm 708)
• Instructional Assistant: Yang Cui
– [email protected]
– Office hour: Thu 4-5pm (715 Bway Rm 707)
Background
• What I assume you already know:
– OS organization
– Programming experience in C or C++
– Concurrency and threading
– Programming w/ sockets, TCP/IP
Course readings
• No official textbook
• Lectures are based on research papers
– Check webpage for schedules
• Useful reference books
– Principles of Computer System Design (Saltzer and Kaashoek)
– Distributed Systems (Tanenbaum and Steen)
– Advanced Programming in the UNIX Environment (Stevens)
– UNIX Network Programming (Stevens)
Meeting times & Lecture structure
• Tuesdays 5:10-7pm
– With a 10-minute break in the middle
• Lectures cover basic concepts, followed by paper discussion
– Read assigned papers before lecture
• Sometimes the instructional assistant will lead a 30-minute discussion on labs
Important addresses
• URL: http://www.news.cs.nyu.edu/~jinyang/fa14-ds
– Check regularly for schedule
• We’ll use Piazza.com for making announcements and conducting discussion
How are you evaluated?
• Participation 10%
• Labs 40%
• Quizzes 50%
– Mid-term and final (90 minutes each)
Using Piazza
• Please post all questions on Piazza instead of emailing course staff
• You can make your post either private (only staff can see it) or public (visible to the whole class)
• We encourage you to make public posts
– The whole class benefits from seeing your question and its answer
Participation
• Participation is 10% of your final grade
1. Paper summary submitted (before lecture) via Piazza
• Summarize the assigned paper before class
– 3 things you’ve learnt from the paper
– 1 weakness of the paper
– Answer the assigned question (if there is one)
2. In-class participation
3. Piazza discussion
• Asking questions and answering others’ questions
Questions?
What are distributed systems?
• Examples?
Multiple hosts
A local or wide area network
Machines communicate to provide some service for applications
Why distributed systems? For ease of use
• Handle geographic separation
• Provide users (or applications) with location transparency:
– Web: access information with a few “clicks”
– Network file system: access files on remote servers as if they are on a local disk, share files among multiple computers
Why distributed systems? For availability
• Build a reliable system out of unreliable parts
– Hardware can fail: power outage, disk failures, memory corruption, network switch failures…
– Software can fail: bugs, mis-configuration, upgrade…
– How to achieve 0.99999 availability?
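A little arithmetic helps put the 0.99999 target in perspective. The sketch below is only a back-of-the-envelope model: it assumes replicas fail independently (often untrue in practice, since power and network failures tend to be correlated) and that the service is up whenever at least one replica is up. The function names and the 0.99 per-machine availability are illustrative assumptions, not figures from the lecture.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability):
    """Expected unavailable minutes per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def replicated_availability(single, replicas):
    """Availability of `replicas` copies, ASSUMING independent failures
    and that one live copy is enough to serve requests."""
    return 1.0 - (1.0 - single) ** replicas


def replicas_needed(single, target):
    """Smallest replica count whose combined availability meets `target`."""
    if not (0.0 < single <= 1.0 and 0.0 <= target < 1.0):
        raise ValueError("need 0 < single <= 1 and 0 <= target < 1")
    n = 1
    while replicated_availability(single, n) < target:
        n += 1
    return n


print(downtime_minutes_per_year(0.99999))  # roughly 5.3 minutes per year
print(replicas_needed(0.99, 0.99999))      # 3, under the independence assumption
```

Under this simplified model, three machines that are each up 99% of the time would be enough; real systems need more care because failures are correlated and because keeping replicas consistent is itself hard.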
Why distributed systems? For scalable capacity
• Aggregate resources of many computers
– CPU: MapReduce, Spark, Grid computing
– Bandwidth: Akamai CDN, BitTorrent
– Disk: Google File System, Hadoop File System
Why distributed systems? For modular functionality
• Each service only needs to accomplish a single task well
– Authentication server
– Backup server
• Compose multiple simple services to achieve sophisticated functionality
– A distributed file system: a block service + a meta-data lookup service
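The file system composition can be sketched in a few lines. This is a toy, single-process illustration, not the design of any particular system: the class names (`BlockService`, `MetadataService`, `FileSystem`), their methods, and the 4-byte block size are all hypothetical, and in a real deployment each service would run on separate servers and be reached over the network.

```python
class BlockService:
    """Stores fixed-size blocks by id; knows nothing about files."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self._blocks = {}
        self._next_id = 0

    def write_block(self, data):
        if len(data) > self.block_size:
            raise ValueError("data larger than one block")
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = data
        return block_id

    def read_block(self, block_id):
        return self._blocks[block_id]


class MetadataService:
    """Maps a file name to the ordered list of block ids holding its data."""

    def __init__(self):
        self._files = {}

    def set_blocks(self, name, block_ids):
        self._files[name] = list(block_ids)

    def lookup(self, name):
        return self._files[name]


class FileSystem:
    """Composes the two simple services into a file read/write interface."""

    def __init__(self, blocks, metadata):
        self.blocks = blocks
        self.metadata = metadata

    def write(self, name, data):
        size = self.blocks.block_size
        # Split the file into block-sized chunks and store each chunk.
        ids = [self.blocks.write_block(data[i:i + size])
               for i in range(0, len(data), size)]
        self.metadata.set_blocks(name, ids)

    def read(self, name):
        # Look up the block list, then fetch and concatenate the blocks.
        return b"".join(self.blocks.read_block(b)
                        for b in self.metadata.lookup(name))


fs = FileSystem(BlockService(block_size=4), MetadataService())
fs.write("notes.txt", b"hello world")
print(fs.read("notes.txt"))  # b'hello world'
```

Neither service knows about the other; only `FileSystem` ties them together, which is what lets each one stay simple.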
The downside
“A distributed system is a system in which I can’t do my work because some computer that I’ve never even heard of has failed.”
-- Leslie Lamport
• Much more complex
The important things in distributed systems design
#1: Abstraction & Interface
• Application users access your service via some interface
• An example, a storage service’s API:
– File system (mkdir, readdir, write, read)
– Database (create tables, SQL queries)
– Disk (read block, write block)
• Conflicting goals:
– Simple vs. efficient to implement
#2: Fault Tolerance
• How to keep the system running when some machine is down?
• Does the system still give “correct” service?
• How to incorporate recovered machine correctly?
#3: Consistency
• Contract with apps/users about meaning of operations. Difficult due to:
– Failure, multiple copies of data, concurrency
• E.g. how to keep 2 replicas “identical”
– If one is down, it will miss updates
– If net is broken, both might process different updates
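The two replica failure cases can be reproduced with a deliberately naive replication scheme. The sketch below is a toy model under stated assumptions: `Replica` and `write_all` are hypothetical names, "down" is modeled as a flag, and a network partition is modeled by sending each client's update to only the replica it can reach.

```python
class Replica:
    """A key-value replica that applies updates only while it is up."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

    def apply(self, key, value):
        if self.up:
            self.data[key] = value
            return True
        return False  # the update is silently missed while the replica is down


def write_all(replicas, key, value):
    """Naive replication: send the update to every replica, ignore failures."""
    return [r.apply(key, value) for r in replicas]


# Case 1: one replica is down and misses an update.
a, b = Replica("a"), Replica("b")
write_all([a, b], "x", 1)
b.up = False
write_all([a, b], "x", 2)
b.up = True  # b recovers, but nobody tells it what it missed
print(a.data, b.data)  # {'x': 2} {'x': 1}

# Case 2: the network is partitioned; each side accepts a different update.
c, d = Replica("c"), Replica("d")
write_all([c], "y", "from client 1")  # client 1 can reach only c
write_all([d], "y", "from client 2")  # client 2 can reach only d
print(c.data == d.data)  # False
```

In both cases the replicas end up disagreeing, and nothing in this scheme detects or repairs the divergence.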
#4: Performance
• Latency & Throughput
• To increase throughput, exploit parallelism
– Many resources exist in multiples: CPU cores, IO and CPU
• To reduce latency,
– Figure out what takes time: queuing, network, storage, some expensive algorithm, many serial steps?
• How much performance is enough?
• How much performance is enough?
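The difference between latency and throughput shows up clearly in a small experiment. In the sketch below, `fetch` is a hypothetical stand-in for a request that mostly waits on the network or disk (simulated with a 50 ms sleep), and the thread pool overlaps those waits. The request count and delay are arbitrary assumptions, and measured numbers will vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(i, delay=0.05):
    """Stand-in for a request that mostly waits on the network or disk."""
    time.sleep(delay)
    return i * i


def run_serial(n):
    """Issue n requests one after another; return results and elapsed seconds."""
    start = time.perf_counter()
    results = [fetch(i) for i in range(n)]
    return results, time.perf_counter() - start


def run_parallel(n, workers):
    """Issue n requests from a pool of worker threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fetch, range(n)))
    return results, time.perf_counter() - start


serial_results, serial_time = run_serial(8)
parallel_results, parallel_time = run_parallel(8, workers=8)

print(f"serial:   {8 / serial_time:.0f} requests/s")
print(f"parallel: {8 / parallel_time:.0f} requests/s")
```

Each individual request still takes about 50 ms, so per-request latency is unchanged; only throughput improves, because the waits overlap instead of queuing behind one another.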