CEG7380 Cloud Computing: Lecture 1
Keke Chen
Outline
• Syllabus
  • Scope of this course
  • Tentative schedule
  • Prerequisites
  • Resources
  • Assignments
• Introduction
Scope of this course
• Understand the basic ideas of cloud computing
• Get familiar with
  • Tools
  • Systems
• Get exposed to some research topics
• Two major parts:
  • Processing large data with the cloud
  • Scaling web applications up/down with the cloud
• Note: some programming parts require self-study
Prerequisites
• Some programming skills
  • Java, Python, shell
  • Comfortable with learning new programming frameworks
• Sufficient knowledge of
  • Data structures and databases
  • Operating systems
  • Distributed systems
Assignments and Grading
• Reading papers (~3) (10%)
• Some mini-projects (4~5) (60%)
  • Help you master the concepts
  • Learn to use tools and systems
• Self-motivated research projects are strongly encouraged!
• Final exam (20%)
• Class attendance and discussion (10%)
Resources
• Updated reference list
• In-house Hadoop cluster
• AWS access
  • Coupon code for each student
• Pilot
  • Submitting reading assignments and projects
Tentative Schedule
• Parallel data processing
  • Distributed file systems (GFS, HDFS)
  • MapReduce
  • High-level distributed data management
• Cloud infrastructures
  • Virtualization
  • AWS and Eucalyptus
  • Interactive front-end – Google App Engine
• Cloud security and privacy
• Research topics
• In projects, we will learn to use
  • Hadoop MapReduce, Pig Latin
  • AWS
  • Google App Engine
Cloud Computing: Lecture 1-2
Some slides are borrowed from UC Berkeley RAD Lab
Keke Chen
Outline
• What is cloud computing?
• Why now?
• Cloud killer applications
• Cloud economics
• Challenges and opportunities
  • “Above the Clouds”
  • “Claremont Report”
What is Cloud Computing?
• Old idea: Software as a Service (SaaS)
  • Def: delivering applications over the Internet
• Recently: “[Hardware, Infrastructure, Platform] as a service”
• Utility computing: pay-as-you-use computing
  • Illusion of infinite resources
  • No up-front cost
  • Fine-grained billing (e.g. hourly)
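The pay-as-you-use idea can be made concrete with a little arithmetic. The sketch below is illustrative only: the hourly rate, the job size, and the helper name `hourly_cost` are made-up assumptions, not actual provider prices or APIs, and rounding partial hours up is likewise an assumed billing rule.

```python
import math

def hourly_cost(hours_used, instances, rate_per_hour):
    """Cost under hourly billing: pay only for instance-hours actually used.

    Assumption: partial hours are rounded up to the next full hour.
    """
    billed_hours = math.ceil(hours_used)
    return billed_hours * instances * rate_per_hour

# Hypothetical numbers: 20 instances for a 3.5-hour batch job at $0.10/hour.
RATE = 0.10
job_cost = hourly_cost(3.5, 20, RATE)

# Keeping the same 20 machines running all month (30 days), approximated
# at the same hourly rate, to contrast with paying for one job.
always_on_cost = hourly_cost(24 * 30, 20, RATE)

print(f"one job, pay-as-you-use: ${job_cost:.2f}")
print(f"always on for a month:   ${always_on_cost:.2f}")
```

With no up-front cost, a user who runs only occasional jobs pays for a few instance-hours rather than for a month of idle machines.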
Cloud computing vs. grid computing
• Cloud computing = virtualization + grid + services + utility computing
• Grid computing: resource provisioning, load balancing, parallel processing
• Views of different users
  • System admin/Hadoop users: grid
  • Application owners/service users: service, utility
Users and cloud providers
Why Now?
• Experience with very large datacenters – profitable for cloud providers
  • Economies of scale
• Pervasive broadband Internet
• Fast x86 virtualization
• Pay-as-you-go billing model
• Large user base
  • Online payment
  • Online ads
  • Content distribution
  • Web 2.0 lowers the entry point to e-business
  • More small e-business owners
  • Large user base of clouds
Spectrum of Clouds
• Instruction Set VM (Amazon EC2, 3Tera)
• Bytecode VM (Microsoft Azure)
• Framework VM
  • Google AppEngine, Force.com
[Figure: spectrum running from EC2 (lower-level, less management) through Azure and AppEngine to Force.com (higher-level, more management)]
Cloud Killer Apps
• Mobile and web applications
• Batch processing / MapReduce
• Data analytics (big data)
  • E.g., OLAP, data mining, machine learning
• Extensions of desktop software
  • Matlab, Mathematica
Cloud Economics
• Pay by use instead of provisioning for peak
[Figure: demand and capacity over time for a static data center, where capacity above demand is unused resources, and for a data center in the cloud, where capacity tracks demand]
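The contrast between the two curves can be sketched numerically. The demand series below is invented for illustration, `unused_capacity` is a hypothetical helper, and the cloud case is idealized (capacity follows demand exactly, with no scaling delay).

```python
def unused_capacity(demand, capacity):
    """Total provisioned-but-idle capacity, summed over all time slots."""
    return sum(max(c - d, 0) for d, c in zip(demand, capacity))

# Hypothetical demand (servers needed) over twelve time slots with one peak.
demand = [10, 10, 10, 20, 40, 80, 100, 80, 40, 20, 10, 10]

# Static data center: capacity is fixed at the peak for the whole period.
static_capacity = [max(demand)] * len(demand)

# Data center in the cloud: capacity follows demand (idealized assumption).
elastic_capacity = list(demand)

static_unused = unused_capacity(demand, static_capacity)
elastic_unused = unused_capacity(demand, elastic_capacity)
utilization = sum(demand) / sum(static_capacity)

print(f"static:  {static_unused} unused server-slots, "
      f"utilization {utilization:.0%}")
print(f"elastic: {elastic_unused} unused server-slots")
```

The area between the capacity line and the demand curve is what a peak-provisioned data center pays for without using.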
Economics of Cloud Users
• Risk of over-provisioning: underutilization
[Figure: static data center; demand and capacity over time, with the capacity above demand marked as unused resources]
Economics of Cloud Users
• Heavy penalty for under-provisioning
[Figure: three panels of demand and capacity over time (days 1 to 3), showing lost revenue and then lost users when demand exceeds capacity]
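A toy simulation can show why the penalty compounds: demand above capacity is lost revenue now, and turned-away users may not come back. This is a sketch under stated assumptions; the demand series, the fixed capacity, and the `churn` fraction are all invented, and `simulate` is a hypothetical helper rather than a model from the slides.

```python
def simulate(demand, capacity, churn=0.5):
    """Serve demand with fixed capacity.

    Assumption: a fraction `churn` of the users turned away in a period
    leave permanently and reduce demand in all later periods.
    """
    lost_users = 0.0
    served, unserved = [], []
    for d in demand:
        effective = max(d - lost_users, 0.0)  # demand after earlier departures
        s = min(effective, capacity)          # cannot serve more than capacity
        u = effective - s                     # turned away in this period
        served.append(s)
        unserved.append(u)
        lost_users += churn * u
    return served, unserved, lost_users

# Hypothetical demand across six periods with capacity fixed at 100.
demand = [50, 90, 130, 150, 120, 70]
served, unserved, lost_users = simulate(demand, capacity=100)

print("unserved per period:", unserved)
print("users lost for good:", lost_users)
```

Unserved demand corresponds to the lost-revenue panel; the shrinking demand in later periods corresponds to the lost-users panel.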
Economics of Cloud Providers
• 5-7x economies of scale [Hamilton 2008]

Resource         Cost in Medium DC     Cost in Very Large DC   Ratio
Network          $95 / Mbps / month    $13 / Mbps / month      7.1x
Storage          $2.20 / GB / month    $0.40 / GB / month      5.7x
Administration   ≈140 servers/admin    >1000 servers/admin     7.1x

• Extra benefits
  • Amazon: utilize off-peak capacity
  • Microsoft: sell .NET tools
  • Google: reuse existing infrastructure
Adoption Challenges
Challenge                                         Opportunity
Availability                                      Multiple providers & DCs
Data lock-in                                      Standardization
Data confidentiality, auditability, and privacy   Encryption, VLANs, firewalls; geographical data storage; privacy-preserving data outsourcing
Growth Challenges
Challenge                           Opportunity
Data transfer bottlenecks           FedEx-ing disks, data backup/archival
Performance unpredictability        Improved VM support, flash memory, scheduling VMs
Scalable storage                    Invent scalable store
Bugs in large distributed systems   Invent debugger that relies on distributed VMs
Scaling quickly                     Invent auto-scaler that relies on ML; snapshots
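The data transfer bottleneck, and why shipping disks can be an opportunity, comes down to simple arithmetic. The data size, link speed, and courier time below are assumed values chosen for illustration, and `transfer_days` is a hypothetical helper.

```python
def transfer_days(data_terabytes, link_mbps):
    """Days needed to push data over a network link at a sustained rate."""
    bits = data_terabytes * 1e12 * 8      # decimal terabytes -> bits
    seconds = bits / (link_mbps * 1e6)    # Mbps -> bits per second
    return seconds / 86_400               # seconds -> days

# Hypothetical scenario: 10 TB over a sustained 20 Mbps WAN link,
# compared with an assumed 1-day courier delivery of the same disks.
network_days = transfer_days(10, 20)
SHIPPING_DAYS = 1

print(f"network transfer: {network_days:.1f} days")
print(f"shipping disks:   {SHIPPING_DAYS} day(s)")
```

Under these assumptions the network transfer takes weeks while the courier takes a day, which is the gap the "FedEx-ing disks" entry points at.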
Policy and Business Challenges
Challenge                 Opportunity
Reputation fate sharing   Offer reputation-guarding services like those for email
Software licensing        Pay-for-use licenses; bulk use sales
Research Challenges Mentioned by the Database Community (Claremont Report)
Functionality and operational cost
• Background: compare massive-scale data-intensive computing systems with today’s DBMS
• Limited functionality
  • Simple APIs (e.g. MapReduce)
  • Pushes more burden onto developers
• Benefits
  • Easier to manage
  • Lower operational cost
  • Service Level Agreement (SLA) that is hard to provide for a SQL DBMS
• P.S. DB systems are notorious for their installation and maintenance expenses.
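To see what a "simple API" means in practice, here is a minimal single-process sketch of the MapReduce programming model. It is not Hadoop's API; `map_phase`, `shuffle`, and `reduce_phase` are hypothetical stand-ins for what a real framework does across many machines. The developer supplies only a mapper and a reducer, and anything beyond that pattern (joins, indexes, query optimization) is the developer's burden.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every input record, collecting (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}

# The developer writes only these two functions (word count example).
def wc_mapper(line):
    return [(word.lower(), 1) for word in line.split()]

def wc_reducer(word, counts):
    return sum(counts)

lines = ["the cloud scales", "the cloud bills by the hour"]
counts = reduce_phase(shuffle(map_phase(lines, wc_mapper)), wc_reducer)
print(counts)
```

Word count fits the model naturally; a SQL-style join or aggregate with several grouping levels would have to be hand-built from the same two functions.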
Manageability
• Features of cloud systems
  • Limited human intervention
  • High-variance workloads
  • A variety of shared infrastructures
  • No DBAs or administrators to assist developers
• Systems need to do work automatically
  • Self-managing
  • Adaptive (autonomous) computing
Data security and privacy
• Users sharing physical resources in a cloud
  • Protect from each other (security)
  • Protect from curious cloud providers (privacy)
• Success may depend on specific target usage scenarios
• Examples
  • Query-based services
  • Mining-based services
Datasets over multiple clouds
• Interesting datasets might be available in different clouds
  • Different cloud providers
  • Private or public clouds
• Services mashing up datasets
  • Inevitably crossing clouds
• Federated cloud architectures
Algorithms on Big Data
• Working on “Big Data”
  • Data mining
  • Machine learning
  • Visualization
• Traditionally assume data is in flat files or relational databases
• Distributed data organization poses new challenges
  • Redesign algorithms
  • Redesign frameworks