S2
Big Switch: Power
Burden Iron Works
Edison Power Plant & Power Grid
S3
Big Switch: Computing
Corporate Data Center
Cloud Computing Center & Internet
PC
S4
Definition of Cloud Computing
“A pool of abstracted, highly scalable, and managed compute infrastructure capable of hosting end-customer applications and billed by consumption”
- “Is Cloud Computing Ready for The Enterprise?”, Forrester Research
“A style of computing where IT-related capabilities are provided ‘as a service’, allowing users to access technology-enabled services from the Internet (‘in the cloud’) without knowledge of, expertise with, or control over the technology infrastructure that supports them”
- “Cloud Computing”, Wikipedia
“Cloud computing is an emerging approach to shared infrastructure in which large pools of systems are linked together to provide IT services”
- Press release on “Blue Cloud”, IBM
“A paradigm in which information is permanently stored in servers on the Internet and cached temporarily on clients that include desktops, entertainment centers, table computers, notebooks, wall computers, handhelds, etc”
- “ORGs for Scalable, Robust, Privacy-Friendly Client Cloud Computing”, IEEE Internet Computing
S5
My Definition
• For companies
– Providing IT infrastructure and an environment to develop, host, and run services and apps on demand, with pay-as-you-go pricing, as a service
• For end-users
– Providing resources and services to store data and run applications, on various devices, anytime, anywhere, as a service
S6
Features
• Prescripted & Abstracted Infrastructure
• Fully Virtualized
• Equipped with Dynamic Infrastructure Software
• Pay by Consumption
• Free of Long-Term Contracts
• Application and OS Independent
• Free of Software or Hardware Installation
Source: “Is Cloud Computing Ready For The Enterprise”, Forrester Research
S7
Advantages
• Economies of scale
• Cost
– No upfront CapEx (Capital Expenditure)
– Pay-as-you-go pricing model
• Scalability
– Scale capacity on demand
– Handling dynamic workloads
• Productivity
– Easy to use
– Reduced time-to-market
• Maintenance
– Easy or no management
– Instant software updates
S8
The Evolution of Computing
Grid Computing → Utility Computing → Cloud Computing
S9
Grid + Utility + Autonomic = Cloud Computing
• Grid Computing
– A form of distributed computing whereby a "super and virtual computer" is composed of a cluster of networked, loosely-coupled computers, acting in concert to perform very large tasks
• Utility Computing
– The packaging of computing resources, such as computation and storage, as a metered service similar to a traditional public utility such as electricity
• Autonomic Computing
– Computer systems capable of self-management
“Cloud Computing”, Wikipedia
S10
Difference from Previous Computing
Grid Computing, Utility Computing, Cloud Computing
Enterprise, End-User
As a Service
S11
Trends: x-Computing
Google Trends
S12
Market Volume Estimation
Year 2011: $160 billion =
$95 billion (business and productivity apps: e-mail, office, CRM, etc.)
+ $65 billion (online advertising)
“The Cloud Wars: $100+ billion at stake”, Merrill Lynch Research note
S13
Classifying Cloud Computing
Cloud Services/Applications (Software as a Service)
Apple MobileMe, Google Apps, Nokia Ovi, Salesforce.com Apps, etc.
Cloud Platform (Platform as a Service)
Google App Engine, force.com, Facebook F8, Bungee Labs, etc.
Cloud Infrastructure (Infrastructure as a Service)
Amazon S3 & EC2, Joyent, GoGrid, AT&T, etc.
Enterprise Cloud Computing
Cloud Computing Software
Hadoop, 3Tera, Xen, VMware, NexR VCC, IBM Blue Cloud, etc.
S14
Cloud Infrastructure
• Definition
– Offering virtualized infrastructure resources such as storage, compute, and network, over the Internet, for services and apps
S15
Players: Cloud Infrastructure
S16
Case Study: Amazon Web Services
Infrastructure as a Service: Simple Queue Service, Simple Storage Service, Elastic Compute Cloud, SimpleDB
Search as a Service: Alexa Web Info. Service, Alexa Top Sites, Alexa Site Thumbnail, Alexa Web Search Platform
Data as a Service: E-Commerce Service, Historical Pricing
People as a Service: Mechanical Turk
The First & Best Cloud Computing
Cloud Infrastructure
S17
Amazon Cloud Infrastructure
S3: $0.15 per GB-Month
EC2: $0.10 per Instance-Hour
SimpleDB: $1.50 per GB-Month
S18
Why Amazon AWS?
• Online video mixing utility
• 25,000 to 250,000 users in 3 days
• At peak, 20,000 new users per hour
• 50 to 4000 instances (servers) in 5 days
• At peak, 40 new instances (servers) per hour
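To put the peak figures in context, the average growth rates implied by the numbers above can be worked out with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes growth was spread over full 24-hour days, and the function name is made up for illustration.

```python
# Average growth rates implied by the slide's figures (illustrative only).
HOURS_PER_DAY = 24


def average_rate_per_hour(start, end, days):
    """Average increase per hour when growing from `start` to `end` over `days` days."""
    return (end - start) / (days * HOURS_PER_DAY)


# 25,000 -> 250,000 users in 3 days
user_rate = average_rate_per_hour(25_000, 250_000, 3)
# 50 -> 4,000 instances in 5 days
instance_rate = average_rate_per_hour(50, 4_000, 5)

print(f"average new users per hour:     {user_rate:.0f} (peak on slide: 20,000)")
print(f"average new instances per hour: {instance_rate:.1f} (peak on slide: 40)")
```

The peak user rate is several times the average, which is the kind of bursty demand that on-demand capacity is meant to absorb.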
S19
Success Story: NY Times
• Image Processing at New York Times
– Convert 11 million articles (1851-1980) from TIFF format into PDF
– Using Amazon S3 and EC2 for HW, Hadoop for SW
(http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/)
S20
NY Times Architecture
TIFF Image (4TB) PDF (1.5TB)
Hadoop MapReduce
Amazon S3
AMI
Amazon EC2 (100 instances)
S21
NY Times Cost
S3
Storage: 5.5 TB
Data Transfer-in: 4 TB
EC2
Instances: 100 X 24 hours
http://calculator.s3.amazonaws.com/calc5.html
Only $ 1,465
Actually under $ 400
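The $1,465 calculator estimate can be reproduced from the unit prices on the "Amazon Cloud Infrastructure" slide. The sketch below rests on three assumptions that are not stated in the slides: a data transfer-in price of $0.10 per GB, decimal terabytes (1 TB = 1000 GB), and storage billed for one full month. The function and constant names are illustrative.

```python
# Prices are kept in US cents so the arithmetic stays exact.
S3_STORAGE_CENTS_PER_GB_MONTH = 15  # $0.15 per GB-month (pricing slide)
EC2_CENTS_PER_INSTANCE_HOUR = 10    # $0.10 per instance-hour (pricing slide)
TRANSFER_IN_CENTS_PER_GB = 10       # ASSUMPTION: price not given in the slides


def estimate_cents(storage_gb, transfer_in_gb, instances, hours, months=1):
    """Return a cost breakdown in cents for storage, transfer-in, and compute."""
    breakdown = {
        "storage": storage_gb * S3_STORAGE_CENTS_PER_GB_MONTH * months,
        "transfer_in": transfer_in_gb * TRANSFER_IN_CENTS_PER_GB,
        "compute": instances * hours * EC2_CENTS_PER_INSTANCE_HOUR,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# 5.5 TB stored, 4 TB transferred in (ASSUMPTION: 1 TB = 1000 GB),
# 100 instances running for 24 hours.
nyt = estimate_cents(storage_gb=5500, transfer_in_gb=4000, instances=100, hours=24)

for item, cents in nyt.items():
    print(f"{item:12s} ${cents / 100:,.2f}")
```

Under these assumptions the compute share is the smallest of the three items; storage for a full month dominates the estimate.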
S22
Cloud Platform (PaaS)
• Definition
– Platform offering all of the facilities required to support the end-to-end life cycle of building and delivering web applications and services entirely available from the Internet, with no software downloads or installation for developers (from “Platform as a service”, Wikipedia)
– Also known as Cloudware
• Supporting functions
– Workflow management
  • Design, development, testing, deployment, hosting, maintenance
– Development tool
  • Web GUI, client SDK, team collaboration, version control, developer community facilitation
– Cloud infrastructure
  • Storage, computation, persistence, state management, scalability, web service integration, database integration, security
S23
Players: Cloud Platform
S24
Case Study: Google App Engine
Run your web applications on Google's infrastructure
Offering infrastructure for free
500 MB storage, 10 GB bandwidth in and out per day, 5 million page views per month
Offering a Python development environment
http://code.google.com/appengine/
S25
Google App Engine
1. Scalable Service Infrastructure
2. Python Runtime and APIs
3. Software Development Kit
4. Web-based Admin Console
5. Scalable Datastore
S26
Scalable Service Infra
GFS: Distributed File System
Bigtable: Distributed Data Store
Commodity PC Cluster
Python Runtime: Service Execution
Services Services Services Services
S27
Software Development Kit
App Engine SDK
Web Server
Python Framework
webapp, Django
API local version
datastore
Google Account
URL Fetch
Uploader
Development, Testing
Deploy
S28
App Engine APIs
S29
Scalable Datastore
Datastore Model Class
GQL Query
S31
Code: Using webapp framework
helloworld.py
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app

    class MainPage(webapp.RequestHandler):
        def get(self):
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.out.write('Hello, webapp World!')

    application = webapp.WSGIApplication([('/', MainPage)], debug=True)

    def main():
        run_wsgi_app(application)

    if __name__ == "__main__":
        main()
app.yaml
    application: helloworld
    version: 1
    runtime: python
    api_version: 1

    handlers:
    - url: /.*
      script: helloworld.py
Testing the app
    google_appengine/dev_appserver.py helloworld/
    http://localhost:8080/
S32
Code: Using User Service
helloworld.py
    from google.appengine.api import users
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app

    class MainPage(webapp.RequestHandler):
        def get(self):
            user = users.get_current_user()

            if user:
                self.response.headers['Content-Type'] = 'text/plain'
                self.response.out.write('Hello, ' + user.nickname())
            else:
                self.redirect(users.create_login_url(self.request.uri))

    application = webapp.WSGIApplication([('/', MainPage)], debug=True)

    def main():
        run_wsgi_app(application)

    if __name__ == "__main__":
        main()
S33
Code: Using Datastore
helloworld.py
    from google.appengine.api import users
    from google.appengine.ext import db
    from google.appengine.ext import webapp

    # Defining the data model
    class Greeting(db.Model):
        author = db.UserProperty()
        content = db.StringProperty(multiline=True)
        date = db.DateTimeProperty(auto_now_add=True)

    # Storing data
    class Guestbook(webapp.RequestHandler):
        def post(self):
            greeting = Greeting()
            if users.get_current_user():
                greeting.author = users.get_current_user()
            greeting.content = self.request.get('content')
            greeting.put()
            self.redirect('/')

    # Querying data
    class MainPage(webapp.RequestHandler):
        def get(self):
            greetings = db.GqlQuery("SELECT * FROM Greeting ORDER BY date DESC LIMIT 10")
            for greeting in greetings:
                if greeting.author:
                    ...  # remainder of the handler is not shown on the slide
S34
Code: Uploading the App
Registering the app
1. Signing in to App Engine: http://appengine.google.com/
2. Creating an app: http://application-id.appspot.com/
Uploading the app
    appcfg.py update helloworld/
    http://application-id.appspot.com
S35
Cloud Software
• Definition
– Software that helps build and run cloud computing services and environments
• A great deal of such software exists
– Especially open source cloud software
– LAMP, Hadoop
S36
Case Study: Hadoop
HDFS: Distributed File System
HBase: Distributed Data Store
MapReduce: Distributed Data Processing
Commodity PC Cluster
Nutch: Open Source Search Engine
Google Search
MapReduce
Bigtable
GFS
Google Platform
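The MapReduce model named in this stack can be illustrated with a single-process word count. The sketch below is plain Python and does not use any Hadoop API; the function names are made up for illustration. In Hadoop the map and reduce steps would run in parallel across the cluster, with input and output stored in HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["cloud computing", "grid computing", "utility computing"])
print(counts)
```

Because each map call sees only one record and each reduce call sees only one key, the framework is free to distribute both steps over many commodity machines.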
S37
Hadoop Ecosystem - Power of Open Platform
Hadoop core: HDFS, MapReduce, HBase, HOD, Streaming, Fuse-DFS, EC2 Support
NexR VCC: Hadoop on Virtualization
Yahoo Pig: Query Language Interface on Hadoop
Yahoo Zookeeper: Distributed Management
IBM MapReduce Tools: Eclipse plug-in for MapReduce programs
Facebook Hive: Data warehousing on Hadoop
Mahout & Hama: Machine Learning using Hadoop MapReduce
Katta: Distributed indexing with Hadoop
Parhely: ORM for HBase
Cascading: Workflow management for Hadoop MapReduce
?
S38
Case Study: NexR VCC
• The first server virtualization tool in Korea
• Features
– Xen-based (paravirtualization & HVM)
– Web-based AJAX Interface
– Multiple Virtual Clusters
  • Allocates virtual clusters (VC) to teams
  • Each team can manage its own VC
– Automatic Hadoop Deployment & Execution
– User/Group Management
– VM Image Store
– VM Migration
– Web Services API
S39
VCC: Hadoop on Virtualization
Auto-Deployment & Execution
Integration with Hadoop management
S40
Caution
Be practical!
S41
Key Issues To Overcome - From Forrester Report
• Concerns about stability
• Few big-name players offering clouds
• Few enterprise reference accounts
• Concerns around security
• Lack of commercial ISV support
• Little geographic locality
• Not for the faint-of-tech
• Not very enterprise friendly
S42
Key Issues To Overcome - Others
• Integration with in-house systems
• Application licensing complexity
• Privacy
• Constant network connectivity
• Confidence in service providers
• Open standards
• Interoperability between services
S43
Cloud Computing Incidents Database
“CloudComputing:Incidents Database”, Wikipedia
S44
Outage of Cloud Computing
• Amazon S3 Outage
– 8 hours on July 20, 2008 (Affected: all)
– Cause: Design fault (server-to-server communication)
• Flexiscale Outage
– 2 days from August 26, 2008 (Affected: all)
– Cause: Engineer mistake
• Gmail Outage
– 2 hours on August 11, 2008 (Affected: many)
– Cause: Change management
• Apple MobileMe Outage
– Several hours on July 10, 2008 (Affected: many)
– Cause: Migration from .Mac to MobileMe
“CloudComputing:Incidents Database”, Wikipedia
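To relate these outage durations to the usual availability percentages, the sketch below converts hours of downtime into yearly availability. It assumes, purely for illustration, that each listed outage was the only downtime in a 365-day year; real yearly availability for these services is not given in the slides. The MobileMe outage is left out because its duration is only given as "several hours".

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def availability_percent(outage_hours):
    """Yearly availability if `outage_hours` were the only downtime in the year."""
    return 100.0 * (1 - outage_hours / HOURS_PER_YEAR)


# Outage durations in hours, taken from the list above (2 days = 48 hours).
outages = {"Amazon S3": 8, "Flexiscale": 48, "Gmail": 2}

for service, hours in outages.items():
    print(f"{service}: {hours} h down -> {availability_percent(hours):.3f}% available")
```

For comparison, a "three nines" target (99.9%) allows roughly 8.76 hours of downtime per year, so a single 8-hour outage uses up almost all of that budget.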
S45
Closure of Cloud Computing
• MediaMax/Linkup
– Cloud storage service
– Data loss of half of user files in July 2007
– 20,000 paid users affected
– Finally, service closure in July 2008
• Zimki
– Early cloud platform service (from 2006)
– Service closure in December 2007
– Caused by the end of investment
“CloudComputing:Incidents Database”, Wikipedia
http://www.theregister.co.uk/2007/07/30/canon_stalls_fotango/, TheRegister
S46
Where is Cloud Computing?
More political & psychological than technical
S47
Slide Download
http://www.nexr.co.kr
Thank You!!!
Korea Hadoop Community
http://www.hadoop.or.kr