2008. 10. 23 Jaesun Han (NexR CEO & Founder) jshan0000 ... · Salesforce.com Apps, etc Cloud...

Post on 27-May-2020

S1

2008. 10. 23

Jaesun Han (NexR CEO & Founder)

jshan0000@gmail.com

http://www.nexr.co.kr

S2

Big Switch: Power

Burden Iron Works

Edison Power Plant & Power Grid

S3

Big Switch: Computing

Corporate Data Center

Cloud Computing Center & Internet

PC

S4

Definition of Cloud Computing

“A pool of abstracted, highly scalable, and managed compute infrastructure capable of hosting end-customer applications and billed by consumption”

- “Is Cloud Computing Ready for The Enterprise?”, Forrester Research

“A style of computing where IT-related capabilities are provided ‘as a service’, allowing users to access technology-enabled services from the Internet (‘in the cloud’) without knowledge of, expertise with, or control over the technology infrastructure that supports them”

- “Cloud Computing”, Wikipedia

“Cloud computing is an emerging approach to shared infrastructure in which large pools of systems are linked together to provide IT services”

- Press release on “Blue Cloud”, IBM

“A paradigm in which information is permanently stored in servers on the Internet and cached temporarily on clients that include desktops, entertainment centers, table computers, notebooks, wall computers, handhelds, etc”

- “ORGs for Scalable, Robust, Privacy-Friendly Client Cloud Computing”, IEEE Internet Computing

S5

My Definition

• For companies

– Providing IT infrastructure and environment to develop/host/run services and apps, on demand, with pay-as-you-go pricing, as a service

• For end-users

– Providing resources and services to store data and run applications, on various devices, anytime, anywhere, as a service

S6

Features

• Prescripted & Abstracted Infrastructure

• Fully Virtualized

• Equipped with Dynamic Infrastructure Software

• Pay by Consumption

• Free of Long-Term Contracts

• Application and OS Independent

• Free of Software or Hardware Installation

Source: “Is Cloud Computing Ready For The Enterprise”, Forrester Research

S7

Advantages

• Economies of scale

• Cost

– No upfront CapEx (Capital Expenditure)

– Pay-as-you-go pricing model

• Scalability

– Scale capacity on demand

– Handling dynamic workloads

• Productivity

– Easy to use

– Reduced time-to-market

• Maintenance

– Easy or no management

– Instant software updates

S8

The Evolution of Computing

Grid Computing

Utility Computing

Cloud Computing

S9

Grid + Utility + Autonomic = Cloud Computing

• Grid Computing

– A form of distributed computing whereby a "super and virtual computer" is composed of a cluster of networked, loosely-coupled computers, acting in concert to perform very large tasks

• Utility Computing

– The packaging of computing resources, such as computation and storage, as a metered service similar to a traditional public utility such as electricity

• Autonomic Computing

– Computer systems capable of self-management

Source: “Cloud Computing”, Wikipedia

S12

Market Volume Estimation

Year 2011

$160 billion = $95 billion (business and productivity apps: e-mail, office, CRM, etc.) + $65 billion (online advertising)

“The Cloud Wars: $100+ billion at stake”, Merrill Lynch Research note

S13

Classifying Cloud Computing

Cloud Services/Applications (Software as a Service)

Apple MobileMe, Google Apps, Nokia Ovi, Salesforce.com Apps, etc

Cloud Platform (Platform as a Service)

Google App Engine, force.com, Facebook F8, Bungee Labs, etc

Cloud Infrastructure (Infrastructure as a Service)

Amazon S3 & EC2, Joyent, GoGrid, AT&T, etc

Enterprise Cloud Computing

Cloud Computing Software

Hadoop, 3Tera, Xen, VMware, NexR VCC, IBM Blue Cloud, etc

S14

Cloud Infrastructure

• Definition

– Offering virtualized infrastructure resources such as storage, compute, and network, over the Internet for services and apps

S15

Players: Cloud Infrastructure

S16

Case Study: Amazon Web Services

Simple Queue Service

Simple Storage Service

Elastic Compute Cloud

SimpleDB

Infrastructure as a Service

Alexa Web Info. Service

Alexa Top Sites

Alexa Site Thumbnail

Alexa Web Search Platform

Search as a Service

E-Commerce Service

Historical Pricing

Data as a Service

Mechanical Turk

People as a Service

The First & Best Cloud Computing

Cloud Infrastructure

S17

Amazon Cloud Infrastructure

S3: $0.15 per GB-Month

EC2: $0.10 per Instance-Hour

SimpleDB: $1.50 per GB-Month

S18

Why Amazon AWS?

• Online video mixing utility

• 25,000 to 250,000 users in 3 days

• At peak, 20,000 new users per hour

• 50 to 4000 instances (servers) in 5 days

• At peak, 40 new instances (servers) per hour
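As a rough reading of these figures, the average growth rates can be worked out and set against the quoted peaks. This is only a sketch: it assumes growth is measured over the full 3-day and 5-day windows (72 and 120 hours), and the function name is illustrative rather than anything from the slides.

```python
def average_hourly_growth(start, end, days):
    """Average increase per hour when a count grows from start to end over `days` days."""
    return (end - start) / (days * 24)

# Figures are taken from the slide; the 24-hour day is the only added assumption.
users_per_hour = average_hourly_growth(25_000, 250_000, 3)
instances_per_hour = average_hourly_growth(50, 4_000, 5)

print(f"average new users per hour: {users_per_hour:.0f} (quoted peak: 20,000)")
print(f"average new instances per hour: {instances_per_hour:.1f} (quoted peak: 40)")
```

The gap between the average and peak user growth is the kind of bursty workload that on-demand capacity is meant to absorb.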

S19

Success Story: NY Times

• Image Processing at New York Times

– Convert 11 million articles (1851-1980) from TIFF format into PDF

– Using Amazon S3 and EC2 for HW, Hadoop for SW

(http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/)

S20

NY Times Architecture

TIFF Image (4TB) PDF (1.5TB)

Hadoop MapReduce

Amazon S3

AMI

Amazon EC2 (100 instances)

S21

NY Times Cost

Storage: 5.5 TB

Data Transfer-in: 4 TB

Instances: 100 X 24 hours

S3 EC2

http://calculator.s3.amazonaws.com/calc5.html

Only $ 1,465

Actually under $ 400
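The $1,465 estimate can be reproduced with a few lines of arithmetic. The sketch below uses the S3 and EC2 rates from the pricing slide; the $0.10 per GB transfer-in rate, the use of decimal terabytes (1 TB = 1000 GB), and billing storage for a single month are assumptions that are not stated on the slides, chosen because they reproduce the quoted total. Function and constant names are illustrative.

```python
# Prices are kept in US cents so the arithmetic stays exact.
S3_STORAGE_CENTS_PER_GB_MONTH = 15   # pricing slide: $0.15 per GB-Month
EC2_CENTS_PER_INSTANCE_HOUR = 10     # pricing slide: $0.10 per Instance-Hour
TRANSFER_IN_CENTS_PER_GB = 10        # ASSUMPTION: rate not shown on the slides
GB_PER_TB = 1000                     # ASSUMPTION: decimal terabytes

def estimate_cost_cents(storage_tb, transfer_in_tb, instances, hours):
    """Estimate one month of storage plus a one-off transfer and compute run."""
    storage = round(storage_tb * GB_PER_TB) * S3_STORAGE_CENTS_PER_GB_MONTH
    transfer = round(transfer_in_tb * GB_PER_TB) * TRANSFER_IN_CENTS_PER_GB
    compute = instances * hours * EC2_CENTS_PER_INSTANCE_HOUR
    return storage + transfer + compute

total_cents = estimate_cost_cents(storage_tb=5.5, transfer_in_tb=4,
                                  instances=100, hours=24)
print(f"estimated cost: ${total_cents / 100:,.2f}")
```

Under these assumptions the breakdown is $825 for storage, $400 for transfer-in, and $240 for compute.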

S22

Cloud Platform (PaaS)

• Definition

– Platform offering all of the facilities required to support the end-to-end life cycle of building and delivering web applications and services entirely available from the Internet, with no software downloads or installation for developers (from “Platform as a service”, Wikipedia)

– Also known as Cloudware

• Supporting functions

– Workflow management

• Design, development, testing, deployment, hosting, maintenance

– Development tool

• Web GUI, Client SDK, Team collaboration, version control, developer community facilitation

– Cloud infrastructure

• Storage, computation, persistence, state management, scalability, web service integration, database integration, security

S23

Players: Cloud Platform

S24

Case Study: Google App Engine

Run your web applications on Google's infrastructure

Offering infrastructure for free

500 MB Storage, 10 GB Bandwidth In & Out/day, 5 million page views/month

Offering Python development environment

http://code.google.com/appengine/

S25

Google App Engine

1. Scalable Service Infrastructure

2. Python Runtime and APIs

3. Software Development Kit

4. Web-based Admin Console

5. Scalable Datastore

S26

Scalable Service Infra

GFS: Distributed File System

Bigtable: Distributed Data Store

Commodity PC Cluster

Python Runtime: Service Execution

Services

S27

Software Development Kit

App Engine SDK

Web Server

Python Framework

webapp, Django

API local version

datastore

Google Account

URL Fetch

Mail

Uploader

Development, Testing

Deploy

S28

App Engine APIs

S29

Scalable Datastore

Datastore Model Class

GQL Query

S30

Web-based Admin Console

S31

Code: Using webapp framework

helloworld.py

    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app

    class MainPage(webapp.RequestHandler):
        def get(self):
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.out.write('Hello, webapp World!')

    application = webapp.WSGIApplication([('/', MainPage)], debug=True)

    def main():
        run_wsgi_app(application)

    if __name__ == "__main__":
        main()

app.yaml

    application: helloworld
    version: 1
    runtime: python
    api_version: 1

    handlers:
    - url: /.*
      script: helloworld.py

Testing the app

    google_appengine/dev_appserver.py helloworld/

    http://localhost:8080/
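For readers without the App Engine SDK installed, the request/response flow of the helloworld.py handler can be approximated with the Python standard library alone. This is a sketch under that assumption, not the SDK's implementation: `application` here is a plain WSGI callable, and `call_app` is a hypothetical helper that invokes it with a synthetic request instead of starting a server.

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    """Plain WSGI callable that mirrors what MainPage.get() writes."""
    body = b"Hello, webapp World!"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]

def call_app(app, path="/"):
    """Hypothetical helper: call a WSGI app once with a fake GET request."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(response_headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body

status, headers, body = call_app(application)
print(status, headers["Content-Type"], body.decode())
```

To serve it in a browser instead, `wsgiref.simple_server.make_server("", 8080, application).serve_forever()` listens on the same port as the dev_appserver example.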

S32

Code: Using User Service

helloworld.py

    from google.appengine.api import users
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app

    class MainPage(webapp.RequestHandler):
        def get(self):
            user = users.get_current_user()

            if user:
                # Signed-in user: greet by nickname
                self.response.headers['Content-Type'] = 'text/plain'
                self.response.out.write('Hello, ' + user.nickname())
            else:
                # Not signed in: redirect to the Google Account login page
                self.redirect(users.create_login_url(self.request.uri))

    application = webapp.WSGIApplication([('/', MainPage)], debug=True)

    def main():
        run_wsgi_app(application)

    if __name__ == "__main__":
        main()

S33

Code: Using Datastore

helloworld.py

    from google.appengine.api import users
    from google.appengine.ext import db
    from google.appengine.ext import webapp

    # Defining the data model
    class Greeting(db.Model):
        author = db.UserProperty()
        content = db.StringProperty(multiline=True)
        date = db.DateTimeProperty(auto_now_add=True)

    # Storing data
    class Guestbook(webapp.RequestHandler):
        def post(self):
            greeting = Greeting()
            if users.get_current_user():
                greeting.author = users.get_current_user()

            greeting.content = self.request.get('content')
            greeting.put()
            self.redirect('/')

    # Querying data
    class MainPage(webapp.RequestHandler):
        def get(self):
            greetings = db.GqlQuery("SELECT * FROM Greeting ORDER BY date DESC LIMIT 10")
            for greeting in greetings:
                if greeting.author:
                    # The slide is cut off here; the next line is an assumed completion.
                    self.response.out.write(greeting.author.nickname())
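The GQL query above asks for the ten most recent greetings. As a rough illustration of that ordering and limit, here is an in-memory analogue in plain Python. It is a sketch only: the `Greeting` dataclass and `latest_greetings` function are stand-ins invented for this example and do not use the datastore API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

@dataclass
class Greeting:
    """Stand-in for the db.Model class on the slide (not the real datastore type)."""
    content: str
    date: datetime
    author: Optional[str] = None

def latest_greetings(greetings: List[Greeting], limit: int = 10) -> List[Greeting]:
    """In-memory analogue of: SELECT * FROM Greeting ORDER BY date DESC LIMIT 10"""
    return sorted(greetings, key=lambda g: g.date, reverse=True)[:limit]

# Fifteen sample greetings, one minute apart.
base = datetime(2008, 10, 23)
sample = [Greeting(content=f"message {i}", date=base + timedelta(minutes=i))
          for i in range(15)]

newest = latest_greetings(sample)
print([g.content for g in newest])
```

Unlike this list sort, the datastore evaluates the query on the server side, so the application only receives the ten matching entities.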

S34

Code: Uploading the App

Registering the app

1. Signing in to App Engine: http://appengine.google.com/

2. Creating an App: http://application-id.appspot.com/

Uploading the app

    appcfg.py update helloworld/

    http://application-id.appspot.com

S35

Cloud Software

• Definition

– Software that helps build and run cloud computing services and environments

• There are too many software packages to list

– Especially open source cloud software

– LAMP, Hadoop

S36

Case Study: Hadoop

HDFS: Distributed File System

HBase: Distributed Data Store

MapReduce: Distributed Data Processing

Commodity PC Cluster

Nutch: Open Source Search Engine

Google Search

MapReduce

Bigtable

GFS

Google Platform
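The MapReduce layer in this stack can be illustrated with a single-process sketch of the programming model. This is not Hadoop code and uses no Hadoop API; the function names are illustrative, and the example only shows the map, shuffle (group by key), and reduce steps that a real cluster would distribute across machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["cloud computing", "grid computing", "utility computing"])
print(counts)
```

Because each map call sees one record and each reduce call sees one key, both phases can run in parallel on a commodity PC cluster, with HDFS holding the input and output.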

S37

Hadoop Ecosystem - Power of Open Platform

NexR VCC: Hadoop on Virtualization

Yahoo Pig: Query Language Interface on Hadoop

Yahoo Zookeeper: Distributed Management

IBM MapReduce Tools: Eclipse plug-in for MapReduce programs

Facebook Hive: Data warehousing on Hadoop

Mahout & Hama: Machine Learning using Hadoop MapReduce

HDFS, MapReduce

HBase, HOD, Streaming, Fuse-DFS, EC2 Support

Katta: Distributed indexing with Hadoop

Parhely: ORM for HBase

Cascading: Workflow management for Hadoop MapReduce

?

S38

Case Study: NexR VCC

• The first server virtualization tool in Korea

• Features

– Xen-based (paravirtualization & HVM)

– Web-based AJAX Interface

– Multiple Virtual Clusters

• Allocated virtual clusters (VC) to teams

• Each team can manage their VC

– Automatic Hadoop Deployment & Execution

– User/Group Management

– VM Image Store

– VM Migration

– Web Services API

S39

VCC: Hadoop on Virtualization

Auto-Deployment & Execution

Integration with Hadoop management

S40

Caution

Be practical!

S41

Key Issues To Overcome - From Forrester Report

• Concerns about stability

• Few big-name players offering clouds

• Few enterprise reference accounts

• Concerns around security

• Lack of commercial ISV support

• Little geographic locality

• Not for the faint-of-tech

• Not very enterprise friendly

S42

Key Issues To Overcome - Others

• Integration with in-house systems

• Application licensing complexity

• Privacy

• Constant network connectivity

• Confidence in service providers

• Open standards

• Interoperability between services

S43

Cloud Computing Incidents Database

“CloudComputing:Incidents Database”, Wikipedia

S44

Outage of Cloud Computing

• Amazon S3 Outage

– 8 hours on July 20, 2008 (Affected: all)

– Cause: Design fault (server-to-server communication)

• Flexiscale Outage

– 2 days from August 26, 2008 (Affected: all)

– Cause: Engineer mistake

• Gmail Outage

– 2 hours on August 11, 2008 (Affected: many)

– Cause: Change management

• Apple MobileMe Outage

– Several hours on July 10, 2008 (Affected: many)

– Cause: Migration from .Mac to MobileMe

“CloudComputing:Incidents Database”, Wikipedia
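To put these outage durations in perspective, they can be converted into a yearly availability figure. The sketch below assumes, purely for illustration, that each listed outage was the service's only downtime in a 365-day year. The slides do not claim this, so the results are best-case estimates. MobileMe is left out because "several hours" is not a precise duration.

```python
HOURS_PER_YEAR = 365 * 24  # ASSUMPTION: a non-leap year as the measurement period

def availability_percent(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Share of the period during which the service was up, as a percentage."""
    return 100.0 * (1 - downtime_hours / period_hours)

# Durations from the slide, in hours (2 days = 48 hours).
outages = {"Amazon S3": 8, "Flexiscale": 48, "Gmail": 2}

yearly = {name: round(availability_percent(hours), 3)
          for name, hours in outages.items()}
for name, pct in yearly.items():
    print(f"{name}: {pct}% availability if this were the year's only outage")
```

Even a single 8-hour outage keeps a service just above "three nines" (99.9%) for the year, while a 2-day outage drops it below that level.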

S45

Closure of Cloud Computing

• MediaMax/Linkup

– Cloud storage service

– Loss of half of user files in July 2007

– 20,000 paying users affected

– Finally, service closure in July 2008

• Zimki

– Early cloud platform service (from 2006)

– Service closure in December 2007

– Caused by the withdrawal of investment

“CloudComputing:Incidents Database”, Wikipedia

http://www.theregister.co.uk/2007/07/30/canon_stalls_fotango/, TheRegister

S46

Where is Cloud Computing?

More political & psychological than technical

S47

Slide Download

http://www.nexr.co.kr

Thank You!!!

Korea Hadoop Community

http://www.hadoop.or.kr