Practical Cloud Computing
Stefan Tilkov | @stilkov | JBoss One Day Talk 2011
http://www.innoq.com
© 2011 innoQ Deutschland GmbH
Web & Internet
Utility Computing
Grid Computing
Service Orientation
Virtualization
Cloud Computing
“Cloud Computing” is an approach to IT architecture where resources (such
as virtualized hardware, storage capacity, CPU time or higher-level
services) can be dynamically reserved, used, and released over a network,
usually the Internet, as needed.
“Cloudonomics”
Joe Weinman http://gigaom.com/2008/09/07/the-10-laws-of-cloudonomics/
1. Utility services cost less
2. On-demand trumps forecasting
3. Peak of the sum <= sum of the peaks
4. Reduced average unit costs
5. Superiority in numbers
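Law 3 can be checked with a little arithmetic. The sketch below uses made-up hourly demand figures for three hypothetical workloads (the names and numbers are assumptions for illustration, not data from the talk) and compares provisioning each workload for its own peak with provisioning one shared pool.

```python
# Hypothetical hourly demand (in servers) for three independent workloads.
demand = {
    "shop":      [2, 3, 9, 4],
    "reporting": [8, 2, 1, 1],
    "batch":     [1, 7, 2, 6],
}

def sum_of_peaks(workloads):
    """Capacity needed if every workload is provisioned for its own peak."""
    return sum(max(series) for series in workloads.values())

def peak_of_sum(workloads):
    """Capacity needed if all workloads share one pool of servers."""
    totals = [sum(hour) for hour in zip(*workloads.values())]
    return max(totals)

if __name__ == "__main__":
    print("sum of the peaks:", sum_of_peaks(demand))
    print("peak of the sum: ", peak_of_sum(demand))
```

Because the individual peaks fall in different hours, the shared pool needs fewer servers than the separately sized ones.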
Taxonomy
Application
Service
Platform
Virtualization Layer
Hardware
You: Somebody else:
Whatever you call what you do today
Application
Service
Platform
Virtualization Layer
Hardware
You: Somebody else:
Infrastructure as a Service
Application
Service
Platform
Virtualization Layer
Hardware
IaaS
You: Somebody else:
Platform as a Service
Virtualization Layer
Hardware
Application
Service
Platform
PaaS
You: Somebody else:
Software as a Service
Virtualization Layer
Hardware
Application
Service
Platform
SaaS
Cloud Types
Public Cloud
Private Cloud
Hybrid Cloud
Community Cloud
Virtual Private Cloud
Let’s build our own!
How hard can it be?
Step 1: Manual
Physical System
User
IT Ops
Users ask for resources
Manual installation by ops team
“Real” hardware
Step 2: Virtualized
Users ask for resources
Manual installation by ops team
“Virtual” hardware
Pre-built images
Physical System(s)
User
IT Ops
VM VM VM
Image Image Image
Step 3: IT-Supported
Application for user requests
Manual installation by ops team
“Virtual” hardware
Pre-built images
Physical System(s)
User
IT Ops
VM VM
IT Support App
VM
User DB
Image Image Image
Step 4: Automated
Provisioning application for user self-service
Automated installation
“Virtual” hardware
User-defined images
Physical System(s)
User
IT Ops
VM VM VM
User DB
Persistence Store
Image Image Image
Provisioning App
Step 5: Autoscaling
Provisioning application for user self-service
Automated installation by the application
“Virtual” hardware
User-defined images
Physical System(s)
VM VM
Provisioning App
VM
User DB
Persistence Store
Image Image Image
Application
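In step 5 the application itself decides when to ask the provisioning app for more or fewer VMs. One possible decision rule is sketched below, assuming a simple CPU-threshold policy. The function name, thresholds, and limits are all hypothetical, and the actual call to the provisioning app is left out.

```python
def desired_instances(current, avg_cpu, low=0.30, high=0.70, minimum=1, maximum=10):
    """Return the instance count a simple threshold rule would ask for.

    current  -- number of VMs running now
    avg_cpu  -- average CPU utilization across those VMs, between 0.0 and 1.0
    low/high -- hypothetical thresholds for scaling in and out
    """
    if avg_cpu > high:
        target = current + 1      # overloaded: add one VM
    elif avg_cpu < low:
        target = current - 1      # mostly idle: release one VM
    else:
        target = current          # within the comfort zone: do nothing
    # Never go below the minimum or above the maximum pool size.
    return max(minimum, min(maximum, target))

if __name__ == "__main__":
    for cpu in (0.10, 0.50, 0.90):
        print("avg cpu %.2f -> %d instances" % (cpu, desired_instances(4, cpu)))
```

Changing the pool by one VM per evaluation keeps the rule easy to follow. A real policy would probably also need a cool-down period so the pool does not oscillate.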
Step 6: High-level Services
Physical System(s)
User
IT Ops
VM VM VM
User DB
Persistence Store
Image Image Image
Provisioning App
Management
Monitoring
Billing
Multi-tenancy
Shared Images
Licensing
Backup
Load Balancing
Autoscaling
Usage Models
1. Dynamic, virtualized deployment
2. Do-It-Yourself scaling
3. Higher-level services
4. Parallel processing
5. Someone else’s platform
First of all …
0. Don’t do anything
Usage Models
1. Dynamic, virtualized deployment
2. Do-It-Yourself scaling
3. Higher-level services
4. Parallel processing
5. Someone else’s platform
Deploy application without modification
Same tools, same tasks
Programmatic virtualization
“Soft Hardware”
Most popular: Amazon EC2
1. Dynamic, virtualized deployment
Characteristics
Amazon EC2
Elastic Compute Cloud (EC2)
Simple Storage Service (S3)
Elastic Block Store (EBS)
1. Dynamic, virtualized deployment
Advantages
Fast deployment
Easy backup
Simple packaging
Utilization-based licensing models
(Limited) scaling
Re-use of pre-packaged instances
Easy volume snapshots
Increased reliability via availability zones
1. Dynamic, virtualized deployment
Disadvantages
Expensive when used un-elastically
Transient instances
No guarantees for latency between server instances
(Limited) scaling
(Slight) Vendor lock-in
Security concerns
1. Dynamic, virtualized deployment
Use Cases
Reference installations
Performance and load testing
Development services (e.g. build)
Time-limited hosting
Use of pre-packaged images
Rarely used (“exotic”) apps
Usage Models
1. Dynamic, virtualized deployment
2. Do-It-Yourself scaling
3. Higher-level services
4. Parallel processing
5. Someone else’s platform
Distribution across independent instances
No single point of failure
Nothing shared
Everything partitioned
Growable/shrinkable dynamically
2. Do-it-yourself scaling
Characteristics
Simple, distributed persistence
Key/value, document, or column-based
No (distributed) transactions
Eventual consistency
Examples: HBase, Cassandra, Riak, …
2. Do-it-yourself scaling
NoSQL Datastores
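“Eventual consistency” is easier to see with a toy example. The sketch below assumes a last-write-wins rule based on timestamps, which is only one possible reconciliation strategy. The class and function names are hypothetical, and the datastores named above each use their own, more sophisticated mechanisms.

```python
class Replica:
    """One node of a toy key/value store; every value carries a timestamp."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def put(self, key, value, timestamp):
        # Accept the write only if it is newer than what we already hold.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def sync(a, b):
    """Exchange entries in both directions; the newer timestamp wins."""
    for source, target in ((a, b), (b, a)):
        for key, (timestamp, value) in list(source.data.items()):
            target.put(key, value, timestamp)

if __name__ == "__main__":
    r1, r2 = Replica(), Replica()
    r1.put("cart", ["book"], timestamp=1)
    print("before sync:", r1.get("cart"), r2.get("cart"))  # r2 is still stale
    sync(r1, r2)
    print("after sync: ", r1.get("cart"), r2.get("cart"))  # replicas agree
```

Between the write and the sync, a reader hitting the second replica sees stale data. This is the price paid for having no distributed transactions.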
Partitioning into “Shards”
Each shard handled by independent component
Distribute across multiple boxes, possibly at different locations
2. Do-it-yourself scaling
Sharding
Simple approach:
target server = hash(key) mod n
What happens when a server dies?
Solution: HashRing
Client-side (re-)partitioning
2. Do-it-yourself scaling
Client-controlled Partitioning
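The problem with the simple approach can be shown by counting how many keys change their target server when n shrinks by one. The sketch below is an illustration under assumptions: the helper names and the key format are made up, and SHA-256 stands in for whatever hash function a real client would use.

```python
import hashlib

def stable_hash(key):
    """Deterministic integer hash (Python's built-in hash() of strings is salted per process)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

def target_server(key, n):
    """The simple approach: target server = hash(key) mod n."""
    return stable_hash(key) % n

def moved_fraction(keys, n_before, n_after):
    """Fraction of keys whose target server changes when the server count changes."""
    moved = sum(1 for k in keys
                if target_server(k, n_before) != target_server(k, n_after))
    return moved / len(keys)

if __name__ == "__main__":
    keys = ["user:%d" % i for i in range(10000)]
    fraction = moved_fraction(keys, n_before=4, n_after=3)
    print("keys remapped when going from 4 to 3 servers: %.0f%%" % (100 * fraction))
```

With a uniform hash, a key keeps its server only when hash mod 4 equals hash mod 3, so about three quarters of all keys would be expected to move, not just the quarter that lived on the lost server.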
2. Do-it-yourself scaling
Consistent Hashing
Diagrams by Tom White, http://tinyurl.com/cons-hash
A, B, C: nodes (e.g. caches)
1, 2, …: Hash values
Move clockwise to find cache
Introduce virtual replicas to ensure distribution
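The ring described above can be sketched in a few lines. This is a minimal illustration, not the implementation from the referenced diagrams. The class name, the replica count of 100, and the use of SHA-256 are assumptions.

```python
import bisect
import hashlib

def stable_hash(value):
    """Deterministic integer hash used to place nodes and keys on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual replicas per node, for even distribution
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            point = stable_hash("%s#%d" % (node, i))
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # Move clockwise: first position after the key's hash, wrapping around.
        index = bisect.bisect_right(self._points, stable_hash(key)) % len(self._points)
        return self._owner[self._points[index]]

if __name__ == "__main__":
    ring = HashRing(["A", "B", "C"])
    keys = ["user:%d" % i for i in range(10000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.remove("C")
    moved = [k for k in keys if ring.lookup(k) != before[k]]
    print("keys moved after removing C:", len(moved))
```

When node C is removed, only the keys that were stored on C are reassigned; keys on A and B stay where they are.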
Full control
Choice of technology & products
Optimized solution
Vendor independence
2. Do-it-yourself scaling
Advantages
Challenging technologies
Many low-level tasks
Emerging practices
Significant effort required
2. Do-it-yourself scaling
Disadvantages
Strong elasticity/scaling requirements
Building higher-level platform
Virtualization of existing scalable solution
Specific technology requirements
2. Do-it-yourself scaling
Use Cases
Usage Models
1. Dynamic, virtualized deployment
2. Do-It-Yourself scaling
3. Higher-level services
4. Parallel processing
5. Someone else’s platform
Use high-level service APIs
Let someone else handle operations
Scale (more or less) seamlessly
3. Higher-level services
Characteristics
Screenshot: Google Data Protocol API Directory (http://code.google.com/apis/gdata/docs/directory.html, captured 11/11/09)
Google services with APIs that implement the Google Data Protocol:
Google Analytics Data Export API
Google Apps APIs
Google Base Data API
Blogger Data API
Google Booksearch Data API
Google Calendar Data API
Google Code Search Data API
Google Contacts Data API
Google Documents List Data API
Google Finance Portfolio Data API
Google Health Data API
Google Maps Data API
Picasa Web Albums Data API
Google Sidewiki Data API
Google Sites Data API
Google Spreadsheets Data API
Google Webmaster Tools Data API
YouTube Data API
3. Higher-level services
Google Service APIs
Elastic Compute Cloud (EC2)
Simple Storage Service (S3)
Simple Queue Service (SQS)
SimpleDB
CloudFront
Elastic Block Store (EBS)
Elastic MapReduce
DevPay
Flexible Payments Service (FPS)
Relational Database Service (RDS)
3. Higher-level services
Amazon Service APIs
Ease of use
Independence from implementation details
Maximum reach
High availability
No installation, operations, maintenance
3. Higher-level services
Advantages
High latency
Proprietary APIs
Dubious SLAs
Little to no portability/vendor lock-in
3. Higher-level services
Disadvantages
Storage of publicly accessible data
Global Collaboration
Coexistence with models 1 & 2
3. Higher-level services
Use Cases
Usage Models
1. Dynamic, virtualized deployment
2. Do-It-Yourself scaling
3. Higher-level services
4. Parallel processing
5. Someone else’s platform
Often based on the MapReduce concept
Amazon Elastic MapReduce (model 3)
Essentially Grid computing
Dynamic due to scalable platform
4. Parallel processing
Characteristics
Used internally by Google
Described in a research paper
Massive parallelization of large data set processing
4. Parallel processing
MapReduce
Map Map Map
Reduce Reduce Reduce
Shuffle/Sort
Collect Results
Split Input
Input
Output
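The stages in the diagram can be traced with the classic word-count example. The sketch below runs everything in a single process, so it shows only the data flow, not the parallelism. The function names are hypothetical and follow the diagram labels, not any particular framework's API.

```python
from collections import defaultdict

def split_input(text, n_chunks):
    """Split Input: divide the lines into n_chunks roughly equal chunks."""
    lines = text.splitlines()
    return [lines[i::n_chunks] for i in range(n_chunks)]

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]

def shuffle_sort(mapped_chunks):
    """Shuffle/Sort: group all values by key across the map outputs."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)

def word_count(text, n_chunks=3):
    chunks = split_input(text, n_chunks)
    mapped = [map_phase(chunk) for chunk in chunks]   # independent, could run in parallel
    grouped = shuffle_sort(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped)  # Collect Results

if __name__ == "__main__":
    print(word_count("a b a\nb c\na"))
```

Each map call sees only its own chunk and each reduce call sees only one key, which is what allows a framework to spread the work across many machines.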
4. Parallel processing
Apache Hadoop
Hadoop Common
HBase
HDFS
Hive
MapReduce
Pig
ZooKeeper
Simple yet highly scalable model
Unusual for developers
Makes certain impossible tasks possible
Highly cost-effective with elasticity
4. Parallel processing
Advantages
Requires re-implementation
Hard for some (most?) developers
Only rarely applicable
4. Parallel processing
Disadvantages
Very complex calculations
Computation over very large data sets
Large data migration tasks
4. Parallel processing
Use Cases
Usage Models
1. Dynamic, virtualized deployment
2. Do-It-Yourself scaling
3. Higher-level services
4. Parallel processing
5. Someone else’s platform
Deploy application into vendor cloud
No machines, instances, OS or infrastructure software to maintain
Automatic scalability
New container model
5. Someone else’s platform
Characteristics
Web app request/response programming model
Queueing/Async processing
Scalable Persistence
Caching
Monitoring
Much more: Identity/SSO, Billing, …
5. Someone else’s platform
Programming model
Restricted, sandboxed environment
no threads
no file system
For Java:
no java.lang.System
restricted reflection
5. Someone else’s platform
Google App Engine
Mail
URL Fetch
XMPP
Image Manipulation
Memcache
TaskQueue
User API
5. Someone else’s platform
Google App Engine
Simplified Python/Java Web App Container
low level: key value store
higher-level Java persistence APIs: JDO, JPA
Live Services
App Fabric (formerly .NET Services)
SQL Azure (formerly SQL Services)
SharePoint Services
Dynamics CRM Services
5. Someone else’s platform
Microsoft Azure
Transparent hosting and scaling
Initially Rails, now also Node.js, Clojure, Java, Python, Scala
Synchronization via git
High-level APIs for Web requests, caching, etc.
Runs on AWS/EC2
Acquired by Salesforce.com (2010)
5. Someone else’s platform
Heroku
Platform based on Apache Tomcat 6
(Limited) Autoscaling
Administration console
Configurable
Customizable
5. Someone else’s platform
Amazon Elastic Beanstalk
CloudBees.com/org
OpenShift.com (Red Hat/JBoss)
Oracle Public Cloud
5. Someone else’s platform
Emerging Cloud platforms
Complete container model
No infrastructure, no middleware to maintain
Pre-defined, scalable architecture
Integration with higher-level services (e.g. identity, billing, …)
Unlimited scaling (in theory)
5. Someone else’s platform
Advantages
Very little control
Vendor lock-in
New and different APIs
5. Someone else’s platform
Disadvantages
Public Web applications with unknown scalability requirements
Non-mission-critical, internal applications
Platform extensions (e.g. Facebook)
Programming model of the future (?)
5. Someone else’s platform
Use Cases
Conclusion
1. Cloud Computing is real
2. You have use cases today
3. What are you waiting for?
Stefan Tilkov
[email protected]
http://www.innoq.com/blog/st/
@stilkov
Phone: +49 170 471 2625
innoQ Deutschland GmbH
Halskestr. 17
D-40880 Ratingen
Phone: +49 21 02 77 172-100
innoQ Schweiz GmbH
Gewerbestr. 11
CH-6630 Cham
Phone: +41 41 02 743 01 11
www.innoq.com [email protected]
Thank you!
Q&A
Somebody always asks:
“But what about Security?”
Public Cloud = insecure
Private Cloud = secure?
Counter-arguments
Private data centers are insecure
“Offline” is insecure
Attacks come from inside
Publicity/Visibility leads to scrutiny
“Compliance as a Service”
NSA sees everything, anyway
Counter-counter-arguments
Interesting business data
Really sensitive data
Who cares about arguments?