Date post: | 15-Jan-2015 |
Upload: | chris-bunch |
Scalable and Open AppEngine Development and Deployment
Navraj Chohan, Chris Bunch, Sydney Pang, Chandra Krintz, Nagy Mostafa, Sunil Soman, Rich Wolski
http://www.capgemini.com/technology-blog/2009/04/from_lamp_to_leap_and_beyond.php
Terminology
• Infrastructure-as-a-Service (IaaS), e.g., Amazon Web Services
  – Provides full system images
• Platform-as-a-Service (PaaS), e.g., Google App Engine
  – Provides scalable runtime stack
• Software-as-a-Service (SaaS), e.g., SalesForce, Gmail
  – Provides remote application access
• Open-source Platform-as-a-Service for research and engineering of cloud computing components, applications, and services
• Automated deployment of applications to high-performance databases
• Fine-grained control over the application environment
• Hosting of Google App Engine apps on your cluster
  – Real applications
  – Familiar API (extensible, to avoid lock-in)
  – Your data and code on your resources
From Google App Engine (GAE) to AppScale
• GAE Application Programming Interface (the backing Google service, where the slide names one, follows the colon)
  – Datastore (get/put): BigTable
  – Memcache: Memcached
  – URL Fetching
  – Mail: GMail
  – Images
  – Authentication: Google Accounts
• Write Python/Java GAE app
  – Use SDK locally to test and generate indexes
    • APIs implemented as non-scalable, simple versions
  – Upload to Google resources
    • Highly scalable API implementation
Sandboxed Runtime
• Restricted subset of library calls
• No reading/writing from/to file system
• Data persistence only via get/put interface
• Computation bounded: 30 secs per request
• Access web services via HTTP / HTTPS only (ports 80 and 443)
Recent GAE Additions
• Python and JVM SDKs
  – JRuby, Clojure, etc. available through Java
• Task Queue, Cron, XMPP APIs
• New SLAs for paying customers
  – $0.10 per CPU core hour
  – $0.10 per GB bandwidth in
  – $0.12 per GB bandwidth out
  – $0.15 per GB data stored per month
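The per-unit prices above combine into a simple linear bill. The sketch below is only an illustration of that arithmetic: the function name and the sample workload are hypothetical, and it ignores any free quota or other billing rules not shown on the slide.

```python
# Prices copied from the slide (US dollars).
PRICE_PER_CPU_CORE_HOUR = 0.10
PRICE_PER_GB_IN = 0.10
PRICE_PER_GB_OUT = 0.12
PRICE_PER_GB_STORED_MONTH = 0.15


def monthly_cost(cpu_core_hours, gb_in, gb_out, gb_stored):
    """Return an estimated monthly bill in dollars for the given usage."""
    return (
        cpu_core_hours * PRICE_PER_CPU_CORE_HOUR
        + gb_in * PRICE_PER_GB_IN
        + gb_out * PRICE_PER_GB_OUT
        + gb_stored * PRICE_PER_GB_STORED_MONTH
    )


# Hypothetical workload: 100 core hours, 10 GB in, 20 GB out, 5 GB stored.
example = monthly_cost(100, 10, 20, 5)
print(f"${example:.2f}")  # 10.00 + 1.00 + 2.40 + 0.75 = $14.15
```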
Protocol Buffers
• Google App Engine’s internal data format
  – And AppScale’s
• Similar to C-style structs:
message Person {
  required int32 id = 1;
  optional string name = 2;
}
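To make the format concrete, here is a minimal hand-written sketch of how the Person message above is laid out on the wire under the standard Protocol Buffers encoding. Real applications would use generated classes; the helper names `encode_varint` and `encode_person` are hypothetical, and the sketch only handles non-negative ids.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_person(person_id, name=None):
    """Serialize Person: field 1 (id) is a varint, field 2 (name) is length-delimited."""
    # Tag = (field_number << 3) | wire_type; wire type 0 = varint, 2 = length-delimited.
    data = encode_varint((1 << 3) | 0) + encode_varint(person_id)
    if name is not None:  # an unset optional field is simply omitted
        raw = name.encode("utf-8")
        data += encode_varint((2 << 3) | 2) + encode_varint(len(raw)) + raw
    return data


print(encode_person(150, "Al").hex())  # 0896011202416c
```

Each field carries its own tag, which is why an optional field can be left out without breaking a reader that expects it.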
From Google App Engine (GAE) to AppScale
• AppScale extends the GAE SDK
  – Replaces the simple, non-scalable API implementation with pluggable, distributed, scalable components
    • Using open-source solutions as available/possible
    • Communication over SSL
• Available as source and as system image
  – Each instance can implement any component
    • Self-configuring as part of AppScale cloud deployment
  – Deploys over
    • Virtual machine monitors (Xen, KVM)
    • Infrastructure (IaaS) cloud layers
IaaS Cloud Systems
• Amazon Web Services (AWS)
  – Elastic Compute Cloud (EC2), Persistent Storage (S3, EBS)
  – For-fee, as negotiated in SLA (CPU, network, storage)
  – Vast resources available
    • Users access small (opaque) subset, can scale-out
• Eucalyptus
  – Open source implementation of the AWS APIs
  – Inspiration for AppScale: a familiar, widely used API implementation for execution on your cluster
    • Limited only by the hardware you have available
Differences in AppScale Deployment Options
• Xen / KVM
  – Static deployment
    • Can use as many nodes as are manually configured
• Eucalyptus / EC2
  – Dynamic deployment
    • Can use as many nodes as the system can support (or as you can pay for, in an EC2 deployment)
  – As part of ongoing/future work: support for dynamic scaling
    • Front-end (user-facing) & back-end (data management & computation)
    • SLA renegotiation
AppScale System Layout
[Diagram labels: GAE App Developer (AppScale Admin), AppScale tools, GAE App Users, HTTPS, App Controller, ALB, AS, DB M/P, DB S/P]
• AppLoadBalancer (ALB)
• AppServer (AS)
• Database Master/Slave/Peer (DB M/S/P)
AppController (AC)
• SOAP Server written in Ruby
  – Runs on all nodes
• Middleware layer
• Controls and sets up a node for use
  – Sets up configuration files (data replication)
  – Sets up firewall for security
• Master AC “heartbeats” all other nodes
  – Collects performance info as well
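The heartbeat bookkeeping on the master can be pictured roughly as follows. This is a sketch under stated assumptions, not the AppController's actual code (which is a Ruby SOAP server): the class name, the timeout, the node names, and the shape of the performance info are all hypothetical, and networking is omitted.

```python
import time


class HeartbeatMonitor:
    """Tracks the most recent heartbeat reply per node (hypothetical helper)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}         # node name -> time of last reply
        self.stats = {}             # node name -> last reported performance info

    def record_reply(self, node, stats=None):
        """Note that a node answered a heartbeat, optionally with performance info."""
        self.last_seen[node] = self.clock()
        if stats is not None:
            self.stats[node] = stats

    def unresponsive_nodes(self):
        """Return nodes whose last reply is older than the timeout."""
        now = self.clock()
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_seconds=10, clock=lambda: fake_now[0])
monitor.record_reply("node-1", {"cpu": 0.5})
monitor.record_reply("node-2")
fake_now[0] = 8.0
monitor.record_reply("node-2")
fake_now[0] = 15.0
print(monitor.unresponsive_nodes())  # ['node-1']
```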
AppLoadBalancer (ALB)
• Ruby on Rails application
• Handles authentication and routing of users to AppServers
• Three copies are deployed via Mongrel
  – Load balanced via nginx
Database Management
• Five databases currently available:
  – HBase, Hypertable: Master / Slave
  – Cassandra, Voldemort: Peer / Peer
  – Clustered MySQL: Relational
• Two main components
  – Protocol Buffer Server: Data access / storage
  – User / App Server: Authentication
AppServer (AS)
• Modified Google App Engine SDK
• App requests internally are Protocol Buffers
  – Forwards requests to PB Server
• Minimal request set:
  – Put(id)
  – Get(id)
  – Query: Equivalent to get_all_in_table
  – Delete(id)
  – Count: Total number of items in database
  – GetSchema
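The six requests above are small enough to sketch against an in-memory table. The class below is a toy illustration only: the name `InMemoryStore`, the method signatures (for example, `put` taking a dictionary of values), and the sample rows are assumptions, not the PB Server's interface.

```python
class InMemoryStore:
    """Toy single-table store exposing the six requests listed above."""

    def __init__(self, schema):
        self.schema = list(schema)  # column names
        self.rows = {}              # row id -> dict of column values

    def put(self, row_id, values):
        unknown = set(values) - set(self.schema)
        if unknown:
            raise KeyError(f"columns not in schema: {sorted(unknown)}")
        self.rows[row_id] = dict(values)

    def get(self, row_id):
        return self.rows.get(row_id)

    def query(self):
        """Return every row (get_all_in_table); filtering is left to the caller."""
        return list(self.rows.values())

    def delete(self, row_id):
        self.rows.pop(row_id, None)

    def count(self):
        return len(self.rows)

    def get_schema(self):
        return list(self.schema)


store = InMemoryStore(["author", "text"])
store.put("g1", {"author": "ann", "text": "hello"})
store.put("g2", {"author": "bob", "text": "hi"})

# Because Query returns the whole table, any filter runs outside the store.
by_ann = [row for row in store.query() if row["author"] == "ann"]
print(store.count(), by_ann)
```

Keeping the request set this small is what makes it practical to plug in very different databases behind the same interface, at the cost of filtering in the caller.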
AppScale Tools
• Ruby scripts that initiate AppScale deployment
  – Initializes the first AppController for use
  – Uploads AppEngine app
• Conceptually similar to Amazon AWS EC2 tools
  – describe-instances
  – upload-app: Introduce additional apps
  – terminate-instances
Fault Tolerance
• System can survive the following failures:
  – AppServer failure
  – Database Slave failure
  – Database Peer failure
  – AppLoadBalancer failure *
  – AppController failure *
Testing Methodology
• Load testing done via the Grinder
• Test specifics:
  – Initially 3 users
  – 3 users added every 5 seconds
  – Done until 160 seconds have passed
• Each user navigates the page, performs some scripted action
• Measured total transactions performed and average response time
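The ramp described above implies a simple step function for the number of simulated users. The sketch below assumes a batch is added at every 5-second mark up to and including 160 seconds; the slide does not say whether the last batch is started, and the function name is hypothetical.

```python
def active_users(t_seconds, initial=3, step=3, interval=5, duration=160):
    """Number of simulated users started by time t, clamped to the test duration."""
    t = max(0, min(t_seconds, duration))
    return initial + step * (t // interval)


for t in (0, 5, 60, 160):
    print(t, active_users(t))
```

Under that assumption the load grows from 3 users to 99 users over the run.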
AppScale Evaluation Cluster
• Three Grinder nodes, four AppScale nodes
  – One master, three slaves
  – Virtualized via Xen
  – Database: HBase (3x replication), 64 MB HDFS blocks
    • PBServer via Thrift; stores entire protocol buffers
• Hardware
  – Quad-core 2.66 GHz machines
  – 8 GB of RAM
  – Connected via Gigabit Ethernet
Applications Tested
• Tasks: a to-do list
  – Read and write intensive (44 transactions per user)
• Cccwiki: allows users to edit web pages
  – Read intensive, updates only (74 transactions per user)
• Guestbook: allows users to post messages
  – Retrieves ten most recent posts only (9 transactions per user)
• Shell: provides an interactive Python shell
  – Compute intensive (14 transactions per user)
Transactions per App
App Response Time
Comparison with Google
Room for Improvement
• Current bottlenecks:
  – Queries perform filtering server-side
  – Filtering is done outside of the DB
  – AppEngine, PB Server are single-threaded
  – Entry point to some DBs is single-threaded
• Future work will address these problems
  – Will also compare performance across DBs
  – e.g., BigTable-like DBs vs. P2P DBs
Related Work
• AppDrop
  – Proof-of-concept Rails app
• TyphoonAE
  – Relatively new (alpha release)
  – Runs MongoDB only
• Microsoft Azure
  – Uses .NET as the platform
  – Has a similar pricing model to AppEngine
AppScale Recap
• Distributed, multi-component system
  – Deployed as a single system image (self-configuring)
    • Static deployment over Xen/KVM
    • Dynamic deployment over Eucalyptus/EC2
• Databases supported:
  – HBase, Hypertable, MySQL, Cassandra, Voldemort
• Fault-tolerant
AppScale Recap
• Open cloud research platform
  – International user community
• Goals
  – Easy to use and extend
  – Automatic deployment of PaaS cloud and GAE apps on resources other than Google’s
  – Support real applications and users
    • Experimentation and testing in real environments
• Current performance results are a baseline
Performance Improvements
• AppEngine now multi-process, load balanced
• PB Server now multi-threaded
• Storing data like Google for HBase and Hypertable
  – Three tables: Reference, Sort Ascending, Sort Descending
Future Work
• Expand out of the web services domain
  – Investigating opportunities in streaming
  – Integrated MapReduce support for high-performance computing (HPC)
  – Co-locate AppEngines and use shared memory
• Additional databases:
  – MongoDB, Scalaris, CouchDB
Thanks!
• To the AppScale team!
  – Co-lead Navraj Chohan
  – Advisor Prof. Chandra Krintz
• To the open-source community
• To Google, NSF, and IBM for financial support
• To you all for coming out today
• Check us out on the web:
  – http://appscale.cs.ucsb.edu