DEUTSCH-FRANZÖSISCHE SOMMERUNIVERSITÄT
FÜR NACHWUCHSWISSENSCHAFTLER 2011
UNIVERSITÉ D'ÉTÉ FRANCO-ALLEMANDE POUR JEUNES CHERCHEURS 2011
CLOUD COMPUTING : DÉFIS ET OPPORTUNITÉS
CLOUD COMPUTING : HERAUSFORDERUNGEN UND MÖGLICHKEITEN
17.7.–22.7.2011
Windows Azure as a Platform as a Service (PaaS)
Jared Jackson, Microsoft Research
Before we begin: Some Results

Favorite Ice Cream (summer school survey):
Vanilla 23%, Chocolate 13%, Strawberry 10%, Stracciatella 10%, Pistachio 7%, Coffee 3%, Banana 3%, Mango 3%, Amarena 3%, Malaga 3%, Cherry 3%, Tiramisu 3%, Cheesecake 3%, Cookies and Cream 3%, Walnut 3%, Cinnamon 3%

Ice Cream Consumption:
Vanilla 33%, Chocolate 11%, Butter Pecan 7%, Strawberry 5%, Cookies and Cream 4%, Neapolitan 4%, Chocolate Chip 4%, Coffee 2%, Cherry 2%, Other 29%
Source: International Ice Cream Association (makeicecream.com)
Windows Azure Overview
Web Application Model Comparison

Ad Hoc Application Model:
- Machines running IIS / ASP.NET
- Machines running Windows Services
- Machines running SQL Server

Windows Azure Application Model:
- Web Role instances
- Worker Role instances
- Azure Storage (Blob / Queue / Table)
- SQL Azure
Key Components
- Fabric Controller: manages hardware and virtual machines for services
- Compute:
  - Web Roles: web application front end
  - Worker Roles: utility compute
  - VM Roles: custom compute role; you own and customize the VM
- Storage:
  - Blobs: binary objects
  - Tables: entity storage
  - Queues: role coordination
  - SQL Azure: SQL in the cloud
Key Components - Fabric Controller
Think of it as an automated IT department: a cloud layer on top of:
- Windows Server 2008
- A custom version of Hyper-V called the Windows Azure Hypervisor, which allows for automated management of virtual machines
Its job is to provision, deploy, monitor, and maintain applications in data centers.
Applications have a shape and a configuration.
The configuration definition describes the shape of a service:
- Role types
- Role VM sizes
- External and internal endpoints
- Local storage
The configuration settings configure a service:
- Instance count
- Storage keys
- Application-specific settings
Key Components - Fabric Controller
Manages nodes and edges in the fabric (the hardware):
- Power-on automation devices
- Routers / switches
- Hardware load balancers
- Physical servers
- Virtual servers
State transitions: current state → goal state; it does what is needed to reach and maintain the goal state.
It's a perfect IT employee! It never sleeps, never asks for a raise, and always does what you tell it to do in the configuration definition and settings.
Creating a New Project
Windows Azure Compute
Key Components - Compute: Web Roles
Web front end: a cloud web server for web pages and web services.
You can create the following types:
- ASP.NET web roles
- ASP.NET MVC 2 web roles
- WCF service web roles
- Worker roles
- CGI-based web roles
Key Components - Compute: Worker Roles
- Utility compute on Windows Server 2008
- Background processing
- Each role can define an amount of local storage: protected space on the local drive, considered volatile storage
- May communicate with outside services: Azure Storage, SQL Azure, other web services
- Can expose external and internal endpoints
Suggested Application Model - Using Queues for Reliable Messaging
Scalable, fault-tolerant applications.
Queues are the application glue:
- Decouple parts of the application, so each is easier to scale independently
- Resource allocation: different priority queues and backend servers
- Mask faults in worker roles (reliable messaging)
Key Components - Compute: VM Roles
Customized role: you own the box.
How it works:
- Download the Guest OS to Server 2008 Hyper-V
- Customize the OS as you need to
- Upload the difference VHD
- Azure runs your VM role using the base OS plus the difference VHD
Application Hosting
Grokking the service model:
- Imagine white-boarding your service architecture, with boxes for nodes and arrows describing how they communicate
- The service model is the same diagram written down in a declarative format
You give the Fabric the service model and the binaries that go with each of those nodes.
The Fabric can provision, deploy, and manage that diagram for you:
- Find a hardware home
- Copy and launch your app binaries
- Monitor your app and the hardware
- In case of failure, take action, perhaps even relocating your app
At all times, the diagram stays whole.
Automated Service Management
You provide code + service model; the platform identifies and allocates resources, deploys the service, and manages service health.
Configuration is handled by two files:
- ServiceDefinition.csdef
- ServiceConfiguration.cscfg
Service Definition
Service Configuration
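As a rough, hand-written sketch of these two files (the role name, endpoint, and setting values below are illustrative assumptions, not taken from the talk), the definition declares the shape while the configuration supplies per-deployment values:

<!-- ServiceDefinition.csdef (sketch): the shape of the service -->
<ServiceDefinition name="MyService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WebRole name="WebRole1" vmsize="Small">
    <Endpoints>
      <InputEndpoint name="HttpIn" protocol="http" port="80" />
    </Endpoints>
    <ConfigurationSettings>
      <Setting name="DataConnectionString" />
    </ConfigurationSettings>
  </WebRole>
</ServiceDefinition>

<!-- ServiceConfiguration.cscfg (sketch): the values for one deployment -->
<ServiceConfiguration serviceName="MyService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="WebRole1">
    <Instances count="2" />
    <ConfigurationSettings>
      <Setting name="DataConnectionString" value="UseDevelopmentStorage=true" />
    </ConfigurationSettings>
  </Role>
</ServiceConfiguration>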
GUI: double-click on the role name in the Azure project.
Deploying to the Cloud
We can deploy from the portal or from script.
VS builds two files:
- An encrypted package of your code
- Your config file
You must create an Azure account, then a service, and then you deploy your code.
Deployment can take up to 20 minutes (which is better than six months).
Service Management API
- REST-based API to manage your services
- X509 certificates for authentication
- Lets you create, delete, change, upgrade, swap, and more
- Lots of community- and MSFT-built tools around the API; easy to roll your own
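A minimal sketch of calling the API from C#, assuming a management certificate has already been uploaded to the subscription (the subscription ID, certificate file, and API version header are placeholders, not values from the talk):

// Requires: using System; using System.IO; using System.Net;
//           using System.Security.Cryptography.X509Certificates;
static void ListHostedServices()
{
    string subscriptionId = "<subscription-id>";              // placeholder
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
        "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices");
    request.Headers.Add("x-ms-version", "2011-02-25");        // assumed API version
    // The .pfx must contain the private key matching the uploaded management cert.
    request.ClientCertificates.Add(new X509Certificate2("management.pfx", "password"));
    using (WebResponse response = request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        Console.WriteLine(reader.ReadToEnd());                // XML list of services
    }
}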
The Secret Sauce - The Fabric
The Fabric is the brain behind Windows Azure:
1. Process the service model
   1. Determine resource requirements
   2. Create role images
2. Allocate resources
3. Prepare nodes
   1. Place role images on nodes
   2. Configure settings
   3. Start roles
4. Configure load balancers
5. Maintain service health
   1. If a role fails, restart the role, based on policy
   2. If a node fails, migrate the role, based on policy
Storage
Durable storage, at massive scale:
- Blob: massive files, e.g. videos, logs
- Drive: use standard file system APIs
- Tables: non-relational, but with few scale limits (use SQL Azure for relational data)
- Queues: facilitate loosely coupled, reliable systems
Blob Features and Functions
- Store large objects (up to 1 TB in size)
- Can be served through the Windows Azure CDN service
- Standard REST interface:
  - PutBlob: inserts a new blob, overwrites an existing blob
  - GetBlob: get a whole blob or a specific range
  - DeleteBlob
  - CopyBlob
  - SnapshotBlob
  - LeaseBlob
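In the same spirit as the queue REST examples later in the deck, a hedged sketch of what a PutBlob call looks like on the wire (account name, container, blob name, and version date are illustrative):

PUT http://myaccount.blob.core.windows.net/mycontainer/hello.txt HTTP/1.1
x-ms-version: 2009-09-19
x-ms-blob-type: BlockBlob
Authorization: SharedKey myaccount:<signature>
Content-Length: 12

Hello, Blob!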
Two Types of Blobs Under the Hood
Block blob:
- Targeted at streaming workloads
- Each blob consists of a sequence of blocks; each block is identified by a Block ID
- Size limit: 200 GB per blob
Page blob:
- Targeted at random read/write workloads
- Each blob consists of an array of pages; each page is identified by its offset from the start of the blob
- Size limit: 1 TB per blob
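To make the block structure concrete, here is a minimal sketch of an explicit block upload, assuming the 1.x StorageClient library and the AccountInformation class shown later in the deck (file name, container, and the 4 MB block size are illustrative choices):

// Requires: using System; using System.Collections.Generic; using System.IO;
//           using Microsoft.WindowsAzure; using Microsoft.WindowsAzure.StorageClient;
CloudBlobClient client =
    new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
CloudBlockBlob blob = client.GetBlockBlobReference("school/bigfile.bin");

List<string> blockIds = new List<string>();
byte[] buffer = new byte[4 * 1024 * 1024];   // 4 MB per block (a choice, not a limit)
int count, index = 0;
using (FileStream file = File.OpenRead("bigfile.bin"))
{
    while ((count = file.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Block IDs must be Base64-encoded and of equal length within a blob
        string blockId = Convert.ToBase64String(BitConverter.GetBytes(index++));
        blob.PutBlock(blockId, new MemoryStream(buffer, 0, count), null);
        blockIds.Add(blockId);
    }
}
blob.PutBlockList(blockIds);                 // commit the uploaded blocks in order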
Windows Azure Drive
- Provides a durable NTFS volume for Windows Azure applications to use
- Use existing NTFS APIs to access a durable drive; data is durable and survives application failover
- Enables migrating existing NTFS applications to the cloud
A Windows Azure Drive is a Page Blob:
- Example: mount a Page Blob as X:\ from http://<account>.blob.core.windows.net/<container>/<blob>
- All writes to the drive are made durable to the Page Blob; the drive is made durable through standard Page Blob replication
- The drive persists as a Page Blob even when not mounted
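A minimal sketch of creating and mounting a drive, assuming the era's CloudDrive library (the blob name, cache resource name, and 64 MB size are illustrative assumptions):

// Requires references to Microsoft.WindowsAzure.CloudDrive,
// Microsoft.WindowsAzure.ServiceRuntime, and Microsoft.WindowsAzure.StorageClient.
CloudStorageAccount account = CloudStorageAccount.Parse("<connection string>");

// Reserve local disk for the drive cache (local resource name is illustrative).
LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

// Create a 64 MB drive backed by a page blob, then mount it.
CloudDrive drive = account.CreateCloudDrive("drives/mydrive.vhd");
drive.Create(64);                            // one-time creation, size in MB
string path = drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);
// 'path' is now a drive letter such as "X:\" - use normal NTFS file APIs on it.
drive.Unmount();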
Windows Azure Tables
- Provides structured storage
- Massively scalable tables: billions of entities (rows) and TBs of data; can use thousands of servers as traffic grows
- Highly available and durable: data is replicated several times
- Familiar and easy-to-use API:
  - WCF Data Services and OData; .NET classes and LINQ
  - REST, from any platform or language
Windows Azure Queues
- Queues are performance-efficient, highly available, and provide reliable message delivery
- Simple, asynchronous work dispatch
- Programming semantics ensure that a message can be processed at least once
- Access is provided via REST
Storage Partitioning
Understanding partitioning is key to understanding performance.
- Partitioning is different for each data type (blobs, entities, queues); every data object has a partition key
- A partition can be served by a single server
- The partition key controls entity locality; the partition is the unit of scale
- The system load balances partitions based on traffic patterns, to meet your traffic needs
- Load balancing can take a few minutes to kick in, and it can take a couple of seconds for a partition to become available on a different server
- When the limits of a single partition have been reached, you will see "Server Busy"; use exponential backoff on "Server Busy"
Partition Keys in Each Abstraction
- Entities: TableName + PartitionKey; entities with the same PartitionKey value are served from the same partition
- Blobs: Container name + Blob name; every blob and its snapshots are in a single partition
- Messages: Queue name; all messages for a single queue belong to the same partition

Entities:
PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
1                         | Order 1               |              |                     | $35.12
2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2                         | Order 3               |              |                     | $10.00

Blobs:
Container Name | Blob Name
image          | annarbor/bighouse.jpg
image          | foxborough/gillette.jpg
video          | annarbor/bighouse.jpg

Queues:
Queue    | Message
jobs     | Message1
jobs     | Message2
workflow | Message1
Scalability Targets
Storage account:
- Capacity: up to 100 TB
- Transactions: up to a few thousand requests per second
- Bandwidth: up to a few hundred megabytes per second
Single queue/table partition:
- Up to 500 transactions per second
Single blob partition:
- Throughput up to 60 MB/s
To go above these numbers, partition between multiple storage accounts and partitions.
When a limit is hit, the app will see "503 Server Busy"; applications should implement exponential backoff.
Partitions and Partition Ranges

Full table (PartitionKey = Category, RowKey = Title):

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | ...       | 2009
Action                  | The Bourne Ultimatum     | ...       | 2007
Animation               | Open Season 2            | ...       | 2009
Animation               | The Ant Bully            | ...       | 2006
Comedy                  | Office Space             | ...       | 1999
SciFi                   | X-Men Origins: Wolverine | ...       | 2009
War                     | Defiance                 | ...       | 2008

The same table, split into two partition ranges that can be served by different servers:

Partition range 1:
PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | ...       | 2009
Action                  | The Bourne Ultimatum     | ...       | 2007
Animation               | Open Season 2            | ...       | 2009
Animation               | The Ant Bully            | ...       | 2006

Partition range 2:
PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Comedy                  | Office Space             | ...       | 1999
SciFi                   | X-Men Origins: Wolverine | ...       | 2009
War                     | Defiance                 | ...       | 2008
Key Selection: Things to Consider
- Scalability: distribute load as much as possible; hot partitions can be load balanced; the PartitionKey is critical for scalability
- Query efficiency and speed: avoid frequent large scans; parallelize queries; point queries are most efficient
- Entity group transactions: transactions across a single partition give transaction semantics and reduce round trips
See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information.
Expect Continuation Tokens - Seriously!
A query can return a continuation token in several cases:
- Maximum of 1000 rows in a response
- At the end of a partition range boundary
- Maximum of 5 seconds to execute the query
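With the 1.x StorageClient library used in the code slides later in this deck, wrapping a query in AsTableServiceQuery() makes enumeration transparently follow continuation tokens; a minimal sketch (table and entity names match the later slides):

// Requires: using System; using System.Linq; using Microsoft.WindowsAzure.StorageClient;
CloudTableQuery<AttendeeEntity> query =
    (from a in context.CreateQuery<AttendeeEntity>("Attendees")
     where a.PartitionKey == "SummerSchool"
     select a).AsTableServiceQuery();

// Enumeration fetches the next page on demand whenever a continuation
// token comes back, so large ranges stream through safely.
foreach (AttendeeEntity attendee in query)
{
    Console.WriteLine(attendee.Email);
}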
Tables Recap
- Select a PartitionKey and RowKey that help scale: this makes frequently used queries efficient, supports batch transactions, and distributes load
- Avoid append-only patterns: distribute writes by using a hash etc. as a key prefix
- Always handle continuation tokens: expect them for range queries
- OR predicates are not optimized: execute the queries that form the OR predicates as separate queries
- Implement a back-off strategy for retries on "Server busy": it means either the system is load balancing partitions to meet traffic needs, or the load on a single partition has exceeded the limits
WCF Data Services
- Use a new context for each logical operation
- AddObject/AttachTo can throw an exception if the entity is already being tracked
- A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException
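A short sketch of that last point, assuming the table context from the code slides later in the deck (the table name and row key are illustrative):

// Requires: using System.Linq; using Microsoft.WindowsAzure.StorageClient;
TableServiceContext context = tableClient.GetDataServiceContext();
context.IgnoreResourceNotFoundException = true;   // point queries no longer throw on 404

AttendeeEntity attendee =
    (from a in context.CreateQuery<AttendeeEntity>("Attendees")
     where a.PartitionKey == "SummerSchool" && a.RowKey == "someone@example.org"
     select a).AsTableServiceQuery().FirstOrDefault();   // null if the entity does not exist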
Queues - Their Unique Role in Building Reliable, Scalable Applications
- You want roles that work closely together, but are not bound together; tight coupling leads to brittleness
- This can aid in scaling and performance
- A queue can hold an unlimited number of messages; messages must be serializable as XML and are limited to 8 KB in size
- Commonly used with the work ticket pattern (see the sketch below)
- Why not simply use a table?
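A minimal sketch of the work ticket pattern with the 1.x StorageClient library: the queue message carries only a small reference, and the payload itself lives in blob storage (the queue name, blob reference, and ProcessBlob helper are illustrative assumptions):

// Requires: using System; using Microsoft.WindowsAzure; using Microsoft.WindowsAzure.StorageClient;
CloudQueueClient queueClient =
    new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudQueueClient();
CloudQueue queue = queueClient.GetQueueReference("tickets");
queue.CreateIfNotExist();

// Producer (e.g. a web role): enqueue a small ticket naming the blob to process.
queue.AddMessage(new CloudQueueMessage("uploads/video42.mpg"));

// Consumer (e.g. a worker role): dequeue with a visibility timeout,
// do the work, and delete the message only after it succeeds.
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    ProcessBlob(msg.AsString);   // hypothetical worker routine
    queue.DeleteMessage(msg);
}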
Queue Terminology
Message Lifecycle (diagram): a web role calls PutMessage to add messages (Msg 1 ... Msg 4) to the queue; worker roles call GetMessage (with a visibility timeout) to dequeue the next message and RemoveMessage to delete it once it has been processed.
PutMessage request:
POST http://myaccount.queue.core.windows.net/myqueue/messages

GetMessage response:
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dG...dGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DeleteMessage request:
DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw
Truncated Exponential Back-Off Polling
Consider a back-off polling approach:
- Each empty poll increases the polling interval by 2x (up to some maximum)
- A successful poll resets the interval back to 1
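A minimal sketch of such a polling loop (the 1-second floor, 60-second ceiling, and Process helper are illustrative choices, not prescribed values):

// Requires: using System; using System.Threading; using Microsoft.WindowsAzure.StorageClient;
private void PollWithBackoff(CloudQueue queue)
{
    TimeSpan interval = TimeSpan.FromSeconds(1);
    TimeSpan maxInterval = TimeSpan.FromSeconds(60);      // truncation point
    while (true)
    {
        CloudQueueMessage msg = queue.GetMessage();
        if (msg != null)
        {
            Process(msg);                                 // hypothetical handler
            queue.DeleteMessage(msg);
            interval = TimeSpan.FromSeconds(1);           // success: reset interval
        }
        else
        {
            Thread.Sleep(interval);                       // empty poll: wait...
            interval = TimeSpan.FromTicks(                // ...then double, truncated
                Math.Min(interval.Ticks * 2, maxInterval.Ticks));
        }
    }
}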
Removing Poison Messages
(Diagram: producers P1 and P2 enqueue messages; consumers C1 and C2 dequeue them; the queue tracks a dequeue count for each message.)
1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: GetMessage(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after its dequeue
10. C1 restarted
11. C1: GetMessage(Q, 30 s) → msg 1
12. msg 1's dequeue count is now > 2, so it is treated as a poison message
13. C1: DeleteMessage(Q, msg 1)
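The dequeue-count check from the walkthrough, as a minimal sketch (the threshold of 2 mirrors the steps above; Process is a hypothetical handler):

// Requires: using System; using Microsoft.WindowsAzure.StorageClient;
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    if (msg.DequeueCount > 2)
    {
        // Poison message: it has already failed repeatedly, so remove it
        // (optionally logging it or parking it in a dead-letter blob/table first).
        queue.DeleteMessage(msg);
    }
    else
    {
        Process(msg);              // hypothetical handler
        queue.DeleteMessage(msg);  // delete only after successful processing
    }
}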
Queues Recap
- Make message processing idempotent, so there is no need to deal with failures
- Do not rely on order: invisible messages result in out-of-order delivery
- Enforce a threshold on a message's dequeue count, and use the dequeue count to remove poison messages
- For messages > 8 KB, use a blob to store the message data with a reference in the message; batch messages; garbage-collect orphaned blobs
- Use the message count to dynamically increase or reduce the number of workers
Windows Azure Storage Takeaways
Blobs
Drives
Tables
Queues
http://blogs.msdn.com/windowsazurestorage/
http://azurescope.cloudapp.net
A Quick Exercise
Then let's look at some code and some tools.
Code AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}
Code BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}
Code BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
    // Or if you want to write a string, replace the last line with:
    //   blob.UploadText(someString);
    // And make sure you set the content type to the appropriate MIME type (e.g. text/plain)
}
Code BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available
        return null;
    }
}
Application Code - Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
    {
        buff.AppendLine(attendee.ToCsvString());
    }
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<account>.blob.core.windows.net/<container>/<blob>, or in this case: http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt
Code - TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
}
Code - TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}
Code TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}
Code TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
        {
            allAttendees[attendee.Email] = attendee;
        }
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}
Code TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}
Code TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.
Pseudo Code TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
    {
        UpdateAttendee(attendee, false);
    }
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}
Application Code - Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.
Tools: Fiddler2
Best Practices
Picking the Right VM Size
- Having the correct VM size can make a big difference in costs
- Fundamental choice: larger, fewer VMs vs. many smaller instances
- If you scale better than linearly across cores, larger VMs could save you money (it is pretty rare to see linear scaling across 8 cores)
- More instances may provide better uptime and reliability (more failures are needed to take your service down)
- The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you
Using Your VM to the Maximum
- Remember: 1 role instance == 1 VM running Windows
- 1 role instance != one specific task for your code
- You're paying for the entire VM, so why not use it?
- Common mistake: splitting code into multiple roles, each of which barely uses the CPU
- Balance using up the CPU against keeping free capacity for times of need
- There are multiple ways to use your CPU to the fullest
Exploiting Concurrency
- Spin up additional processes, each with a specific task or as a unit of concurrency; this may not be ideal if the number of active processes exceeds the number of cores
- Use multithreading aggressively
- In networking code, correct usage of NT IO Completion Ports will let the kernel schedule the precise number of threads
- In .NET 4, use the Task Parallel Library for data parallelism and task parallelism (see the sketch below)
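A minimal sketch of both flavors with the .NET 4 Task Parallel Library (the work items and console output are placeholders):

// Requires: using System; using System.Threading.Tasks;
class ConcurrencyDemo
{
    static void Main()
    {
        int[] workItems = { 1, 2, 3, 4, 5, 6, 7, 8 };

        // Data parallelism: the runtime partitions the items across cores.
        Parallel.ForEach(workItems, item => Console.WriteLine("processed " + item));

        // Task parallelism: independent units of work scheduled on the thread pool.
        Task a = Task.Factory.StartNew(() => Console.WriteLine("task A"));
        Task b = Task.Factory.StartNew(() => Console.WriteLine("task B"));
        Task.WaitAll(a, b);
    }
}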
Finding Good Code Neighbors
Typically code falls into one or more of these categories:
- Memory intensive
- CPU intensive
- Network IO intensive
- Storage IO intensive
Find code that is intensive with different resources to live together.
Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage IO-intensive code.
Scaling Appropriately
- Monitor your application and make sure you're scaled appropriately (not over-scaled)
- Spinning VMs up and down automatically is good at large scale
- Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
- Being too aggressive in spinning down VMs can result in a poor user experience
- There is a trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of idling VMs
Performance and Cost
Storage Costs
- Understand an application's storage profile and how storage billing works
- Make service choices based on your app profile; the choice can make a big cost difference (e.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction)
- Caching and compressing help a lot with storage costs
Saving Bandwidth Costs
- Bandwidth costs are a huge part of any popular web app's billing profile
- Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth costs often leads to savings in other places
- Sending fewer things also means your VM has time to do other tasks
- All of these tips have the side benefit of improving your web app's performance and user experience
Compressing Content
1. Gzip all output content (see the sketch below)
   - All modern browsers can decompress on the fly
   - Compared to Compress, Gzip has much better compression and freedom from patented algorithms
   - (Diagram: uncompressed content → Gzip → compressed content)
2. Trade off compute costs for storage size
3. Minimize image sizes
   - Use Portable Network Graphics (PNGs)
   - Crush your PNGs
   - Strip needless metadata
   - Make all PNGs palette PNGs
4. Minify JavaScript, CSS, and images
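A minimal sketch of gzipping ASP.NET output on the fly when the client advertises support, e.g. called from Application_BeginRequest (a hand-written illustration, not code from the talk):

// Requires: using System.IO.Compression; using System.Web;
public static void CompressResponse(HttpContext context)
{
    string acceptEncoding = context.Request.Headers["Accept-Encoding"] ?? "";
    if (acceptEncoding.Contains("gzip"))
    {
        // Chain a GZipStream into the response filter; the browser decompresses.
        context.Response.Filter =
            new GZipStream(context.Response.Filter, CompressionMode.Compress);
        context.Response.AppendHeader("Content-Encoding", "gzip");
    }
}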
Best Practices Summary
Doing less is the key to saving costs
Measure everything
Know your application profile in and out
Research Examples in the Cloud (on another set of slides)
Map Reduce on Azure
- Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud
  - A Hadoop implementation; Hadoop has a long history and has been improved for stability
  - Originally designed for cluster systems
- Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  - Designed from the start to use cloud primitives
  - Built-in fault tolerance
  - REST-based interface for writing your own clients
Project Daytona - Map Reduce on Azure
http://research.microsoft.com/en-us/projects/azure/daytona.aspx
Questions and Discussion
Thank you for hosting me at the Summer School
BLAST (Basic Local Alignment Search Tool)
- The most important software in bioinformatics
- Identifies similarity between bio-sequences
Computationally intensive:
- Large number of pairwise alignment operations
- A single BLAST run can take 700-1000 CPU hours
- Sequence databases are growing exponentially; GenBank doubled in size in about 15 months
It is easy to parallelize BLAST:
- Segment the input: segment processing (querying) is pleasingly parallel
- Segment the database (e.g., mpiBLAST): needs special result-reduction processing
Large volume of data:
- A normal BLAST database can be as large as 10 GB, so with 100 nodes the peak storage demand could reach 1 TB
- The output of BLAST is usually 10-100x larger than the input
Parallel BLAST engine on Azure
- Query-segmentation data-parallel pattern: split the input sequences, run the query partitions in parallel, and merge the results together when done
- Follows the generally suggested application model: Web Role + Queue + Worker
- With three special considerations, including batch job management and task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), ACM, 21 June 2010.
A simple Split/Join pattern
- Leverage the multiple cores of one instance: the -a (number of CPUs) argument of NCBI-BLAST is set to 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes
- Task granularity: large partitions cause load imbalance; small partitions cause unnecessary overheads (NCBI-BLAST startup overhead, data-transfer overhead)
  - Best practice: use test runs to profile, and set the partition size to mitigate the overhead
- The value of visibilityTimeout for each BLAST task is essentially an estimate of the task run time
  - Too small: repeated computation; too large: an unnecessarily long wait in case of instance failure
  - Best practice: estimate the value based on the number of pair-bases in the partition and on test runs; watch out for the 2-hour maximum limitation
(Diagram: a splitting task fans the input out to parallel BLAST tasks, whose results are combined by a merging task.)
Task size vs. performance:
- Benefit of the warm cache effect
- 100 sequences per partition is the best choice
Instance size vs. performance:
- Super-linear speedup with larger worker instances, primarily due to the memory capability
Task size / instance size vs. cost:
- The extra-large instance generated the best and most economical throughput, fully utilizing the resource
(Architecture diagram: a web role hosts the web portal and web service for job registration; a job management role runs the job scheduler and scaling engine, dispatching work through a global dispatch queue to worker instances; an Azure Table holds the job registry; Azure Blob storage holds the NCBI databases, BLAST databases, and temporary data; a database-updating role keeps the NCBI databases current.)
Web Portal and Web Service
- An ASP.NET program hosted by a web role instance
- Submit jobs; track job status and logs
- Authentication/authorization based on Live ID
- The accepted job is stored in the job registry table: fault tolerance, avoiding in-memory state
R. palustris as a platform for H2 production (Eric Schadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)
- Blasted ~5,000 proteins (700K sequences)
- Against all NCBI non-redundant proteins: completed in 30 min
- Against ~5,000 proteins from another strain: completed in less than 30 sec
- AzureBLAST significantly saved computing time
Discovering Homologs
- Discover the interrelationships of known protein sequences
- All-against-all query: the database is also the input query
- The protein database is large (4.2 GB); 9,865,668 sequences in total to be queried
- Theoretically, 100 billion sequence comparisons!
Performance estimation:
- Based on sampling runs on one extra-large Azure instance
- Would require 3,216,731 minutes (6.1 years) on one desktop
- One of the biggest BLAST jobs as far as we know; experiments at this scale are usually infeasible for most scientists
The setup:
- Allocated a total of ~4,000 instances: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western and North Europe
- 8 deployments of AzureBLAST, each with its own co-located storage service
- Divided the 10 million sequences into multiple segments, each submitted to one deployment as one job for execution; each segment consists of smaller partitions
- When load imbalances occurred, the load was redistributed manually
Results:
- The total size of the output is ~230 GB; the number of total hits is 1,764,579,487
- Started on March 25th; the last task completed on April 8th (10 days of compute)
- Based on our estimates, the real working instance time should be 6-8 days
- We looked into the log data to analyze what took place
A normal log record should look like:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523...
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553...
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600...
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise, something is wrong (e.g., a task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774...
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
North Europe datacenter: 34,256 tasks processed in total
- All 62 compute nodes lost tasks and then came back in a group (~30 mins, ~6 nodes in one group); this is an update domain at work
- 35 nodes experienced blob-writing failures at the same time
West Europe datacenter: 30,976 tasks were completed, and the job was killed
- A reasonable guess: the fault domain is working
MODISAzure: Computing Evapotranspiration (ET) in the Cloud
"You never miss the water till the well has run dry." (Irish proverb)
Penman-Monteith (1964):

$$\mathrm{ET} = \frac{\Delta R_n + \rho_a c_p\,\delta q\,g_a}{\left(\Delta + \gamma\,(1 + g_a/g_s)\right)\lambda_v}$$

where:
- ET = water volume evapotranspired (m3 s-1 m-2)
- Δ = rate of change of saturation specific humidity with air temperature (Pa K-1)
- λv = latent heat of vaporization (J/g)
- Rn = net radiation (W m-2)
- cp = specific heat capacity of air (J kg-1 K-1)
- ρa = dry air density (kg m-3)
- δq = vapor pressure deficit (Pa)
- ga = conductivity of air (inverse of ra) (m s-1)
- gs = conductivity of plant stoma, air (inverse of rs) (m s-1)
- γ = psychrometric constant (≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky:
- Lots of inputs: big data reduction
- Some of the inputs are not so simple
Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration, or evaporation through plant membranes, by plants.
Data sources:
- NASA MODIS imagery source archives: 5 TB (600K files)
- FLUXNET curated sensor dataset: 30 GB (960 files)
- FLUXNET curated field dataset: 2 KB (1 file)
- NCEP/NCAR: ~100 MB (4K files)
- Vegetative clumping: ~5 MB (1 file)
- Climate classification: ~1 MB (1 file)
20 US years = 1 global year
Data collection (map) stage:
- Downloads requested input tiles from NASA FTP sites
- Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile
Reprojection (map) stage:
- Converts source tile(s) to intermediate-result sinusoidal tiles
- Simple nearest-neighbor or spline algorithms
Derivation reduction stage:
- First stage visible to the scientist
- Computes ET in our initial use
Analysis reduction stage:
- Optional second stage visible to the scientist
- Enables production of science analysis artifacts such as maps, tables, and virtual sensors
(Diagram: the AzureMODIS service web role portal receives requests into a request queue; the data collection stage downloads source imagery from the download sites; tiles then flow through the reprojection queue (reprojection stage), the reduction #1 queue (derivation reduction stage), and the reduction #2 queue (analysis reduction stage), ending in a download queue from which scientists retrieve scientific results; source metadata is stored alongside.)
http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx
The ModisAzure service is the web role front door:
- Receives all user requests
- Queues each request to the appropriate download, reprojection, or reduction job queue
The Service Monitor is a dedicated worker role:
- Parses all job requests into tasks: recoverable units of work
- The execution status of all jobs and tasks is persisted in tables
(Diagram: a request arrives at the MODISAzure Service (web role), which persists JobStatus and places the job on the job queue; the Service Monitor (worker role) parses it, persists TaskStatus, and dispatches work to the task queue.)
All work is actually done by a GenericWorker (worker role):
- Sandboxes science or other executables
- Marshals all storage from/to Azure blob storage to/from local Azure worker instance files
- Dequeues tasks created by the Service Monitor
- Retries failed tasks 3 times
- Maintains all task status
Reprojection Request
(Diagram: the Service Monitor (worker role) parses a reprojection request, persists ReprojectionJobStatus and ReprojectionTaskStatus, and dispatches per-tile tasks via the task queue to GenericWorker roles, which read swath source data storage and write reprojection data storage.)
- ReprojectionJobStatus: each entity specifies a single reprojection job request
- ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)
- SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile
- ScanTimeList: query this table to get the list of satellite scan times that cover a target tile
- Computational costs are driven by the data scale and the need to run reductions multiple times
- Storage costs are driven by the data scale and the 6-month project duration
- Both are small with respect to the people costs, even at graduate-student rates!
(Data collection stage figures: 400-500 GB in 60K files, at 10 MB/sec ≈ 11 hours.)
- Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems.
- Equally important, they can increase participation in research, providing needed resources to users and communities without ready access.
- Clouds are suitable for loosely coupled data-parallel applications, and can support many interesting programming patterns, but tightly coupled low-latency applications do not perform optimally on clouds today.
- Clouds provide valuable fault tolerance and scalability abstractions.
- Clouds act as an amplifier for familiar client tools and on-premise compute.
- Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers.