Season 2 Episode 1
March 12, 2014
Evening Outline
Lightning Talks:
- S3mper
- PigPen
- STAASH
- Dynomite
- Aegisthus
- Suro
- Zeno
- Lipstick on GCE
- AnsWerS
- IBM
- Coursera
41 projects… Now what?
● Cohesive platform
● Workshops / Training / Documentation
● Participate and contribute : [email protected]
Lightning talks
Lipstick, Hadoop, and Big Data on the Google Cloud
Matt Bookman
Solutions Architect
Google Confidential and Proprietary
Google Compute Engine - VMs in Google Datacenters
● Public Preview - May 2013
● General Availability - December 2013
Demo (Summer 2013): Pig on Compute Engine
Sweet demo!
Netflix OSS Meetup - July 17, 2013
Lipstick - Providing insights
Hadoop on GCE + Cloud Storage (GCS) Connector
Accenture: Cloud vs. Bare-Metal
● Cloud-based Hadoop deployments offer better price-performance ratios than bare-metal
● Cloud’s virtualization expands performance-tuning opportunities
● Using remote storage outperforms local disk HDFS
Data in GCS, Lipstick DB in Cloud SQL
[Diagram: Google Cloud Platform. Components shown: Input Data, Output Data, Lipstick Database, Lipstick Server, Hadoop Master (MapReduce JobTracker), and three Hadoop Workers (each running a MapReduce TaskTracker).]
Resources
● Netflix Lipstick on Google Compute Engine: https://cloud.google.com/developers/articles/netflix-lipstick-on-google-compute-engine
● GCS Connector for Hadoop: https://developers.google.com/hadoop/google-cloud-storage-connector
● Cloud-based Hadoop Deployments: Benefits and Considerations: http://www.accenture.com/SiteCollectionDocuments/PDF/Accenture-Cloud-Based-Hadoop-Deployments-Benefits-and-Considerations.pdf
● Apache Hadoop, Hive, and Pig on Google Compute Engine: https://cloud.google.com/developers/articles/apache-hadoop-hive-and-pig-on-google-compute-engine
Thank you
@Answers4AWS
Cloud Prize and Beyond
Peter Sankauskas
@pas256
March 2013
First idea
• AsgardFormation
• CloudFormation for Asgard
Requirements
• AsgardFormation
• Asgard running
• AWS Credentials
• IAM user
• Policy
• Security Group
• EC2 instance
• Asgard downloaded and configured
• Tomcat downloaded and configured
• Java downloaded and installed
• Linux configured
Asgard playbook
• Base
• Install usual Linux packages
• Basic system hardening and security packages
• Oracle Java 7
• Tomcat 7
• Asgard
• Latest release from GitHub
• Add BASIC authentication
Other playbooks
• Eureka
• Edda
• Simian Army
• Ice
• Aminator
• Genie
AMIs
• Initially built using my own scripts based on Eric Hammond’s (@esh) work
• Then using Aminator
• Created Ubuntu Foundation AMIs
• Added the Ansible Provisioner for Aminator
• Put a couple of them on the AWS Marketplace for free
CloudFormation
• One-click deploy
• Well, about 10 clicks going through the AWS Web Console wizard
• Designed to get you up and running quickly
• Test it out, see if you like it
• NOT production quality
• No real security
• No HA
• No scalability
What’s next?
Do you do this? (this is not my slide)
Beta users
• From a successful CI build
• To a Fully Baked AMI
• Use in Testing and Production
• Without you doing anything
• ZERO clicks
• Signups are open
bakery.answersforaws.com
Thank you
http://bakery.answersforaws.com/
See me at the demo station
Peter Sankauskas
@pas256
IBM Scalable Services Fabric
for Netflix S2E1 Meetup
Andrew Spyker
@aspyker
History and Future
2012
SPECjEnterprise
2013
AcmeAir run on IBM Cloud at “Web Scale”
2014
Scalable Services Fabric internally for IBM Services
Scalable Services Fabric SaaS and On-Prem?
Sample application cloud prize work
AcmeAir Cloud/Mobile Sample/Benchmark born
Codename: BlueMix
Portability cloud prize work
Scalable Service Fabric work: Netflix OSS IBM port/enablement
Netflix “Zen” of Cloud
• Worked with initial services to enable cloud native architecture
• Worked with initial services to enable NetflixOSS usages
• Created scorecard and tests for “cloud native” readiness
Highly Available IaaS and Cloud Services
• Deployment across multiple IBM SoftLayer IaaS datacenters and global and local load balancers
• Complete automation via IBM SoftLayer IaaS APIs
• Ensured facilities for automatic failure recovery
Micro-service Runtimes (Karyon, Eureka Client, Ribbon, Hystrix, Archaius)
• Ported to work with IBM SoftLayer IaaS and on the WebSphere Liberty Profile application server
• Created “eureka-sidecar” for non-Java runtimes and ElasticSearch discovery
Netflix OSS Servers (Asgard, Eureka Server, Turbine)
• Ported to work with IBM SoftLayer IaaS + RightScale
• Operationalized HA and secure deployments for multiple service tenants
Adopted Chaos Testing
• Ported Chaos Monkey to IBM SoftLayer IaaS
• Performed manual Chaos Gorilla validation on services
Worked through devops tool chain
• Worked with initial services to enable continuous delivery with devops (and image baking via an Aminator-like tool)
Come meet the team!
Looks like … / Tweets from … / Talks about …
Adolfo, @adolforod: API Management and Cloud Integration, user of NetflixOSS platform. Appliances in the cloud.
Brian, @bkmartin: IBM BlueMix (PaaS), enabling composable apps in PaaS
Darrell: IBM Research focusing on NetflixOSS devops and on-premise deployments
David, @dcurrie: WebSphere Liberty Profile application server NetflixOSS development and PaaS integration
Jonathan, @ma4jpb: NetflixOSS portability across many aspects. Cloud messaging (in relation to Suro)
Matt, @matrober: API Management, user of NetflixOSS platform. Converted service to be cloud native
Rachel, @rreinitz: IBM Services, interested in helping you get to cloud native in SaaS and on-premise
Ricky, @rickymoorhouse: API Management, user of NetflixOSS platform. Creator of Imaginator
Will, @auwilli98: API Management operations, user of NetflixOSS platform
Priam + Aegisthus @ Coursera
NetflixOSS Meetup
Introduction
@DanielChiaJH
Software Engineer, Infrastructure Team
Coursera
Overview
• Philosophy
• Priam
• Aegisthus
• Conclusion
Philosophy
• Architecture Patterns
• Use what we can
• Incorporate the spirit of others
Priam – Wins
• Token Management
• S3 Backup + Restore
• Config
Priam – Next Steps?
• SimpleDB -> DynamoDB
• Backups blow out OS disk buffer cache
• Compatibility with newer C* versions
Aegisthus – Wins
• Novel workflow
• Data reduced to one authoritative copy
• Possibility for incremental jobs
Aegisthus – What Next?
• C* 1.2 / 2.0
• CQL3
• Priam <–> Aegisthus
• Better compressed SSTable support
Conclusion
• Come chat with me!
• Especially if you have similar goals to me
Zeno
● In-memory data distribution platform
● Contains tools for:
○ data quality management
○ data serialization
● We use it to distribute and keep up to date gigabytes of video metadata on tens of thousands of servers across the globe
Zeno
Why in-memory data?
- Netflix serves billions of requests per day
- Each request requires metadata about many movies to answer
Zeno
Netflix Use Case:
● Gigabytes of in-memory data
● Hundreds of thousands of in-memory cache requests per second, per application instance
● Tens of thousands of application instances
Distribution
FastBlob:
Binary serialization of a complete state of data, and/or the changes in data over time.
Serialization format designed to propagate, and keep up to date, a large amount of in-memory data across many servers.
Optimized for: memory GC effects, memory footprint, data transfer size, deserialization CPU usage
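The idea of shipping either a complete state or only the changes between states can be sketched in a few lines of Python. This is an illustration only, not FastBlob's binary format or Zeno's API; the function names `make_delta` and `apply_delta` and the dict-based state are assumptions made for the example.

```python
def make_delta(old_state, new_state):
    """Describe the changes needed to turn old_state into new_state."""
    removed = [key for key in old_state if key not in new_state]
    upserted = {key: value for key, value in new_state.items()
                if key not in old_state or old_state[key] != value}
    return {"removed": removed, "upserted": upserted}


def apply_delta(state, delta):
    """Return a new state with the delta applied; the input is not mutated."""
    updated = {k: v for k, v in state.items() if k not in delta["removed"]}
    updated.update(delta["upserted"])
    return updated


# Hypothetical video metadata keyed by title id.
snapshot_v1 = {1: {"title": "A"}, 2: {"title": "B"}}
snapshot_v2 = {1: {"title": "A"}, 2: {"title": "B2"}, 3: {"title": "C"}}

delta = make_delta(snapshot_v1, snapshot_v2)
assert apply_delta(snapshot_v1, delta) == snapshot_v2
```

A server that already holds `snapshot_v1` only needs the (smaller) delta to catch up, while a freshly started server would load the full snapshot.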
Data Quality
Diff Reports - inspect data changes between releases
Data Quality
Diff History - inspect changes in data over time
Zeno Framework
Data Schema (Serializers)
Operation (SerializationFramework)
Input Data (POJOs)
Output
JsonSerializationFramework
HashSerializationFramework
DiffSerializationFramework
FastBlobStateEngine
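The separation between a data schema and the operations that run over it can be illustrated with a small Python sketch. It is not Zeno's Java API: `Movie`, `MOVIE_SCHEMA`, and the three functions are hypothetical names chosen for the example. The point is that the schema is written once and each operation walks it without knowing the data model.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Movie:  # hypothetical input object
    movie_id: int
    title: str


# Data schema: one description of a Movie's fields, written once.
MOVIE_SCHEMA = [
    ("id", lambda movie: movie.movie_id),
    ("title", lambda movie: movie.title),
]


def to_json(obj, schema):
    """Operation 1: render the object as JSON by walking the schema."""
    return json.dumps({name: getter(obj) for name, getter in schema},
                      sort_keys=True)


def to_hash(obj, schema):
    """Operation 2: compute a content hash by walking the same schema."""
    digest = hashlib.sha256()
    for name, getter in schema:
        digest.update(f"{name}={getter(obj)!r};".encode("utf-8"))
    return digest.hexdigest()


def diff(old, new, schema):
    """Operation 3: list the fields that differ between two objects."""
    return [name for name, getter in schema if getter(old) != getter(new)]


print(to_json(Movie(99999, "HOC"), MOVIE_SCHEMA))  # {"id": 99999, "title": "HOC"}
```

Adding a field means touching only the schema; adding an operation means writing one more function that iterates over it.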
Zeno Benefits
Development Agility:
● Easy to evolve data model, no need to change serialization formats or operation logic
● Easy to create new functionality, no need to think about data model structure or semantics
● Included “Diff” tools support high data quality across releases without too much effort
Resource Efficiency:
● Included “FastBlob” optimized for Netflix scale
● Ask about in-development functionality!
Suro
To Be Processed in Different Ways
A Simple Solution That Supports All These
STAASH
STorage As A Service over Http
STAASH
STAASH
● Storage-Agnostic
● Language-Agnostic
● REST Interface to data
● Pattern Automation / Aware End Points
● Wrapper Around Astyanax Recipes
● Possibilities: Auditing, Cascading CL, Replication across multiple storages, MapReduce … and many more
Dynomite!!
Dynomite
● Cross AZ & Region replication to existing Key Value stores
○ memcached
○ Redis
● Thin Dynamo implementation provides the replication
● Keep existing native KV protocol
○ No code refactoring
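A toy Python model may help show why keeping the native protocol means no application changes. It is a sketch under stated assumptions, not Dynomite's implementation: `KVStore` stands in for a memcached or Redis node, `ReplicatingProxy` is a hypothetical name, and replication is done synchronously for simplicity.

```python
class KVStore:
    """Stand-in for one memcached or Redis node (hypothetical, in-memory)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class ReplicatingProxy:
    """Exposes the same get/set calls as KVStore and copies writes to peers."""

    def __init__(self, local, peers):
        self.local = local
        self.peers = peers  # stores in other AZs or regions

    def set(self, key, value):
        self.local.set(key, value)
        for peer in self.peers:  # synchronous here, purely to keep the sketch short
            peer.set(key, value)

    def get(self, key):
        return self.local.get(key)  # reads stay in the local AZ


az1, az2 = KVStore(), KVStore()
app_store = ReplicatingProxy(az1, [az2])
app_store.set("session:1", "active")
assert az2.get("session:1") == "active"
```

Because the proxy offers the same `get` and `set` calls as the store it wraps, application code that used the store directly keeps working unchanged.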
Dynomite
[Diagram labels: App 1, Dynomite, memcached, AZ 1, AZ 2]
What do all those events mean?
{"deviceid": 12345, "action": "played", "titleid": 99999}
Device C*
12345: “PS3”
Content C*
99999: “HOC”
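Giving an event meaning amounts to joining its ids against the device and content data. A minimal Python sketch, assuming plain dicts (`DEVICES`, `CONTENT`) as hypothetical stand-ins for the two Cassandra tables and a made-up `enrich` helper:

```python
import json

# Hypothetical stand-ins for the Device and Content Cassandra tables.
DEVICES = {12345: "PS3"}
CONTENT = {99999: "HOC"}


def enrich(event, devices, content):
    """Return a copy of the event with device and title names attached."""
    enriched = dict(event)
    enriched["device"] = devices.get(event["deviceid"])
    enriched["title"] = content.get(event["titleid"])
    return enriched


raw = '{"deviceid": 12345, "action": "played", "titleid": 99999}'
event = enrich(json.loads(raw), DEVICES, CONTENT)
print(event["device"], event["action"], event["title"])  # PS3 played HOC
```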
Don’t hurt production/our customers
Device/Content C*
“My Devices”: {“PS3:HOC”: “12345:99999”}
?!?!?
Sometimes you just want all the data
C* → Priam → S3 (SSTables)
S3 (SSTables) → Move to HDFS* → Convert to JSON → Compact Rows → S3 (JSON)
● A splittable input format for SSTables
○ Need fewer files from the cluster.
○ Faster - just deserializing/serializing the files.
● An input format for the JSON
○ Allow incremental processing of backups
● A reducer that can compact SSTables.
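The compaction step can be pictured as "newest write wins" per row and column. The Python sketch below is an illustration of that idea only, not Aegisthus's reducer: the `compact` function and the tuple layout are assumptions, and it ignores deletes (tombstones) and TTLs, which a real compaction has to handle.

```python
from collections import defaultdict


def compact(records):
    """Merge (row_key, column, value, timestamp) tuples from many SSTables.

    For each row and column, only the value with the newest timestamp survives.
    """
    rows = defaultdict(dict)
    for row_key, column, value, timestamp in records:
        current = rows[row_key].get(column)
        if current is None or timestamp > current[1]:
            rows[row_key][column] = (value, timestamp)
    return {key: {col: val for col, (val, _) in cols.items()}
            for key, cols in rows.items()}


# The same row appears in two hypothetical SSTables with different timestamps.
records = [
    ("12345", "type", "PS2", 100),
    ("12345", "type", "PS3", 200),
    ("12345", "owner", "alice", 150),
]
assert compact(records) == {"12345": {"type": "PS3", "owner": "alice"}}
```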
Big Data Platform
Eventual Consistency
Focus on Performance
● Get your job running faster
● Understand why it was slow
● Transition to Hadoop 2