About Herdy
• Senior Software Engineer at Citadel Technology Solutions (Singapore)
• The eternal student
• Find me on the internet: _hhandoko, hhandoko, hhandoko, https://au.linkedin.com/in/herdyhandoko
Presentation Overview
• Problem Domains
• Mesos Fundamentals
• Mesos Frameworks
• Mesos in the Real-World™
• Demo!
Image source: https://mesosphere.com/wp-content/uploads/2015/04/dcossdashboard.jpg
Once Upon a Tweet
• I’ve heard of:
  • LAMP
  • WIMP
  • MEAN
• But what is SMACK?
Source: https://twitter.com/theotown/status/643377504527495168
Mesos in One Paragraph
Apache Mesos abstracts CPU, memory, storage, and other
compute resources away from machines (physical or virtual),
enabling fault-tolerant and elastic distributed systems to easily be
built and run effectively.
Image source: https://mesosphere.com/wp-content/themes/mesosphere/library/images/views/why-mesos/mesos-architecture.png
Mesos in One Sentence
Operations / DevOps
Next-Generation Cluster Manager
Developers / Data Scientists
Distributed Systems SDK
Mesos in One Sentence (cont’d)
Datacentre timesharing
Image source: http://www.computersciencelab.com/ComputerHistory/HtmlHelp/Images2/IBM7094.jpg
Problem Domain: Static Partitioning
• Many and complex provisioning scripts
• ‘Snowflake’ servers
• No automated failure handling
• Repartition takes hours or days
Problem Domain: Resource Management
• Low utilisation rate (i.e. waste)
• Hard to predict workload
• Application performance jitter
• Scale and capacity are coupled
Image source: http://www.slideshare.net/mesosphere/apache-mesos-and-mesosphere-live-webcast-by-ceo-and-cofounder-florian-li
The Inspiration: Google Borg
• ‘Top Secret’ orchestration system (in use since ~2004)
• Efficiently parcels work across Google’s vast fleet of computer servers
• Google is building Omega (Borg vNext)
Source: https://www.wired.com/2013/03/google-borg-twitter-mesos/all/
The Birth of Apache Mesos
• A research project at the University of California, Berkeley
  • Hindman’s initial ideas came from working with many-core Intel processors (64 – 128 cores)
  • Hindman teamed up with Konwinski and Zaharia, who were working on a software platform for massive data centres
• Twitter took a keen interest and further developed Mesos (as an open-source project)
• Became an Apache project in 2013
Source: https://www.wired.com/2013/03/google-borg-twitter-mesos/all/
Mesos Architecture
• ZooKeeper coordinates the master nodes and elects the leader
• The Mesos master manages agents and schedules Tasks
• Mesos agents make Offers and run Tasks
Key Concepts
• Frameworks
  • Mesos understands the technical primitives of distributed computing but has no intelligence about how to apply them
  • Frameworks tell Mesos (the kernel) how to run applications
  • A framework comprises a Scheduler and an Executor
• Resource offers
  • Agents advertise available resources
  • Offers can contain user-defined attributes
• Resource isolation via LXC
• Resource allocation
  • Roles
  • Weights
  • Resource Reservations
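Roles and weights influence which framework is offered resources first. The slide does not say how, so the sketch below is only an illustration: it assumes a weighted dominant-share rule in the spirit of Dominant Resource Fairness (DRF), the idea behind Mesos's default allocator. `CLUSTER`, `Role`, `dominant_share`, and `next_role_to_offer` are hypothetical names, and the numbers are made up.

```python
from dataclasses import dataclass

# Hypothetical cluster totals; illustrative numbers only.
CLUSTER = {"cpus": 100.0, "mem": 1000.0}


@dataclass
class Role:
    name: str
    weight: float    # a higher weight entitles the role to a larger share
    allocated: dict  # resources currently held, keyed like CLUSTER


def dominant_share(role):
    """Largest fraction of any single resource held by the role, divided by its weight."""
    share = max(role.allocated.get(name, 0.0) / total for name, total in CLUSTER.items())
    return share / role.weight


def next_role_to_offer(roles):
    """The role with the smallest weighted dominant share is served first."""
    return min(roles, key=dominant_share)


roles = [
    Role("analytics", weight=1.0, allocated={"cpus": 20.0, "mem": 100.0}),
    Role("web", weight=2.0, allocated={"cpus": 30.0, "mem": 100.0}),
]
print(next_role_to_offer(roles).name)
```

Here `web` holds more CPU than `analytics`, but its weight of 2 halves its effective share, so under this simplified rule it is still the next role to receive an offer.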
Two-tier Scheduling
1. Agents offer resources
2. Allocator decides which framework to offer the resources to
3. Framework may accept an offer and execute a task on an agent, or
4. Framework may reject the offer, and it will be passed along to another framework
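The four steps above can be sketched as a toy simulation. This is not the Mesos API: `Offer`, `Task`, `Framework`, and `run_offer_cycle` are hypothetical names, the allocator simply tries frameworks in list order, and a framework takes at most one task per offer (real Mesos lets a framework use part of an offer and returns the remainder).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offer:
    agent: str
    cpus: float
    mem: float


@dataclass
class Task:
    name: str
    cpus: float
    mem: float


class Framework:
    """Second tier: the framework's own scheduler decides whether an offer fits."""

    def __init__(self, name, pending):
        self.name = name
        self.pending = list(pending)

    def consider(self, offer: Offer) -> Optional[Task]:
        for task in self.pending:
            if task.cpus <= offer.cpus and task.mem <= offer.mem:
                self.pending.remove(task)
                return task  # step 3: accept the offer and launch the task
        return None          # step 4: reject; the offer moves on


def run_offer_cycle(offers, frameworks):
    """First tier: a naive allocator that tries frameworks in list order."""
    launched = []
    for offer in offers:       # step 1: an agent's resources are on offer
        for fw in frameworks:  # step 2: the allocator picks who sees the offer
            task = fw.consider(offer)
            if task is not None:
                launched.append((fw.name, task.name, offer.agent))
                break
    return launched


offers = [Offer("agent-1", cpus=4, mem=8192), Offer("agent-2", cpus=1, mem=1024)]
frameworks = [
    Framework("marathon", [Task("web", cpus=2, mem=2048)]),
    Framework("chronos", [Task("nightly-report", cpus=1, mem=512)]),
]
launched = run_offer_cycle(offers, frameworks)
print(launched)
```

The point of the split is that the allocator never needs to understand a task: it only decides who sees an offer, while each framework decides what, if anything, to run with it.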
General Purpose Framework: Marathon
• Container and framework orchestration platform
• Runs long-running services (`init.d`), e.g. web applications
• Features
  • High availability (active / passive)
  • Service discovery & load balancing
  • Health checks
  • Event subscription
  • REST API
Image source: https://mesosphere.com/wp-content/themes/mesosphere/library/images/assets/continuous-deployment/marathon2.png
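As a rough illustration of the REST API and health checks, the sketch below builds an app definition and a `POST /v2/apps` request. The host `marathon.example.com`, the app id, and the command are hypothetical, and the field names follow Marathon's app definition as I understand it, so check them against the documentation for your Marathon version. The request is built but deliberately not sent.

```python
import json
import urllib.request

# Hypothetical Marathon endpoint; replace with your own cluster's address.
MARATHON_URL = "http://marathon.example.com:8080"


def build_app(app_id, cmd, cpus, mem, instances):
    """Return an app definition as a plain dict."""
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": cpus,
        "mem": mem,
        "instances": instances,
        "healthChecks": [
            {
                "protocol": "HTTP",
                "path": "/",
                "portIndex": 0,
                "intervalSeconds": 10,
                "maxConsecutiveFailures": 3,
            }
        ],
    }


def build_request(app):
    """Wrap the app definition in a POST request to the apps endpoint."""
    return urllib.request.Request(
        url=f"{MARATHON_URL}/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


app = build_app("/hello-web", "python3 -m http.server $PORT0", cpus=0.25, mem=64, instances=2)
request = build_request(app)
# urllib.request.urlopen(request) would submit the app to a reachable Marathon.
print(request.get_method(), request.full_url)
```

Scaling the service is then a matter of changing `instances` and submitting the updated definition, rather than provisioning machines by hand.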
General Purpose Framework: Chronos
• Fault-tolerant job scheduler for Mesos
• Distributed `cron`
• Features
  • Distributed and fault-tolerant
  • Supports bash and custom executors
  • Schedules based on ISO 8601 repeating interval notation
  • Handles job dependencies
Image source: https://mesos.github.io/chronos/img/chronos_ui-1.png
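An ISO 8601 repeating interval has the shape `R<n>/<start>/<period>`, for example `R3/2016-06-01T00:00:00Z/PT2H`. The sketch below shows how such a string breaks down into run times. It is not Chronos's own parser: `parse_schedule` and `first_runs` are hypothetical helpers, only day, hour, minute, and second periods are handled, and `R<n>` is assumed to mean n runs in total.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches a subset of ISO 8601 durations: PnD, PTnH, PTnM, PTnS and combinations.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_schedule(schedule):
    """Split 'R<n>/<start>/<period>' into (repeats, start, period).

    repeats is None when the count is omitted ('R/...'), i.e. repeat forever.
    """
    repeat, start, period = schedule.split("/")
    if not repeat.startswith("R"):
        raise ValueError("expected a repeating interval starting with 'R'")
    repeats = int(repeat[1:]) if len(repeat) > 1 else None
    start_at = datetime.strptime(start, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    match = _DURATION.match(period)
    if match is None or period in ("P", "PT"):
        raise ValueError(f"unsupported period: {period}")
    parts = {key: int(value) for key, value in match.groupdict().items() if value is not None}
    return repeats, start_at, timedelta(**parts)


def first_runs(schedule, count):
    """Return up to `count` run times, capped by the repeat count if one is given."""
    repeats, start_at, period = parse_schedule(schedule)
    if repeats is not None:
        count = min(count, repeats)
    return [start_at + i * period for i in range(count)]


for run in first_runs("R3/2016-06-01T00:00:00Z/PT2H", 5):
    print(run.isoformat())
```

Compared with a `cron` expression, the notation carries an explicit start time and repeat count, which is what lets a distributed scheduler reason about a job without local crontab state.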
Framework: Aurora
• Service orchestration framework
• Functionality-wise, Marathon and Chronos combined, and much more
• Twitter wanted an all-in-one framework for total control
Image source: http://aurora.apache.org/documentation/latest/images/components.png
BYO Framework
• Existing frameworks provide good coverage of most use cases (80/20)
  • Hadoop: Batch processing
  • Storm: Stream processing
  • Chronos: Task scheduling
  • Marathon / Aurora: Long-running services
Mesos and Mesosphere
Mesos is the name of the open-source Apache project
Mesosphere (Mesosphere Inc.) is the company which commercializes the open source project and provides consulting services
Demo Resources
• DC/OS Installation Instructions: https://dcos.io/docs/1.7/administration/installing/cloud/packet/
• Packet Hosting: https://www.packet.net
• Hashicorp’s Terraform: https://www.hashicorp.com/terraform.html
• Mesosphere Tweeter App: https://github.com/mesosphere/tweeter
Predictive Scheduler: Quasar
• Resource efficient and QoS-aware cluster manager
• Uses fast classification techniques in Machine Learning to profile workloads
Image source: http://regmedia.co.uk/2014/02/27/quasar.jpg
Mesos on Windows
• Mesosphere is working with Microsoft to port Apache Mesos to work with Windows Servers
• Platform-specific tasks will be run on the supported nodes
Image source: https://media.licdn.com/mpr/mpr/shrinknp_800_800/AAEAAQAAAAAAAARRAAAAJGVhZTYyODFlLTVkZmMtNGUzMi05MzRlLTcyZWVlZmE1YTU2MA.jpg
Fit for Purpose?
Good Fit
• Stateless systems
  • Web applications
  • Spark
  • Hadoop
• Distributed systems
  • Cassandra
Poor Fit
• Stateful systems*
  • Relational databases
*Note: Support for persistent storage volumes is under active development
Whitepapers
Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A.D., Katz, R.H., Shenker, S. and Stoica, I., 2011, March. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In NSDI (Vol. 11, pp. 22-22).
Verma, A., Pedrosa, L., Korupolu, M., Oppenheimer, D., Tune, E. and Wilkes, J., 2015, April. Large-scale cluster management at Google with Borg. In Proceedings of the Tenth European Conference on Computer Systems (p. 18). ACM.