Tom Hite
DEV2858BU
#VMworld #DEV2858BU
The Shift to the Left: The Changing Role of Operations as Developers in a DevOps World
VMworld 2017 Content: Not for publication or distribution
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
#DEV2858BU CONFIDENTIAL
Agenda
1 Across Data Centers and Clouds – Framing the Problem
2 Resolutions – All the Ops (SecOps, ClusterOps, xxOps . . . Ops): Ops Shift to the Left (or is it Developer Shift to the Right?)
3 DevOps -- AppDev => AppOps => Ops == Great Cloud Operations
4 Quick Recap
What Are We Discussing Today?
• ClusterOps
• DevOps
• DevSecOps (Really? Do we honestly need all three in a word?)
• PlatformOps
• SecOps
• xxOps (you name the ‘xx’ and there’ll be someone using it in print).
• Site Reliability Engineering Teams
• In short: xxOps (as the aggregate term for all of the above) and how it aligns with secure, scalable, resilient, SLA/SLO-honoring multi-cloud operations
The Goal – Business Functionality: From Days & Months to Minutes
Developer
• App Deployment: 30-90 seconds
• Rapid application deployment
• Repeatable and scalable
• APIs are everything!
Operator
• Full Service Deployment: 1-15 Mins
• Operate at five nines
• Publish APIs / apps / services to production
• Capacity on demand
• Operate at Scale: AuthZ/AuthN, NetSecOps, ClusterOps, Governance/Policy, Storage . . .
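“Operate at five nines” is easiest to reason about as a downtime budget. The sketch below is a minimal illustration, assuming a 365-day year and availability measured as a plain fraction of wall-clock time; real SLOs are often defined over rolling windows or as request success ratios instead, so treat the function name and defaults as hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year, no leap day


def downtime_budget_minutes(availability: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime (in minutes) an availability target allows per period."""
    if not 0.0 < availability <= 1.0:
        raise ValueError("availability must be in (0, 1]")
    return (1.0 - availability) * period_minutes


if __name__ == "__main__":
    for label, target in [("three nines", 0.999),
                          ("four nines", 0.9999),
                          ("five nines", 0.99999)]:
        print(f"{label}: {downtime_budget_minutes(target):.2f} min/year")
```

Under these assumptions, five nines leaves a yearly budget of about 5.26 minutes.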
Next Gen Apps We See Today
Greater agility in Software Dev / Deployments, Predictive Analytics, Drug/Chemical Discovery, Fraud Detection, Penetration & Net Breach Detection, Continuous Customer Engagement . . .
[Figure: Ingestion Apps and Services, Platform Services, End User Apps and Services; labels: Info Value, Machine Learning, DNN Training, Filtering & Mining]
What That Looks Like in Action
[Figure: PlatformOps & AppDev; Application Release Automation; Automation & Orchestration Platforms; Test Commits; Build/Test Automation; Test Builds; Promote & Deploy; Reviewers & Stakeholders; OpsAdmins]
Because Cross Cloud is Now the Norm – We Unify There
1 Expanding Our Leadership in Private Cloud: Transforming the data center with SDDC
2 Extending SDDC into the Public Cloud: vCloud Air, vCloud Air Network and SDDC-as-a-Service
3 Multi-Cloud and Multi-Device Strategy: Connectivity, security and visibility through NSX, vRealize, and AirWatch, leading to a new control plane
[Figure labels: Private Clouds, Public Clouds, Managed Clouds]
The Problem with Software Delivery – What Really Goes On
[Figure: IBM Clouds, AWS Clouds, Azure Clouds, On-Prem; IT & Operations, AppDevs, Data Scientists, ???]
What We End up with as ‘Cross-Cloud In Production’
[Figure: Customer Data Center with vCenter Server management; VMware Cloud™ on AWS, Powered by VMware Cloud Foundation, with its own vCenter Server, on AWS Global Infrastructure; access to all AWS services (Amazon EC2, Amazon S3, Amazon ML, AWS Direct Connect, AWS IAM, AWS IoT, Amazon Kinesis, AWS Lambda, …); vRealize Suite, PowerCLI]
What Happens When It All Goes Wrong?
[Figure: AppDev, Data Science, Ops, ClusterOps, DevSecOps, SecOps, xxOps, DevOps; Conflict]
Example: Request timeouts are driving shopping cart abandonment.
Agenda
1 Across Data Centers and Clouds – Framing the Problem
2 Resolutions – All the Ops (SecOps, ClusterOps, xxOps . . . Ops): Ops Shift to the Left (or is it Developer Shift to the Right?)
3 DevOps -- AppDev => AppOps => Ops == Great Cloud Operations
4 Quick Recap
Resolutions – Apparently the Slew of xxOps Roles Fixes All?
• SecOps:
– “[S]eamless collaboration between IT Security and IT Operations to effectively mitigate risk.” Source: BMC Software, http://www.bmc.com/it-solutions/secops-security-operations.html
– Why? “[P]roblems arise because the individual goals of these two groups are often misaligned, thanks to conflicting responsibilities and different metrics for evaluating and rewarding successful performance.” Forbes Insights, “A Deeper Look At Closing The ‘SecOps Gap’”, https://www.forbes.com/sites/forbesinsights/2016/02/05/a-deeper-look-at-closing-the-secops-gap/#594bbb8951d4
• ClusterOps:
– “Promoting operability and interoperability of [Kubernetes] clusters.”
– Why? The world needs processes and “tools to enable companies to treat their systems as a single entity, orchestrating various container technologies across clusters in hybrid clouds, federating them together on a grand scale.” Technoblogic.io, “Post-DevOps: ClusterOps”, http://technoblogic.io/blog/2016/12/12/post-devops-clusterops/
• DevSecOps
– “DevSecOps is like DevOps, but with security principles baked in.” ZDNet, “DevSecOps: What it is and how it can help you innovate in cybersecurity”, http://www.zdnet.com/article/devsecops-what-it-is-and-how-it-can-help-you-innovate-in-cybersecurity/
– Why? Security threats are not slowing down, and programmatic updates are simply the reality of coping with them.
• StorageOps, NetOps . . . enough: name any xxOps and there’ll be one, I’m sure.
IT Value Model – Key Profiles
[Figure axes: Operating Model Maturity; Business Relationship]
• Infrastructure Provider: “Consolidate, virtualize, outsource resources to cloud”
– Future-Proof Your Cloud: “Deliver the IT Defined Workspace”
– Cloud Provider
• Business Partner: “Provide on-demand business aligned applications and services with quality, reducing costs and increasing agility.”
– Accelerate Innovation: “Provide employees with a User Centric Workspace”
– Cloud Operator (SRE/DevOps)
• Digital Enterprise: “IT and business convergence for generating demand and increasing growth. Developing and delivering digital business models”
– New Business Models: “Leverage the Digital Workspace for new insights and delivery capabilities”
– Multi-Cloud, IoT
There’s More to It than xxOps: Cloud Service Operators
Cloud Service Operators (“CSO”) are those digital businesses operating with the high flexibility, rapid innovation and independence necessary to drive greater business via cloud operations. They scale and reliably operate their business solutions on multiple cloud sites with near-zero risk of operational or run-time deterioration of any kind.
We Can Overcome the Term Bloat – Let’s Look at It Closely
Simple Use Case: I am a [some_user_we_like] and need to extract [partition X] of our big data (e.g., HDFS), which contains corporate-sensitive and customer personal information, and move that data from my VMC on AWS instance to Azure, AWS or GCP so we can train some advanced neural networks. After training is done, we no longer need the data but must ensure it is wiped on decommissioning.
CSOs Do Cross Cloud Continuous Delivery
This is what matters to the business for fueling flexibility, not the infrastructure!
Apps Where Apps Belong – Cross Cloud Workload Placement
Infrastructure Deployed On Demand Where Needed for Apps
Seven Foundations of MultiCloud
• Instrumentation
• Storage
• Data Classification
• Governance
• Networking
• PaaS
• Security
MultiCloud xxOps Considerations
[Figure: matrix mapping each phase (Data Description, Data Partitioning, Data Placement, Connectivity and Access, Processing Cleanup) to the foundations it touches (Instrumentation, Storage, Data Classification, Governance, Networking, PaaS, Security)]
Foundations Heat Map – What We Often See in the Field
[Figure: heat map over the phases (Data Description, Data Partitioning, Data Placement, Connectivity and Access, Processing Cleanup) and foundations (Instrumentation, Storage, Data Classification, Governance, Networking, PaaS, Security)]
Multi-Cloud Decision Matrix Framework (simplified)
[Figure: decision flowchart leading to Private, Multi-Cloud or Public]
• What is the data security importance level? Open; Secure But Not Confidential; Confidential; Confidential Customer Data
• Do you require web services (i.e., RDS, Lambda)?
• Are you working with large datasets?
• Does the project require a traditional support model?
• Does the project have an end date?
• Does the project understand its growth?
• Make good business sense: review costs at every step.
Operating Clouds and Apps at Scale – Platforms and Roles
• Dell/EMC: Physical; VMware: Virtual; Pivotal: PaaS (K8s, Docker…)
• Infrastructure Is the Bread and Butter of All . . .
• Infrastructure is, however, contextual by role:
– “Infrastructures” become “Platforms” . . .
• To I&O, vSphere, NSX, et al. are the (SaaS) Production Delivery Platforms
• To Developers, Kubernetes, PCF, PaaS, et al. are the (SaaS) Platforms
• To a Data Scientist, H2O, TensorFlow, Spark, MXNet, NuPIC or the like are the (SaaS) Platforms
The Point is – Great Cloud Service Operators Make Infrastructure Irrelevant
Focus on Automating Everything – True Cloud DevOps / Continuous Delivery
CSOs Deal Fully with xxOps and Multicloud Foundations
That’s DevOps -- or, in Google parlance, Site Reliability Engineering. As three words, it doesn’t sound like much. But it’s an enormously powerful idea. It has already produced Google. But particularly philosophical SREs like [Todd Underwood, Site Reliability Director, Google] have even bigger ambitions. They envision a world where operations shift even further towards code. “We long for the day,” Underwood says, “when nobody runs anything.”
Source: Cade Metz, Business, Wired, “Here’s How Google Makes Sure It (Almost) Never Goes Down”, https://www.wired.com/2016/04/google-ensures-services-almost-never-go/
That’s Great for Google, But How Does VMware Help Customers Get There?
[Customer Reliability Engineering] take[s] the principles and lessons of SRE and appl[ies] them towards customers. [We] Deeply
inspect[] the key elements of a customer’s critical production application — code, design, implementation and operational procedures. We
expect that the overwhelming majority of customers won’t participate because of the effort involved. We think big enterprises betting
multi-billion dollars businesses on the cloud, however, would be foolish to pass this up.
Source: Dave Rensin, Director of Google Customer Reliability Engineering (CRE), “Introducing Google Customer Reliability Engineering”, https://cloudplatform.googleblog.com/2016/10/introducing-a-new-era-of-customer-support-Google-Customer-Reliability-Engineering.html
Products, of Course, but Also: Services for Transforming into a CSO
• Adoption and transparent, simultaneous use of multiple cloud service providers;
• The entire service layer for multi-cloud CSP adoption, e.g.:
– Unified, multi-cloud governance model and related service(s); Unified metrics, logging and related analytics services; Multi-Cloud in-flight and at-rest data security; System hardening (e.g., bastion hosts and hardened VM and container images); Cybersecurity; Penetration breach prevention, detection and quarantining; Site-to-site network connectivity; Best practices and assistance regarding content (data) classification and storage models, retention and cross-cloud delivery (e.g., big data / metadata / application specific data); Multi-cloud identity and access management (a.k.a., AuthN/AuthZ); Access Key / Credential vault and management services; Image (VM and Container) Management; Platform as a Service (PaaS) services, inclusive of Big Data, Machine Learning, IoT and Application Management; DevOps with reliability engineering for continuous delivery and optimal production operations.
Yes – that should scare you. Not because of the ridiculous word count (an anti-pattern in presentations), but because if you don’t have those kinds of services, you are not in a position to operate at scale.
• Best practices and assistance transforming I&O teams regarding continuous delivery, site reliability engineering and operations, management, development and use of fully automated, API driven IaaS, PaaS and iPaaS services for production application management.
• SME regarding best practices and assistance in creating the CSO business operations including billing, chargeback, staffing and skillset assessment and gap analysis.
Agenda
1 Across Data Centers and Clouds – Framing the Problem
2 Resolutions – All the Ops (SecOps, ClusterOps, xxOps . . . Ops): Ops Shift to the Left (or is it Developer Shift to the Right?)
3 DevOps -- AppDev => AppOps => Ops == Great Cloud Operations
4 Quick Recap
DevOps Requires Transformation across…
• People: Role Changes & New Skills
• Process: Automated Release Processes & Governance
• Technology: Automated Deployment Pipelines & On-Demand Infrastructure
Roles and Platforms – Operate at Cloud Scale
[Figure: AppDev, Data Science, Ops, AppOps (SRE)]
“With an SRE-inspired approach, system logs and telemetry are continuously collected, in real time, from all components of the system, and stored in a central data store. Machine-learning algorithms identify anomalous events (such as the rash of timeouts from mobile devices that represent a statistical outlier compared to historical patterns) and surface them to the attention of IT staff.”
Source: Donald Fischer, Venture Partner, General Catalyst, “Are site reliability engineers the next data scientists?”, https://techcrunch.com/2016/03/02/are-site-reliability-engineers-the-next-data-scientists/
Example: Request timeouts are driving shopping cart abandonment.
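The quote describes flagging a rash of timeouts as a statistical outlier against historical patterns. As a minimal sketch of that idea, the code below swaps in a plain z-score test in place of the machine-learning algorithms the article mentions; the function name, the threshold of 3.0 and the per-minute timeout counts are all hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def find_anomalies(history: Sequence[float], recent: Sequence[float],
                   threshold: float = 3.0) -> List[int]:
    """Return indices in `recent` whose z-score against `history` exceeds `threshold`.

    A z-score is a deliberately simple stand-in for the machine-learning
    detectors described in the quote.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical data points")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation of the baseline
    if sigma == 0:
        # Flat baseline: any deviation at all is unusual.
        return [i for i, v in enumerate(recent) if v != mu]
    return [i for i, v in enumerate(recent) if abs(v - mu) / sigma > threshold]


# Hypothetical per-minute counts of request timeouts from mobile clients.
baseline = [4, 5, 3, 6, 4, 5, 4, 5, 6, 4]
latest = [5, 4, 31, 28, 5]
print(find_anomalies(baseline, latest))  # [2, 3]
```

In a real pipeline the baseline would come from the central telemetry store the quote describes, and the flagged minutes would be surfaced to the on-call team rather than printed.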
Great CSOs Present Platforms and Reliability for the Business
• Cloud Service Consumer: R&D / Lines of Business; Business Customers
• Cloud Service Provider (Cloud Basis):
– Open APIs (e.g., OpenStack and Heat)
– Structured PaaS (e.g., Pivotal Cloud Foundry, OpenShift, etc.)
– Unstructured PaaS (e.g., Docker, VIC, Kubernetes, Kubo/PKS)
– IaaS (e.g., VIO / vCloud Suite)
– Physical / VMware Platforms
• Cloud Service Operator: AppDev, AppOps (SRE), Ops (The DevOps Carry)
Enabling Rapid AppDev with AppOps (SREs as xxOps)
If “software is eating the world,” then the meal will be prepared by developers (i.e., DDI [and SRE]).*
Planning (Agile Tasks) -> Coding -> SCM Commit -> Auto-Triggered Continuous Integration -> Testing (static and run-time KPIs) -> Artifact Registration -> Deployment -> Configuration Management -> Instrument / Logging / Perf Metrics -> Anomaly Feedback
AppDev -> PlatformOps -> Ops
Day 2 Ops
• AppOps / Ops proactively monitor and correct the production environment
• High trust culture
• Win-win relationship between Apps and Ops
People & Process
• Version control of all artifacts that could affect Production
• Peer-review of production changes
• High trust culture with cross-functional involvement
• Win-win relationship between Apps & Ops
Continuous Integration & Continuous Delivery
• AppOps Create Environments on-Demand
• Continuous Integration and Deployment
• Automated Testing
• Foster High Trust Culture
* http://venturebeat.com/2015/04/01/the-geek-shall-inherit-the-earth-the-age-of-developer-defined-infrastructure/
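The stage sequence above behaves like a fail-fast pipeline: each stage must succeed before the next runs, and a failure is captured as feedback for planning. The sketch below illustrates only that control flow, assuming each stage can be reduced to a callable returning success or failure; the stage names and lambda stubs are hypothetical placeholders for real CI/CD tools.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure (fail fast).

    Returns (succeeded, log). A failed stage is recorded so it can be fed
    back into the next planning cycle.
    """
    log: List[str] = []
    for name, step in stages:
        if not step():
            log.append(f"FAILED:{name}")
            return False, log
        log.append(name)
    return True, log


# Hypothetical stubs; in practice each would call a CI/CD tool.
stages: List[Stage] = [
    ("scm_commit", lambda: True),
    ("continuous_integration", lambda: True),
    ("testing", lambda: False),  # simulate a failing test stage
    ("artifact_registration", lambda: True),
    ("deployment", lambda: True),
]
ok, log = run_pipeline(stages)
print(ok, log)  # False ['scm_commit', 'continuous_integration', 'FAILED:testing']
```

Stopping at the first failed stage keeps a broken build from ever reaching artifact registration or deployment.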
What SRE Attacks In the Process
Enabling Rapid AppDev with PlatformOps (SREs) and Ops – What the SRE Attacks
• Planning: Priority Setting; Cross-Team and Peer Review
• Coding: Instruments / Dashboards, IaC, Automations, Test Integrations
• SCM Commit: Peer Review, Post Mortem Review
• Continuous Integration: Automated Build, Test, Deploy
• Testing: Unit, Integration, Performance; Early Failure Detection; Ensure Dev/QA/Prod are Identical
• Artifact Registration: Automated by the CI engine, including vRA Catalog Blueprints, VM/Docker images, etc.
• Deployment: Dev/Test/Prod; Canary and Scaled Rollouts
• Configuration Management: Ensure Identical Rollouts Per Environment
• Instrument / Logging / Perf Metrics: Failure Detection, MTTD, MTTR, Evaluate Prod Readiness
• Anomaly Feedback: Fault-Free Post Mortem, Feed Planning Phase
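MTTD and MTTR are simple averages over incident timestamps, but the start point for MTTR varies between teams. The sketch below assumes MTTD runs from fault start to detection and MTTR from fault start to restoration; the `Incident` record and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Incident(NamedTuple):
    started: datetime   # when the fault began affecting users
    detected: datetime  # when monitoring (or a person) noticed it
    resolved: datetime  # when service was restored


def _mean(deltas: List[timedelta]) -> timedelta:
    if not deltas:
        raise ValueError("no incidents to average")
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: List[Incident]) -> timedelta:
    """Mean time to detect: fault start -> detection."""
    return _mean([i.detected - i.started for i in incidents])


def mttr(incidents: List[Incident]) -> timedelta:
    """Mean time to restore, measured here from fault start -> resolution.

    Some teams measure from detection instead; pick one and keep it fixed.
    """
    return _mean([i.resolved - i.started for i in incidents])


t0 = datetime(2017, 8, 28, 12, 0)  # hypothetical timestamps
incidents = [
    Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=30)),
    Incident(t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=50)),
]
print(mttd(incidents), mttr(incidents))  # 0:07:00 0:40:00
```

Tracking both numbers separately shows whether instrumentation (detection) or remediation is the larger share of an outage.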
AppDev -> PlatformOps -> Ops
Day 2 Ops
• AppOps / Ops proactively monitor and correct the production environment
• High trust culture
People & Process
• Peer-review of all Dev/Test/Prod activities
• High trust culture with cross-functional involvement
Continuous Integration & Continuous Delivery
• Deployable, Automated Environments on-Demand
• Continuous Integration & Deploy
• Canary and Scaled Production Rollouts
• Automated Testing
• Fault-Free Post Mortems
• Foster High Trust Culture
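“Canary and Scaled Production Rollouts” can be read as a gate plus a traffic schedule: promote the new version in steps only while its error rate stays close to the baseline, and roll back otherwise. This is a minimal sketch under those assumptions; the step percentages, the 1.5x tolerance and both function names are hypothetical, and a production gate would also require a minimum sample size before judging.

```python
from typing import List

ROLLOUT_STEPS: List[int] = [1, 5, 25, 50, 100]  # hypothetical traffic percentages


def canary_passes(baseline_errors: int, baseline_requests: int,
                  canary_errors: int, canary_requests: int,
                  max_ratio: float = 1.5) -> bool:
    """Hypothetical gate: canary error rate must stay within max_ratio x baseline."""
    if baseline_requests <= 0 or canary_requests <= 0:
        raise ValueError("both versions must have served traffic")
    baseline_rate = baseline_errors / baseline_requests
    canary_rate = canary_errors / canary_requests
    return canary_rate <= baseline_rate * max_ratio


def next_step(current_pct: int, healthy: bool) -> int:
    """Advance to the next traffic step if healthy, otherwise roll back to 0%."""
    if not healthy:
        return 0
    for pct in ROLLOUT_STEPS:
        if pct > current_pct:
            return pct
    return current_pct  # already at full rollout


healthy = canary_passes(50, 10_000, 4, 1_000)  # 0.4% canary vs 0.5% baseline
print(healthy, next_step(5, healthy))  # True 25
```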
What Should the Tool(s) Provide?
• Get visibility into where a build is in the release process
• Eliminate the costs ($$$) and errors associated with manual tasks and hand-offs
• Ensure that the right artifacts are deployed every time
• Ensure a consistent, repeatable & predictable software release process
• Leverage the value from all of the tools in your software development release chain
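One common way to ensure the right artifact is deployed every time is to record a cryptographic digest when the artifact is registered and check it again just before deployment. The sketch below uses SHA-256 from Python's standard `hashlib`; the function names and the artifact file name are hypothetical, and in practice the recorded digest would come from the artifact repository rather than being computed inline.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """True only if the file on disk matches the digest recorded at registration."""
    return sha256_of(path) == expected_sha256.strip().lower()


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "app-1.0.0.tar.gz"  # hypothetical artifact name
    artifact.write_bytes(b"build output")
    recorded = hashlib.sha256(b"build output").hexdigest()  # stored at registration
    print(verify_artifact(artifact, recorded))  # True
    artifact.write_bytes(b"tampered build")
    print(verify_artifact(artifact, recorded))  # False
```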
Structured PaaS Stack: Pivotal
[Figure: toolchain by stack]
• PlanStack
• CodeStack: Geany, IntelliJ, LiteIDE
• CommitStack / CIStack: Gerrit, Trigger Plugins
• ArtifactStack: Artifactory
• TestStack: Test, UAT, Staging
• CDStack, ConfigStack, ControlStack: BOSH & Ops Manager, BOSH, Metrics, Logs, Health Mgr.
• FeedBackStack: Issues
• vRealize Orchestrator
Unstructured PaaS Stacks: Traditional Apps, Open Tools
[Figure: toolchain by stack]
• PlanStack
• CodeStack: ASD/vRO, Geany
• CommitStack / CIStack: Gerrit, Trigger Plugins
• ArtifactStack: Artifactory
• CDStack / TestStack: Code Stream, vRA, vRO
• ControlStack: vRA, vROps, vRLI, vRB
• ConfigStack
• FeedBackStack: Issues
• vRealize Orchestrator
Opinionated Stack: Microsoft Applications
[Figure: toolchain by stack]
• CDStack / TestStack: MS Release Manager, vRA/vRO, Win DSC
• ControlStack: vRA, vROps, vRLI, vRB, Win DSC
• ArtifactStack: Artifactory
• CommitStack / CIStack, CodeStack, ConfigStack, PlanStack, FeedBackStack
• Other tools shown: MS Release Manager, Skype, Slack, PagerDuty, NUnit, Vagrant, Gauntlet, Orca, ThinApp, App Volumes, vRealize Orchestrator
What about SRE Tooling – Some Examples
• BASH/ZSH/… (this one is easy)
• PowerCLI, vRO, vRA, etc. (have to have some VMware for sure!)
• But really, it’s a plethora of tools that map to the relevant SRE team for builds, instrumentation, testing, probing, debugging and all the things necessary to keep production up:
Compilers
Debuggers
Graphite
Nmap
Go, Python, Perl, Ruby, …
ACARA II
ARAM
Ansible
Bazel
CASRE
Chaos Monkey
ETARA
GO (Graphics Oriented)
Gatling
Gradle
HARP/HARPO (NASA)
Make
ProConf
Puppet
Relex
SARA
SMERFS
SPRPM
SRMP
SaltStack
SoRel
SofteRel
Zap
Zipkin
awesome-sre
Some of these are pretty old, but the point is SRE toolchains are very broad.
What is the Point of the xxOps/SRE Shift Left?
• The shift to the left is already occurring, so embrace it!
• Operations teams are becoming much more development oriented.
• Measuring Success is akin to measuring the level of automation (code based, automatic operations):
– “Google places a 50% cap on the aggregate ‘ops’ work for all SREs . . . left to their own devices, the SRE team should end up with very little operational load and almost entirely engage in development tasks, because the service basically runs and repairs itself: we want systems that are automatic, not just automated.” Source: Site Reliability Engineering by Niall Richard Murphy, Jennifer Petoff, Chris Jones, Betsy Beyer, https://www.safaribooksonline.com/library/view/site-reliability-engineering/9781491929117/ch07.html
• Measuring Success is also akin to measuring the level of high-trust culture between Dev and xxOps/SRE
– “By design, the number of development teams that request SRE support exceeds the available bandwidth of SRE teams.” Due to that and other reasons, “[n]ot all [] services receive close SRE engagement . . . [d]evelopers may also seek SRE consulting to discuss specific services or problem areas.” Source: Site Reliability Engineering by Niall Richard Murphy, Jennifer Petoff, Chris Jones, Betsy Beyer, https://landing.google.com/sre/book/chapters/evolving-sre-engagement-model.html
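The 50% cap in the first quote is a ratio over the whole SRE team, so it is straightforward to check once operational time is tracked. The sketch below assumes ops work is logged in hours per engineer; the timesheet format, function names and engineer names are hypothetical, not how Google actually measures it.

```python
from typing import Dict, Tuple


def team_ops_share(hours: Dict[str, Tuple[float, float]]) -> float:
    """Aggregate ops share for a team.

    `hours` maps engineer -> (ops_hours, total_hours). The share is computed
    over the whole team, mirroring the quote's "aggregate" wording.
    """
    ops = sum(o for o, _ in hours.values())
    total = sum(t for _, t in hours.values())
    if total <= 0:
        raise ValueError("total hours must be positive")
    return ops / total


def exceeds_ops_cap(hours: Dict[str, Tuple[float, float]], cap: float = 0.5) -> bool:
    """True when the team's aggregate ops share is above the cap."""
    return team_ops_share(hours) > cap


week = {"alice": (30.0, 40.0), "bob": (20.0, 40.0)}  # hypothetical timesheet
print(team_ops_share(week), exceeds_ops_cap(week))  # 0.625 True
```

A team that trips the cap is a signal to push more operational work back into automation (or back to the developers), which is the “automatic, not just automated” goal the quote describes.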
Quick Recap
• Embracing Public and Private Cloud Likely Demands New Operating Models
• The “Shift to the Left,” where Dev and various Ops teams more closely align, is unquestionably the path forward
• Multiple companies that have adopted SRE operational models have made Ops much more development-oriented
• Measuring Success in the xxOps/SRE model involves growth on both a technical (development) and a cultural (trust) basis
• A plethora of examples exist where SRE teams are central to production success:
– VMware, Google, Facebook, Apple, Pinterest, . . .
– Look at ‘job openings’ for SRE and DevOps, which have converged at this point
• Embracing the change and transformation is admittedly hard, but necessary