GigaOm Structure:Europe: IT ops of the future - AI and analytics-based automation

Post on 12-Sep-2014


description

Presentation by Chris Boos (CEO arago AG) at GigaOm's Structure:Europe 2013 in London on 19 September 2013

transcript

Structure Europe

IT Operations of the future

AI and analytics based automation

18. – 19. September 2013

Hans-Christian Boos @boosc

Wouldn't it be cool to have more time?

Because you can choose how to use your time

convert your time to

quality

innovation

$ $ $ savings

Focus on creating time out of operations

operations improvement

In IT we spend 80% of our time on operations

Converting time to money

success story on operating a major banking application

Chart: number of FTE experts needed (business case for first automation project)

automation level          FTE experts needed
no automation             112   (status at project start)
30% automation             78
50% automation             45
80% automation             22
arago 93% automation        8   (time gain achieved)

Further chart annotations: minimum expected time gain, maximum expected result

Converting time to quality

success story on automating the portal of the world market leader in lighting solutions

Apr 2006   start of project                    70 % availability
Aug 2006   1 month after end of migration      95 % availability
Dec 2006                                       99.999 % availability
Mar 2013   10th generation of application,     99.998 % availability
           €900 million in transactions p.a.

Converting time to innovation

success story on moving from a governance to a brokerage culture in IT provider management

Changeability, Quality, Cost: choose 2, forget 1

Changeability, Quality, Cost, Fun: have all

We treat IT like industrial production:

standardize, taylorize, consolidate

8 years in design and development –

unlikely

15+ years of monetisation on one platform –

wishful thinking

Accepting complexity as a fact of IT is key

We take a new approach combining

the best of both worlds

there are 2 approaches to operating IT

PEOPLE CENTRIC
- cost
- varying quality
- availability
+ agility
+ accountability

ROBOTIC / STANDARDIZED
- no flexibility
- limited to IaaS / PaaS
- no durability
+ efficiency
+ repeatability

A machine continuously trained by humans

and learning from its own activity is the solution

a new way of operating IT

ANALYTICS BASED & KNOWLEDGE CENTRIC
+ continuous cost optimization
+ stable quality
+ scalability
+ agility
+ accountability

Automation augmented engineers

Engineering enabled automation

We have built a machine that operates IT just

like a human would – we call it AutoPilot

AutoPilot does not act like a machine!

To solve a maze, a machine follows a pre-defined run-book.

1. Go forward 1
2. Turn left
3. Go forward 1
4. Turn right
5. Go forward 2
6. Turn left
7. Go forward 2
8. Turn right
9. Go forward 3
10. Turn right
11. Go forward 1
12. Turn left
13. Go forward 1

But we do not know the future….

… and the machine cannot use the

same run-book for a different maze!


AutoPilot only needs two pieces of knowledge to solve these mazes!

Knowledge Item 1
IF THIS: My right hand touches the wall
THEN THAT: Walk forward

Knowledge Item 2
IF THIS: I cannot walk forward
THEN THAT: Turn left 90°

AutoPilot’s solutions start out more

complicated, but AutoPilot learns.

Before machine learning After machine learning

And this works for every maze (non

cyclic ones, with the given knowledge).

Before machine learning After machine learning

No matter how big they get!
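The contrast between a fixed run-book and rule-based knowledge can be sketched in a few lines of Python. This is only an illustration and not how AutoPilot is implemented: the `Walker` class, the grid encoding and the priority order of the two rules are assumptions made for this example, and both mazes only bend left so that the two rules from the slides suffice (a right turn would need further knowledge).

```python
from typing import List, Optional

# '#' = wall, ' ' = corridor, 'S' = start (facing east), 'E' = exit.
MAZE_A = ["#####",
          "###E#",
          "### #",
          "#S  #",
          "#####"]
MAZE_B = ["######",
          "#E   #",
          "#### #",
          "#S   #",
          "######"]


class Walker:
    """Hypothetical agent: a position plus a heading on a character grid."""

    def __init__(self, maze: List[str]) -> None:
        self.maze = maze
        self.pos = next((r, c) for r, row in enumerate(maze)
                        for c, ch in enumerate(row) if ch == "S")
        self.heading = (0, 1)  # (row step, column step) = east

    def _is_wall(self, dr: int, dc: int) -> bool:
        return self.maze[self.pos[0] + dr][self.pos[1] + dc] == "#"

    def blocked(self) -> bool:
        return self._is_wall(*self.heading)

    def right_hand_on_wall(self) -> bool:
        dr, dc = self.heading
        return self._is_wall(dc, -dr)  # the cell to the right of the heading

    def forward(self) -> None:
        if self.blocked():
            raise RuntimeError("walked into a wall")
        self.pos = (self.pos[0] + self.heading[0], self.pos[1] + self.heading[1])

    def turn_left(self) -> None:
        dr, dc = self.heading
        self.heading = (-dc, dr)

    def at_exit(self) -> bool:
        return self.maze[self.pos[0]][self.pos[1]] == "E"


# The two Knowledge Items; checking "blocked" first is an assumption.
KNOWLEDGE = [
    (Walker.blocked, Walker.turn_left, "L"),           # IF I cannot walk forward THEN turn left 90°
    (Walker.right_hand_on_wall, Walker.forward, "F"),  # IF my right hand touches the wall THEN walk forward
]


def run_runbook(maze: List[str], steps: List[str]) -> bool:
    """Replay a fixed list of steps; report whether it ends on the exit."""
    walker = Walker(maze)
    try:
        for step in steps:
            if step == "F":
                walker.forward()
            else:
                walker.turn_left()
    except RuntimeError:
        return False
    return walker.at_exit()


def solve_with_rules(maze: List[str], max_steps: int = 100) -> Optional[List[str]]:
    """Apply the first matching rule until the exit is reached."""
    walker, trace = Walker(maze), []
    for _ in range(max_steps):
        if walker.at_exit():
            return trace
        for condition, action, label in KNOWLEDGE:
            if condition(walker):
                action(walker)
                trace.append(label)
                break
        else:
            return None  # no rule applies: knowledge is missing
    return None


runbook_a = solve_with_rules(MAZE_A)
print(runbook_a)                       # the steps that solved maze A
print(run_runbook(MAZE_B, runbook_a))  # replaying them in maze B
print(solve_with_rules(MAZE_B))        # the same two rules in maze B
```

The recorded trace for maze A is exactly a run-book. Replayed in maze B it walks into a wall, while the two rules find the exit of maze B without any change.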

AutoPilot even deals with unexpected

or unforeseeable ad hoc changes.

As AutoPilot approaches this wall it suddenly closes.

And this one opens.

Because AutoPilot looks at its situation

like a subject matter expert would.


Sounds cool, doesn’t it?

But how is this relevant

to IT operations?

Because finding a way to solve an IT

task is like solving a maze on the fly.

Solving IT tasks is rarely a straightforward process (contrary to what run-books anticipate).

AutoPilot picks knowledge one by one

to create a solution on the fly.

A piece of knowledge – a Knowledge

Item – is a simple rule with context.

Knowledge Item (KI): Abstraction
IF IN CONTEXT: Bind Condition
AND THIS: Execute Condition
THEN THAT: Action

Knowledge Item (KI): Example
IF IN CONTEXT: On Linux machine
AND THIS: Want to look at log file
THEN THAT: Find location of log file
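As a rough sketch, the three parts of a Knowledge Item could be modelled as a small Python data structure. The class, its field names, the fact keys and the log path pattern below are all assumptions for illustration; they are not arago's data model or its KI format.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Facts = Dict[str, str]  # hypothetical: what is currently known about the task


@dataclass(frozen=True)
class KnowledgeItem:
    name: str
    in_context: Callable[[Facts], bool]  # IF IN CONTEXT (bind condition)
    and_this: Callable[[Facts], bool]    # AND THIS (execute condition)
    then_that: Callable[[Facts], Facts]  # THEN THAT (action), returns new facts

    def applies(self, facts: Facts) -> bool:
        return self.in_context(facts) and self.and_this(facts)

    def execute(self, facts: Facts) -> Facts:
        # The action only contributes new facts; known facts stay untouched.
        return {**facts, **self.then_that(facts)}


# The example from the slide; the path pattern is made up for this sketch.
find_log = KnowledgeItem(
    name="Find location of log file",
    in_context=lambda f: f.get("os") == "linux",
    and_this=lambda f: f.get("goal") == "look at log file" and "log_path" not in f,
    then_that=lambda f: {"log_path": "/var/log/" + f["service"] + ".log"},
)

facts = {"os": "linux", "goal": "look at log file", "service": "app"}
if find_log.applies(facts):
    facts = find_log.execute(facts)
print(facts["log_path"])
```

Keeping the context separate from the condition means the same item is simply ignored on a machine where it does not belong, instead of failing.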

A KI in its raw XML format (easy to

transform, easy to exchange).

So let us look at real life

usage of knowledge.

Here is an excerpt from a knowledge pool

we use at arago to perform SW tests.

Check is engine installed
Install SW if needed
Create EC2 Spot instance
Extract EC2 instance FQDN
Set EC2 AMI ID SLES
Read EC2 instance Information
Set EC2 AMI ID CentOS
Set Type/Price for Spot Inst. Request
Check EC2 Spot Request Status
Parse EC2 Spot Request Output
Shutdown unused EC2 inst.
Start some test with EC2 instance
Start tests if precond. are OK
EC2 install repository on SLES
Install SW on SLES
Install SW on Linux
Run Simple AutoPilot CLI Test
Retry Install
EC2 install repository
React on error „repository not found“
Check AutoPilot CLI
React on „No provider of“ msg.


Here knowledge is categorized into

four classes for easier visualization.

Solutions are built step-by-step using the

knowledge in the pool.

When examining the solution after

execution it looks like a script.
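How a solution can be assembled one Knowledge Item at a time, and why the result reads like a script afterwards, can be sketched as follows. This is a simplified stand-in and not the AutoPilot engine: the fact strings, the `KI` tuple, the `solve` function and the mapping of "Install SW on Linux" to CentOS are assumptions, and nothing here talks to AWS.

```python
from typing import FrozenSet, List, NamedTuple, Set


class KI(NamedTuple):
    name: str
    needs: FrozenSet[str]  # facts that must hold before the KI can run
    adds: FrozenSet[str]   # facts the KI establishes


def ki(name: str, needs: str, adds: str) -> KI:
    return KI(name, frozenset(needs.split()), frozenset(adds.split()))


# A small part of the pool from the slides, with made-up fact names.
POOL = [
    ki("Start some test with EC2 instance", "task:test", "want:instance"),
    ki("Set Type/Price for Spot Inst. Request", "want:instance", "spot:priced"),
    ki("Set EC2 AMI ID CentOS", "spot:priced os:CentOS", "ami:set"),
    ki("Set EC2 AMI ID SLES", "spot:priced os:SLES", "ami:set"),
    ki("Create EC2 Spot instance", "ami:set", "instance:up"),
    ki("Install SW on Linux", "instance:up os:CentOS", "sw:installed"),
    ki("Install SW on SLES", "instance:up os:SLES", "sw:installed"),
    ki("Run Simple AutoPilot CLI Test", "sw:installed", "test:done"),
    ki("Shutdown unused EC2 inst.", "test:done", "instance:down"),
]


def solve(task: Set[str], goal: str, pool: List[KI] = POOL) -> List[str]:
    """Pick one applicable KI at a time until the goal fact is reached."""
    facts, script = set(task), []
    while goal not in facts:
        step = next((k for k in pool
                     if k.needs <= facts and not k.adds <= facts), None)
        if step is None:
            raise LookupError("no Knowledge Item applies: knowledge is missing")
        facts |= step.adds        # every step adds at least one new fact
        script.append(step.name)  # the executed steps read like a script
    return script


for os_name in ("CentOS", "SLES"):
    print(os_name)
    for line in solve({"task:test", "os:" + os_name}, goal="instance:down"):
        print("  " + line)
```

The pool is the same for both runs. Only the task differs, and with it the list of steps that ends up in the generated script.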

So let’s take a look at what is possible with this pool of just 22 KIs.

Example 1

Setup AutoPilot test environment

and run software tests

First we give AutoPilot

a task it can identify.

Issue Detail View
Do AutoPilot Test: Give the task of performing AutoPilot tests to the machine
Start some test with EC2 instance: AutoPilot found knowledge on how to run tests on EC2

We prepare to allocate IaaS for our tests

at AWS using spot priced instances.

Issue Detail View
Set Type/Price for Spot Inst. Request: Figure out what we are willing to pay for IaaS

The task we gave to AutoPilot

requested a test on CentOS.

Issue Detail View
Set EC2 AMI ID CentOS: Choose CentOS as OS for the IaaS
etc. etc.

After our tests are done we

decommission our EC2 instances.

Issue Detail View
Start tests if precond. are OK: Check other test preconditions
Run Simple AutoPilot CLI Test: Perform SW test for AutoPilot CLI package
Set EC2 AMI ID CentOS: Decommission EC2 instance

Fully automated. We are finished.

The steps shown before are

summarized in a run-book next.

1. Start some test with EC2 instance: AutoPilot found knowledge how to do test on EC2
2. Set Type/Price for Spot Inst. Request: Figure out what we are willing to pay for IaaS
3. Set EC2 AMI ID CentOS: Choose CentOS as OS for the IaaS
4. Create EC2 Spot instance: Request the server from AWS
5. Parse EC2 Spot Request Output: Analyze the output AWS gave us
6. Check EC2 Spot Request Status: Check if the spot pricing request issued was granted
7. Parse EC2 Spot Request Output: Analyze the output AWS gave us
8. Check EC2 Spot Request Status: Check if the spot pricing request issued was granted
9. Parse EC2 Spot Request Output: Analyze the output AWS gave us
10. Read EC2 instance Information: Retrieve information on AWS instance
11. Extract EC2 instance FQDN: Retrieve FQDN of machine using AWS API call
12. Install SW if needed: Install requested software
13. Install SW on Linux: Install software on Linux
14. Check is engine installed: Check for working AutoPilot installation
15. Install SW on Linux: Install software on Linux
16. React on error „repository not found“: Decide what to do with a “repository not found” error
17. EC2 install repository: Install repository from EC2
18. Retry Install: Clear installation history from new install
19. Install SW on Linux: Install software on Linux
20. Check is engine installed: Check for working AutoPilot installation
21. Check AutoPilot CLI: Check if AutoPilot CLI is available
22. Run Simple AutoPilot CLI Test: Perform SW test for AutoPilot CLI package
23. Set EC2 AMI ID CentOS: Decommission EC2 instance
24. Start tests if precond. are OK: Check other test preconditions

This is only one script, one automatically

generated to solve a specific problem.

The same knowledge can solve millions

of tasks.

A representation more adequate to

AutoPilot is the Knowledge Graph.

All steps, just explained as a

Knowledge Graph.

[Figure: the Knowledge Graph of the 22 KIs in the pool. The KIs used in Example 1 are numbered 1 to 24 in execution order; KIs visited more than once carry several numbers (5,7,9 / 6,8 / 13,15,19 / 14,20).]

Example 2

With no additional knowledge, just by

posing a different task, we can generate

any single EC2 instance, this time not

CentOS but SLES

We skip how AutoPilot develops the

solution step by step… The result:

[Figure: the same Knowledge Graph of 22 KIs. The KIs used in Example 2 are numbered 1 to 15 in execution order; KIs visited more than once carry several numbers (2,4 / 7,8 / 9,14,15).]

Same knowledge, different task,

still fully automated.

Example Summary

With these 2 examples you can begin

to imagine how many other possible

tasks AutoPilot can perform with a

Knowledge Pool of only 22 KIs.

Here are just a few more real tasks

AutoPilot can perform with these 22 KIs.

Install any software on SLES CentOS from repositories

Install any software on SLES CentOS from packages

Provide EC2 Instance Status information

Create AWS EC2 Instances

Download and install software packages

Scale up AWS EC2 instance size

Install AutoPilot 4.1 SLES CentOS

Install AutoPilot 4.0 SLES CentOS

Install AutoPilot unstable SLES CentOS

Start/Terminate AWS EC2 Instances

Create AWS EC2 Spot Instances

Run AutoPilot tests in AWS environment

Run AutoPilot tests in AWS Spot Instance

Dynamically create and setup AutoPilot cluster nodes in AWS

Add software repositories SLES CentOS

Provide information about broken dependencies

Fix broken dependencies SLES CentOS

Shutdown EC2 instance when no longer needed

Clone and configure general purpose server

Check if AutoPilot engine instance is running properly

Update dynamic domain name from instance

Run commands with AutoPilot CLI

Install standard Linux Web Server with Apache, Tomcat, …

Install mail server on Linux OS

Install proxy server on Linux OS

Install content filter on Linux OS

Create a cluster of 1..n Linux servers

Dynamically add Servers to a cluster of systems

Install MySQL database on Linux Server

Decommission a single instance in a cluster that is no longer used

Create a copy of a running server based on model

[Figure: two entry points into the KI graph.]

From Example 2 (automated Amazon cloud spot market provisioning):
Create EC2 Spot instance
Set Type/Price for Spot Inst. Request
Check EC2 Spot Request Status
Parse EC2 Spot Request Output

From Example 1 (AutoPilot software test):
Start some test with EC2 instance
Create EC2 Spot instance
Check EC2 Spot Request Status
Parse EC2 Spot Request Output

For different tasks, AutoPilot uses

different entries in the KI graph.

We know exactly where to install.

We just do it.

We do not know where to install?

We can find out.

All or little input: no challenge to

AutoPilot – if it doesn’t know, it finds out.

Extract EC2 instance FQDN
Read EC2 instance Information
Install SW if needed
Install SW on SLES

The software was not installed properly? AutoPilot fixes the problem. This specific event

did not have to be anticipated by the subject matter experts creating the knowledge

pool and neither did they have to program case specific error handling.

AutoPilot handles unforeseeable events

by working with them.

[Figure: path through the KI graph, numbered 1 to 8 in execution order, over the KIs Check is engine installed, Install SW on Linux, Retry Install, EC2 install repository, React on error „repository not found“ and Check AutoPilot CLI.]
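One way to picture this: an error is not an exception that a script author must foresee, it is just another fact that some Knowledge Item may react to. The sketch below shows that idea under clear assumptions; the fact names, the simulated `env` dictionary and the `run` loop are invented for this example and no real package manager is called.

```python
from typing import Callable, Dict, List, Set, Tuple

Env = Dict[str, bool]  # simulated machine state (hypothetical)
ERROR = "error:repository not found"


def install(facts: Set[str], env: Env) -> None:
    facts.add("install:attempted")
    facts.add("installed" if env["repo_configured"] else ERROR)


def react(facts: Set[str], env: Env) -> None:
    facts.remove(ERROR)
    facts.add("need:repository")


def add_repository(facts: Set[str], env: Env) -> None:
    env["repo_configured"] = True
    facts.remove("need:repository")
    facts.add("retry")


def retry(facts: Set[str], env: Env) -> None:
    facts.remove("retry")
    facts.remove("install:attempted")  # clear history so the install runs again


def verify(facts: Set[str], env: Env) -> None:
    facts.add("verified")


KnowledgeItem = Tuple[str, Callable[[Set[str]], bool], Callable[[Set[str], Env], None]]

POOL: List[KnowledgeItem] = [
    ("Install SW on Linux", lambda f: "install:attempted" not in f, install),
    ("React on error „repository not found“", lambda f: ERROR in f, react),
    ("EC2 install repository", lambda f: "need:repository" in f, add_repository),
    ("Retry Install", lambda f: "retry" in f, retry),
    ("Check is engine installed",
     lambda f: "installed" in f and "verified" not in f, verify),
]


def run(env: Env, max_steps: int = 20) -> List[str]:
    """Fire the first applicable KI until the installation is verified."""
    facts: Set[str] = set()
    script: List[str] = []
    for _ in range(max_steps):
        if "verified" in facts:
            return script
        for name, applies, action in POOL:
            if applies(facts):
                action(facts, env)
                script.append(name)
                break
        else:
            raise LookupError("no Knowledge Item applies")
    raise RuntimeError("step limit reached")


print(run({"repo_configured": True}))   # nothing goes wrong
print(run({"repo_configured": False}))  # the repository is missing at first
```

The install item contains no error handling of its own. The detour through the repository items appears in the second run only because the error fact showed up.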

Let us talk about creating

Knowledge Pools

Do you think you have to hire PhDs or wizards to create and maintain KIs?

No, you don’t!

Knowledge is created

by ordinary people…

…people who have the knowledge…

…people who do the job now….

…people who most likely

have something better to do!

We will show you how

the Knowledge Pool we worked with in

the examples was created.

Knowledge is entered into the pool

bottom up.

That means knowledge is entered, after

it was needed for the first time.

The thing done manually first:

Straightforward: install AutoPilot on AWS cloud

Create AWS environment, run tests, decommission AWS instances

Extract EC2 instance FQDN
Read EC2 instance Information
Start some test with EC2 instance
Run Simple AutoPilot CLI Test
Start EC2 Instance
Shutdown unused EC2 instances

The beginning of the Knowledge Pool

you saw before.


OK… worked fine, but we wanted to be able to run multiple environments with different software.

Ensure compatible operating system (CentOS)

Install any software upon request, not pre-determined

So we needed to be able to install

software based on model information.

Install SW if needed
Install SW on Linux
Retry Install
EC2 install repository
React on error „repository not found“
Set EC2 AMI ID CentOS

Resolve version conflicts in an installation of AutoPilot

And deal with incompatible versions of

our SW being installed as part of tests.

Check is engine installed
Check AutoPilot CLI
Start tests if precond. are OK

The Knowledge Pool has evolved.

e.g. it is now able to install any RPM.

Check is engine installed
Install SW if needed
Extract EC2 instance FQDN
Read EC2 instance Information
Set EC2 AMI ID CentOS
Start some test with EC2 instance
Install SW on Linux
Run Simple AutoPilot CLI Test
Retry Install
EC2 install repository
React on error „repository not found“
Check AutoPilot CLI
Start tests if precond. are OK
Start EC2 Instance
Shutdown unused EC2 instances

OK… worked fine, but someone came

along and needed a test for a new OS.

Request SUSE Linux Enterprise Server from AWS

A new OS request KI was created to

request SLES servers on Amazon

Set EC2 AMI ID

SLES

Perform software installs on SLES and handle exceptions

and perform SLES compatible

installation procedures.

EC2 install repository on SLES
Install SW on SLES
React on „No provider of“ msg.

The new Knowledge Pool can install packaged SW on either CentOS or SLES.

Check is engine installed
Install SW if needed
Extract EC2 instance FQDN
Set EC2 AMI ID SLES
Read EC2 instance Information
Start some test with EC2 instance
EC2 install repository on SLES
Install SW on SLES
Install SW on Linux
Run Simple AutoPilot CLI Test
Retry Install
EC2 install repository
React on error „repository not found“
React on „No provider of“ msg.
Start EC2 Instance
Set EC2 AMI ID CentOS
Shutdown unused EC2 instances
Start tests if precond. are OK
Check AutoPilot CLI

OK… worked fine, until our first bill from

AWS and then we wanted spot prices.

Adding new KIs to request AWS spot

instances, handle availability and pricing.

Request AWS spot instances and check for availability, re-request if desired pricing cannot be obtained in time.

Create EC2 Spot instance
Set Type/Price for Spot Inst. Request
Check EC2 Spot Request Status
Parse EC2 Spot Request Output
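The request, check and re-request cycle described above could look roughly like the loop below. It is a sketch under assumptions: the function and parameter names are invented, the idea of working through a list of bids is one possible reading of "re-request", and the `FakeSpotMarket` class stands in for AWS so that the example runs without any cloud account.

```python
import time
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def acquire_spot_instance(
    request: Callable[[float], str],  # submit a bid, get a request id back
    status: Callable[[str], str],     # "open" while waiting, "active" when granted
    cancel: Callable[[str], None],
    bids: Sequence[float],            # bids to try, in order (assumption)
    polls_per_bid: int = 3,
    wait_seconds: float = 0.0,
) -> Optional[Tuple[str, float]]:
    """Return (request id, bid) of the first granted request, else None."""
    for bid in bids:
        request_id = request(bid)
        for _ in range(polls_per_bid):
            if status(request_id) == "active":
                return request_id, bid
            time.sleep(wait_seconds)
        cancel(request_id)  # not granted in time: withdraw and re-request
    return None


class FakeSpotMarket:
    """Stand-in for the cloud provider; grants any bid at or above a price."""

    def __init__(self, clearing_price: float) -> None:
        self.clearing_price = clearing_price
        self.requests: Dict[str, float] = {}
        self.cancelled: List[str] = []

    def request(self, bid: float) -> str:
        request_id = "req-" + str(len(self.requests) + 1)
        self.requests[request_id] = bid
        return request_id

    def status(self, request_id: str) -> str:
        granted = self.requests[request_id] >= self.clearing_price
        return "active" if granted else "open"

    def cancel(self, request_id: str) -> None:
        self.cancelled.append(request_id)


market = FakeSpotMarket(clearing_price=0.08)
result = acquire_spot_instance(market.request, market.status, market.cancel,
                               bids=[0.05, 0.10])
print(result, market.cancelled)
```

Passing the three operations in as functions keeps the waiting logic separate from the provider, so the same loop can be exercised against a fake market first.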

The Knowledge Pool we used before is

complete now.


Effort?

With the same effort needed in another

environment to create 4 scripts…

…ordinary subject matter experts used

AutoPilot to create the foundation for

potentially automating millions of tasks.

Instead of protecting the status quo, engineers do what engineers do best…

… make things better!

Thank you for your time, which we hope was well invested, because dismissing good ideas can harm your future.

Register at

http://www.arago.de/autopilot-ce/