Architecture’s Role in Enterprise Transformation Programs

Iain Mortimer & Rupert Brown

V0.4 April 2008

Objectives for today

Overview of ML GIS Transformation Program

Application Availability Stream

Defining an SDLC

Application and Systems Monitoring

Merrill Lynch – a snapshot

Founded 1914

Global Financial services company

Wealth management Capital markets Advisory

Operates in ~38 countries

Client assets of about 2 trillion US dollars

22nd in Fortune 500

64,200 employees

Personal introductions

ML Transformation Program

These six goals deliver against what clients want and our businesses need

Client Value Spectrum

Price Speed Accuracy Flexibility User Experience

Cost Time to Market Results Manageability More Business

Business Value Spectrum

70/30 Investment

Straight-Thru-Processing

Application Availability

Global Sourcing

Client Satisfaction

E-Channels

Application Availability

“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable”

Leslie Lamport ACM SIGACT News, 34, Mar. 2003.

The Availability Problem Space

SDLC /QA

Application Portfolio Management

Monitor

Resource Management

& Governance

Grow the business Run the Business

Technology

Business

Applications Infrastructure

ITIL

Service Management

Application availability by numbers

Engineering challenge
•Establish >99.95% availability
•344 tier 0 & tier 1 systems identified in scope
•AMRS, EMEA, PacRim == 2x IOCC
•Colossal transaction volumes

Streams of work
•Monitoring
•Reporting
•Systems Development Life Cycle
•QA
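To make the >99.95% target concrete, the short sketch below converts an availability percentage into an annual downtime budget. The function name and the choice of a 365-day year are assumptions made for illustration; the slides do not state the measurement period.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year, 24x7 service window


def downtime_budget_minutes(availability_pct: float,
                            period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the maximum downtime (in minutes) permitted by an availability target."""
    return period_minutes * (1 - availability_pct / 100)


budget = downtime_budget_minutes(99.95)
print(f"{budget:.1f} minutes of downtime per year")  # roughly 4.4 hours
```

On these assumptions, each in-scope system has a little under four and a half hours of total outage per year before the target is missed.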

SDLC

SDLC /QA

Resource Management

& Governance


Technology

Business


ITIL

•Establish a common top level set of stages to guide all projects
•Strengthen risk reviews to ensure our most critical projects have the correct technical focus
•Standardise project reporting
•Redefine key technical review points to increase quality and visibility of technology
•Define a core set of key artefacts which are used by many teams across the life cycle
•Support greater multi-team working
•Reduce complexity and improve documentation, particularly for the support teams

Introduction

SDLC Objectives

Our Systems Development Life Cycle (SDLC) establishes a simple-to-use, lightweight mechanism for managing projects. Through this globally common SDLC, senior management will have greater visibility of, and control over, software engineering.

At its heart an SDLC defines:
•Stages - What needs to be done
•Reviews - What needs to be checked
•Roles - Which teams are involved and why
•Artefacts - What needs to be recorded/documented

The SDLC is deliberately NOT a methodology. In designing it, a number of common methodologies were considered to ensure that the SDLC supports their use. Teams should ensure that any methodology they wish to use conforms to the standard artefacts, undertakes the technical reviews and reports against the common stages.

Introduction

What does an SDLC Define ?

The importance of Artefacts

Technology Processes all create artefacts and their dependency relationships

A software project can be measured mechanically on the completeness of its structure of artefacts and their dependencies
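One way to read "measured mechanically" is as a score over a dependency graph of artefacts. The sketch below is illustrative only: the `Artefact` class, the artefact names and the scoring rule (an artefact counts only if it and everything it depends on are complete) are assumptions, not definitions taken from the SDLC.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Artefact:
    name: str
    complete: bool = False
    depends_on: List[str] = field(default_factory=list)  # names of prerequisite artefacts


def completeness(artefacts: Dict[str, Artefact]) -> float:
    """Fraction of artefacts that are complete and whose dependencies are all complete."""
    def satisfied(artefact: Artefact) -> bool:
        return artefact.complete and all(
            dep in artefacts and artefacts[dep].complete
            for dep in artefact.depends_on
        )

    if not artefacts:
        return 0.0
    return sum(satisfied(a) for a in artefacts.values()) / len(artefacts)


# Hypothetical project: the test plan exists but rests on an unfinished design.
project = {
    a.name: a
    for a in [
        Artefact("business_case", complete=True),
        Artefact("high_level_design", complete=True, depends_on=["business_case"]),
        Artefact("detailed_design", complete=False, depends_on=["high_level_design"]),
        Artefact("test_plan", complete=True, depends_on=["detailed_design"]),
    ]
}
score = completeness(project)
print(f"structural completeness: {score:.0%}")
```

Here only the business case and high level design count, because the test plan depends on a detailed design that is not yet complete.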

A key factor in any SDLC is balancing the competing needs of risk control versus the weight of effort.

Three broad routes are defined, each entailing a different level of work at each stage and depth and number of reviews. The choice of route is broadly based on:
•the tier of the system
•risk
•the amount of work to be carried out
•project manager’s judgement

                    Tier 0/1   Tier 2        Tier 3
Emergency fix       ESAF       ESAF          ESAF
Less than 25 days   Small      Small         Small
25-100 days         Large      Small/Large   Small
100+                Large      Large         Large
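The route table can be expressed as a simple lookup, sketched below. The function and key names are hypothetical, and the handling of exactly 100 days (treated here as part of the 25-100 band) is an assumption because the slide leaves that boundary open. Risk and the project manager's judgement, which the slide also lists as inputs, are deliberately not modelled.

```python
# Route per effort band and system tier, transcribed from the table above.
ROUTE_TABLE = {
    "emergency": {"0/1": "ESAF", "2": "ESAF", "3": "ESAF"},
    "under_25": {"0/1": "Small", "2": "Small", "3": "Small"},
    "25_to_100": {"0/1": "Large", "2": "Small/Large", "3": "Small"},
    "over_100": {"0/1": "Large", "2": "Large", "3": "Large"},
}


def effort_band(days: float, emergency: bool = False) -> str:
    """Map estimated effort in days to a row of the route table."""
    if emergency:
        return "emergency"
    if days < 25:
        return "under_25"
    if days <= 100:  # assumption: exactly 100 days falls in the 25-100 band
        return "25_to_100"
    return "over_100"


def select_route(tier: str, days: float, emergency: bool = False) -> str:
    """Return the default SDLC route for a system tier and effort estimate."""
    band = effort_band(days, emergency)
    try:
        return ROUTE_TABLE[band][tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None


print(select_route("2", 60))  # Small/Large: the project manager decides
```

A result of "Small/Large" signals that the table alone does not decide the route and judgement is required.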

Introduction – Routes through the SDLC

SDLC Routes

Stage: Solution Envisioning
Purpose: Builds outline business case to 20-25% accuracy. Identifies feasible architectures. Initial QA engagement. Outlines project with key areas. Checks resource availability

Stage: Analysis
Purpose: Refines project scope and key requirements. Engages key technical areas & risk. Completes Business Case. Produces project plan, milestones & high level master test plan

Stage: Design
Purpose: Detailed process design and definition of development activity & technical solution. High level tests designed and key QA metrics identified

Stage: Development
Purpose: Builds and Unit tests system components. Detailed test design. Produces new user procedures. Integration of new package

Stage: Test
Purpose: Carries out all remaining testing, including System, Integration, UAT, OAT. Produces test packs. Final Risk assessment. Test results & metrics in QA dashboard

Stage: Service Transition
Purpose: Carries out all training. Deploys and implements live system, verification activities in live environment and operational procedures

Stage: Warranty
Purpose: A period of heightened support to fix bugs in the live system. Test, report and capture metrics and carry out fine tuning. Decommissions legacy system.

The SDLC has seven key stages which help guide a project from its initial idea to a well supported running system. The specific activities within any stage are not prescriptive but they do highlight the major things that should be considered. Clearly projects using iterative methodologies will move up and down the stages with each iteration.

Activities

Project management

Solution Envisioning
• Business demand registration by BMO
• Outline Business Case
• Architecture advice on feasibility
• Engage QA

Analysis
• PID
• Requirements definition
• High Level Design
• Planning
• Risk assessment
• QA Approach

Design
• Detailed design
• QA Planning
• SOW
• Decide Buy/Build
• Refine Business case
• High Level Test Design & identify QA metrics
• Environment specification

Development
• Development
• Functional unit test design & execution
• Create operational procedures
• Detailed QA plans
• Environments preparation
• Handover to Test
• Detailed test design, prepare QA metrics

Test
• Functional system integration & non-functional testing
• Regression testing
• OAT
• UAT
• Update QA repository
• Risk assessment

Service Transition
• Deployment Plan
• System installation in live environment ready for cut-over
• Verification in live environment
• Training
• Handover to ASD / EPRM
• System deployment

Warranty
• Bug fixes
• Conduct diagnostic tests & capture test results
• Process fixes
• Legacy systems decommissioned

Introduction - stages

SDLC Stages

Stage: Solution Envisioning, Analysis, Design, Development, Test, Service Transition, Warranty

Business reviews

Technical reviews

Initial Authorisation

High level design

Detailed Design

Accept System

Go/No Go

Outline Authorisation

Full Authorisation

Technical reviews replace existing PTB/PTO reviews

The SDLC defines a small number of review points. These ensure that projects remain under financial control, that their progress is reported in a consistent manner, that they conform to our technical strategy and that risk is managed.

The reviews are categorised into business and technical audiences. The number of reviews a project will undertake depends on the SDLC route. Obviously a project manager may undertake further reviews at other points in the lifecycle, if the project warrants them.

There are three broad ways in which the reviews take place:
•Delegated to the project for self-service
•Mostly delegated to the project for self-service but with exception reporting for a few key aspects
•Full review by a panel

It is expected that the great majority (>90%) of projects will undertake self-service reviews.

•Large
•Small
•ESAF

Sample review to ensure project review is adequate??

Introduction - reviews

SDLC Reviews

Monitoring

Monitor

Resource Management

& Governance


Technology

Business


ITIL

Monitoring:-Overall problem scope

Event Capture Tools

Flows <-> Applications Mapping Engine

Business Dashboards Generation and Distribution

ML Business Flow Dashboards

Monitoring Tool Displays

Monitoring Tool Models

ML Business Flows <-> ML Applications

Model

Dashboard Display Models

Application Execution Platforms

CMDB

External Transaction and Data Feeds

Today our monitoring is based on platform specific tools

We need an accurate catalogue and topology map of our platforms

Data and transaction rates are continuing to increase which in turn will drive event volume

The mapping of applications to flows is unique to ML

Greater clarity of business impact will lead to improved processes and applications

Scope of business activities to be monitored

Individual Clients Institutional Clients

GWM Services GMI Services

Exchanges

ECNs and other Liquidity Pools

News and Data services

Trusted 3rd Party Inbound Services

Ancillary B2B Services

Settlement Services

Clearing Services

Regulatory Services

Other Notification Services

[Figure: matrix of applications by business function and business line]

Business functions: Books and Records (7), Clearing (0), Cash Management (7), Transaction Manager (6), Fails Management (2), Settlement Factory (6), Margin / Collateral Management (3), Front Office (10), Client Reporting/Confirmation (8), Depot Management (4), Corporate Actions & Dividends (9), Control (7), Static Data (3)

Business lines: Derivatives, Cash, Repo, Portfolio, PB Debt, PB Equity, Stock Loan / Collateral, Jo’burg, Madrid, Equity Products, Debt Products, Zurich

Applications: BTM, TESS, CTM, GSF, MEDUSA, BDA, CME system, RAM, FIDESSA, GRAPES, CSW, EBAR, XTAS, CBAR, CAPS, RECON, S3D, BUCS, JANUS, BOND MANAGER, BLOOMBERG, TMS, TODCARS, CASH MANAGER, TRELIS, CLAIM MANAGER, FINMAN, CAMS, BAM CBC, ATS, SX1, TAX MANAGER, CLIENT MONEY, GRTS/GDFS, STT FTR MNG, MANUAL, DMA, COPER, PME, REPO CAMS, GCDB, FAR, CMP, BBP

Paris, Milan and Frankfurt Architecture rolled into MLI

External pressures on banks

Volume, Latency, Reference Data

Market Data Rates and other major feeds continue to increase exponentially

Low latency, DMA and Algorithmic Trading are combining to cause significant feedback loops with subsequent volume spikes.

Latency metrics from Monitoring and Order Book systems are becoming as significant as the prices and volume quotes on them

System event rates are approaching those of major telcos

Marketplace Observations

Marketplace is weak

None of the leading enterprise platform vendors have been able to demonstrate large scale “dogfood” implementations of their own technology platforms in 2007

All enterprise monitoring solutions seem to be grown by an acquisition “strategy” rather than core engineering effort.

Technical solutions exist to many Finance Sector problems but recent focus is on Market data and Trading Systems Package monitoring.

B*M Confusion

Business Activity Monitoring
•Most vendors are pushing little more than Web 2.x widgets coupled to back end SQL or data warehouse sources to draw nice pictures
•Credit Crunch Money Saving Tip :- Much of this “limited value” can be replaced by Excel 2007 services and Sharepoint

Business Service Monitoring
•Vendors are naively assuming that organizations can convert, or have converted, their entire enterprise to very limited N tier (where 2<=N<=4) “SOA” architectures

Neither mechanism contributes significantly to the improvement of:
•Root Cause Problem Determination
•Application and Enterprise Architecture Improvement
•Business Process Improvement

Industry Landscape

BPMN

IBM Banking Model /

Microsoft Motion

Customer

Product

Channel

Division

Flow

Region

Process

Step

Application

Regulation

Application Tier

Utility

Blueprint

Component

OMG BMM

Microsoft SDM

ITIL

BAM

CMDB

Business Layers

Industry Standards

Industry Concepts

Vendor Models

Organizational Structure

Implementation

Requirements Faultline

Our Architecture Objectives

Define a unifying, extensible, technology-proof fabric to embrace existing and future monitoring tools

Provide a single, high performance event space to support multiple application and infrastructure support roles and processes

Enable ML to focus on best of breed monitoring tools

Enable continuous, systematic process improvement supported by a consistent, extensible range of dashboards

Macro observations

There is no Silver Bullet

There are no significant reference architectures, widely recognized industry best practices or academic research at the enterprise level in Finance.

There are no recognised enterprise monitoring solution consultancy practices in the Financial Services Sector

Some challenges: Dashboards and the “truth”

Many Dashboards – One Source of Data
•There are many different operational roles & processes that all need to derive their actions from a single source of the truth
•There are many different dashboards required to support these “Mission Specialists”
•Roles, Processes and Organizations change independently of the data

The CMDB Bottleneck
•Analysis of current CMDB offerings has determined that they will struggle to sustain the reverse lookup rates we will require to map device events to platforms and then to applications
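The reverse lookup described above (device event to platform, then platform to applications) can be illustrated with precomputed in-memory indexes. This is only a sketch of the lookup itself: holding the mappings outside the CMDB is an assumption of this example rather than something the slides propose, and all device, platform and application names are hypothetical.

```python
from collections import defaultdict

# Hypothetical CMDB extract, keyed the "forward" way a CMDB typically stores it.
PLATFORM_DEVICES = {"plat-a": ["disk-1", "nic-1"], "plat-b": ["disk-2"]}
APPLICATION_PLATFORMS = {"OrderRouter": ["plat-a", "plat-b"], "Settlement": ["plat-b"]}


def build_reverse_index(platform_devices, application_platforms):
    """Invert the forward mappings once so each event needs only dictionary lookups."""
    device_to_platform = {}
    for platform, devices in platform_devices.items():
        for device in devices:
            device_to_platform[device] = platform

    platform_to_apps = defaultdict(set)
    for app, platforms in application_platforms.items():
        for platform in platforms:
            platform_to_apps[platform].add(app)
    return device_to_platform, platform_to_apps


def applications_for_device(device, device_to_platform, platform_to_apps):
    """Return (platform, applications) affected by an event on the given device."""
    platform = device_to_platform.get(device)
    if platform is None:
        return None, set()
    return platform, set(platform_to_apps.get(platform, ()))


dev_idx, plat_idx = build_reverse_index(PLATFORM_DEVICES, APPLICATION_PLATFORMS)
print(applications_for_device("disk-2", dev_idx, plat_idx))
```

Each event then costs two constant-time lookups; the price is keeping the indexes in step with the CMDB as the topology changes.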

Models and Dashboards

Customer

Product

Channel

Division

Flow

Region

Process

Step

Application

Application Tier

Utility

Blueprint

Component

CA Cohesion Blueprint Models

ARIS Process Library

Vendor Virtual Fabrics: HP, NetApp, Xsigo, Egenera PAN etc.

Internal Middleware Flows Catalogue

Application Architecture

Normalization Model

Internal Business Sales and Trading Model

General Ledger

Microsoft PPS

Aris PPM/PEM

TBD

Vendor Component Dashboards

Existing or Known Models Candidate Dashboards

Business Layers

Regulation

Organizational Structure

Implementation

Requirements Faultline

UML

Some Basic Design Tenets

We have to be able to fully understand the impact of technology solutions and issues on Business Flows and Processes so that they can be continuously optimized for maximum ROI

Data Content and Latency
•Monitoring Data must contain sufficient detail to determine root cause of technical problems wherever possible
•Monitoring Platforms must be able to provide detailed insight into our lowest latency and highest volume flows ahead of business demands and at sub 1ms granularity.

Automation
•Monitoring Events must be able to correctly trigger transactional automated break-fix processes and dynamic capacity fulfilment

Standardization
•The monitoring platform data will factually direct the future technical strategy and standards

Some Detail:-Normalized Physical Architecture

Our implementation approach is to codify existing server inventory and classify servers into 5 application tiers
•Service/UI Façade
•Distribution
•General Purpose Compute
•Database
•Tx Gateway

Basic Application Dashboards
•Will give a uniform view of each application by functional tier
•Can then measure availability of each tier of each application in a consistent fashion
•Identify and triage weak architectures
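A minimal sketch of how per-tier availability might be computed from a classified server inventory follows. The record layout, the application and server names, and the rule that tier availability is the pooled uptime of its servers are all assumptions for illustration; a redundant tier could reasonably be scored differently.

```python
TIERS = (
    "Service/UI Façade",
    "Distribution",
    "General Purpose Compute",
    "Database",
    "Tx Gateway",
)


def tier_availability(samples):
    """Aggregate uptime per (application, tier) and return availability percentages.

    samples: iterable of (application, tier, server, up_minutes, total_minutes).
    """
    totals = {}
    for app, tier, _server, up, total in samples:
        if tier not in TIERS:
            raise ValueError(f"server not classified into a known tier: {tier!r}")
        up_so_far, total_so_far = totals.get((app, tier), (0, 0))
        totals[(app, tier)] = (up_so_far + up, total_so_far + total)
    return {key: 100 * up / total for key, (up, total) in totals.items()}


# Hypothetical one-day sample for a single application.
samples = [
    ("AppA", "Database", "db1", 1440, 1440),
    ("AppA", "Database", "db2", 1430, 1440),
    ("AppA", "Distribution", "mq1", 1440, 1440),
]
result = tier_availability(samples)
for (app, tier), pct in sorted(result.items()):
    print(f"{app:6} {tier:24} {pct:.2f}%")
```

Because every application is reported against the same five tiers, the weakest tier of each application can be compared directly across the portfolio.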

Process

Step

Application

UI or Service Presentation

Distribution Middleware

General Purpose

Compute and Caching

Data Persistence

Tx Gateway or other

Service Tier

Package

Module

Flow

Channel / Session

Transaction

Process Process

Process Model

Event model, transport and correlation – Logical view

Need to deal with Event Rates approaching Market Data Rates (>20K Events per second Globally)

System events now driven by business (market events)

Be mindful of scale growth in events and trading

Need to decide on most appropriate blend of technology

Cannot buy rulesets off the shelf

May be able to use multivariate analysis to determine significant correlations to validate rulesets

Process Model Database

Common Event Fabric Listener

Event Correlator



Event Fabric Publisher


Event Correlator





Event Correlator




Tool Specific Adapter and Type Filter


CMDB

Device Event Probe

Device to Platform Mapping Engine



Event Correlator




Application Component Server

Business Process and Flow Model

Common Application Tier Model

Basic Event Model

Applications and Monitoring Tools

Shared Infrastructure (SAN etc)

Flow and Process Listener Fabric Loader/Manager

Application Component

Listener Fabric Loader/Manager

Basic Event Fabric

Distribution Model and Manager

Flow and Process Dashboards

Application Tier Component Dashboards

Business Flow and Process Correlation

Rules

Application Component

Correlation Rules

Basic Event Correlation Rules

Further precedence and correlation

NB Correlation occurs at multiple layers
•Components within a physical or virtual server
•Architectural tiers of an application
•Components of a Business Process
•Components of an end to end Flow of connected processes.

We correlate both technology events and business KPI metrics.

Multivariate analysis can be applied to these data flows to heuristically surface the key correlations.
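As a much simpler stand-in for the multivariate analysis mentioned above, the sketch below computes pairwise Pearson correlation between time-aligned series (technology event counts and a business KPI) and reports the strongly related pairs. The series names, the sample values and the 0.8 threshold are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def significant_pairs(series, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| at or above the threshold."""
    names = sorted(series)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(series[a], series[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical per-interval samples.
series = {
    "queue_depth_events": [1, 2, 3, 4, 5],
    "order_latency_ms": [2, 4, 6, 8, 10],
    "disk_warnings": [5, 1, 4, 2, 3],
}
candidates = significant_pairs(series)
print(candidates)
```

Pairs surfaced this way are only candidates for correlation rules; they still need review before being trusted, since correlation alone does not establish cause.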

Event model namespace

Platform Data
•Event Type
•Host
•Monitoring Platform
•Initiating or Receiving Host
•Event Data

Module (may also be present)
•Module Name
•Process ID

Session (may also be present)
•Session Type
•Session ID (Queue Name etc)

Transaction (may also be present)
•Transaction ID
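The namespace above could be carried as a single record type in which the platform fields are mandatory and the module, session and transaction fields are optional. The class and field names below are a hypothetical rendering of the slide, not a published schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MonitoringEvent:
    # Platform data: always present.
    event_type: str
    host: str
    monitoring_platform: str
    peer_host: Optional[str]  # initiating or receiving host, if any
    event_data: str
    # Module, session and transaction data: may also be present.
    module_name: Optional[str] = None
    process_id: Optional[int] = None
    session_type: Optional[str] = None
    session_id: Optional[str] = None  # e.g. a queue name
    transaction_id: Optional[str] = None

    def level(self) -> str:
        """Most specific namespace level this event carries."""
        if self.transaction_id is not None:
            return "transaction"
        if self.session_id is not None:
            return "session"
        if self.module_name is not None:
            return "module"
        return "platform"


event = MonitoringEvent(
    event_type="QUEUE_DEPTH_HIGH",
    host="host-01",
    monitoring_platform="example-monitor",
    peer_host=None,
    event_data="depth=5000",
    module_name="order-gateway",
    session_type="queue",
    session_id="ORDERS.IN",
)
print(event.level())  # session
```

Knowing the most specific level an event carries lets a correlator decide which layer of rules applies to it.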

Monitoring event precedence model

There is a clear precedence to the classes of monitoring event.

Most monitoring tools are clustered at the software and session monitoring end of the spectrum

A physical error on a device or platform will produce a cascade of events for all the software modules that run on it and all the platforms that are connected to it.

The precedence hierarchy is used by the correlation engines distributed across the monitoring infrastructure to identify the most significant “root cause” event

The projected data rates mean that a significant amount of compute resource will be needed to perform the event correlation actions.

Power

External Connectivity

Other Denial of Service (DDOS, Virus or Physical)

Capacity

Storage Failure

SAN/IP Switch Failure

Compute Platform (Server) Failure (Physical or Virtual)

Device Failure (Physical or Virtual)

Package

Module

Session

Transaction

ITRS/Hawk

Wily / TS-A

Optier

Tivoli/Netcool

Vendor platform (Brocade, EMC etc)

Severity

Recovery Latency

Other Data Elements (Reference etc)
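The use of the precedence hierarchy to pick a "root cause" from a cascade can be sketched as a ranked lookup. The ordering below follows the order in which the event classes appear on the slide, which is assumed here to run from highest to lowest precedence; the tuple layout and sample events are hypothetical, and a real correlator would also check that the events are topologically related before grouping them.

```python
# Event classes in assumed precedence order: earlier entries outrank later ones.
PRECEDENCE = [
    "Power",
    "External Connectivity",
    "Other Denial of Service",
    "Capacity",
    "Storage Failure",
    "SAN/IP Switch Failure",
    "Compute Platform (Server) Failure",
    "Device Failure",
    "Package",
    "Module",
    "Session",
    "Transaction",
]
RANK = {name: position for position, name in enumerate(PRECEDENCE)}


def root_cause(events):
    """Pick the highest-precedence event; the earliest timestamp breaks ties.

    events: list of (event_class, timestamp_seconds, description) tuples
    already grouped as belonging to one incident.
    """
    return min(events, key=lambda event: (RANK[event[0]], event[1]))


cascade = [
    ("Transaction", 10.20, "order acknowledgement timed out"),
    ("Session", 10.10, "queue connection dropped"),
    ("Module", 10.15, "pricing module exited"),
    ("Storage Failure", 10.00, "LUN offline"),
]
print(root_cause(cascade))
```

In this cascade the storage failure is reported as the root cause and the module, session and transaction events are treated as its consequences.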

Overall architecture layers + stores view

Infrastructure Asset Data Dashboards

Zone

Site

Region

Global IOCC

Minimum Active/Active clustered pair per zone

Minimum Active/Active clustered pair per Site

Consolidated Application <-> Hosts CMDB

Region Specific Process Models, KPIs & Heuristics

Snake Wolf etc

Larger scale distributed pub/sub fabric needed

Flow and Process

Zone Specific Dashboards
•Application Host Status:- ITRS/Hawk
•JVM/CLR App Runtime Status:- CA-Wily
•VLAN Traffic Analysis:- TS-A

Site Specific Dashboards
•Application Structure Status:- TBSM/RAD
•Application Transaction Flow:- Optier
•Physical Capacity Metrics:- Teamquest
•Application Host Status:- ITRS, Hawk
•Network Hop Latency:- TS-A

Region Specific Dashboards
•Business Process:- Aris PPM/PEM, Systar
•Application Structure Status:- TBSM/RAD

Global Dashboards
•Business Process:- Aris PPM/PEM
•Tier 0 Application Structure Status:- TBSM/RAD

Global / Divisional Process Models, KPIs & Heuristics

Automated T3 Processes/Workflows (Opalis iConclude, Autosys etc)

T3 Automated Process Execution Platforms

Larger scale distributed pub/sub fabric needed

Collection & Correlation Monitoring Data and Rules

Zone Topology Map for site

Global Application Map and Correlation Rules

Zone specific log and monitoring history

ITRS Hawk Wily TS-A

Regional Application Event consolidation & correlation

Global Flow and Process consolidation & correlation

Site Application Event consolidation and correlation

Site Collection and Correlation Fabric Loader/Manager

Regional Application Map and Correlation Rules

Capacity Data (TeamQuest)

Zone <-> Site Routing

Site <-> Region Routing

Region <-> IOCC Routing

Region Collection and Correlation Fabric Loader/Manager

Global Collection and Correlation Fabric Loader/Manager

MLAI MHS

Site Infrastructure Device Repositories (Brocade, VMWare ESX etc)

Application Hosts to be Monitored

Dev QA Prod

V 0.4 Work in Progress

(RDEB)

Zone Application Event consolidation and correlation

3rd Party Gateways

Global TCIC Database

Solution heat map

Commercial Tools

Bespoke Monitoring and Logging

Normalization

Concentration

Correlation

Zoning

Application Aggregation

Flow and Process Aggregation

Base candidate probe technologies identified and initial PoCs completed

Java and .NET logging libraries complete and integrable + C/C++ to be completed

Need to have an accurate topology and metrics of event rates to define Zoning policy

Outline Format and type schema defined - key issue is populating messaging format fully as applications transition to new logging libraries

First level concentration processes will be distributed in zones. Resilience and Performance metrics TBD

External discussions with potential solution partners and vendors have validated the approach

Event precedence rules need more definition and will evolve over time. Candidate solution technologies exist. Scaling/Resilience requirements are dictated by Zoning and Normalization

Need to solve the problem of automatic population of TBSM/RAD dashboards from MLAI/MHS inventory databases

Candidate Technologies identified for mapping events to process models derived from IDS Scheer/ARIS

•In order to define the solution architecture a number of PoCs were carried out to provide point solutions and assess technology capabilities

•At this point we have established coverage of all the necessary technical components; however, we will need to carry out a broader end-to-end integration PoC

Summary

SDLC /QA

Application Portfolio Management Monitor

Resource Management

& Governance


Technology

Business


ITIL

Service Management

Our Monitoring architecture surfaces the structural performance of an application in the context of the governing business process

Our SDLC surfaces the structural components of an application from its business requirements

The combination of performance and development metrics will allow us to optimize the resources we need to satisfy our business demand

The need to know

Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns - the ones we don't know we don't know

Donald Rumsfeld
