Architecture’s Role in Enterprise Transformation Programs
Iain Mortimer & Rupert Brown
V0.4 April 2008
Objectives for today
Overview of ML GIS Transformation Program
Application Availability Stream
Defining an SDLC
Application and Systems Monitoring
Merrill Lynch – a snapshot
Founded 1914
Global financial services company
Wealth management, capital markets, advisory
Operates in ~38 countries
Client assets of about 2 trillion US dollars
22nd in Fortune 500
64,200 employees
Personal introductions
ML Transformation Program
These six goals deliver against what clients want and our businesses need
Client Value Spectrum
Price Speed Accuracy Flexibility User Experience
Cost Time to Market Results Manageability More Business
Business Value Spectrum
70/30 Investment
Straight-Thru-Processing
Application Availability
Global Sourcing
Client Satisfaction
E-Channels
Application Availability
“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable”
Leslie Lamport ACM SIGACT News, 34, Mar. 2003.
The Availability Problem Space
[Diagram: the availability problem space. Components: SDLC/QA, Application Portfolio Management, Monitor, Resource Management & Governance, ITIL, Service Management. Axes: Grow the Business / Run the Business; Technology / Business; Applications / Infrastructure.]
Application availability by numbers
Engineering challenge
• Establish >99.95% availability
• 344 tier 0 & tier 1 systems identified in scope
• AMRS, EMEA, PacRim == 2x IOCC
• Colossal transaction volumes
Streams of work
• Monitoring
• Reporting
• Systems Development Life Cycle
• QA
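To put the >99.95% availability target in concrete terms, the following minimal sketch converts an availability percentage into a yearly downtime budget. It assumes a 365-day year and continuous (24x7) service hours; the slides do not state the measurement window, so treat the figures as illustrative.

```python
# Illustrative arithmetic only: the measurement window (24x7, 365 days)
# is an assumption, not something stated in the slides.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(availability_pct: float) -> float:
    """Return the maximum minutes of downtime per year for a target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


budget = downtime_budget_minutes(99.95)
print(f"{budget:.1f} minutes per year, about {budget / 60:.1f} hours")
```

Under these assumptions a 99.95% target leaves a little under four and a half hours of downtime per system per year.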
SDLC
[Diagram: availability problem space, SDLC view. Labels: SDLC/QA, Resource Management & Governance, ITIL. Axes: Grow the Business / Run the Business; Technology / Business; Applications / Infrastructure.]
• Establish a common top level set of stages to guide all projects
• Strengthen risk reviews to ensure our most critical projects have the correct technical focus
• Standardise project reporting
• Redefine key technical review points to increase quality and visibility of technology
• Define a core set of key artefacts which are used by many teams across the life cycle
• Support greater multi-team working
• Reduce complexity and improve documentation, particularly for the support teams
Introduction
SDLC Objectives
Our Systems Development Life Cycle (SDLC) establishes a simple to use and light-weight mechanism for managing projects. Through this globally common SDLC, senior management will have greater visibility and control over software engineering.
At its heart an SDLC defines:
• Stages - What needs to be done
• Reviews - What needs to be checked
• Roles - Which teams are involved and why
• Artefacts - What needs to be recorded/documented
The SDLC is deliberately NOT a methodology. In fact, in designing it a number of common methodologies were considered to ensure that the SDLC supports their use. Teams should ensure that any particular methodology they wish to use conforms to the standard artefacts, undertakes technical review and reports to the common stages.
Introduction
What does an SDLC Define ?
The importance of Artefacts
Technology Processes all create artefacts and their dependency relationships
A software project can be measured mechanically on the completeness of its structure of artefacts and their dependencies
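One way to read "measured mechanically" is as a score over the artefact dependency graph. The sketch below is an assumption about how such a measure might look, not the programme's actual metric: an artefact counts as complete only if it exists and everything it depends on exists too. The artefact names and the dependency map are hypothetical.

```python
from typing import Dict, Set


def completeness(required: Dict[str, Set[str]], produced: Set[str]) -> float:
    """Fraction of required artefacts that exist with all dependencies present."""
    if not required:
        return 1.0
    complete = [
        name
        for name, depends_on in required.items()
        if name in produced and depends_on <= produced
    ]
    return len(complete) / len(required)


# Hypothetical artefact set: each key maps to the artefacts it depends on.
required = {
    "business_case": set(),
    "high_level_design": {"business_case"},
    "detailed_design": {"high_level_design"},
    "test_plan": {"high_level_design"},
}
produced = {"business_case", "high_level_design", "test_plan"}

score = completeness(required, produced)
print(f"artefact completeness: {score:.0%}")
```

Here three of the four required artefacts are complete, because the detailed design has not yet been produced.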
A key factor in any SDLC is balancing the competing needs of risk control versus the weight of effort.
Three broad routes are defined, each entailing a different level of work at each stage and a different depth and number of reviews. The choice of route is broadly based on:
• the tier of the system
• risk
• the amount of work to be carried out
• project manager’s judgement
Amount of work       Tier 0/1    Tier 2         Tier 3
Emergency fix        ESAF        ESAF           ESAF
Less than 25 days    Small       Small          Small
25-100 days          Large       Small/Large    Small
100+ days            Large       Large          Large
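The route table above can be expressed as a small lookup. This is a sketch under stated assumptions: the function name and signature are hypothetical, a project of exactly 100 days is treated as falling in the 25-100 band, and the "Small/Large" cell for Tier 2 is returned as-is because the slides leave that choice to risk and the project manager's judgement.

```python
def select_route(tier: int, effort_days: int, emergency_fix: bool = False) -> str:
    """Return the SDLC route for a project, following the route table."""
    if tier not in (0, 1, 2, 3):
        raise ValueError(f"unknown tier: {tier}")
    if emergency_fix:
        return "ESAF"  # emergency fixes take the ESAF route for every tier
    if effort_days < 25:
        return "Small"
    if effort_days > 100:  # assumption: exactly 100 days stays in the middle band
        return "Large"
    # 25-100 days: the route depends on the tier of the system
    return {0: "Large", 1: "Large", 2: "Small/Large", 3: "Small"}[tier]


print(select_route(tier=1, effort_days=60))   # Large
print(select_route(tier=3, effort_days=60))   # Small
print(select_route(tier=2, effort_days=10, emergency_fix=True))  # ESAF
```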
Introduction – Routes through the SDLC
SDLC Routes
Stage: Purpose
Solution Envisioning: Builds outline business case to 20-25% accuracy. Identifies feasible architectures. Initial QA engagement. Outlines project with key areas. Checks resource availability.
Analysis: Refines project scope and key requirements. Engages key technical areas & risk. Completes Business Case. Produces project plan, milestones & high level master test plan.
Design: Detailed process design and definition of development activity & technical solution. High level tests designed and key QA metrics identified.
Development: Builds and unit tests system components. Detailed test design. Produces new user procedures. Integration of new package.
Test: Carries out all remaining testing, including System, Integration, UAT, OAT. Produces test packs. Final risk assessment. Test results & metrics in QA dashboard.
Service Transition: Carries out all training. Deploys and implements live system, verification activities in live environment and operational procedures.
Warranty: A period of heightened support to fix bugs in the live system. Test, report and capture metrics and carry out fine tuning. Decommissions legacy system.
The SDLC has seven key stages which help guide a project from its initial idea to a well supported running system. The specific activities within any stage are not prescriptive but they do highlight the major things that should be considered. Clearly projects using iterative methodologies will move up and down the stages with each iteration.
Activities
Project management
Solution Envisioning
• Business demand registration by BMO
• Outline Business Case
• Architecture advice on feasibility
• Engage QA
Analysis
• PID
• Requirements definition
• High Level Design
• Planning
• Risk assessment
• QA Approach
Design
• Detailed design
• QA Planning
• SOW
• Decide Buy/Build
• Refine Business case
• High Level Test Design & identify QA metrics
• Environment specification
Development
• Development
• Functional unit test design & execution
• Create operational procedures
• Detailed QA plans
• Environments preparation
• Handover to Test
• Detailed test design, prepare QA metrics
Test
• Functional system integration & non-functional testing
• Regression testing
• OAT
• UAT
• Update QA repository
• Risk assessment
Service Transition
• Deployment Plan
• System installation in live environment ready for cut-over
• Verification in live environment
• Training
• Handover to ASD / EPRM
• System deployment
Warranty
• Bug fixes
• Conduct diagnostic tests & capture test results
• Process fixes
• Legacy systems decommissioned
Introduction - stages
SDLC Stages
Stage: Solution Envisioning, Analysis, Design, Development, Test, Service Transition, Warranty
Business reviews
Technical reviews
Initial Authorisation
High level design
Detailed Design
Accept System
Go/No Go
Outline Authorisation
Full Authorisation
Technical reviews replace existing PTB/PTO reviews
The SDLC has defined a small number of review points. These ensure that projects remain under financial control, that their progress is reported in a consistent manner, that projects conform to our technical strategy and that risk is managed.
The reviews are categorised into business and technical audiences. The number of reviews a project will undertake depends on the SDLC route. Obviously a project manager may undertake further reviews at other points in the lifecycle, if the project warrants them.
There are three broad ways in which the reviews take place:
• Delegated to the project for self-service
• Mostly delegated to the project for self-service but with exception reporting for a few key aspects
• Full review by a panel
It is expected that the great majority (>90%) of projects will undertake self-service reviews.
SDLC routes: Large, Small, ESAF
Sample review to ensure project review is adequate??
Introduction - reviews
SDLC Reviews
Monitoring
[Diagram: availability problem space, monitoring view. Labels: Monitor, Resource Management & Governance, ITIL. Axes: Grow the Business / Run the Business; Technology / Business; Applications / Infrastructure.]
Monitoring:-Overall problem scope
Event Capture Tools
Flows <-> Applications Mapping Engine
Business Dashboards Generation and Distribution
ML Business Flow Dashboards
Monitoring Tool Displays
Monitoring Tool Models
ML Business Flows <-> ML Applications
Model
Dashboard Display Models
Application Execution Platforms
CMDB
External Transaction and Data Feeds
• Today our monitoring is based on platform specific tools
• We need an accurate catalogue and topology map of our platforms
• Data and transaction rates are continuing to increase, which in turn will drive event volume
• The mapping of applications to flows is unique to ML
• Greater clarity of business impact will lead to improved processes and applications
Scope of business activities to be monitored
Individual Clients Institutional Clients
GWM Services GMI Services
Exchanges
ECNs and other Liquidity Pools
News and Data services
Trusted 3rd Party Inbound Services
Ancillary B2B Services
Settlement Services
Clearing Services
Regulatory Services
Other Notification Services
Functional areas (figures in brackets as shown on the slide): Books and Records (7), Clearing (0), Cash Management (7), Transaction Manager (6), Fails Management (2), Settlement Factory (6), Margin / Collateral Management (3), Front Office (10), Client Reporting/Confirmation (8), Depot Management (4), Corporate Actions & Dividends (9), Control (7), Static Data (3)
Business lines: House Equity, House Debt, Derivatives, Cash, Repo, Portfolio, PB Debt, PB Equity, Stock Loan / Collateral, Equity Products, Debt Products; locations Jo’burg, Madrid, Zurich
[Figure: matrix of business lines against functional areas, naming the system used in each cell. Systems named: ATS, BAM CBC, BBP, BDA, BLOOMBERG, BOND MANAGER, BTM, BUCS, CAMS, CAPS, CASH MANAGER, CBAR, CLAIM MANAGER, CLIENT MONEY, CME system, CMP, COPER, CSW, CTM, DMA, EBAR, FAR, FIDESSA, FINMAN, GCDB, GRAPES, GRTS/GDFS, GSF, JANUS, MANUAL, MEDUSA, PME, RAM, RECON, REPO CAMS, S3D, STT FTR MNG, SX1, TAX MANAGER, TESS, TMS, TODCARS, TRELIS, XTAS]
Paris, Milan and Frankfurt Architecture rolled into MLI
External pressures on banks
Volume, Latency, Reference Data
• Market Data Rates and other major feeds continue to increase exponentially
• Low latency, DMA and Algorithmic Trading are combining to cause significant feedback loops with subsequent volume spikes
• Latency metrics from Monitoring and Order Book systems are becoming as significant as the prices and volume quotes on them
• System event rates are approaching those of major telcos
Marketplace Observations
Marketplace is weak
• None of the leading enterprise platform vendors have been able to demonstrate large scale “dogfood” implementations of their own technology platforms in 2007
• All enterprise monitoring solutions seem to be grown by an acquisition “strategy” rather than core engineering effort
• Technical solutions exist to many Finance Sector problems but recent focus is on Market data and Trading Systems Package monitoring
B*M Confusion
Business Activity Monitoring
• Most vendors are pushing little more than Web 2.x widgets coupled to back-end SQL or data warehouse sources to draw nice pictures
• Credit Crunch Money Saving Tip: much of this “limited value” can be replaced by Excel 2007 services and SharePoint
Business Service Monitoring
• Vendors are naively assuming that organizations can convert, or have converted, their entire enterprise to very limited N-tier (where 2<=N<=4) “SOA” architectures
Neither mechanism contributes significantly to the improvement of:
• Root Cause Problem Determination
• Application and Enterprise Architecture Improvement
• Business Process Improvement
Industry Landscape
BPMN
IBM Banking Model /
Microsoft Motion
Customer
Product
Channel
Division
Flow
Region
Process
Step
Application
Regulation
Application Tier
Utility
Blueprint
Component
OMG BMM
Microsoft SDM
ITIL
BAM
CMDB
Business Layers
Industry Standards
Industry Concepts
Vendor Models
Organizational Structure
Implementation
Requirements Faultline
Our Architecture Objectives
• Define a unifying, extensible, technology-proof fabric to embrace existing and future monitoring tools
• Provide a single, high performance event space to support multiple application and infrastructure support roles and processes
• Enable ML to focus on best of breed monitoring tools
• Enable continuous, systematic process improvement supported by a consistent, extensible range of dashboards
Macro observations
There is no Silver Bullet
There are no significant reference architectures, widely recognized industry best practices or academic research at the enterprise level in Finance.
There are no recognised enterprise monitoring solution consultancy practices in the Financial Services Sector
Some challenges: Dashboards and the “truth”
Many Dashboards – One Source of Data
• There are many different operational roles & processes that all need to derive their actions from a single source of the truth
• There are many different dashboards required to support these “Mission Specialists”
• Roles, Processes and Organizations change independently of the data
The CMDB Bottleneck
• Analysis of current CMDB offerings has determined that they will struggle to sustain the reverse lookup rates we will require to map device events to platforms and then to applications
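To make the reverse lookup concrete, the sketch below shows the device to platform to application resolution that each device event would require. It is an illustration only: the record layout, names and the idea of precomputing an in-memory index from forward records are assumptions, not a description of any CMDB product or of the approach the programme adopted.

```python
from collections import defaultdict

# Hypothetical forward records, in the direction an inventory usually stores them.
app_to_platforms = {
    "OrderRouter": ["plat-a", "plat-b"],
    "Settlement": ["plat-b"],
}
platform_to_devices = {
    "plat-a": ["nic-1", "disk-7"],
    "plat-b": ["disk-9"],
}


def build_reverse_index(app_to_platforms, platform_to_devices):
    """Precompute device -> applications so each event needs one dict lookup."""
    platform_to_apps = defaultdict(set)
    for app, platforms in app_to_platforms.items():
        for platform in platforms:
            platform_to_apps[platform].add(app)

    device_to_apps = defaultdict(set)
    for platform, devices in platform_to_devices.items():
        for device in devices:
            device_to_apps[device] |= platform_to_apps[platform]
    return dict(device_to_apps)


index = build_reverse_index(app_to_platforms, platform_to_devices)
affected = sorted(index["disk-9"])
print(affected)  # applications impacted by an event on device disk-9
```

The point of the sketch is that the per-event cost is a single lookup once the index is built; keeping such an index current as the inventory changes is the harder problem and is not addressed here.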
Models and Dashboards
Customer
Product
Channel
Division
Flow
Region
Process
Step
Application
Application Tier
Utility
Blueprint
Component
CA Cohesion Blueprint Models
ARIS Process Library
Vendor Virtual Fabrics: HP, NetApp, Xsigo, Egenera PAN etc.
Internal Middleware Flows Catalogue
Application Architecture Normalization Model
Internal Business Sales and Trading Model
General Ledger
Microsoft PPS
Aris PPM/PEM
TBD
Vendor Component Dashboards
Existing or Known Models Candidate Dashboards
Business Layers
Regulation
Organizational Structure
Implementation
Requirements Faultline
UML
Some Basic Design Tenets
We have to be able to fully understand the impact of technology solutions and issues on Business Flows and Processes so that they can be continuously optimized to maximum ROI
Data Content and Latency
• Monitoring Data must contain sufficient detail to determine root cause of technical problems wherever possible
• Monitoring Platforms must be able to provide detailed insight into our lowest latency and highest volume flows ahead of business demands and at sub 1ms granularity
Automation
• Monitoring Events must be able to correctly trigger transactional automated break-fix processes and dynamic capacity fulfilment
Standardization
• The monitoring platform data will factually direct the future technical strategy and standards
Some Detail:-Normalized Physical Architecture
Our implementation approach is to codify existing server inventory and classify servers into 5 application tiers
• Service/UI Façade
• Distribution
• General Purpose Compute
• Database
• Tx Gateway
Basic Application Dashboards
• Will give a uniform view of each application by functional tier
• Can then measure availability of each tier of each application in a consistent fashion
• Identify and triage weak architectures
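As an illustration of measuring availability per tier in a consistent fashion, the sketch below aggregates up/down health samples by application and tier and flags tiers that fall below a target. The sample format, application name and the use of 99.95% as the threshold are assumptions made for the example.

```python
from collections import defaultdict

# The five application tiers named on the slide.
TIERS = (
    "Service/UI Façade",
    "Distribution",
    "General Purpose Compute",
    "Database",
    "Tx Gateway",
)


def tier_availability(samples):
    """samples: iterable of (application, tier, is_up). Returns {(app, tier): ratio}."""
    up = defaultdict(int)
    total = defaultdict(int)
    for app, tier, is_up in samples:
        total[(app, tier)] += 1
        up[(app, tier)] += int(is_up)
    return {key: up[key] / total[key] for key in total}


# Hypothetical health-check samples for one application.
samples = [
    ("AppA", "Database", True),
    ("AppA", "Database", True),
    ("AppA", "Database", False),
    ("AppA", "Database", True),
    ("AppA", "Distribution", True),
    ("AppA", "Distribution", True),
]

result = tier_availability(samples)
weak = [key for key, ratio in result.items() if ratio < 0.9995]
print(weak)  # tiers to triage
```

Because every application is described by the same five tiers, the same calculation can be applied across the whole portfolio and the results compared directly.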
Process
Step
Application
UI or Service Presentation
Distribution Middleware
General Purpose Compute and Caching
Data Persistence
Tx Gateway or other Service Tier
Package
Module
Flow
Channel / Session
Transaction
Process Process
Process Model
Event model, transport and correlation – Logical view
Need to deal with Event Rates approaching Market Data Rates (>20K Events per second Globally)
System events now driven by business (market events)
Be mindful of scale growth in events and trading
Need to decide on most appropriate blend of technology
Cannot buy rulesets off the shelf
May be able to use multivariate analysis to determine significant correlations to validate rulesets
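As a very small illustration of using statistical analysis to validate rulesets, the sketch below computes pairwise Pearson correlation between per-interval event counts and reports strongly correlated pairs as candidates for review. It is a deliberately simple stand-in for the multivariate analysis mentioned above; the series names, sample counts and the 0.9 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def correlated_pairs(series, threshold=0.9):
    """Return name pairs whose absolute correlation meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(series), 2)
        if abs(pearson(series[a], series[b])) >= threshold
    ]


# Hypothetical event counts per time interval.
series = {
    "san_errors": [0, 1, 5, 9, 2],
    "db_timeouts": [1, 3, 11, 19, 5],
    "logins": [7, 7, 6, 7, 7],
}
pairs = correlated_pairs(series)
print(pairs)
```

A pair surfaced this way is only a candidate: correlation does not establish which event is the cause, so any resulting rule would still need review against the application topology.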
[Diagram: logical view of the event fabric. Repeated building blocks: Common Event Fabric Listener, Event Correlator, Event Fabric Publisher. Sources: Applications and Monitoring Tools, Shared Infrastructure (SAN etc), Device Event Probe, Tool Specific Adapter and Type Filter, Application Component Server. Stores and models: Process Model Database, CMDB, Device to Platform Mapping Engine, Basic Event Model, Common Application Tier Model, Business Process and Flow Model. Fabric management: Basic Event Fabric Distribution Model and Manager, Application Component Listener Fabric Loader/Manager, Flow and Process Listener Fabric Loader/Manager. Rules: Basic Event Correlation Rules, Application Component Correlation Rules, Business Flow and Process Correlation Rules. Outputs: Application Tier Component Dashboards, Flow and Process Dashboards.]
Further precedence and correlation
NB Correlation occurs at multiple layers
• Components within a physical or virtual server
• Architectural tiers of an application
• Components of a Business Process
• Components of an end to end Flow of connected processes
We correlate both technology events and business KPI metrics.
Multivariate analysis can be applied to these data flows to heuristically surface the key correlations.
Event model namespace
Transaction, Session, Module, Platform Data
Event Type
Host
Monitoring Platform
Initiating or Receiving Host
Event Data
Module Name
Process ID
Session Type
Session ID (Queue Name etc)
Transaction ID
May also be present
May also be present
May also be present
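The namespace above could be represented as a single record in which the platform-level fields are always present and the module, session and transaction fields are optional. The sketch below is one possible reading of the slide, not the programme's schema: the field names, types and the assignment of fields to levels are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MonitoringEvent:
    # Platform data: assumed to be present on every event
    event_type: str
    host: str
    monitoring_platform: str
    event_data: str
    peer_host: Optional[str] = None  # initiating or receiving host
    # Module level: may also be present
    module_name: Optional[str] = None
    process_id: Optional[int] = None
    # Session level: may also be present
    session_type: Optional[str] = None
    session_id: Optional[str] = None  # e.g. a queue name
    # Transaction level: may also be present
    transaction_id: Optional[str] = None

    def level(self) -> str:
        """Most specific namespace level populated on this event."""
        if self.transaction_id is not None:
            return "transaction"
        if self.session_id is not None:
            return "session"
        if self.module_name is not None:
            return "module"
        return "platform"


disk_event = MonitoringEvent("disk_error", "host1", "ITRS", "I/O error on volume")
queue_event = MonitoringEvent(
    "queue_depth", "host2", "Hawk", "depth=5000",
    module_name="router", process_id=4242,
    session_type="queue", session_id="Q.ORDERS",
)
print(disk_event.level(), queue_event.level())
```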
Monitoring event precedence model
There is a clear precedence to the classes of monitoring event.
Most monitoring tools are clustered at the software and session monitoring end of the spectrum
A physical error on a device or platform will produce a cascade of events for all the software modules that run on it and all the platforms that are connected to it.
The precedence hierarchy is used by the correlation engines distributed across the monitoring infrastructure to identify the most significant “root cause” event
The projected data rates mean that a significant amount of compute resource will be needed to perform the event correlation actions.
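A minimal sketch of how a correlation engine might use the precedence hierarchy: given a cascade of related events, pick the one whose class ranks highest. The ordering below follows the order of the event classes on the slide, from power down to transaction; the class identifiers and the event tuple format are assumptions for illustration.

```python
# Event classes in descending precedence (assumed from the slide's ordering).
PRECEDENCE = [
    "power",
    "external_connectivity",
    "denial_of_service",
    "capacity",
    "storage_failure",
    "switch_failure",
    "platform_failure",
    "device_failure",
    "package",
    "module",
    "session",
    "transaction",
]
RANK = {name: position for position, name in enumerate(PRECEDENCE)}


def root_cause(events):
    """events: list of (event_class, description).

    Returns the event with the highest precedence; on a tie the earliest
    event in the list wins, because min() keeps the first minimum it sees.
    """
    return min(events, key=lambda event: RANK[event[0]])


cascade = [
    ("transaction", "order 123 timed out"),
    ("session", "queue Q.ORDERS disconnected"),
    ("storage_failure", "array 7 offline"),
    ("module", "pricing module restarted"),
]
print(root_cause(cascade))
```

A real engine would first have to decide which events belong to the same cascade, for example by topology and time window; that grouping step is outside this sketch.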
Power
External Connectivity
Other Denial of Service (DDOS, Virus or Physical)
Capacity
Storage Failure
SAN/IP Switch Failure
Compute Platform (Server) Failure (Physical or Virtual)
Device Failure (Physical or Virtual)
Package
Module
Session
Transaction
ITRS/Hawk
Wily / TS-A
Optier
Tivoli/Netcool
Vendor platform
(Brocade, EMC etc)
Severity
Recovery
Latency
Other Data Elements (Reference etc)
Overall architecture layers + stores view
Infrastructure Asset Data Dashboards
Zone
Site
Region
Global IOCC
Minimum Active/Active
clustered pair per zone
Minimum Active/Active clustered
pair per Site
Consolidated Application <-> Hosts CMDB
Region Specific Process Models,KPIs
& HeuristicsSnake Wolf etc
Larger scale distributed pub/sub
fabric needed
Flow and Process
Zone Specific Dashboards
• Application Host Status: ITRS/Hawk
• JVM/CLR App Runtime Status: CA-Wily
• VLAN Traffic Analysis: TS-A
Site Specific Dashboards
• Application Structure Status: TBSM/RAD
• Application Transaction Flow: Optier
• Physical Capacity Metrics: Teamquest
• Application Host Status: ITRS, Hawk
• Network Hop Latency: TS-A
Region Specific Dashboards
• Business Process: Aris PPM/PEM, Systar
• Application Structure Status: TBSM/RAD
Global Dashboards
• Business Process: Aris PPM/PEM
• Tier 0 Application Structure Status: TBSM/RAD
Global / Divisional Process Models,KPIs &
Heuristics
Automated T3 Processes/Workflows
(Opalis iConclude, Autosys etc)
T3 Automated Process
Execution Platforms
Larger scale distributed pub/sub
fabric needed
Collection & Correlation Monitoring Data and Rules
Zone Topology Map
for site
Global Application Map and
Correlation Rules
Zone specific log and monitoring history
ITRS Hawk Wily TS-A
Regional Application Event consolidation & correlation
Global Flow and Process consolidation &
correlation
Site Application Event consolidation and correlation
Site Collection and Correlation Fabric Loader/Manager
Regional Application Map and Correlation Rules
Capacity Data
(TeamQuest)
Zone<->SiteRouting
Site <-> RegionRouting
Region<->IOCCRouting
Region Collection and Correlation Fabric Loader/
Manager
Global Collection and Correlation Fabric Loader/
Manager
MLAI MHS
Site Infrastructure Device Repositories (Brocade,VMWare
ESX etc)
Application Hosts to be Monitored
Dev QA Prod
V 0.4 Work in Progress (RDEB)
Zone Application Event consolidation and correlation
3rd Party Gateways
Global TCIC Database
Solution heat map
Commercial Tools
Bespoke Monitoring and Logging
Normalization
Concentration
Correlation
Zoning
Application Aggregation
Flow and Process Aggregation
• Base candidate probe technologies identified and initial PoCs completed
• Java and .NET logging libraries complete and integrable + C/C++ to be completed
• Need to have an accurate topology and metrics of event rates to define Zoning policy
• Outline Format and type schema defined - key issue is populating messaging format fully as applications transition to new logging libraries
• First level concentration processes will be distributed in zones. Resilience and Performance metrics TBD
• External discussions with potential solution partners and vendors have validated the approach
• Event precedence rules need more definition and will evolve over time. Candidate solution technologies exist. Scaling/Resilience requirements are dictated by Zoning and Normalization
• Need to solve the problem of automatic population of TBSM/RAD dashboards from MLAI/MHS inventory databases
• Candidate Technologies identified for mapping events to process models derived from IDS Scheer/ARIS
• In order to define the solution architecture a number of PoCs were carried out to provide point solutions and assess technology capabilities
• At this point we have established coverage of all the technical components necessary; however, we will need to carry out a broader end-to-end integration PoC
Summary
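[Diagram: the availability problem space, repeated as the summary view. Components: SDLC/QA, Application Portfolio Management, Monitor, Resource Management & Governance, ITIL, Service Management. Axes: Grow the Business / Run the Business; Technology / Business; Applications / Infrastructure.]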
Our Monitoring architecture surfaces the structural performance of an application in the context of the governing business process
Our SDLC surfaces the structural components of an application from its business requirements
The combination of performance and development metrics will allow us to optimize the resources we need to satisfy our business demand
The need to know
Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns - the ones we don't know we don't know
Donald Rumsfeld