Date posted: 16-Apr-2017
Category: Technology
Uploaded by: dataworks-summithadoop-summit
Big Data Application Architectures – Fraud Detection
Nishant Thacker
Technical Product Manager – Big Data
Microsoft
Agenda
• Define the problem
• Establish the expected outcome
• Dive into each pillar
• Determine a solution
• Understand the applicability
Financial institutions risk loss of charter and a host of other penalties through noncompliance with federal money laundering legislation.
Detect fraudulent activity, theft, and money laundering
Prescribe what to sell, when, where, and to whom
Reduce risk while complying with legal requirements
Prevent customers from leaving
Big Data Evolution
Legacy Systems, Current Systems, Big Data
Advanced Analytics
Timely Info Accurate Thoughtful
Opportunity $
Marketing Operations Bankers CEOs
Areas of Opportunity for Financial Analytics
Personalization of offers & banking experience
• Next Best Action
• Recommended Interventions
• Lifestyle Yield Management
• Seasonal Personal Impact
Fraud Detection
• Theft Profiling
• Fraudulent Transaction Identification
• Remote Shutdown
• Site Monitoring
• Recommended Interventions
Customer Churn Prevention
• Risky Customer Profiling
• Call Center Monitoring
• Churn Scoring
Risk Reduction & Compliance
• Payment System Errors
• Money Laundering Prevention
• Compliance
• Data Entry Intervention
Expected Outcome
• Rejected Transactions
• Real Time Alerts
• Real Time Dashboards
• Automated Learning and Improvement – Batch and Real Time
• Audit Trails and Analytics
Big Data Challenges
• Volume: It’s big.
• Veracity: It’s unverified.
• Variety: It’s different.
• Velocity: It’s fast.
Architectural Considerations
• Storage State: Cached, Distributed Cache, Distributed Storage
• Profile Storage: HDFS, HBase
• Ingestion Framework: Kafka, Sqoop, Event Hubs, IoT Hubs
• Stream Processing: Storm, Spark, Flink, Azure Stream Analytics
• Analytics: Batch, Interactive
• Machine Learning: Standalone, Scale out
Fraud Detection Reference Architecture
[Diagram] Tiers: Device Connectivity; Data Processing, Analytics and Management; Presentation & Business Connectivity
• Sources: Personal mobile devices, Thin Client, Trades and/or transactions, Apps data from devices, News and other alerts
• Ingestion: Gateway
• Stores: User Profile Information, User Recent Activity Store, Data Lake
• Processing: Stream Processors, Analytics & Machine Learning, Provisioning API (Pull)
• Presentation: App Backend, Solution UX, Business Integration Connectors and Gateway(s), Business systems
• Legend: Data Path, Optional solution component, Main solution component
Reference Architecture with Azure Services
Demo: Woodgrove Financial
User Profile and Metadata Stores
[Reference architecture diagram, repeated. Here the gateway is labeled (Kafka, IoT Hub, Event Hubs), and a Metadata Store is shown alongside User Profile Information and User Recent Activity Information.]
Device Identity, Registry and State Stores
Metadata Store
• Authority for all registered sources
• Stores identity information and authentication secrets
User Profile Information
• Indexed list of all Users and their demographics – Secure, Governed, Audit Controlled
• Contains discovery and reference data related to Users
• Can define a schema model or use a vertical industry standard schema for metadata
• Can contain structured metadata and links to externally stored operational data
User Recent Activity
• Contains operational data related to the Users’ most recent activities:
  - “Last known values” for each User
  - Aggregated or computed values
  - Stream of device data events containing geolocation and time-based tagging
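The split between slow-changing profile data and fast-changing recent activity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, field names, and the in-memory dictionary are hypothetical stand-ins, not the schema or storage engine used in the demo.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UserProfile:
    """Slow-changing reference data about a user (hypothetical schema)."""
    user_id: str
    home_country: str
    segment: str


@dataclass
class RecentActivity:
    """Operational data: last known values plus running aggregates."""
    last_location: Tuple[float, float] = (0.0, 0.0)   # last known value
    last_seen: float = 0.0                            # last known value (seconds)
    txn_count: int = 0                                # aggregated value
    total_amount: float = 0.0                         # aggregated value
    events: List[dict] = field(default_factory=list)  # geo/time tagged events


class RecentActivityStore:
    """In-memory stand-in for a low-latency store (assumption, for illustration)."""

    def __init__(self) -> None:
        self._by_user: Dict[str, RecentActivity] = {}

    def record(self, user_id: str, amount: float,
               lat: float, lon: float, ts: float) -> RecentActivity:
        # Update last known values and aggregates, then append the raw event.
        activity = self._by_user.setdefault(user_id, RecentActivity())
        activity.last_location = (lat, lon)
        activity.last_seen = ts
        activity.txn_count += 1
        activity.total_amount += amount
        activity.events.append({"amount": amount, "lat": lat, "lon": lon, "ts": ts})
        return activity

    def get(self, user_id: str) -> RecentActivity:
        # Unknown users get an empty activity record rather than an error.
        return self._by_user.get(user_id, RecentActivity())


profiles = {"u1": UserProfile("u1", home_country="US", segment="retail")}
store = RecentActivityStore()
store.record("u1", 25.0, 47.6, -122.3, ts=1000.0)
latest = store.record("u1", 75.0, 40.7, -74.0, ts=1600.0)
```

Keeping the two records separate lets a stream processor read the small, frequently updated activity record on every event while treating the profile as reference data.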
Stream Processors
Stream Processing: Data Flow
After ingress through the Gateway (Ingestion), the flow of data through the system is facilitated by data pumps and analytics tasks.
Data flow can be driven by:
• Apache Storm on Azure HDInsight
• Apache Spark on Azure HDInsight
• Azure Stream Analytics
• Custom Event Processors
Each can perform tasks in flight:
• Data aggregation
• Data enrichment
• Complex event processing
… and can output data to:
• Azure Data Lake
• Azure Blobs/Tables
• HDInsight / HBase
• Azure SQL DB
• Time Series Databases
• Event Hub
• Service Bus Queues
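The three in-flight tasks named on this slide (aggregation, enrichment, complex event processing) can be illustrated with plain Python generators. This is a minimal sketch, not Storm, Spark, or Stream Analytics code: the event fields, the `HOME_COUNTRY` lookup table, and the "home then abroad" pattern are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical reference data used for enrichment.
HOME_COUNTRY = {"u1": "US", "u2": "GB"}


def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Data enrichment: attach reference data to each event in flight."""
    for event in events:
        home = HOME_COUNTRY.get(event["user"], "unknown")
        yield {**event, "home_country": home, "abroad": event["country"] != home}


def aggregate(events: Iterable[dict], window_s: int = 60) -> Dict[Tuple[str, int], float]:
    """Data aggregation: total amount per user per tumbling window."""
    totals: Dict[Tuple[str, int], float] = defaultdict(float)
    for event in events:
        window_start = int(event["ts"] // window_s) * window_s
        totals[(event["user"], window_start)] += event["amount"]
    return dict(totals)


def detect_rapid_abroad(events: Iterable[dict], max_gap_s: int = 120) -> List[str]:
    """Complex event processing: a home-country transaction followed
    within max_gap_s seconds by a transaction abroad."""
    last_home_ts: Dict[str, float] = {}
    flagged: List[str] = []
    for event in events:
        user = event["user"]
        if not event["abroad"]:
            last_home_ts[user] = event["ts"]
        elif user in last_home_ts and event["ts"] - last_home_ts[user] <= max_gap_s:
            flagged.append(user)
    return flagged


raw = [
    {"user": "u1", "amount": 20.0, "country": "US", "ts": 5},
    {"user": "u1", "amount": 30.0, "country": "US", "ts": 50},
    {"user": "u1", "amount": 900.0, "country": "BR", "ts": 95},
    {"user": "u2", "amount": 10.0, "country": "GB", "ts": 70},
]
enriched = list(enrich(raw))
totals = aggregate(enriched)
flagged = detect_rapid_abroad(enriched)
```

In a real deployment each function would be a stage in the chosen stream engine, and the results would be written to one of the output sinks listed above.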
Stream Processor Examples
[Diagram] Processors shown: Device Metadata Processor, Device State Processor, Notification Processor, Raw Telemetry Processor, Rules Processor, Stream Transformation Processor, Secondary Stream Processor
Stores and endpoints shown: Queue, Device Registry Store, Device State Store, Data Lake, Event Hub, App Backend
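One way to read the processor examples is as a fan-out: every incoming event is handed to several independent processors, each with its own sink. The sketch below assumes hypothetical rules and uses in-memory lists, dicts, and a deque in place of the data lake, state store, and queue.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Event = Dict[str, object]
Rule = Callable[[Event], bool]

# Hypothetical rules; real thresholds would come from configuration or a model.
RULES: Dict[str, Rule] = {
    "large_amount": lambda e: float(e["amount"]) > 5000,
    "odd_hour": lambda e: int(e["hour"]) < 5,
}

notification_queue: Deque[dict] = deque()   # stands in for a message queue
state_store: Dict[str, Event] = {}          # stands in for the state store
raw_archive: List[Event] = []               # stands in for the data lake


def raw_telemetry_processor(event: Event) -> None:
    """Archive every event unchanged."""
    raw_archive.append(event)


def state_processor(event: Event) -> None:
    """Keep only the latest event per user."""
    state_store[str(event["user"])] = event


def rules_processor(event: Event) -> None:
    """Evaluate each rule and queue a notification if any rule fires."""
    hits = [name for name, rule in RULES.items() if rule(event)]
    if hits:
        notification_queue.append({"user": event["user"], "rules": hits})


PROCESSORS = [raw_telemetry_processor, state_processor, rules_processor]


def dispatch(event: Event) -> None:
    """Fan the event out to every registered processor."""
    for processor in PROCESSORS:
        processor(event)


for ev in [
    {"user": "u1", "amount": 120.0, "hour": 14},
    {"user": "u2", "amount": 9800.0, "hour": 3},
]:
    dispatch(ev)
```

Because the processors do not depend on each other, each one could be scaled or replaced separately, which matches the optional and main component distinction in the diagram legend.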
App Backend
High-Scale Compute Models
Scale-appropriate compute models
• Actor Frameworks / Service Fabric Reliable Actors: distributed compute fabric hosting device actors.
• Service Fabric Reliable Collections: highly available with replicated and local state management.
• Azure Batch: job scheduling and compute management for highly parallelizable compute workloads.
Simple programming logic in vastly scalable compute nodes
Data Analytics
Data Analytics
Event Hub
NRT Events
Stream Processing (ASA, Storm or Spark)
Alerts
Batch Events
Fetching & Updating Reference Data
Interceptor (Rules)
Spark
Hive/Pig
U-SQL
Azure Data Lake Store
Azure Data Lake Analytics
SQL DB
ML
Reports and Dashboards
Real Time Scoring
Training ML Models
Relational Data
Data Analytics
• Real-Time Analysis: Aggregation/Reduction, Temporal Queries, State Correlation, Threshold Detection, Alerting
• Data-At-Rest Analysis: Time-Series, Map/Reduce, Correlation
• Machine Learning: Pattern Detection, Behavior Prediction, Plausibility Analysis, Anomaly and Fraud Detection
Power BI
HDInsight
Stream Analytics
Data Factory
Machine Learning
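Real-time analysis combines a temporal query with threshold detection. A minimal sketch, assuming a hypothetical velocity rule (more than `limit` transactions per card within a sliding window) and timestamps that arrive in order:

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class VelocityMonitor:
    """Alerts when a card makes more than `limit` transactions
    within `window_s` seconds (illustrative rule, not from the slides)."""

    def __init__(self, window_s: float = 60.0, limit: int = 3) -> None:
        self.window_s = window_s
        self.limit = limit
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def observe(self, card: str, ts: float) -> bool:
        times = self._recent[card]
        times.append(ts)
        # Temporal query: keep only timestamps inside the sliding window.
        while times and ts - times[0] > self.window_s:
            times.popleft()
        # Threshold detection on the aggregated count.
        return len(times) > self.limit


monitor = VelocityMonitor(window_s=60.0, limit=3)
stream: List[Tuple[str, float]] = [
    ("c1", 0), ("c1", 10), ("c1", 20), ("c2", 25), ("c1", 30), ("c1", 200),
]
alerts = [(card, ts) for card, ts in stream if monitor.observe(card, ts)]
```

Here only the fourth transaction on card `c1` inside one minute raises an alert; the later transaction at `ts=200` falls outside the window and starts a fresh count.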
Presentation and Business Connectivity
WebHDFS
YARN
U-SQL
Analytics Service, HDInsight (managed Hadoop clusters)
Analytics
Store
Azure Data Lake
Cortana Intelligence Suite
• Data Sources: Apps, Sensors and devices, Data
• Information Management: Event Hubs, Data Catalog, Data Factory
• Big Data Stores: SQL Data Warehouse, Data Lake Store
• Machine Learning and Analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning
• Intelligence: Cortana, Bot Framework, Cognitive Services
• Dashboards & Visualizations: Power BI
• Action: People, Automated Systems, Apps (Web, Mobile, Bots)
Money Laundering Prevention
Fraud Detection
Stages: Placement, Layering, Integration
Process
• Know your Customer
• Transaction Monitoring
• Pattern Detection
Machine Learning
• Decision Tree
• Classification
• Cluster Analysis
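Transaction monitoring for the placement stage is often explained with "structuring": several deposits kept just under a reporting threshold. The sketch below is a rule-based illustration only. The threshold, the 10% band, and the minimum count are assumed values chosen for the example, not figures from the talk or from any regulation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

REPORTING_THRESHOLD = 10_000.0   # illustrative value, not taken from the slides
NEAR_FRACTION = 0.9              # "just under" means within 10% of the threshold
MIN_DEPOSITS = 3                 # near-threshold deposits per day that raise a flag


def flag_structuring(deposits: List[dict]) -> List[str]:
    """Return accounts with several same-day deposits just under the threshold."""
    near_counts: Dict[Tuple[str, int], int] = defaultdict(int)
    lower_bound = NEAR_FRACTION * REPORTING_THRESHOLD
    for d in deposits:
        if lower_bound <= d["amount"] < REPORTING_THRESHOLD:
            near_counts[(d["account"], d["day"])] += 1
    flagged = {account for (account, _day), n in near_counts.items()
               if n >= MIN_DEPOSITS}
    return sorted(flagged)


deposits = [
    {"account": "A", "day": 1, "amount": 9500.0},
    {"account": "A", "day": 1, "amount": 9900.0},
    {"account": "A", "day": 1, "amount": 9200.0},
    {"account": "B", "day": 1, "amount": 12000.0},
    {"account": "C", "day": 1, "amount": 9800.0},
]
suspicious = flag_structuring(deposits)
```

A rule like this would typically feed the pattern detection and machine learning steps as one signal among many rather than act as a final decision.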
Anti-Money Laundering
Cloud: Visualization, Machine Learning & Analytics, Financial Data, Information Management
Power BI: Fund monitoring dashboard
Power BI / Azure Website
Azure Services
Big Data Storage for Multiple Sources: HDInsight, Azure Data Lake, Azure Data Warehouse, SQL Azure
Azure Machine Learning
Financial Data: SQL
Real-time fraud detection feedback
Information Services: HDInsight, Streaming Analytics
Data Science Modeling
Multi-Class Logistic Regression
• Similar to linear regression
• Weights independent variables
• Useful with categorical independent variable
• Offers coefficients to inform management decision-making
• Very useful with internal analytical teams to interpret data
• Useful for diagnosing gaps in data and customer outreach
• Helps drive understanding of demand drivers
Multi-Class Decision Forest or Jungle
• Uses decision trees & votes
• Forest
  - Compares results between various outcomes
  - Votes upon outcomes
  - Evaluates based upon a series of logical questions or “forest”
• Jungle
  - Useful when a forest produces too many logical branches
Multi-Class Neural Network
• Produces a series of weighted edges and nodes
• Trained on input data
• Useful for complex tasks, like speech recognition, when allowed to train in depth
• Very good with complex interactions
• Enables retailers to better identify behaviour patterns & certain shopping activities
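The "decision trees & votes" idea behind a multi-class decision forest can be shown with three hand-written trees and a majority vote. This is a toy sketch: the features, split points, and class labels are hypothetical, and a real forest (for example in Azure Machine Learning) would learn its trees from training data.

```python
from collections import Counter
from typing import Callable, Dict, List

Features = Dict[str, float]
Tree = Callable[[Features], str]


# Three hand-written "trees"; a trained forest would learn these splits.
def tree_amount(x: Features) -> str:
    if x["amount"] > 5000:
        return "fraud" if x["abroad"] else "review"
    return "legit"


def tree_velocity(x: Features) -> str:
    if x["txns_last_hour"] > 10:
        return "fraud"
    return "review" if x["txns_last_hour"] > 4 else "legit"


def tree_account_age(x: Features) -> str:
    if x["account_age_days"] < 30:
        return "review" if x["amount"] < 1000 else "fraud"
    return "legit"


FOREST: List[Tree] = [tree_amount, tree_velocity, tree_account_age]


def predict(x: Features) -> str:
    """Each tree answers its own series of questions and votes for a class;
    the most common vote wins (ties go to the class voted first)."""
    votes = Counter(tree(x) for tree in FOREST)
    return votes.most_common(1)[0][0]


risky = {"amount": 8000, "abroad": 1, "txns_last_hour": 12, "account_age_days": 400}
routine = {"amount": 40, "abroad": 0, "txns_last_hour": 1, "account_age_days": 900}
risky_label = predict(risky)
routine_label = predict(routine)
```

For the `risky` sample two of the three trees vote "fraud", so the forest returns "fraud" even though the account-age tree disagrees.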
Reference Architecture & Azure Services
Q&A
@nishantthacker
© 2016 Microsoft Corporation. All rights reserved.