Compliance & Risk Congress 2020 - InterWorks...o SnapLogic, MuleSoft, Tibco o Apache Spark for batch...

Date post: 22-May-2020
Proprietary Compliance & Risk Congress 2020 www.interworks.com.mk
Technology is developing exponentially and organizations are still stuck in linear thinking!

Daily Doses Of Fake NEWS!!!

Did anything change? No, we are still trying to make timely decisions which impact the organization as a whole!

The devaluation of information and the impact over time!

What is Artificial Intelligence?

The theory and development of computer systems able to perform tasks that normally require human intelligence!

But actually AI is the sum of its parts!

AI can transform how banks perform AML, KYC, compliance

▪ AML & KYC

o Mine huge volumes of data for risk-relevant facts

o Simplify the process of identifying higher-risk clients

▪ Repetitive tasks

o Save valuable time and focus resources on higher client-value tasks

▪ NLP & ML

o Leapfrog automation across large parts of the client life cycle management

o Intelligent document scanning enhances the KYC process for new client onboarding

How do you put this in practice?

Collect Your Data: Extract, Transform, Load

▪ Data Types:

o Structured (databases)

o Semi-structured (JSON, CSV etc.)

o Unstructured (server logs, audit logs, email messages, audio call recordings)

▪ Final destinations for consolidation:

o Data Warehouse

o Elasticsearch

o Data Lake: Hadoop HDFS, Amazon S3

▪ Select your source data, filter, transform, enrich and save it into the final destination

▪ Collection and ETL Tools and Platforms:

o SnapLogic, MuleSoft, Tibco

o Apache Spark for batch and stream processing

o Beats, Logstash, Fluentd, Apache Flume as data collectors

o Kafka, Amazon Kinesis for real time data stream processing

o Amazon Kinesis Firehose for collection and storage into the Cloud (Amazon S3, Elasticsearch, Redshift, Splunk)

Collect Your Data: Extract, Transform, Load – cont.

Examples:

▪ SnapLogic periodically polls the source database, picks up the changed records, transforms them and stores them in a Data Warehouse (Redshift, Snowflake, SQL Server Analysis Services etc.)

▪ The Kafka Debezium Connector continuously reads database transaction logs (Change Data Capture, i.e. CDC) and sends events to the configured Kafka Topic for further processing, alerting or storage.
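The polling approach in the first example can be sketched in a few lines. This is a minimal illustration only, not how SnapLogic works internally: it assumes a hypothetical `customers` source table with an `updated_at` column used as a watermark, a hypothetical `dim_customer` warehouse table, and uses in-memory SQLite in place of real systems.

```python
import sqlite3

def fetch_changed_rows(conn, last_seen):
    """Return rows modified after the watermark, oldest first."""
    cur = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    )
    return cur.fetchall()

def transform(row):
    """Toy transformation step: normalize the customer name."""
    row_id, name, updated_at = row
    return (row_id, name.strip().upper(), updated_at)

def poll_once(source, warehouse, last_seen):
    """One polling cycle: extract changes, transform, load, advance the watermark."""
    rows = fetch_changed_rows(source, last_seen)
    for row in rows:
        warehouse.execute(
            "INSERT OR REPLACE INTO dim_customer (id, name, updated_at) "
            "VALUES (?, ?, ?)",
            transform(row),
        )
    warehouse.commit()
    # The newest updated_at seen becomes the watermark for the next cycle.
    return rows[-1][2] if rows else last_seen

def demo():
    """Run two polling cycles against hypothetical in-memory tables."""
    source = sqlite3.connect(":memory:")
    warehouse = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
    )
    warehouse.execute(
        "CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
    )
    source.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, " alice ", 100), (2, "bob", 110)],
    )
    watermark = poll_once(source, warehouse, 0)
    # A later change in the source is picked up by the next cycle only.
    source.execute("UPDATE customers SET name = 'bobby', updated_at = 120 WHERE id = 2")
    watermark = poll_once(source, warehouse, watermark)
    loaded = warehouse.execute(
        "SELECT id, name, updated_at FROM dim_customer ORDER BY id"
    ).fetchall()
    return watermark, loaded
```

Polling like this misses deletes and intermediate states between two cycles, which is one reason log-based CDC is often preferred when every change matters.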

Security Risks and Compliance

Be aware:

• Where your data is stored across the enterprise

• How that data has been classified, and whether it contains Personally Identifiable Information (PII) etc.

• Who has access to data and systems

• When that data has been created or changed

• Are there unusual system access patterns, suspicious IP addresses, anomalies and outliers?

Example 1: Continuously monitor your file servers, detect file changes and evaluate the file content with each change. If the document has been automatically classified as sensitive and its location is not appropriate, the system shall raise an alert.

Open Source Solution Architecture

• The Elastic Auditbeat File Integrity module monitors the files

• File metadata is streamed into Logstash

• The Logstash destination evaluates and classifies the file as Sensitive/Not Sensitive content using a previously trained Machine Learning model (fastText), spaCy NLP, Rule Engines and/or Regular Expressions
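The Regular Expressions option can be sketched as follows. This is a rough illustration under stated assumptions, not the presented system: the patterns, the `ALLOWED_PREFIXES` policy and the function names are all hypothetical, and a production classifier would combine such rules with the trained models listed above.

```python
import re

# Hypothetical patterns; real deployments need locale-specific rules.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

# Hypothetical policy: the only locations where sensitive files may live.
ALLOWED_PREFIXES = ("/secure/",)

def classify(text):
    """Return the sorted names of all patterns found in the text."""
    return sorted(
        name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)
    )

def evaluate_file_event(path, text):
    """Classify a changed file and decide whether its location warrants an alert."""
    matches = classify(text)
    sensitive = bool(matches)
    misplaced = sensitive and not path.startswith(ALLOWED_PREFIXES)
    return {"path": path, "sensitive": sensitive, "matches": matches, "alert": misplaced}
```

For example, `evaluate_file_event("/public/share/report.txt", "SSN 123-45-6789")` marks the file as sensitive and sets `alert` because the path is outside the allowed prefixes.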

Cloud Solution

Amazon Macie (https://aws.amazon.com/macie/) can automatically detect and classify documents stored in S3 buckets by using Machine Learning. This security service automatically detects PII (optimized for the English language only at the moment)

Security Risks and Compliance – cont.

Example 2: Continuously monitor your Web Server traffic logs for unusual access patterns. The system shall be able to continuously learn the traffic patterns and raise an alert if some access event (user id, resource, remote IP, time) seems to be unusual.

• Elastic Filebeat sends Web server log lines to Logstash

• Logstash filters the financial transaction logs

• The Logstash destination evaluates the pairs <username, ip-address> and classifies them as Usual/Unusual. The pairs are evaluated against a Deep Learning model created with the Amazon SageMaker IP Insights algorithm
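To make the Usual/Unusual decision concrete, here is a deliberately simple frequency baseline. It is not the IP Insights algorithm (which learns embeddings for users and addresses); the class name, the `min_count` threshold and the /24 grouping are assumptions made for illustration.

```python
from collections import Counter

class PairBaseline:
    """Counts how often each (user, network) pair was seen in past traffic."""

    def __init__(self, min_count=2):
        # Hypothetical threshold: pairs seen fewer times are treated as unusual.
        self.min_count = min_count
        self.counts = Counter()

    @staticmethod
    def _key(user, ip):
        # Generalize an IPv4 address to its /24 network so that nearby
        # addresses count as the same location.
        return (user, ".".join(ip.split(".")[:3]))

    def learn(self, user, ip):
        """Update the baseline with one observed access event."""
        self.counts[self._key(user, ip)] += 1

    def classify(self, user, ip):
        """Label a new access event against the learned baseline."""
        seen = self.counts[self._key(user, ip)]
        return "Usual" if seen >= self.min_count else "Unusual"
```

A counting baseline cannot generalize to networks it has never seen, which is the gap a learned model is meant to close.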

Predict Your Customer Behavior

Combine your Customer’s:

• Basic info: age, education, marital status, job … and

• Interaction outcomes for that customer: previous campaigns, number of calls, call channels, products used …

Train a Machine Learning algorithm on these features to gain insights into the customer’s behavior: predict deposits, customer churn, recommend products etc.

Examples:

• Train an ML model (Logistic Regression, XGBoost etc.) that will predict whether the customer will place a deposit

• Collaborative Filtering for product recommendations:

o The bank is promoting several products to its Customers. Each Customer has recently engaged with zero, one or more of those products (Customer_i, Product_j);

o Based on the existing Customer/Product combinations, build an ML model (Alternating Least Squares, Factorization Machines) that is able to predict the likelihood of Customer_i using Product_j.

Product 1 Product 2 Product 3 Product 4

Customer 1 x x x?

Customer 2 x x? x? x

Customer 3 x? x x
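The idea of filling in the unknown cells can be illustrated with a much simpler technique than Alternating Least Squares or Factorization Machines: item-based collaborative filtering with cosine similarity over binary engagement data. The function names and the data layout (a dict of customer to set of products) are assumptions for this sketch.

```python
from math import sqrt

def item_similarity(interactions, a, b):
    """Cosine similarity between two products over binary customer engagement."""
    users_a = {c for c, items in interactions.items() if a in items}
    users_b = {c for c, items in interactions.items() if b in items}
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / sqrt(len(users_a) * len(users_b))

def score(interactions, customer, product):
    """Average similarity between a candidate product and the customer's products."""
    owned = interactions[customer]
    if not owned:
        return 0.0
    return sum(item_similarity(interactions, product, p) for p in owned) / len(owned)

def recommend(interactions, customer, catalog):
    """Rank products the customer has not engaged with yet, best first."""
    candidates = [p for p in catalog if p not in interactions[customer]]
    return sorted(
        candidates,
        key=lambda p: score(interactions, customer, p),
        reverse=True,
    )
```

With hypothetical data such as `{"c1": {"P1", "P2"}, "c2": {"P1", "P2", "P3"}, "c3": {"P2", "P3"}, "c4": {"P4"}}`, customer `c1` is offered `P3` before `P4`, because `P3` is used by customers with overlapping products.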

Fraud Transaction Detection

Financial transactions can be described with several features related to the Customer’s:

• Basic info: age, education, job, join date, zip code…. and

• Transaction related features: transaction amount, transaction timestamp, currency, customer present etc.

This information is usually stored as hierarchical parent-child one-to-many structures.

The Machine Learning process can benefit significantly from additional per-transaction features created by aggregating historical transaction information, like:

• Average transaction amount for a particular card

• Average time between subsequent transactions for a particular card

• Average transaction hour for a particular card etc.
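The three aggregates listed above can be computed by hand as in the sketch below. The record layout (dicts with `card`, `amount` and `timestamp` keys) and the feature names are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def card_features(transactions):
    """Aggregate historical transactions into per-card features."""
    by_card = defaultdict(list)
    for tx in transactions:
        by_card[tx["card"]].append(tx)

    features = {}
    for card, txs in by_card.items():
        times = sorted(t["timestamp"] for t in txs)
        # Seconds elapsed between each pair of subsequent transactions.
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        features[card] = {
            "mean_amount": mean(t["amount"] for t in txs),
            # Undefined for cards with a single transaction.
            "mean_seconds_between": mean(gaps) if gaps else None,
            # Naive average: ignores that hour 23 and hour 0 are adjacent.
            "mean_hour": mean(t.hour for t in times),
        }
    return features
```

Each transaction can then be enriched with the features of its card before it is passed to the classifier.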

Fraud Transaction Detection – cont.

The augmented transaction feature set can be generated with Automatic Feature Engineering based on Deep Feature Synthesis:

• Deep Feature Synthesis: Towards Automating Data Science Endeavors: https://dai.lids.mit.edu/wp-content/uploads/2017/10/DSAA_DSM_2015.pdf

• Solving the false positives problem in fraud prediction using automated feature engineering: https://dai.lids.mit.edu/wp-content/uploads/2018/07/bbva_ecml.pdf

Manually generating additional transaction features is a tedious process!

There are hundreds of possible features to be extracted!

[Figure: a table of Existing Transaction Features (columns a, b, c, d; rows Tx_1, Tx_2, Tx_3) extended with Auto Generated Features: Mean(a), Std(a), Mean(b), Std(b), Mode(c), Hour(d), Wday(d), …]

Fraud Transaction Detection – cont.

Example:

• Use Featuretools (https://github.com/FeatureLabs/featuretools) to generate a feature-rich ML training set based on the customer’s historical purchase information. Besides the existing features like customer age, education, transaction amount and transaction timestamp, each credit card transaction is enhanced with additional features like:

o card.MEAN(transaction.amount)

o card.STD(transaction.amount)

o card.AVG_TIME_BETWEEN(transaction.timestamp) etc.

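import featuretools as ft

# es is the EntitySet that links customers, cards and transactions.
# Featuretools >= 1.0 renames target_entity to target_dataframe_name.
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_entity="transaction",
    agg_primitives=["mean", "std", "avg_time_between"],
    trans_primitives=["day", "is_weekend"],
)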

• Now we have a rich transaction feature set that takes into account many factors that matter in the classification process. Build a Classification ML Model (XGBoost, Random Forest) that will predict whether a transaction is fraudulent.

• Stream new transaction-related events in real time (Kafka, AWS Kinesis), evaluate them against the Machine Learning model to determine whether they are fraudulent, and raise an alert if needed.

Fraud Rings Detection

• A Fraud ring is a group of two or more people that share common legitimate contact information like Phone, Address, SSN etc.

• These people create bank accounts using synthetic identities

• They behave normally at the beginning

• Normal behavior leads to unsecured credit card lines, personal loans etc.

• After a while, ring members coordinate their activities and max out all of their credit lines

• The ring disappears after that

To detect possible Fraud Rings early, we have to analyze the relationships between the involved entities: accounts, contact information, transactions…

Traditional Relational Databases are NOT a good fit for this kind of analysis, which must support efficient and timely detection of complex relationships or patterns.

Instead…

Graph Databases like Neo4j can help us find these relationships, visualize graphs and detect the fraud rings along with the possible financial impact!

Note: Anti-Money Laundering investigation demo available at:

https://www.youtube.com/watch?v=J7BNKV2Lqy0

Fraud Rings Detection – cont.

Neo4j case study for Fraud Ring Detection:

https://github.com/neo4j-contrib/gists/blob/master/other/BankFraudDetection.adoc

MATCH (accountHolder:AccountHolder)-[]->(contactInformation)
WITH contactInformation, count(accountHolder) AS RingSize
MATCH (contactInformation)<-[]-(accountHolder),
      (accountHolder)-[r:HAS_CREDITCARD|HAS_UNSECUREDLOAN]->(unsecuredAccount)
WITH collect(DISTINCT accountHolder.UniqueId) AS AccountHolders,
     [ . . . ]
RETURN AccountHolders AS FraudRing,
       labels(contactInformation) AS ContactType,
       RingSize,
       round(FinancialRisk) AS FinancialRisk
ORDER BY FinancialRisk DESC

Fraud Ring                                  Contact Type       Financial Risk

["MattSmith","JaneAppleseed","JohnDoe"]     ["Address"]        34387.0

["MattSmith","JaneAppleseed"]               ["SSN"]            29387.0

["JaneAppleseed","JohnDoe"]                 ["PhoneNumber"]    18046.0

Cypher Query for Link Entity Analysis

Fraud Rings Detection – cont.

The previous analysis defines each Fraud Ring based on a single shared piece of contact information. But Matt shares something with Jane, Jane with John etc., implying transitive relationships among them.

Neo4j can help us find the complete ring as a connected component in the graph by using the Weakly Connected Components (Union/Find) algorithm!

CALL algo.unionFind.stream(
  'MATCH (n) WHERE n:AccountHolder OR n:SSN OR n:PhoneNumber OR n:Address RETURN id(n) AS id',
  'MATCH (n)-[r:HAS_ADDRESS|HAS_SSN|HAS_PHONENUMBER]->(m) RETURN id(n) AS source, id(m) AS target',
  {graph: "cypher"})
YIELD nodeId, setId
WITH algo.asNode(nodeId) AS n, setId AS ringId
MATCH (n:AccountHolder)
WITH n AS accountHolder, ringId
WITH ringId, collect(DISTINCT accountHolder) AS ring, count(accountHolder) AS ringSize
WHERE ringSize > 1
UNWIND ring AS accountHolder
MATCH (accountHolder)-[r:HAS_CREDITCARD|HAS_UNSECUREDLOAN]->(unsecuredAccount)
[….]

Fraud Ring                                     Financial Risk

["AliceJohnson", "BobTrudy"]                   300000.0

["JohnDoe", "JaneAppleseed", "MattSmith"]      34387.0
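The Union/Find idea behind the connected-components step can be sketched outside the database. This is a plain Python illustration, not the Neo4j implementation; the `DisjointSet` class, the `find_rings` helper and the contact identifiers in the usage note are hypothetical.

```python
class DisjointSet:
    """Minimal union-find structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative of the set that contains x."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent while walking up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets that contain a and b."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

def find_rings(links):
    """Group account holders that are connected through shared contact details.

    links is an iterable of (account_holder, contact_info) pairs.
    """
    ds = DisjointSet()
    holders = set()
    for holder, contact in links:
        holders.add(holder)
        # Tag nodes by type so a holder name never collides with a contact value.
        ds.union(("holder", holder), ("contact", contact))

    groups = {}
    for holder in holders:
        groups.setdefault(ds.find(("holder", holder)), set()).add(holder)
    # Keep only components with more than one account holder.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)
```

If Matt and Jane share an SSN and Jane and John share a phone number, `find_rings` places all three in one ring even though Matt and John share nothing directly.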

Real time ATM Fraud Detection

• Streaming systems like Kafka or Amazon Kinesis can help us identify suspicious events based on their temporal (when) and/or geospatial (where) attributes

• Streaming systems implement Continuous Query, SQL-like functionality

Kafka ksqlDB Example (https://www.confluent.io/blog/atm-fraud-detection-apache-kafka-ksql/):

• ATM transaction related information is streamed into Kafka Topic

• Kafka ksqlDB analyses the transaction events (KSQL, i.e. streaming SQL) in real time and identifies transactions from the same account that occur within a relatively short time interval but at ATM locations separated by a large geographical distance

• Alert if there are such suspicious transactions.

SELECT
  […]
  (T2.ROWTIME - T1.ROWTIME) AS MILLISECONDS_DIFFERENCE,
  GEO_DISTANCE(T1.location->lat, T1.location->lon,
               T2.location->lat, T2.location->lon, 'KM') AS DISTANCE_BETWEEN_TXN_KM,
  […]
FROM
  […]
  WITHIN (0 MINUTES, 10 MINUTES)
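The same check can be sketched in plain Python to show the logic outside a streaming engine. The record layout, the `min_distance_km` threshold and the function names are assumptions; unlike the stream-stream join in the query, this simplified version compares only consecutive transactions of the same account.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))

def suspicious_pairs(transactions, window_ms=10 * 60 * 1000, min_distance_km=50.0):
    """Flag consecutive transactions of one account that are close in time but far apart."""
    last_by_account = {}
    alerts = []
    for tx in sorted(transactions, key=lambda t: t["ts_ms"]):
        prev = last_by_account.get(tx["account"])
        if prev is not None:
            elapsed_ms = tx["ts_ms"] - prev["ts_ms"]
            distance = haversine_km(prev["lat"], prev["lon"], tx["lat"], tx["lon"])
            if elapsed_ms <= window_ms and distance >= min_distance_km:
                alerts.append((prev["id"], tx["id"], elapsed_ms, round(distance, 1)))
        # Remember the latest transaction seen for this account.
        last_by_account[tx["account"]] = tx
    return alerts
```

Each alert carries the two transaction ids, the time difference in milliseconds and the distance in kilometres, mirroring the columns selected in the query.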

What’s Next?

Deep Learning based products that deal with text, images, voice, language translation etc. have been created in the past several years. These products can be combined into interesting architectures that bring value to the business.

“Exotic” Example: Detect the Customer’s sentiment from the Call Center audio recordings

• Amazon Transcribe (Dutch supported): Convert the Call Center’s audio recordings into text

• Amazon Translate: Translate the Dutch text into English (Dutch not supported in Amazon Comprehend yet)

• Amazon Comprehend: Analyze the sentiment of the text

• If there were many negative sentiments recently, take action and manage the Customer Churn Risk!

Summary

• Define your Business Goals and Values

• Define your Security Compliance Requirements and Risks

• Be aware of your data scattered across the Enterprise

• Create Data Catalogs

• Monitor the access to your data

• Normalize and transform your data for further BI or ML tasks

• Store your data for analysis in alternative database engines like Graph Databases. Relational databases are not the center of the universe anymore!

• Stream your data, perform real-time analysis and take immediate actions, don’t wait!

• Find the hidden patterns in your data using Machine Learning and make informed decisions!

• Combine the AI and ML technologies and products available on the market into brand new functionalities that will help your business automate many tasks!

Data Management, Mining and Analysis is not a project.

IT IS A PROCESS!

