Machine Learning Will Shape Threat Management in Cybersecurity

Francisco J. Gomez Elizabeth Gonzalez

Logtrust Inc.

Roadmap

• Definitions • Learning • Comparison between approaches • The data • Types of learning

Machine Learning - Threats - Cybersecurity

NCA: Cyber Crime Assessment 2016

“The ONS estimated that there were 2.46 million cyber incidents and 2.11 million victims of cyber crime in the UK in 2015.”

Office for National Statistics

Source: http://www.nationalcrimeagency.gov.uk/publications/709-cyber-crime-assessment-2016/file

Too many

Virginia Aguilar, Head of the NATO Computer Incident Response Capability Coordination Centre (NCIRC CC)

Where is the ML?

http://www.youtube.com/watch?v=Ndeb1pMAsh4

Where is the ML?

“In an incredibly complex operation, machine learning in online advertising can help determine the optimal amount to bid on every single impression.”

Source: https://www.refuel4.com/blog/2016/07/21/machine-learning-advertisings-next-big-thing/

Face(Recognition)Book

Definitions

IA, Machine Learning, Deep Learning

Timeline

ARTIFICIAL INTELLIGENCE

MACHINE LEARNING

DEEP LEARNING

Machine Learning

Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

expertsystem.com

The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience.

Tom Mitchell

Machine learning studies computer algorithms for learning to do stuff. We might, for instance, be interested in learning to complete a task, or to make accurate predictions, or to behave intelligently. [...]

The emphasis of machine learning is on automatic methods. In other words, the goal is to devise learning algorithms that do the learning automatically without human intervention or assistance.

Robert Schapire

Machine Learning is the way we are going to automate your automation.

Chris Wright, Red Hat CTO

Manual Learning

2x + 3y = 24
4z + 7i = 36

Rule 1 ... Rule n

Dataset

Modeling

Machine Learning

?ax + ?by = 24
?cz + ?di = 36

Rule 1 ... Rule n

Dataset

Modeling

Source: http://mbmlbook.com/LifeCycle.html
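The contrast between the two slides can be sketched in code. In the manual approach an analyst writes the coefficients of a rule such as 2x + 3y = 24 by hand; in machine learning the coefficients are unknown (?a, ?b) and are estimated from a dataset. The following is a minimal sketch, assuming a linear rule and ordinary least squares. The function name `fit_two_coefficients` and the sample data are hypothetical and do not come from the talk.

```python
def fit_two_coefficients(samples):
    """Estimate a and b in a*x + b*y = target by ordinary least squares.

    samples: list of (x, y, target) tuples.
    Solves the 2x2 normal equations directly (no intercept term).
    """
    sxx = sum(x * x for x, _, _ in samples)
    sxy = sum(x * y for x, y, _ in samples)
    syy = sum(y * y for _, y, _ in samples)
    sxt = sum(x * t for x, _, t in samples)
    syt = sum(y * t for _, y, t in samples)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("samples do not determine a unique solution")
    a = (sxt * syy - syt * sxy) / det
    b = (syt * sxx - sxt * sxy) / det
    return a, b


# Hypothetical dataset generated by the hand-written rule 2x + 3y.
dataset = [(x, y, 2 * x + 3 * y) for x, y in [(1, 2), (3, 1), (4, 5), (6, 2), (0, 7)]]
a, b = fit_two_coefficients(dataset)
print(a, b)
```

With noise-free data the learned coefficients match the hand-written rule; with real data they would only approximate it, which is the "more errors, broader coverage" trade-off the comparison slide describes.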

Comparison

Manual analysis

• Any case that has not been modeled goes unprocessed

• Requires fewer cases

• Lower error

• Narrower coverage

Automatic analysis

• Cases that have not been modeled can still be processed

• Requires a large dataset

• More errors

• Broader coverage

Data

The data determines the best algorithm

The data and the application determine the best algorithm

Cybersecurity

Classification

• Clustering • Aggregation • Dissection

Detection

• Anomalies

• Patterns

Types of learning

• Supervised: The training data consists of pairs of objects in which one component is the input data and the other is the desired result. Labeled data.

• Unsupervised: The training data does not contain the result. Unlabeled data. Its role is usually to group the data in search of patterns.

• Semi-supervised: Combines the two previous approaches, using a labeled dataset that is much smaller than the unlabeled dataset to improve unsupervised learning.

Supervised Learning

Gradient Boosting Machine (GBM)

Network attack detection

Dataset: UNSW-NB15

https://www.unsw.adfa.edu.au/australian-centre-for-cyber-security/cybersecurity/ADFA-NB15-Datasets/

- Training set: 175,341 rows

- Validation set: 82,332 rows

Parameterization

● ntrees: this option specifies the number of trees to build in the model. The default is 50 in the h2o R package.

● max_depth: the maximum depth of each tree. The default is 5. Deeper trees generally lead to better results, but they can overfit the data and increase the execution time.

● sample_rate: row sample rate per tree, from 0 to 1. The default is 1.

● col_sample_rate_per_tree: column sample rate per tree, from 0 to 1. The default is 1.

● col_sample_rate: column sample rate per split, from 0 to 1. The default is 1.

● col_sample_rate_change_per_level: relative change of the column sampling rate for every level, from 0.0 to 2.0. The default is 1.

● histogram_type: this option specifies the type of histogram to use for finding optimal split points.

● min_rows: this option specifies the minimum number of observations for a leaf in order to split. The default is 10.

● min_split_improvement: this option specifies the minimum relative improvement in squared error reduction required for a split to occur. When properly tuned, this option can help reduce overfitting because the algorithm stops splitting when all the possible splits lead to worse error measures.

● nbins: for numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point. The default is 20.

● nbins_cats: for categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting. To make a model more general, decrease nbins_cats. To make a model more specific, increase nbins and/or nbins_cats. Keep in mind that increasing nbins_cats can have a big impact on the amount of overfitting.
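To make the role of some of these parameters concrete, here is a minimal sketch of gradient boosting for squared error on a single numeric feature. It is not H2O's implementation: trees are fixed at depth 1 (stumps), splits are searched exactly rather than through histograms, and the function names are hypothetical. The parameters `ntrees`, `sample_rate` and `min_rows` mirror the slide (here `min_rows` must be at least 1); `learn_rate` and `seed` are assumptions added for the illustration.

```python
import random


def fit_stump(xs, residuals, min_rows):
    """Find the threshold split on one feature that minimizes squared error.

    Returns (threshold, left_value, right_value), or None when no split
    leaves at least min_rows observations on each side.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best, best_err = None, float("inf")
    for k in range(min_rows, len(order) - min_rows + 1):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # cannot split between identical values
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if err < best_err:
            best_err, best = err, ((lo + hi) / 2, lmean, rmean)
    return best


def fit_gbm(xs, ys, ntrees=50, learn_rate=0.1, sample_rate=1.0, min_rows=1, seed=0):
    """Gradient boosting for squared error with depth-1 trees (stumps)."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)  # initial prediction: the mean target
    preds = [base] * len(xs)
    stumps = []
    for _ in range(ntrees):
        # Row sampling per tree, controlled by sample_rate.
        n_rows = min(len(xs), max(2 * min_rows, int(sample_rate * len(xs))))
        rows = rng.sample(range(len(xs)), n_rows)
        # For squared error, the negative gradient is the residual y - prediction.
        stump = fit_stump([xs[i] for i in rows], [ys[i] - preds[i] for i in rows], min_rows)
        if stump is None:
            break
        stumps.append(stump)
        thr, lval, rval = stump
        preds = [p + learn_rate * (lval if x < thr else rval) for p, x in zip(preds, xs)]
    return base, learn_rate, stumps


def predict(model, x):
    """Sum the shrunken contributions of all stumps on top of the base value."""
    base, learn_rate, stumps = model
    return base + learn_rate * sum(l if x < t else r for t, l, r in stumps)


# Hypothetical data: the target jumps from 0 ("normal") to 1 ("attack") at x = 2.
xs = [i / 10 for i in range(40)]
ys = [0.0 if x < 2 else 1.0 for x in xs]
model = fit_gbm(xs, ys, ntrees=50, learn_rate=0.1, sample_rate=1.0, min_rows=1)
print(predict(model, 0.5), predict(model, 3.0))
```

Each tree corrects a fraction of the remaining error, so more trees fit the training data more closely; that is why `ntrees`, `max_depth` and the sampling rates are the main levers against overfitting.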

ROC Curve
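As a reminder of how a ROC curve is built, the sketch below sweeps the decision threshold over the classifier scores and records the false positive rate and true positive rate at each step, then computes the area under the curve with the trapezoidal rule. The labels and scores are made up for illustration, and the function names are hypothetical; they are not the results shown in the talk.

```python
def roc_curve(labels, scores):
    """Return ROC points as (false_positive_rate, true_positive_rate) pairs.

    labels: 1 for attack (positive), 0 for normal (negative).
    scores: higher means "more likely an attack".
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up labels and scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_curve(labels, scores)
print(curve)
print(auc(curve))
```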

Feature selection

Unsupervised Learning

K-MEANS Clustering

Attack detection in HTTP traffic

Dataset: NSL-KDD (http://www.unb.ca/cic/research/datasets/nsl.html)

duration,src_bytes,dst_bytes,logged_in,num_compromised,count,srv_count,serror_rate,srv_serror_rate,rerror_rate,srv_rerror_rate,same_srv_rate,diff_srv_rate,srv_diff_host_rate,dst_host_count,dst_host_srv_count,dst_host_same_srv_rate,dst_host_diff_srv_rate,dst_host_same_src_port_rate,dst_host_srv_diff_host_rate,dst_host_serror_rate,dst_host_srv_serror_rate,dst_host_rerror_rate,dst_host_srv_rerror_rate

• 40,337 events

• 7 attack types

HTTP attack detection

Clustering algorithm: K-means

Goal:

Obtain a set of points that, because of their central position relative to the whole, represent the rest of the initial points (the pattern).
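The goal stated above, finding a few representative points, is what the K-means centroids provide. The following is a minimal sketch of Lloyd's algorithm in plain Python, assuming Euclidean distance and random initial centroids. The function names and the two-dimensional sample points are hypothetical; the talk's own pipeline and feature scaling are not shown in the slides.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as a start
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical two-feature events forming two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
centroids, assignments = kmeans(points, k=2)
print(sorted(centroids))
print(assignments)
```

Each centroid can then be inspected and named by the kind of traffic it mostly attracts, and new events can be classified by their nearest centroid.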



Result of the clustering process

HTTP attack detection

Cluster names: Normal, Back, Mix_Nor1, Portsweep, Mix_Nor2, MixNor_Nep

HTTP attack detection: Classification

HTTP attack detection: Behavior prediction

Semi-Supervised Learning

Deep Learning (Autoencoders)

Credit card fraud detection

Dataset: Transactions made in September 2013; 492 frauds in 284,807 transactions

Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount


Credit card fraud detection: Autoencoders

Deep Learning

Credit card fraud detection: Threshold

Credit card fraud detection: Results

90.0% -> fraud cases detected

99.5% -> non-fraud cases detected
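The threshold step can be sketched as follows. An autoencoder trained mostly on legitimate transactions tends to reconstruct them well, so a transaction whose reconstruction error exceeds a chosen threshold is flagged as fraud. The two rates returned below presumably correspond to the two percentages on the results slide. The function name, the error values and the threshold are hypothetical and for illustration only; they are not the talk's data.

```python
def detection_rates(errors, labels, threshold):
    """Flag a transaction as fraud when its reconstruction error exceeds threshold.

    errors: reconstruction error per transaction (hypothetical autoencoder output).
    labels: 1 for fraud, 0 for legitimate.
    Returns (fraud_detection_rate, non_fraud_detection_rate).
    """
    flagged = [e > threshold for e in errors]
    frauds = sum(labels)
    legit = len(labels) - frauds
    caught = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    cleared = sum(1 for f, y in zip(flagged, labels) if not f and y == 0)
    return caught / frauds, cleared / legit


# Made-up reconstruction errors, for illustration only.
errors = [0.02, 0.03, 0.01, 0.05, 0.40, 0.04, 0.90, 0.02, 0.75, 0.03]
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
fraud_rate, non_fraud_rate = detection_rates(errors, labels, threshold=0.10)
print(fraud_rate, non_fraud_rate)
```

Lowering the threshold catches more fraud at the cost of flagging more legitimate transactions, so the threshold is chosen according to which error is more expensive.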

Date post:	03-Oct-2018