1 An Anti-Spam filter based on Adaptive Neural Networks Alexandru Catalin Cosoi Researcher /...

Post on 20-Jan-2016

1

www.bitdefender.com

An Anti-Spam Filter Based on Adaptive Neural Networks

Alexandru Catalin Cosoi

Researcher / BitDefender AntiSpam Laboratory

acosoi@bitdefender.com

2

Neural Networks

a large number of processing elements, called neurons

a different approach to problem solving

neural networks and conventional algorithmic computers complement each other

3

Adaptive Resonance Theory

Proposed by Carpenter and Grossberg in 1976-86

Solves the stability – plasticity dilemma

ART architectures self-organize in real time, producing stable recognition while still accepting input patterns beyond those originally stored

Contains two components: an attentional and an orienting subsystem

The orienting subsystem works like a novelty detector
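
The interplay of the two subsystems can be sketched in a few lines. The sketch below assumes the Fuzzy ART variant (complement coding, fuzzy AND, a vigilance threshold); the slides do not say which ART variant or which parameter values the filter uses, so the class name `FuzzyART` and the defaults for `vigilance`, `alpha` and `beta` are illustrative only.

```python
def complement_code(features):
    """Append 1 - v for every feature v, so the L1 norm of the input is constant."""
    return list(features) + [1.0 - v for v in features]


def fuzzy_and(a, b):
    """Component-wise minimum, the fuzzy AND operator."""
    return [min(x, y) for x, y in zip(a, b)]


class FuzzyART:
    """Minimal Fuzzy ART sketch; parameter defaults are assumptions."""

    def __init__(self, vigilance=0.75, alpha=0.001, beta=1.0):
        self.vigilance = vigilance  # how close a match must be to be accepted
        self.alpha = alpha          # small choice parameter
        self.beta = beta            # learning rate (1.0 means fast learning)
        self.weights = []           # one weight vector per stored category

    def _choice(self, x, j):
        w = self.weights[j]
        return sum(fuzzy_and(x, w)) / (self.alpha + sum(w))

    def present(self, features):
        """Return (category index, is_novel) for a non-empty list of values in [0, 1]."""
        x = complement_code(features)
        # Attentional subsystem: rank the stored categories by the choice function.
        ranked = sorted(range(len(self.weights)),
                        key=lambda j: self._choice(x, j), reverse=True)
        for j in ranked:
            overlap = fuzzy_and(x, self.weights[j])
            # Orienting subsystem: the vigilance test rejects poor matches.
            if sum(overlap) / sum(x) >= self.vigilance:
                self.weights[j] = [self.beta * o + (1.0 - self.beta) * w
                                   for o, w in zip(overlap, self.weights[j])]
                return j, False
        # Nothing stored is close enough: treat the input as novel and store it.
        self.weights.append(x)
        return len(self.weights) - 1, True


net = FuzzyART()
first = net.present([1.0, 0.0])    # nothing stored yet, so this is novel
repeat = net.present([1.0, 0.0])   # recognized as the stored category
other = net.present([0.0, 1.0])    # fails vigilance, so a new category is created
again = net.present([1.0, 0.0])    # the older pattern is still recognized
```

Storing the new pattern in its own category leaves the first category's weights untouched, which is the stability – plasticity behaviour named on the slide.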

4

ARTMAP

ARTMAP: a class of neural network architectures that

perform incremental supervised learning

of multi-dimensional maps

from input vectors presented in arbitrary order

Fuzzy ARTMAP: features are presented in fuzzy logic
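
As an illustration of incremental supervised learning on fuzzy-valued inputs, here is a minimal sketch of the simplified Fuzzy ARTMAP scheme, in which each category carries one label and a label mismatch raises the vigilance (match tracking). It is not the talk's implementation: the class name, the parameter defaults and the two-feature inputs are all assumptions.

```python
def complement_code(features):
    """Append 1 - v for every feature v in [0, 1]."""
    return list(features) + [1.0 - v for v in features]


def fuzzy_and(a, b):
    """Component-wise minimum, the fuzzy AND operator."""
    return [min(x, y) for x, y in zip(a, b)]


class SimplifiedFuzzyARTMAP:
    """One labelled category layer with match tracking (illustrative defaults)."""

    def __init__(self, baseline_vigilance=0.5, alpha=0.001, beta=1.0, epsilon=1e-6):
        self.baseline_vigilance = baseline_vigilance
        self.alpha = alpha
        self.beta = beta
        self.epsilon = epsilon
        self.weights = []  # weight vector per category
        self.labels = []   # class label per category

    def _ranked(self, x):
        def choice(j):
            w = self.weights[j]
            return sum(fuzzy_and(x, w)) / (self.alpha + sum(w))
        return sorted(range(len(self.weights)), key=choice, reverse=True)

    def train(self, features, label):
        """Learn one (features, label) pair; samples may arrive in any order."""
        x = complement_code(features)
        rho = self.baseline_vigilance
        for j in self._ranked(x):
            overlap = fuzzy_and(x, self.weights[j])
            match = sum(overlap) / sum(x)
            if match < rho:
                continue
            if self.labels[j] == label:
                self.weights[j] = [self.beta * o + (1.0 - self.beta) * w
                                   for o, w in zip(overlap, self.weights[j])]
                return j
            # Match tracking: wrong label, so demand a strictly better match.
            rho = match + self.epsilon
        self.weights.append(x)
        self.labels.append(label)
        return len(self.weights) - 1

    def predict(self, features):
        """Return the label of the best-ranked category, or None if untrained."""
        ranked = self._ranked(complement_code(features))
        return self.labels[ranked[0]] if ranked else None


net = SimplifiedFuzzyARTMAP()
net.train([0.9, 0.1], "spam")
net.train([0.8, 0.2], "ham")  # close to the spam pattern, but labelled ham
```

In this run the second sample matches the existing "spam" category well enough to pass the baseline vigilance, so match tracking forces a separate "ham" category instead of overwriting the first one.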

5

System

A complex system that will:

gather the spam and ham corpus

study its characteristics

learn

require no human involvement

6

Inputs

words like viagra, mortgage, xanax

obfuscated words

information extracted from headers

other heuristics used in Anti-Spam filters
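
To make the input types concrete, the sketch below turns a raw message into a vector of values in [0, 1]. Only the three keywords come from the slide; the look-alike table, the two header checks and the function names are hypothetical stand-ins for the filter's real heuristics, and the message is assumed to be single-part plain text.

```python
import re
from email import message_from_string

KEYWORDS = ("viagra", "mortgage", "xanax")  # words named on the slide

# Hypothetical look-alike substitutions used to undo simple obfuscation.
LOOKALIKES = str.maketrans({"1": "i", "!": "i", "0": "o", "@": "a",
                            "4": "a", "3": "e", "$": "s"})


def plain_form(token):
    """Lowercase a token and strip surrounding punctuation."""
    return token.lower().strip(".,!?;:")


def deobfuscate(token):
    """Map look-alike characters to letters and drop everything else."""
    return re.sub(r"[^a-z]", "", plain_form(token).translate(LOOKALIKES))


def extract_features(raw_email):
    """Return a list of 0.0/1.0 features for a single-part plain-text message."""
    msg = message_from_string(raw_email)
    tokens = msg.get_payload().split()
    plain = {plain_form(t) for t in tokens}
    cleaned = {deobfuscate(t) for t in tokens}

    features = [1.0 if k in plain else 0.0 for k in KEYWORDS]
    # Obfuscated word: a keyword that only appears after de-obfuscation.
    hidden = any(k in cleaned and k not in plain for k in KEYWORDS)
    features.append(1.0 if hidden else 0.0)
    # Header information (illustrative checks, not the filter's actual rules).
    features.append(1.0 if msg["Message-ID"] is None else 0.0)
    features.append(1.0 if (msg["Subject"] or "").isupper() else 0.0)
    return features


spam = "Subject: CHEAP MEDS\nFrom: a@example.com\n\nBuy v1agra and xanax now!\n"
ham = "Subject: Lunch\nMessage-ID: <1@example.com>\n\nSee you at noon.\n"
spam_vector = extract_features(spam)
ham_vector = extract_features(ham)
```

Keeping every feature in [0, 1] lets the vector be fed directly to a fuzzy ART-style network.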

7

Hierarchy

Initial implementation: single neural network

Increasing number of heuristics

Increasing number of training items

Train both on spam and ham

Improvements

Next step: multiple neural networks (a hierarchy)

Run only requested heuristics

Perform a refined classification

Split email into several categories

Increase detection speed

Learn new patterns without losing detection on older spam
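
The idea of running only the requested heuristics can be sketched as a tree of classifiers in which each node evaluates its own small heuristic set and hands the message to a child. Everything in the sketch is hypothetical: the `Node` class, the category names and the keyword heuristics are invented for illustration, and plain threshold functions stand in for the trained neural networks.

```python
def contains(word):
    """Build a named heuristic that fires when `word` occurs in the text."""
    return (word, lambda text: 1.0 if word in text.lower() else 0.0)


class Node:
    """One level of the hierarchy: its own heuristics plus a classifier."""

    def __init__(self, heuristics, classify, children=None):
        self.heuristics = heuristics    # list of (name, function) pairs
        self.classify = classify        # stand-in for a trained network
        self.children = children or {}  # label -> more specific Node

    def run(self, text, ran):
        features = []
        for name, heuristic in self.heuristics:
            ran.append(name)            # record which heuristics were evaluated
            features.append(heuristic(text))
        label = self.classify(features)
        child = self.children.get(label)
        return child.run(text, ran) if child else label


pharma = Node([contains("prescription"), contains("order now")],
              lambda f: "spam/pharma" if any(f) else "ham")
finance = Node([contains("refinance")],
               lambda f: "spam/finance" if any(f) else "ham")
root = Node([contains("viagra"), contains("xanax"), contains("mortgage")],
            lambda f: "pharma" if f[0] or f[1] else ("finance" if f[2] else "ham"),
            {"pharma": pharma, "finance": finance})

ran_spam, ran_ham = [], []
spam_label = root.run("Cheap viagra, order now", ran_spam)
ham_label = root.run("Meeting at noon", ran_ham)
```

In this toy tree a message that the root already considers ham never triggers the category-specific heuristics, which is where the speed gain would come from.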

8

Hierarchy

9

Correction module and noise reduction

Performs noise reduction on the input data before entering the learning phase

Increases discrimination rate between the input patterns

Eliminates or modifies patterns that can cause misclassification (same pattern for multiple categories)
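
One simple way to picture the elimination step is a pre-training pass that drops every pattern seen under more than one category. The sketch below covers only elimination (the slide also mentions modifying patterns, which is not shown), and the function name and the (pattern, label) sample format are assumptions.

```python
from collections import defaultdict


def remove_conflicting_patterns(samples):
    """Drop patterns that occur with more than one label, and exact repeats.

    `samples` is a list of (pattern, label) pairs, where a pattern is a
    sequence of feature values.
    """
    labels_by_pattern = defaultdict(set)
    for pattern, label in samples:
        labels_by_pattern[tuple(pattern)].add(label)

    cleaned, seen = [], set()
    for pattern, label in samples:
        key = tuple(pattern)
        if len(labels_by_pattern[key]) > 1:
            continue  # same pattern for multiple categories: ambiguous
        if key in seen:
            continue  # exact repeat adds nothing to training
        seen.add(key)
        cleaned.append((key, label))
    return cleaned


training = [
    ([1.0, 0.0, 1.0], "spam"),
    ([0.0, 0.0, 1.0], "ham"),
    ([0.0, 0.0, 1.0], "spam"),  # conflicts with the previous sample
    ([1.0, 0.0, 1.0], "spam"),  # exact repeat of the first sample
    ([0.0, 1.0, 0.0], "ham"),
]
cleaned = remove_conflicting_patterns(training)
```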

10

Results

11

Results

Table 3: Detection results for an increasing number of training items. Both the training and the test corpus were analyzed.

Detection results on training items

Detection results on test items

12

Conclusions

Fast learning method

Solves the stability – plasticity dilemma (property preserved from the ART modules)

Consistently improves the heuristic filter

• Faster

• The analysis is based on pattern recognition

Performs a refined analysis

High detection rates

Advanced categorization

Multiple spam categories

Can also be used for parental control

Can perform email classification (business, school, personal)

In conclusion, this system improves both speed and detection