SP.a.M/\TØ
by Keno Albrecht, Nicolas Burri, Roger Wattenhofer
Spamato
An Extendable Spam Filter System
Spamato - Keno Albrecht - Second Conference on Email and Anti-Spam - July 21 & 22, 2005
Motivation
• Countless different spam filters
– Google: 1,740,000 hits (not spam filters)
– Freshmeat/Sourceforge: 404/420 projects
– Several "once-only" research projects
• Client-side filtering (vs. server-side)
– Email Client Add-On: Outlook (Express), …
– Proxy: Mediator between Client and Server
– Stand-alone: Proprietary “email clients”
Project Goal
• Build an extendable spam filter system to…
– ease the development of filters; provide a filter container
– help implement tools for common tasks
– support as many email clients as possible
• Encourage filter developers to use our framework
Subject: Free Spam Filter System
To: [email protected]
From: [email protected]
Dear Spam Filter Developer,
This is your once-in-a-lifetime opportunity to use the free spam filter system Spamato. Spamato aims to bring a practical, easy-to-use, and effective spam filter technology to the user’s desktop. It has been designed to be used primarily as an add-on for several email clients. The combination of multiple filtering techniques leads to a high spam detection rate and a low false-positive rate. It offers a variety of features that simplify your life as a spam filter developer.
Do not reinvent the wheel! Write your filter in an instant!
Use Spamato! Visit our homepage at http://www.spamato.net. To unsubscribe click here.
The Spamato-Team
Architecture
Java
• platform independent
Depending on add-on:
• Visual Basic
• JavaScript
• …
Filtering Process
Emails are processed in five phases:
(1) Initialization
(2) Pre-Check
(3) Check
(4) Decision
(5) Post-Check
[Diagram: Spamato Base]
Pre-Check: Filter 1 … Filter N each run PreCheck(msg) and return veto1(msg) … vetoN(msg)
Checkpoint: veto(msg) = veto1(msg) || veto2(msg) || … || vetoN(msg); if veto(msg) == true, ignore this msg
Check: Filter 1 … Filter N each run Check(msg) and return isSpam1(msg) … isSpamN(msg)
Decision: isSpam(msg) = globalDecision(isSpam1(msg), isSpam2(msg), …, isSpamN(msg))
Post-Check: Filter1 … FilterN each receive msg and isSpam(msg)
Filtering Process
(1) Initialization
• Email client receives email, forwards it to Spamato, and waits for check result.
Filtering Process
(2) Pre-Check
• Veto against further processing (Configuration, Sender-whitelist)
• Gain information for other plugins (URL extractor)
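The Pre-Check checkpoint, veto(msg) = veto1(msg) || … || vetoN(msg), can be illustrated with a small sketch. This is not Spamato's actual code (Spamato itself is written in Java); the names Message, whitelist_pre_check, url_extractor_pre_check and the info dictionary are hypothetical and chosen only for this example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Message:
    """Hypothetical message type; 'info' is scratch space shared between plugins."""
    sender: str
    body: str
    info: dict = field(default_factory=dict)


PreCheck = Callable[[Message], bool]  # returns True to veto further processing


def whitelist_pre_check(whitelist: Set[str]) -> PreCheck:
    """Build a pre-check that vetoes messages from whitelisted senders."""
    def pre_check(msg: Message) -> bool:
        return msg.sender.lower() in whitelist
    return pre_check


def url_extractor_pre_check(msg: Message) -> bool:
    """Gather URLs for later filters; this pre-check never vetoes."""
    msg.info["urls"] = re.findall(r"https?://[^\s<>\"']+", msg.body)
    return False


def veto(msg: Message, pre_checks: List[PreCheck]) -> bool:
    """Logical OR over all pre-checks.

    Every pre-check is run (no short-circuit), so that information-gathering
    pre-checks still execute even when an earlier one vetoes.
    """
    results = [pre_check(msg) for pre_check in pre_checks]
    return any(results)
```

Running every pre-check before combining the results is a design choice of this sketch, not something the slides specify.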
Filtering Process
(3) Check
• Each filter calculates the spam probability
Filtering Process
(4) Decision
• The overall spam probability is calculated and returned to the email client
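The slides do not say how globalDecision combines the per-filter results. As one possible reading, the sketch below assumes each filter reports a spam probability in [0, 1] and that the overall decision is an unweighted mean compared against a threshold. Both the averaging rule and the 0.5 default are assumptions made for illustration only.

```python
from typing import Sequence


def global_decision(probabilities: Sequence[float], threshold: float = 0.5) -> bool:
    """Combine per-filter spam probabilities into one spam/ham decision.

    Assumption: unweighted mean against a fixed threshold. The real
    decision rule used by Spamato is not given in the source.
    """
    if not probabilities:
        return False  # no filter voted: treat the message as ham
    mean = sum(probabilities) / len(probabilities)
    return mean >= threshold
```

A weighted variant would be a natural extension, given that dynamic weighting of filter results is listed as future work.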
Filtering Process
(5) Post-Check
• Learn from global decision
• Collect statistics
• Play sound
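A Post-Check handler receives the message together with the global decision. The following sketch shows a statistics collector as one such handler; the class name, the post_check method, and run_post_check are hypothetical and do not reflect Spamato's real plugin interface.

```python
from collections import Counter
from typing import Iterable


class StatisticsCollector:
    """Hypothetical post-check handler that counts spam and ham decisions."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def post_check(self, msg: str, is_spam: bool) -> None:
        # Only the decision is counted; the message itself is ignored here.
        self.counts["spam" if is_spam else "ham"] += 1

    def spam_ratio(self) -> float:
        total = sum(self.counts.values())
        return self.counts["spam"] / total if total else 0.0


def run_post_check(msg: str, is_spam: bool, handlers: Iterable) -> None:
    """Pass the message and the global decision to every post-check handler."""
    for handler in handlers:
        handler.post_check(msg, is_spam)
```

A learning filter could implement the same post_check method to update its model from the global decision.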
Filters
• Bayesianato: Naïve Bayesian-based filter
• Ruleminator: Rule-based filter
• Razor (Ephemeral): Hash-based filter
» Vipul’s Razor: http://razor.sourceforge.net
• URL-based filters:
– Domainator: Search engine (“Google”) filter
– Earlgrey: Our collaborative multi-domain filter
– Razor (Whiplash): Collaborative single-domain filter
URL/URI/Domain Filtering
• About 70,000 spam emails investigated
– ~76% with at least one domain, thereof…
  • ~20% with more than one distinct domain
  • ~2% with ten or more distinct domains
• Spammers obfuscate their messages for the (sole) purpose of misleading URL filters!
• How to handle “fake” (including ham) domains? How to find “spam” domains?
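Counting distinct domains per message, as in the statistics above, could be done roughly as follows. This is a minimal sketch: the regular expression and the function name distinct_domains are assumptions, and it compares full host names rather than registered domains, which may differ from how the authors counted.

```python
import re
from typing import Set
from urllib.parse import urlsplit

# Simplified URL pattern (an assumption); real spam uses many obfuscation tricks.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def distinct_domains(body: str) -> Set[str]:
    """Return the set of distinct host names linked from a message body."""
    domains = set()
    for url in URL_PATTERN.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL: skip it
        if host:
            domains.add(host.lower())
    return domains
```

A message would then fall into the "more than one distinct domain" group when len(distinct_domains(body)) > 1.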
URL-Filters in Comparison
       D       E       R/W     NOT     ONLY
D      -       26.5%   1.1%    27.3%   0.6%
E      11.7%   -       2.5%    42.1%   2.0%
R/W    25.2%   41.4%   -       3.1%    15.6%
D = Domainator, E = Earlgrey, R/W = Razor/Whiplash
26.5% (1.1%) of all spam messages were identified by the Domainator but not by the Earlgrey (Razor/Whiplash) filter. 27.3% of all spam messages were not identified by the Domainator, and 0.6% were identified solely by it.
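A table of this kind can be derived from the sets of message IDs that each filter flagged. The sketch below shows the set arithmetic; the function name comparison_table and the toy data in the usage note are invented for illustration and are not the authors' measurements.

```python
from typing import Dict, Set


def comparison_table(detected: Dict[str, Set[int]], total: int) -> Dict[str, Dict[str, float]]:
    """For each filter, compute percentages of all spam messages that were:

    - identified by this filter but not by each other filter (keyed by that filter's name),
    - not identified by this filter at all ("NOT"),
    - identified by this filter only ("ONLY").
    """
    table = {}
    for name, hits in detected.items():
        others = [h for other, h in detected.items() if other != name]
        row = {
            other: 100.0 * len(hits - detected[other]) / total
            for other in detected if other != name
        }
        row["NOT"] = 100.0 * (total - len(hits)) / total
        row["ONLY"] = 100.0 * len(hits - set().union(*others)) / total
        table[name] = row
    return table
```

For example, with ten spam messages numbered 0 to 9 and detected = {"D": {0, 1, 2, 3, 4, 5}, "E": {2, 3, 4, 5, 6}, "R/W": {1, 2, 3, 4, 5, 6, 7, 8}}, the row for "D" reads E = 20.0, R/W = 10.0, NOT = 40.0, ONLY = 10.0.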
Conclusion & Future Work
• Spamato eases the implementation and deployment of spam filters and tools. It can be used with all email clients. It is open source.
• A multi-faceted (URL-) filtering approach is reasonable.
• TODO:
– Integration of more filters and improved analysis tools
– Decision module (dynamic weighting of filter results)
– Trust system for collaborative filters
Thank you!
Questions? Comments?
(Un)[email protected]@spamato.net
http://www.spamato.net
http://sf.net/projects/spamato