SureMail: Notification Overlay for Email Reliability
Sharad Agarwal, Venkat Padmanabhan
Dilip A. Joseph
8 March 2006
Outline
• Email loss problem
• Design philosophy
• SureMail design
• SureMail robustness to security attacks
• SureMail implementation
What is Email Loss?
• Email loss: sent email not received
• Silent email loss
  – Loss w/o notification (no bounceback / DSN)
• Why?
  – Aggressive spam filters
    • 90% corp. emails thrown away (blacklist)
    • AOL’s strict whitelist rules (must send 100/day)
    • Bouncebacks contribute to spam
  – Complex mail architecture upgrades / failures
    • SMTP reliability is per hop, not end-to-end
How Much Email Loss?
• Even loss of 1 email / user / year is bad
  – If it’s an important email
• To really measure loss
  – Monitor many users’ send & receive habits
  – Count how many sent emails not received
  – Count how many bouncebacks received
  – Difficult to find enough willing participants that email each other across multiple domains
Prior Work
• “The State of the Email Address”
  – Afergan & Beverly, ACM CCR 01.2005
    • Rely on bouncebacks; similar to “dictionary” attack
    • 25% of tested domains send bouncebacks
    • 1 sender
    • 0.1% to 5% loss, across 1468 servers, 571 domains
• “Email dependability”
  – Lang, UNSW B.E. thesis 11.2004
    • 40 accounts, 16 domains receive emails from 1 sender
    • Empty body, sequence number as subject
    • 0.69% silent loss
Our Email Loss Study
• Methodology
  – Controller composes email, sends
  – Our code for SMTP sending
  – Outlook for receiving (both inbox & junk mail)
  – Parse sent and received emails into SQL DB
  – Match on {sender, receiver, subject, attachment}
  – Heuristics for parsing bouncebacks
• Want
  – Many sending, receiving accounts
  – Real email content
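The matching step in the methodology can be illustrated with a minimal Python sketch. It assumes each parsed email is a dict with `sender`, `receiver`, `subject` and `attachment` fields; these field names and the in-memory matching are illustrative assumptions, since the study itself stored parsed emails in a SQL DB.

```python
from collections import Counter


def match_key(msg):
    """Key used to pair a sent email with a received one (hypothetical field names)."""
    return (msg["sender"], msg["receiver"], msg["subject"], msg.get("attachment"))


def find_lost(sent, received):
    """Return the sent emails that have no matching received email."""
    # Count received copies per key so repeated sends of the same email are handled.
    remaining = Counter(match_key(m) for m in received)
    lost = []
    for msg in sent:
        key = match_key(msg)
        if remaining[key] > 0:
            remaining[key] -= 1  # consume one received copy
        else:
            lost.append(msg)
    return lost
```

Counting received copies, rather than using a plain set, keeps the tally correct when the same subject and attachment are sent between the same pair of accounts more than once.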
Experiment Details
• Email accounts
  – 36 send, 42 receive
  – Junk filters off if possible
• Email subject & body
  – Enron corpus subset
  – 1266 emails w/o spam
• Email attachment
  – 70% no attachment
  – jpg, gif, ppt, doc, pdf, zip, htm
  – marketing, technical, funny
Domain | Type | Receive | Send
microsoft.com | Exchange | 1 | 1
fusemail.com | IMAP | 2 | 2
aim.com | IMAP | 2 | 2
yahoo.co.uk | POP | 2 | 2
yahoo.com | POP | 2 | 2
hotmail.com | HTTP | 1 | X
gawab.com | POP | 2 | 2
bluebottle.com | POP | 2 | 2
orcon.net.nz | POP | 2 | 2
nerdshack.com | POP | 2 | 2
gmail.com | POP | 2 | 2
eecs.berkeley.edu | IMAP | 2 | 2
cs.columbia.edu | IMAP | 2 | 2
cc.gatech.edu | POP | 2 | 2
nms.lcs.mit.edu | POP | 2 | 2
cs.princeton.edu | POP | 2 | 2
cs.ucla.edu | POP | 2 | X
cubinlab.ee.mu.oz.au | POP | 2 | X
usc.edu | POP | 2 | 2
cs.utexas.edu | POP | 2 | 2
cs.waterloo.ca | POP | 2 | 2
cs.wisc.edu | IMAP | 2 | 2
Email Loss Results
Start Date: 11/18/2005
End Date: 1/11/2006
Days: 54
Emails sent: 138944
Emails received: 144949
Emails lost: 2530
Total loss rate: 1.82%
Bouncebacks received: 982
Matched bouncebacks: 878
Unmatched bouncebacks: 104
Emails lost silently: 1548
Silent loss rate: 1.11%
Hard failures: 565
Conservative silent loss: 0.71%
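The rates in this table can be reproduced from the raw counts. The sketch below is an inference from the numbers rather than the authors' stated method: it assumes silently lost emails are lost emails minus bouncebacks received, and that the conservative figure additionally excludes hard failures. The function name and dictionary keys are illustrative.

```python
def loss_summary(sent, lost, bouncebacks, hard_failures):
    """Derive loss percentages from raw counts (formulas inferred from the table)."""
    silent = lost - bouncebacks  # lost with no bounceback of any kind
    return {
        "total_loss_pct": round(100 * lost / sent, 2),
        "silent_lost": silent,
        "silent_loss_pct": round(100 * silent / sent, 2),
        # Conservative: also discount silent losses attributed to hard failures.
        "conservative_silent_loss_pct": round(100 * (silent - hard_failures) / sent, 2),
    }


summary = loss_summary(sent=138944, lost=2530, bouncebacks=982, hard_failures=565)
print(summary)
```

With the counts above this yields 1.82%, 1548, 1.11% and 0.71%, matching the reported values.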
Loss Rates by Account
Account, Send Loss, Receive Loss
[email protected] 1.2 [email protected] 0.8 [email protected] 0.4 [email protected] 1.6 [email protected] 1.2 [email protected] 0.8 [email protected] 0.8 [email protected] 0.7 [email protected] 0.8 [email protected] - [email protected] 1.3 [email protected] 1.1 [email protected] 1.4 [email protected] 2.3 [email protected] 5.3 [email protected] 5.9 [email protected] 0.6 [email protected] 0.7 [email protected] 2.6 [email protected] 2.3 2.3
Account, Send Loss, Receive Loss
[email protected] 0.9 [email protected] 0.5 [email protected] 1.3 [email protected] 1 [email protected] 0.7 [email protected] 0.6 [email protected] 0.7 [email protected] 0.6 [email protected] 0.9 [email protected] 0.5 [email protected] - [email protected] - [email protected] - [email protected] - [email protected] 0.8 [email protected] 0.7 [email protected] 0.7 [email protected] 0.4 [email protected] 0.6 [email protected] 0.6 [email protected] 0.6 [email protected] 0.7 0.3
  – Loss rate 1.82% to 0.82%
Loss Rates by Attachment
Attachment | Size (B) | Type | Emails Sent | Emails Received | Loss Rate
(none) |  |  | 96631 | 1062 | 1.11
nag_jpg2.jpg | 48634 | JPEG | 1489 | 28 | 1.9
Nehru_01.jpg | 21192 | JPEG | 2632 | 55 | 2.1
home_main.gif | 35106 | GIF | 2723 | 65 | 2.4
phd050305s.gif | 65077 | GIF | 2905 | 43 | 1.5
ActiveXperts_Network_Monitor2.ppt | 105984 | MSPowerPoint | 2935 | 43 | 1.5
vijay.ppt | 48640 | MSPowerPoint | 1327 | 18 | 1.4
CfpA4v10.doc | 55808 | MS Word | 2900 | 43 | 1.5
CHANG_1587051095.doc | 25088 | MS Word | 2711 | 15 | 0.6
SSA_PRODUCTSTRATEGY_final.doc | 91136 | MS Word | 2822 | 33 | 1.2
34310344.pdf | 32211 | PDF | 2867 | 25 | 0.9
f1040v.pdf | 47875 | PDF | 2805 | 42 | 1.5
CHANG_1587051095.zip | 4987 | Zip | 2940 | 19 | 0.6
SSA_PRODUCTSTRATEGY_final.zip | 43511 | Zip | 2776 | 64 | 2.3
IMC_2005_-_Call_for_Papers.htm | 10377 | HTML | 2773 | 32 | 1.2
ActivePerl_5.8_-_Online_Docs___Getting_Started.htm | 24598 | HTML | 2757 | 23 | 0.8
BBC_NEWS___Entertainment___Space_date_set_for_Scotty's_ashes.htm | 32776 | HTML | 2951 | 42 | 1.4
• Nothing stands out
Loss Rates by Subject/Body
• ~50-250 emails sent per subject
• Without the one subject with 35% loss: overall loss rate goes from 1.82% to 1.79%
[Figure: loss % (y-axis, 0 to 40) per subject (x-axis, subjects 1 to 1240)]
Summary of Findings
• Email loss rates are high
  – 1.82% loss
  – 0.71% conservative silent loss (1 / 140)
• Difficult to disambiguate cause of loss
  – Difference between domains (filters or servers?)
  – No difference between mailboxes
  – No difference between attachments
  – Only 1 body had abnormally high loss
Outline
• Email loss problem
• Design philosophy
• SureMail design
• SureMail robustness to security attacks
• SureMail implementation
We Found Email Loss; Now What?
• Can try to fix email architecture, but
  – Hard to know exactly what is problem
  – Spam filters continually evolve; not perfect
  – Some architectures are very complicated
  – How many email systems are out there?
  – The current system mostly works
Fixing the Architecture
• Improve email delivery infrastructure
  – more reliable servers
    • e.g., cluster-based (Porcupine [Saito ’00])
  – server-less systems
    • e.g., DHT-based (POST [Mislove ’03])
  – total switchover might be risky
• “Smarter” spam filtering
  – moving target, so mistakes are inevitable
  – non-content-based filtering still needed to cope with spam load
Email Notifications
• DSN / bouncebacks
  – Most spam filters don’t generate DSN on drop
  – Bogus DSNs due to spam w/ bogus sender
  – Some MTAs block DSN for privacy
  – MTA crash may not generate DSN
  – No DSN for loss between MTA and MUA
• MDN / read receipts
  – Expose private info (when read, when online)
  – Can help spammers
Notification Design Requirements
• Cause minimal MTA/MUA disruption
• Cause minimal user disruption
• Preserve asynchronous operation
• Preserve user privacy
• Preserve repudiability
• Maintain spam and virus defenses
• Minimize traffic overhead
Outline
• Email loss problem
• Design philosophy
• SureMail design
• SureMail robustness to security attacks
• SureMail implementation
SureMail Design Requirements
• Cause minimal MTA/MUA disruption
  – No MTA modification; no Outlook modification
• Cause minimal user disruption
  – User notified only on loss
• Preserve asynchronous operation
• Preserve user privacy
  – Only receiver is notified of loss
• Preserve repudiability
  – No PKI / authentication
• Maintain spam and virus defenses
  – Emails not modified
• Minimize traffic overhead
  – 85 byte notification per email
Basic Operation
• Sender S sends email to receiver R
  – S also posts notification to overlay
• R periodically downloads new email
  – R also downloads notifications from overlay
• Notification without matching email implies loss
  – delay: median 26s, mean 276s, max 36.6 hrs
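The receiver-side check can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the SureMail client: a notification is modeled as a `(digest, posted_time)` pair, `h1` stands in for the slides' H1 using SHA-256 with a prefix, and the grace period is a hypothetical parameter motivated by the delays reported above (an email can arrive long after it was sent).

```python
import hashlib


def h1(message: bytes) -> str:
    """Stand-in for H1: a prefixed SHA-256 digest (assumed construction)."""
    return hashlib.sha256(b"H1:" + message).hexdigest()


def detect_losses(received_emails, notifications, now, grace_seconds):
    """Return digests of notified emails that never arrived.

    received_emails: iterable of raw message bytes already downloaded by R.
    notifications:   iterable of (digest, posted_time) pairs fetched from the overlay.
    A notification counts as a loss only after grace_seconds, to allow for slow delivery.
    """
    have = {h1(m) for m in received_emails}
    return [
        digest
        for digest, posted in notifications
        if digest not in have and now - posted >= grace_seconds
    ]
```

Choosing the grace period is a trade-off: a short one reports losses quickly but misfires on slow mail, while a long one delays the report.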
SureMail Overview
[Diagram: Sender S, Recipient R, and overlay nodes Dreg=H2(R) and Dnot=H1(R). Labels: Register; Verify; GetNotifications; notification H1(Mnew), H1(Mold), T, MAC([T,H1(Mnew)],H2(Mold)); “You’ve Lost Mail!”; Request lost message]
SureMail Overview
• Emails, MTAs, MUAs unmodified
• Parallel notification overlay system
  – Decentralized; limited collusion
  – Agnostic to actual implementation
    • end-host-based (e.g., always-on user desktops)
    • infrastructure-based (e.g., “NX servers”)
• Prevent notification snooping & spam
  – Email based registration
  – Reply based shared secret
Email-Based Registration
• Goal: prevent hijacking of R’s notifications
  – Only R can receive emails sent to R
  – Limited collusion among notification nodes
• One-time operation for initial registration
  – R sends registration request to H2(R), H3(R)
  – H2(R), H3(R) email registration secrets to R
• To retrieve notifications at H1(R)
  – R uses registration secrets with H1(R); H1(R) verifies with H2(R), H3(R), sends back notifications
  – None of H1(R), H2(R), H3(R) can associate notifications with R, unless they collude
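One way to picture how R's address maps to separate notification and registration nodes is to apply independent hash functions to the address. The sketch below is only an illustration: the function name `node_for`, the label-prefixed SHA-256 construction, and the fixed node count are assumptions, and the slides do not specify how H1, H2 and H3 are built.

```python
import hashlib


def node_for(label: str, address: str, num_nodes: int) -> int:
    """Map an email address to an overlay node index using a labeled hash.

    Different labels ("H1", "H2", "H3") act as independent hash functions,
    so the notification node and registration nodes are chosen separately.
    """
    digest = hashlib.sha256(f"{label}:{address}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


address = "r@example.org"  # hypothetical receiver address
notification_node = node_for("H1", address, 1000)
registration_nodes = [node_for("H2", address, 1000), node_for("H3", address, 1000)]
```

With a small number of nodes two labels can land on the same node, so a real deployment would need to handle that case for the non-collusion argument to hold.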
Reply-Based Shared Secret
• Goal: prevent notification spoofing & spam
  – Only R & S know their email conversations
  – S rarely converses with spammers
• Reply detection
  – S sends Mold to R, R replies with M’old
  – S uses H(Mold) to “prove” identity to R in future
• Notification for Mnew from S to R
  – H1(Mnew), H1(Mold), T, MAC([T,H1(Mnew)],H2(Mold))
  – Only R can identify S
  – Shared secret can be continually refreshed
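The notification format above can be sketched with Python's standard `hashlib` and `hmac` modules. The choices here are assumptions for illustration: H1 and H2 are modeled as SHA-256 with different prefixes, the MAC as HMAC-SHA256 keyed by H2(Mold), and T as an integer timestamp; the slides do not name concrete primitives.

```python
import hashlib
import hmac


def h1(data: bytes) -> bytes:
    """Stand-in for H1 (assumed: prefixed SHA-256)."""
    return hashlib.sha256(b"H1:" + data).digest()


def h2(data: bytes) -> bytes:
    """Stand-in for H2, independent of H1 (assumed: different prefix)."""
    return hashlib.sha256(b"H2:" + data).digest()


def make_notification(m_new: bytes, m_old: bytes, timestamp: int):
    """S builds (H1(Mnew), H1(Mold), T, MAC([T, H1(Mnew)], H2(Mold)))."""
    payload = str(timestamp).encode() + h1(m_new)
    tag = hmac.new(h2(m_old), payload, hashlib.sha256).digest()
    return (h1(m_new), h1(m_old), timestamp, tag)


def verify_notification(notification, known_old_messages) -> bool:
    """R looks up Mold by H1(Mold), then checks the MAC with key H2(Mold)."""
    new_digest, old_digest, timestamp, tag = notification
    for m_old in known_old_messages:
        if hmac.compare_digest(h1(m_old), old_digest):
            payload = str(timestamp).encode() + new_digest
            expected = hmac.new(h2(m_old), payload, hashlib.sha256).digest()
            return hmac.compare_digest(expected, tag)
    return False  # no shared secret with this sender
```

An overlay node sees H1(Mold) but cannot compute the MAC key H2(Mold) without the message itself, which is what lets only R, who holds Mold, validate the notification and identify S.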
Attacks Defeated by Design
• X cannot retrieve H1(R) notifications
• H1(R) cannot identify R
• H2(R), H3(R) cannot see R’s notifications
  – If they don’t collude; can increase to 3 nodes
• X, H(R) cannot identify S
• X, H(R) cannot learn Mnew, Mold
• X cannot annoy R with bogus notifications
• X cannot masquerade as S when posting to H1(R)
First Time Sender
• What if FTS email is lost?
• FTS & spammer generally indistinguishable
• But perhaps FTS knows I who knows R
  – Email networks have small world properties
  – I makes shared secret SI with all known parties
  – FTS sends email to R
    • Posts multiple notifications
    • One for every SI it has learned
Other Issues
• Reply-detection:
  – “in-reply-to” header may not always help
  – indirect checks based on text similarity
• Reducing overhead:
  – post notifications only for “important” emails
  – delay posting in hope of receiving implicit ACK (reply) or NACK (bounce-back)
• Mobility:
  – reply-based shared secret can be regenerated
  – web-mail
• Can support mailing lists
Outline
• Email loss problem
• Design philosophy
• SureMail design
• SureMail robustness to security attacks
• SureMail implementation
SureMail Implementation
• Reply detection heuristic for shared secret
• Notification service
  – Centralized server running
  – Chord based DHT running
• Notification posting, retrieving
  – Grab in/out bound email via Outlook MAPI call
    • No modification to Outlook binaries
  – XML notification put/get commands
  – Simple Win32 GUI
SureMail GUI
• Client UI will see emails, will post & retrieve notifications
• E.g. running on two machines [email protected] and [email protected]
[Screenshot labels: “Lost!”, “Not lost”]
Notification Results
Start Date: 1/11/2006
End Date: 2/8/2006
Days: 29
Emails sent: 19435
Emails lost: 653
Total loss rate: 3.36%
Bouncebacks received: 406
Matched bouncebacks: 378
Unmatched bouncebacks: 28
Emails lost silently: 247
Silent loss rate: 1.27%
Hard failures: 70
Conservative silent loss: 0.91%
Notifications received: 19435
Summary
• Email does get lost!
  – ~40 accounts, 158000 emails, 0.71%-0.91% silent loss
• SureMail
  – Client based: unmodified email, servers, clients; no PKI
  – User intervention only on lost email
  – Keeps repudiability, privacy, asynchronous, spam & virus defense
• Separate notification overlay robust
  – Simple, small message format
  – No virus, malware, spam filters needed
  – Provides failure independence
• Status
  – ACM Hotnets 05; ACM Sigcomm 06 submission
  – Prototype implementation