Anti-Spam Techniques

Lorenzo Peraldo

February 10, 2008

Chapter 1

Introduction

The abuse of electronic messaging to send unauthorized and inappropriate bulk messages is commonly called spamming. Spam is now widespread across different media, for example instant messaging spam, web search engine spam, spam in blogs or forums, and even mobile phone messaging spam, but the most widely recognized and common form of spam is certainly e-mail spam.

E-mail spam is also known as unsolicited bulk e-mail (UBE) or unsolicited commercial e-mail (UCE) and consists of sending e-mail messages, usually with commercial content, in large quantities to an indiscriminate set of recipients. E-mail spamming has existed since the beginning of the internet; it grew exponentially over the following years, and spam now represents 80-85% of all e-mail messages in the world. One of the reasons why the volume of spam has risen every year is that spamming has no costs for spammers. They can therefore manage huge mailing lists without any operating costs, adding more and more users to target with bulk messages. Advertising messages are the most common, but lately other kinds of spam messages have started to travel through the net, such as messages with political or religious purposes.

Although spamming has no costs for spammers, its effects are devastating in terms of consumption of computer and network resources and of human attention and time. Moreover, it has a high direct cost for companies and internet service providers who want to fight spam, as well as indirect costs borne by the victims of spam, such as financial theft, identity theft, data and intellectual property theft, fraud, and the viruses and other malware infections that usually accompany spam messages.

Even though the sending of junk e-mail has been prohibited from the beginning of the internet, enforced by the Terms of Service/Acceptable Use Policy (ToS/AUP) of the internet service providers, in many states more permissive laws have been applied instead of tough laws against spam, especially in the US (because of the CAN-SPAM Act of 2003), while in other countries, such as Australia and the member countries of the European Union, tough anti-spam laws have been passed. As a result, statistics show that most spam e-mail is now produced in the USA, while Australia’s rank in this negative list, for example, has dropped since these tough laws against spamming were applied.

Chapter 2

Spam

In order to find a solution to the problem of spam, it’s very important to define what is really considered spam and how spammers exploit weaknesses of the networks to send it.

2.1 Definition of spam

To be considered spam, an e-mail message must first of all be sent in bulk, meaning that it is sent not to a single recipient but to a larger mailing list, and, more importantly, it must be unwanted or unsolicited, meaning that the recipient never actually subscribed and confirmed a subscription to that mailing list. For this reason another name for e-mail spam is UBE (unsolicited bulk e-mail). Another term often used to identify spam is UCE (unsolicited commercial e-mail), which refers only to those spam messages with commercial content. We’ll see later that these definitions matter, because many anti-spam techniques rely on them for spam filtering and blacklisting.

2.2 How do spammers operate?

First of all, spammers need a list of recipients to whom they will send spam messages. Both spammers themselves and list merchants scan the net in order to find as many e-mail addresses as possible to add to their lists. This process of collecting e-mail addresses is called address harvesting and is done without the consent of the recipients. Harvesting can be done in different ways. The simplest is gathering e-mail addresses from websites, Usenet posts or discussion mailing lists. As spam messages often contain viruses, these may include functions that scan the victim’s computer for e-mail addresses, even ones that have never been exposed on the web. In some cases these viruses may also scan the victim’s network interfaces, letting the spammer gather e-mail addresses from traffic on the victim’s network as well. Not all harvested addresses are valid and deliverable, so spammers use various methods to find out whether an address is valid, for example when a recipient replies to a spam e-mail or clicks on a web link for unsubscribing from a mailing list (which usually just reveals the e-mail address to more spammers).

Spammers hardly ever send spam e-mails from their own computers, and in any case they usually obfuscate their address with address spoofing. Spammers usually hold many different accounts on free webmail services in order to send volumes of e-mail they couldn’t send from a single account. Even though most webmail services now adopt a system called CAPTCHA to prevent automated bots from creating accounts, spammers have found means of circumventing this measure. Spammers have also found ways to protect themselves by hiding their tracks while getting others’ systems to deliver messages for them. To do so they started creating so-called botnets, made of many compromised machines, and started to exploit weaknesses of the network such as open relays and open proxies. An open relay passes along messages sent to it from any location to any recipient, so a spammer can simply leave the work of delivering all messages to that relay; an open proxy instead creates connections from any client to any server without authentication, so a spammer can simply connect to a mail server and send spam through it. Both open proxies and open relays were designed when spamming wasn’t a problem yet, but as spam from these insecure resources grew, DNSBL operators started listing their IP addresses in order to block spam coming from them.

Also for this reason, since 2003 spammers, rather than searching the global network for exploitable services, have been creating services of their own by commissioning computer viruses designed to deploy proxies and other spam-sending tools on thousands of end-user computers. Virus-infected computers serve spammers not only as spamming tools, sending spam messages and thus acting as proxies, but also as a means of perpetrating distributed denial-of-service attacks. To fight spam, many anti-spam techniques have been implemented, with varying degrees of success, but as the years pass spammers keep finding new methods to defeat them.

Chapter 3

Anti-spam techniques

To prevent e-mail spam, various anti-spam techniques are used by both end users and e-mail system administrators. Depending on who applies them, they can be divided into four categories: end-user techniques, if action by individual users is required; automated techniques for e-mail administrators, if they can be automated and implemented directly on proxies or MTAs; automated techniques for e-mail senders, if they are implemented on end users’ computers, possibly embedded in products or software; and techniques for researchers and law enforcement officials. None of these techniques represents a complete and definitive solution to the problem of spam, as each involves a trade-off between failing to block all spam and rejecting legitimate messages, together with the associated costs in time and effort.

3.1 End-user techniques

These techniques can be applied by individual users in order to reduce their attractiveness to spammers and restrict the availability of their e-mail addresses on the net. There are many small precautions anyone can take; some of them are simply rules users should remember and observe when they send e-mails or receive spam messages. For example, it is very important never to reply to spam e-mail, first of all because many spammers take a reply as proof that the address is valid. Moreover, as sender addresses in spam e-mails are often forged or invalid, a reply would be useless and might even reach innocent users. It is equally important not to trust links contained in spam messages: though they promise removal from the spammer’s mailing list, they just lead to more spam. Another measure users can adopt is so-called address munging, which consists of altering one’s e-mail address so that a human can still recognize it as a valid address but machines cannot, so that address harvesters fail to collect it. Posting anonymously or using disposable e-mail addresses are also good techniques for avoiding spam. Finally, disabling the display of HTML, URLs and images in e-mails can prevent offensive images from being shown and spyware from being installed on the user’s machine.
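Address munging can be illustrated with a minimal sketch. The " [at] " and " [dot] " replacements and the function names are assumptions chosen for illustration; any alteration that a human can undo but a simple harvester cannot would serve the same purpose.

```python
def munge_address(address: str) -> str:
    """Rewrite 'user@example.com' as 'user [at] example [dot] com'.

    The bracketed markers are an illustrative convention, not a standard.
    """
    local, _, domain = address.partition("@")
    return f"{local} [at] {domain.replace('.', ' [dot] ')}"


def unmunge_address(munged: str) -> str:
    """Undo munge_address, as a human reader would do mentally."""
    return munged.replace(" [at] ", "@").replace(" [dot] ", ".")
```

A harvester that only looks for the pattern name@domain will not match the munged form, although a harvester written with this particular convention in mind could of course reverse it.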

3.2 Automated techniques for e-mail administrators

E-mail administrators can use many software systems and services in order to reduce the load of spam on their systems and mailboxes. The two best-known approaches are blocking and filtering. The former consists of rejecting messages from internet sites likely to send spam; the latter relies on automatically analysing the content of e-mails and blocking those which look like spam. Many of these filtering systems use machine learning techniques, which improve their accuracy over manual methods, but filtering techniques are often considered intrusive to privacy, so blocking is preferred by many e-mail administrators.

Some systems do not try to detect whether a message is spam, but simply accept messages only from trusted sites; this technique is known as authentication and reputation, and it uses the DNS just as DNSBLs do, but to list authorized sites rather than spammers’ sites. Another method requires unknown senders to pass various tests, or challenges, before their messages are delivered. Some e-mail servers may decide to reject all messages coming from countries they never expect to communicate with; they use country-based filtering, in which the country of origin of an e-mail is determined from the sender’s IP address. DNSBLs, or DNS-based Blackhole Lists, are used very often. These lists, published via the DNS, list sites known to emit spam, open mail relays or proxies, or ISPs known to support spam, so that mail servers can easily reject mail from those sources. Other DNS-based anti-spam systems may instead use whitelisting and mark IPs, domains or URLs as good (white). Some mail administrators also reduce spam by setting restrictions on the MTA, for example enforcing the technical requirements of SMTP and blocking mail coming from systems that are not compliant with the RFC standards. For example, simple HELO/EHLO checking can reduce spam significantly.
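As an illustration of HELO/EHLO checking, the sketch below tests only whether the argument is syntactically plausible. The function name, the `own_names` parameter and the exact set of rules are assumptions made for this example; real MTAs offer their own, more complete restrictions, and IPv6 address literals are deliberately not handled here.

```python
import ipaddress
import re

# One DNS label: letters, digits and hyphens, not starting or ending with "-".
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def helo_is_plausible(helo: str, own_names: frozenset = frozenset()) -> bool:
    """Return True if the HELO/EHLO argument looks like a legitimate one."""
    name = helo.strip()
    if not name:
        return False
    # A bracketed IPv4 address literal such as [192.0.2.1] is acceptable.
    if name.startswith("[") and name.endswith("]"):
        try:
            ipaddress.IPv4Address(name[1:-1])
            return True
        except ValueError:
            return False
    # A bare IP address without brackets is not a valid argument.
    try:
        ipaddress.ip_address(name)
        return False
    except ValueError:
        pass
    # A remote client claiming to be this very server is suspicious.
    if name.lower() in own_names:
        return False
    # Require a fully qualified name: at least two well-formed labels.
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(_LABEL.match(label) for label in labels)
```

A server using such a check would typically reject or score the session when the function returns False; how strictly to act on the result is a local policy decision.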

The PTR records in the reverse DNS can be used in different ways. For example, most e-mail MTAs perform FCrDNS (forward-confirmed reverse DNS) verification, in which the name returned by the PTR record of the connecting IP address is resolved forward again and must map back to the same address; if there is a valid domain name, they put it into the Received: trace header field. Some MTAs perform FCrDNS verification on the domain name given in the SMTP HELO and EHLO commands, but in this case e-mail is not rejected by default. PTR records may also be used to check whether the domain names in the rDNS are likely to belong to dial-up users, dynamically assigned addresses, or home-based broadband customers. Finally, a forward-confirmed reverse DNS verification provides a weak form of authentication that there is a valid relationship between the owner of a domain name and the owner of the network to which an IP address has been assigned. Although this authentication is weak, it can be strong enough to be used for whitelisting purposes, because spammers and phishers usually cannot bypass this verification when they use zombie computers to forge the domains.
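The FCrDNS check can be sketched in a few lines. The function names are hypothetical, and the two lookup parameters are an assumption introduced so that the logic can be exercised without network access; by default they fall back to the standard `socket` resolver calls.

```python
import socket


def _ptr_lookup(ip: str) -> list:
    """Names from the reverse (PTR) lookup of ip, or [] on failure."""
    try:
        name, aliases, _ = socket.gethostbyaddr(ip)
        return [name, *aliases]
    except OSError:
        return []


def _a_lookup(name: str) -> list:
    """IPv4 addresses from the forward lookup of name, or [] on failure."""
    try:
        return socket.gethostbyname_ex(name)[2]
    except OSError:
        return []


def fcrdns_names(ip: str, ptr_lookup=_ptr_lookup, a_lookup=_a_lookup) -> list:
    """Return the PTR names of ip whose forward lookup contains ip again.

    An empty result means the address fails forward-confirmed reverse DNS.
    """
    return [name for name in ptr_lookup(ip) if ip in a_lookup(name)]
```

An MTA could record the first confirmed name in its Received: header, or treat an empty result as one signal among others rather than as grounds for outright rejection.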

3.2.1 Filtering techniques

Filtering techniques can rely on many different characteristics of e-mail messages. Content filtering techniques rely on the specification of lists of words or regular expressions disallowed in mail messages, so that the mail server rejects any message containing them. Header filtering instead inspects the header of the e-mail, which contains information about the message. These fields are often spoofed by spammers in order to hide their identities or to make the e-mail look more legitimate than it is, but many of these spoofing techniques can be detected.

Spammers always try to disguise their messages in order to sidestep filtering. For example, they spell words that are frequently used in spam messages, and therefore included in filtering lists, in different ways to make it harder for the administrator to recognize them, or they introduce HTML comments, invisible to the user, in the middle of those words. These techniques are nevertheless quite easy to detect, as is the technique of sending spam consisting entirely of images so that the anti-spam software can’t analyse the words. Content filtering can also be implemented to analyse the URLs present in an e-mail message, that is, the sites being spamvertised.
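A content filter can undo the simplest of these disguises before matching. The sketch below is only an illustration: the character substitution table, the separator rule and the example pattern are assumptions, and the normalization is deliberately crude (it would also alter legitimate digits and dotted names).

```python
import re

HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Illustrative table of look-alike characters; real filters use larger ones.
LOOKALIKES = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                            "5": "s", "@": "a", "$": "s"})


def normalize(text: str) -> str:
    """Strip HTML comments and undo simple character-level obfuscation."""
    text = HTML_COMMENT.sub("", text)
    text = text.lower().translate(LOOKALIKES)
    # Remove separators inserted between word characters, e.g. "p.i.l.l.s".
    return re.sub(r"(?<=\w)[.\-_*](?=\w)", "", text)


def matches_blocklist(text: str, patterns: list) -> list:
    """Return the disallowed patterns found in the normalized text."""
    cleaned = normalize(text)
    return [p for p in patterns if re.search(p, cleaned)]
```

For instance, with the pattern `r"\bcheap pills\b"`, the text `Ch<!-- zz -->3ap p.1.l.l.s` is normalized to `cheap pills` and therefore matched.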

Statistical content filtering is a kind of document classification system which uses naive Bayes classifiers to predict whether a message is spam or not, based on collections of spam and non-spam (ham) e-mails submitted by users. This system requires no maintenance as such, but users must mark messages as spam or ham so that the filtering software can learn from these judgements. A statistical filter can thus respond quickly to a change in spam content, without administrative intervention. Spammers try to fight this technique by inserting many random but valid noise words or sentences into their messages while attempting to hide them from view, making it more likely that the filter will classify the messages as neutral. However, these noise countermeasures are largely ineffective.
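The idea of a statistical filter can be made concrete with a very small naive Bayes sketch. The class name, the tokenizer and the use of add-one smoothing are choices made for this example, not a description of any particular product.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Toy naive Bayes spam classifier trained on user-labelled messages."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text: str) -> list:
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text: str, label: str) -> None:
        """Record one message marked by a user as 'spam' or 'ham'."""
        self.counts[label].update(self.tokens(text))
        self.messages[label] += 1

    def spam_probability(self, text: str) -> float:
        """Estimate P(spam | text) with add-one (Laplace) smoothing."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            n_words = sum(self.counts[label].values())
            score = math.log((self.messages[label] + 1) / (total + 2))
            for tok in self.tokens(text):
                score += math.log((self.counts[label][tok] + 1)
                                  / (n_words + len(vocab) + 1))
            log_score[label] = score
        # Clamp the difference so math.exp cannot overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

After training on a few labelled messages, a message made of words seen mostly in spam scores above 0.5 and one made of words seen mostly in ham scores below it. This also suggests why noise words help spammers little: words the filter has rarely seen shift the score only slightly compared with strongly spam-associated words.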

3.3 Automated techniques for e-mail senders

E-mail administrators are not the only ones who can control the amount of spam delivered. E-mail senders can also use different techniques to make sure they don’t send spam, so that they are not blocked and put on DNSBLs.

A recent method known as CAPTCHA is often used by ISPs and web e-mail providers on new accounts to verify that the user is legitimate and not a spammer trying to create new accounts with automated tools. E-mail providers should also verify that credit cards used for subscription are not stolen and check the Spamhaus Project ROKSO list before accepting new customers. One thing spammers always try to exploit is the difficulty of implementing opt-in mailing lists properly. To avoid this, it’s very important that mailing lists use confirmed opt-in instead, so that an address is never added to a mailing list until the owner of that address confirms the opt-in. This point is very important because it lies at the basis of anti-spam techniques and blacklists such as those implemented by Spamhaus. Firewalls and routers can be useful in combating spam too; they can, for example, be programmed to stop SMTP traffic (through port 25) from machines that are not supposed to send e-mail. As home users may also be blocked by an ISP doing this, e-mail can still be sent from those computers through port 587. All port 25 traffic can also be intercepted by a NAT (Network Address Translator) and redirected to a mail server for checks such as rate limiting.
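Confirmed opt-in can be sketched as a small state machine: an address stays pending until its owner returns a token that was sent to that address. The class and method names are hypothetical, and sending the token by e-mail is left out of this sketch.

```python
import secrets


class ConfirmedOptInList:
    """Mailing list that adds an address only after its owner confirms."""

    def __init__(self):
        self.pending = {}        # confirmation token -> address
        self.subscribers = set()

    def request_subscription(self, address: str) -> str:
        """Register a request; the returned token would be mailed to address."""
        token = secrets.token_urlsafe(16)
        self.pending[token] = address
        return token

    def confirm(self, token: str) -> bool:
        """Complete the subscription if the token is known and unused."""
        address = self.pending.pop(token, None)
        if address is None:
            return False
        self.subscribers.add(address)
        return True
```

Because only someone who can read mail sent to the address learns the token, a third party cannot subscribe a victim’s address on their behalf.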

Contributions from e-mail users to the fight against spam are always welcome. SpamCop, for example, gathers spam reports from users and, by monitoring these reports, ISPs can learn of problems before their mail servers are blacklisted.

3.4 Ongoing research

Many other new approaches have been proposed to improve the ability of e-mail systems to fight spam. Some of these techniques are based on a sort of certification attached to the e-mail message, such as a so-called ham password, a proof that the message is ham (not spam), or some kind of electronic stamp, which would imply a system of electronic micropayments with electronic money. Others are based on real money; these are the so-called cost-based systems, which rely on the fact that one of the reasons why spam has grown so much is that sending e-mail is completely free, so if senders had to pay some cost for each message, sending spam would probably become too expensive.

Another technique that has been proposed is the proof-of-work system, which implies a payment not in terms of money but in terms of computational load. The sender has to perform a calculation that takes some time, and the receiver later verifies this calculation in much less time; the computational load for a spammer who wants to send millions of spam messages would then be too high, while a legitimate user who wants to send an e-mail will just have to wait a few seconds more.

Microsoft Corp. chairman Bill Gates has also been active in fighting spam; he proposed similar methods and a new one that involves money, but not in all cases: the recipient of the e-mail message is free to decide whether a message is spam or not. If it is, the sender (that is, the spammer) would be charged a fixed sum, while someone sending a wanted and legitimate e-mail wouldn’t be charged anything by the recipient. Bill Gates was so confident in this method that he announced in 2004 that spam would be over within 2 years, but as we can all see we’re still quite far from a solution.

3.5 Techniques for researchers and law enforcement

Increasingly, anti-spam efforts have required co-ordination between law enforcement, researchers, major consumer financial services companies and Internet service providers, who need evidence of e-mail spam, identity theft and phishing in order to track and monitor the risks and activities. Honeypots are often used for this purpose. As we’ll see later in detail, a spam honeypot is simply an imitation MTA that looks like an open relay or proxy and thus attracts spammers. Such a system collects a large amount of spam e-mail and then submits addresses to DNSBLs, stores the messages for further analysis, or simply discards them.

Chapter 4

Honeypots

A honeypot is a trap set to detect, deflect or in some manner counteract attempts at unauthorized use of information systems. It is always disguised as something containing valuable information or resources in order to attract attackers, in our case spammers. Honeypots are assigned unused IP addresses and have no production value, so all the traffic they see is malicious or unauthorized. We can therefore be sure that all the traffic passing through a honeypot designed to thwart spam is illicit. Honeypot addresses are usually hidden so that no ordinary user can find them, but they can still be collected by address harvesting techniques and thus end up on spammers’ mailing lists.

Honeypots can be classified depending on two factors. Based on the deployment, we can recognize:

• production honeypots, which are easy to use, are mainly used to improve the security of an organization, and capture only limited information about attacks and attackers;

• research honeypots, usually run by non-profit organizations to capture extensive information about attacks and attackers and learn how to better protect against them.

The second classification is based on the level of involvement of the honeypot. We can distinguish the following categories:

• low-interaction honeypots, such as honeyd, a GPL-licensed daemon that works by emulating computers on the unused IP addresses of a network and provides simple functionality;

• mwcollect and nepenthes, used to collect autonomously spreading malware and obtain the malware binaries without being infected (as everything is done in a virtualized environment);

• honeytraps, which create port listeners based on TCP connection attempts in order to monitor traffic and handle some unknown attacks;

• high-interaction honeypots, such as honeynets, which are networks of real systems containing several honeypots.

Having seen these classifications and types of honeypots, let’s concentrate on what we’re most interested in: spam honeypots. These honeypots have been created to masquerade as abusable resources such as open mail relays and open proxies, which are very attractive to attackers, in order to discover the activities of spammers. Honeypots provide very important functions. Not only do they block spam, but they also make it possible to determine the source of the attack and to capture spam in bulk; the captured spam is then analysed and helps determine the URLs and response mechanisms used by spammers. For example, an open relay honeypot can easily deceive spammers by determining the e-mail addresses (dropboxes) that spammers use as targets for their test messages and actually relaying any illicit e-mail addressed to those dropboxes, which suggests to the spammer that the honeypot is a genuine abusable open relay. Since the introduction of honeypots as anti-spam tools, spammers have started using chains of abused systems to send spam, in order to make detection of the actual source more difficult. One merit of honeypots is therefore certainly that they have made the abuse less easy and less safe for spammers.

Many non-profit organizations started using honeypots and spamtraps not only to block the large amount of spam passing through or directly addressed to their honeypots, but also to analyse spam messages and their senders. In doing so they were able to create large Block Lists (DNSBLs), published for free, that any ISP or mail server can query to control the traffic over its network. These organizations include The Spamhaus Project (www.spamhaus.org), SORBS (www.au.sorbs.net) and SpamCop.net (www.spamcop.net).

Chapter 5

The Spamhaus Project

The Spamhaus Project is a volunteer effort founded by Steve Linford in 1998 that aims to track e-mailspammers and spam-related activity. Spamhaus is responsible for three widely used DNS Blockliststhat many internet service providers use to reduce the amount of spam they take on.

In generating these three Blocklists Spamhaus follows a strict policy, for which a precise definition of spam is needed. As we said before, e-mail messages are considered spam if they are both bulk and unsolicited (UBE); spam is not an issue of content (it doesn’t matter what is written in the message) but of consent. For this reason it’s very important to understand the meaning of Opt-in, Opt-out and Confirmed Opt-in. To Opt-in means to have one’s e-mail address added to a mailing list. Spammers exploit the fact that once an address is opted in, the recipient rarely opts out in a formal way to delete the address from that mailing list, so the spammer goes on sending spam to that address. From the legal point of view that is still unsolicited e-mail and therefore spam. For e-mail to count as solicited, the recipient must have verifiably confirmed permission for the address to be included on the specific mailing list, by confirming (responding to) the list subscription request verification.

5.1 Spamhaus DNSBLs

Spamhaus DNSBLs are a free public service offered to mail server operators on the internet. ISPs and other large sites doing large numbers of queries can also sign up for an rsync-based feed of these DNSBLs, which Spamhaus calls its Data Feed, as long as they are not in Spamhaus’s list of the top ten worst spam service ISPs; they must also pass a background check to make sure they do not knowingly or intentionally provide services to spammers. The three main DNSBLs of the Spamhaus Project are the Spamhaus Block List (SBL), the Exploits Block List (XBL) and the Policy Block List (PBL).

5.1.1 DNSBL filtering

A DNSBL is a database that is queried in real time by internet mail servers for the purpose of obtaining an opinion on the origin of incoming e-mail. The role of a DNSBL is to provide an opinion, to anyone who asks, on whether a particular IP address meets the list’s own policy (here, Spamhaus’ policy) for acceptance of inbound e-mail. Every internet network that chooses to implement spam filtering is, by doing so, making a policy decision governing acceptance and handling of inbound e-mail. The receiver unilaterally decides whether to use DNSBLs, which DNSBLs to use, and what to do with an incoming e-mail if the message’s originating IP address is “listed” on the DNSBL. The DNSBL itself, like all spam filters, can only answer whether a condition has been met or not.
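A DNSBL query is an ordinary DNS lookup: by convention the octets of the IPv4 address are reversed and prepended to the zone name, and an answer means “listed”. The sketch below assumes this convention; the function names and the `resolve` parameter (which allows the logic to be tried without network access) are hypothetical.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up, e.g. 99.2.0.192.<zone> for 192.0.2.99."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch handles IPv4 addresses only")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def dnsbl_lookup(ip: str, zone: str, resolve=socket.gethostbyname_ex) -> list:
    """Return the return codes for ip in zone, or [] if it is not listed."""
    try:
        return resolve(dnsbl_query_name(ip, zone))[2]
    except OSError:
        # NXDOMAIN (or any resolver failure) is treated as "not listed".
        return []
```

What the mail server does with a non-empty answer (reject, tag, or merely score the message) remains the receiver’s own policy decision, as described above.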

5.1.2 Spamhaus Block List - SBL

The Spamhaus Block List targets verified spam sources such as spammers, spam gangs and spam support services. It is a database of IP addresses which do not meet Spamhaus’ policy for acceptance of inbound e-mail. SBL listings are made on the basis of the definition of spam as UBE, so there is no check on the content or legality of the message, only a check of whether it meets that definition of spam. The listing criteria for the SBL are the following: sources of unsolicited bulk e-mail sent to Spamhaus spamtraps or submitted to Spamhaus by trusted third-party intelligence are listed in the SBL; spam services, including mail, web, DNS and other servers identified as being an integral part of a spam operation or as being under the direct control of spammers, are listed in the SBL; the SBL also lists known spam operations and gangs listed in the ROKSO list (described later), and services supporting these known spam operations.

IP addresses are removed immediately from the SBL database upon receipt by the SBL Team of notification from the IP owner (the Internet Service Provider responsible for assigning or routing the IP address) that the reason for listing has been corrected or terminated. If this doesn’t happen, SBL records are automatically removed when they time out. The time-out can differ for each entry of the SBL, depending on the spam source (in any case it is the editor of the entry who decides it). For unidentified spammers it can be 2 to 14 days, persistent spammers may have time-outs of 6 months, while known spam gangs can be listed for 1 year or more.

5.1.3 Exploits Block List - XBL

The Exploits Block List is a real-time database of IP addresses of hijacked PCs infected by illegal third-party exploits, including open proxies (HTTP, socks, AnalogX, wingate, etc.), worms/viruses with built-in spam engines, and other types of trojan-horse exploits. The XBL includes listings gathered by Spamhaus as well as by other contributing DNSBL operations, the Composite Blocking List (CBL) and the Not Just Another Bogus List (NJABL), two highly trusted DNSBL sources, with tweaks by Spamhaus to maximise the data efficiency and lower false positives. The XBL can be used by setting the mail server’s anti-spam DNSBL feature to query xbl.spamhaus.org; this query returns a code denoting the source of the data in the XBL zone. For example, a return code of 127.0.0.4 means the data source is the CBL, a return code of 127.0.0.5 means the data source is the NJABL, and so on.

5.1.4 Policy Block List - PBL

The Spamhaus PBL is a DNSBL database of end-user IP address ranges which should not be delivering unauthenticated SMTP e-mail to any internet mail server except those provided specifically by an ISP for that customer’s use, such as dynamic and DHCP-type IP address space designated as not allowed to make direct SMTP connections, or static assignments that shouldn’t be sending e-mail without prior arrangement. Examples of the latter are an ISP’s core routers, corporate users required by policy to send via their internal mail server, and unassigned IP addresses. Much of the data is provided to Spamhaus by the owners (ISPs) of the IP address space. PBL IP address ranges are added and maintained by each network participating in the PBL project, and by the Spamhaus PBL team, particularly for those networks that do not participate in the project themselves and where spam received from those IP ranges is consistent with address space containing high concentrations of botnet zombies, a major source of spam.

The PBL can be queried directly as pbl.spamhaus.org. In this case too the response is a return code, which is either 127.0.0.10 if the IP was entered by a participating ISP or 127.0.0.11 if it was entered by Spamhaus. A DNS lookup of a (reversed) address which is not listed in the PBL returns NXDOMAIN.

5.1.5 Combined DNSBLs

Spamhaus also provides two combined DNSBLs. One is the SBL+XBL, which allows users to querysbl-xbl.spamhaus.org once and get return codes from both lists. A newer combination is called ZEN,which allows users to query zen.spamhaus.org once and get return codes from the SBL+XBL and thenewer PBL.

ZEN is the combination of all Spamhaus DNSBLs into one single blocklist to make querying faster and simpler. ZEN can be queried at zen.spamhaus.org and, like the other Spamhaus DNSBLs, it returns a code. This code will be:

• 127.0.0.2, if the data source is the SBL, which contains direct UBE sources, spam services and ROKSO spammers;

• 127.0.0.4-8, if the data source is the XBL, which contains illegal third-party exploits (proxies, worms, trojans);

• 127.0.0.10-11, if the data source is the PBL, which contains non-MTA IP address ranges set by outbound mail policy.
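The mapping from ZEN return codes to lists given above can be written down directly. The function name is hypothetical, and the code ranges are only those quoted in this section; codes outside them are reported as unknown rather than guessed.

```python
import ipaddress


def zen_source(return_code: str) -> str:
    """Map a ZEN return code to the Spamhaus list it comes from."""
    octets = ipaddress.IPv4Address(return_code).packed
    if octets[:3] != bytes([127, 0, 0]):
        return "unknown"
    last = octets[3]
    if last == 2:
        return "SBL"       # direct UBE sources, spam services, ROKSO
    if 4 <= last <= 8:
        return "XBL"       # illegal third-party exploits
    if last in (10, 11):
        return "PBL"       # non-MTA ranges set by outbound mail policy
    return "unknown"
```

A mail server could use the result to apply different policies, for example rejecting mail from SBL and XBL sources outright while only refusing unauthenticated direct delivery from PBL ranges.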

5.2 ROKSO

The Spamhaus Register of Known Spam Operations (ROKSO) is a database of “hard-core spam gangs” - spammers and spam operations who have been terminated by three or more ISPs due to spamming. The ROKSO list is not a DNSBL; it is, rather, a directory of publicly sourced information about these persons and their business and at times criminal activities.

To be placed on the ROKSO list a spammer must first be terminated by a minimum of 3 ISPs for AUP violations. Once a spammer is listed in ROKSO, IP addresses under the control of that spammer are automatically and preemptively listed in the Spamhaus Block List. For qualified Law Enforcement Agencies Spamhaus provides a special version of the ROKSO database which gives access to records with evidence, logs and information on illegal activities of many of these gangs that are too sensitive to publish openly.

Each spam operation, or “spam gang”, consists on average of between 1 and 5 spammers. The majority of the spammers on the ROKSO list operate illegally and move from network to network and country to country, seeking out Internet Service Providers with poor security or known for not enforcing anti-spam policies. Many of these spam operations pretend to operate “offshore”. Those who don’t hide behind anonymity pretend to be small ISPs themselves, claiming to their providers that the spam is being sent not by them but by non-existent customers. When caught, almost all use the age-old tactic of lying to each ISP long enough to buy a few more days or weeks of spamming and, when terminated, simply move on to the next ISP already set up and waiting.

5.3 DROP

The Spamhaus Don’t Route Or Peer (DROP) List is an advisory “drop all traffic” list, consisting of stolen zombie netblocks and netblocks controlled entirely by professional spammers. DROP is a tiny subset of the SBL designed for use by firewalls and routing equipment. DROP is simply a text list of these IP address spaces, with the numbers of the underlying SBL listings as comments. When implemented at a network’s or ISP’s core routers, DROP can protect all the network’s users from spamming, scanning, harvesting and DDoS attacks originating from rogue netblocks.
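Since DROP is a plain text list of netblocks with comments, a firewall script can parse it in a few lines. The sketch below assumes a line format of a CIDR netblock followed by “;” and the SBL reference, with lines starting with “;” being pure comments; this format, the sample data and the function names are assumptions for illustration only.

```python
import ipaddress


def parse_drop(text: str) -> list:
    """Parse DROP-style text into (network, comment) pairs."""
    entries = []
    for line in text.splitlines():
        cidr, _, comment = line.partition(";")
        cidr = cidr.strip()
        if not cidr:
            continue  # blank line or comment-only line
        entries.append((ipaddress.ip_network(cidr), comment.strip()))
    return entries


def drop_match(ip: str, entries: list):
    """Return the comment of the first netblock containing ip, else None."""
    addr = ipaddress.ip_address(ip)
    for network, reference in entries:
        if addr in network:
            return reference
    return None
```

A router or firewall would normally load the parsed netblocks into its own filtering rules rather than test addresses one by one; the linear scan here is only meant to show the logic.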

Chapter 6

SORBS

SORBS stands for Spam and Open Relay Blocking System. It is an open proxy and open mail relay DNSBL, later extended with complementary lists that include various other classes of hosts. The SORBS DNSBL was created in 2002, first as a private list, and was then launched to the public in 2003. In the beginning it was conceived as an anti-spam project based on a daemon checking “on the fly” whether the e-mail it received had passed through proxies and open relay servers. The DNSBL created in this way listed thousands of compromised hosts and proxy servers. More recently SORBS has expanded its lists to include hacked and hijacked servers, formmail scripts and trojan infestations, and it now also pre-emptively lists all dynamically allocated IP address space.

SORBS provides many different zones identified as *.sorbs.net. Some examples are dnsbl.sorbs.net (which includes all the other DNS zones except spam.dnsbl.sorbs.net), rhsbl.sorbs.net (containing all RHS zones), and all their sub-zones. SORBS also provides other aggregated zones such as safe.dnsbl.sorbs.net, problems.dnsbl.sorbs.net, relays.dnsbl.sorbs.net and proxies.dnsbl.sorbs.net. These zones are the ones that mail servers query and to which requests for new entries are addressed. In addition to providing the SORBS zones, SORBS also makes the ASPEWS and SPEWS data available by DNSBL lookup, but since the policy of SORBS is to publish only data that is fully under SORBS control, the ASPEWS and SPEWS zones are not included in the SORBS aggregate zone.

6.1 DUHL

SORBS adds IP ranges that belong to dial-up modem pools, dynamically allocated wireless and DSL connections, as well as DHCP LAN ranges, by using reverse DNS PTR records, WHOIS records, and sometimes submissions from the ISPs themselves. These IPs form the so-called DUHL (Dynamic User and Host List). It is similar to other DUL lists, but while those list dial-up ranges only, the DUHL also lists IP spaces where addresses are assigned dynamically: the increasing use of cable modem and DSL connections has made dial-up quite rare, so simple DUL lists are no longer as effective.

SORBS DUHL originally started life as a straight import of the Dynablock list maintained by Easynet NL. SORBS accepts requests to add or remove entries from the ISPs responsible for a given IP address space, in addition to listing dynamically allocated addresses that SORBS comes across itself, typically after receiving spam from them and examining their reverse DNS names. When using rDNS, SORBS relies on the IETF draft “draft-msullivan-dnsop-generic-naming-schemes-00.txt”, which gives recommendations for naming static and dynamic assignments, to determine whether a network allocates static or dynamic addresses; this works only if the network follows the recommended naming schemes. In this draft Matthew Sullivan of SORBS proposed that generic reverse DNS names include purpose tokens such as “static” or “dynamic”. The draft has since expired, and it is generally considered more appropriate for ISPs to simply block outgoing traffic to port 25 if they wish to prevent users from sending e-mail directly, rather than signalling this in the reverse DNS record for the IP. Another important point is that SORBS expects hosts to have long TTLs, as short TTL values (especially under 1 hour) usually indicate that the record is about to change. Removal requests, for example, require the Time To Live of the PTR record to be 43200 seconds (12 hours) or more.
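The token-based reading of reverse DNS names and the TTL requirement can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure SORBS actually runs: only the tokens “static” and “dynamic” and the 43200-second figure come from the text, while the function names and the rule of splitting host names on dots and hyphens are hypothetical.

```python
import re

MIN_PTR_TTL = 43200  # seconds; the removal threshold quoted in the text

# Only these two tokens are named in the text; a real classifier
# would presumably recognise more (assumption).
DYNAMIC_TOKENS = {"dynamic"}
STATIC_TOKENS = {"static"}

def classify_ptr(hostname):
    """Guess the assignment type of a PTR name from its purpose tokens."""
    # Split on dots and hyphens so "host-1.dynamic.example.net" yields
    # the separate parts "host", "1", "dynamic", "example", "net".
    parts = re.split(r"[.\-]", hostname.lower())
    if any(part in DYNAMIC_TOKENS for part in parts):
        return "dynamic"
    if any(part in STATIC_TOKENS for part in parts):
        return "static"
    return "unknown"

def meets_removal_ttl(ttl_seconds):
    """True if a PTR record's TTL is long enough for a removal request."""
    return ttl_seconds >= MIN_PTR_TTL
```

For instance, `classify_ptr("host-12.dynamic.example.net")` yields "dynamic", and a PTR record with a one-hour TTL fails `meets_removal_ttl`.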


6.2 Submissions and queries

Submissions to SORBS can be made for three different lists:

• The Dynamic User/Host List (DUHL). This is an IP-based list, and therefore forms part of dnsbl.sorbs.net; it is available separately as dul.dnsbl.sorbs.net. SORBS accepts submissions to the DUHL only from registered logins whose registered e-mail address matches the WHOIS record for the domain.

• The Bad DNS Config List. This is a domain-based list (sometimes known as a Right Hand Side Block List, RHSBL), and forms part of rhsbl.sorbs.net. It is available separately as baddns.rhsbl.sorbs.net. This list is explicitly for domains with bad DNS configurations that can cause real problems with some mail servers. There are two reasons why hosts and domains could be listed here: the first is that at least one MX record points to 127.0.0.1/32, 0.0.0.0/8 or 255.255.255.0/8. The second is that at least one MX record points to 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 or to any address in the range 224.0.0.0 - 254.255.255.255, and the domain does not have an MX record in normal address space.

• The No e-mail from this domain list. Like the previous one, this is a domain-based list that is part of rhsbl.sorbs.net. It lists hosts and domains that will never be used for sending legitimate e-mail. For example, SuperNews admins have indicated that no mail will ever be sent from the domains *.supernews.net.
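The two listing reasons given for the Bad DNS Config List can be written as a small check over the addresses that a domain's MX records resolve to. The Python sketch below is an interpretation of the rules as stated above, not SORBS code. The ranges are copied from the text; 255.255.255.0/8 is read non-strictly as 255.0.0.0/8, which is an assumption, and the function names are hypothetical.

```python
import ipaddress

# Ranges copied from the text. strict=False is needed because
# 255.255.255.0/8 has host bits set; it is read here as 255.0.0.0/8,
# which is an interpretation rather than something the source states.
ALWAYS_BAD = [ipaddress.ip_network(n, strict=False)
              for n in ("127.0.0.1/32", "0.0.0.0/8", "255.255.255.0/8")]
UNROUTABLE = [ipaddress.ip_network(n)
              for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
RESERVED_LOW = ipaddress.ip_address("224.0.0.0")
RESERVED_HIGH = ipaddress.ip_address("254.255.255.255")

def _in_any(addr, networks):
    return any(addr in net for net in networks)

def is_unroutable(addr):
    """True for the private ranges and the 224.0.0.0 - 254.255.255.255 span."""
    return _in_any(addr, UNROUTABLE) or RESERVED_LOW <= addr <= RESERVED_HIGH

def has_bad_dns_config(mx_addresses):
    """Apply both listing reasons to the IPv4 addresses behind a domain's MX records."""
    addrs = [ipaddress.ip_address(a) for a in mx_addresses]
    # Reason 1: any MX address in the always-bad ranges.
    if any(_in_any(a, ALWAYS_BAD) for a in addrs):
        return True
    # Reason 2: an unroutable MX address and no MX in normal address space.
    has_unroutable = any(is_unroutable(a) for a in addrs)
    has_normal = any(not is_unroutable(a) for a in addrs)
    return has_unroutable and not has_normal
```

Under this reading, a domain whose only MX resolves to 10.0.0.5 would be listed, while one that also has an MX at a public address would not.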

SORBS can be queried by providing the address we want to check. The query produces a return code that indicates which database the result was obtained from. If the query is made on an aggregate zone, the return code still identifies the specific zone from which the result was obtained. All return codes are of the form 127.0.0.x. For example, 127.0.0.2 refers to http.dnsbl.sorbs.net and 127.0.0.8 refers to block.dnsbl.sorbs.net. If an IP address appears in more than one database, all applicable codes are returned, so a single query reveals every database containing that IP address.
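A query of this kind can be sketched in Python as follows. The sketch assumes the usual DNSBL convention of reversing the octets of the IPv4 address and appending the zone name, which the text above does not spell out. The code table contains only the two codes mentioned in the text and is therefore incomplete, and the function names are hypothetical.

```python
import socket

# Only the two return codes mentioned in the text; the full table is larger.
RETURN_CODES = {
    "127.0.0.2": "http.dnsbl.sorbs.net",
    "127.0.0.8": "block.dnsbl.sorbs.net",
}

def dnsbl_query_name(ip, zone="dnsbl.sorbs.net"):
    """Build the DNS name to look up: reversed IPv4 octets followed by the zone."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address")
    return ".".join(reversed(octets)) + "." + zone

def decode_codes(codes):
    """Map each returned 127.0.0.x code to the zone it stands for, if known."""
    return [RETURN_CODES.get(code, "unknown (" + code + ")") for code in codes]

def lookup(ip, zone="dnsbl.sorbs.net"):
    """Query the zone over DNS and return all codes; an empty list means not listed."""
    try:
        _, _, addresses = socket.gethostbyname_ex(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # Name does not resolve (or the resolver failed): treated as not listed.
        return []
    return addresses
```

Checking 192.0.2.1 therefore means resolving 1.2.0.192.dnsbl.sorbs.net; every address in the answer is one return code, which `decode_codes` turns into zone names.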

6.3 SORBS certificates

SORBS also has its own CA (the SORBS Certificate Authority), a self-signed authority which issues and signs certificates for e-mail clients, browsers and web servers. Such a certificate can be freely downloaded from the SORBS website (www.au.sorbs.net) and can be used to sign one’s own e-mail messages.


Chapter 7

SpamCop

SpamCop is a free spam reporting service which allows recipients of unsolicited bulk e-mail (UBE) and unsolicited commercial e-mail (UCE) to report offenses to the senders’ ISPs, and sometimes to their web hosts. SpamCop uses these reports to compile a DNSBL of computers sending spam, called the “SpamCop Blocking List” (SCBL), while websites referenced in the spam are used to create the Spam URI Realtime Blocklists (SURBL) RHSBL. SpamCop has tools for ISPs to manage the reports sent to them, to see details on individual spam messages, and to mark incidents as resolved.

7.1 SpamCop Blocking List

The SpamCop Blocking List (SCBL) is a list of IP addresses which have transmitted reported email to SpamCop users; it is in turn used to block and filter unwanted email. The SCBL is a fast, automatic list of sites sending reported mail, fed by a number of report sources, including automated reports and SpamCop user submissions. Being time-based, the SCBL also quickly and automatically delists these sites when reports stop.

The SCBL aims to block spam with minimal blocking or misidentification of wanted email. Nevertheless, wanted e-mail may also be blocked, and given the power of the SCBL this may happen often; for this reason the SCBL should always be used together with whitelists of wanted e-mail senders.

The SCBL lists IP addresses reported both by SpamCop users and by spamtraps. The system sending spam to which a listed address refers could be either a direct e-mail source, such as a site’s primary mail server, or an indirect source, such as an open relay or open proxy that has been abused to send spam. The number of reports referencing an IP is weighed by the SCBL against the total amount of e-mail sent by that IP. This has a drawback: IPs sending a lot of spam may never be listed if they also send a large amount of non-spam e-mail. SpamCop also monitors traffic through sites using the SCBL, since the list is queried at every SMTP transaction; the total number of queries for each IP address is counted, and the presence of that IP on the SCBL is checked, in order to estimate how much e-mail is transmitted by each IP. When a sampled site queries the SCBL about an IP address sending mail that is not reported, that host is given a reputation point, which is later taken into account in the listing decision.

Some blocking lists block mail from misconfigured or insecure servers such as open proxies or open relays, or from certain classes of machines such as machines with dynamically assigned IP addresses (see SORBS DUHL). The SCBL does not consider these characteristics. Instead, the SCBL lists only IP addresses of machines that are sending reported email.

7.1.1 SCBL rules

Timeliness is key to the SCBL’s value. The automated queries result in fast listing of spam sources, which increases the accuracy of the SCBL. Also, without any additional reports, a reported address stays on the SCBL for only 24 hours. This limits the damage if users make a mistake and report legitimate mail using SpamCop.


The listing system operates based on the following rules, taking into account the reputation points and the number of reports.

• The SCBL lists IP addresses with a large number of reports relative to reputation points. The threshold is set manually by the SpamCop team in order to make the list as accurate as possible.

• Reports are weighted in terms of freshness, that is, according to how recently the e-mail was received:

– most recently received reports are counted 4 : 1;

– reports for e-mail 48 hours and older are counted 1 : 1, with a linear sliding scale between the most recent and 48 hours past;

– reports for e-mail more than one week old are ignored.

• Total reports are weighted with respect to spamtrap report scores in the following way: for spamtrap scores less than 6, the number of spamtrap reports is multiplied by 5; for spamtrap scores of 6 or more, this number is squared. These scores are then added to the total of reports. For example:

– an IP address with 2 spamtrap reports and 3 SpamCop user reports will have a weighted score of (2 ∗ 5) + 3 = 13;

– a host with 7 spamtrap reports and 3 manual reports will have a weighted score of (7 ∗ 7) + 3 = 52.

• The SCBL does not count reports regarding URLs or addresses in the body of the email. Therefore, the SCBL does not list websites or email addresses used to receive replies in reported email, unless the corresponding IP is also used to send the mail.

• The SCBL will not list an IP address with only one report filed.

• With only two reports against an IP address, the SCBL will list the IP address for a maximum of 12 hours after the most recent reported mail was sent.

• The SCBL will not list an IP address if there are no reports against it within 24 hours.

• If a server sends bounces to an SCBL spamtrap in sufficient quantity to meet the listing criteria, the SCBL will list that server. This situation arises because some mail servers do not reject mail during the SMTP transaction, but rather accept the mail and then send a bounce message later. Viruses and spam often contain a forged From: field. If the e-mail is rejected or blocked during the SMTP transaction, the bounce goes to the connecting IP; if the bounce comes after the mail is accepted for delivery, it goes to the address in the From: field. Viruses and spam often use addresses from the list of recipients to populate the From: field, and sometimes these addresses are spamtraps.
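The freshness and spamtrap weighting rules above can be made concrete with a short Python sketch. It is an illustration of the rules as described in this section, not SpamCop's implementation. Two assumptions are made: the sliding scale runs linearly from a weight of 4 for a just-received report down to 1 at 48 hours, and the switch from multiplying by 5 to squaring is placed at 6 spamtrap reports, chosen so that the worked example with 7 spamtrap reports is squared. The function names are hypothetical.

```python
ONE_WEEK_HOURS = 7 * 24

def freshness_weight(age_hours):
    """Weight of a single report given the age of the reported e-mail in hours."""
    if age_hours < 0:
        raise ValueError("age cannot be negative")
    if age_hours > ONE_WEEK_HOURS:
        return 0.0  # reports for e-mail more than one week old are ignored
    if age_hours >= 48:
        return 1.0  # 48 hours and older: counted 1 : 1
    # Linear sliding scale from 4 (just received) down to 1 (48 hours old).
    return 4.0 - 3.0 * age_hours / 48.0

def weighted_score(spamtrap_reports, user_reports):
    """Combine spamtrap and user reports as in the two worked examples."""
    if spamtrap_reports < 6:
        trap_score = spamtrap_reports * 5
    else:
        trap_score = spamtrap_reports ** 2  # assumed to apply from 6 upwards
    return trap_score + user_reports
```

With these definitions `weighted_score(2, 3)` gives 13 and `weighted_score(7, 3)` gives 52, matching the examples in the list. How the freshness weights and the reputation points enter the final listing decision is not specified in the text, so they are left out of the sketch.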

7.2 Limitations

For first-time SpamCop Reporters, the SpamCop Parsing and Reporting Service requires the reporter to manually verify that each submission is spam and that the destinations of the spam reports are correct. People who use tools to automatically report spam, who report e-mail that is not spam, or who report to the wrong people may be fined or banned. This verification requires extra time and effort. Despite these steps, reports to innocent bystanders do happen, and ISPs may need to configure SpamCop not to send further reports if they don’t want to see them again. SpamCop Reporters with a proven track record are allowed to file Quick Reports, reducing both time and effort.

It is not clear whether reporting spam through SpamCop’s reporting service actually reduces the amount of spam received, and complaints on SpamCop’s online forum provide anecdotal evidence to support some skepticism about its effectiveness. Spammers who determine the identity of the complainants can, by doing so, also verify that their email addresses are still in use. What is clear is that much spam email is filtered or blocked by the SCBL, which is fed by the many SpamCop Reporters reporting their spam.

That said, SpamCop is effective at helping ISPs, web hosts and email providers identify accounts that are being abused and shut them down before the spammer finishes operations. Finally, SpamCop provides information from its reports to third parties who are also working to fight spam, amplifying the impact of its services beyond its own reach.

It is also remarkable in its own right that SpamCop has survived for so many years, considering the severity of the opposition other anti-spam companies have faced in the past. SpamCop has so far dealt with attacks by spammers by hiring services from Akamai, but it is still the target of many hackers and could face serious difficulties if it continues to grow in size and effectiveness. Significant offensive weapons can be wielded by the criminal syndicates behind spammers. SpamCop views itself as an attempt to stop spam without the need for governmental intervention, but because it lacks the power of a government or a large ISP, it may have greater difficulty dealing with spammers’ expertise as well as with the large “bot” networks that they control and could use to perform a massive DDoS attack.


Chapter 8

Conclusions

We have seen many different anti-spam techniques, in particular some based on honeypots and spamtraps, and how these techniques are used to create useful blocking lists and databases. The introduction of these methods, as already said, has the great merit of having made the abuse of exploitable network resources harder and riskier for spammers. Besides this, associations like The Spamhaus Project, which maintain not only lists of simple IP addresses but also databases with detailed descriptions and evidence of spammers’ attacks and techniques, can be really helpful if combined with effective legislation and law enforcement by the State.

Furthermore, some interesting aspects emerge from the listing policies of these DNSBLs. Some are built solely from feeds from honeypots or trusted third parties, while the SpamCop SCBL, for example, also accepts feeds from its registered users, which helps align filtering and listing with what users actually consider spam. On the other hand, this method is not always effective, or at least we have no assurance that it is: not all spam reported by users will be blocked, since the listing criteria are more complicated than a simple report count.

Another relevant point about Spamhaus, SORBS, SpamCop and all the other honeypot-based anti-spam organizations is that there will always be a trade-off between letting some spam through and blocking legitimate mail; some of these organizations are often considered too aggressive. For this reason it is very important to have balanced listing criteria, and it is advisable to use whitelists in order to prevent messages from wanted senders from being blocked. The last point to be considered is the price in time paid for queries to the DNSBLs and databases, but how much this matters is for each mail server administrator to judge.

In conclusion, we can say that despite not being the ultimate anti-spam tool that will defeat the problem of spam forever, honeypots have had a good impact in fighting spam, and the three organizations analyzed have for years been a source of trouble for spammers. As I said, they still require more co-ordination with law enforcement, which is what they were created for, and less tolerance on the part of the State, so that spammers are not merely blocked by a few servers but brought before a court.


Bibliography

[1] www.wikipedia.org

[2] www.cbsnews.com

[3] www.spamhaus.org

[4] www.au.sorbs.net

[5] www.spamcop.net

[6] Matthew Sullivan, Spam and Open Relay Blocking System, IETF Internet Draft

