Computer Sciences Dept.

On the Effectiveness of Pre-Acceptance Spam Filtering

Tatsuya Mori, Holly Esquivel, Aditya Akella, Z. Morley Mao, Yinglian Xie, Fang Yu
2009

Modern SMTP servers apply a variety of mechanisms to stem the volume of spam delivered to users. These techniques can be broadly classified into two categories: pre-acceptance approaches, which apply before a message is accepted (e.g., blacklisting and whitelisting), and post-acceptance techniques, which apply after a message has been accepted (e.g., content-based signatures). In recent years, pre-acceptance techniques have attracted a lot of attention. In addition to cutting down spam, effective and accurate pre-acceptance filtering is crucial to reducing the load on SMTP servers.

In this paper, we empirically study the limits of effectiveness of pre-acceptance approaches. In our study, we first classify SMTP senders into three main categories: end hosts, legitimate servers, and spam gangs. We argue that both the effectiveness of pre-acceptance approaches and the role they play differ significantly across spam sent by the hosts in these categories.

We find that end hosts make up over 88% of all senders and contribute nearly 54% of all spam. Spam gangs make up less than 1.2% of all senders, but contribute more than 30% of all spam. Both of these sets of spammers can be filtered using address blacklists. However, we find that the blacklists corresponding to spam gangs may have to be updated as frequently as once every few days in order to be effective. We find that legitimate servers make up less than 1% of all e-mail senders and contribute less than 0.4% of all spam. Furthermore, these servers send an overwhelming fraction of all ham. Thus, simple whitelisting can be employed to permit all e-mail from them. Whitelists of legitimate servers can be constructed relatively easily and updated infrequently.
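
To make the division of labor between the two list types concrete, the following is a minimal sketch of a pre-acceptance decision, not the implementation evaluated in the report. The class name, method names, return values, and the use of an expiry time on blacklist entries are all assumptions introduced for illustration; the expiry is only one way to model blacklists that need refreshing every few days, while whitelist entries are kept indefinitely.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class PreAcceptanceFilter:
    """Hypothetical sketch: whitelist first, then a time-limited blacklist."""

    # Networks of legitimate servers; assumed to change rarely.
    whitelist: set = field(default_factory=set)
    # Maps a blacklisted network to the time (in seconds) its entry expires.
    blacklist: dict = field(default_factory=dict)

    def allow(self, network: str) -> None:
        self.whitelist.add(ipaddress.ip_network(network))

    def block(self, network: str, now: float, ttl_seconds: float) -> None:
        self.blacklist[ipaddress.ip_network(network)] = now + ttl_seconds

    def decide(self, sender_ip: str, now: float) -> str:
        addr = ipaddress.ip_address(sender_ip)
        # Whitelisted senders are accepted without further checks.
        if any(addr in net for net in self.whitelist):
            return "accept"
        # Blacklist entries only count while they have not expired.
        for net, expiry in self.blacklist.items():
            if addr in net and now < expiry:
                return "reject"
        # Unknown senders are left to post-acceptance (content) filtering.
        return "defer"


DAY = 86400.0
spam_filter = PreAcceptanceFilter()
spam_filter.allow("192.0.2.0/24")  # documentation-range address, illustrative only
spam_filter.block("198.51.100.7/32", now=0.0, ttl_seconds=3 * DAY)

print(spam_filter.decide("192.0.2.10", now=100.0))
print(spam_filter.decide("198.51.100.7", now=100.0))
print(spam_filter.decide("198.51.100.7", now=4 * DAY))
```

In this sketch the same blacklisted address is rejected while its entry is fresh and falls through to later filtering once the entry has expired, which is why a stale spam-gang blacklist loses effectiveness unless it is refreshed.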

On the whole, we find that it is possible to build effective pre-acceptance filters that can eliminate nearly 90% of all spam today.
