Elon Computing Sciences


Presentation at Elon Student Undergraduate Research Forum, Spring 2005

Eric M. Hydrick (Dr. Lynn Heinrichs) Department of Computing Sciences

Unrequested and unwanted e-mail, often referred to as spam, is a growing problem in today’s society. People who send spam work hard to evade the filters designed to keep such messages from reaching users. One form of e-mail filtering that has received significant attention is Bayesian filtering. Bayesian filtering is unique in that it is highly customizable and is the only type of filter that actively seeks out indicators that a message is legitimate or wanted e-mail, referred to as ham (Conrad-Murray, 2003). Bayesian filtering also has the potential to “evolve” as spam changes, so it can keep adapting to the mail an individual user receives. However, when stored and configured on each user’s personal computer (PC), this kind of filtering uses a lot of computing power and complicates network administration (Conrad-Murray, 2003). An alternative to locating the filtering software on each PC is to install it on the e-mail server itself and filter all messages at one location. The drawback to this approach is that different users consider different messages spam and different messages legitimate. With only one set of filtering rules to work from, just how effective can a server-side filter be?
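
To make the idea concrete, the following is a minimal sketch of how a Bayesian filter might score a message. It assumes a simple naive Bayes model over whitespace-separated words with add-one smoothing. The class and method names (TinyBayesFilter, train, spam_probability) are hypothetical and chosen for illustration; this is not the algorithm of the commercial product tested in this project, which is not described here. Note that words seen mostly in ham pull the score down, which is the "seeking out indicators of legitimate mail" behavior described above.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a message into lowercase words (a deliberately naive tokenizer)."""
    return text.lower().split()


class TinyBayesFilter:
    """Illustrative naive Bayes spam scorer; names and design are assumptions."""

    def __init__(self):
        self.spam_counts = Counter()  # number of spam messages containing each word
        self.ham_counts = Counter()   # number of ham messages containing each word
        self.spam_msgs = 0
        self.ham_msgs = 0

    def train(self, text, is_spam):
        """Record one labeled message; each word counts once per message."""
        tokens = set(tokenize(text))
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_msgs += 1
        else:
            self.ham_counts.update(tokens)
            self.ham_msgs += 1

    def spam_probability(self, text):
        """Return the estimated probability that the message is spam."""
        # Start from the smoothed prior odds of spam versus ham.
        log_odds = math.log((self.spam_msgs + 1) / (self.ham_msgs + 1))
        for tok in set(tokenize(text)):
            # Add-one smoothing keeps unseen words from producing zero probabilities.
            p_tok_spam = (self.spam_counts[tok] + 1) / (self.spam_msgs + 2)
            p_tok_ham = (self.ham_counts[tok] + 1) / (self.ham_msgs + 2)
            # Spam-like words raise the odds; ham-like words lower them.
            log_odds += math.log(p_tok_spam / p_tok_ham)
        # Convert log-odds back to a probability.
        return 1 / (1 + math.exp(-log_odds))
```

In a per-user deployment, each user would train a separate instance on his or her own mail. A server-side deployment corresponds to one shared instance trained on everyone's mail, which is the trade-off raised above.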

This project examines the effectiveness of placing a Bayesian filter on an e-mail server. The question matters because a network administrator needs to know whether a single server-side filter can serve all users well before choosing it over per-PC filtering. A commercial product was tested by setting up two independent networks and forwarding e-mail messages from one network to the other, with the Bayesian filter running on the receiving server. Data were recorded on the number of undetected spam messages, the number of blocked ham messages, and the total number of messages of each type sent. I will show that the results of my experiment are inconclusive regarding the effectiveness of implementing a server-side Bayesian filter in a live network. I will discuss why additional testing is needed before the reliability of a server-side Bayesian filter can be stated with any certainty. I will also discuss future directions for research in this field.
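
The counts recorded above map naturally onto two standard error rates. The sketch below shows one way they could be computed. The function name filter_error_rates and its parameters are hypothetical, and the numbers in the usage lines are placeholders for illustration only, not results from this experiment.

```python
def filter_error_rates(undetected_spam, total_spam, blocked_ham, total_ham):
    """Compute the two error rates of a spam filter from raw message counts.

    false_negative_rate: fraction of spam that reached users undetected.
    false_positive_rate: fraction of ham that was wrongly blocked.
    """
    if total_spam <= 0 or total_ham <= 0:
        raise ValueError("totals must be positive to compute a rate")
    return {
        "false_negative_rate": undetected_spam / total_spam,
        "false_positive_rate": blocked_ham / total_ham,
    }


# Placeholder counts (made up for illustration, NOT data from this project).
rates = filter_error_rates(undetected_spam=5, total_spam=100,
                           blocked_ham=1, total_ham=50)
print(rates)
```

Reporting the two rates separately is a deliberate choice: a blocked legitimate message usually costs a user more than a spam message that slips through, so a single combined accuracy figure could hide the more serious kind of error.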