Every day, we are swamped with junk emails. Too many junk emails! Junk emails slow down our productivity and are plainly annoying. For this project, you need to design a system that recognizes junk emails and removes them from a user's mailbox. In doing so, it is important for the program to remove as many junk emails as possible while removing as few useful emails as possible (ideally zero). Therefore, the core of the system is an intelligent text classification system.
A closer study of junk emails reveals that the senders are getting more sophisticated. For example, senders are increasingly making junk emails appear very similar to legitimate emails. They also constantly change their sender addresses by using dynamic email accounts. To counter these problems, a junk mail classification system must be "smart" enough to keep up with these tactics.
The intent of this project is to use a string database with string matching to filter out junk emails. The database could include subject-line strings, junk user IDs, content strings, etc. It may also be necessary to learn more about how junk emails are generated. For ambiguous emails, a simple junk/not-junk decision may not be enough, so degrees of "junkness" may be needed.
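As a starting point, the string-database idea can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a prescribed design: the names `JunkDatabase`, `junkness`, and `classify`, the example strings, the weights, and the two thresholds are all hypothetical and would need to be tuned against real mail.

```python
from dataclasses import dataclass, field


@dataclass
class JunkDatabase:
    """Hypothetical string database: each string maps to an assumed weight."""
    subject_strings: dict[str, float] = field(default_factory=dict)
    junk_senders: set[str] = field(default_factory=set)
    content_strings: dict[str, float] = field(default_factory=dict)


def junkness(db: JunkDatabase, sender: str, subject: str, body: str) -> float:
    """Return a degree of "junkness" between 0.0 and 1.0."""
    # A known junk sender is treated as certain junk (an assumption).
    if sender.lower() in db.junk_senders:
        return 1.0
    subject_l, body_l = subject.lower(), body.lower()
    score = 0.0
    # Case-insensitive substring matching against the subject line.
    for text, weight in db.subject_strings.items():
        if text in subject_l:
            score += weight
    # Case-insensitive substring matching against the message body.
    for text, weight in db.content_strings.items():
        if text in body_l:
            score += weight
    return min(score, 1.0)


def classify(score: float, junk_at: float = 0.75, ambiguous_at: float = 0.25) -> str:
    """Map a junkness score to a decision; both thresholds are assumed values."""
    if score >= junk_at:
        return "junk"
    if score >= ambiguous_at:
        return "ambiguous"
    return "keep"


# Example database with made-up entries and weights.
example_db = JunkDatabase(
    subject_strings={"free money": 0.5, "act now": 0.25},
    junk_senders={"promo@junk.example"},
    content_strings={"click here": 0.25, "unsubscribe": 0.125},
)

if __name__ == "__main__":
    score = junkness(example_db, "someone@mail.example",
                     "FREE MONEY inside", "Click here to claim your prize")
    print(score, classify(score))
```

Keeping an "ambiguous" band between the two thresholds is one way to honor the goal of removing as few useful emails as possible: only high-scoring messages are removed outright, while borderline ones can be flagged for the user instead of deleted.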