Spam Detector: Is your SMS spam or ham?

The goal of the coursework for the 'Introduction to Machine Learning' module was to assess our ability to carry out a machine learning project. There were several project options to choose from, and I picked the Spam/Ham SMS detector. In this project, I was given a single dataset: the SMS Spam Collection, a public set of labelled SMS messages collected for mobile phone spam research. The corpus was gathered from free or free-for-research sources on the Internet. Each instance consists of features and a variable to be predicted.
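To make "features and a variable to be predicted" concrete, the sketch below turns raw messages into bag-of-words count vectors and a binary label (1 = spam, 0 = ham). It assumes the tab-separated `label<TAB>message` layout of the UCI release of the corpus; the two example lines and all helper names are hypothetical, and bag-of-words is only one of many possible feature choices, not necessarily the one used in the coursework.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Assumed label encoding: spam is the positive class.
LABELS = {"ham": 0, "spam": 1}


def parse_line(line: str) -> Tuple[int, str]:
    """Split one 'label<TAB>message' line into (y, message)."""
    label, message = line.rstrip("\n").split("\t", 1)
    return LABELS[label], message


def tokenize(message: str) -> List[str]:
    """Lower-case the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", message.lower())


def build_vocabulary(messages: List[str]) -> Dict[str, int]:
    """Map each distinct token to a column index, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for message in messages:
        for token in tokenize(message):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(message: str, vocab: Dict[str, int]) -> List[int]:
    """Count vector over the vocabulary; tokens outside it are ignored."""
    x = [0] * len(vocab)
    for token, count in Counter(tokenize(message)).items():
        if token in vocab:
            x[vocab[token]] = count
    return x


# Hypothetical example lines in the assumed file format.
raw = ["spam\tWIN a FREE prize now", "ham\tSee you at lunch"]
y, messages = zip(*(parse_line(line) for line in raw))
vocab = build_vocabulary(list(messages))
X = [bag_of_words(m, vocab) for m in messages]
```

In practice the vocabulary would be built from the training split only, so that words seen only in the test messages do not leak into the features.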
My task was to obtain a predictor with as small a test error as possible. However, while the performance of my final predictor was taken into account, the main component of the assessment was the process I used to obtain my solution. As such, I was required to perform the following steps:

  • Present the problem of my choice in a formalised way and choose a loss function that reflects the potential use of the predictor
  • Propose and implement baseline predictors/classifiers and methods to train them, e.g., include a linear method from the course. Present my findings
  • Propose more advanced algorithms to solve the problem
  • Implement the methods proposed above and give insights into the training and evaluation process
  • Assess their performance and present my proposed solution to the problem
  • Discuss my overall findings and conclusions
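The first two steps (a loss that reflects how the predictor will be used, and a linear baseline) can be illustrated with a small sketch. It assumes, hypothetically, that wrongly flagging a genuine (ham) message as spam costs five times as much as letting one spam message through; these costs, the toy data and every name below are illustrative choices, not the ones used in the coursework. Minimising the expected cost gives the rule "predict spam when P(spam | x) > c_FP / (c_FP + c_FN)", here 5/6, and a logistic regression trained by stochastic gradient descent on the log-loss serves as one possible linear baseline.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical costs: flagging a genuine (ham) message as spam is assumed
# to be five times worse than letting one spam message through.
COST_FALSE_POSITIVE = 5.0  # true ham, predicted spam
COST_FALSE_NEGATIVE = 1.0  # true spam, predicted ham


def cost_sensitive_loss(y_true: int, y_pred: int) -> float:
    """Loss for one message, with labels 1 = spam and 0 = ham."""
    if y_true == 0 and y_pred == 1:
        return COST_FALSE_POSITIVE
    if y_true == 1 and y_pred == 0:
        return COST_FALSE_NEGATIVE
    return 0.0


def empirical_risk(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average cost-sensitive loss over a labelled set."""
    losses = [cost_sensitive_loss(t, p) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


def decision_threshold() -> float:
    """Predicting spam has lower expected cost when P(spam | x) exceeds this."""
    return COST_FALSE_POSITIVE / (COST_FALSE_POSITIVE + COST_FALSE_NEGATIVE)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 200
) -> Tuple[List[float], float]:
    """Linear baseline: minimise the log-loss with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            grad = p - yi  # derivative of the log-loss w.r.t. the linear score
            w = [wj - lr * grad * xj for wj, xj in zip(w, xi)]
            b -= lr * grad
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Label a message as spam (1) only if its spam probability clears the threshold."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p > decision_threshold() else 0


# Tiny hypothetical feature vectors, e.g. counts of two words.
X_toy = [[2.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
y_toy = [1, 1, 0, 0]
w, b = train_logistic_regression(X_toy, y_toy)
preds = [predict(w, b, x) for x in X_toy]
print(preds, empirical_risk(y_toy, preds))
```

Raising the threshold above 0.5 trades some missed spam for fewer false alarms; with equal costs it reduces to the usual threshold of 0.5.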
The coursework was assessed on a technical report describing my methodology, approach and results. The code to build and test the models was written in Python.