The goal of the 'Introduction to Machine Learning' module's coursework was to assess our ability to solve a machine learning project. There were multiple project options we could pick from and I picked the Spam/Ham SMS detector. In this project, I was given a single dataset. The SMS Spam Collection is a public set of SMS labeled messages that have been collected for mobile phone spam research. This corpus has been collected from free or free for research sources at the Internet. Each instance consists of features and a variable to be predicted.a public set of SMS labeled messages that have been collected for mobile phone spam research. This corpus has been collected from free or free for research sources at the Internet. Each instance consists of features and a variable to be predicted.
My task was to obtain a predictor with as small test error as you can. However, while the performance of my final predictor will be taken into account, the main component of the assessment is the process I used to obtain my solution. As such, I was required to perform the following steps: