For my final year dissertation, I designed and implemented a recommendation system for anime TV shows and movies. The software is aimed at people who are new to anime, offering them an easy way to get started with this form of entertainment.
Below is a component-level system architecture diagram for the project. The system was built using the Model-View-Controller (MVC) architecture.
Models contain the back-end data logic. Using Entity Framework, I mapped these models to tables in my database and used them to execute CRUD (create, read, update, delete) commands.
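Views represent the front-end web pages that the user interacts with. These pages rely on dynamic data from the models, which is processed by the controllers. Controllers sit between the two: they handle user requests, retrieve and process data through the models, and pass the results to the appropriate view.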
I designed and implemented a hybrid model that combines content-based and collaborative filtering, two common techniques for building recommender systems.
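However, each has drawbacks when used in isolation. Content-based filtering tends to recommend items very similar to those a user already knows, while collaborative filtering struggles with new users and titles that have few ratings (the cold-start problem). My model attempts to mitigate these pitfalls by using both methods together to generate more accurate recommendations.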
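One common way to build such a hybrid is a weighted blend of the two similarity scores. The sketch below assumes each technique has already produced one similarity score per candidate anime; the blending approach and the `alpha` weight are illustrative assumptions, not necessarily how the project combines the two methods.

```python
import numpy as np


def hybrid_scores(content_sim, collab_sim, alpha=0.5):
    """Blend two per-candidate similarity vectors into a single score.

    alpha is a hypothetical weight: 1.0 is purely content-based,
    0.0 is purely collaborative.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    content_sim = np.asarray(content_sim, dtype=float)
    collab_sim = np.asarray(collab_sim, dtype=float)
    return alpha * content_sim + (1.0 - alpha) * collab_sim


def recommend(content_sim, collab_sim, titles, k=3, alpha=0.5):
    """Return the k titles with the highest blended score, best first."""
    scores = hybrid_scores(content_sim, collab_sim, alpha)
    top = np.argsort(scores)[::-1][:k]
    return [titles[i] for i in top]


titles = ["A", "B", "C", "D"]
content = [0.9, 0.2, 0.6, 0.1]  # similarity from item features (e.g. genres)
collab = [0.2, 0.8, 0.7, 0.3]   # similarity from other users' ratings
print(recommend(content, collab, titles, alpha=0.5))  # ['C', 'A', 'B']
```

A higher `alpha` leans on content-based similarity, which could suit a newcomer with few ratings; a lower `alpha` leans on collaborative signals once more rating data is available.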
The pre-processing stage involved restructuring the original dataset to suit the needs of the project. I used Pandas DataFrames to reshape the data into a form more useful for generating recommendations, which included creating extra features for the recommender engine and removing data that was not useful.
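As a minimal sketch of this kind of Pandas pre-processing, the function below assumes the data is split into an anime table (with `anime_id`, `name` and a comma-separated `genre` column) and a ratings table (with `user_id`, `anime_id` and `rating`, where `-1` marks a title watched but not rated). These column names and the `-1` convention are assumptions for illustration, not details taken from the project's dataset.

```python
import pandas as pd


def preprocess(anime: pd.DataFrame, ratings: pd.DataFrame):
    """Sketch of a pre-processing step under ASSUMED column names.

    anime:   one row per title with 'anime_id', 'name', 'genre' (e.g. "Action, Comedy")
    ratings: one row per user/title pair with 'user_id', 'anime_id', 'rating'
    """
    # Keep only the columns the recommender needs; drop titles with no genre info.
    anime = anime[["anime_id", "name", "genre"]].dropna(subset=["genre"])

    # Extra features: one binary column per genre ("Action", "Comedy", ...).
    genre_features = anime["genre"].str.get_dummies(sep=", ")
    content = pd.concat([anime[["anime_id", "name"]], genre_features], axis=1)
    content = content.set_index("anime_id")

    # Omit non-scores (assumed to be encoded as -1) and ratings for dropped titles.
    keep = (ratings["rating"] > 0) & ratings["anime_id"].isin(anime["anime_id"])
    ratings = ratings[keep]

    # Restructure into a user x anime matrix; unrated cells become 0.
    user_item = ratings.pivot_table(
        index="user_id", columns="anime_id", values="rating", fill_value=0
    )
    return content, user_item
```

The genre columns give the content-based side a numeric vector per title, while the user x anime matrix gives the collaborative side a rating vector per user.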
The final stage was to generate anime recommendations from similarity calculations on the vectorised data. I chose the K-Nearest Neighbours (KNN) algorithm to find groups of similar users and anime, and experimented with two distance metrics: cosine distance and Euclidean distance.
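For reference, the two distance metrics are defined below for vectors u and v of length n (for example, two users' rating rows). The worked example shows why the choice matters: v has the same pattern as u at half the scale, so cosine distance treats them as identical while Euclidean distance does not.

```latex
d_{\cos}(\mathbf{u}, \mathbf{v}) = 1 - \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
\qquad
d_{\mathrm{euc}}(\mathbf{u}, \mathbf{v}) = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2}

% Worked example: u = (2, 4), v = (1, 2)
\mathbf{u} \cdot \mathbf{v} = 2 \cdot 1 + 4 \cdot 2 = 10,
\qquad
\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert = \sqrt{20}\,\sqrt{5} = 10
\;\Rightarrow\; d_{\cos}(\mathbf{u}, \mathbf{v}) = 1 - \tfrac{10}{10} = 0

d_{\mathrm{euc}}(\mathbf{u}, \mathbf{v}) = \sqrt{(2-1)^2 + (4-2)^2} = \sqrt{5} \approx 2.24
```

In rating data, this means cosine distance can match a user who rates generously with one who rates harshly if their relative preferences agree, whereas Euclidean distance would keep them apart.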
I then carried out a comprehensive evaluation to determine the best distance metric and model design for generating recommendations.
Below is a code snippet from my project that shows the model being fitted to the data and using K-Nearest Neighbours to find the five most similar data points for each item.
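The project's own snippet is not reproduced here. As a minimal, dependency-light sketch of the same idea, the function below performs a brute-force nearest-neighbour search with NumPy, supporting both cosine and Euclidean distance and defaulting to five neighbours. The function names are hypothetical; in practice a library class such as scikit-learn's `NearestNeighbors` (which accepts `metric="cosine"` or `metric="euclidean"`) would typically be used instead.

```python
import numpy as np


def pairwise_distances(X, metric):
    """Distance between every pair of rows in X."""
    X = np.asarray(X, dtype=float)
    if metric == "euclidean":
        diff = X[:, None, :] - X[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "cosine":
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        # Avoid dividing by zero for all-zero rows (they end up at distance 1).
        unit = X / np.where(norms == 0, 1.0, norms)
        return 1.0 - unit @ unit.T
    raise ValueError(f"unsupported metric: {metric}")


def k_nearest(X, query_index, k=5, metric="cosine"):
    """Indices of the k rows closest to row query_index, excluding the row itself."""
    d = pairwise_distances(X, metric)[query_index]
    order = np.argsort(d, kind="stable")
    return [int(i) for i in order if i != query_index][:k]


# Each row could be an anime's feature vector or a user's rating vector.
X = [[1, 0], [2, 0], [10, 0], [0, 1], [0, 3]]
print(k_nearest(X, 0, k=2, metric="euclidean"))  # [1, 3]
print(k_nearest(X, 0, k=2, metric="cosine"))     # [1, 2]
```

The two metrics disagree on the second neighbour: Euclidean distance prefers the nearby point `[0, 1]`, while cosine distance prefers `[10, 0]` because it points in the same direction as the query.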
*Data source can be found here
To evaluate the model, I devised a custom formula that analyses each anime recommendation and assigns points depending on how relevant it is to the user. I used it to compare the results of the two distance metrics, cosine distance and Euclidean distance. After reviewing the results, I revised the evaluation formula to address newly discovered limitations. Throughout this process, I learnt how important it is to choose appropriate evaluation techniques: without them, it is impossible to properly gauge the system's performance and make the right improvements to the model.
The diagram below illustrates the logic behind the final evaluation formula that was used.
The code below implements the main rules of the evaluation model, showing how points are allocated to each user's anime recommendations.
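The project's actual scoring rules are not reproduced here. As a purely hypothetical illustration of how a points-based evaluation like this could be structured, the sketch below awards 2 points when a recommendation is a title the user is known to like (for example, from a held-out set), 1 point when it shares a genre with something they like, and 0 otherwise, then averages the totals per user so two distance metrics can be compared. The point values and function names are assumptions, not the project's formula.

```python
from statistics import mean


def user_points(recommended, liked, genres_of):
    """HYPOTHETICAL rules: 2 points for a liked title, 1 for a shared genre, else 0."""
    liked_genres = set().union(*(genres_of.get(a, set()) for a in liked))
    points = 0
    for anime_id in recommended:
        if anime_id in liked:
            points += 2
        elif genres_of.get(anime_id, set()) & liked_genres:
            points += 1
    return points


def average_points(recs_by_user, liked_by_user, genres_of):
    """Mean points per user for one set of recommendations (e.g. one metric)."""
    return mean(
        user_points(recs, liked_by_user.get(user, set()), genres_of)
        for user, recs in recs_by_user.items()
    )


genres_of = {1: {"Action"}, 2: {"Action", "Comedy"}, 3: {"Romance"}, 4: {"Comedy"}}
liked_by_user = {"u1": {1}}
cosine_recs = {"u1": [2, 3, 1]}
euclidean_recs = {"u1": [3, 4]}
print(average_points(cosine_recs, liked_by_user, genres_of))     # 3
print(average_points(euclidean_recs, liked_by_user, genres_of))  # 0
```

Keeping the scoring rules in one small function makes it straightforward to revise them, as happened in the project, and rerun the comparison between metrics.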