Hi everyone! My name is Trenton Goins, and I am very excited to be running Hacware’s Engineer Takeover this month! I am Hacware’s newest Software Engineer and am thrilled to be here. During my takeover, I will show what it’s like to be a new hire at Hacware, share some perks of working here, and give you a detailed walkthrough of one of my first projects.
I have been working at Hacware for a little over a month now. This is my first job out of college, so it can be a little daunting at times. But the people at Hacware have been nothing but supportive, and the office environment is always fun and relaxing. Especially when there is ping pong!
As far as work goes, I am still in the training phase of my employment. I have been dividing my time between contributing to the main team’s current project and continuing my own education and training. In the following sections, I will walk through one such training exercise and show how it was both educational and creative. But first, we need to introduce the topic of machine learning, one of Hacware’s specialties.
Machine learning, put simply, is a branch of artificial intelligence that gives a system the ability to improve at a task over time by learning from data, rather than by being explicitly reprogrammed. The scale and complexity of machine learning programs can vary wildly, from Google’s DeepMind to Pandora to Netflix’s recommendation system. These systems use sophisticated algorithms to solve a problem and to improve their own ability to solve it over time.
As Netflix learns more about your viewing habits, the program (ideally) gets better and better at recommending titles you would enjoy watching. It might incorporate your viewing history, the genres you watch most, actors, directors, etc. Pandora, on the other hand, has a “like” system in place: it might attempt to place a user in a group with similar taste in music based on what they like and dislike. From there, users get recommendations based on the group they fall into.
So there are many different approaches, and there is almost no end to the depth of this field. But we can start small and see what a very simple machine learning program looks like. This was one of my first tasks at Hacware: after some time for research, I set out to build a content-based recommendation system.
A content-based recommendation system recommends items similar to the ones a user has already shown interest in. It relies on implicit data, such as viewing history, the kinds of items the user bought or viewed, and even general information about the user, such as age, location, or gender. All of these give an idea of what kinds of items a user is interested in, and the system then recommends items like those. Many online shopping and media sites take this approach. If a user watched a Quentin Tarantino movie, they may like more movies by Quentin Tarantino. If they recently bought hedge trimmers, they may be interested in more gardening equipment. And so on.
Content-based filtering is one approach to handling recommendations. Another popular approach is collaborative filtering. Instead of focusing on similarity between items, a collaborative system focuses on grouping users with similar tastes and making recommendations based on what other users in your group enjoy. These systems often rely on explicit data, such as a “like” or review system. Since we are not interested in using explicit data, we won’t take a collaborative approach for this program.
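For contrast, here is a minimal sketch of one simple collaborative approach, user-based nearest neighbours. It assumes explicit “like” data stored as a set of liked games per user. The users, their likes, and the function names are hypothetical, and this is not part of the project described in this post. Each user is compared with the others by how much their likes overlap (Jaccard similarity). The user is then recommended what their closest neighbours like.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two users' liked-item sets, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def collaborative_recommend(target: str, likes: dict[str, set[str]], k: int = 2) -> list[str]:
    """Recommend items liked by the k users most similar to `target`."""
    mine = likes[target]
    # Rank every other user by how much their likes overlap with the target's.
    neighbours = sorted(
        (user for user in likes if user != target),
        key=lambda user: jaccard(mine, likes[user]),
        reverse=True,
    )[:k]
    # Count how many neighbours like each item the target hasn't liked yet.
    scores: dict[str, int] = {}
    for user in neighbours:
        for item in likes[user] - mine:
            scores[item] = scores.get(item, 0) + 1
    # Most widely liked first; ties broken alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical users and likes, for illustration only.
likes = {
    "alice": {"Gloomhaven", "Mage Knight"},
    "bob": {"Gloomhaven", "Mage Knight", "Coup"},
    "carol": {"Codenames", "Spyfall"},
}
print(collaborative_recommend("alice", likes, k=1))  # ['Coup']
```

Notice that no game metadata is involved at all. Alice gets Coup purely because Bob, whose likes overlap most with hers, liked it.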
It is also worth mentioning that many sites, such as Amazon and Netflix, take hybrid approaches, combining content and collaborative information to get a really good idea of what a user might be interested in. But that is a topic for another time.
For this project, I was even allowed to choose the topic and dataset, so I chose a topic I am interested in: board games. The website BoardGameGeek has public data on over 10,000 board games, with extensive metadata covering everything from the number of players, game mechanics, genre, and play time to the name of the game’s designer. From there, our objective was clear: create a recommendation system that recommends games based on a list of games a user already owns. If they play a lot of strategy games, ideally they would receive a lot of recommendations for strategy games. If they love 4-player card games about social deception, then they would receive dozens of recommendations along those lines. Maybe not all of them would be card games, or all 4-player, but most importantly, the games that have the most elements in common with what a user already owns would be recommended first.
So, we know what we want and we have our dataset, but how do we make it happen? We are going to use a method known as cosine similarity. Put simply, this is a formula that measures how similar two vectors are by looking at the angle between them. Remember, a vector is simply a geometric object with a length and a direction. So, if we can somehow turn every board game in our dataset into a vector, we can compare any two games with cosine similarity and determine just how similar they are. With that, we can start making recommendations.
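For reference, the cosine similarity of two vectors A and B is their dot product divided by the product of their lengths: cos θ = (A · B) / (‖A‖ ‖B‖). For count vectors like the ones built later in this post, where no entry is negative, the score runs from 0 (nothing in common) to 1 (pointing in the same direction). Here is a minimal pure-Python sketch. The function name is my own, and a library would normally do this for you.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 1, 0], [1, 1, 0]))  # same direction: about 1.0
print(cosine_similarity([1, 0, 0], [0, 1, 0]))  # no shared terms: 0.0
print(cosine_similarity([1, 1, 0], [1, 0, 1]))  # partial overlap: about 0.5
```

Only the angle matters, not the length. A game with a long description and one with a short description can still score highly if they share most of their terms.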
Really, you don’t need to know exactly how that formula works. For now, just trust that it will give us our similarity scores. But first, we need our vectors. How do we turn a database of board games, with all sorts of names, game mechanics, genres, designers, and more, into a series of vectors? There are many approaches, but one I found online is to build what I’ll call the metadata “soup”. Essentially, we take every relevant value in a board game’s row, lowercase it, strip out its spaces, and join the results, separated by spaces, into a single string that describes everything about the game. So, if a game in our database looks like this:
Here we have the game Gloomhaven.
As you can see, it is a complicated game with a lot of moving parts. Cards, tiles, units, tokens, you name it.
So our metadata “soup” will look like this:
Soup: “adventure exploration fantasy fighting miniatures isaacchildres action/movementprogramming” etc.
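Here is a minimal sketch of building the soup. It assumes each game is a dict with list-valued fields named `categories`, `designers`, and `mechanics`. These field names are hypothetical, not the actual BoardGameGeek column names. Each value is lowercased and has its spaces removed, so a multi-word value like “Isaac Childres” becomes a single token. The values are then joined with single spaces so a vectorizer can tell them apart.

```python
def clean(value: str) -> str:
    """Lowercase a value and remove all its spaces: 'Isaac Childres' -> 'isaacchildres'."""
    return value.lower().replace(" ", "")


def make_soup(game: dict, fields: tuple[str, ...] = ("categories", "designers", "mechanics")) -> str:
    """Join every cleaned metadata value of a game into one space-separated string."""
    tokens = [clean(value) for field in fields for value in game.get(field, [])]
    return " ".join(tokens)


# A shortened version of Gloomhaven's metadata, using the hypothetical field names.
gloomhaven = {
    "name": "Gloomhaven",
    "categories": ["Adventure", "Exploration", "Fantasy", "Fighting", "Miniatures"],
    "designers": ["Isaac Childres"],
    "mechanics": ["Action / Movement Programming", "Co-operative Play"],
}
print(make_soup(gloomhaven))
# adventure exploration fantasy fighting miniatures isaacchildres action/movementprogramming co-operativeplay
```

Removing the spaces inside each value keeps “Isaac Childres” from matching every other Isaac in the database, while the spaces between values keep each attribute a separate token.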
Once we have our enormous string of metadata, we are ready to turn it into a vector. Luckily, Python has plenty of handy libraries, so we don’t have to write the vectorization method ourselves. We create the metadata soup for every game in the dataset and then vectorize them all, so every game ends up with a vector derived entirely from its own metadata.
If you can’t quite follow what the information to the left means, don’t worry. The exact values of the numbers don’t matter to us. All that matters is that we can compare any given game to any other and see how similar the two are.
Now the only thing left is to choose two games and compute the cosine similarity between them.
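The vectorization step described above can be sketched as a simple bag of words:

1. Collect every distinct token across all the soups into a vocabulary.
2. Represent each game by how many times each vocabulary token appears in its soup.

The post doesn’t name the library it used. scikit-learn’s `CountVectorizer` is a common choice for exactly this job. The hand-rolled version below just shows the idea without dependencies. The function names are my own and the toy soups are invented.

```python
def build_vocabulary(soups: list[str]) -> list[str]:
    """Every distinct token across all soups, sorted so each gets a fixed slot."""
    return sorted({token for soup in soups for token in soup.split()})


def vectorize(soup: str, vocabulary: list[str]) -> list[int]:
    """Count how many times each vocabulary token appears in this soup."""
    tokens = soup.split()
    return [tokens.count(word) for word in vocabulary]


# Toy soups, shortened and invented for illustration.
soups = {
    "Gloomhaven": "adventure exploration fantasy fighting miniatures",
    "Game A": "adventure exploration fantasy fighting",
    "Game B": "cardgame deduction party wordgame",
}
vocabulary = build_vocabulary(list(soups.values()))
vectors = {name: vectorize(soup, vocabulary) for name, soup in soups.items()}
print(vocabulary)
print(vectors["Gloomhaven"])  # [1, 0, 0, 1, 1, 1, 1, 0, 0]
```

Any two of these vectors can then be fed to a cosine similarity function. Gloomhaven and Game A share four tokens and score highly. Game B shares none with Gloomhaven, so that pair scores 0.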
So, I created a separate file listing all the board games our “user” owns. Obviously, we don’t want to recommend games the user already owns; we want to recommend games similar to the ones they do own. To start, we will say they own only Gloomhaven.
The program loops through the user’s list of games and tries its best to find the games it considers most similar to the ones we provided. For reference, I printed out the user’s favorite mechanics, categories, and designers. Since the user has only one game in their list, this is simply a description of Gloomhaven.
Favorite Mechanics: [‘Action / Movement Programming’, ‘ Co-operative Play’, ‘Grid Movement’, ‘ Hand Management’, ‘ Modular Board’]
Favorite Categories: [‘Adventure’, ‘ Exploration’, ‘ Fantasy’, ‘ Fighting’, ‘Miniatures’]
Favorite Designers: [‘Isaac Childres’]
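The post doesn’t show how these favorites are computed. One plausible way, sketched here as an assumption, is to tally each field across every owned game with `collections.Counter` and keep the most common values. With a single owned game, that simply returns that game’s own values. The function name and the two example games are hypothetical. The `.strip()` call also removes stray leading spaces like the ones visible in the lists above.

```python
from collections import Counter


def favorites(owned_games: list[dict], field: str, top: int = 5) -> list[str]:
    """Most common values of `field` across the user's owned games."""
    counts = Counter(
        value.strip()  # also drops stray leading spaces like " Hand Management"
        for game in owned_games
        for value in game.get(field, [])
    )
    # most_common orders tied values by when they were first seen.
    return [value for value, _ in counts.most_common(top)]


# Shortened, illustrative mechanic lists for two hypothetical owned games.
owned = [
    {"name": "Game 1", "mechanics": ["Co-operative Play", " Hand Management", "Modular Board"]},
    {"name": "Game 2", "mechanics": ["Co-operative Play", "Hand Management", "Deck Building"]},
]
print(favorites(owned, "mechanics", top=3))
# ['Co-operative Play', 'Hand Management', 'Modular Board']
```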
It looks like it picked up those favorites correctly, which is no surprise with only one game in our user’s collection. And here is our first batch of recommendations. For this example, I had it print only the top 5, but it could print as many recommendations as needed. Here are those recommendations:
As you can see, we get a lot of co-operative games and a lot of adventure, exploration, and fantasy games. If you aren’t familiar with these games, this probably won’t mean much to you, but I have played almost all of them and can say with confidence that these are good recommendations for someone who enjoys Gloomhaven.
As you can see, Mage Knight is another one of those complicated games with a lot of moving parts: cards, tiles, units, tokens, etc. Needless to say, it’s an excellent recommendation for anyone who likes Gloomhaven.
You might also notice that I sort the recommendations by the “owned” field. This number tracks how many people own a copy of each game, so in general, the higher the number, the more popular the game. Since we are building a content-based recommendation system, we don’t want to base our recommendations on user ratings or reviews, but raw ownership numbers are a good metric. After all, we want to make good recommendations. Even if an extremely obscure game is a perfect match for one of our owned games, it might be a weaker recommendation than a slightly less similar game that far more people have bought.
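Putting the pieces together, here is a minimal sketch of the recommendation step. It assumes each game is a dict with a `name`, a precomputed `soup`, and an `owned` count. The steps are:

1. Score every game the user doesn’t own by its best cosine similarity to any owned game. Owned games are filtered by exact name, as in the post.
2. Keep the `top` most similar games.
3. Order that shortlist by `owned`.

The names and toy data are invented. The post only says the recommendations are sorted by the owned field, so the “shortlist first, then sort by popularity” order is my assumption.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(owned_names: list[str], games: list[dict], top: int = 5) -> list[str]:
    """Shortlist the `top` games most similar to any owned game, then sort by popularity.

    Assumes at least one owned game, and that every owned name appears in `games`.
    """
    vectors = {g["name"]: Counter(g["soup"].split()) for g in games}
    owned = set(owned_names)
    scored = []
    for game in games:
        if game["name"] in owned:
            continue  # exact-name filter only
        # Score each candidate by its best match against any owned game.
        best = max(cosine(vectors[name], vectors[game["name"]]) for name in owned_names)
        if best > 0:  # skip games with nothing in common
            scored.append((best, game))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    shortlist = [game for _, game in scored[:top]]
    # Among the most similar games, list the most widely owned first.
    shortlist.sort(key=lambda game: game["owned"], reverse=True)
    return [game["name"] for game in shortlist]


# Invented toy data: not real BoardGameGeek soups or ownership counts.
games = [
    {"name": "Gloomhaven", "soup": "adventure exploration fantasy fighting miniatures", "owned": 50},
    {"name": "Game A", "soup": "adventure exploration fantasy fighting", "owned": 10},
    {"name": "Game B", "soup": "adventure fantasy miniatures", "owned": 40},
    {"name": "Game C", "soup": "party wordgame deduction", "owned": 90},
]
print(recommend(["Gloomhaven"], games, top=2))
# ['Game B', 'Game A']
```

Game C is the most widely owned game here, but it shares no metadata with Gloomhaven, so it never makes the shortlist. Game B is less similar than Game A but is listed first because more people own it.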
Our recommendation system is a success! We can list as many owned games as we like, and the more games we list, the more diverse the recommendations become.
For example, what would happen if we said we love Gloomhaven, but also a game like Codenames (a card game involving a lot of social deception)?
Here I had it print out the top 10 recommendations. As you can see, it divides the recommendations between Gloomhaven and Codenames to give a diverse spread. By the way, Coup, The Resistance, and Spyfall are all fantastic recommendations.
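The post doesn’t spell out how the recommendations get divided between the two games. One simple way to get that kind of balance, sketched here as an assumption, is round-robin interleaving. First rank candidates separately for each owned game. Then take one from each ranked list in turn, skipping any game already picked. The function name and the ranked lists are hypothetical.

```python
from itertools import zip_longest


def interleave(ranked_lists: list[list[str]], limit: int) -> list[str]:
    """Take one item from each ranked list in turn, skipping duplicates."""
    if limit <= 0:
        return []
    result: list[str] = []
    seen: set[str] = set()
    # zip_longest pads shorter lists with None so no list's tail is lost.
    for row in zip_longest(*ranked_lists):
        for item in row:
            if item is not None and item not in seen:
                seen.add(item)
                result.append(item)
                if len(result) == limit:
                    return result
    return result


# Hypothetical per-game rankings, most similar first.
per_game = [
    ["Game A", "Game B", "Game C"],  # ranked against the first owned game
    ["Game X", "Game A", "Game Y"],  # ranked against the second owned game
]
print(interleave(per_game, limit=4))
# ['Game A', 'Game X', 'Game B', 'Game C']
```

Without something like this, a user who owns many games of one kind could see that kind crowd out every other recommendation.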
There is also a quirk: the system filters out owned games only when the names match exactly. As a result, remakes, expansions, and sequels frequently get recommended, since they have very high similarity scores but not the exact same name. In this case, that’s Codenames: Pictures. Sometimes we may actually want these sorts of recommendations, so I left this behavior as is. But depending on your requirements, you may want to filter out any games whose names are similar to games the user already owns.
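If you do want to filter those out, one option is a fuzzy name check. This is a sketch under my own assumptions, not what the project does. A candidate is treated as a variant of an owned game if either of these holds:

- its name starts with the owned game’s name, which catches titles like “Codenames: Pictures”;
- the standard library’s `difflib.SequenceMatcher` rates the two names as very similar.

The 0.8 threshold and the function names are arbitrary illustrative choices.

```python
from difflib import SequenceMatcher


def is_variant(candidate: str, owned: str, threshold: float = 0.8) -> bool:
    """Heuristic: does `candidate` look like a remake, sequel, or expansion of `owned`?"""
    a, b = candidate.lower(), owned.lower()
    if a.startswith(b):
        return True  # e.g. "codenames: pictures" starts with "codenames"
    # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
    return SequenceMatcher(None, a, b).ratio() >= threshold


def filter_variants(candidates: list[str], owned_games: list[str]) -> list[str]:
    """Drop candidates that look like variants of any owned game."""
    return [c for c in candidates if not any(is_variant(c, o) for o in owned_games)]


print(filter_variants(["Codenames: Pictures", "Coup", "Spyfall"], ["Gloomhaven", "Codenames"]))
# ['Coup', 'Spyfall']
```

The prefix rule is crude and can over-filter unrelated games that happen to share a leading word, so it is worth checking against real titles before relying on it.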
That wraps up my walkthrough of my most recent training exercise. It was a ton of fun, and I learned a lot about machine learning, recommendation engines, Python, databases, and much more. I hope you enjoyed exploring this little project of mine. This is just one small example of the kind of creative, challenging work you get to do at Hacware. And this is just for practice!