Coding a solution to a real-time data science competition

SLICED is a competitive data science show where participants are given two hours to explore a data set they have only just seen and make predictions from it. The episodes are airing this summer, and I recommend that you check them out if you’re interested in data analytics, data science, or machine learning.

Nick Wan and Meg Risdal are the hosts of SLICED. In Week 1, competitors were challenged to predict board game ratings, and were given a list of features (Release Year, how long the games last, etc.).

You can check out the first episode here on Twitch and learn more about the show and schedule here.

As a challenge, I wanted to try to build a prediction model in less than two hours, just like the competitors, albeit under much less pressure because I didn’t have 200 people watching me code.

The data set includes ~3,500 board games with various descriptive columns and a “Geek rating” that we are trying to predict. The goal is to predict the unknown Geek rating for 1,500 additional board games. Below is a sample of a few columns, and all the data (and scoring) is available on Kaggle.

Data overview

One of the first things I did (after importing the libraries and data) was to plot a pair grid of different features against our target variable, geek_rating. Some of the columns were text or a little less structured, and I saved them for later. In doing so, I noticed a few interesting relationships:

  1. The features of the board game (min. players, max. players, average playing time) seem to have a loose relationship to the rating. There’s probably a sweet spot in how many players a game supports and how long it lasts for it to be rated well.
  2. There seems to be a stronger (and non-linear) relationship between how many people own / vote for a game and how high it is rated.
  3. Games tend to be around for a while before they get a high rating.

pg = sns.PairGrid(train_df, x_vars=['min_players', 'max_players', 'avg_time', 'year', 'owned', 'num_votes', 'age'], y_vars=['geek_rating'])
pg.map(sns.scatterplot)  # PairGrid only builds the axes; map() draws one scatter plot per feature
Pairgrid plot

Going a little further and plotting the minimum number of players against the rating, we see that games with a minimum of 1-2 players score best on average. A similar analysis of the maximum number of players suggests that the most “normal” player counts are the ones that appear to be popular.

sns.boxplot(data=train_df, x='min_players', y='geek_rating')
Minimum player boxplot

By visually scanning some of the game mechanics, I selected some keywords that were associated with better scores. A better method would be to parse the different phrases and relate each game mechanic to summary stats of the rating … but two hours got away from me quickly!
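The more systematic approach mentioned above could be sketched roughly as follows: split each game's mechanics text into individual terms and summarise the rating per term. The `mechanic` column name, the comma separator, and the toy rows are assumptions for illustration, not taken from the competition data.

```python
import pandas as pd

# Toy stand-in for train_df; the 'mechanic' column name and the
# comma-separated format are assumptions for illustration only.
toy_df = pd.DataFrame({
    'mechanic': ['Dice Rolling, Hand Management', 'Dice Rolling', 'Hand Management, Trading'],
    'geek_rating': [6.0, 5.5, 7.0],
})

def mechanic_rating_stats(df, text_col='mechanic', target_col='geek_rating'):
    """Return the mean rating and game count for each individual mechanic term."""
    terms = (
        df.assign(term=df[text_col].str.split(','))  # one list of terms per game
          .explode('term')                            # one row per (game, term)
          .reset_index(drop=True)
    )
    terms['term'] = terms['term'].str.strip()
    return terms.groupby('term')[target_col].agg(['mean', 'count'])

mechanic_stats = mechanic_rating_stats(toy_df)
print(mechanic_stats)
```

Terms with only a handful of games would give noisy averages, so the count column is worth checking before trusting any single mean.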


Grouping of players

I decided to add a “player grouping” feature to capture some of the relationships with the number of players that I had noticed visually. A machine learning algorithm such as a decision tree could pick this up automatically, but it was quick to encode the combinations I thought would be useful and make them easier to learn.

def player_grouping(df):
    if df['min_players'] <= 0:
        return 'Low'
    elif df['max_players'] <= 3:
        return 'Low'
    elif df['min_players'] == 8:
        return 'Exact'
    elif df['min_players'] == 5:
        return 'Odd'
    elif df['max_players'] > 3 and df['max_players'] <= 7:
        return 'Exact'
    return 'Other'

train_df['player_grouping'] = train_df.apply(lambda row: player_grouping(row), axis=1)

New derived player grouping feature
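Because the grouping is a string label, it needs a numeric encoding before a scikit-learn model can use it. The post does not show how this was done, so the sketch below is only one possible option: one-hot encoding with `pd.get_dummies`. The toy rows and the `players` prefix are assumptions.

```python
import pandas as pd

# Toy frame holding the derived string feature next to a numeric column.
toy_df = pd.DataFrame({
    'player_grouping': ['Low', 'Exact', 'Other', 'Low'],
    'min_players': [1, 2, 2, 1],
})

# One-hot encode the string labels: one indicator column per group.
encoded_df = pd.get_dummies(toy_df, columns=['player_grouping'], prefix='players')
print(encoded_df.columns.tolist())
```

With only a few distinct groups, one-hot encoding adds just a few columns; an ordinal or target-based encoding would be an alternative.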

Category scoring

The next feature I wanted to create was based on the game category (Strategy, Dice, etc.).

This information was stored in several columns. I created a lookup dictionary using the first one (using all 12 would be better, but time flies …) and then looped through the different columns to average the scores associated with the category terms. For example, the Medicine, Renaissance, and Civilization categories had the highest average ratings, while Trivia, Memory, and Number were the lowest-rated categories.

category_lookup_dict = dict(train_df.groupby('category1')['geek_rating'].mean())

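def get_combined_category_scoring(df, category_dict, col_list):
    score_list = []
    for col in col_list:
        category = df[col]
        # Missing values (NaN) and new categories not profiled in the
        # lookup are not keys of category_dict, so they are skipped here
        if category in category_dict:
            score_list.append(category_dict[category])
    if len(score_list) > 0:
        return np.mean(score_list)
    return 6.09  # avg for missing categories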

col_list_cat = [col for col in train_df.columns if 'category' in col]
train_df['cat_score'] = train_df.apply(lambda row: get_combined_category_scoring(row, category_lookup_dict, col_list_cat), axis=1)

Grouping of mechanics

I used a similar technique for the game mechanics field, but was less scientific about it: instead of calculating averages, I settled for a simple binary flag as time was running out.
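A binary flag of this kind could look like the sketch below. The keyword list, the `mechanic` column name, and the toy rows are hypothetical, since the post does not list the keywords that were actually used; the output column is named `mechanic_group` only to match the feature list.

```python
import pandas as pd

# Hypothetical keywords; the list actually used in the notebook is not shown in the post.
GOOD_MECHANIC_KEYWORDS = ['Deck', 'Worker Placement']

# Toy rows; the 'mechanic' column name is an assumption.
toy_df = pd.DataFrame({
    'mechanic': ['Deck / Pool Building, Trading', 'Roll / Spin and Move', None],
})

def mechanic_flag(text, keywords=GOOD_MECHANIC_KEYWORDS):
    """Return 1 if any keyword appears in the mechanics text, else 0."""
    if not isinstance(text, str):
        return 0  # missing mechanics get no flag
    lowered = text.lower()
    return int(any(keyword.lower() in lowered for keyword in keywords))

toy_df['mechanic_group'] = toy_df['mechanic'].apply(mechanic_flag)
print(toy_df)
```

A flag like this is quick to build but throws away how strongly each mechanic relates to the rating, which is the trade-off described above.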

The final step was to select the input columns for the prediction and choose a machine learning algorithm. I tried a few and ended up with Gradient Boosting. I didn’t spend much time optimizing the hyperparameters and just went with the defaults.

feature_cols = ['age', 'player_grouping', 'owned', 'num_votes', 'cat_score', 'min_players', 'max_players', 'avg_time',
'min_time', 'max_time', 'year', 'mechanic_group']
target_col = 'geek_rating'

x = train_df[feature_cols]
y = train_df[target_col]

reg = GradientBoostingRegressor().fit(x, y)
predictions = reg.predict(x)
print(f'RMSE for training set: {np.sqrt(mean_squared_error(y_true=y, y_pred=predictions))}')

The root mean squared error (RMSE) was 0.141 for the training set and 0.167 for the validation set (30% of the training samples).
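A 30% hold-out of this kind can be set up roughly as below. This is a sketch on synthetic data, not the notebook's code: the random features, the `random_state` values, and the variable names are assumptions made so the example runs on its own.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the numeric feature matrix and the geek_rating target.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(500, 4))
y_demo = 6.0 + 0.5 * x_demo[:, 0] + rng.normal(scale=0.1, size=500)

# Hold out 30% of the rows for validation.
x_train, x_val, y_train, y_val = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=0
)

# Default hyperparameters, fitted on the training split only.
reg_demo = GradientBoostingRegressor(random_state=0).fit(x_train, y_train)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))

train_rmse = rmse(y_train, reg_demo.predict(x_train))
val_rmse = rmse(y_val, reg_demo.predict(x_val))
print(f'RMSE train: {train_rmse:.3f}, validation: {val_rmse:.3f}')
```

The validation score is the more honest estimate of the two, because the model has never seen those rows during fitting.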

After incorporating these features and a few iterations, I ended up with the following notebook and a score of 0.177 RMSE, which put me at about 9th on the leaderboard.

Leaderboard (shows only submissions made during the Twitch stream)

This was a fun challenge, and I would recommend that others try their hand at analyzing the data. I have close to zero board game expertise beyond childhood family games of Monopoly, and it was interesting to see how accurate a prediction I could get.

All examples and files are available on GitHub.

