One of the most important skills in any tennis match is the ability to save break points. Look at the ATP leaderboards and you’ll be reminded of this fact when you see the best pros in the game consistently at the top of the break points saved percentage. But what makes a player succeed in saving break points? Using Jeff Sackmann’s point-by-point dataset for each Grand Slam over the past 10 years, we can dive in and see how professionals save break points on the biggest stages of tennis.

First, let’s start by importing the relevant libraries:

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
import numpy as np

Then we need to import the data. The Australian Open and French Open data from 2018 onward are packaged differently, so we’re dropping those tournaments for now.

years = [i for i in range(2011, 2021)]
slams = ['ausopen', 'frenchopen', 'usopen', 'wimbledon']
data = pd.DataFrame()

for year in years:
    for slam in slams:
        # These slams did not have the same data collected from 2018 onward
        if year >= 2018 and slam in ['ausopen', 'frenchopen']:
            continue
        try:
            new_data = pd.read_csv('./tennis_slam_pointbypoint-master/' + str(year) + '-' + slam + '-points.csv')
            if year == 2011 and slam == 'ausopen':
                data = new_data
            else:
                data = pd.concat([new_data, data])
        except FileNotFoundError:
            # Report any tournament file that is missing and keep going
            print(year, slam)
            continue

Next, I dropped rows that didn’t have rally data (in this case, those that didn’t have a serve speed reading) and collected relevant attributes like aces, net points, unforced errors, winners, etc. I refer anyone interested to the GitHub repository, since publishing the whole file in an article would be overwhelming.
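Since the preprocessing file is not reproduced here, the following is only a minimal sketch of what such a step could look like, run on a toy table. The raw column names (`Speed_MPH`, `P1BreakPoint`, `P2BreakPoint`) and the treatment of a zero speed as a missing reading are assumptions about the raw files, and `prepare_points` is a hypothetical helper; the derived `ServeSpeed` and `BreakPoint` columns simply mirror the names used in the rest of this article.

```python
import pandas as pd

def prepare_points(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep only points with a serve speed reading and derive a break point flag."""
    points = raw.copy()
    # Assumption: a missing or zero speed means no radar reading was recorded.
    points = points[points["Speed_MPH"].fillna(0) > 0]
    points = points.rename(columns={"Speed_MPH": "ServeSpeed"})
    # Assumption: P1BreakPoint / P2BreakPoint are 0/1 flags for a break point chance.
    points["BreakPoint"] = (points["P1BreakPoint"] + points["P2BreakPoint"]) > 0
    return points.reset_index(drop=True)

# Toy rows for illustration only, not real match data.
toy = pd.DataFrame({
    "Speed_MPH": [118, 0, None, 102],
    "P1BreakPoint": [0, 0, 1, 1],
    "P2BreakPoint": [0, 1, 0, 0],
})
prepared = prepare_points(toy)
print(prepared[["ServeSpeed", "BreakPoint"]])
```

The remaining attributes (aces, winners, unforced errors, net points) would need similar per-column handling, which is left out of this sketch.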

Finally, we can begin to analyze. The first decision made on the tennis court is whether to serve first. As shown below, there seems to be a real advantage in serving first, as there are significantly fewer break points in the first game than in the second game (a decrease of about 25%). The best way to be good at saving break points is not to face them in the first place, and serving first can minimize the break points that a player will face throughout the match.

# Imported libraries include seaborn, matplotlib.pyplot, and pandas

# Average break point rate for each of the first 12 games of a set
bp_by_game = data.groupby('GamesPlayed')['BreakPoint'].mean().iloc[:12]
# x and y must have the same length, so x is built from the grouped result
a_plot = sn.lineplot(x=list(range(len(bp_by_game))), y=bp_by_game.values)
a_plot.set(ylim=(0, .19))
plt.title("Break Points Faced per Game")
plt.grid()
plt.xlabel("Games into Set")
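To put a number on the gap between two games rather than reading it off the chart, a small helper along these lines could be used. This is a sketch on made-up rows, so the printed value says nothing about the real data; `relative_drop` is a hypothetical name, and `GamesPlayed` and `BreakPoint` are assumed to be the same derived columns used in the plot above.

```python
import pandas as pd

def relative_drop(rate_by_game: pd.Series, game_a: int, game_b: int) -> float:
    """Fraction by which the rate in game_a is below the rate in game_b."""
    return 1 - rate_by_game.loc[game_a] / rate_by_game.loc[game_b]

# Toy points: GamesPlayed is the number of games completed so far in the set.
toy = pd.DataFrame({
    "GamesPlayed": [0, 0, 0, 0, 1, 1, 1, 1],
    "BreakPoint": [True, False, False, False, True, True, False, False],
})
rate_by_game = toy.groupby("GamesPlayed")["BreakPoint"].mean()

# Compare the first game of the set with the second game.
print(relative_drop(rate_by_game, 0, 1))
```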

The next thing we are going to explore is how to serve on break points, since the serve is the one shot on a tennis court that you can fully control. Although we are somewhat limited in the sense that the dataset does not give us a radar reading when a player missed their first serve (i.e., how hard they tried to hit it), we can still examine how serve speed affects several key factors behind break point save percentage.

I grouped serve speeds into buckets at 5 MPH intervals from 80-140 MPH. All metrics are collected in this loop, and the name of each metric should be fairly intuitive.

# One list per metric; each receives one value per serve speed bucket
buckets, save_percentage, rally_length = [], [], []
winner_percentage_loss, winner_percentage_save = [], []
unforced_error_percentage_loss, unforced_error_percentage_save = [], []
net_point, net_point_won = [], []
return_net_point, return_net_point_won = [], []

for i in range(80, 145, 5):  # bucketing by fives and appending to the list for each metric
    # Half-open interval so that a serve on a bucket boundary is counted only once
    bucket_data = data[(data.ServeSpeed >= i) & (data.ServeSpeed < i + 5)]
    trimmed_data = bucket_data[bucket_data.BreakPoint]
    x = bucket_data.groupby('BreakPoint').mean(numeric_only=True)  # all data
    y = trimmed_data.groupby('BreakPointWon').mean(numeric_only=True)  # just break point data

    buckets.append(i)
    # Save rate is one minus the share of break points that were won (converted)
    save_percentage.append(1 - x.loc[True, 'BreakPointWon'])
    rally_length.append(y.loc[True, 'Rally'])
    winner_percentage_loss.append(y.loc[False, 'Winner'])
    winner_percentage_save.append(y.loc[True, 'Winner'])
    unforced_error_percentage_loss.append(y.loc[True, 'UnforcedError'])
    unforced_error_percentage_save.append(y.loc[False, 'UnforcedError'])
    net_point.append(x.loc[True, 'ServerNetPoint'])
    net_point_won.append(x.loc[True, 'ServerNetPointWon'] / x.loc[True, 'ServerNetPoint'])
    return_net_point.append(x.loc[True, 'NetPoint'] - x.loc[True, 'ServerNetPoint'])
    return_net_point_won.append((x.loc[True, 'NetPointWon'] - x.loc[True, 'ServerNetPointWon']) / (x.loc[True, 'NetPoint'] - x.loc[True, 'ServerNetPoint']))
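The same 5 MPH bucketing can also be expressed without an explicit loop. The sketch below uses `pd.cut` with half-open intervals and computes only the save rate; it is an alternative formulation rather than the code used for the charts in this article. `save_rate_by_bucket` is a hypothetical helper, the toy rows are made up, and `BreakPointWon` is assumed to be True when the break point is converted.

```python
import pandas as pd

def save_rate_by_bucket(points: pd.DataFrame, low: int = 80, high: int = 145, step: int = 5) -> pd.Series:
    """Break point save rate per serve speed bucket, indexed by each bucket's lower edge."""
    edges = list(range(low, high + step, step))  # 80, 85, ..., 145
    bp = points[points["BreakPoint"]].copy()
    # right=False gives half-open buckets [80, 85), [85, 90), ...
    bp["Bucket"] = pd.cut(bp["ServeSpeed"], bins=edges, right=False, labels=edges[:-1])
    # Assumption: BreakPointWon is True when the returner converts the break point.
    rates = 1 - bp.groupby("Bucket", observed=True)["BreakPointWon"].mean()
    rates.index = rates.index.astype(int)
    return rates

# Toy rows for illustration only.
toy = pd.DataFrame({
    "ServeSpeed": [82, 84, 85, 101, 104, 104, 120],
    "BreakPoint": [True, True, True, True, True, True, False],
    "BreakPointWon": [True, False, False, False, False, True, False],
})
rates = save_rate_by_bucket(toy)
print(rates)
```

Buckets with no break points simply do not appear in the result, which makes thin buckets at the extremes easy to spot.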

As you might expect, an increase in serve speed correlates with a higher break point save percentage. However, once you reach about 130 MPH, the effect seems to level off. Ultimately, serve speed can be formulated as a decision equation. If a player can determine how much more likely they are to miss their first serve for each additional MPH, and how likely they are to win the point on their second serve, they can calculate the optimal speed at which to serve.

import matplotlib.ticker as mtick  # enables percentage formatting on an axis

# Graphs save percentage by serve speed
ax = sn.lineplot(x = buckets, y = [i * 100 for i in save_percentage])
ax.yaxis.set_major_formatter(mtick.PercentFormatter())  # sets the y axis as a percent
plt.grid()
plt.title("Save Percentage by Serve Speed")
plt.ylabel("Save Percentage")
plt.xlabel("Serve Speed")
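The decision equation mentioned above can be sketched as follows. Everything in this block is hypothetical: the function names, the linear models for first-serve accuracy and for winning the point, and every number are placeholders chosen for illustration, not estimates from the data. The idea is only that the chance of winning the point is the chance the first serve lands in times the chance of winning behind it, plus the chance it misses times the chance of winning on the second serve.

```python
def point_win_probability(speed, p_first_in, p_win_first_in, p_win_second):
    """Chance of winning the point when the first serve is hit at `speed`.

    p_first_in(speed): probability that the first serve lands in.
    p_win_first_in(speed): probability of winning the point once it lands in.
    p_win_second: probability of winning the point after a missed first serve.
    """
    p_in = p_first_in(speed)
    return p_in * p_win_first_in(speed) + (1 - p_in) * p_win_second

def best_speed(speeds, p_first_in, p_win_first_in, p_win_second):
    """Return the candidate speed with the highest chance of winning the point."""
    return max(
        speeds,
        key=lambda s: point_win_probability(s, p_first_in, p_win_first_in, p_win_second),
    )

# Hypothetical player: accuracy falls and effectiveness rises with every MPH above 100.
def p_first_in(speed):
    return max(0.0, 0.70 - 0.01 * (speed - 100))

def p_win_first_in(speed):
    return min(0.95, 0.60 + 0.005 * (speed - 100))

p_win_second = 0.50  # placeholder second-serve win probability

optimal = best_speed(range(100, 141, 5), p_first_in, p_win_first_in, p_win_second)
print(optimal)
```

With real data, the two speed-dependent functions would have to be estimated per player, which the dataset only partly supports because missed first serves carry no speed reading.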

Next, we look at winners and unforced errors. As we can see below, the server’s winner percentage increases steadily as serve speed increases, whereas the chance of your opponent hitting a winner remains relatively constant. This can lead to important tactical decisions as a server. For example, if you are playing a big hitter who hits many winners, hitting a slower serve may be okay because serve speed doesn’t seem to affect their chances of hitting a winner. However, if you need to hit winners to win most of your points, you can take the risk of hitting a little harder on both your first and second serves to increase your likelihood of hitting a winner.

# Uses two lines on one plot to visualize winner percentage
ax1 = sn.lineplot(x = buckets, y = [i * 100 for i in winner_percentage_loss], label = 'Server')
ax2 = sn.lineplot(x = buckets, y = [i * 100 for i in winner_percentage_save], label = 'Returner')
ax1.yaxis.set_major_formatter(mtick.PercentFormatter())
ax2.yaxis.set_major_formatter(mtick.PercentFormatter())
plt.grid()
plt.title("Winner Percentage Server vs Returner by Serve Speed")
plt.ylabel("Winner Percentage")
plt.xlabel("Serve Speed MPH")

We must also take into account the probability of an unforced error. As the chart below shows, your opponent is practically just as likely to make an unforced error whether you serve at 90 MPH or at 135 MPH; the outlier at 140 MPH is likely due to the small sample size. For the server, however, the chance of hitting an unforced error steadily decreases as serve speed increases. Again, this can lead to important tactical changes. If your opponent is erratic and makes a lot of unforced errors, you may want to hit a softer first serve, because serve speed doesn’t affect their tendency to make a mistake. However, if you’ve been inconsistent as a server, reducing your likelihood of an unforced error by hitting a harder serve would probably be the right move.

# Uses two lines on one plot to visualize unforced errors
ax1 = sn.lineplot(x = buckets, y = [i * 100 for i in unforced_error_percentage_save], label = 'Server')
ax2 = sn.lineplot(x = buckets, y = [i * 100 for i in unforced_error_percentage_loss], label = 'Returner')
ax1.yaxis.set_major_formatter(mtick.PercentFormatter())
ax2.yaxis.set_major_formatter(mtick.PercentFormatter())
plt.grid()
plt.title("Unforced Error Percentage Server vs Returner by Serve Speed")
plt.ylabel("Unforced Error Percentage")
plt.xlabel("Serve Speed MPH")

Finally, we find that when a player comes to the net, they have a 68% chance of winning the point. Getting to the net is a successful strategy, especially on break points, because you force your opponent to hit a difficult shot under high pressure. As we can see, the win percentage at the net remains fairly constant across serve speeds, with the swings at both ends likely due to small sample sizes, but your ability to reach the net increases dramatically, from 10% at 80 MPH to about 20% at 120-130 MPH.

# Graphs the percentage chance of getting to the net
ax = sn.lineplot(x = buckets, y = [i * 100 for i in net_point])
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
plt.grid()
plt.title("Percentage Chance to Get to the Net")
plt.ylabel("Server Net Point Percentage")
plt.xlabel("Serve Speed MPH")

# Graphs the win percentage at the net
ax = sn.lineplot(x = buckets, y = [i * 100 for i in net_point_won])
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
plt.grid()
plt.title("Win Percentage at the Net by Serve Speed")
plt.ylabel("Server Net Point Win Percentage")
plt.xlabel("Serve Speed MPH")
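Several of the charts above attribute odd values at the extreme serve speeds to small sample sizes. One way to check that, sketched here with the standard Wilson score interval, is to look at how wide the uncertainty around a percentage is for a given number of points. The function name and the counts in the example are hypothetical and are not taken from the dataset.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width

# Same observed rate (60%), very different amounts of evidence.
small = wilson_interval(6, 10)       # e.g. a sparse bucket at an extreme speed
large = wilson_interval(600, 1000)   # e.g. a well-populated bucket

print(small)
print(large)
```

A bucket whose interval is very wide should not be read as evidence of a real change in the trend.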