Find our GitHub page here!
Dataset
We chose the Kaggle dataset "UFC-Fight historical data from 1993 to 2019". It contains data on all UFC fights from 1993 to 2019 and on all fighters who were members of the UFC during this period.
The data is split into two CSV files, which we downloaded from the website and imported into our notebook. The dataset also provides two already preprocessed files, which we did not use because modifications had already been applied to them. We preferred to work with the raw data and make our own modifications.
Because we used the raw data, we had to apply several modifications before we could analyze it. First, we imported both datasets into two dataframes. Then we tidied each dataframe, and finally we merged the two into one main dataframe.
First, we load all the necessary libraries for our project.
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import re
import datetime
Transforming and Tidying: Fighter Dataset
In this step, we load the first dataset, which contains the attributes of each fighter. Several columns of the dataframe hold more than one value, so we had to split them; for example, we split the name column into first and last name. Because the new columns are appended on the right side of the dataframe, we rearranged the columns afterwards. Furthermore, we had to change the data types, because almost the whole dataframe was stored with the object type. While doing so, we ran into the same problem of columns holding more than one value: the weight of a fighter, for example, was stored as "200 lb" instead of just the number. We removed the unit from the values and put it into the column name instead. To simplify later analyses, we converted the height and reach of each fighter from imperial units (feet and inches) into metric units (cm).
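A minimal sketch of the feet-and-inches conversion described above, written as two small helper functions instead of the column-wise string replacements we used. It assumes heights are stored like `5' 11"` and reaches like `72"` (the format our replacement code strips off); `height_to_cm` and `reach_to_cm` are hypothetical names, not part of pandas.

```python
import re

import pandas as pd

INCH_CM = 2.54
FOOT_CM = 30.48  # 12 inches * 2.54 cm


def height_to_cm(value):
    """Convert a height string such as 5' 11" to centimetres.

    Assumes the format <feet>' <inches>"; anything else (including
    missing values) becomes NaN.
    """
    if not isinstance(value, str):
        return float("nan")
    match = re.fullmatch(r"\s*(\d+)'\s*(\d+)\"\s*", value)
    if match is None:
        return float("nan")
    feet, inches = (int(g) for g in match.groups())
    return feet * FOOT_CM + inches * INCH_CM


def reach_to_cm(value):
    """Convert a reach string such as 72" to centimetres (NaN if unparsable)."""
    if not isinstance(value, str):
        return float("nan")
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\"\s*", value)
    return float(match.group(1)) * INCH_CM if match else float("nan")


heights = pd.Series(["5' 11\"", "6' 4\"", None])
reaches = pd.Series(["72\"", "n/a", None])
print(heights.map(height_to_cm).round(2).tolist())
print(reaches.map(reach_to_cm).round(2).tolist())
```

Applied with `fighter_df["Height"].map(height_to_cm)`, this should give the same centimetre values as our conversion under the assumed format, while turning unparsable entries into NaN instead of raising an error.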
# load first csv file, with fighter attributes, into a dataframe
fighter_df = pd.read_csv("./data/raw_fighter_details.csv")
# start tidying fighter_df dataframe...
# separate name column into first and last name and drop the old column
new = fighter_df["fighter_name"].str.split(" ", n = 1, expand = True)
fighter_df["First Name"]= new[0]
fighter_df["Last Name"]= new[1]
fighter_df.drop(columns =["fighter_name"], inplace = True)
# rearrange columns
fighter_df = fighter_df[['First Name', 'Last Name', 'Height', 'Weight', 'Reach', 'Stance', 'DOB']]
# change data types and rename it
fighter_df["DOB"] = pd.to_datetime(fighter_df["DOB"])
# keep only the number in front of the unit (.str[0] takes the first part of the split)
fighter_df["Weight"] = fighter_df["Weight"].str.split(" ", n = 1).str[0]
fighter_df.rename(columns={'Weight':'Weight (lbs)'}, inplace=True)
fighter_df["Weight (lbs)"] = pd.to_numeric(fighter_df["Weight (lbs)"])
# convert height and reach from feet and inches to cm and rename columns
fighter_df['Height'] = fighter_df['Height'].str.replace('"','')
fighter_df['Height'] = fighter_df['Height'].str.replace('\'','')
newH = fighter_df["Height"].str.split(" ", n = 1, expand = True)
newH[0] = pd.to_numeric(newH[0])
newH[1] = pd.to_numeric(newH[1])
newH["cm"] = newH[0] * 30.48 + newH[1] * 2.54
fighter_df["Height"] = newH["cm"]
fighter_df.rename(columns={'Height':'Height (cm)'}, inplace=True)
# convert reach from inches to cm
fighter_df['Reach'] = fighter_df['Reach'].str.replace('"','')
fighter_df['Reach'] = pd.to_numeric(fighter_df['Reach'])
fighter_df['Reach'] = fighter_df['Reach'] * 2.54
fighter_df.rename(columns={'Reach':'Reach (cm)'}, inplace=True)
# show the tidy data; we keep NaN/NaT values, because dropping them would remove too many fighters who have already had fights
fighter_df.head()
Transforming and Tidying: Fights Dataset
After loading the dataset, we faced the same problem of columns holding more than one value. Since every fight involves two fighters and a referee, we had to split several name columns. After rearranging the new name columns, we turned to the fight statistics. These are stored as two values in one cell, e.g. the number of significant strikes landed "of" the number attempted. We could not interpret the number of attempts reliably, because it differed between related columns and the description on Kaggle did not explain it. We therefore dropped the attempts, since they are not necessary for our further analyses, and kept only the number of successful actions (strikes, clinch strikes, etc.). The only attempts we kept are the takedown attempts, because takedowns have a big influence on the scoring and on the whole fight; we split them into a new column. Finally, we changed the necessary data types to finish the tidying process.
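The "landed of attempted" cells can also be parsed in one step with a regular expression. The sketch below is an alternative to the split calls we used, assuming every valid cell looks like `17 of 37`; `split_landed_attempted` is a hypothetical helper name.

```python
import pandas as pd


def split_landed_attempted(column):
    """Split strings such as "17 of 37" into integer columns "landed" and "attempted".

    Cells that do not match "<int> of <int>" (including missing values) become <NA>.
    """
    parts = column.str.extract(r"^\s*(\d+)\s+of\s+(\d+)\s*$")
    parts.columns = ["landed", "attempted"]
    # the extracted groups are strings; convert them to nullable integers
    return parts.apply(pd.to_numeric).astype("Int64")


sig_str = pd.Series(["17 of 37", "0 of 0", None, "garbled"])
parsed = split_landed_attempted(sig_str)
print(parsed)
```

Keeping both parts makes it easy to drop the attempts later (as we did for strikes) or keep them (as we did for takedowns).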
# load second csv file with all fights from 1993 - 2019 into a dataframe
totalfight_df = pd.read_csv("./data/raw_total_fight_data.csv", sep=";")
# start tidying totalfight_df dataframe...
# separate the name columns of both fighters, the winner and the referee into first and last name and drop the old columns
for prefix, column in [("R", "R_fighter"), ("B", "B_fighter"), ("Winner", "Winner"), ("Referee", "Referee")]:
    split_names = totalfight_df[column].str.split(" ", n = 1, expand = True)
    totalfight_df[prefix + "_First Name"] = split_names[0]
    totalfight_df[prefix + "_Last Name"] = split_names[1]
    totalfight_df.drop(columns = [column], inplace = True)
# rearrange columns
cols_to_order = ['R_First Name', 'R_Last Name', 'B_First Name', 'B_Last Name', 'Winner_First Name','Winner_Last Name']
new_columns = cols_to_order + (totalfight_df.columns.drop(cols_to_order).tolist())
totalfight_df = totalfight_df[new_columns]
# the columns SIG_STR. (significant strikes) and TOTAL_STR. (total strikes) are stored as "landed of attempted"
# Problem: the attempts differ between the two columns and there is no description why
# Solution: we drop the attempts and only keep the number of landed strikes
for col in ["R_SIG_STR.", "B_SIG_STR.", "R_TOTAL_STR.", "B_TOTAL_STR."]:
    totalfight_df[col] = pd.to_numeric(totalfight_df[col].str.split(" ", n = 1).str[0])
# change data type of SIG_STR_pct (strip the "%" sign) and rename it to SIG_STR (%)
for corner in ["R", "B"]:
    col = corner + "_SIG_STR_pct"
    totalfight_df[col] = pd.to_numeric(totalfight_df[col].str.split("%", n = 1).str[0])
    totalfight_df.rename(columns={col: corner + "_SIG_STR (%)"}, inplace=True)
# split column TD ("takedowns of attempts"): keep the takedowns in TD and the attempts in TD_ATT
# we overwrite TD_pct with the attempts and rename it, because the percentage can be recalculated at any time
for corner in ["R", "B"]:
    newTD = totalfight_df[corner + "_TD"].str.split(" of ", n = 1, expand = True)
    totalfight_df[corner + "_TD"] = newTD[0]
    totalfight_df[corner + "_TD_pct"] = newTD[1]
    totalfight_df.rename(columns={corner + "_TD_pct": corner + "_TD_ATT"}, inplace=True)
# change data type to numeric
cols = ["R_TD", "B_TD", "R_TD_ATT", "B_TD_ATT"]
totalfight_df[cols] = totalfight_df[cols].apply(pd.to_numeric, errors='coerce')
# for the columns HEAD, BODY, LEG, DISTANCE, CLINCH and GROUND we drop the second value (the attempts), because it is less relevant
# e.g. "head strikes landed of head strikes attempted" will only keep the landed head strikes
cols = ["R_HEAD", "B_HEAD", "R_BODY", "B_BODY", "R_LEG", "B_LEG", "R_DISTANCE", "B_DISTANCE", "R_CLINCH", "B_CLINCH", "R_GROUND", "B_GROUND"]
for col in cols:
    totalfight_df[col] = totalfight_df[col].str.split(" ", n = 1).str[0]
# change all columns above to numeric
totalfight_df[cols] = totalfight_df[cols].apply(pd.to_numeric, errors='coerce')
# change data type of date/time columns
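# last_round_time is given as minutes:seconds (e.g. "5:00"); prefix "00:" so it is not read as hours:minutes
totalfight_df["last_round_time"] = pd.to_timedelta("00:" + totalfight_df["last_round_time"].str.zfill(5))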
totalfight_df["date"] = pd.to_datetime(totalfight_df["date"])
# print tidy dataframe
totalfight_df.head()
Transforming and Tidying: Merging process
In this step, we merged the dataframes. First, we merged fighter_df and totalfight_df into main_df. Then we applied some transformations to the new dataframe main_df. The next step was to create a score, i.e. a fight record, for every fighter. To do this, we counted the fights of each fighter in totalfight_df, counted their wins, and calculated their losses. The last step was to put this information into the fighter_df dataframe as three new columns (totalfights, wins, losses).
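A minimal sketch of how such a fight record can be counted, assuming one full-name column per corner and one for the winner (`red`, `blue` and `winner` are hypothetical column names; our dataframes use separate first- and last-name columns). Stacking both corners before counting means that a fighter who only ever appeared in one corner is still counted correctly. Like our own calculation, it treats every fight a fighter did not win, including draws and no contests, as a loss.

```python
import pandas as pd


def fight_records(fights):
    """Count total fights, wins and losses per fighter.

    `fights` needs the columns "red", "blue" and "winner" (full names);
    "winner" is missing for draws and no contests.
    """
    # every appearance in either corner counts as one fight
    appearances = pd.concat([fights["red"], fights["blue"]]).value_counts()
    # value_counts skips missing winners (draws, no contests)
    wins = fights["winner"].value_counts()
    record = pd.DataFrame({"total_fights": appearances})
    record["wins"] = wins.reindex(record.index, fill_value=0)
    # every fight that was not won counts as a loss
    record["losses"] = record["total_fights"] - record["wins"]
    return record.sort_index()


fights = pd.DataFrame({
    "red": ["A. Alpha", "B. Bravo", "A. Alpha"],
    "blue": ["B. Bravo", "C. Charlie", "C. Charlie"],
    "winner": ["A. Alpha", "C. Charlie", None],
})
print(fight_records(fights))
```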
# merging both tables on name
# Problem: we have two fighters in totalfight_df which have to merge with the table fighter_df
# Solution: we do two merges: the first with the Red fighter (R_...), second with the Blue fighter (B_...) -> new df main_df
R_match = pd.merge(totalfight_df, fighter_df, how='left', left_on=['R_First Name','R_Last Name'], right_on = ['First Name','Last Name'])
main_df = pd.merge(R_match, fighter_df, how='left', left_on=['B_First Name', 'B_Last Name'], right_on = ['First Name','Last Name'])
# drop duplicates, name columns that were merged from fighter_df
main_df.drop(columns =["First Name_x", "Last Name_x", "First Name_y", "Last Name_y"], inplace = True)
main_df.dtypes
# rename the new columns the same way as the others, with R_/B_ prefixes to be consistent
main_df.rename(columns={'Height (cm)_x':'R_Height (cm)', "Weight (lbs)_x": "R_Weight (lbs)", "Reach (cm)_x":"R_Reach (cm)", "Stance_x":"R_Stance", "DOB_x":"R_DOB"}, inplace=True)
main_df.rename(columns={'Height (cm)_y':'B_Height (cm)', "Weight (lbs)_y": "B_Weight (lbs)", "Reach (cm)_y":"B_Reach (cm)", "Stance_y":"B_Stance", "DOB_y":"B_DOB"}, inplace=True)
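# note: the record columns (Totalfights, wins, losses) only exist in main_df if fighter_df already contained them before the merge above; otherwise these two renames have no effect
main_df.rename(columns={'Totalfights_x':'R_Totalfights', "wins_x": "R_wins", "losses_x":"R_losses"}, inplace=True)
main_df.rename(columns={'Totalfights_y':'B_Totalfights', "wins_y": "B_wins", "losses_y":"B_losses"}, inplace=True)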
# the column Fight_type still holds too many values, which we extract into new columns before dropping the old column
main_df['Sex'] = np.where(main_df['Fight_type'].str.contains("Women"), 'Women', 'Man')
main_df['Title bout'] = np.where(main_df['Fight_type'].str.contains("Title"), 1, 0)
main_df['Title_bout'] = main_df['Title bout'].astype('bool')
main_df.drop(columns =["Title bout"], inplace = True)
main_df.drop(columns =["Fight_type"], inplace = True)
# creating the score (fight record) for every fighter
# count each fighter's fights in the red and in the blue corner as well as their wins
name_cols = ['First Name', 'Last Name']
red = totalfight_df.groupby(['R_First Name','R_Last Name']).size().rename_axis(name_cols)
blue = totalfight_df.groupby(['B_First Name','B_Last Name']).size().rename_axis(name_cols)
wins = totalfight_df.groupby(['Winner_First Name','Winner_Last Name']).size().rename_axis(name_cols)
# combine both corners; a fighter who never fought in one corner (or never won) gets 0 instead of NaN
record = pd.DataFrame({'Totalfights': red.add(blue, fill_value=0)})
record['wins'] = wins.reindex(record.index, fill_value=0)
# every fight that was not won (including draws and no contests) counts as a loss
record['losses'] = record['Totalfights'] - record['wins']
# add the three columns (Totalfights, wins, losses) to fighter_df
fighter_df = fighter_df.merge(record.reset_index(), how='left', on=name_cols)
# get UFC weight classes (upper weight limits in lbs) from: https://www.ladbrokes.com.au/betting-info/ufc/weight-divisions/
# each fight is classified by the listed weight of the blue fighter; pd.cut includes the upper limit of each bin
bins = [0, 115, 125, 135, 145, 155, 170, 185, 205, np.inf]
labels = ['Strawweight', 'Flyweight', 'Bantamweight', 'Featherweight', 'Lightweight', 'Welterweight', 'Middleweight', 'Light Heavyweight', 'Heavyweight']
main_df['Weight_class'] = pd.cut(main_df['B_Weight (lbs)'], bins=bins, labels=labels).astype(object)
# our dataframe is now tidy and ready for meaningful analyses
main_df.head()
Graph 1: Fights and finishes comparison
The UFC keeps gaining popularity. In its early days, the sport was considered too brutal, and only very few fighters dared to enter the Octagon. Meanwhile, MMA has become a very popular sport. One reason for this is certainly the UFC itself: criticized in its early days, it has become the biggest and most successful organization in the sport. Every fighter would like to join this organization and become a champion in their weight class.
For this reason, we want to create a graph that shows the number of fights per year in this organization, from its beginning in 1993 until 2018. At the same time, we want to show the number of finishes. Finishes are fights that are decided by KO/TKO or submission rather than by decision.
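As a minimal sketch of this definition, the snippet below flags a fight as a finish when its `win_by` label contains "KO/TKO", "TKO - Doctor's Stoppage" or "Submission" (the same labels our notebook checks), and then counts fights, finishes and the finishing ratio per year. The toy fights and dates are made up for illustration, and `finishes_per_year` is a hypothetical helper.

```python
import pandas as pd

# a fight counts as a finish if win_by contains one of these labels
FINISH_PATTERN = "KO/TKO|TKO - Doctor's Stoppage|Submission"


def finishes_per_year(fights):
    """Count fights, finishes and the finishing ratio per year.

    `fights` needs a datetime column "date" and a string column "win_by".
    """
    is_finish = fights["win_by"].str.contains(FINISH_PATTERN, regex=True)
    grouped = is_finish.groupby(fights["date"].dt.year)
    summary = pd.DataFrame({"total_fights": grouped.size(), "finishes": grouped.sum()})
    summary["finishing_ratio"] = summary["finishes"] / summary["total_fights"]
    return summary


fights = pd.DataFrame({
    "date": pd.to_datetime(["1993-11-12", "1993-11-12", "1994-03-11", "1994-03-11"]),
    "win_by": ["KO/TKO", "Submission", "Decision - Unanimous", "TKO - Doctor's Stoppage"],
})
print(finishes_per_year(fights))
```

Summing a boolean flag per group counts the finishes directly, so no string comparison on summary statistics is needed.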
#create new column (finish: yes/no)
main_df.loc[main_df['win_by'].str.contains("KO/TKO"),'finish'] = 'True'
main_df.loc[main_df['win_by'].str.contains("TKO - Doctor's Stoppage"),'finish'] = 'True'
main_df.loc[main_df['win_by'].str.contains("Submission"),'finish'] = 'True'
main_df['finish'] = main_df['finish'].fillna("False")
#creating a new dataframe, to manipulate the values for the plot
new12 = main_df.copy(deep=True)
#setting the date as index, to group by years
new12.set_index("date", inplace=True)
#grouping by year and creating new dataframe
plot_df = new12.groupby(by=[new12.index.year])['finish'].describe()
#counting the fights per year that ended with a finish ('finish' holds the strings "True"/"False")
plot_df['finish'] = (new12['finish'] == "True").groupby(new12.index.year).sum()
plot_df['finish'] = plot_df['finish'].astype(float)
plot_df['count'] = plot_df['count'].astype(float)
plot_df["finishing_ratio"] = plot_df['finish'] / plot_df['count']
plot_df.rename(columns={'count':'Total Fights', 'finish':'Total Finishes'}, inplace=True)
#plotting all fights and all fights with finishes
plot1 = plot_df["Total Fights"].plot.line(figsize=(15, 10), legend=True)
plot1 = plot_df["Total Finishes"].plot.line(figsize=(15, 10), legend=True)
plot1.set_ylabel("Number of fights per year")
plot1.set_xlabel("")
plt.xticks(np.arange(1993, 2020, 1))
As you can see, the number of fights per year was very small at the beginning. In 1993 there were only 8 fights, and the number did not rise much in the following years. From 2004/2005 on, however, the number of fights per year increased rapidly until 2014, when there were 500 fights, and it has stayed at roughly this level since. Interestingly, as the number of fights grew, the ratio of finishes to total fights became smaller.
Graph 2: Finishing ratio
To check this, the next graph shows the ratio of finishes to total fights over the years.
#plotting the finishing ratio
plot2 = plot_df["finishing_ratio"].plot.line(figsize=(15, 10))
plot2.set_ylabel("Finishes as a share of total fights")
plot2.set_xlabel("")
plt.xticks(np.arange(1993, 2020, 1))
As you can see, the ratio became smaller over the years. There can be several reasons for this. MMA is still a very young sport, but in recent years more and more skilled fighters have appeared. Techniques have also improved, and today fighters know that to be successful they have to master all areas of the sport: stand-up, wrestling, and ground fighting. The most important reason, however, is probably that the UFC now only includes the elite of the sport. In the beginning, the UFC had trouble finding fighters who were willing to compete in MMA. Due to the sport's increasing popularity, more and more fighters have appeared, and the UFC no longer has trouble finding fighters; instead, it recruits the best fighters from other organizations worldwide.
Graph 3: How fights get finished
Next, we want to know how fights get finished. There are usually 6 different ways in which a fight can be finished:
Unfortunately, the data does not distinguish between KO and TKO, so these two types were combined. The category "Other" is used if the fight had to be stopped for other reasons, or if the result was later declared invalid, e.g. if a fighter missed weight or was banned for doping.
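The condensing step can be written as one small function that applies the same substring rules in the same order as our notebook: anything containing "Decision" becomes Decision, anything containing "TKO" becomes KO/TKO, and labels containing "Overturned" or "Could" become Other. The order matters, since a label is assigned by the first rule it matches. `condense_win_by` is a hypothetical helper name, and the sample labels are illustrative.

```python
import pandas as pd


def condense_win_by(label):
    """Map a raw win_by label onto a condensed category.

    Rules are checked in order; labels matching no rule are returned unchanged.
    """
    if "Decision" in label:
        return "Decision"
    if "TKO" in label:  # covers "KO/TKO" and "TKO - Doctor's Stoppage"
        return "KO/TKO"
    if "Overturned" in label or "Could" in label:
        return "Other"
    return label


raw = pd.Series(["Decision - Split", "TKO - Doctor's Stoppage", "KO/TKO",
                 "Overturned", "Could Not Continue", "Submission"])
condensed = raw.map(condense_win_by)
print(condensed.value_counts())
```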
#condense win_by column
new12.loc[new12['win_by'].str.contains("Decision"),'win_by'] = 'Decision'
new12.loc[new12['win_by'].str.contains("TKO"),'win_by'] = 'KO/TKO'
new12.loc[new12['win_by'].str.contains("Overturned"),'win_by'] = 'Other'
new12.loc[new12['win_by'].str.contains("Could"),'win_by'] = 'Other'
#plotting finishes
finishes_counts = new12.win_by.value_counts()
plot3 = finishes_counts.plot.pie(figsize=(15, 10), fontsize=11)
plot3.set_ylabel("")
As you can see, most fights end in a decision, which is consistent with the two graphs above. The second most common outcome is KO/TKO, and submissions only come third. This suggests that fighters with better striking skills finish fights early more often than fighters with better ground-fighting skills.
Graph 4: In which round are the most finishes
Next, we want to find out in which round most finishes happen, in percentage terms.
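The proportion we are after is the normalized frequency of the last round among finished fights. A minimal sketch with `value_counts(normalize=True)`, using made-up rounds and finish flags (`finish_share_by_round` is a hypothetical helper):

```python
import pandas as pd


def finish_share_by_round(last_round, finished):
    """Share of all finishes that happened in each round.

    `last_round` holds the round in which each fight ended and `finished`
    is a boolean Series marking fights that ended by KO/TKO or submission.
    """
    return last_round[finished].value_counts(normalize=True).sort_index()


last_round = pd.Series([1, 1, 2, 3, 3, 1])
finished = pd.Series([True, True, False, True, False, True])
shares = finish_share_by_round(last_round, finished)
print(shares)
```

Rounds without any finish simply do not appear in the result, and the shares always sum to 1.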
#show only fights that ended with a finish (KO/TKO or submission)
aax5 = new12.loc[new12.finish == "True"]
# get the share of each last round among all finished fights
lastRound_ratio = aax5.groupby("last_round")["R_First Name"].count() / aax5["R_First Name"].count()
# plot result
plot4 = lastRound_ratio.plot.bar(figsize=(15, 10), title="Proportion of finishes (KO/TKO or submission) by round")
plot4.set_xlabel("Round")
plot4.set_ylabel("Proportion of all finishes")
Of all finishes, most happen in the first round, and the share of finishes decreases with every round. One likely reason is that fighters get tired after the first round and their striking power decreases. Since most fights are scheduled for only 3 rounds and only main events and title fights go up to 5, the percentage of finishes in rounds 4 and 5 is very small.
Graph 5: KO's for each weight class
Many people say that the higher the weight class, the more KOs there are. But, as far as we know, this claim has not been proven with data. Therefore, we filtered our main dataframe to get all fights that ended by KO/TKO. The total number of KOs in each weight class is not meaningful on its own, because some weight classes had more fights than others. Since we created the weight classes earlier, we can now calculate the ratio between KOs and total fights for every weight class.
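Dividing the KO count of each weight class by its total number of fights is the same as taking the mean of a True/False KO flag per class. A minimal sketch on made-up fights shows why the ratio matters: both classes below have exactly one KO, but very different KO rates. Unlike dividing two `value_counts` results, the mean gives 0 rather than NaN for a class without any KO. `ko_rate_by_class` is a hypothetical helper.

```python
import pandas as pd


def ko_rate_by_class(weight_class, win_by):
    """Proportion of fights in each weight class that ended by KO/TKO."""
    is_ko = win_by.eq("KO/TKO")
    # the mean of a boolean flag is the share of True values
    return is_ko.groupby(weight_class).mean().sort_values()


weight_class = pd.Series(["Heavyweight"] * 2 + ["Flyweight"] * 4)
win_by = pd.Series(["KO/TKO", "Decision", "KO/TKO", "Decision", "Decision", "Decision"])
rates = ko_rate_by_class(weight_class, win_by)
print(rates)
```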
# get all fights with KO's
KO_counts = main_df.loc[main_df.win_by == 'KO/TKO']
# get ratio between KO's and total fights
KO_perc = KO_counts.Weight_class.value_counts() / main_df.Weight_class.value_counts()
# sort result ascending
KO_perc.sort_values(inplace=True)
# plot the ratios as a bar chart
ax1 = KO_perc.plot.bar(figsize=(20,10), title="KO's in % of Total Fights")
# print bar plot
ax1
As we can see in the bar plot above, there is a correlation between weight and KOs: the heavier the weight class, the higher the chance of a KO. The only exceptions are the Featherweight and Bantamweight classes, whose order is switched, but the difference between them is very small. Almost 50% of all fights in the heavyweight division end with a KO/TKO. Based on this, we can say that the claim holds.
Graph 6: Average strikes per round compared with proportion of KO/TKO's
Having shown that the heavyweight class has the most KOs, we looked for possible reasons and focused on the strikes in each round. Since we only have the total strikes each fighter (Red and Blue) made in a fight, we divided the total strikes in each weight class by the total number of rounds fought in that class to get the average strikes per round. We plotted the result and labeled each point with its weight class.
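A minimal sketch of this calculation on made-up numbers: per weight class, the strikes of both fighters are summed and divided by the total number of rounds fought. This is a pooled average, so longer fights weigh more than in an average of per-fight rates, and the last, possibly unfinished round counts as a full round. The column names `red_strikes`, `blue_strikes` and `rounds` and the helper `strikes_per_round` are hypothetical.

```python
import pandas as pd


def strikes_per_round(fights):
    """Pooled average number of strikes (both fighters) per round and weight class."""
    grouped = fights.groupby("weight_class")
    total_strikes = grouped["red_strikes"].sum() + grouped["blue_strikes"].sum()
    return total_strikes / grouped["rounds"].sum()


fights = pd.DataFrame({
    "weight_class": ["Heavyweight", "Heavyweight", "Flyweight"],
    "red_strikes": [20, 30, 60],
    "blue_strikes": [10, 30, 60],
    "rounds": [1, 3, 3],
})
print(strikes_per_round(fights))
```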
# get all rounds fought in each weight class (the last round counts as a full round)
tot_rounds = main_df.groupby("Weight_class")["last_round"].sum()
# get mean strikes per round (both fighters) in each weight class, save result in new df
tot_strikes = main_df.groupby("Weight_class")["R_TOTAL_STR."].sum() + main_df.groupby("Weight_class")["B_TOTAL_STR."].sum()
plt6_df = pd.DataFrame({"mean_strike_round": tot_strikes / tot_rounds})
# get proportion of KO/TKO of all fights and change data type
tot_fights = main_df.groupby("Weight_class")["R_First Name"].count()
plt6_df["KO/TKO"] = main_df.loc[main_df.win_by == 'KO/TKO'].groupby("Weight_class")["R_First Name"].count() / tot_fights
plt6_df['KO/TKO'] = plt6_df['KO/TKO'].astype(float)
# plot result
aax1 = plt6_df.plot.scatter(x ='mean_strike_round', y ='KO/TKO', alpha=0.5, figsize=(10, 8), s=300)
aax1.set_ylabel("Proportion of KO/TKO of all fights")
aax1.set_xlabel("Average strikes per round")
# labeling dots for weight classes
for i, txt in enumerate(plt6_df.index):
    aax1.annotate(txt, (plt6_df["mean_strike_round"].iloc[i], plt6_df["KO/TKO"].iloc[i]))
As seen above, a higher number of strikes per round does not go along with a higher proportion of KO/TKOs; if anything, the relationship is reversed. Heavyweight is the weight class with the fewest strikes per round, followed by Light Heavyweight, and so on. So the high proportion of KO/TKOs in the heavyweight class is not caused by the number of strikes in each round. Interestingly, the graph also shows that the higher the weight class, the fewer strikes there are. We therefore needed to look at other variables.
Graph 7: Ratio of significant strikes per round compared with proportion of KO/TKO's
We had not yet found a reason for the high KO/TKO rate in the heavyweight class, but we found that the higher the weight class, the fewer strikes there are. Next, we compared the average significant strike percentage (the SIG_STR (%) columns) with the proportion of KO/TKOs. Changing the focus to this variable shows a completely different picture, as shown below.
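Two different "ratios" can be built from the strike columns, and it is worth keeping them apart. The sketch below, on made-up numbers, computes both the share of significant strikes among all landed strikes and the significant-strike accuracy (landed divided by attempted). As far as we can tell from the UFC statistics, the SIG_STR_pct values in the raw data correspond to the second quantity, but that reading is an assumption. `strike_ratios` is a hypothetical helper.

```python
import pandas as pd


def strike_ratios(sig_landed, sig_attempted, total_landed):
    """Return the share of significant strikes and the significant-strike accuracy."""
    return pd.DataFrame({
        # significant strikes landed as a share of all strikes landed
        "sig_share_of_total": sig_landed / total_landed,
        # significant strikes landed as a share of significant strikes attempted
        "sig_accuracy": sig_landed / sig_attempted,
    })


ratios = strike_ratios(
    sig_landed=pd.Series([17, 30]),
    sig_attempted=pd.Series([37, 60]),
    total_landed=pd.Series([40, 30]),
)
print(ratios)
```

The second fight shows how far apart the two can be: every landed strike was significant (share 1.0), yet only half of the significant strikes attempted landed (accuracy 0.5).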
# get all rounds fought in each weight class
tot_rounds = main_df.groupby("Weight_class")["last_round"].sum()
# get mean significant strikes per round in each weight class and their share of all strikes (not plotted below)
plt6_df["sig_strike_round"] = (main_df.groupby("Weight_class")["R_SIG_STR."].sum() + main_df.groupby("Weight_class")["B_SIG_STR."].sum()) / tot_rounds
plt6_df["ratio_strike_round"] = plt6_df["sig_strike_round"] / plt6_df["mean_strike_round"]
# average significant strike percentage (SIG_STR (%)) of both fighters in each weight class;
# as far as we can tell, this is the share of attempted significant strikes that landed
plt6_df["ratio"] = (main_df.groupby("Weight_class")["R_SIG_STR (%)"].mean() + main_df.groupby("Weight_class")["B_SIG_STR (%)"].mean())/2
# plot result (the KO/TKO proportion was already computed for graph 6)
aax1 = plt6_df.plot.scatter(x ='ratio', y ='KO/TKO', alpha=0.5, figsize=(10, 8), s=300, c="red")
aax1.set_ylabel("Proportion of KO/TKO of all fights")
aax1.set_xlabel("Average significant strike percentage (SIG_STR %)")
# labeling dots for weight classes
for i, txt in enumerate(plt6_df.index):
    aax1.annotate(txt, (plt6_df["ratio"].iloc[i], plt6_df["KO/TKO"].iloc[i]))
Here we found a relationship that supports our hypothesis and offers a possible explanation: the heavyweight class has the highest average significant strike percentage, which goes along with the highest KO/TKO rate.