Exploring FIFA

Verdict- Market Value does affect the Wage of a Player to an extent.

What is the preferred Foot among the players and how does it affect their positioning?

We found that fewer than 25% of the players in our dataset are left-footed (shown in the bar chart below).
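As a rough illustration of how such a share can be computed, here is a minimal sketch. It assumes a pandas DataFrame with a 'Preferred Foot' column, as in the post; the toy rows are made up for the example and are not taken from the FIFA 19 data.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real FIFA 19 frame (dfa in the post).
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "Preferred Foot": ["Right", "Left", "Right", "Right",
                       "Right", "Left", "Right", "Right"],
})

# normalize=True returns proportions instead of raw counts.
foot_share = toy["Preferred Foot"].value_counts(normalize=True)
print(foot_share)

# Share of left-footed players in the toy data.
left_share = foot_share["Left"]
```

On the real dataset the same call on dfa['Preferred Foot'] would give the proportions behind the bar chart.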

To check whether the preferred foot of the player has any impact on the position of a player, we took the proportion of the preferred foot grouped by Position of a player.

    import matplotlib.pyplot as plt

    # Get the required details from the dataframe
    dfo = dfa[['Position', 'Preferred Foot']].groupby('Position')['Preferred Foot'].value_counts().unstack()

    # Top 5 Left foot
    print("Top 5 Left Foot Positions:")
    print(dfo['Left'].sort_values(ascending=False).head(5))

    # Top 5 Right foot
    print("Top 5 Right Foot Positions:")
    print(dfo['Right'].sort_values(ascending=False).head(5))

    # plot data: convert counts to each position's share of that foot
    dfo['Left'] = dfo['Left'] / dfo['Left'].sum()
    dfo['Right'] = dfo['Right'] / dfo['Right'].sum()
    fig, ax = plt.subplots(figsize=(15, 7))
    dfo['Right'].plot(ax=ax)
    dfo['Left'].plot(ax=ax)
    plt.legend(['Right', 'Left'])
    plt.title("Position vs Foot")

It can be observed from the above that the proportion is the same for both left and right foot, with a few exceptions.

This means it hardly matters whether a player is left- or right-footed: the distribution of positions, and hence the demand for one position over another, is roughly the same.

Exploring further, the top 5 positions for each foot (see below) show that CB (Center Back) is the third most common position, and ST (Striker) is in the top 5 for both feet.

There are some striking differences, though: for example, Goalkeepers are mostly right-footed.

Verdict- Yes, the preferred Foot does have an impact, but only a small one.

Furthermore, Striker, Goalkeeper and Center-Back are the top three positions in terms of number of players (refer to the screenshot below).

Can we predict the Value of a player based on its attributes (like accuracy, shot power, reactions, dribbling etc.)?

    # features chosen
    dfv = dfa[['Preferred Foot', 'Position', 'Crossing', 'Finishing', 'HeadingAccuracy',
               'ShortPassing', 'Volleys', 'Dribbling', 'Curve', 'FKAccuracy', 'LongPassing',
               'BallControl', 'Acceleration', 'SprintSpeed', 'Agility', 'Reactions', 'Balance',
               'ShotPower', 'Jumping', 'Stamina', 'Strength', 'LongShots', 'Aggression',
               'Interceptions', 'Positioning', 'Vision', 'Penalties', 'Composure', 'Marking',
               'StandingTackle', 'SlidingTackle', 'GKDiving', 'GKHandling', 'GKKicking',
               'GKPositioning', 'GKReflexes', 'Value_kEuro']]

The Position and Preferred Foot columns are one-hot encoded; after formatting the data and removing NaNs, we split the data and predicted the value using RandomForestRegressor with GridSearchCV for hyper-parameter tuning.

We achieved an R-squared score of 0.42.
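The encoding and cleaning step is described but not shown, so here is a minimal sketch of one way it could look. It assumes pandas' get_dummies for the one-hot encoding and dropna for removing NaNs; the toy frame and the order of the two steps are assumptions made for illustration, not the author's exact code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same kind of columns as dfv in the post.
toy = pd.DataFrame({
    "Preferred Foot": ["Right", "Left", "Right", None],
    "Position": ["ST", "CB", "GK", "ST"],
    "Dribbling": [85.0, 60.0, 20.0, 70.0],
    "Value_kEuro": [50000.0, 12000.0, np.nan, 8000.0],
})

# Assumption: drop incomplete rows before encoding, so that a missing
# category does not silently become an all-zero group of dummy columns.
clean = toy.dropna()

# One-hot encode the two categorical columns; numeric columns pass through.
encoded = pd.get_dummies(clean, columns=["Preferred Foot", "Position"])
print(encoded.columns.tolist())
```

Dropping NaNs first is a design choice in this sketch: get_dummies ignores missing categories by default, so a later dropna would no longer catch those rows.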

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import r2_score
    from sklearn.model_selection import GridSearchCV, train_test_split

    # To predict the "value" based on chosen attributes
    y = dfv['Value_kEuro']
    X = dfv.drop(['Value_kEuro'], axis=1)

    # train test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

    # Create the parameter grid based on the results of random search
    param_grid = {
        'bootstrap': [True],
        'max_depth': [80, 90, 100, 110, 150, 200],
        'max_features': [2, 3, 4],
        'min_samples_leaf': [3, 4, 5],
        'min_samples_split': [8, 10, 12],
        'n_estimators': [100, 200, 300, 1000]
    }

    # Create a base model
    rf = RandomForestRegressor()

    # Instantiate the grid search model
    grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=3, n_jobs=-1, verbose=2)

    # Fit the grid search to the data
    grid_search.fit(X_train, y_train)
    grid_search.best_params_

    # Score the best model found by the search on the held-out test set
    y_pred = grid_search.predict(X_test)
    print("Rsquared: ", r2_score(y_test, y_pred))

Note- The model can be improved further by using different algorithms and/or feature engineering.

Also, using mutual information regression, we found the following to be the Top 5 most important features in deciding the Value of a Player: Reactions, BallControl, Composure, Dribbling and ShortPassing.
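For readers who want to try this kind of ranking, below is a minimal sketch using scikit-learn's mutual_info_regression. The data is synthetic and the relationship between the columns and the target is invented for the example, so the scores say nothing about the real FIFA 19 dataset.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 500

# Synthetic attribute columns (names borrowed from the post, values made up).
X = pd.DataFrame({
    "Reactions": rng.uniform(30, 95, n),
    "Dribbling": rng.uniform(30, 95, n),
    "Jumping": rng.uniform(30, 95, n),
})

# Invented target: driven strongly by Reactions, weakly by Dribbling,
# and not at all by Jumping.
y = 100 * X["Reactions"] + 10 * X["Dribbling"] + rng.normal(0, 50, n)

# Estimate the mutual information between each feature and the target.
mi = mutual_info_regression(X, y, random_state=0)

# Rank features from most to least informative.
ranking = pd.Series(mi, index=X.columns).sort_values(ascending=False)
print(ranking.head(5))
```

On the real data, X would be the encoded feature frame and y the Value_kEuro column; head(5) then gives a Top 5 list like the one above.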

Verdict- Not all features are equally useful; still, one can predict the Market Value given enough data and attributes of Players.

Further, we can also ask fairly straightforward questions of the data (given we have the right amount of data).

Below are two such questions we attempted, followed by the Conclusion.

Clubs with the highest median wages (Top 11)?

    dfa[['Wage_kEuro', 'Club']].groupby(['Club'])['Wage_kEuro'].median().sort_values(ascending=False).head(11)

Players with largest release clause (Top 11)?

    dfa[['ReleaseClaus_kEuro', 'Name']].sort_values(by='ReleaseClaus_kEuro', ascending=False)['Name'].head(11).reset_index(drop=True)

Conclusion

Given the FIFA 19 player dataset, several questions can be asked.

Above we came up with 5 questions that we tried to answer; you may not be interested in the same questions, or may like to explore the dataset from a different perspective.

The exploration we have covered in this blog post is the tip of the iceberg; with advanced machine learning techniques coupled with the right set of questions, a lot can be achieved and understood.

Feel free to use the Notebook here or here and play around with the dataset.

Hope you enjoyed reading the blog post, and that the above EDA gave you an overview of how a given dataset can be approached and which techniques can be used.
