How to create a classification model using Xgboost in Python

XGBoost is one of the most widely used algorithms in machine learning. It builds an ensemble of gradient-boosted decision trees, and it is usually both fast to train and accurate.

The snippet below creates a classification model using the XGBoost algorithm.

#### Create Loan Data for Classification in Python ####
import pandas as pd
import numpy as np
ColumnNames=['CIBIL','AGE', 'SALARY', 'APPROVE_LOAN']
DataValues=[[480, 28, 610000, 'Yes'],
             [480, 42, 140000, 'No'],
             [480, 29, 420000, 'No'],
             [490, 30, 420000, 'No'],
             [500, 27, 420000, 'No'],
             [510, 34, 190000, 'No'],
             [550, 24, 330000, 'Yes'],
             [560, 34, 160000, 'Yes'],
             [560, 25, 300000, 'Yes'],
             [570, 34, 450000, 'Yes'],
             [590, 30, 140000, 'Yes'],
             [600, 33, 600000, 'Yes'],
             [600, 22, 400000, 'Yes'],
             [600, 25, 490000, 'Yes'],
             [610, 32, 120000, 'Yes'],
             [630, 29, 360000, 'Yes'],
             [630, 30, 480000, 'Yes'],
             [660, 29, 460000, 'Yes'],
             [700, 32, 470000, 'Yes'],
             [740, 28, 400000, 'Yes']]

#Create the Data Frame
LoanData=pd.DataFrame(data=DataValues,columns=ColumnNames)
LoanData.head()

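#Separate Target Variable and Predictor Variables
TargetVariable='APPROVE_LOAN'
Predictors=['CIBIL','AGE', 'SALARY']
X=LoanData[Predictors].values
#Recent XGBoost versions expect numeric class labels (0, 1, ...) instead of strings,
#so encode 'No' as 0 and 'Yes' as 1
y=LoanData[TargetVariable].map({'No':0, 'Yes':1}).values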

#Split the data into training and testing set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

###################################################################
###### Xgboost Classification in Python #######
from xgboost import XGBClassifier
clf=XGBClassifier(max_depth=3, learning_rate=0.1, n_estimators=500, objective='binary:logistic', booster='gbtree')

#Printing all the parameters of XGBoost
print(clf)

#Creating the model on Training Data
XGB=clf.fit(X_train,y_train)
prediction=XGB.predict(X_test)

#Measuring accuracy on Testing Data
from sklearn import metrics
print(metrics.classification_report(y_test, prediction))
print(metrics.confusion_matrix(y_test, prediction))

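#Plotting the feature importance for Top 10 most important columns
#Note: %matplotlib inline is a Jupyter/IPython magic command; remove it when running this as a plain Python script
%matplotlib inline
feature_importances = pd.Series(XGB.feature_importances_, index=Predictors)
feature_importances.nlargest(10).plot(kind='barh')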

#Printing some sample values of prediction
TestingDataResults=pd.DataFrame(data=X_test, columns=Predictors)
TestingDataResults['TargetColumn']=y_test
TestingDataResults['Prediction']=prediction
TestingDataResults.head()
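
To make the confusion matrix and the precision/recall figures in the classification report easier to read, here is a minimal sketch that computes them by hand in plain Python. The labels are made-up illustrative values, not output from the model above, and the helper name `confusion_counts` is hypothetical. The matrix layout assumes scikit-learn's convention: rows are actual classes, columns are predicted classes, and labels are in sorted order.

```python
# Illustrative labels only: made-up values, NOT output from the model above
actual = ['Yes', 'No', 'Yes', 'Yes', 'No', 'Yes']
predicted = ['Yes', 'Yes', 'Yes', 'No', 'No', 'Yes']


def confusion_counts(actual, predicted, positive='Yes'):
    """Hypothetical helper: count TP, FP, FN, TN for one positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # correctly predicted positive
        elif a != positive and p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive and p != positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # correctly predicted negative
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(actual, predicted)

# Same layout as scikit-learn with sorted labels ['No', 'Yes']:
# rows = actual class, columns = predicted class
matrix = [[tn, fp],
          [fn, tp]]

# Metrics for the positive class, guarding against division by zero
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = (2 * precision * recall / (precision + recall)
      if (precision + recall) else 0.0)
accuracy = (tp + tn) / len(actual)

print(matrix)
print('precision:', precision, 'recall:', recall, 'f1:', f1)
print('accuracy:', round(accuracy, 3))
```

Pass whichever value marks the positive class in your own data as `positive`. With a test set as small as the one in this example, each of these numbers is based on only a handful of rows, so treat them as a rough indication rather than a reliable estimate.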

Author Details

Farukh Hashmi

Lead Data Scientist

Farukh is an innovator in solving industry problems using artificial intelligence. His expertise is backed by 10 years of industry experience. As a senior data scientist, he is responsible for designing AI/ML solutions that provide maximum gains for clients. As a thought leader, he focuses on solving the key business problems of the CPG industry. He has worked across different domains such as Telecom, Insurance, and Logistics, and with global tech leaders including Infosys, IBM, and Persistent Systems. His passion for teaching inspired him to create this website!

https://thinkingneuron.com/

thinkingneuron@gmail.com

3 thoughts on “How to create a classification model using Xgboost in Python”

Deepti
August 20, 2021 at 10:15 am

Hi! Farukh sir. Can you share a code example for classification and Prediction using XGBoost of a dataset. Your example is really helpful for learning.

1. Farukh Hashmi
  August 20, 2021 at 10:29 am
  
  Hi Deepti,
  
  Thank you for the kind words!
  You can look into any one of the classification case studies in the below link for end-to-end examples.
  https://thinkingneuron.com/python-case-studies/
  
Shah
July 8, 2022 at 10:54 am

Thanks for the guidance, I followed your code for 10K rows and 20 Column (the last column is my target), but the accuracy was 60%, I increased the n-estimator to 10,000, max_depth=5 and learning rate= 0.5, the accuracy increased to 64%. Do you have any clue why I can not get higher accuracy?
