A decision tree is a supervised machine learning algorithm that can be used for both regression and classification.
The code below builds a decision tree classifier on a small loan-approval dataset.
```python
#### Create Loan Data for Classification in Python ####
import pandas as pd
import numpy as np

ColumnNames=['CIBIL','AGE', 'SALARY', 'APPROVE_LOAN']
DataValues=[[480, 28, 610000, 'Yes'],
            [480, 42, 140000, 'No'],
            [480, 29, 420000, 'No'],
            [490, 30, 420000, 'No'],
            [500, 27, 420000, 'No'],
            [510, 34, 190000, 'No'],
            [550, 24, 330000, 'Yes'],
            [560, 34, 160000, 'Yes'],
            [560, 25, 300000, 'Yes'],
            [570, 34, 450000, 'Yes'],
            [590, 30, 140000, 'Yes'],
            [600, 33, 600000, 'Yes'],
            [600, 22, 400000, 'Yes'],
            [600, 25, 490000, 'Yes'],
            [610, 32, 120000, 'Yes'],
            [630, 29, 360000, 'Yes'],
            [630, 30, 480000, 'Yes'],
            [660, 29, 460000, 'Yes'],
            [700, 32, 470000, 'Yes'],
            [740, 28, 400000, 'Yes']]

#Create the Data Frame
LoanData=pd.DataFrame(data=DataValues,columns=ColumnNames)
LoanData.head()

#Separate Target Variable and Predictor Variables
TargetVariable='APPROVE_LOAN'
Predictors=['CIBIL','AGE', 'SALARY']
X=LoanData[Predictors].values
y=LoanData[TargetVariable].values

#Split the data into training and testing set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

###################################################################
###### Single Decision Tree CLASSIFICATION in Python #######
import pandas as pd
from sklearn import tree

#choose from different tunable hyper parameters
clf = tree.DecisionTreeClassifier(max_depth=3,criterion='entropy')

#Printing all the parameters of Decision Trees
print(clf)

#Creating the model on Training Data
DTree=clf.fit(X_train,y_train)
prediction=DTree.predict(X_test)

#Measuring accuracy on Testing Data
from sklearn import metrics
print(metrics.classification_report(y_test, prediction))
print(metrics.confusion_matrix(y_test, prediction))

#Plotting the feature importance for Top 10 most important columns
%matplotlib inline
feature_importances = pd.Series(DTree.feature_importances_, index=Predictors)
feature_importances.nlargest(10).plot(kind='barh')

#Printing some sample values of prediction
TestingDataResults=pd.DataFrame(data=X_test, columns=Predictors)
TestingDataResults['TargetColumn']=y_test
TestingDataResults['Prediction']=prediction
TestingDataResults.head()
```
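The classifier above is created with `criterion='entropy'`, which means candidate splits are scored by how much they reduce the entropy of the class labels. The sketch below works through that calculation by hand for one split on the CIBIL column. It is an illustration only, not scikit-learn's internal code: the helper names `entropy` and `information_gain` and the threshold of 530 are assumptions chosen for this example, and it uses all 20 rows, whereas the model above is trained on the 80% training split.

```python
from collections import Counter
from math import log2

# (CIBIL, APPROVE_LOAN) pairs copied from the loan data above
rows = [(480, 'Yes'), (480, 'No'), (480, 'No'), (490, 'No'), (500, 'No'),
        (510, 'No'), (550, 'Yes'), (560, 'Yes'), (560, 'Yes'), (570, 'Yes'),
        (590, 'Yes'), (600, 'Yes'), (600, 'Yes'), (600, 'Yes'), (610, 'Yes'),
        (630, 'Yes'), (630, 'Yes'), (660, 'Yes'), (700, 'Yes'), (740, 'Yes')]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, threshold):
    """Entropy reduction from splitting rows into value <= threshold and value > threshold."""
    parent = [label for _, label in rows]
    left = [label for value, label in rows if value <= threshold]
    right = [label for value, label in rows if value > threshold]
    # Child entropies are weighted by the share of rows that fall in each child
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(rows)
    return entropy(parent) - weighted

# Hypothetical threshold: halfway between the CIBIL values 510 and 550
parent_entropy = entropy([label for _, label in rows])
gain = information_gain(rows, threshold=530)

print(round(parent_entropy, 3))  # 0.811
print(round(gain, 3))            # 0.616
```

Every applicant with a CIBIL score above 530 was approved, so that side of the split has zero entropy and the split removes most of the uncertainty in the labels. This is consistent with CIBIL being a strong candidate for the first split, although the fitted tree may pick a different threshold because it only sees the training rows.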
Author Details

Lead Data Scientist

Farukh is an innovator in solving industry problems using artificial intelligence. His expertise is backed by 10 years of industry experience. As a senior data scientist, he is responsible for designing AI/ML solutions that provide maximum gains for clients. As a thought leader, he focuses on solving the key business problems of the CPG industry. He has worked across domains such as telecom, insurance, and logistics, and with global tech leaders including Infosys, IBM, and Persistent Systems. His passion for teaching inspired him to create this website!

