Hyperparameter tuning is one of the most important steps in machine learning, because ML algorithms will not produce the highest accuracy out of the box. You need to tune their hyperparameters to achieve the best accuracy. You can follow any one of the strategies below to find the best parameters.
- Manual Search
- Grid Search CV
- Random Search CV
- Bayesian Optimization
In this post, I will discuss Random Search CV. The CV stands for cross-validation.
What is the difference between GridSearchCV and RandomizedSearchCV?
The main difference between these two techniques is the obligation to try all parameter combinations: GridSearchCV has to try ALL of them, whereas RandomizedSearchCV tries only a few ‘random’ combinations out of all the available ones.
For example, with the parameter options below, GridSearchCV will try all 20 combinations (5 values of n_estimators × 2 of criterion × 2 of max_depth = 20), whereas for RandomizedSearchCV you can specify how many of them to try by passing a parameter called “n_iter”. If you keep n_iter=5, any 5 random combinations will be tried.
```python
# Parameters to try
Parameter_Trials = {'n_estimators': [100, 200, 300, 500, 1000],
                    'criterion': ['gini', 'entropy'],
                    'max_depth': [2, 3]}
```
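If you want to verify that combination count yourself, scikit-learn's ParameterGrid can enumerate the full grid that GridSearchCV would have to search. A minimal sketch, reusing the same parameter dictionary:

```python
from sklearn.model_selection import ParameterGrid

# Count every combination GridSearchCV would have to try
Parameter_Trials = {'n_estimators': [100, 200, 300, 500, 1000],
                    'criterion': ['gini', 'entropy'],
                    'max_depth': [2, 3]}
print(len(list(ParameterGrid(Parameter_Trials))))  # 5 * 2 * 2 = 20
```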

In the code below, RandomizedSearchCV will try 5 randomly chosen combinations of hyperparameters.
We have specified cv=5, which means each combination will be tested (cross-validated) 5 times: the data is divided into 5 parts, one part is used as testing data and the other four as training data, and this is repeated so that every part serves as the test set once. The final accuracy for each combination of hyperparameters is the average of these five iterations.
Hence the total number of times the model will be fitted is n_iter × cv = 5 × 5 = 25 times!
n_jobs=1 specifies the number of parallel jobs to run, and verbose=5 controls how much detail is printed while fitting the model; the higher the value, the more details are printed.
```python
###################################################################
#### Create Loan Data for Classification in Python ####

import pandas as pd
import numpy as np

ColumnNames = ['CIBIL', 'AGE', 'SALARY', 'APPROVE_LOAN']
DataValues = [[480, 28, 610000, 'Yes'],
              [480, 42, 140000, 'No'],
              [480, 29, 420000, 'No'],
              [490, 30, 420000, 'No'],
              [500, 27, 420000, 'No'],
              [510, 34, 190000, 'No'],
              [550, 24, 330000, 'Yes'],
              [560, 34, 160000, 'Yes'],
              [560, 25, 300000, 'Yes'],
              [570, 34, 450000, 'Yes'],
              [590, 30, 140000, 'Yes'],
              [600, 33, 600000, 'Yes'],
              [600, 22, 400000, 'Yes'],
              [600, 25, 490000, 'Yes'],
              [610, 32, 120000, 'Yes'],
              [630, 29, 360000, 'Yes'],
              [630, 30, 480000, 'Yes'],
              [660, 29, 460000, 'Yes'],
              [700, 32, 470000, 'Yes'],
              [740, 28, 400000, 'Yes']]

# Create the Data Frame
LoanData = pd.DataFrame(data=DataValues, columns=ColumnNames)
LoanData.head()

# Separate Target Variable and Predictor Variables
TargetVariable = 'APPROVE_LOAN'
Predictors = ['CIBIL', 'AGE', 'SALARY']
X = LoanData[Predictors].values
y = LoanData[TargetVariable].values

############################################################
# Random Search CV
from sklearn.model_selection import RandomizedSearchCV

# Random Forest (Bagging of multiple Decision Trees)
from sklearn.ensemble import RandomForestClassifier
RF = RandomForestClassifier()

# Parameters to try
Parameter_Trials = {'n_estimators': [100, 200, 300, 500, 1000],
                    'criterion': ['gini', 'entropy'],
                    'max_depth': [2, 3]}

Random_Search = RandomizedSearchCV(RF, Parameter_Trials, n_iter=5,
                                   cv=5, n_jobs=1, verbose=5)
RandomSearchResults = Random_Search.fit(X, y)
```
Sample Output
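To see what cv=5 does for a single hyperparameter combination, here is a minimal sketch using scikit-learn's cross_val_score, assuming the X and y arrays from the code above. The specific parameter values are just an illustration:

```python
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

# One hyperparameter combination, cross-validated 5 times
RF_single = RandomForestClassifier(n_estimators=100, criterion='gini', max_depth=2)
scores = cross_val_score(RF_single, X, y, cv=5)

# The average of the 5 fold scores is what the search records for this combination
print(scores.mean())
```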

How to access the best hyperparameters?
The best combination of hyperparameters is stored in the “best_params_” attribute of the search results.
```python
# Fetching the best hyperparameters
RandomSearchResults.best_params_

# All the parameter combinations tried by RandomizedSearchCV
RandomSearchResults.cv_results_['params']
```
Sample Output
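Once you have the best hyperparameters, a common next step is to train a final model with them. A minimal sketch, assuming the RandomSearchResults object and the X and y arrays from the code above:

```python
from sklearn.ensemble import RandomForestClassifier

# Refit a final model using the best hyperparameters found by the search
Best_RF = RandomForestClassifier(**RandomSearchResults.best_params_)
Best_RF.fit(X, y)

# RandomizedSearchCV also refits the best combination on the full data by
# default (refit=True), so the fitted winner is available directly as well
Best_RF = RandomSearchResults.best_estimator_
```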
