How to find the best hyperparameters using GridSearchCV in Python

Hyperparameter tuning is one of the most important steps in machine learning. ML algorithms rarely produce their highest accuracy out of the box; you need to tune their hyperparameters to achieve the best accuracy. You can follow any one of the strategies below to find the best parameters.

In this post, I will discuss Grid Search CV. The CV stands for cross-validation. Grid Search CV exhaustively tries every combination of the parameter values you supply and chooses the best one.

Consider the example below: if you provide a list of values to try for each of three hyperparameters, Grid Search CV will try all possible combinations. In this case, that means 5 × 2 × 2 = 20 combinations of hyperparameters. Each additional hyperparameter multiplies the number of combinations, so the total grows exponentially with the number of hyperparameters, increasing the time taken dramatically. You must be careful to choose only the most important parameters to tune.
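The combination count can be verified with sklearn's ParameterGrid, which enumerates exactly the combinations GridSearchCV would try. The grid values below are hypothetical, chosen only to match the 5 × 2 × 2 example:

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid: 5 values for n_estimators, 2 for criterion, 2 for max_features
param_grid = {
    "n_estimators": [100, 200, 300, 400, 500],
    "criterion": ["gini", "entropy"],
    "max_features": ["sqrt", "log2"],
}

# ParameterGrid enumerates every combination GridSearchCV would try
combinations = list(ParameterGrid(param_grid))
print(len(combinations))  # 5 x 2 x 2 = 20
```

Adding a fourth hyperparameter with, say, 3 values would push this to 60 combinations, which is why the grid should stay small.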

How will I know what values to provide for each hyperparameter?

You can check sample values for each hyperparameter by looking at the online sklearn documentation for the algorithm, or by pressing Shift+Tab after clicking on the algorithm's function in a Jupyter notebook.

With some experience, you will develop a sense of which values work better for most data, and you can prepare a laundry list of good values to try on each dataset.

In the example below, the GridSearchCV function tries out all the parameter combinations provided. Here that turns out to be 20 combinations.
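A minimal sketch of such a search is shown below. The dataset is synthetic (make_classification) and the grid values are hypothetical stand-ins; substitute your own data and value lists:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical data standing in for your own dataset
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# 5 x 2 x 2 = 20 combinations to try
param_grid = {
    "n_estimators": [10, 25, 50, 75, 100],
    "criterion": ["gini", "entropy"],
    "max_features": ["sqrt", "log2"],
}

grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,        # 5-fold cross-validation for each combination
    n_jobs=1,    # number of parallel jobs
    verbose=5,   # print model-fitting details
)
grid_search.fit(X, y)
print(grid_search.best_score_)  # mean cross-validated accuracy of the best combination
```

With cv=5 this fits the Random Forest 20 × 5 = 100 times, which is why the grid and the fold count both matter for runtime.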

For each combination, GridSearchCV also performs cross-validation. You specify the number of folds using the parameter ‘cv’.

cv=5 means the data will be divided into 5 parts: one part is used for testing and the other four for training. This is also known as K-fold cross-validation, here with K=5. The process is repeated 5 times, changing the test part each time, and the final accuracy is the average of these 5 runs.
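The fold-and-average procedure can be seen in isolation with sklearn's cross_val_score, here on a synthetic dataset used purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical data standing in for your own dataset
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# cv=5: the data is split into 5 folds, each used once as the test part
scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0), X, y, cv=5
)
print(scores)         # five accuracy values, one per fold
print(scores.mean())  # the final accuracy is their average
```

GridSearchCV runs exactly this procedure internally for every parameter combination and keeps the combination with the best average.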

Any value between 5 and 10 is good for cross-validation. Remember, the higher the value, the longer the computation takes, because the model is fitted that many more times.

If you choose cv=5 in the case below, the Random Forest model will be fitted 20 × 5 = 100 times.

Notice that row sampling is not done manually here; GridSearchCV handles it based on the ‘cv’ input provided.

n_jobs controls how many jobs run in parallel: n_jobs=1 runs the fits sequentially, while n_jobs=-1 uses all available CPU cores.

verbose=5 means the model-fitting details are printed; the higher the value, the more detail is printed.

Sample Output

[Image: GridSearchCV hyperparameter tuning]

How to access the best hyperparameters?

The best parameters are stored in the “best_params_” attribute of the fitted GridSearchCV object. You can now create the Random Forest model using these best parameters.
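A sketch of this last step, again on hypothetical data with a deliberately small grid so it runs quickly:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical data standing in for your own dataset
X, y = make_classification(n_samples=100, n_features=8, random_state=1)

param_grid = {"n_estimators": [10, 50], "criterion": ["gini", "entropy"]}
grid_search = GridSearchCV(RandomForestClassifier(random_state=1), param_grid, cv=3)
grid_search.fit(X, y)

# best_params_ holds the winning combination as a plain dict
print(grid_search.best_params_)

# Refit the final Random Forest using those best parameters
final_model = RandomForestClassifier(random_state=1, **grid_search.best_params_)
final_model.fit(X, y)
```

Note that GridSearchCV by default already refits the best model on the full data (available as best_estimator_), so the manual refit above is only needed if you want a fresh model object.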

Sample Output:

[Image: Finding best hyperparameters in GridSearchCV]

Author Details
Lead Data Scientist
Farukh is an innovator in solving industry problems using Artificial Intelligence. His expertise is backed by 10 years of industry experience. As a senior data scientist, he is responsible for designing AI/ML solutions that provide maximum gains for clients. As a thought leader, his focus is on solving the key business problems of the CPG industry. He has worked across different domains such as Telecom, Insurance, and Logistics, and with global tech leaders including Infosys, IBM, and Persistent Systems. His passion for teaching inspired him to create this website!
