How to test ML models using K-fold cross-validation in Python

K-fold cross-validation splits the data into K parts (folds), then iteratively uses one part for testing and the remaining K-1 parts as training data, so that every part serves as the test set exactly once.
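To make the splitting concrete, here is a minimal plain-Python sketch of the idea. The helpers `k_fold_indices` and `k_fold_splits` are hypothetical names written for illustration. They keep rows in their original order; in practice the rows are usually shuffled before being cut into folds.

```python
# Minimal sketch of K-fold splitting in plain Python (no libraries).
# k_fold_indices and k_fold_splits are hypothetical helpers for illustration.

def k_fold_indices(n_samples, k):
    """Split row indices 0..n_samples-1 into k contiguous folds.

    When n_samples is not divisible by k, the first (n_samples % k)
    folds get one extra row, so fold sizes differ by at most one.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def k_fold_splits(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    folds = k_fold_indices(n_samples, k)
    for i, test_idx in enumerate(folds):
        # All other folds together form the training set for this iteration.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# 10 rows, K=5: each iteration tests on 2 rows and trains on the other 8.
for train_idx, test_idx in k_fold_splits(10, 5):
    print("test:", test_idx, "train size:", len(train_idx))
```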

How many parts should you divide the data into? Popular choices are:

  • K=5: Divide the data into five parts (20% each). Hence, in every iteration, 20% of the data is used for testing and 80% for training.
  • K=10: Divide the data into ten parts (10% each). Hence, in every iteration, 10% of the data is used for testing and 90% for training.
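Assuming scikit-learn is available, its `KFold` class produces these splits. The sketch below uses 100 placeholder rows (a hypothetical stand-in for real data) to check the 80/20 and 90/10 proportions for K=5 and K=10; `fold_sizes` is a hypothetical helper, and `shuffle=True` with `random_state=42` is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import KFold


def fold_sizes(n_samples, k):
    """Return (train size, test size) for every iteration of K-fold.

    Hypothetical helper: the zero-filled X only supplies the row count.
    """
    X = np.zeros((n_samples, 1))
    kf = KFold(n_splits=k, shuffle=True, random_state=42)
    return [(len(train_idx), len(test_idx)) for train_idx, test_idx in kf.split(X)]


print(fold_sizes(100, 5))   # five iterations of (80, 20)
print(fold_sizes(100, 10))  # ten iterations of (90, 10)
```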

Compared with the bootstrapping approach, which relies on multiple random samples drawn (with replacement) from the full data, K-fold cross-validation is a systematic approach: every row is used for testing exactly once.
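The difference shows up in a small NumPy illustration (not a full bootstrap evaluation). A bootstrap sample of n rows drawn with replacement repeats some rows and, on average, leaves about 37% of rows undrawn, while K-fold assigns every row to exactly one test fold. The data size `n = 1000`, `k = 5`, and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # assumption: 1000 rows, identified only by their index

# Bootstrapping: draw n row indices *with replacement* from the full data.
boot_idx = rng.choice(n, size=n, replace=True)
n_unique = len(np.unique(boot_idx))
print("bootstrap: unique rows drawn =", n_unique)        # roughly 63% of n
print("bootstrap: rows never drawn =", n - n_unique)

# K-fold: shuffle the indices once, then cut them into k disjoint folds,
# so every row lands in exactly one test fold.
k = 5
folds = np.array_split(rng.permutation(n), k)
test_counts = np.bincount(np.concatenate(folds), minlength=n)
print("K-fold: every row tested exactly once?", bool(np.all(test_counts == 1)))
```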

The final accuracy is the average of the accuracies obtained in all K iterations.
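Written as a formula, with s_k denoting the score (accuracy, or whichever metric you chose) measured on the k-th test fold, the final score is the plain mean over the K folds. The five fold scores in the example are hypothetical numbers chosen only to show the arithmetic.

```latex
\bar{s} = \frac{1}{K} \sum_{k=1}^{K} s_k
\qquad \text{e.g. } K = 5:\;
\bar{s} = \frac{0.80 + 0.85 + 0.78 + 0.82 + 0.85}{5} = \frac{4.10}{5} = 0.82
```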

Overall flow of K-fold cross-validation for ML models testing

You can learn more about sampling in the video below.

In the code snippet below, I show you how to perform K-fold cross-validation on a Decision Tree regressor. The same approach applies to all other algorithms.
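A minimal sketch of this workflow with scikit-learn follows. It assumes synthetic data from `make_regression` in place of a real dataset, and the settings (`max_depth=4`, `n_splits=10`, `random_state=0`) are arbitrary choices. `cross_val_score` runs the train-on-K-1-folds, test-on-the-remaining-fold loop and returns one score per fold; note that for a regressor its default score is R², not accuracy.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic regression data stands in for a real dataset.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# The model under test; max_depth=4 is an arbitrary illustrative setting.
model = DecisionTreeRegressor(max_depth=4, random_state=0)

# K=10: each iteration trains on 90% of the rows and tests on the other 10%.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)

# cross_val_score fits a fresh clone of `model` on each training split and
# scores it on the held-out fold (default score for regressors: R^2).
scores = cross_val_score(model, X, y, cv=kfold)

print("Per-fold R^2:", np.round(scores, 3))
print("Mean R^2 across folds:", round(scores.mean(), 3))
```

To score with a different metric, pass e.g. `scoring="neg_mean_absolute_error"` to `cross_val_score`. Swapping in another estimator, such as `sklearn.ensemble.RandomForestRegressor`, requires no other change to the loop.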

Sample Output

K-fold cross validation sample values in Python

In the next post, I will discuss another technique that applies when the data is time-dependent; in that case, time-based systematic sampling is used to test the models.

Author Details
Lead Data Scientist
Farukh is an innovator in solving industry problems using artificial intelligence. His expertise is backed by 10 years of industry experience. As a senior data scientist, he is responsible for designing AI/ML solutions that provide maximum gains for clients. As a thought leader, he focuses on solving key business problems in the CPG industry. He has worked across domains such as Telecom, Insurance, and Logistics, and with global tech leaders including Infosys, IBM, and Persistent Systems. His passion for teaching inspired him to create this website!
