How to test ML models using time-based sampling in Python

Time-based sampling applies when the data depends on time. In that case, the data is split by time. For example, data from the first three quarters of the year is used for training and data from the last quarter is used for testing. The same concept applies to weeks, months, or years.
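As a minimal sketch of the quarter-based split described above, the snippet below builds a small synthetic daily dataset and separates the last quarter from the first three. The DataFrame and the column names `date` and `sales` are assumptions made for illustration, not part of any real dataset.

```python
import pandas as pd

# Hypothetical daily data for one year; the column names are illustrative.
df = pd.DataFrame({"date": pd.date_range("2023-01-01", "2023-12-31", freq="D")})
df["sales"] = range(len(df))

# Quarters 1 to 3 go to training, quarter 4 goes to testing.
is_last_quarter = df["date"].dt.quarter == 4
train = df[~is_last_quarter]
test = df[is_last_quarter]

print(f"Training rows: {len(train)}, testing rows: {len(test)}")
```

Because the split follows the calendar, every training row is older than every testing row, which mirrors how the model would be used on future data.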


Time-based sampling differs from K-fold cross-validation and bootstrapping because the sampling here is purposeful: the split is chosen deliberately to fit the requirement.

Purposeful time based sampling for testing ML models

This section shows how to test a decision tree model by sampling based on time.

Usually the data contains a date column, which you can use to split the data at a chosen threshold date.
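The following is a hedged sketch of that workflow, not the original code from this post. It assumes a classification problem and uses scikit-learn's `DecisionTreeClassifier`. The synthetic data, the column names (`date`, `feature_1`, `feature_2`, `target`), and the threshold date are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration): one row per day for a year.
rng = np.random.default_rng(42)
n = 365
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=n, freq="D"),
    "feature_1": rng.normal(size=n),
    "feature_2": rng.normal(size=n),
})
# Hypothetical target: 1 when the two features sum to more than zero.
df["target"] = (df["feature_1"] + df["feature_2"] > 0).astype(int)

# Split on a date threshold: earlier rows train the model, later rows test it.
threshold = pd.Timestamp("2023-10-01")
train = df[df["date"] < threshold]
test = df[df["date"] >= threshold]

predictors = ["feature_1", "feature_2"]
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(train[predictors], train["target"])

# Evaluate only on the held-out, most recent period.
predictions = model.predict(test[predictors])
accuracy = accuracy_score(test["target"], predictions)
print(f"Training rows: {len(train)}, testing rows: {len(test)}")
print(f"Accuracy on the last quarter: {accuracy:.2f}")
```

With real data, you would replace the synthetic DataFrame with your own and pick the threshold that matches the period you want to hold out. If the target is numeric, `DecisionTreeRegressor` with a regression metric would be the analogous choice.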


Author Details
Lead Data Scientist
Farukh is an innovator in solving industry problems using artificial intelligence. His expertise is backed by 10 years of industry experience. As a senior data scientist, he is responsible for designing AI/ML solutions that provide maximum gains for clients. As a thought leader, his focus is on solving the key business problems of the CPG industry. He has worked across different domains like Telecom, Insurance, and Logistics. He has worked with global tech leaders including Infosys, IBM, and Persistent Systems. His passion to teach inspired him to create this website!
