How to measure the correlation between two categorical variables in Python

This situation arises often in classification problems in machine learning. The target variable is categorical, and the predictors can be either continuous or categorical. When both the target and a predictor are categorical, a Chi-square test of independence can be used to check whether there is evidence of a relationship between them.

The Chi-square test starts from a Null hypothesis (H0) and returns a P-value.

  • Assumption (H0): The two columns are NOT related to each other
  • Result of Chi-Sq Test: The P-value, i.e. the probability of seeing data at least this far from what H0 predicts, if H0 were true
  • More information on ChiSq can be found here
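
To make the mechanics concrete, here is a minimal sketch of how the test statistic and the P-value can be computed from a table of counts. The counts are hypothetical and chosen only for illustration, and the calculation deliberately leaves out the continuity correction that some libraries apply to 2x2 tables by default.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical 2x2 table of observed counts
# (rows: categories of one variable, columns: categories of the other)
observed = np.array([[30, 20],
                     [25, 25]])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()

# Counts we would expect in each cell if H0 (no relationship) were true
expected = row_totals * col_totals / grand_total

# Pearson Chi-square statistic, without continuity correction
chi_sq = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# P-value: probability of a statistic at least this large if H0 were true
p_value = chi2.sf(chi_sq, dof)

print(f"chi-square = {chi_sq:.4f}, dof = {dof}, p-value = {p_value:.4f}")
```

The larger the gap between the observed and expected counts, the larger the statistic and the smaller the P-value.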

The test helps us decide whether the two categorical variables are correlated with each other or not.

In the below scenario, we try to measure the correlation between GENDER and LOAN_APPROVAL.
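
A minimal sketch of such a test, assuming pandas and SciPy are available. The column names GENDER and LOAN_APPROVAL come from the scenario above, but the rows are hypothetical data made up for illustration, not the article's dataset. Note that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data for illustration only (not the article's dataset):
# 60 male applicants (36 approved) and 40 female applicants (22 approved)
loan_data = pd.DataFrame({
    "GENDER": ["Male"] * 60 + ["Female"] * 40,
    "LOAN_APPROVAL": ["Yes"] * 36 + ["No"] * 24 + ["Yes"] * 22 + ["No"] * 18,
})

# Cross tabulation: counts for every GENDER / LOAN_APPROVAL combination
contingency = pd.crosstab(index=loan_data["GENDER"],
                          columns=loan_data["LOAN_APPROVAL"])
print(contingency)

# Chi-square test of independence on the table of counts
chi_sq, p_value, dof, expected = chi2_contingency(contingency)
print(f"The P-value of the Chi-square test is: {p_value:.4f}")
```

With real data, `loan_data` would be replaced by your own DataFrame; the two calls to `pd.crosstab` and `chi2_contingency` stay the same.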

Sample Output:

Figure: Chi-square test between two categorical variables to find the correlation

H0: The variables are not correlated with each other. This is the H0 used in the Chi-square test.

In the above example, the P-value came out higher than 0.05. Hence we fail to reject H0, which means the data show no significant evidence that the variables are correlated with each other.

Conversely, if the two variables are strongly correlated and the sample is large enough, the P-value will be very close to zero, and H0 is rejected when the P-value falls below 0.05.
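
The contrast can be sketched with two hypothetical contingency tables, one where the outcome barely differs between the groups and one where it differs sharply. The counts and the 0.05 threshold are assumptions for illustration.

```python
from scipy.stats import chi2_contingency

# Hypothetical tables of counts (rows: two groups, columns: two outcomes)
tables = {
    "weak": [[30, 20], [25, 25]],    # outcome split is similar in both groups
    "strong": [[50, 10], [8, 32]],   # outcome split differs sharply
}

p_values = {}
for name, table in tables.items():
    chi_sq, p_value, dof, expected = chi2_contingency(table)
    p_values[name] = p_value
    decision = "reject H0" if p_value < 0.05 else "fail to reject H0"
    print(f"{name}: p-value = {p_value:.3g} -> {decision}")
```

Keep in mind that the P-value also depends on the sample size, so a small P-value signals evidence of a relationship rather than how strong that relationship is.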

Author Details
Lead Data Scientist
Farukh is an innovator in solving industry problems using Artificial intelligence. His expertise is backed with 10 years of industry experience. Being a senior data scientist he is responsible for designing the AI/ML solution to provide maximum gains for the clients. As a thought leader, his focus is on solving the key business problems of the CPG Industry. He has worked across different domains like Telecom, Insurance, and Logistics. He has worked with global tech leaders including Infosys, IBM, and Persistent systems. His passion to teach inspired him to create this website!

2 thoughts on “How to measure the correlation between two categorical variables in python”

  1. How to decide what will be H0? The variables are not correlated with each other or The variables are correlated with each other.
