Stats 101: What is Mode value and when to use it?

The mode is the most frequently occurring value in a set.

For example, the mode is ‘2’ in the data set below.
This is because ‘2’ occurs five times, which is more than any other value in this data.

What happens if two values tie for the highest number of occurrences?

Then there are two modes, and the data is known as bimodal.

In the example below, both ‘1’ and ‘2’ are modes, as each of these values occurs twice.
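
Both cases can be checked in a few lines of Python. This is a minimal sketch using `statistics.multimode` from the standard library (Python 3.8 or later). The two lists are made-up illustrations chosen to match the counts described above, not the original data sets.

```python
from statistics import multimode

# Illustrative data (assumed): 2 occurs five times, more than any other value.
single_mode_data = [1, 2, 2, 3, 2, 4, 2, 5, 2]
# Illustrative data (assumed): 1 and 2 both occur twice.
bimodal_data = [1, 1, 2, 2, 3]

# multimode returns every value tied for the highest count,
# in the order each value first appears in the data.
single_modes = multimode(single_mode_data)
bimodal_modes = multimode(bimodal_data)

print(single_modes)   # [2]
print(bimodal_modes)  # [1, 2]
```

A result with two entries is the bimodal case; a result with one entry is an ordinary single mode.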

Mode is a type of average

There are three major types of averages: mean, median, and mode.
Each one has its own purpose and meaning. In general, all of these averages are used to find the central point of a dataset; however, no single one of them gives an accurate picture in every situation.

Simple Definitions

The mode is different from the other two averages in terms of calculation.

Mean = Sum of all values / number of values
Median = The middle value after arranging the data in increasing order (the average of the two middle values when the count is even)
Mode = The most frequently occurring value in the data
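
To make the difference in calculation concrete, here is a rough sketch of the three definitions written out by hand in Python. The function names and the sample list are assumptions for illustration. The even-count rule for the median (averaging the two middle values) is the usual convention. In practice the standard `statistics` module already provides `mean`, `median`, and `mode`.

```python
from collections import Counter

def mean(values):
    # Sum of all values divided by the number of values.
    return sum(values) / len(values)

def median(values):
    # Sort the data, then take the middle value; with an even count,
    # average the two middle values.
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def mode(values):
    # Most frequently occurring value; on a tie, the value seen first wins.
    return Counter(values).most_common(1)[0][0]

sample = [1, 2, 2, 3, 12]  # illustrative data, not from the article
print(mean(sample))    # 4.0
print(median(sample))  # 2
print(mode(sample))    # 2
```

Note how the single large value 12 pulls the mean up to 4.0, while the median and the mode both stay at 2.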

When Should I use Mode?

A simple rule of thumb is to use the mode when you see repetitions in the dataset, that is, when some values occur multiple times. This applies to both numeric and categorical data.

When we are dealing with categorical data in particular, the mean and the median do not make sense, and in such scenarios only the mode can be used to find the average, or central tendency, of the data.

Mode can be used for numeric as well as categorical data

Numeric Data Example

In the scenario below, the data is numeric but contains repeated values.
Since the data is numeric, I can calculate the mean and median to try to find the average value.
However, the best representative of the group is ‘5’, since it occurs more often than any other value in the data, and this is exactly what the mode calculates.

Categorical Data Example

In the scenario below, the data is non-numeric, as it is just a list of fruit names. Hence, we cannot calculate the mean, simply because we cannot add strings. The median cannot be calculated either, since fruit names have no meaningful increasing order in which to arrange them.

But we can count how many times apple is present, how many times banana is present, and so on, and finally select the fruit that is present the maximum number of times. This simple process is nothing but finding the mode.

In the data below, the mode is ‘apple’, since it is present more times than any other fruit:

  • apple-3 times
  • banana-2 times
  • orange-1 time
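
The counting process described above can be sketched in Python with `collections.Counter`. The list reproduces the counts from the example (apple 3, banana 2, orange 1); the order of the items is an assumption.

```python
from collections import Counter

# Fruit counts from the example: apple x3, banana x2, orange x1.
# The order of the items is assumed.
fruits = ["apple", "banana", "apple", "orange", "banana", "apple"]

# Count how many times each fruit is present.
counts = Counter(fruits)

# The most common entry is the mode.
mode_value, mode_count = counts.most_common(1)[0]

print(counts["apple"], counts["banana"], counts["orange"])  # 3 2 1
print(mode_value)  # apple
```

No arithmetic or ordering of the strings is needed, only counting, which is why the mode works on categorical data.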

Use mode to find the central tendency when there are repetitions

Data Science Tip: Missing value treatment in Machine Learning

While preparing data for machine learning, two major types of columns are observed: numeric and categorical.

Numeric data can be of two types: continuous or discrete.

For numeric data, the missing values should be replaced with the median if the data is continuous, i.e. repetitions are rare, e.g. Sales, Profit, Turnover.
But if the numeric data is discrete, i.e. only a few numeric values are repeated, then the missing values for such columns should be replaced with the mode, e.g. Gender (0/1), Job Code (101, 102, 103, 103, etc.)

When we need to replace missing values in a categorical column, use the mode of that column as the replacement, because the mean and median cannot be computed for such data.
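
The rule of thumb above could be sketched as follows, using only the Python standard library. The helper `fill_missing`, its `kind` labels, and the three sample columns are all hypothetical and exist only for this illustration; a real project would more likely use a data-frame library for this step.

```python
from statistics import median, mode

def fill_missing(values, kind):
    """Replace None entries using the rule of thumb above.

    kind is a hypothetical label: "continuous" uses the median,
    anything else ("discrete", "categorical") uses the mode.
    """
    present = [v for v in values if v is not None]
    if kind == "continuous":
        fill = median(present)
    else:
        fill = mode(present)
    return [fill if v is None else v for v in values]

# Made-up columns for illustration.
sales = [100.0, None, 250.0, 175.0]      # continuous numeric
job_code = [101, 102, None, 103, 102]    # discrete numeric
city = ["Pune", None, "Delhi", "Pune"]   # categorical

print(fill_missing(sales, "continuous"))    # [100.0, 175.0, 250.0, 175.0]
print(fill_missing(job_code, "discrete"))   # [101, 102, 102, 103, 102]
print(fill_missing(city, "categorical"))    # ['Pune', 'Pune', 'Delhi', 'Pune']
```

The fill value is computed from the non-missing entries only, so the missing entries do not influence the median or the mode.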

Conclusion:

  • The mode is a type of average
  • Use the mode when you see repetitions in the dataset
  • The mode can be used for numeric as well as categorical data
  • Categorical missing values should be replaced by the mode
  • Discrete numeric missing values should be replaced by the mode

Author Details
Lead Data Scientist
Farukh is an innovator in solving industry problems using artificial intelligence. His expertise is backed by 10 years of industry experience. As a senior data scientist, he is responsible for designing AI/ML solutions that provide maximum gains for clients. As a thought leader, his focus is on solving the key business problems of the CPG industry. He has worked across different domains such as Telecom, Insurance, and Logistics, and with global tech leaders including Infosys, IBM, and Persistent Systems. His passion for teaching inspired him to create this website!
