Filtering data in a Pandas DataFrame

Sometimes you need only a few rows, only a few columns, or a mix of both from your data. This is known as data filtering or data subsetting.

Getting a part of the data based on certain conditions is a daily task for a data scientist! Pandas has good filtering mechanisms that are vectorized, fast, and easy to write!

Listed below are some of the most commonly used filtering mechanisms for DataFrames:

  • Extracting only one column
  • Extracting multiple columns
  • Extracting rows based on a condition on a single column
  • Extracting rows based on conditions on multiple columns

Extracting only one column from a data frame

This is the simplest form of subsetting. All you need to do is provide the name of the column as a string inside square brackets.
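A minimal sketch of single-column selection follows. The `employee_data` table and its values are made up for illustration, since the article's original data is not available; only the `Age` and `Gender` column names come from the text.

```python
import pandas as pd

# Hypothetical sample data (not the article's original table)
employee_data = pd.DataFrame({
    "Name": ["Ram", "Sita", "John", "Mary", "Alex"],
    "Age": [22, 25, 31, 24, 28],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})

# Passing one column name as a string returns that column as a pandas Series
ages = employee_data["Age"]
print(ages)
```

Note that a single string gives back a Series, not a DataFrame.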

Sample Output:

Selecting only one column from a DataFrame

Extracting multiple columns from a data frame

When you want to extract multiple columns, pass the names of the columns as a list. There are two options: either pass the list of columns directly, or store the list in a variable and pass that variable.
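Both options can be sketched as below, again on a made-up `employee_data` table (the data and the variable names are assumptions for illustration).

```python
import pandas as pd

# Hypothetical sample data (not the article's original table)
employee_data = pd.DataFrame({
    "Name": ["Ram", "Sita", "John", "Mary", "Alex"],
    "Age": [22, 25, 31, 24, 28],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})

# Option 1: pass the list of column names directly (note the double brackets)
name_and_age = employee_data[["Name", "Age"]]

# Option 2: store the list in a variable first, then pass the variable
columns_to_keep = ["Name", "Age"]
same_result = employee_data[columns_to_keep]

print(name_and_age)
```

Either way the result is a DataFrame containing only the listed columns, in the order given in the list.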

Sample Output:

Selecting multiple columns from a pandas DataFrame

Extracting rows based on a condition on a single column

A filter condition in Python reads almost like an English statement!

A condition applied to a column produces a boolean vector with one value per row.

In this example, you check whether the Age of each employee is greater than or equal to 25. The boolean vector says True where the condition is true for a row and False otherwise.

This boolean vector can be stored and passed to the data frame to keep only those rows where the condition was True.
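As a rough illustration, the condition can be written and inspected on its own before it is used for filtering. The sample data is hypothetical, and the threshold of 25 is the one used in the filtering step of this article.

```python
import pandas as pd

# Hypothetical sample data (not the article's original table)
employee_data = pd.DataFrame({
    "Name": ["Ram", "Sita", "John", "Mary", "Alex"],
    "Age": [22, 25, 31, 24, 28],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})

# Comparing a column with a value checks every row at once
# and returns a boolean Series (True where Age >= 25)
age_filter = employee_data["Age"] >= 25
print(age_filter)
```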

Sample Output:

Filter condition in Python

Now use the condition to extract only those rows where Age is greater than or equal to 25.
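A sketch of this step, assuming the same made-up `employee_data` table: the stored boolean vector goes inside the square brackets, and only the rows marked True are kept.

```python
import pandas as pd

# Hypothetical sample data (not the article's original table)
employee_data = pd.DataFrame({
    "Name": ["Ram", "Sita", "John", "Mary", "Alex"],
    "Age": [22, 25, 31, 24, 28],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})

# Build the boolean vector, then pass it to the data frame
age_filter = employee_data["Age"] >= 25
older_employees = employee_data[age_filter]
print(older_employees)
```

The filtered rows keep their original index labels; calling `reset_index(drop=True)` on the result renumbers them from zero if that is preferred.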

Sample Output:

Filtering data on a single condition

Extracting rows based on conditions on multiple columns

You can also combine multiple conditions to filter data. These conditions can be combined in the ways listed below:

  • AND operation: Both conditions must be true
  • OR operation: Either of the conditions can be true

The AND operation gives fewer rows (or at most the same number) because a row has to satisfy all the conditions.

The OR operation on the same conditions gives more rows (or at least the same number) because a row has to satisfy only one of the conditions.

This example extracts the rows satisfying two conditions: Age >= 25 and Gender is Female.
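The two combinations can be sketched as follows on a made-up `employee_data` table (data and variable names are assumptions). In pandas, boolean vectors are combined with `&` for AND and `|` for OR; the Python keywords `and` and `or` do not work element-wise on a Series.

```python
import pandas as pd

# Hypothetical sample data (not the article's original table)
employee_data = pd.DataFrame({
    "Name": ["Ram", "Sita", "John", "Mary", "Alex"],
    "Age": [22, 25, 31, 24, 28],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})

# One boolean vector per condition
age_filter = employee_data["Age"] >= 25
gender_filter = employee_data["Gender"] == "Female"

# AND: a row must satisfy both conditions
both = employee_data[age_filter & gender_filter]

# OR: a row must satisfy at least one condition
either = employee_data[age_filter | gender_filter]

print(both)
print(either)
```

If the conditions are written inline rather than stored first, each comparison needs its own parentheses, for example `employee_data[(employee_data["Age"] >= 25) & (employee_data["Gender"] == "Female")]`, because `&` binds more tightly than `>=` and `==`.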

Sample Output:

Filtering data based on multiple conditions

Author Details
Lead Data Scientist
Farukh is an innovator in solving industry problems using artificial intelligence. His expertise is backed by 10 years of industry experience. As a senior data scientist, he is responsible for designing AI/ML solutions that provide maximum gains for clients. As a thought leader, his focus is on solving the key business problems of the CPG industry. He has worked across different domains like Telecom, Insurance, and Logistics. He has worked with global tech leaders including Infosys, IBM, and Persistent Systems. His passion for teaching inspired him to create this website!
