How to remove duplicate rows from a pandas DataFrame

Duplicate rows can be deleted from a pandas DataFrame using the drop_duplicates() function.

You can delete rows whose values are identical in every column by using the default option subset=None.
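Here is a minimal sketch of the default behaviour. It uses a small hypothetical DataFrame, so the column names and values are made up for illustration and are not the article's sample data.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 2 are identical in every column
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice", "Carol"],
    "dept": ["IT", "HR", "IT", "IT"],
    "salary": [50000, 40000, 50000, 50000],
})

# subset=None (the default) compares all columns, so only row 2 is dropped
deduped = df.drop_duplicates()
print(deduped)
print(deduped.index.tolist())  # [0, 1, 3]
```

The original index labels are preserved, so the result skips label 2. Chain `.reset_index(drop=True)` if you want a fresh 0..n-1 index.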

Alternatively, you can choose a set of columns to compare. If two rows have the same values in those columns, the duplicate row is dropped in its entirety, even if its other columns differ.

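The default option keep='first' keeps the first occurrence and deletes every later occurrence of the duplicate row. Use keep='last' to keep the last occurrence instead, or keep=False to drop all copies of a duplicated row.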
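The sketch below compares the three values of the keep parameter on a small hypothetical DataFrame in which one row appears three times. It also calls `DataFrame.duplicated()`, which returns the boolean mask that drop_duplicates() uses to decide which rows to remove.

```python
import pandas as pd

# Hypothetical data: the row ("A", 1) appears three times (index 0, 2 and 3)
df = pd.DataFrame({"key": ["A", "B", "A", "A"], "value": [1, 2, 1, 1]})

# True marks the rows that keep="first" would drop
print(df.duplicated().tolist())  # [False, False, True, True]

first = df.drop_duplicates(keep="first")   # keeps index 0, drops 2 and 3
last = df.drop_duplicates(keep="last")     # keeps index 3, drops 0 and 2
drop_all = df.drop_duplicates(keep=False)  # drops every copy of ("A", 1)

print(first.index.tolist())     # [0, 1]
print(last.index.tolist())      # [1, 3]
print(drop_all.index.tolist())  # [1]
```

In every case the surviving rows keep their original order.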

Sample Data:

[Image: Dropping duplicate rows from the pandas data frame]


Dropping duplicate rows based on a few columns

You can also drop rows based on only a few selected columns by passing the list of columns to the subset parameter of the drop_duplicates() function.
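Here is a minimal sketch of comparing only some of the columns. The DataFrame is hypothetical: rows 0 and 2 share "name" and "dept" but have different salaries.

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 match on "name" and "dept" but not on "salary"
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice", "Carol"],
    "dept": ["IT", "HR", "IT", "IT"],
    "salary": [50000, 40000, 55000, 50000],
})

# Compare only the listed columns; row 2 is dropped as a duplicate of row 0
by_cols = df.drop_duplicates(subset=["name", "dept"])
print(by_cols.index.tolist())  # [0, 1, 3]

# Comparing all columns keeps row 2, because its salary differs from row 0
all_cols = df.drop_duplicates()
print(all_cols.index.tolist())  # [0, 1, 2, 3]
```

When subset is used, the kept row's values in the other columns (here, Alice's salary of 50000) come from whichever occurrence keep selects.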

Sample Output:

[Image: Dropping duplicate rows based on a few columns]

Author Details
Lead Data Scientist
Farukh is an innovator in solving industry problems using artificial intelligence. His expertise is backed by 10 years of industry experience. As a senior data scientist, he is responsible for designing AI/ML solutions that provide maximum gains for clients. As a thought leader, he focuses on solving the key business problems of the CPG industry. He has worked across domains such as telecom, insurance, and logistics, and with global tech leaders including Infosys, IBM, and Persistent Systems. His passion for teaching inspired him to create this website!
