How to join two pandas data frames

You can perform SQL-style join between two pandas data frames as well! all you need is one common column in both the data frames. In SQL it is known as the primary key.

In the below example, two data frames are joined based on the common key “id”.

Sample Output:

Inner Join

An inner join will combine all the columns of the first data frame with the second where common values of ‘id’ columns are found.

Sample Output:

Inner join on two pandas data frames
Inner join on two pandas data frames


Left Join

A left join will keep all the rows from the first data frame and put NA for corresponding missing column values in the second data frame based on the common column. In the below example there were no records for id=104 and 105 in the second data frame, hence the values became NaN.

Sample Output:

Left join on two pandas data frames
Left join on two pandas data frames


Right Join

Right join will keep all the rows of the second data frame and try to find the corresponding rows in the first data frame based on the common column.

In the below example, the second data frame has only 3 rows and all the ids are present in the first column as well, hence, the corresponding values will be fetched.

Sample Output:

Right join between two pandas data frames
Right join between two pandas data frames


What if there is no common column?

If there are no common columns between two data frames and still you want to add few columns from one data frame to another, then you can simply select those columns from the source data frame and assign it to the target data frame.

In the below example, EmployeeData does not has “Dep” and “DOJ” columns, if you simply want to add it without using any common column then simply select those two columns from the DepartmentData and assign it to the EmployeeData.

The only thing you need to understand is, the number of rows must be the same in both the data frames, otherwise, NaN values will appear in place of missing rows as shown in the below example.

Sample Output:

Combining two data frames without common column
Combining two data frames without common column

Author Details
Lead Data Scientist
Farukh is an innovator in solving industry problems using Artificial intelligence. His expertise is backed with 10 years of industry experience. Being a senior data scientist he is responsible for designing the AI/ML solution to provide maximum gains for the clients. As a thought leader, his focus is on solving the key business problems of the CPG Industry. He has worked across different domains like Telecom, Insurance, and Logistics. He has worked with global tech leaders including Infosys, IBM, and Persistent systems. His passion to teach inspired him to create this website!

Leave a Reply!

Your email address will not be published. Required fields are marked *