Creating a new variable (column) in a pandas data frame is an easy task! You can either pass the values of the new column directly, or generate them from the existing columns.
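As a quick illustration of the first option, here is a minimal sketch of passing values directly. The column names 'City' and 'Country' and their values are made up for this example and are not part of the original data.

```python
import pandas as pd

# Same employee data as used in the rest of this article
EmployeeData = pd.DataFrame({'Name': ['ram', 'ravi', 'sham', 'sita', 'gita'],
                             'id': [101, 102, 103, 104, 105],
                             'Gender': ['M', 'M', 'M', 'F', 'F'],
                             'Age': [21, 25, 24, 28, 25]})

# Passing a list of values: one value per row, in row order
# ('City' and its values are hypothetical, for illustration only)
EmployeeData['City'] = ['Delhi', 'Pune', 'Delhi', 'Mumbai', 'Pune']

# Passing a single value: it is repeated for every row
# ('Country' is also a hypothetical column)
EmployeeData['Country'] = 'India'

# Printing data
print(EmployeeData)
```

When a list is passed, it needs exactly one value per row; a list of a different length raises an error.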
The code snippet shown below creates three new columns based on the Age column.
    # Defining Employee Data
    import pandas as pd
    import numpy as np

    EmployeeData = pd.DataFrame({'Name': ['ram', 'ravi', 'sham', 'sita', 'gita'],
                                 'id': [101, 102, 103, 104, 105],
                                 'Gender': ['M', 'M', 'M', 'F', 'F'],
                                 'Age': [21, 25, 24, 28, 25]
                                 })

    # Printing data
    print(EmployeeData)

    # Creating a new variable by adding a constant to an existing variable
    EmployeeData['NewAge'] = EmployeeData['Age'] + 10

    # Printing data
    print(EmployeeData)

    # Creating a new variable by squaring an existing variable
    EmployeeData['AgeSquared'] = EmployeeData['Age'] ** 2

    # Printing data
    print(EmployeeData)

    # Creating a new variable as the natural log of an existing variable
    EmployeeData['AgeLOG'] = np.log(EmployeeData['Age'])

    # Printing data
    print(EmployeeData)
Sample Output:

Sometimes the logic to be applied to each value of an existing column is too complex to fit in one line, so we define a function for it and apply that function to each value of the column. The results are stored as a new column.
The code snippet shown below checks whether each person's age is greater than or equal to a given minimum age. If it is, that person is a senior employee; otherwise, a fresher!
The defined function takes two inputs: “inpAge” (the age of the person) and “minAge” (the minimum age to be senior).
When you apply the “checkAge” function to the Age column with the help of the apply() function, it calls “checkAge” once for each value of Age. That value becomes the first input to “checkAge”, and the second input is supplied in the form of the args tuple. If there are any more inputs, all of them are supplied in the same args tuple.
The result is a column of values, which we store as “EmpType” and then assign to the data frame as the new column “EmployeeType”.
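To make the point about additional inputs concrete, here is a minimal sketch of a function that takes two extra inputs through the args tuple. The function “checkAgeBand”, its “maxAge” input, the “AgeBand” column, and the thresholds 24 and 25 are all hypothetical and chosen only for illustration.

```python
import pandas as pd

# Same employee data as used in the rest of this article
EmployeeData = pd.DataFrame({'Name': ['ram', 'ravi', 'sham', 'sita', 'gita'],
                             'id': [101, 102, 103, 104, 105],
                             'Gender': ['M', 'M', 'M', 'F', 'F'],
                             'Age': [21, 25, 24, 28, 25]})

# Hypothetical function with two extra inputs besides the age itself
def checkAgeBand(inpAge, minAge, maxAge):
    if minAge <= inpAge <= maxAge:
        return 'in band'
    else:
        return 'out of band'

# Each value of Age becomes inpAge (the first input);
# the args tuple supplies minAge and maxAge, in that order
EmployeeData['AgeBand'] = EmployeeData['Age'].apply(checkAgeBand, args=(24, 25))

# Printing data: ages 21 and 28 fall outside the 24 to 25 band
print(EmployeeData)
```

The order of the values in the args tuple must match the order of the function's inputs after the first one. The extra inputs can also be passed by name, for example apply(checkAgeBand, minAge=24, maxAge=25), which may read more clearly when there are several of them.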
    # Defining Employee Data
    import pandas as pd

    EmployeeData = pd.DataFrame({'Name': ['ram', 'ravi', 'sham', 'sita', 'gita'],
                                 'id': [101, 102, 103, 104, 105],
                                 'Gender': ['M', 'M', 'M', 'F', 'F'],
                                 'Age': [21, 25, 24, 28, 25]
                                 })

    # Printing data
    print(EmployeeData)

    # Creating new columns based on some calculation on existing columns
    # Defining a function to fulfill the logic
    def checkAge(inpAge, minAge):
        if inpAge >= minAge:
            return 'senior'
        else:
            return 'fresher'

    # Apply the function to every value in the Age column
    # The args tuple can be used to pass multiple args
    # The first argument of the function is the value from the Age column
    EmpType = EmployeeData['Age'].apply(checkAge, args=(25,))
    EmployeeData['EmployeeType'] = EmpType

    # Printing data
    print(EmployeeData)
Sample Output:

