How to do factor analysis in Python

Factor analysis is an unsupervised machine learning technique that finds hidden groups of columns: sets of correlated variables whose shared variation can be explained by a smaller number of unobserved factors.
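To make the idea of "hidden groups of columns" concrete, here is a minimal sketch on made-up data. It assumes two hidden factors, each driving three observed columns; the variable names and the noise level are illustrative choices, not part of the survey example used in this post.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 500

# Two hidden (latent) factors that are never observed directly
factor_a = rng.normal(size=n_rows)
factor_b = rng.normal(size=n_rows)

def noisy(factor):
    # An observed column = hidden factor + independent noise
    return factor + 0.5 * rng.normal(size=n_rows)

# Columns 0-2 are driven by factor_a, columns 3-5 by factor_b
X = np.column_stack([noisy(factor_a), noisy(factor_a), noisy(factor_a),
                     noisy(factor_b), noisy(factor_b), noisy(factor_b)])

# Columns that share a hidden factor are strongly correlated with each other
corr = np.corrcoef(X, rowvar=False)
within_group = corr[0, 1]   # two columns that share factor_a
across_group = corr[0, 3]   # columns driven by different factors
print(np.round(corr, 2))
```

The printed correlation matrix should show two blocks of strongly correlated columns. Factor analysis tries to recover that kind of block structure from the data alone.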

The code snippet below performs factor analysis on a sample employee attitude survey dataset.

# Sample code to do Factor Analysis in Python
# Creating the employee attitude survey data for Factor Analysis
rating=[43,63,71,61,81,43,58,71,72,67,64,67,69,68,77,81,74,65,65,50,50,64,53,40,63,66,78,48,85,82]
complaints=[51,64,70,63,78,55,67,75,82,61,53,60,62,83,77,90,85,60,70,58,40,61,66,37,54,77,75,57,85,82]
privileges=[30,51,68,45,56,49,42,50,72,45,53,47,57,83,54,50,64,65,46,68,33,52,52,42,42,66,58,44,71,39]
learning=[39,54,69,47,66,44,56,55,67,47,58,39,42,45,72,72,69,75,57,54,34,62,50,58,48,63,74,45,71,59]
raises=[61,63,76,54,71,54,66,70,71,62,58,59,55,59,79,60,79,55,75,64,43,66,63,50,66,88,80,51,77,64]
critical=[92,73,86,84,83,49,68,66,83,80,67,74,63,77,77,54,79,80,85,78,64,80,80,57,75,76,78,83,74,78]
advance=[45,47,48,35,47,34,35,41,31,41,34,41,25,35,46,36,63,60,46,52,33,41,37,49,33,72,49,38,55,39]

# Joining all the vectors together to form the input matrix X
import pandas as pd
SurveyData=list(zip(rating,complaints,privileges,learning,raises,critical,advance))
InpData=pd.DataFrame(data=SurveyData, columns=["rating","complaints","privileges","learning","raises","critical","advance"])
print(InpData.head(10))

# Creating input data numpy array
X=InpData.values

######################################################################

# Exploratory Factor Analysis to find how many factors are present in the data
from factor_analyzer import FactorAnalyzer
fa = FactorAnalyzer()
Factors=fa.fit(X)

# Plotting the scree plot (eigenvalues, largest first)
EigenValues=Factors.get_eigenvalues()[0]
import matplotlib.pyplot as plt
# %matplotlib inline   (Jupyter/IPython magic; uncomment only inside a notebook)
plt.plot(EigenValues)
plt.show()

# Creating 4 Factors based on the scree plot
from sklearn.decomposition import FactorAnalysis
FA=FactorAnalysis(n_components=4, random_state=0)
Factors=FA.fit(X)

# Printing Factor Loadings (rows are factors, columns are the survey variables)
print(pd.DataFrame(FA.components_, columns=InpData.columns, index=['Factor1','Factor2','Factor3','Factor4']))
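Reading a scree plot by eye is subjective. One common heuristic, the Kaiser criterion, keeps only the factors whose eigenvalue of the correlation matrix is greater than 1. The sketch below is one way to apply it with NumPy alone. The helper name kaiser_factor_count and the stand-in data are assumptions made for illustration, and the heuristic may well suggest a different number of factors than the 4 chosen above.

```python
import numpy as np

def kaiser_factor_count(X):
    """Hypothetical helper: count correlation-matrix eigenvalues above 1."""
    corr = np.corrcoef(X, rowvar=False)
    # eigvalsh returns ascending order for symmetric matrices; reverse it
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]
    return int((eigenvalues > 1.0).sum()), eigenvalues

# Stand-in data so the sketch runs on its own: 2 hidden factors, 6 columns
rng = np.random.default_rng(0)
hidden = rng.normal(size=(500, 2))
X_demo = np.repeat(hidden, 3, axis=1) + 0.5 * rng.normal(size=(500, 6))

n_factors, eigenvalues = kaiser_factor_count(X_demo)
print(n_factors, np.round(eigenvalues, 2))
```

To try it on the survey data, pass the X array built above instead of X_demo. The eigenvalues should correspond to the values drawn in the scree plot, although that is worth checking against the factor_analyzer output rather than taking for granted.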

Sample Output
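The printed table has one row per factor and one column per survey variable. A simple way to read such a table is to assign each variable to the factor on which its absolute loading is largest. The sketch below illustrates this. The loading values are made up for illustration and are not the output of the code above.

```python
import numpy as np

variables = ["rating", "complaints", "privileges", "learning",
             "raises", "critical", "advance"]

# Illustrative loadings only (rows = factors, columns = variables); NOT real output
loadings = np.array([
    [0.9, 0.8, 0.3, 0.2, 0.4,  0.1, 0.1],
    [0.2, 0.3, 0.1, 0.8, 0.7,  0.0, 0.6],
    [0.1, 0.1, 0.7, 0.1, 0.2, -0.5, 0.2],
    [0.1, 0.1, 0.2, 0.1, 0.1,  0.2, 0.1],
])

# For each variable, pick the factor with the largest absolute loading
dominant = np.abs(loadings).argmax(axis=0)

groups = {}
for name, factor_idx in zip(variables, dominant):
    groups.setdefault(f"Factor{factor_idx + 1}", []).append(name)
print(groups)
```

With these made-up numbers no variable lands on the fourth factor, which in practice can be a hint that fewer factors would be enough. To apply the same reading to the fitted model, replace loadings with FA.components_.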

Author Details

Farukh Hashmi

Lead Data Scientist

Farukh is an innovator in solving industry problems using artificial intelligence. His expertise is backed by 10 years of industry experience. As a senior data scientist, he is responsible for designing AI/ML solutions that provide maximum gains for clients. As a thought leader, his focus is on solving the key business problems of the CPG industry. He has worked across different domains such as Telecom, Insurance, and Logistics, and with global tech leaders including Infosys, IBM, and Persistent Systems. His passion for teaching inspired him to create this website!

https://thinkingneuron.com/

thinkingneuron@gmail.com
