Factor analysis is an unsupervised machine learning technique that finds hidden groups of columns: sets of variables that move together because they are driven by the same underlying, unobserved factor.
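To make the idea of "hidden groups of columns" concrete, here is a minimal sketch on synthetic data. The two latent factors, the noise level, and all variable names are assumptions made up for illustration and are not part of the survey example. Columns built from the same hidden factor end up strongly correlated, while columns built from different factors do not, and this block structure is what factor analysis tries to recover.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # number of synthetic rows (an arbitrary choice)

# Two hidden (latent) factors that are never observed directly
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)

# Six observed columns: the first three are driven by f1, the last three by f2,
# each with its own independent noise
noise = rng.normal(scale=0.5, size=(n, 6))
data = np.column_stack([f1, f1, f1, f2, f2, f2]) + noise

# Correlation matrix of the observed columns
corr = np.corrcoef(data, rowvar=False)
within = corr[0, 1]  # two columns sharing the same hidden factor
across = corr[0, 3]  # two columns driven by different hidden factors

print(np.round(corr, 2))
print("same-factor correlation:", round(within, 2))
print("cross-factor correlation:", round(across, 2))
```

The printed matrix should show two visibly correlated blocks of three columns each, which correspond to the two hidden factors.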
The code snippet below performs factor analysis on a small employee attitude survey dataset.
# Sample code to do Factor Analysis in Python

# Creating the employee attitude survey data for Factor Analysis
rating=[43,63,71,61,81,43,58,71,72,67,64,67,69,68,77,81,74,65,65,50,50,64,53,40,63,66,78,48,85,82]
complaints=[51,64,70,63,78,55,67,75,82,61,53,60,62,83,77,90,85,60,70,58,40,61,66,37,54,77,75,57,85,82]
privileges=[30,51,68,45,56,49,42,50,72,45,53,47,57,83,54,50,64,65,46,68,33,52,52,42,42,66,58,44,71,39]
learning=[39,54,69,47,66,44,56,55,67,47,58,39,42,45,72,72,69,75,57,54,34,62,50,58,48,63,74,45,71,59]
raises=[61,63,76,54,71,54,66,70,71,62,58,59,55,59,79,60,79,55,75,64,43,66,63,50,66,88,80,51,77,64]
critical=[92,73,86,84,83,49,68,66,83,80,67,74,63,77,77,54,79,80,85,78,64,80,80,57,75,76,78,83,74,78]
advance=[45,47,48,35,47,34,35,41,31,41,34,41,25,35,46,36,63,60,46,52,33,41,37,49,33,72,49,38,55,39]

# Joining all the vectors together to form input matrix X
SurveyData=list(zip(rating,complaints,privileges,learning,raises,critical,advance))

import pandas as pd
InpData=pd.DataFrame(data=SurveyData, columns=["rating","complaints","privileges","learning","raises","critical","advance"])
print(InpData.head(10))

# Creating input data numpy array
X=InpData.values

######################################################################
# Exploratory Factor Analysis to find how many factors are present in the data
from factor_analyzer import FactorAnalyzer
fa = FactorAnalyzer()
fa.fit(X)

# Plotting the scree-plot (eigenvalue of each candidate factor, largest first)
EigenValues=fa.get_eigenvalues()[0]
import matplotlib.pyplot as plt
# Jupyter notebook magic; remove the next line when running as a plain script
%matplotlib inline
plt.plot(EigenValues)

# Creating 4 Factors based on the screeplot
from sklearn.decomposition import FactorAnalysis
FA=FactorAnalysis(n_components=4, random_state=0)
Factors=FA.fit(X)

# Printing Factor Loadings (one row per factor, one column per survey question)
print(pd.DataFrame(FA.components_, columns=InpData.columns, index=['Factor1','Factor2','Factor3','Factor4']))
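Reading the number of factors off a scree plot is a judgment call, so it can help to look at the eigenvalues as numbers too. The sketch below uses only NumPy and the same survey data. It assumes the scree-plot values are the eigenvalues of the column correlation matrix, and it applies the Kaiser criterion (keep factors with eigenvalue above 1), which is one common rule of thumb and not the method used above. The names `survey` and `n_factors_kaiser` are illustrative. The count it suggests may differ from the one chosen by eye from the plot.

```python
import numpy as np

# Survey data copied from the snippet above (30 respondents, 7 questions)
survey = {
    "rating":     [43,63,71,61,81,43,58,71,72,67,64,67,69,68,77,81,74,65,65,50,50,64,53,40,63,66,78,48,85,82],
    "complaints": [51,64,70,63,78,55,67,75,82,61,53,60,62,83,77,90,85,60,70,58,40,61,66,37,54,77,75,57,85,82],
    "privileges": [30,51,68,45,56,49,42,50,72,45,53,47,57,83,54,50,64,65,46,68,33,52,52,42,42,66,58,44,71,39],
    "learning":   [39,54,69,47,66,44,56,55,67,47,58,39,42,45,72,72,69,75,57,54,34,62,50,58,48,63,74,45,71,59],
    "raises":     [61,63,76,54,71,54,66,70,71,62,58,59,55,59,79,60,79,55,75,64,43,66,63,50,66,88,80,51,77,64],
    "critical":   [92,73,86,84,83,49,68,66,83,80,67,74,63,77,77,54,79,80,85,78,64,80,80,57,75,76,78,83,74,78],
    "advance":    [45,47,48,35,47,34,35,41,31,41,34,41,25,35,46,36,63,60,46,52,33,41,37,49,33,72,49,38,55,39],
}
X = np.column_stack(list(survey.values())).astype(float)

# Correlation matrix of the 7 survey columns
corr = np.corrcoef(X, rowvar=False)

# Eigenvalues of the symmetric correlation matrix, sorted largest first
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Kaiser criterion (a heuristic): count eigenvalues greater than 1
n_factors_kaiser = int((eigenvalues > 1.0).sum())

print("Eigenvalues:", np.round(eigenvalues, 3))
print("Factors suggested by the Kaiser criterion:", n_factors_kaiser)
```

Because the eigenvalues of a correlation matrix sum to the number of columns, an eigenvalue above 1 means that factor explains more variance than a single original column would on its own.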
Sample Output

Author Details
Lead Data Scientist
Farukh is an innovator in solving industry problems using artificial intelligence. His expertise is backed by 10 years of industry experience. As a senior data scientist, he is responsible for designing AI/ML solutions that provide maximum gains for clients. As a thought leader, he focuses on solving the key business problems of the CPG industry. He has worked across domains such as Telecom, Insurance, and Logistics, and with global tech leaders including Infosys, IBM, and Persistent Systems. His passion for teaching inspired him to create this website!
