Uniform Manifold Approximation and Projection (UMAP) is a dimensionality reduction algorithm that is often described as an improvement over t-SNE. The basic concept is the same: projecting high-dimensional data into a lower-dimensional space. More information about this algorithm can be found here.
You can learn more about the UMAP algorithm in the video below.
The code snippet below shows how to create a UMAP model in Python, using a small employee attitude survey dataset as the input. The same steps apply to any numeric data.
```python
# Library for UMAP
# !conda install -c conda-forge umap-learn

# Sample code to do UMAP in Python
# %matplotlib inline   (Jupyter magic; uncomment when running in a notebook)
import umap
import pandas as pd
import matplotlib.pyplot as plt

# Creating the employee attitude survey data for UMAP
rating=[43,63,71,61,81,43,58,71,72,67,64,67,69,68,77,81,74,65,65,50,50,64,53,40,63,66,78,48,85,82]
complaints=[51,64,70,63,78,55,67,75,82,61,53,60,62,83,77,90,85,60,70,58,40,61,66,37,54,77,75,57,85,82]
privileges=[30,51,68,45,56,49,42,50,72,45,53,47,57,83,54,50,64,65,46,68,33,52,52,42,42,66,58,44,71,39]
learning=[39,54,69,47,66,44,56,55,67,47,58,39,42,45,72,72,69,75,57,54,34,62,50,58,48,63,74,45,71,59]
raises=[61,63,76,54,71,54,66,70,71,62,58,59,55,59,79,60,79,55,75,64,43,66,63,50,66,88,80,51,77,64]
critical=[92,73,86,84,83,49,68,66,83,80,67,74,63,77,77,54,79,80,85,78,64,80,80,57,75,76,78,83,74,78]
advance=[45,47,48,35,47,34,35,41,31,41,34,41,25,35,46,36,63,60,46,52,33,41,37,49,33,72,49,38,55,39]

# Joining all the vectors together to form input matrix X
SurveyData=list(zip(rating,complaints,privileges,learning,raises,critical,advance))
InpData=pd.DataFrame(data=SurveyData, columns=["rating","complaints","privileges","learning","raises","critical","advance"])
print(InpData.head(10))

# Creating input data numpy array (30 rows x 7 columns)
X=InpData.values

#######################################################################
# Reducing the data to 2 components using UMAP
UMAP_Object=umap.UMAP(n_neighbors=5, min_dist=0.3, n_components=2)
ComponentValues=UMAP_Object.fit_transform(X)

# Creating the dataframe of reduced components
ReducedData=pd.DataFrame(data=ComponentValues, columns=['Comp1','Comp2'])
print(ReducedData.head(10))
```
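The snippet imports Matplotlib but stops at printing the reduced data, so a natural next step is to plot the two components. The following is a minimal sketch of one way to do that, not part of the original code. The helper name `plot_embedding` is hypothetical, and the `demo_components` array is random placeholder data standing in for `ComponentValues` so that the block runs on its own. In practice you would pass `ComponentValues` (or `ReducedData.values`) instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
import numpy as np


def plot_embedding(components, title="UMAP projection"):
    """Scatter-plot a two-column embedding and return the Matplotlib figure.

    `plot_embedding` is a hypothetical helper written for this illustration.
    """
    components = np.asarray(components, dtype=float)
    if components.ndim != 2 or components.shape[1] != 2:
        raise ValueError("expected an array of shape (n_samples, 2)")

    fig, ax = plt.subplots(figsize=(6, 5))
    ax.scatter(components[:, 0], components[:, 1], s=40)

    # Label each point with its row index so it can be traced back to the survey row
    for i, (x, y) in enumerate(components):
        ax.annotate(str(i), (x, y), textcoords="offset points",
                    xytext=(4, 4), fontsize=8)

    ax.set_xlabel("Comp1")
    ax.set_ylabel("Comp2")
    ax.set_title(title)
    return fig


# Placeholder coordinates standing in for ComponentValues (NOT real UMAP output)
rng = np.random.default_rng(0)
demo_components = rng.normal(size=(30, 2))

fig = plot_embedding(demo_components)
fig.savefig("umap_projection.png", dpi=150)
```

In a Jupyter notebook you can drop the `matplotlib.use("Agg")` line and rely on `%matplotlib inline` to display the figure. Because UMAP is stochastic, the plotted positions will generally differ between runs unless a `random_state` is passed to `umap.UMAP`.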
Author Details
Lead Data Scientist
Farukh is an innovator in solving industry problems using artificial intelligence. His expertise is backed by 10 years of industry experience. As a senior data scientist, he is responsible for designing AI/ML solutions that provide maximum gains for clients. As a thought leader, his focus is on solving the key business problems of the CPG industry. He has worked across different domains such as Telecom, Insurance, and Logistics, and with global tech leaders including Infosys, IBM, and Persistent Systems. His passion for teaching inspired him to create this website!