How to create clusters using DBSCAN in Python

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a clustering algorithm that can find clusters in noisy data. It works even on datasets where K-Means fails to find meaningful clusters, such as clusters with non-convex shapes. More information about it can be found here.
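To make the comparison with K-Means concrete, here is a minimal sketch that runs both algorithms on two interleaving half-moons and scores each against the known labels. The dataset settings and the DBSCAN values `eps=0.2` and `min_samples=5` are illustrative assumptions, not tuned values from this post.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-moons: a non-convex shape (illustrative settings)
X, y_true = make_moons(n_samples=500, noise=0.05, random_state=42)

# K-Means is told there are 2 clusters; DBSCAN infers the count from density
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand Index: 1.0 means perfect agreement with the true moons
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)
print(f"K-Means ARI: {ari_kmeans:.2f}")
print(f"DBSCAN ARI:  {ari_dbscan:.2f}")
```

K-Means splits the plane with a straight boundary, so it tends to cut across both moons, while DBSCAN follows the dense curved regions.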

You can learn more about the DBSCAN algorithm in the video below.

The code snippets below create clusters in data using DBSCAN.

Creating data for clustering
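
A minimal sketch of this step, assuming the moons data is generated with scikit-learn's `make_moons`. The sample size, noise level, and random seed are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons

# Generate two interleaving half-moons with a little Gaussian noise.
# n_samples, noise and random_state are illustrative choices.
X, y = make_moons(n_samples=500, noise=0.05, random_state=42)

print(X.shape)       # (500, 2)
print(np.unique(y))  # [0 1]
```

To view the data, a scatter plot such as `plt.scatter(X[:, 0], X[:, 1])` with matplotlib is enough.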

Sample Output:

Moons clustering data for DBSCAN


Finding the best hyperparameters for DBSCAN using the Silhouette Coefficient

The Silhouette Coefficient is calculated for each sample from the mean intra-cluster distance (a) and the mean nearest-cluster distance (b). The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the mean distance between a sample and the points of the nearest cluster that the sample is not a part of. Note that the Silhouette Coefficient is only defined if the number of labels satisfies 2 <= n_labels <= n_samples - 1.
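
The formula can be checked by hand on a tiny example. The sketch below computes a, b, and (b - a) / max(a, b) for one sample and compares the result with scikit-learn's `silhouette_samples`. The six-point dataset is made up for illustration.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# Tiny hand-made dataset: two well-separated groups on a line (illustrative)
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

i = 0  # index of the sample to inspect
dists = np.abs(X - X[i]).ravel()  # distance from sample i to every point

same = labels == labels[i]
same[i] = False                        # exclude the sample itself
a = dists[same].mean()                 # mean intra-cluster distance
# With only two clusters, the other cluster is also the nearest one
b = dists[labels != labels[i]].mean()  # mean nearest-cluster distance
s_manual = (b - a) / max(a, b)

s_sklearn = silhouette_samples(X, labels)[i]
print(a, b)                 # 1.5 11.0
print(s_manual, s_sklearn)  # both are 9.5 / 11
```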

The best value of the Silhouette Coefficient is 1 and the worst value is -1. Values near 0 indicate overlapping clusters. Negative values generally indicate that a sample has been assigned to the wrong cluster.
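
One way to use this score for tuning is a simple grid search over `eps` and `min_samples`, keeping the combination with the highest mean Silhouette Coefficient. The sketch below is a hedged reconstruction, not the post's original snippet: the search ranges and the dataset settings are assumptions, and the guard on the number of unique labels follows the check quoted in the comments on this post.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.metrics import silhouette_score

# Same illustrative moons data as in the data-creation step
X, _ = make_moons(n_samples=500, noise=0.05, random_state=42)

best_score, best_params = float("-inf"), None

# Candidate ranges are assumptions; widen or narrow them for your data
for eps in np.arange(0.05, 0.55, 0.05):
    for min_samples in range(3, 11):
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        # silhouette_score needs at least 2 distinct labels.
        # Noise points (label -1) are counted as one more label here.
        if len(np.unique(labels)) > 1:
            score = silhouette_score(X, labels)
            if score > best_score:
                best_score = score
                best_params = (round(float(eps), 2), min_samples)

print("Best (eps, min_samples):", best_params)
print("Best silhouette score:", round(best_score, 3))
```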

Sample Output:

Finding best hyperparameters for DBSCAN


Creating clusters using the best hyperparameters
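
A minimal sketch of the final fit. The values `eps=0.2` and `min_samples=5` are placeholders, not results from this post; substitute the best values found by your own hyperparameter search.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Same illustrative moons data as in the data-creation step
X, _ = make_moons(n_samples=500, noise=0.05, random_state=42)

# eps and min_samples are placeholder values (assumptions)
db = DBSCAN(eps=0.2, min_samples=5)
labels = db.fit_predict(X)

# DBSCAN marks noise points with the label -1, so exclude it from the count
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print("Clusters found:", n_clusters)
print("Noise points:", n_noise)
```

Passing `c=labels` to a matplotlib scatter plot of `X` colors each point by its assigned cluster.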

DBSCAN clustering in Python

Author Details
Lead Data Scientist
Farukh is an innovator in solving industry problems using artificial intelligence. His expertise is backed by 10 years of industry experience. As a senior data scientist, he is responsible for designing AI/ML solutions that provide maximum gains for clients. As a thought leader, his focus is on solving the key business problems of the CPG industry. He has worked across different domains such as Telecom, Insurance, and Logistics, and with global tech leaders including Infosys, IBM, and Persistent Systems. His passion for teaching inspired him to create this website!

2 thoughts on “How to create clusters using DBSCAN in Python”

  1. Hi! Thanks for the code snippet. Just a heads up it appears there may be a rendering error in line 20:

    if(len(np.unique(db.fit_predict(X)))>1):
