

Density Based Clustering (DeBaCl) Toolbox


Statistics on DeBaCl

Number of watchers on GitHub: 87
Number of open issues: 8
Average time to close an issue: 11 days
Main language: Python
Average time to merge a PR: 1 day
Open pull requests: 2+
Closed pull requests: 1+
Last commit: over 4 years ago
Repo created: almost 8 years ago
Repo last updated: 9 months ago
Size: 5.96 MB
Organization / Author: coaxlab

DeBaCl: DEnsity-BAsed CLustering


DeBaCl is a Python library for density-based clustering with level set trees.

Level set trees are a statistically principled way to represent the topology of a probability density function. This representation is particularly useful for several core tasks in statistics:

  • clustering, especially for data with multi-scale clustering behavior
  • describing data topology
  • exploratory data analysis
  • data visualization
  • anomaly detection

DeBaCl is a Python implementation of the Level Set Tree method, with an emphasis on computational speed, algorithmic simplicity, and extensibility.
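To make the idea of a level set tree concrete, here is a minimal sketch that is independent of DeBaCl. It assumes a crude k-nearest-neighbor density estimate and a symmetric k-nearest-neighbor graph, then counts the connected components that survive at a few density levels. The function names (`knn_density`, `knn_graph`, `count_components`, `level_set_summary`) are illustrative only and are not part of the DeBaCl API; DeBaCl's own estimator and pruning may differ.

```python
import numpy as np


def knn_density(X, k):
    """Unnormalized k-nearest-neighbor density estimate at each row of X."""
    n, d = X.shape
    # Pairwise Euclidean distances (fine for small n; O(n^2) memory).
    dists = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Column 0 of each sorted row is the point itself, so column k is the
    # distance to the k-th nearest neighbor.
    radius = np.sort(dists, axis=1)[:, k]
    return k / (n * radius ** d), dists


def knn_graph(dists, k):
    """Symmetric boolean adjacency matrix of the k-nearest-neighbor graph."""
    n = dists.shape[0]
    neighbors = np.argsort(dists, axis=1)[:, 1:k + 1]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), neighbors.ravel()] = True
    return adj | adj.T


def count_components(adj):
    """Count connected components with an iterative depth-first search."""
    n = adj.shape[0]
    seen = np.zeros(n, dtype=bool)
    n_components = 0
    for start in range(n):
        if seen[start]:
            continue
        n_components += 1
        seen[start] = True
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr in np.flatnonzero(adj[node] & ~seen):
                seen[nbr] = True
                stack.append(nbr)
    return n_components


def level_set_summary(X, k, levels):
    """For each level, report (level, points kept, connected components)."""
    density, dists = knn_density(X, k)
    adj = knn_graph(dists, k)
    summary = []
    for level in levels:
        keep = density >= level  # upper level set of the density estimate
        sub = adj[np.ix_(keep, keep)]
        summary.append((level, int(keep.sum()), count_components(sub)))
    return summary


# Two well-separated synthetic blobs (made-up data for illustration).
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 0.1, (30, 2)), rng.normal(5.0, 0.1, (30, 2))])
density, _ = knn_density(X, k=5)
levels = np.percentile(density, [0, 50, 90])
for row in level_set_summary(X, 5, levels):
    print(row)
```

Tracking how these components appear, split, and vanish as the level rises is what a level set tree records; a real implementation also prunes small components, which this sketch omits.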


DeBaCl is available under the 3-clause BSD license.


DeBaCl is officially tested on Python 2.7 only. Other versions may work, but caveat emptor. The package can be installed from the Python Package Index with pip. From a terminal:

$ pip install debacl

It can also be installed by cloning this GitHub repo. This requires updating the Python path to include the cloned repo. On Linux, this looks something like:

$ git clone https://github.com/CoAxLab/DeBaCl/
$ export PYTHONPATH="$PWD/DeBaCl:$PYTHONPATH"


All of the dependencies are Python packages that can be installed with either conda or pip. DeBaCl 1.0 no longer depends on igraph, which required tricky manual installation.


Python versions:

  • Python 2.7
  • (coming soon: Python 3.4)

Required packages:

  • numpy
  • networkx
  • prettytable

Strongly recommended packages:

  • matplotlib
  • scipy

Optional packages:

  • scikit-learn
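A quick way to see which of the packages listed above are importable in the current environment is sketched below. It uses only the standard library; the helper name `missing_packages` is made up for this example, and note that scikit-learn is imported under the name `sklearn`.

```python
import importlib

REQUIRED = ["numpy", "networkx", "prettytable"]
RECOMMENDED = ["matplotlib", "scipy"]
OPTIONAL = ["sklearn"]  # scikit-learn is imported as "sklearn"


def missing_packages(names):
    """Return the subset of names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


for label, names in [("required", REQUIRED),
                     ("recommended", RECOMMENDED),
                     ("optional", OPTIONAL)]:
    print("missing {0} packages: {1}".format(label, missing_packages(names)))
```

An empty list for the required group means the core of the library should at least be importable; it says nothing about whether the installed versions are compatible.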


Construct the level set tree

import debacl as dcl
from sklearn.datasets import make_moons

X = make_moons(n_samples=100, noise=0.1, random_state=19)[0]

tree = dcl.construct_tree(X, k=10, prune_threshold=10)
print(tree)
| id | start_level | end_level | start_mass | end_mass | size | parent | children |
| 0  |    0.000    |   0.196   |   0.000    |  0.220   | 100  |  None  |  [1, 2]  |
| 1  |    0.196    |   0.396   |   0.220    |  0.940   |  37  |   0    |    []    |
| 2  |    0.196    |   0.488   |   0.220    |  1.000   |  41  |   0    |    []    |

Plot the level set tree

Clusters are represented by the vertical line segments in the dendrogram. In this example the vertical axis is plotted on the density scale, so that the lower endpoint of a cluster's branch is at its start_level and the upper endpoint is at its end_level (see the table above), and the length of the branch is the persistence of the cluster.

fig = tree.plot(form='density')[0]
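The persistence described above can be checked by hand from the printed table. The sketch below copies the three rows into plain Python tuples (it does not call DeBaCl) and computes each branch length as end_level minus start_level.

```python
# (id, start_level, end_level, children), copied from the printed tree table.
nodes = [
    (0, 0.000, 0.196, [1, 2]),
    (1, 0.196, 0.396, []),
    (2, 0.196, 0.488, []),
]

# Persistence of a cluster is the length of its branch on the density scale.
persistence = {
    node_id: round(end - start, 3) for node_id, start, end, _ in nodes
}

# Leaves are the nodes without children.
leaves = [node_id for node_id, _, _, children in nodes if not children]
most_persistent_leaf = max(leaves, key=lambda node_id: persistence[node_id])

print(persistence)
print(most_persistent_leaf)
```

In this example node 2 has the longer branch of the two leaves, so it is the more persistent cluster.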

Query the level set tree for cluster labels

import matplotlib.pyplot as plt

labels = tree.get_clusters(method='leaf')  # each leaf node is a cluster
clusters = X[labels[:, 0], :]

fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c='black', s=40, alpha=0.4)
ax.scatter(clusters[:, 0], clusters[:, 1], c=labels[:, 1], s=80, alpha=0.9)
ax.set_ylabel('x1', rotation=0)
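The indexing in the example above treats `labels` as a two-column array: the first column holds row indices into X and the second holds cluster ids. Leaf clustering does not have to label every point (in the table, the two leaves cover 37 + 41 of the 100 points), so it can be handy to expand the result into one label per row of X. The sketch below uses a small made-up `labels` array as a stand-in for the real output, and the choice of -1 for unlabeled points is a convention assumed here, not something DeBaCl defines.

```python
import numpy as np

n_points = 8

# Hypothetical stand-in for the array returned by tree.get_clusters(...):
# column 0 is the row index into X, column 1 is the cluster id.
labels = np.array([[0, 1], [2, 1], [3, 2], [6, 2], [7, 2]])

# One label per data point; -1 marks points that belong to no cluster.
full_labels = np.full(n_points, -1, dtype=int)
full_labels[labels[:, 0]] = labels[:, 1]

# Cluster sizes, counted over the labeled points only.
cluster_ids, counts = np.unique(labels[:, 1], return_counts=True)
sizes = dict(zip(cluster_ids.tolist(), counts.tolist()))

print(full_labels)
print(sizes)
```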


Running unit tests

From the top level of the repo:

$ nosetests -s -v debacl/test

