dev java, machine_learning

My previous post was about grid-based clustering. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) takes a different approach, called density-based clustering: it grows regions of high density (above a provided threshold) into clusters, which lets it discover clusters of arbitrary shape.

DBSCAN Implementation

First, the data is loaded and a distance matrix is calculated from the data points.
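A minimal Java sketch of the distance-matrix step follows. It assumes points are stored as `double[]` rows and that Euclidean distance is used; the post does not say which metric the implementation uses, and the `DistanceMatrix` class is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: builds a symmetric Euclidean distance matrix (metric is an assumption).
class DistanceMatrix {

    // Returns d where d[i][j] is the Euclidean distance between points i and j.
    static double[][] compute(double[][] points) {
        int n = points.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < points[i].length; k++) {
                    double diff = points[i][k] - points[j][k];
                    sum += diff * diff;
                }
                double dist = Math.sqrt(sum);
                // The matrix is symmetric, so fill both halves at once.
                d[i][j] = dist;
                d[j][i] = dist;
            }
        }
        return d; // the diagonal stays 0.0
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {3, 4}, {6, 8}};
        System.out.println(Arrays.toString(compute(pts)[0])); // [0.0, 5.0, 10.0]
    }
}
```

Precomputing the full matrix costs O(n²) memory, but every later neighbourhood lookup becomes a simple scan of one row.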


The algorithm visits every data point and finds its neighbours. If the neighbourhood is dense enough, the cluster is expanded to include those points as well.


So data points that are close enough to each other are included in the same cluster.
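The visit-and-expand loop can be sketched in Java as below. This follows the textbook DBSCAN formulation and assumes its two usual parameters: `eps` (neighbourhood radius) and `minPts` (minimum neighbourhood size, counting the point itself). These names, the Euclidean metric and the `Dbscan` class are hypothetical, not taken from the original implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of DBSCAN over a precomputed distance matrix.
class Dbscan {
    static final int NOISE = -1;
    private static final int UNVISITED = 0;

    // Euclidean distance matrix (the metric is an assumption).
    static double[][] distances(double[][] pts) {
        int n = pts.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double s = 0.0;
                for (int k = 0; k < pts[i].length; k++) {
                    double diff = pts[i][k] - pts[j][k];
                    s += diff * diff;
                }
                d[i][j] = Math.sqrt(s);
            }
        }
        return d;
    }

    // Returns a label per point: cluster ids 1..k, or NOISE (-1).
    static int[] cluster(double[][] dist, double eps, int minPts) {
        int n = dist.length;
        int[] labels = new int[n]; // all UNVISITED (0) initially
        int clusterId = 0;
        for (int p = 0; p < n; p++) {
            if (labels[p] != UNVISITED) continue;
            List<Integer> neighbours = regionQuery(dist, p, eps);
            if (neighbours.size() < minPts) {
                labels[p] = NOISE; // may later be claimed as a border point
                continue;
            }
            clusterId++;
            labels[p] = clusterId;
            Deque<Integer> seeds = new ArrayDeque<>(neighbours);
            while (!seeds.isEmpty()) {
                int q = seeds.poll();
                if (labels[q] == NOISE) labels[q] = clusterId; // border point, not expanded
                if (labels[q] != UNVISITED) continue;
                labels[q] = clusterId;
                List<Integer> qNeighbours = regionQuery(dist, q, eps);
                if (qNeighbours.size() >= minPts) {
                    seeds.addAll(qNeighbours); // q is dense too: keep growing the cluster
                }
            }
        }
        return labels;
    }

    // All points within eps of p, including p itself.
    private static List<Integer> regionQuery(double[][] dist, int p, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int q = 0; q < dist.length; q++) {
            if (dist[p][q] <= eps) result.add(q);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {50, 50}};
        System.out.println(Arrays.toString(cluster(distances(pts), 1.5, 2))); // [1, 1, 1, 2, 2, -1]
    }
}
```

In the example the two tight groups become clusters 1 and 2, while the isolated point at (50, 50) has too few neighbours and is labelled as noise.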


Joining dense regions is similar to the approach taken in grid-based clustering. The difference is that here clusters are grown from the data points themselves rather than from fixed grid cells, so they can take arbitrary shapes.



The CLIQUE (CLustering In QUEst) algorithm is a grid-based clustering algorithm that partitions each dimension of the dataset into intervals, forming a grid.

CLIQUE Implementation

The algorithm starts by assigning each data point to the grid cell it falls into.
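A minimal two-dimensional sketch of the assignment step follows. It assumes each dimension's range [min, max] is split into the same number of equal-width intervals (`intervals`); the post does not state how its grid is sized, and the `GridAssign` class is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: counts how many 2-D points fall into each grid cell.
class GridAssign {

    // Splits [min, max] of each dimension into `intervals` equal-width slices
    // and returns counts[row][col], the number of points per cell.
    static int[][] countPerCell(double[][] points, int intervals) {
        double minX = Double.POSITIVE_INFINITY, maxX = Double.NEGATIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] p : points) {
            minX = Math.min(minX, p[0]);
            maxX = Math.max(maxX, p[0]);
            minY = Math.min(minY, p[1]);
            maxY = Math.max(maxY, p[1]);
        }
        int[][] counts = new int[intervals][intervals];
        for (double[] p : points) {
            int col = cellIndex(p[0], minX, maxX, intervals);
            int row = cellIndex(p[1], minY, maxY, intervals);
            counts[row][col]++;
        }
        return counts;
    }

    // Maps a coordinate to its interval; the maximum value goes into the last cell.
    private static int cellIndex(double v, double min, double max, int intervals) {
        if (max == min) return 0;
        int idx = (int) ((v - min) / (max - min) * intervals);
        return Math.min(idx, intervals - 1);
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {1, 0}, {9, 9}, {10, 10}};
        System.out.println(Arrays.deepToString(countPerCell(pts, 2))); // [[2, 0], [0, 2]]
    }
}
```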


After loading, the data looks something like this:


For a grid cell to be considered “dense”, it must contain at least as many data points as the threshold. In this sample the threshold was set to 2, so after the adjacent dense cells are joined together we get 3 clusters:
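The dense-cell test and the joining step can be sketched as a flood fill over the grid of counts. This sketch assumes two cells are "adjacent" when they share an edge; the class and method names are hypothetical, and the sample grid is made up for illustration rather than taken from the post.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical sketch: marks dense cells and joins edge-adjacent ones into clusters.
class DenseCellJoiner {

    // Labels each dense cell (count >= threshold) with a cluster id starting at 1;
    // cells that are not dense get 0.
    static int[][] joinDenseCells(int[][] counts, int threshold) {
        int rows = counts.length, cols = counts[0].length;
        int[][] labels = new int[rows][cols];
        int[][] moves = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}}; // edge-sharing neighbours
        int next = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (counts[r][c] < threshold || labels[r][c] != 0) continue;
                next++; // a new, not yet labelled dense cell starts a new cluster
                labels[r][c] = next;
                Deque<int[]> stack = new ArrayDeque<>();
                stack.push(new int[]{r, c});
                while (!stack.isEmpty()) {
                    int[] cell = stack.pop();
                    for (int[] m : moves) {
                        int nr = cell[0] + m[0], nc = cell[1] + m[1];
                        if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                        if (counts[nr][nc] >= threshold && labels[nr][nc] == 0) {
                            labels[nr][nc] = next;
                            stack.push(new int[]{nr, nc});
                        }
                    }
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        int[][] counts = {
            {2, 3, 0, 0},
            {0, 0, 0, 1},
            {4, 0, 2, 2},
        };
        System.out.println(Arrays.deepToString(joinDenseCells(counts, 2)));
        // [[1, 1, 0, 0], [0, 0, 0, 0], [2, 0, 3, 3]]
    }
}
```

With a threshold of 2, the cell holding a single point stays unlabelled, and the five dense cells merge into three clusters.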




A rough set is an approximation of a set by a pair of sets: the lower and upper borders (approximations) of the set.
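For reference, the standard (Pawlak) definitions are given below. Here $U$ is the set of all objects, $X \subseteq U$ is the target set, $B$ is the chosen set of attributes, and $[x]_B$ is the class of objects that have the same values as $x$ on every attribute in $B$. These symbols are the textbook notation, not names used in the sample implementation.

```latex
\underline{B}X = \{\, x \in U : [x]_B \subseteq X \,\}
\qquad
\overline{B}X = \{\, x \in U : [x]_B \cap X \neq \emptyset \,\}
\qquad
BN_B(X) = \overline{B}X \setminus \underline{B}X
```

The lower approximation contains objects that certainly belong to $X$, the upper approximation contains objects that possibly belong to it, and the boundary $BN_B(X)$ is the difference between the two.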

Rough set implementation

The sample implementation starts by reading the data set and the configuration file. The object and attribute index lists are read from the configuration. The items whose indices appear in the object list form the set of objects, and the remaining items form its complement.
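The split into objects and complement can be sketched as follows. This assumes rows are already parsed into `String[]` values and that the object list holds zero-based row indices; the configuration format is not shown in the post, so parsing is left out, and `ObjectPartition` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: splits data rows into the object set and its complement.
class ObjectPartition {
    final List<String[]> objects = new ArrayList<>();
    final List<String[]> complement = new ArrayList<>();

    // Rows whose index is in objectIndices go to `objects`; all others go to `complement`.
    static ObjectPartition split(List<String[]> rows, List<Integer> objectIndices) {
        Set<Integer> wanted = new HashSet<>(objectIndices); // O(1) membership checks
        ObjectPartition p = new ObjectPartition();
        for (int i = 0; i < rows.size(); i++) {
            if (wanted.contains(i)) {
                p.objects.add(rows.get(i));
            } else {
                p.complement.add(rows.get(i));
            }
        }
        return p;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"a", "1"}, new String[]{"b", "2"},
                new String[]{"c", "3"}, new String[]{"d", "4"});
        ObjectPartition p = split(rows, Arrays.asList(0, 2));
        System.out.println(p.objects.size() + " objects, " + p.complement.size() + " in complement");
        // 2 objects, 2 in complement
    }
}
```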


For each index in the attribute index list, the attribute values of an object are compared with those of the objects in the complementary object list. If they are not equal, the object is considered to belong to the negative border; otherwise it belongs to the positive border.
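For comparison, here is a sketch of the textbook (Pawlak) lower and upper approximations computed from the attribute indices. This is the standard formulation, swapped in for illustration; it is not necessarily the rule the sample implementation applies, and its "lower/upper/boundary" terms may not map one-to-one onto the post's "positive/negative border". The class name and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of Pawlak lower/upper approximations over row indices.
class RoughApproximation {
    final Set<Integer> lower = new TreeSet<>();
    final Set<Integer> upper = new TreeSet<>();

    static RoughApproximation of(List<String[]> rows, Set<Integer> target, int[] attributes) {
        // Group rows that are indiscernible, i.e. equal on every selected attribute.
        Map<List<String>, List<Integer>> classes = new LinkedHashMap<>();
        for (int i = 0; i < rows.size(); i++) {
            List<String> key = new ArrayList<>();
            for (int a : attributes) key.add(rows.get(i)[a]);
            classes.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        RoughApproximation result = new RoughApproximation();
        for (List<Integer> members : classes.values()) {
            if (target.containsAll(members)) {
                result.lower.addAll(members); // class lies entirely inside the target set
            }
            if (members.stream().anyMatch(target::contains)) {
                result.upper.addAll(members); // class overlaps the target set
            }
        }
        return result;
    }

    // Boundary region: upper approximation minus lower approximation.
    Set<Integer> boundary() {
        Set<Integer> b = new TreeSet<>(upper);
        b.removeAll(lower);
        return b;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"high", "yes"}, new String[]{"high", "no"},
                new String[]{"low", "no"}, new String[]{"high", "yes"},
                new String[]{"low", "yes"});
        Set<Integer> target = new HashSet<>(Arrays.asList(0, 1, 4));
        RoughApproximation r = of(rows, target, new int[]{0, 1});
        System.out.println("lower=" + r.lower + " upper=" + r.upper + " boundary=" + r.boundary());
        // lower=[1, 4] upper=[0, 1, 3, 4] boundary=[0, 3]
    }
}
```

Row 0 is in the target set but is indiscernible from row 3, which is not, so in this formulation both rows land in the boundary rather than the lower approximation.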

Sample Output

RoughSet Sample Output