dev java, machine_learning

CLIQUE (CLustering In QUEst) algorithm is a grid-based clustering algorithm that partitions each dimension of the dataset as a grid.

CLIQUE Implementation

The algorithm starts by assigning each data point to the closest grid in the matrix.

CLIQUE 1

After loading the data it looks something like this:

CLIQUE 2

In order for a grid cell to be considered as “dense” it has to have data points more than or equal to the threshold. In this sample the threshold was provided as 2. So after the adjacent dense cells are joined together we get 3 clusters:

CLIQUE 3

Resources

dev java, machine_learning

A rough set is an approximation of a set that gives the lower and upper borders of the set.

Rough set implementation

The sample implementation starts off with reading the data set and the configuration file. The object and attribute index lists are acquired from the configuration. The items that have indices specified in the object list are considered as the set of objects and the rest of the items are the complement of the data set.

RoughSet 1

If a given index in attribute index list equal in both object list and complementary object list are not equal the object is considered to belong to the negative border, if not it belongs to the positive border.

Sample Output

RoughSet Sample Output

Resources

dev java, machine_learning

Genetic algorithm sounds fascinating to me as in a way it mimics evolution. It’s quite easy to implement and understand.

Genetic Algorithm

The algorithm has 3 stages:

  • Selection: Every generation is subjected to a selection process which eliminates the unfit ones.
  • Cross-over: This happens randomly among the selected generation and some genomes swap genes.
  • Mutation: Some genes are mutated randomly based on a given threshold value for mutation.

Implementation

The implementation starts with generating random bit sequences to represent genomes. The length of the genome and the number of genomes in a generation are specified in the configuration.

Genetic Algorithm 1

First generation is calculated randomly and their fitness is calculated based on the fitness function the system uses. After the groundwork has been done we have a genome pool each with their own fitness values and selection ranges. The width of the selection range is determined by the fitness of the genome. So that the larger the range the more likely they are selected for the next generation.

Genetic Algorithm 2

In selection process, a random value is generated and the genome whose selection range includes that value is selected. Same genome can be selected more than once.

Genetic Algorithm 3

After the next generation is selected and their new fitness values are calculated cross-over starts. In cross-over 2 random genomes are picked and they swap genes. The point in the genome string is specified in the configuration (3 in this example)

Genetic Algorithm 4

The final phase of the algorithm is mutation. In this step a number of genomes are picked and some bits is flipped randomly.

Genetic Algorithm 5

The most important thing about this algorithm is having a good fitness function that can accurately compute the fitness values for a given feature.

Resources