-->

dev java, machine_learning

Genetic algorithms fascinate me because, in a way, they mimic evolution. They are also quite easy to understand and implement.

Genetic Algorithm

The algorithm has 3 stages:

  • Selection: Every generation is subjected to a selection process which eliminates the unfit genomes.
  • Cross-over: Randomly chosen genomes from the selected generation swap genes with each other.
  • Mutation: Some genes are mutated randomly based on a given threshold value for mutation.

Implementation

The implementation starts with generating random bit sequences to represent genomes. The length of the genome and the number of genomes in a generation are specified in the configuration.

Genetic Algorithm 1
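To make the first step concrete, here is a minimal Java sketch of generating a random population. The class and method names (`RandomPopulation`, `generate`, `toBitString`) and the sizes used in `main` are made up for illustration; they are not taken from the original project, which reads these values from its configuration.

```java
import java.util.Random;

class RandomPopulation {
    // Builds populationSize genomes, each a random sequence of genomeLength bits.
    static boolean[][] generate(int populationSize, int genomeLength, Random random) {
        boolean[][] population = new boolean[populationSize][genomeLength];
        for (boolean[] genome : population) {
            for (int i = 0; i < genome.length; i++) {
                genome[i] = random.nextBoolean();
            }
        }
        return population;
    }

    // Renders a genome as a string of 0s and 1s for printing.
    static String toBitString(boolean[] genome) {
        StringBuilder sb = new StringBuilder(genome.length);
        for (boolean bit : genome) {
            sb.append(bit ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical configuration: 4 genomes of 8 bits each.
        for (boolean[] genome : generate(4, 8, new Random())) {
            System.out.println(toBitString(genome));
        }
    }
}
```

Passing the `Random` instance in as a parameter keeps the sketch easy to seed, which helps when you want repeatable runs.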

The first generation is generated randomly, and the fitness of each genome is calculated with the fitness function the system uses. Once this groundwork is done, we have a pool of genomes, each with its own fitness value and selection range. The width of a genome's selection range is determined by its fitness, so the larger the range, the more likely the genome is to be selected for the next generation.

Genetic Algorithm 2

In the selection process, a random value is generated and the genome whose selection range includes that value is selected. The same genome can be selected more than once.

Genetic Algorithm 3
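Fitness-proportional selection ranges and the random pick can be sketched roughly as follows. This is a simplified illustration, not the original code: it assumes fitness values are non-negative with a positive total, and the names `RouletteSelection`, `selectionRanges`, `pick` and `select` are hypothetical.

```java
import java.util.Random;

class RouletteSelection {
    // Upper bound of each genome's selection range:
    // upperBounds[i] is the sum of fitness[0..i], so the width of range i is fitness[i].
    static double[] selectionRanges(double[] fitness) {
        double[] upperBounds = new double[fitness.length];
        double total = 0.0;
        for (int i = 0; i < fitness.length; i++) {
            total += fitness[i];
            upperBounds[i] = total;
        }
        return upperBounds;
    }

    // Returns the index of the genome whose selection range contains value.
    static int pick(double[] upperBounds, double value) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (value < upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length - 1; // guards against floating-point rounding
    }

    // Selects a generation of the same size; the same genome may be picked repeatedly.
    static int[] select(double[] fitness, Random random) {
        double[] upperBounds = selectionRanges(fitness);
        double total = upperBounds[upperBounds.length - 1];
        int[] selected = new int[fitness.length];
        for (int i = 0; i < selected.length; i++) {
            selected[i] = pick(upperBounds, random.nextDouble() * total);
        }
        return selected;
    }
}
```

With fitness values 1, 3 and 6 the ranges become [0, 1), [1, 4) and [4, 10), so the third genome is six times as likely to be picked as the first.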

After the next generation is selected and the new fitness values are calculated, cross-over starts. In cross-over, 2 random genomes are picked and they swap genes. The point in the genome string at which the swap happens is specified in the configuration (3 in this example).

Genetic Algorithm 4
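A single-point cross-over might look like the sketch below. It assumes that the genes swapped are everything from the configured point to the end of the genome; the original figure is not available here, so treat that as a guess. The class and method names are invented for the example.

```java
class SinglePointCrossover {
    // Swaps everything from index point onwards between the two genomes
    // and returns the two resulting children.
    static String[] crossover(String first, String second, int point) {
        if (first.length() != second.length()) {
            throw new IllegalArgumentException("Genomes must have the same length");
        }
        if (point < 0 || point > first.length()) {
            throw new IllegalArgumentException("Cross-over point is outside the genome");
        }
        String childA = first.substring(0, point) + second.substring(point);
        String childB = second.substring(0, point) + first.substring(point);
        return new String[] {childA, childB};
    }

    public static void main(String[] args) {
        // Cross-over point 3, as in the example configuration.
        String[] children = crossover("11111111", "00000000", 3);
        System.out.println(children[0]); // 11100000
        System.out.println(children[1]); // 00011111
    }
}
```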

The final phase of the algorithm is mutation. In this step, a number of genomes are picked and some of their bits are flipped randomly.

Genetic Algorithm 5
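Here is one possible way to sketch mutation. Note that it uses a simplified variant: rather than first picking a number of genomes, it visits every bit of a genome and flips it with a probability equal to the mutation threshold. The name `BitFlipMutation` and the parameter `mutationThreshold` are assumptions made for this example.

```java
import java.util.Random;

class BitFlipMutation {
    // Returns a copy of the genome in which each bit has been flipped
    // with probability mutationThreshold (a value between 0.0 and 1.0).
    static boolean[] mutate(boolean[] genome, double mutationThreshold, Random random) {
        boolean[] mutated = genome.clone();
        for (int i = 0; i < mutated.length; i++) {
            if (random.nextDouble() < mutationThreshold) {
                mutated[i] = !mutated[i];
            }
        }
        return mutated;
    }
}
```

A small threshold such as 0.01 keeps most genomes intact while still introducing the occasional new bit pattern that selection and cross-over alone could not produce.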

The most important thing about this algorithm is having a good fitness function that can accurately compute the fitness values for a given feature.


dev csharp, machine_learning

I keep reviving my old projects. This is my second data mining algorithm implementation: another clustering algorithm, called k-means.

k-means Algorithm

The algorithm groups n data points into k clusters. First, the cluster centres are picked randomly from the data points. Then the entire dataset is iterated and each point is assigned to its closest cluster, which is determined by measuring the distance from the data point to the centroid of each cluster. The centroids are then recalculated from the points assigned to them, and the process is repeated until no assignment changes and every point belongs to its closest cluster.

K-means Results
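The loop described above can be sketched in a few lines. This is a minimal Java illustration, not the code from the C# project: the class `KMeansSketch`, its method names and the use of squared Euclidean distance are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class KMeansSketch {
    // Picks k distinct data points at random as the initial cluster centres (k <= points.length).
    static double[][] pickInitialCentroids(double[][] points, int k, Random random) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, random);
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[indices.get(c)].clone();
        }
        return centroids;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    static int closestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Returns the cluster index of every point; centroids is updated in place.
    static int[] cluster(double[][] points, double[][] centroids) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Assignment step: move every point to its closest centroid.
            for (int p = 0; p < points.length; p++) {
                int closest = closestCentroid(points[p], centroids);
                if (closest != assignment[p]) {
                    assignment[p] = closest;
                    changed = true;
                }
            }
            // Update step: each centroid becomes the mean of its assigned points.
            for (int c = 0; c < centroids.length; c++) {
                double[] sum = new double[centroids[c].length];
                int count = 0;
                for (int p = 0; p < points.length; p++) {
                    if (assignment[p] == c) {
                        for (int d = 0; d < sum.length; d++) {
                            sum[d] += points[p][d];
                        }
                        count++;
                    }
                }
                if (count > 0) { // an empty cluster keeps its previous centroid
                    for (int d = 0; d < sum.length; d++) {
                        centroids[c][d] = sum[d] / count;
                    }
                }
            }
        }
        return assignment;
    }
}
```

Calling `cluster(points, pickInitialCentroids(points, k, new Random()))` runs the whole procedure. Because the starting centres are random, different runs can end in different clusterings.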

Implementation

The project contains 6 libraries:

  • VP.KMeans.Core: Core library including the algorithm implementation
  • VP.KMeansClient.GUI: User interface for entering the parameters and plotting the clusters
  • VP.KMeansClient.Console: Console user interface. No fancy plots; it just generates an output file
  • VP.KMeans.DataGenerator.Core: Library to generate test data
  • VP.KMeans.DataGenerator.Console: Console application to feed the core library to generate test data
  • CPI.Plot3D: External library to plot the results


dev java, machine_learning

Another data mining algorithm: AGNES (Agglomerative Nesting)

AGNES Algorithm

AGNES takes a different approach from k-means, which was the subject of my previous post. Initially, it treats every data point as a separate cluster.

AGNES 1

Then it finds the minimum distance between clusters and merges the two closest clusters:

AGNES 2

The resulting cluster is added to the list of all clusters, and the merged clusters are removed from it as they are no longer valid.
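The merge loop can be sketched as below. Two things in it are assumptions rather than details from the original implementation: the distance between two clusters is taken as the distance between their two closest members (single linkage), and merging stops once a requested number of clusters remains. The names `AgnesSketch` and `targetClusters` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class AgnesSketch {
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Single linkage (an assumption): distance between the two closest members.
    static double clusterDistance(List<double[]> a, List<double[]> b) {
        double min = Double.POSITIVE_INFINITY;
        for (double[] p : a) {
            for (double[] q : b) {
                min = Math.min(min, distance(p, q));
            }
        }
        return min;
    }

    static List<List<double[]>> cluster(double[][] points, int targetClusters) {
        // Initially every data point is a cluster of its own.
        List<List<double[]>> clusters = new ArrayList<>();
        for (double[] point : points) {
            List<double[]> single = new ArrayList<>();
            single.add(point);
            clusters.add(single);
        }
        while (clusters.size() > targetClusters && clusters.size() > 1) {
            // Find the pair of clusters with the minimum distance.
            int bestI = 0;
            int bestJ = 1;
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < clusters.size(); i++) {
                for (int j = i + 1; j < clusters.size(); j++) {
                    double d = clusterDistance(clusters.get(i), clusters.get(j));
                    if (d < best) {
                        best = d;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            // Merge the pair, remove both originals, add the merged cluster.
            List<double[]> merged = new ArrayList<>(clusters.get(bestI));
            merged.addAll(clusters.get(bestJ));
            clusters.remove(bestJ); // higher index first so bestI stays valid
            clusters.remove(bestI);
            clusters.add(merged);
        }
        return clusters;
    }
}
```

Recomputing every pairwise distance on each pass is wasteful but keeps the sketch short; caching the distances in a matrix would be the obvious next step for larger datasets.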
