-->

dev java, data_mining

I was digging through some old projects and found the Data Mining and Machine Learning projects I had implemented. Instead of letting them gather dust(!) on the hard disk, I decided to review them and publish the source code. This will also give me a chance to revise my data mining knowledge. Let’s start with the Apriori algorithm.

Apriori Algorithm

It is an algorithm to determine the most frequent itemsets in a given collection of transactions. The term “frequent” is defined by a given threshold, the “minimum support count”.

Implementation

The project is implemented in Java. It uses a Hashtable to keep count of the itemsets in the database. The algorithm starts by finding the frequent 1-itemsets. Then, using the list of frequent itemsets of size k, it generates the list of candidate itemsets of size k+1.
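The first pass can be sketched roughly as below. This is a minimal illustration, not the code from the repository: the class and method names are made up for this example, it uses a HashMap where the project uses a Hashtable, and it assumes an item appears at most once per transaction.

```java
import java.util.List;
import java.util.Map;
import java.util.HashMap;
import java.util.TreeMap;

// Hypothetical helper, named for illustration only.
class SupportCounter {
    // Counts how many transactions contain each single item.
    // Assumes an item is not repeated inside one transaction.
    static Map<Integer, Integer> countItems(List<List<Integer>> transactions) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (List<Integer> transaction : transactions) {
            for (Integer item : transaction) {
                counts.merge(item, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Keeps only the items whose count reaches the minimum support count.
    static Map<Integer, Integer> frequentItems(Map<Integer, Integer> counts, int minSupport) {
        Map<Integer, Integer> frequent = new TreeMap<>();
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minSupport) {
                frequent.put(entry.getKey(), entry.getValue());
            }
        }
        return frequent;
    }

    public static void main(String[] args) {
        // A handful of made-up transactions, just to exercise the methods.
        List<List<Integer>> transactions = List.of(
            List.of(1, 3, 4, 5), List.of(1, 2, 4, 5), List.of(1, 2, 4), List.of(1, 3));
        System.out.println(frequentItems(countItems(transactions), 3));
    }
}
```

The same counting idea extends to larger itemsets by using the whole itemset as the map key instead of a single item.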

For example, for the sample data below and a threshold of 4, the items 1, 2, 3, 4 and 5 are all frequent 1-itemsets. From this list we generate the 2-item candidate list (all 10 combinations) and check whether their subsets are also frequent. Since every 1-itemset is frequent, all 10 candidates pass pruning. Then we count the occurrences of these candidates. Only 7 of them are equal to or greater than the threshold. From this list we generate our 3-item candidates; for example, 1,2 and 1,4 are combined to generate 1,2,4. Then we count the occurrences of the 3-itemsets and prune the results by checking all of their subsets.
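The candidate generation and counting steps in this example can be sketched as follows. It is only an illustration under my own assumptions: itemsets are represented as sorted lists, the names `generateCandidates`, `support` and `keepFrequent` are hypothetical, and the join simply unions every pair of frequent k-itemsets and keeps the unions of size k+1, which may differ from how the project does it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class, named for illustration only.
class CandidateGeneration {
    // The 19 sample transactions listed in this post.
    static final List<List<Integer>> SAMPLE = List.of(
        List.of(1, 3, 4, 5), List.of(1, 2, 4, 5), List.of(1, 2, 4), List.of(1, 3),
        List.of(3), List.of(1, 5), List.of(1), List.of(3), List.of(1, 3, 5),
        List.of(4), List.of(1, 2, 4), List.of(2), List.of(1, 2), List.of(3, 4),
        List.of(3, 5), List.of(1, 3, 4), List.of(2), List.of(3, 5), List.of(1, 2, 3, 4));

    // Unions every pair of frequent k-itemsets and keeps the unions of size k+1.
    // Each itemset is stored as a sorted list so equal sets compare equal.
    static Set<List<Integer>> generateCandidates(Set<List<Integer>> frequent) {
        Set<List<Integer>> candidates = new HashSet<>();
        for (List<Integer> a : frequent) {
            for (List<Integer> b : frequent) {
                TreeSet<Integer> union = new TreeSet<>(a);
                union.addAll(b);
                if (union.size() == a.size() + 1) {
                    candidates.add(new ArrayList<>(union));
                }
            }
        }
        return candidates;
    }

    // Number of transactions that contain every item of the itemset.
    static int support(List<Integer> itemset, List<List<Integer>> transactions) {
        int count = 0;
        for (List<Integer> transaction : transactions) {
            if (transaction.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    // Keeps the candidates whose support reaches the threshold.
    static Set<List<Integer>> keepFrequent(Set<List<Integer>> candidates,
                                           List<List<Integer>> transactions, int minSupport) {
        Set<List<Integer>> frequent = new HashSet<>();
        for (List<Integer> candidate : candidates) {
            if (support(candidate, transactions) >= minSupport) {
                frequent.add(candidate);
            }
        }
        return frequent;
    }

    public static void main(String[] args) {
        Set<List<Integer>> frequent1 =
            Set.of(List.of(1), List.of(2), List.of(3), List.of(4), List.of(5));
        Set<List<Integer>> candidates2 = generateCandidates(frequent1);
        Set<List<Integer>> frequent2 = keepFrequent(candidates2, SAMPLE, 4);
        System.out.println(candidates2.size() + " candidates, " + frequent2.size() + " frequent");
    }
}
```

Run against the sample data with a threshold of 4, this reports 10 candidates and 7 frequent 2-itemsets, which matches the numbers in the walkthrough.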

Apriori Check Subsets

The idea behind pruning is that if an itemset contains an infrequent subset, the larger set cannot be frequent either, so it is removed from the candidate list. (1,2,3 is a candidate, but as 2,3 is not a frequent 2-itemset, it is removed from the candidate list.) This process improves the performance of the algorithm, as it reduces the number of candidates that have to be counted.
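A minimal sketch of that subset check is below. The class and method names are hypothetical, and itemsets are assumed to be sorted lists; the set of frequent 2-itemsets is the one that results from the sample data with a threshold of 4.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class, named for illustration only.
class SubsetPruning {
    // Returns true when every (k-1)-item subset of the candidate is frequent.
    // Each subset is built by dropping one position from the sorted candidate.
    static boolean allSubsetsFrequent(List<Integer> candidate, Set<List<Integer>> frequent) {
        for (int skip = 0; skip < candidate.size(); skip++) {
            List<Integer> subset = new ArrayList<>(candidate);
            subset.remove(skip); // int argument, so this removes by index
            if (!frequent.contains(subset)) {
                return false;
            }
        }
        return true;
    }

    // Removes every candidate that has at least one infrequent subset.
    static Set<List<Integer>> prune(Set<List<Integer>> candidates, Set<List<Integer>> frequent) {
        Set<List<Integer>> kept = new HashSet<>();
        for (List<Integer> candidate : candidates) {
            if (allSubsetsFrequent(candidate, frequent)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Frequent 2-itemsets for the sample data with a threshold of 4.
        Set<List<Integer>> frequent2 = Set.of(
            List.of(1, 2), List.of(1, 3), List.of(1, 4), List.of(1, 5),
            List.of(2, 4), List.of(3, 4), List.of(3, 5));
        System.out.println(allSubsetsFrequent(List.of(1, 2, 3), frequent2)); // 2,3 is missing
        System.out.println(allSubsetsFrequent(List.of(1, 2, 4), frequent2)); // all subsets present
    }
}
```

Because the check only looks things up in the set of already-known frequent itemsets, it avoids a pass over the transactions for every candidate it rejects.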

Output

1,3,4,5

1,2,4,5

1,2,4

1,3

3

1,5

1

3

1,3,5

4

1,2,4

2

1,2

3,4

3,5

1,3,4

2

3,5

1,2,3,4

Results:

Apriori Results

Source Code

I created a public repository for the source code. You can check it out here if it tickles your fancy!

Resources

misc review, gadget

Fitbit Flex

I bought this about 2 months ago. I have worn it every single day since then and I just love it! It is basically a motion sensor that detects and keeps track of your daily movements. You can set your own daily goals for steps walked, distance covered or calories burnt.

I can already keep track of distance with my Garmin ForeRunner 10 (which I reviewed here), but the Flex is easier to use because it does everything in the background. The Garmin takes a few minutes to start because it needs to find your GPS coordinates, but that’s not the case for the Flex.

Flex lets you keep track of distance, active minutes, calories and steps.

Fitbit Goals

You can also log your weight, other exercises and food intake, so that you can calculate your net calories over a period of time.

Another great feature about it is tracking your sleep quality. You can use this data in conjunction with your daily activities.

Fitbit Sleep

And here is my favourite feature: Alarm! It turns out that if something on your wrist starts to vibrate, you wake up. Instantly! Of course I still keep my phone’s alarm running as a fallback, but this one works pretty well.

Conclusion

I bought it for £68 and as of now it is listed at £83 on Amazon. Apparently the price fluctuates a bit, but I think the £70–80 range is good for this product. I charge it once every 3 days or so. Other than that I completely forget about it while it does its job in the background. It motivates you to reach your goals and be more active in general, and the silent alarm is absolutely fantastic. I’d recommend this to anyone who would like to get more exercise.

Resources

misc review, book

Framework Design Guidelines


Having a common framework is quite important to reduce code duplication. But designing that framework properly and consistently is a huge challenge. Even though we are living in a RESTful world now, I think having a framework or a set of common libraries for personal or commercial projects is still relevant. A well-designed, well-tested framework would significantly improve any application built on top of it.

I had referred to parts of this book before, but this time I decided to read it from cover to cover and make sure I “digest” it all. It contains countless gems that every developer should know. Anyone developing even small libraries can benefit from this book a lot. You don’t need to be designing the .NET Framework (like the authors). It also comes with a DVD full of presentations by the authors.

Companion DVD

Unfortunately I lost my DVD. It’s probably inside one of my many CD cake boxes. I was hoping to check it out as I went through the book. Luckily, I found out that it is freely available from the publisher. Check out the download link in the resources section. One thing to be aware of is that you may come across another download link on Brad Abrams’s blog here. That download works fine, but one of the presentations inside it is corrupted, so I suggest you download each section separately from the site in the resources.

Some notes

The book is full of gems and very useful tips. Here are just a few:

  • Keep it simple: “You can always add, you cannot ever remove”
  • There is no perfect design: You always have to make some sacrifices and consider the trade-offs
  • Well-designed frameworks are consistent
  • Scenario-driven design: Imagine scenarios to visualize the real-world use of the API. When in doubt, leave the feature out and add it later. Conduct usability studies to get developers’ opinions.
  • Keep type initialization as simple as possible. It lowers the barrier of entry.
  • Throw exceptions to communicate the proper usage of API. It makes the API self-documenting and supports the learning-by-doing approach.
  • Going overboard with abstractions may deteriorate the performance and usability of the framework.
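To make two of these tips concrete (simple type initialization, and exceptions that explain proper usage), here is a small sketch. It is my own illustration rather than an example from the book, it is written in Java instead of the book’s .NET, and the `RetryPolicy` class with its members is entirely hypothetical.

```java
// Hypothetical type, invented for this illustration.
class RetryPolicy {
    // A sensible default means `new RetryPolicy()` works with no setup.
    private int maxAttempts = 3;

    RetryPolicy() {
    }

    // Rejects invalid input with a message that says what is expected,
    // so the caller learns the correct usage from the failure itself.
    RetryPolicy withMaxAttempts(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException(
                "maxAttempts must be at least 1, but was " + maxAttempts);
        }
        this.maxAttempts = maxAttempts;
        return this;
    }

    int maxAttempts() {
        return maxAttempts;
    }

    public static void main(String[] args) {
        RetryPolicy policy = new RetryPolicy().withMaxAttempts(5);
        System.out.println(policy.maxAttempts());
        try {
            new RetryPolicy().withMaxAttempts(0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The default constructor keeps the barrier to entry low, while the exception message documents the valid range at the exact moment a caller gets it wrong.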

Even though it’s been a few years since this book was released, it is still a very helpful resource.

Resources