dev nosql

In this post we dive into the code and develop a small application using the beer sample database that ships with Couchbase 2.0.

Environment Setup

To develop a .NET application with a Couchbase backend, we need the Couchbase .NET SDK. The current version as of this writing can be downloaded from here, but the best way to get it is through NuGet. Using the SDK is fairly simple. It comes with a main class called CouchbaseClient. All operations are performed using this class.

Connecting to server

The first step is connecting to the server, and the easiest way to do that is through the configuration file.

<configuration>
    <configSections>
        <section name="couchbase" type="Couchbase.Configuration.CouchbaseClientSection, Couchbase" />
    </configSections>
    <couchbase>
        <servers bucket="beer-sample" bucketPassword="">
            <add uri="http://192.168.1.111:8091/pools/" />
            <add uri="http://192.168.1.112:8091/pools/" />
        </servers>
    </couchbase>
</configuration>

As you can see from the configuration section, if you have multiple nodes in the cluster, just add their URIs to the servers list. Once the IPs, the bucket and the password are specified, we are done. We don’t need to explicitly connect to the database; we can just create a new client instance and start calling methods:

using (CouchbaseClient client = new CouchbaseClient())
{
    // DB operations go here
}

Basic Operations

So far so good: we are connected to the server without a hassle. As there is already data on the server, let’s get some sample data from the database. As the database is a key/value store, we can store any type of data we want. We can build our JSON objects as strings and insert/update data with them, but most likely we want to use our domain objects instead of manipulating raw JSON. There are two things to consider here. Once we tackle them, the rest is quite easy:

  1. Mark your classes as Serializable: This is required to persist any object. Once you make the class serializable you can run CRUD operations on it.
  2. The default serializer is the binary serializer. That means when you store an object by calling the Store method, you will get something like this when you try to view the object:

[Image: Beer Binary]

This is not too helpful: we can neither read the document nor index it, so we’d rather store it in JSON format. Luckily, the StoreJson method comes to the rescue. The following code produces the result below, which is exactly what we wanted. To map the keys in the JSON object to the properties in our class, we use the JsonProperty attribute from the Newtonsoft.Json library, which is used by the SDK itself.

[Image: Beer JSON Code]

[Image: Beer JSON Output]

The Store and StoreJson methods accept an argument of type StoreMode. The values of StoreMode are Add, Set and Replace. Add creates a new record (INSERT) and Replace updates an existing record (UPDATE). Set adds the record if it doesn’t exist and updates it if it does (like MERGE, but simpler). To delete an object, we call the Remove method with the object’s key as the argument. So basically we perform CRUD operations with the Get/GetJson, Store/StoreJson and Remove methods.
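The difference between the three store modes is easy to see in a small in-memory sketch. The JavaScript below is not the Couchbase SDK: the Map-backed bucket and the `store` and `remove` helpers are hypothetical stand-ins that only mimic the Add/Set/Replace behaviour described above.

```javascript
// In-memory stand-in for a bucket; this is NOT the Couchbase SDK.
const bucket = new Map();

// Hypothetical helper mimicking the store modes described in the post.
// Returns true when the write is applied, false when it is rejected.
function store(mode, key, value) {
  if (mode !== "Add" && mode !== "Set" && mode !== "Replace") {
    throw new Error("Unknown store mode: " + mode);
  }
  const exists = bucket.has(key);
  if (mode === "Add" && exists) return false;      // INSERT fails on an existing key
  if (mode === "Replace" && !exists) return false; // UPDATE fails on a missing key
  bucket.set(key, JSON.stringify(value));          // keep documents as JSON text
  return true;
}

// Hypothetical counterpart of Remove: returns true if the key existed.
function remove(key) {
  return bucket.delete(key);
}

const results = [
  store("Replace", "beer_1", { name: "Pale Ale" }), // nothing to replace yet
  store("Add", "beer_1", { name: "Pale Ale" }),     // creates the record
  store("Add", "beer_1", { name: "Stout" }),        // key already exists
  store("Set", "beer_1", { name: "Stout" }),        // overwrites
  store("Set", "beer_2", { name: "Porter" }),       // creates
];
console.log(results);
```

Set is the forgiving option for an upsert, while Add and Replace are useful when a write hitting the wrong state should be noticed rather than silently applied.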

Querying database with views

Views in Couchbase 2.0 are functions written in JavaScript that use a technique called Map/Reduce. Map/Reduce is a complex topic that I have not fully covered yet, but basically it’s a method for processing large data sets in a distributed environment. It was developed at Google. It involves two functions called map and reduce. The map function filters entries and can extract information from them. The result of a map function is an ordered list of key/value pairs called an index, which the Couchbase server stores on disk. The reduce function is optional and can be used to perform sums, aggregates or similar calculations on the output of the map function.

Views can be grouped in design documents, which are associated with a bucket. I think of them as namespaces. Couchbase Server offers two kinds of views: development and production. As creating a view means creating an index, it may incur some overhead on the performance of the system, so development views are handy for testing fully before publishing to the production environment. Also, production views cannot be edited via the admin console, which forces the developer to develop and test the view in the development environment first.

To demonstrate what they look like, let’s examine the view that returns all the breweries.

[Image: Beer_View_MapFunc]

We have 2 types of objects in the database (beer and brewery). This function only emits the objects that are of type brewery.
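For readers who want to experiment without a server, here is a rough sketch of a map function of that kind together with a tiny local simulation. The `map(doc, meta)` and `emit(key, value)` shapes follow what Couchbase views use, but the emitted key, the sample documents and the local `emit` implementation are assumptions made for illustration (in a real view the function is anonymous and `emit` is provided by the server).

```javascript
// Map function in the shape Couchbase views expect: it receives each
// document plus its metadata and calls emit() for the rows to index.
// Emitting the document ID as the key is an assumption for this sketch.
function map(doc, meta) {
  if (doc.type === "brewery") {
    emit(meta.id, null);
  }
}

// --- Everything below is a local simulation, not Couchbase itself. ---
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Hypothetical sample documents keyed by document ID.
const docs = {
  brewery_b: { type: "brewery", name: "Brewery B" },
  beer_a1: { type: "beer", name: "Beer A1", brewery_id: "brewery_a" },
  brewery_a: { type: "brewery", name: "Brewery A" },
};

// Run the map function over every document, as the server would.
for (const [id, doc] of Object.entries(docs)) {
  map(doc, { id });
}

// The resulting index is ordered by key.
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// An optional reduce step, here simply counting the emitted rows.
const breweryCount = rows.reduce((count) => count + 1, 0);

console.log(rows.map((r) => r.key), breweryCount);
```

The beer document never reaches the index because the map function does not emit it, which is the filtering role described above.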

Demo

All this theory means nothing if we don’t put it to good use. You can get the source code of the sample application (I call it Beer Explorer) from my GitHub account. Also, if you want to see what it looks like before diving into the code, I host a live version here: http://beerexplorer.me. Feel free to play with it.

dev nosql

In this post, I’ll talk about some technical details and terminology of Couchbase. The official documentation is very comprehensive and I highly recommend taking a look at it: http://www.couchbase.com/docs/

Installation

First of all, I recommend you check the supported OS list here. I tried to install it on Windows 8, but it turns out it’s not supported yet. I then installed it on Windows Server 2008 R2 and an Ubuntu Server 12.10. You can find Linux installation instructions here.

Installation is quite easy. There are a few things that need attention, though.

  1. File locations: Actually this step is very easy, just accept the default location. But Couchbase recommends storing document and index data on different disks to get the best performance.
  2. Memory Size: The first node in the cluster determines the quota, and that value is inherited by the following nodes. To update it, on the management console, select Data Buckets and click on the arrow to the left of the bucket name. Then by clicking on Edit you can change this value.
  3. Bucket Type: memcached and Couchbase bucket types are significantly different, so you have to choose carefully. memcached buckets support neither persistence nor replication. They are meant to be an in-memory caching solution.
  4. Bucket Name: During setup you cannot change the name of the default bucket. Couchbase recommends using it for testing purposes only, so it’s best to create your own bucket for the actual data once the installation is over.
  5. Flush: This is a very dangerous operation. It allows you to delete all the data in a bucket. It is disabled by default and I’d recommend keeping it that way.

Basic concepts

  • A Couchbase database is called a bucket.
  • A document is a self-contained piece of data. It is a JSON object. What would be a row in an RDBMS is stored in a document together with all the data it’s related to (e.g. a customer record may contain a list of orders). This is called the single-document approach, and the document is called an aggregate. More about it in the Modelling Documents section later in this post. A new feature in v2.0 is that these documents can be indexed and queried.
  • vBucket is short for “Virtual Bucket”. vBuckets are functionally equivalent to database shards in traditional relational databases. The good news is that Couchbase manages vBuckets automatically.
  • XDCR stands for Cross Data Center Replication. It’s a very cool feature that can be used in a variety of scenarios, such as spreading data geographically or creating an active offsite backup.
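To make the shard analogy a bit more concrete, here is a simplified sketch of how a key could be mapped to a vBucket: hash the key, then take the result modulo the number of vBuckets. The choice of CRC-32, the count of 1024 and the `vBucketFor` name are assumptions for illustration only; the server and client libraries do this mapping for you and may compute it differently.

```javascript
// Build the standard CRC-32 lookup table (reflected polynomial 0xEDB88320).
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = (c & 1) ? (0xedb88320 ^ (c >>> 1)) : (c >>> 1);
  }
  CRC_TABLE[n] = c >>> 0;
}

// CRC-32 of a string's UTF-8 bytes, as an unsigned 32-bit integer.
function crc32(text) {
  const bytes = new TextEncoder().encode(text);
  let crc = 0xffffffff;
  for (const b of bytes) {
    crc = CRC_TABLE[(crc ^ b) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Assumed number of vBuckets for this sketch.
const NUM_VBUCKETS = 1024;

// Hypothetical helper: the same key always lands in the same vBucket.
function vBucketFor(key) {
  return crc32(key) % NUM_VBUCKETS;
}

for (const key of ["beer_1", "beer_2", "brewery_a"]) {
  console.log(key, "->", vBucketFor(key));
}
```

Because the mapping depends only on the key, any client can work out where a document lives without asking a central coordinator, and moving a vBucket between nodes does not change which vBucket a key belongs to.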

Modelling Documents: has-many vs. belongs-to

The way we model data should depend on the structure and nature of the data. There are two approaches to modelling it. has-many means the parent keeps track of all its child records. For example, a standard Customer-Order relation could be expressed like this:

{
    "id": 123,
    "name": "Valued",
    "surname": "Customer",
    "orders": [ "order1", "order2", "order3" ]
}
{
    "id": "order1",
    "orderDate": "2012-12-20",
    "status": "sent"
}

The Customer stores the IDs of the orders. This method can be problematic if the parent (Customer in this example) is updated frequently. Since the orders are reached through the customer, those updates will affect overall query performance. The belongs-to approach comes at it from the other direction. If we modelled the above example with belongs-to, we would come up with something like this:

{
    "id": 123,
    "name": "Valued",
    "surname": "Customer"
}
{
    "id": "order1",
    "orderDate": "2012-12-20",
    "status": "sent",
    "customerId": 123
}
{
    "id": "order2",
    "orderDate": "2012-12-10",
    "status": "pending",
    "customerId": 123
}

This is preferable for avoiding contention. With this method, we need to use indexing to be able to query all orders by customerId. The has-many approach, by contrast, reads faster, because retrieving several documents by key is quicker than building and querying an index.
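The two retrieval paths can be compared side by side in a small in-memory sketch. Nothing here talks to Couchbase: the `documents` map and the helper names are hypothetical, and the sample data deliberately carries both the `orders` list and the `customerId` field so that both approaches can run against it.

```javascript
// In-memory stand-in for a bucket; keys are document IDs.
const documents = new Map([
  ["customer_123", { id: 123, name: "Valued", surname: "Customer",
                     orders: ["order1", "order2"] }],
  ["order1", { id: "order1", orderDate: "2012-12-20", status: "sent", customerId: 123 }],
  ["order2", { id: "order2", orderDate: "2012-12-10", status: "pending", customerId: 123 }],
  ["order3", { id: "order3", orderDate: "2012-12-05", status: "sent", customerId: 456 }],
]);

// has-many: read the parent, then fetch each listed order directly by key.
function ordersViaHasMany(customerKey) {
  const customer = documents.get(customerKey);
  return customer.orders.map((orderId) => documents.get(orderId));
}

// belongs-to: first build an index from customerId to order keys
// (the role a view plays), then look the orders up through it.
function buildCustomerIndex() {
  const index = new Map();
  for (const [key, doc] of documents) {
    if (doc.customerId === undefined) continue; // skip non-order documents
    if (!index.has(doc.customerId)) index.set(doc.customerId, []);
    index.get(doc.customerId).push(key);
  }
  return index;
}

function ordersViaBelongsTo(index, customerId) {
  return (index.get(customerId) || []).map((key) => documents.get(key));
}

const customerIndex = buildCustomerIndex();
console.log(ordersViaHasMany("customer_123").map((o) => o.id));
console.log(ordersViaBelongsTo(customerIndex, 123).map((o) => o.id));
```

Adding an order under has-many means rewriting the customer document, whereas under belongs-to only the new order is written and the index has to catch up, which is the contention trade-off described above.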

Backup and Restore

Before diving into playing with the data, it’s always good practice to back up the original data. Couchbase provides two options for backing up and restoring:

  1. Good ol’ file copy: Copy the data files stored under the default path (which is “C:\Program Files\couchbase\server\var\lib\couchbase\data” for Windows). The disadvantage of this method is that the copy can only be restored to offline nodes in an identical cluster environment. Also, the copied data is not compressed.
  2. cbbackup / cbrestore: These tools can be found in the bin folder.

[Image: Couchbase_Backup]

I think a slight disadvantage is that you have to specify the password in clear text on the command line. I was expecting that providing just the -p parameter would make it prompt me for the password after I entered the command. Instead I got an error saying the password cannot be empty.

[Image: Couchbase_Restore]

The advantages are that it allows a backup to be restored onto a cluster of a different size and configuration. It also compresses the data, so it’s disk-space friendly.

Tip: When specifying the backup path to cbrestore, make sure to remove the trailing backslash from the path.

In the next instalment of this series, I’ll post a sample application using the beer sample database that ships with Couchbase 2.0.

hobby raspberry_pi

The Raspberry Pi is world famous now. It is a dirt-cheap ARM-based computer running Linux, and I just bought one for myself. I installed Raspbian Wheezy, which can be downloaded from here: http://www.raspberrypi.org/downloads. It is the recommended download for newbies, so I went straight for it. I used Win32DiskImager to prepare an SD card, put it in the Raspberry Pi and it was good to go.

I definitely recommend buying a case, which makes it a lot more fun to play with. I also bought a 3.5” display; I think a small screen goes well with the small device. If I’m going to plug something into my 23” LED monitor, I’d prefer it to be my desktop. The display I bought can be found on Amazon. It doesn’t come with a power supply, so you also have to buy a 12V, 2A DC power supply. I also needed a male-to-male RCA cable to connect the display to the Pi.

The result is the smallest computer I have ever had:

[Image: Raspberry Pi]

I hope I can do something useful with it too.