Introduction to Couchbase - Part 2 - Playground for the mind

In this post, I’ll talk about some technical details and terminology of Couchbase. The official documentation is very comprehensive and I highly recommend taking a look at it: http://www.couchbase.com/docs/

Installation

First of all I recommend you check the supported OS list here. I tried to install it on Windows 8 but turns out it’s not supported yet. Then I installed it on Windows Server 2008 R2 and a Ubuntu Server 12.10. You can find Linux installation instructions here.

Installation is quite easy. There are a few things that need to paid attention though.

File locations: Actually this step is very easy, just accept the default location. But Couchbase recommends storing document and index data on different disks to get the best performance,
Memory Size: First node in the cluster determines the quota and that value is inherited to the following nodes. To update it, on the management console, select Data Buckets and click on the arrow on the left of the bucket name. Then by clicking on Edit you can change this value.
Bucket Type: memcached and Couchbase bucket types are significantly different so you have to choose carefully. memcached buckets don’t support persistence nor replication. They are meant to be an in-memory caching solution.
Bucket Name: During setup you cannot change the name of the default bucket. Couchbase recommends to use it for testing purposes only. So it’s best to create your own bucket for the actual data once the installation is over.
Flush: This is a very dangerous operation. It allows you delete all the data in a bucket. Default is disabled and I’d recommend to keep it that way.

Basic concepts

A couch database is called a bucket.
A document is a self-contained piece of data. It is a JSON object. A row in a RDBMS would be stored in a document with all the data it’s related to. (i.e: A customer record may contain a list of orders). This approach is called Single-Document approach and the document is called an aggregate. More about it in Modeling Documents section later in this post. A new feature that came with v2.0 is these records can be indexed and queried.
vBucket is short for “Virtual Bucket” and they work functionally equivalent to database shards in traditional relational databases. Good news is that Couchbase will automatically manage vBuckets.
XDCR stands for Cross Data Center Replication. It’s a very cool feature that can be used in a multiple of scenarios such as spreading data geographically or creating an active offsite backup.

Modelling Documents: has-many vs. belongs-to

The way we model data should depend on the structure and nature of the data. There are two approaches when modelling the data. has-many means storing all the child records with the parent. For example a standard Customer – Order relation could be expressed like this:

{
    "id" : 123,
    "name": Valued,
    "surname": "Customer"
    "orders": [ "order1", "order2", "order3" ]
}
{
    "id": "order1",
    "orderDate": "2012-12-20",
    "status": "sent"
}

The Customer stores the IDs of the orders. This method can be problematic if the parent (Customer in this example) is updated frequently. As orders can be accessed via customer this will effect the overall query performance. belongs-to approach suggests approaching it from the other direction. If we modeled the above example with belongs-to approach we would come up with something like this:

{
    "id" : 123,
    "name": Valued,
    "surname": "Customer"
}
{
    "id": "order1",
    "orderDate": "2012-12-20",
    "status": "sent",
    "customerId": 123
}
{
    "id": "order2",
    "orderDate": "2012-12-10",
    "status": "pending",
    "customerId": 123
 }

This is preferable to avoid contention. With this method we need to use indexing to be able query all orders by customerId. has-many approach performs better because a multiple-retrieve query is faster than indexing and querying.

Backup and Restore

Before diving into playing with the data it’s always a good practice to backup the original data. Couchbase provides 2 options to accomplish these tasks:

Good ol’ file copy Copy the data files stored under the default path (which is “C:\Program Files\couchbase\server\var\lib\couchbase\data” for Windows). The disadvantage of this method is that it can only be restored to offline nodes in an identical cluster environment. Also database is not compressed.
cbbackup / cbrestore These tools can be found in the bin folder.

Couchbase_Backup

I think a slight disadvantage is that you have to specify password in clear text in the command line. I was expecting just providing –p parameter would end up it asking me the password after I enter the command. Instead I got an error saying the password cannot be empty.

Couchbase_Restore

Advantages are that it allows a backup to be restored onto a different size and configuration. Also it compresses the data so it’s disk-space friendly.

Tip: When specifying the backup path to cbrestore make sure to remove the trailing backslash from the path. In the next instalment of this series I’ll post a sample application using the Beer sample database that is shipped with Couchbase 2.0