docker registry, self_hosted

Working with Docker is great, but when you want to deploy your applications to another server you need a registry to push your images to so that you can pull them from the other end. To have that capability in my dev environment, I decided to set up my own self-hosted Docker registry.

For the sake of brevity, I will omit creating the Raspberry Pi SD card and installing Docker on it. There are lots of great videos and articles out there already.

Self-hosted vs Hosted

When it comes to hosted Docker registries, there are lots of free and paid options.

Benefits of self-hosted registry:

  • See the storage size and number of repos required for your system early on without having to pay anything
  • Push/pull images on the go without Internet connection during development phase
  • No privacy concerns: if you upload an image with your application in it, the image may contain sensitive data, which poses a risk if the third-party registry has full access to it
  • Free!

Benefits of hosted registry:

  • Hassle-free: No backups or server management

There are great Docker registries such as Docker Hub and Amazon ECR. I wouldn’t recommend using a self-hosted registry for production, but if price or privacy is a concern it can certainly be an option.

Creating Self-Hosted Registry

It sounds like it requires installing a server application, but the nice thing about Docker is that even though it is a Docker registry, it can run in a container itself. So first off we pull the registry image from Docker Hub:

docker pull registry

Now let’s create a container that will act as our registry:

docker run -d -p 5000:5000 --restart always --name registry registry

In my case the Raspberry Pi is reachable as HOBBITON.local, the hostname used in the commands below.

Now to test how we can push and pull images let’s download Docker’s hello-world image from Docker Hub:

docker pull hello-world

Now to push this into our own registry running on the Raspberry Pi, all we have to do is tag it with the server URL:

docker tag hello-world HOBBITON.local:5000/hello-world

At this point, if we take a look at the images on our local machine, we can see the hello-world image listed twice, once under each tag.

Now let’s push it to Pi:

docker push HOBBITON.local:5000/hello-world

This doesn’t work and the push is rejected.

This is because the registry is considered to be insecure and by default the client rejects it. We can confirm it’s deemed to be insecure if we check the server by running the following command:

docker info

At the bottom of the output we can see the localhost registry is insecure:

To address this we can add this registry to the list of insecure registries. For example, on a Mac client we go to Preferences –> Daemon and add the Raspberry Pi registry as shown below:

After this, if we try once again to push to the Pi, we can see it succeeded:

If you’re pulling from a client without a user interface, another Raspberry Pi for example, try the following:

sudo nano /etc/docker/daemon.json

and add the following (with the correct registry name):

{ "insecure-registries":["myregistry.example.com:5000"] }

and restart Docker:

sudo service docker restart

Now if we check the repository list on the registry again we can see the hello-world image hosted on our Pi:

Let’s now see if we can pull this image from another client.

And after pulling the image we can see it in the image list:


dev csharp, elasticsearch, docker, nest

I’ve played around with Elasticsearch on several occasions. This post is to organize those thoughts and experiences and show an easy way to set up Elasticsearch and start playing around with it.

Setup

The easiest way to set up Elasticsearch locally is using Docker. As of this writing the latest version of Elasticsearch is 7.2.0 and I’ll be using that in this example.

If you don’t already have the image, simply pull it from Elastic’s Docker registry:

docker pull docker.elastic.co/elasticsearch/elasticsearch:7.2.0

For a development environment, the suggested command to run a container is

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.2.0

This keeps it very simple and straightforward, but in my workout I’d like to insert a whole bunch of data and run some queries on it, and I don’t want to regenerate my data over and over again. So I decided to persist my data on the host.

Persisting Elasticsearch Data

Instead of running containers one by one on the command line, a better approach is to create a docker-compose.yml file and use Docker Compose to start the services. I used the sample YAML file provided in the official Elastic documentation:

version: '2.2'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
    container_name: es01
    environment:
      - node.name=es01
      - discovery.seed_hosts=es02
      - cluster.initial_master_nodes=es01,es02
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - esnet
  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
    container_name: es02
    environment:
      - node.name=es02
      - discovery.seed_hosts=es01
      - cluster.initial_master_nodes=es01,es02
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata02:/usr/share/elasticsearch/data
    networks:
      - esnet

volumes:
  esdata01:
    driver: local
  esdata02:
    driver: local

networks:
  esnet:

This example creates an Elasticsearch cluster with 2 nodes and uses named volumes to persist data, so the next time we bring this cluster up we should be able to continue where we left off data-wise.

Sample Application

In my previous blog post I developed a simple test data generator to generate fake bank statement data with a library called Bogus. In this project, I will use that generator to generate lots and lots of test data, insert them into Elasticsearch and have fun with it!

When you start a C# project and start looking for a library to interact with Elasticsearch, it’s a bit confusing to find out there are actually two of them: Elasticsearch.Net and NEST. The gist of it is that NEST is a high-level library that uses Elasticsearch.Net under the hood. It also exposes the low-level client, so it effectively enhances Elasticsearch.Net and allows using strongly typed DSL queries. In the sample application I used NEST.

Creating Elasticsearch client

Creating a client with some basic settings is straightforward:

// The settings object is disposed at the end of the using block,
// so the client has to be used inside it.
using (var connectionSettings = new ConnectionSettings(new Uri("http://localhost:9200")))
{
    var settings = connectionSettings
        .DefaultIndex("bankstatementindex")
        .ThrowExceptions(true);
    IElasticClient elasticClient = new ElasticClient(settings);
}

Indexing data

To index a single document, the IndexDocument method can be called. However, calling this method in a loop over a large number of documents is not recommended.

elasticClient.IndexDocument<BankStatementLine>(testData.First());

For multiple documents, the IndexMany method should be called. If the data size is too large, then using the BulkAll method and the BulkAllObservable helper is recommended.

To see the difference, I created a test that indexes 5,000 documents, first by looping over the array and then by using BulkAll. Looping over the collection took around 26 seconds, whereas the bulk index took only 1.2 seconds, as shown in the screenshot.

It also displays “Done” 5 times because I set the batch size to 1,000 and requested 5,000 documents to be indexed, so the load was automatically divided into 5 calls:

var bulkAll = elasticClient.BulkAll(testData, x => x
                .BackOffRetries(2)
                .BackOffTime("30s")
                .RefreshOnCompleted(true)
                .MaxDegreeOfParallelism(4)
                .Size(1000));

bulkAll.Wait(TimeSpan.FromSeconds(60),
    onNext: (b) => { Console.Write("Done"); }
);
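The batch arithmetic above (5,000 documents at a size of 1,000 gives 5 calls) can be sketched without Elasticsearch at all. The helper below is hypothetical and only illustrates how a collection divides into fixed-size batches; it is not how NEST partitions the data internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only, not part of NEST.
public static class BatchSketch
{
    // Splits items into consecutive batches of at most `size` elements.
    public static List<List<T>> Split<T>(IReadOnlyList<T> items, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        var batches = new List<List<T>>();
        for (var start = 0; start < items.Count; start += size)
        {
            var end = Math.Min(start + size, items.Count);
            var batch = new List<T>(end - start);
            for (var i = start; i < end; i++)
            {
                batch.Add(items[i]);
            }
            batches.Add(batch);
        }
        return batches;
    }
}
```

Calling BatchSketch.Split(new int[5000], 1000) yields 5 batches, and a collection of 5,001 items would yield 6, the last one holding a single item.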

The same result can also be achieved by subscribing a BulkAllObserver:

var waitHandle = new CountdownEvent(1);

bulkAll.Subscribe(new BulkAllObserver(
    onNext: (b) => { Console.Write("."); },
    onError: (e) => { throw e; },
    onCompleted: () => waitHandle.Signal()
));

waitHandle.Wait();

Showing progress

The sample code below displays progress using the onNext action delegate:

var testData = dataGen.Generate(statementConfig.StartDate, statementConfig.EndDate, statementConfig.OpeningBalance, statementConfig.DebitTransactionRatio, statementConfig.TransactionDateInterval, statementConfig.NumberOfStatementLines);
var cancellationToken = new CancellationToken();
var batchSize = 250;
var bulkAll = elasticClient.BulkAll(testData, x => x
    .BackOffRetries(2)
    .BackOffTime("30s")
    .RefreshOnCompleted(true)
    .MaxDegreeOfParallelism(4)
    .Size(batchSize), cancellationToken);
var totalIndexed = 0;
var stopWatch = new Stopwatch();
stopWatch.Start();
bulkAll.Wait(TimeSpan.FromSeconds(60),
    onNext: (b) =>
    {
        totalIndexed += batchSize;
        Console.WriteLine($"Total indexed documents: {totalIndexed}");
    }
);

and the output looked like this:

Even though the numbers seem a bit wonky, I think it’s a good example to illustrate the multi-threaded nature of BulkAll: I set the maximum degree of parallelism to 4, and the first 1,000 documents were inserted in a mixed order, suggesting that the batches were running in parallel.
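If the progress callback can be invoked from several threads at once (an assumption here, I have not verified it against NEST’s internals), a plain += on a shared counter is not atomic and updates can be lost. A minimal, library-independent sketch of a safer counter using Interlocked.Add follows; ProgressCounter and ProgressDemo are hypothetical names.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of NEST: a progress counter that is safe
// to update from several threads at once.
public sealed class ProgressCounter
{
    private int _total;

    // Atomically adds a batch and returns the new running total.
    public int Add(int batchSize) => Interlocked.Add(ref _total, batchSize);

    public int Total => Volatile.Read(ref _total);
}

public static class ProgressDemo
{
    // Simulates `batches` callbacks arriving in parallel,
    // each reporting `batchSize` indexed documents.
    public static int Run(int batches, int batchSize, int parallelism)
    {
        var counter = new ProgressCounter();
        var options = new ParallelOptions { MaxDegreeOfParallelism = parallelism };
        Parallel.For(0, batches, options, _ => counter.Add(batchSize));
        return counter.Total;
    }
}
```

With a counter like this, the callback body could print the value returned by Add(batchSize) instead of reading a shared variable after incrementing it.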

Cancellation with bulk operations

The BulkAll observable can also be cancelled for longer processes if necessary. The code excerpt below shows the pieces relevant to cancellation:

var cancellationTokenSource = new CancellationTokenSource();
var cancellationToken = cancellationTokenSource.Token;
var batchSize = 250;
var bulkAll = elasticClient.BulkAll(testData, x => x
    .BackOffRetries(2)
    .BackOffTime("30s")
    .RefreshOnCompleted(true)
    .MaxDegreeOfParallelism(4)
    .Size(batchSize), cancellationToken);
var totalIndexed = 0;
var stopWatch = new Stopwatch();
stopWatch.Start();
Task.Factory.StartNew(() =>
    {
        Console.WriteLine("Started monitor thread");
        var cancelled = false;
        while (!cancelled)
        {
            if (stopWatch.Elapsed >= TimeSpan.FromSeconds(60))
            {
                if (cancellationToken.CanBeCanceled)
                {
                    Console.WriteLine($"Cancelling. Elapsed time: {stopWatch.Elapsed.ToString("mm\\:ss\\.ff")}");
                    cancellationTokenSource.Cancel();
                    cancelled = true;
                }
            }

            Thread.Sleep(100);
        }
    }
);

try
{
    bulkAll.Wait(TimeSpan.FromSeconds(60),
        onNext: (b) =>
        {
            totalIndexed += batchSize;
            Console.WriteLine($"Total indexed documents: {totalIndexed}");
        }
    );
}
catch (OperationCanceledException)
{
    Console.WriteLine("Taking longer than allowed. Cancelled.");
}
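As a possible simplification, the monitoring task can be replaced by the timer built into CancellationTokenSource: constructing it with a TimeSpan (or calling CancelAfter) cancels the token automatically once the delay elapses. The sketch below is independent of NEST; the loop is only a stand-in for a long-running bulk operation and the class name is hypothetical.

```csharp
using System;
using System.Threading;

public static class TimeoutCancellationDemo
{
    // Runs a simulated long job in steps and returns how many steps finished
    // before the token was cancelled.
    public static int RunUntilCancelled(TimeSpan timeout, int maxSteps)
    {
        // The source cancels its token on its own once the timeout elapses,
        // so no separate monitoring thread is needed.
        using (var cts = new CancellationTokenSource(timeout))
        {
            var completed = 0;
            try
            {
                for (var i = 0; i < maxSteps; i++)
                {
                    cts.Token.ThrowIfCancellationRequested();
                    Thread.Sleep(10); // simulated work
                    completed++;
                }
            }
            catch (OperationCanceledException)
            {
                Console.WriteLine("Taking longer than allowed. Cancelled.");
            }
            return completed;
        }
    }
}
```

In the BulkAll example, the token taken from such a source would be passed in exactly where cancellationToken is passed today.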

Querying Data

Querying data can be done by calling the Search method of ElasticClient. Here are a few examples. There are more in the accompanying sample source code:

// Get the first 100 documents
var searchResponse = elasticClient.Search<BankStatementLine>(s => s
    .Query(q => q
        .MatchAll()
    )
    .Size(100)
);

// Get transactions dated between 1 January 2018 and 10 January 2018
var dateRangeResponse = elasticClient.Search<BankStatementLine>(s => s
    .Query(q => q
        .DateRange(x => x
            .Field(f => f.TransactionDate)
            .GreaterThanOrEquals(new DateTime(2018, 01, 01))
            .LessThanOrEquals(new DateTime(2018, 01, 10))
        )
    )
    .Size(10000)
);

Deleting data

For my tests I had to delete all documents frequently, which can be achieved by running the query below:

elasticClient.DeleteByQuery<BankStatementLine>(del => del
    .Query(q => q.QueryString(qs => qs.Query("*")))
);

Source Code

The sample application can be found under the blog/ElasticsearchWorkout folder in the repository.


dev csharp, fake, test, data

Generating high-quality test data can have an impact on the accuracy of the tests overall. In this post I’ll show how to use a helpful C# library called Bogus.

Showcase project: Bank Statement Generator

In this example I’ll generate fake bank statements. Normally they come in CSV files and have the following model:

public class BankStatementLine
{
    public DateTime TransactionDate { get; set; }
    public string TransactionType { get; set; }
    public string SortCode { get; set; }
    public string AccountNumber { get; set; }
    public string TransactionDescription { get; set; }
    public decimal? DebitAmount { get; set; }
    public decimal? CreditAmount { get; set; }
    public decimal Balance { get; set; }
}

I’ll use Bogus to generate realistic fake statement lines, save them as a CSV and see if the result looks real.

Rules and restrictions

I want the fields in the model above to conform to a certain set of rules to be realistic:

  • Transaction Date must be within a certain range I provide as bank statements are generated for a date range.
  • Dates should be incremental and not random
  • Sort Code must be in the following format: NN-NN-NN and must be the same for the entire statement.
  • Account number must be an 8-digit number and same for the entire statement.
  • Transaction Description must be free text
  • Debit Amount and Credit Amount must be decimal numbers but only one of them can be present at any given line
  • Transaction Type must be one of the pre-defined values and also some types can be for credit and some for debit only.
  • Balance should be sum of all debit and credit amounts plus the first balance in the statement. So this value is dependent on the values that come before it.
  • The number of lines in a statement should be random.

Rule implementations

Some rules stated above are very straightforward and easy to implement. These are some samples of what Bogus is capable of. For the full documentation check out the GitHub repository.

Date range support

Generating a date within a range is simple:

.RuleFor(x => x.TransactionDate, f => f.Date.Between(startDate, endDate))

Enum and array support

For Transaction Type I want to select a random value from a fixed list of values. This can be done in 2 ways: by using an IEnumerable such as an array, or by using an enum. With an array:

var transactionTypes = new[] { "FPO", "DEB", "DB", "FPI" };

and in the rule description it can be used as

.RuleFor(x => x.TransactionType, f => f.PickRandom(transactionTypes) )

Another way is using enums such as:

public enum TransactionType
{
    FPO,
    DEB,
    DB,
    FPI
}

and the rule becomes:

.RuleFor(x => x.TransactionType, f => f.PickRandom<TransactionType>().ToString())

In my final implementation I ended up selecting from a list of objects. You can check out the sample code to see that version.

Number range

For the account number I need an 8-digit number which can be achieved with something like this rule:

.RuleFor(x => x.AccountNumber, f => f.Random.Long(10000000, 99999999).ToString())

The Bogus API also has built-in support for account numbers, so the following is a more elegant and expressive way of achieving the same:

.RuleFor(x => x.AccountNumber, f => f.Finance.Account())

Formatting string

Formatting the Sort Code can be achieved with the Random.Replace method, which replaces each # in the format string with a random digit:

.RuleFor(x => x.SortCode, f => f.Random.Replace("##-##-##"))

Similar to account number, Bogus also has built-in support for sort codes (the extension lives in the Bogus.Extensions.UnitedKingdom namespace):

.RuleFor(x => x.SortCode, f => f.Finance.SortCode())

Null values

In some fields I’d like to have null values too. This can be achieved with the OrNull extension method. For example, the code below makes 20% of the DebitAmount values null:

.RuleFor(x => x.DebitAmount, f => f.Random.Decimal(0.00m, 9999.00m).OrNull(f, 0.2f))

Common fields

In my case some values in each statement line repeat throughout the entire statement such as account number and sort code. To achieve that I created a “base” statement line and every fake statement line used these shared fields instead of generating new ones.

var commonFields = new Faker<BankStatementLine>()
    .RuleFor(x => x.AccountNumber, f => f.Finance.Account())
    .RuleFor(x => x.SortCode, f => f.Finance.SortCode())
    .Generate();


var fakeTransactions = new Faker<BankStatementLine>()
    .StrictMode(true)
    .RuleFor(x => x.AccountNumber, commonFields.AccountNumber)
    .RuleFor(x => x.SortCode, f => commonFields.SortCode)
	...
	...

Random number of objects

It’s more realistic to have a varying number of lines in statements. With the Generate method you can specify the exact number of items you want to generate, which is good for unit tests. For my purposes I just wanted to create a random number of rows in each statement, as I only needed the data to be imported. This can be achieved with GenerateBetween:

var statementLines = fakeTransactions.GenerateBetween(10, 20);

Dependent values

The tricky part in this scenario was the dependent values. Normally when you use RuleFor extension method it generates the value for that field alone in isolation. In my case, one restriction was Debit Amount and Credit Amount could not both have values in the same line. Also Balance depends on these values and needs to be calculated in each line.

As far as I can tell there’s no built-in support to define these dependencies. Based on my tests I was able to achieve this in 2 ways:

  1. Update the values accordingly in FinishWith extension method
  2. Use Rules extension method to define multiple rules at once and implement the restrictions inside it.

I think the latter is the better solution, as FinishWith sounds more like a place for clean-up, logging or similar extra activity, whereas Rules sounds more like the place for actual business logic.

So with that in mind my rules for Debit Amount, Credit Amount and Balance fields looked like this:

.Rules((f, x) =>
{
    var debitAmount = (decimal?)f.Random.Decimal(1, 100).OrNull(f, 1.0f - statementconfig.DebitTransactionRatio);
    if (debitAmount.HasValue) // Is it a debit transaction?
    {
        x.CreditAmount = null;
        x.DebitAmount = debitAmount.Value;
        balance -= x.DebitAmount.Value;

        x.TransactionType = f.PickRandom(TransactionType.AllTransactionTypes
            .Where(tt => tt.Direction == TransactionDirection.Debit || tt.Direction == TransactionDirection.DebitOrCredit)
            .Select(tt => tt.Code));
    }
    else
    {
        var creditAmount = f.Random.Decimal(1, 100);
        x.DebitAmount = null;
        x.CreditAmount = creditAmount;

        balance += x.CreditAmount.Value;

        x.TransactionType = f.PickRandom(TransactionType.AllTransactionTypes
            .Where(tt => tt.Direction == TransactionDirection.Credit || tt.Direction == TransactionDirection.DebitOrCredit)
            .Select(tt => tt.Code));
    }

    x.Balance = balance;
});

A caveat with this approach is that I cannot use StrictMode anymore as it complains about those 3 fields having null values. It specifically mentions that in the exception. If you use Rules you’re on your own to ensure that all fields are populated properly.

Another drawback of setting multiple rules at once is that it can easily make the code harder to read. Fortunately for me, the author of the library Brian Chavez kindly reviewed the code and suggested some refactorings one of which proved it was still possible to use RuleFor method and strict mode. I’ve updated the final source code with these refactorings. So with individual rules the implementation looks like this:

.RuleFor(x => x.DebitAmount, f =>
{
    return (decimal?)f.Random.Decimal(1, 100).OrNull(f, 1.0f - statementconfig.DebitTransactionRatio);
})
.RuleFor(x => x.CreditAmount, (f, x) =>
{
    return x.IsCredit() ? (decimal?)f.Random.Decimal(1, 100) : null;
})
.RuleFor(x => x.TransactionType, (f, x) =>
{
    if (x.IsCredit())
    {
        return RandomTxCode(TransactionDirection.Credit);
    }
    else
    {
        return RandomTxCode(TransactionDirection.Debit);
    }

    string RandomTxCode(TransactionDirection direction)
    {
        return f.PickRandom(TransactionType.AllTransactionTypes
            .Where(tt => tt.Direction == direction || tt.Direction == TransactionDirection.DebitOrCredit)
            .Select(tt => tt.Code));
    }
})
.RuleFor(x => x.Balance, (f, x) =>
{
    if (x.IsCredit())
        balance += x.CreditAmount.Value;
    else
        balance -= x.DebitAmount.Value;

    return balance;
});

The IsDebit and IsCredit methods referred to above are extension methods defined like this:

public static class Extensions
{
   public static bool IsCredit(this BankStatementLine bsl)
   {
      return bsl.DebitAmount is null;
   }
   public static bool IsDebit(this BankStatementLine bsl)
   {
      return !IsCredit(bsl);
   }
}
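To make the invariants from this section explicit (exactly one of Debit Amount and Credit Amount is set per line, and Balance is a running total carried from line to line), here is a minimal sketch that deliberately does not use Bogus. It swaps in System.Random so that it is self-contained; Line, RunningBalanceSketch and their parameters are hypothetical names, not part of the sample application.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, cut-down stand-in for BankStatementLine.
public sealed class Line
{
    public decimal? DebitAmount { get; set; }
    public decimal? CreditAmount { get; set; }
    public decimal Balance { get; set; }
}

public static class RunningBalanceSketch
{
    // debitRatio is the probability (0.0 to 1.0) that a line is a debit.
    public static List<Line> Generate(int count, decimal openingBalance, double debitRatio, int seed)
    {
        var random = new Random(seed);
        var balance = openingBalance; // shared state carried from line to line
        var lines = new List<Line>(count);

        for (var i = 0; i < count; i++)
        {
            // Amount between 1.00 and 100.00 with two decimal places.
            var amount = random.Next(100, 10001) / 100m;
            var isDebit = random.NextDouble() < debitRatio;

            var line = new Line();
            if (isDebit)
            {
                line.DebitAmount = amount;  // CreditAmount stays null
                balance -= amount;
            }
            else
            {
                line.CreditAmount = amount; // DebitAmount stays null
                balance += amount;
            }

            line.Balance = balance;
            lines.Add(line);
        }

        return lines;
    }
}
```

The point of the sketch is the shared balance variable: each line reads and updates state left behind by the previous one, which is the part that per-field rules cannot express on their own.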

Random text

For the transaction description, for now I’ll go with random Lorem Ipsum text. Bogus has support for this too:

.RuleFor(x => x.TransactionDescription, f => f.Lorem.Sentence(3))

I probably will need to use a fixed list of descriptions soon but for the time being it’s fine. Also as shown below it’s very easy to switch to that too.

Incremental values

Similar to balance being dependent on the previous values, transaction date is also dependent, as it needs to increase from line to line. I couldn’t find built-in support for this, so I implemented it using my own shared variable like this:

.RuleFor(x => x.TransactionDate, f =>
{
    lastDate = lastDate.AddDays(f.Random.Double(0, statementconfig.TransactionDateInterval));
    if (lastDate.Date > statementconfig.EndDate)
    {
        lastDate = statementconfig.EndDate;
    }
    return lastDate;
})
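The same idea can be tried out in isolation. The sketch below is an illustration under assumptions, not the code from the sample application: it uses System.Random instead of Bogus, compares full timestamps rather than the Date part, and IncrementalDateSketch and its parameters are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class IncrementalDateSketch
{
    // Produces `count` non-decreasing dates, each up to `maxStepDays` after
    // the previous one and never later than `end`.
    public static List<DateTime> Generate(DateTime start, DateTime end, double maxStepDays, int count, int seed)
    {
        var random = new Random(seed);
        var lastDate = start; // shared state carried from one value to the next
        var dates = new List<DateTime>(count);

        for (var i = 0; i < count; i++)
        {
            lastDate = lastDate.AddDays(random.NextDouble() * maxStepDays);
            if (lastDate > end)
            {
                lastDate = end; // clamp to the end of the statement period
            }
            dates.Add(lastDate);
        }

        return dates;
    }
}
```

One consequence of clamping is that once the end date is reached, every remaining line gets the same date, so the interval and the number of lines need to be chosen with the date range in mind.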

Putting It All Together

So let’s see the output with the help of another nice library called Console Tables:

Source Code

The sample application can be found under the blog/GeneratingTestDataWithBogus folder in the repository.
