xml, programming, csharp

Even though XML is a noisy data format, it’s still commonly used, and I keep ending up in situations where I need to use .NET to serialize and deserialize XML documents. Here are a few problems that I had to tackle in the past. All the sample source code can be found in a GitHub repository.

01. Skip serializing unassigned values

Let’s assume we have a hypothetical Player class with a few properties that looks like this:

public class Player
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int TotalGoalsScored { get; set; }
    public double AverageGoalsPerGame { get; set; }
    public Team Team { get; set; }
}

public class Team
{
    public string Name { get; set; }
    public int YearEstablished { get; set; }
}

So it contains both value-type and reference-type properties.

Now let’s create two random players and serialize them:

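// Requires: using System; using System.Collections.Generic; using System.IO; using System.Xml.Serialization;
XmlSerializer serializer = new XmlSerializer(typeof(List<Player>));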
Player player1 = new Player() { Id = 1, FirstName = "John", LastName = "Smith", TotalGoalsScored = 50, AverageGoalsPerGame = 0.7, Team = new Team() { Name = "Arsenal" } };
Player player2 = new Player() { Id = 2, FirstName = "Jack" };
using (StringWriter writer = new StringWriter())
{
    serializer.Serialize(writer, new List<Player>() { player1, player2 });
    Console.WriteLine(writer.ToString());
}

This code yields the following XML:

<?xml version="1.0" encoding="utf-16"?>
<ArrayOfPlayer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Player>
    <Id>1</Id>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <TotalGoalsScored>50</TotalGoalsScored>
    <AverageGoalsPerGame>0.7</AverageGoalsPerGame>
    <Team>
      <Name>Arsenal</Name>
      <YearEstablished>0</YearEstablished>
    </Team>
  </Player>
  <Player>
    <Id>2</Id>
    <FirstName>Jack</FirstName>
    <TotalGoalsScored>0</TotalGoalsScored>
    <AverageGoalsPerGame>0</AverageGoalsPerGame>
  </Player>
</ArrayOfPlayer>

The thing to note here is that reference types were not serialized when they were unassigned, i.e. the Team object and the LastName property of player2. But the same didn’t go for value types: the TotalGoalsScored, AverageGoalsPerGame and YearEstablished properties were serialized as 0. Of course this might be the desired outcome depending on your business requirements, but in my case I didn’t want this because it might mislead the client consuming the data. At the very least I find it inconsistent, as some unassigned values are serialized and some aren’t.

So to change the behaviour, all we have to do is apply the DefaultValue attribute to the numeric properties like this:

using System.ComponentModel; // DefaultValueAttribute lives in this namespace

public class Player
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }

    [DefaultValue(0)]
    public int TotalGoalsScored { get; set; }

    [DefaultValue(0)]
    public double AverageGoalsPerGame { get; set; }
    
    public Team Team { get; set; }
}

public class Team
{
    public string Name { get; set; }

    [DefaultValue(0)]
    public int YearEstablished { get; set; }
}

With this change the output becomes:

<?xml version="1.0" encoding="utf-16"?>
<ArrayOfPlayer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Player>
    <Id>1</Id>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <TotalGoalsScored>50</TotalGoalsScored>
    <AverageGoalsPerGame>0.7</AverageGoalsPerGame>
    <Team>
      <Name>Arsenal</Name>
    </Team>
  </Player>
  <Player>
    <Id>2</Id>
    <FirstName>Jack</FirstName>
  </Player>
</ArrayOfPlayer>

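Because the int and double properties default to 0 and we explicitly declared 0 as their default value, they won’t be serialized unless they hold a value other than zero. Note that the serializer only compares the current value with the declared default, so a 0 that was assigned explicitly is omitted as well.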

In case you are wondering, making the int and double nullable doesn’t produce the same result. In that case they are serialized as nil elements:

<TotalGoalsScored xsi:nil="true" />
<AverageGoalsPerGame xsi:nil="true" />

This if-you-set-it-you-get-it-back approach is consistent and makes the most sense to me. I created a test project to fiddle with these classes. It has all three versions of the classes under different names and a console application displaying the outputs. If you want to play around, you can get the source code from the GitHub repository.

02. Deserialize straight to list

Sometimes what you get in an XML document is just a list of items, and the list element is only a container so that the XML is well-formed. In the following example we have a list of players:

<PlayerList>
  <Player>
    <Id>1</Id>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <TotalGoalsScored>50</TotalGoalsScored>
    <AverageGoalsPerGame>0.7</AverageGoalsPerGame>
    <Team>
      <Name>Arsenal</Name>
      <YearEstablished>0</YearEstablished>
    </Team>
  </Player>
  <Player>
    <Id>2</Id>
    <FirstName>Jack</FirstName>
    <TotalGoalsScored>0</TotalGoalsScored>
    <AverageGoalsPerGame>0</AverageGoalsPerGame>
  </Player>
</PlayerList>

The sole purpose of the PlayerList tag is to act as the root and contain multiple objects. Other than that it has no function. When we deserialize this to C# objects we would normally need two classes like this:

public class Player
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int TotalGoalsScored { get; set; }
    public double AverageGoalsPerGame { get; set; }
}

and

[XmlRoot]
public class PlayerList
{
    [XmlElement("Player")]
    public List<Player> Players { get; set; }
}

and we can get the list of objects by:

string inputXmlPath1 = @".\InputXml.xml";
using (StreamReader reader = new StreamReader(inputXmlPath1))
{
    XmlSerializer playerListSerializer = new XmlSerializer(typeof(PlayerList));
    PlayerList playerList = (PlayerList)playerListSerializer.Deserialize(reader);
}

In such cases I generally tend to eliminate the “middle man”. I don’t want a container class which only holds a List. So I’d like to deserialize this XML directly into a List<T>.

What I want to do is actually this:

using (StreamReader reader = new StreamReader(inputXmlPath1))
{
    XmlSerializer playerListSerializer = new XmlSerializer(typeof(List<FinalPlayer>));
    List<FinalPlayer> playerList = (List<FinalPlayer>)playerListSerializer.Deserialize(reader);
}

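But without any modifications to our classes, Deserialize throws an InvalidOperationException whose inner exception reports that <PlayerList xmlns=''> was not expected.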

By eliminating the PlayerList class we actually stopped providing the XmlRoot info to the serializer, so it falls back to its default root name for a list. That can quickly be remedied by using a constructor overload of XmlSerializer:

using (StreamReader reader = new StreamReader(inputXmlPath1))
{
    XmlSerializer playerListSerializer = new XmlSerializer(typeof(List<FinalPlayer>), new XmlRootAttribute("PlayerList"));
    List<FinalPlayer> playerList = (List<FinalPlayer>)playerListSerializer.Deserialize(reader);
}     

This works when the class name matches the XML element name. If your class has a different name (like FinalPlayer in my example), you need to decorate the class with XmlType and supply the element name so that the serializer can do the mapping.

[XmlType("Player")]
public class FinalPlayer
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int TotalGoalsScored { get; set; }
    public double AverageGoalsPerGame { get; set; }
}

So now we can map any class name to the corresponding elements and deserialize straight to a List.

03. Remove namespaces

I know using namespaces is considered good practice as it helps avoid name conflicts, but honestly I have never suffered from such a problem, so I don’t mind removing them from my XML documents to clear the clutter. XML is already a noisy data format; there is no need to bloat it any further. Namespaces might help when you are working with data from different sources, but if you are only working with your own classes and data structures, name conflicts are generally not something to worry about (assuming you name your objects properly).

So say you have a simple Player class and you serialize it with an out-of-the-box XmlSerializer:

Player player = new Player() { Id = 102, FirstName = "Danny", LastName = "TopScorer", AverageGoalsPerGame = 3.5, TotalGoalsScored = 150 };
XmlSerializer serializer = new XmlSerializer(typeof(Player));
XmlWriterSettings settings = new XmlWriterSettings() { OmitXmlDeclaration = true, Indent = true, Encoding = Encoding.UTF8 };
StringBuilder output = new StringBuilder();
XmlWriter writer = XmlWriter.Create(output, settings);
serializer.Serialize(writer, player);
Console.WriteLine(output.ToString());

This yields the following output:

<Player xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Id>102</Id>
  <FirstName>Danny</FirstName>
  <LastName>TopScorer</LastName>
  <TotalGoalsScored>150</TotalGoalsScored>
  <AverageGoalsPerGame>3.5</AverageGoalsPerGame>
</Player>

So in order to get rid of the namespaces we have to specify our own namespaces, which is a single empty one in this case, and pass them in through the Serialize overload that accepts an XmlSerializerNamespaces argument:

XmlSerializerNamespaces xns = new XmlSerializerNamespaces();
xns.Add(string.Empty, string.Empty);
serializer.Serialize(writer, player, xns);

to get the XML without any namespaces:

<Player>
  <Id>102</Id>
  <FirstName>Danny</FirstName>
  <LastName>TopScorer</LastName>
  <TotalGoalsScored>150</TotalGoalsScored>
  <AverageGoalsPerGame>3.5</AverageGoalsPerGame>
</Player>

04. Change output encoding

When you serialize to a StringBuilder or StringWriter with the XML declaration included, the declared encoding is UTF-16. Oddly enough, setting Encoding on XmlWriterSettings doesn’t change this, because the writer reports the encoding of the underlying text writer, which is always UTF-16 for strings. Sometimes you need to change it, though. For example a 3rd party was expecting UTF-8 in my case, so the default value didn’t cut it for me.

So using the same Player class from the last example, the following code produces output declared as UTF-16 even though the settings ask for UTF-8:

Player player = new Player() { Id = 102, FirstName = "Danny", LastName = "TopScorer", AverageGoalsPerGame = 3.5, TotalGoalsScored = 150 };
XmlSerializer serializer = new XmlSerializer(typeof(Player));
XmlWriterSettings settings = new XmlWriterSettings() { OmitXmlDeclaration = false, Indent = true, Encoding = Encoding.UTF8 };
StringBuilder output = new StringBuilder();
XmlWriter writer = XmlWriter.Create(output, settings);
XmlSerializerNamespaces xns = new XmlSerializerNamespaces();
xns.Add(string.Empty, string.Empty);
serializer.Serialize(writer, player, xns);
Console.WriteLine(output.ToString());

Output:

<?xml version="1.0" encoding="utf-16"?>
<Player>
  <Id>102</Id>
  <FirstName>Danny</FirstName>
  <LastName>TopScorer</LastName>
  <TotalGoalsScored>150</TotalGoalsScored>
  <AverageGoalsPerGame>3.5</AverageGoalsPerGame>
</Player>

I found the solution on Stack Overflow.

The solution is to derive from StringWriter and add a new constructor that accepts an Encoding. StringWriter already has an Encoding property, but unfortunately it doesn’t have a public setter, so we need a subclass to fiddle with it.

StringWriterWithEncoding simply overrides the Encoding property:

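// Requires: using System.IO; using System.Text;
public sealed class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding encoding;

    public StringWriterWithEncoding(Encoding encoding)
    {
        this.encoding = encoding;
    }

    // XmlWriter reads this property to decide what to put in the XML declaration
    public override Encoding Encoding
    {
        get { return encoding; }
    }
}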

By using the new class the following code produces the desired outcome:

StringWriterWithEncoding utf8StringWriter = new StringWriterWithEncoding(Encoding.UTF8);
Player player = new Player() { Id = 102, FirstName = "Danny", LastName = "TopScorer", AverageGoalsPerGame = 3.5, TotalGoalsScored = 150 };
XmlSerializer serializer = new XmlSerializer(typeof(Player));
XmlWriterSettings settings = new XmlWriterSettings() { OmitXmlDeclaration = false, Indent = true, Encoding = Encoding.UTF8 };
XmlWriter writer = XmlWriter.Create(utf8StringWriter, settings);
XmlSerializerNamespaces xns = new XmlSerializerNamespaces();
xns.Add(string.Empty, string.Empty);
serializer.Serialize(writer, player, xns);
Console.WriteLine(utf8StringWriter.ToString());
Console.ReadLine();

Output:

<?xml version="1.0" encoding="utf-8"?>
<Player>
  <Id>102</Id>
  <FirstName>Danny</FirstName>
  <LastName>TopScorer</LastName>
  <TotalGoalsScored>150</TotalGoalsScored>
  <AverageGoalsPerGame>3.5</AverageGoalsPerGame>
</Player>

05. Fail deserialization when unexpected elements are encountered

By default XmlSerializer is very fault tolerant. It just salvages whatever it can and ignores the values it cannot match. Sometimes you may need to be stricter. For example I had a case where the external source returned a whole different XML document when there was an error on its end. When that happened I wanted to be notified about it instead of quietly getting empty objects.

For example, assume we have the usual PlayerList that we are deserializing to a List<Player>. If for some reason we get a weird player list like this:

<PlayerList>
  <Customer>
    <Id>1</Id>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <TotalGoalsScored>50</TotalGoalsScored>
    <AverageGoalsPerGame>0.7</AverageGoalsPerGame>
  </Customer>
</PlayerList>

When we deserialize it with the following code block

try
{
    using (MemoryStream memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(rawXml)))
    {
        XmlSerializer serializer = new XmlSerializer(typeof(List<Player>), new XmlRootAttribute("PlayerList"));
        var result = (List<Player>)serializer.Deserialize(memoryStream);
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
    if (ex.InnerException != null)
    {
        Console.WriteLine(ex.InnerException.Message);    
    }
}

XmlSerializer doesn’t complain at all. Instead it just returns an empty list because it cannot find any Player objects. In order to change this behaviour we can use the events it exposes, such as UnknownNode, UnknownElement and UnknownAttribute. UnknownNode covers both of the other two: it fires for unknown elements as well as unknown attributes. In my case I didn’t want to be too strict, so I didn’t want an exception for an unexpected attribute and only hooked into the UnknownElement event:

try
{
    using (MemoryStream memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(rawXml)))
    {
        XmlSerializer serializer = new XmlSerializer(typeof(List<Player>), new XmlRootAttribute("PlayerList"));
        serializer.UnknownElement += serializer_UnknownElement;
        var result = (List<Player>)serializer.Deserialize(memoryStream);
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
    if (ex.InnerException != null)
    {
        Console.WriteLine(ex.InnerException.Message);
    }
}

and added the event handler:

void serializer_UnknownElement(object sender, XmlElementEventArgs e)
{
    throw new ArgumentException(string.Format("Unknown element: {0}", e.Element.LocalName));
}

So now at least I can distinguish a weird list from a really empty list.


programming, debug

Even though most APIs are RESTful these days, sometimes you might still need to interact with a SOAP-based web service.

I had to consume a SOAP XML web service the other day and encountered a strange problem while debugging my test application. The application was working fine, but when I tried to debug it, it closed silently. So as a first step I opened Debug -> Exceptions and checked all the boxes to make the debugger break on any type of exception.

Break when any exception is thrown

After running the application with this setting at least I was able to see what the fuss was all about:

SOAP BindingFailure exception

There are various approaches to resolving this issue, and I finally found the one that worked in a Stack Overflow answer: go to Project Properties and, in the Build tab, turn on the Generate Serialization Assembly setting.

Turn on Generate Serialization Assembly setting

When this setting is turned on, the build generates an assembly named {Your Application Name}.XmlSerializers.dll.

Out of curiosity I peeked into the assembly with ILSpy and it looks like this:

XmlSerializers.dll in ILSpy

Basically it just generates an abstract class deriving from XmlSerializer (named XmlSerializer1) and generates a bunch of sealed child classes deriving from that class.

I’ve never been a fan of auto-generated code anyway, and it looks like I’m not going to need it in my own code, but it’s used in the background by the framework. I added links to a few Stack Overflow answers related to that assembly though. I gather the advantages of turning it on are reduced startup time and, in this case, being able to debug. The disadvantage is increased deployment size, which I think is negligible in this day and age, so I’ll just keep it on and debug happily ever after!

Resources

gadget, raspberry pi, aws, route53, development

Everything is *-as-a-service nowadays and books are no exception. I have a Safari Books Online subscription which I enjoy a lot. It is extremely convenient to have thousands of books at your fingertips. But… DIY still rules! There are times you may still want to have your own collection, and it doesn’t have to be just an e-book collection. On top of that, it’s fun and educational.

Ingredients

Basically all you need is an up-and-running Raspberry Pi. If you have one, you can skip this section. These are just the components I used in my implementation:

A keyboard and display are needed for the initial setup. Once the Pi is connected to the network you can do everything over SSH.

Calibre, my old friend!

I’ve been a fan of Calibre for many years now. With Calibre you can manage any document collection you want. I love its user interface which allows me to easily tag and categorize my collections. Also it can convert between a wide range of formats. And when I plug in my Kindle it automatically recognizes the device and I can right-click on a book and send to device very easily. Check out this page for a full list of features.

My favorite feature is that it can act as a server. I mainly use Stanza on my iPad and connect to my Calibre server to download comic books over WiFi. The downside of running it locally on my computer is that the machine needs to be on and I have to enable the content server in Calibre manually before connecting from the iPad.

Here comes the project

Instead, what I’d like to have is

  • An online server available all the time: the Raspberry Pi is a very power-efficient little monster so I can keep it running

  • Isolated from my main machine: For security reasons I don’t want to open a port on my desktop computer

  • Accessible both from inside and outside: Of course I could just launch a cheap AWS instance and use it as the content server but

    • It’s not as fun!
    • If I need to move around GBs of data local network rocks!

Also, as I said it’s mostly for fun so I don’t have to justify it completely to myself :-)

Roll up your sleeves!

Step 0: Set up Raspberry Pi

If you haven’t done it already you can easily set it up by following the Quick Start Guide on raspberrypi.org

Step 1: Install Calibre on Raspberry Pi

This one was a bit harder than I expected. The official site says “Please do not use your distribution provided calibre package, as those are often buggy/outdated. Instead use the Binary install described below.”

and the suggested command is

sudo -v && wget -nv -O- https://raw.githubusercontent.com/kovidgoyal/calibre/master/setup/linux-installer.py | sudo python -c "import sys; main=lambda:sys.stderr.write('Download failed\n'); exec(sys.stdin.read()); main()"

Alas, after running the command I got the following error:

Calibre installation error

I asked in the Calibre forums about the error and was advised to build it from source, because the precompiled version is for Intel processors and doesn’t work on the ARM processor the Raspberry Pi has. The instructions for building it from source are on the same page, but I haven’t tried them myself.

As a fallback method I simply used apt-get to install it:

sudo apt-get update && sudo apt-get install calibre

It worked fine, but the version is 0.8.51 (the latest release at the time of this writing is 2.20.0, so you can see it’s a little bit outdated!). The content server was implemented a long time ago, so for all intents and purposes it’s good enough for this project.

Step 2: Run it as server

Now that we have Calibre installed we can run the content server from command line:

calibre-server --with-library=/home/pi/calibre/myLibrary --daemonize

This will run the process in the background (because of the --daemonize flag), but if the Pi restarts it will not start automatically. To fix that I added the command to crontab by first entering the following command

crontab -e

and adding the following line after the comments

@reboot calibre-server --with-library=/home/pi/calibre/myLibrary --daemonize

so that the same command is run after every reboot.

Crontab on Raspberry Pi

Now let’s test if we’re online. By default, Calibre starts serving on port 8080 with no authentication required. So just find the IP address of the Raspberry Pi and try to connect to it over HTTP from your machine, e.g. http://{Local IP}:8080

and voila!

Now we can add some books and start using it from any machine on the network.
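If the page doesn’t load, it helps to first check whether the port is reachable at all before blaming Calibre. Here is a minimal Python 3 sketch for that, using only the standard library. The helper name is_port_open is my own placeholder and not something Calibre provides.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling is_port_open with the Pi’s local IP and 8080 from another machine tells you whether the problem is the network or the content server itself.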

Step 3: Add some books

First I uploaded some files to a different folder using WinSCP. If you are not on Windows I’m sure you can find a similar tool to transfer files to Raspberry Pi.

We can add books by using calibredb command like this:

calibredb add Raspberry_Pi_Education_Manual.pdf --with-library=/home/pi/calibre/myLibrary

Please note if you try to use calibre instead of calibredb you’d get the following error:

calibre: cannot connect to X server 

Because calibre is the GUI application and there is no X server in an SSH session, we cannot use it directly; instead we add books using calibredb.

Calibre always copies the files to its own library so once the books are added you can delete the original ones.
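Adding files one at a time gets tedious with a whole folder of uploads. The following Python 3 sketch builds one calibredb add command per file, using the same --with-library flag as above. The function names and the extension list are my own assumptions, so adjust them to your collection.

```python
import subprocess
from pathlib import Path


def calibredb_add_commands(source_dir, library_path,
                           extensions=(".pdf", ".epub", ".mobi")):
    """Build one `calibredb add` command per matching file in source_dir."""
    commands = []
    for path in sorted(Path(source_dir).iterdir()):
        # Skip sub-folders and files with other extensions.
        if path.is_file() and path.suffix.lower() in extensions:
            commands.append(
                ["calibredb", "add", str(path),
                 "--with-library=" + str(library_path)]
            )
    return commands


def add_all(source_dir, library_path):
    """Run the commands one by one; raises CalledProcessError if one fails."""
    for command in calibredb_add_commands(source_dir, library_path):
        subprocess.run(command, check=True)
```

Keeping the command building separate from the execution makes it easy to print the commands first and check them before anything touches the library.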

After the files are added refresh the page and you should get something like this:

At this point we can download the books on any machine on the local network.

Step 4: Connect from clients

  • Kindle

Kindle has an experimental browser (I currently have a Paperwhite, I don’t know about the newer versions). So to download books, I simply go to Settings -> Experimental Browser and enter the URL of my content server (http://{Local IP}:8080):

And after you download the file you can go to home and enjoy the book on your Kindle.

Please note that Kindle cannot download PDFs. When I tried to download Raspberry Pi manual I got the following error

Only files with the extension .AZW, .PRC, .MOBI or .TXT can be downloaded to your Kindle.

So make sure you upload the right file formats.
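To spot the files that would trigger this error before uploading, a small check against the extensions from the message above is enough. This is a Python 3 sketch with hypothetical helper names; it only looks at file names, not at the actual file contents.

```python
from pathlib import Path

# Extensions taken from the Kindle error message quoted above.
KINDLE_EXTENSIONS = {".azw", ".prc", ".mobi", ".txt"}


def kindle_downloadable(filename):
    """Return True if the file name has an extension the Kindle browser accepts."""
    return Path(filename).suffix.lower() in KINDLE_EXTENSIONS


def split_by_kindle_support(filenames):
    """Split file names into (downloadable, needs_conversion) lists."""
    downloadable, needs_conversion = [], []
    for name in filenames:
        if kindle_downloadable(name):
            downloadable.append(name)
        else:
            needs_conversion.append(name)
    return downloadable, needs_conversion
```

Anything that ends up in the second list has to be converted to a supported format before it can be downloaded on the Kindle.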

  • iPad / Stanza

This is my favorite app on iPad. It’s not even on AppStore anymore but I still use it to download books from Calibre.

All I had to do was click Get Books and it found the OPDS server on the network automatically so I could browse and download books right away.

Stanza

  • iPad / Web

Alternatively you can just browse to server and open it with any eBook reader app available on your iPad.

Calibre UI on iPad

[Optional] Step 5: Set up Port Forwarding

For internal usage we are done! If you want to access your library over the Internet you have to define a port forwarding rule. The way to do it is completely dependent on your router, so you have to fiddle with your router’s administration interface.

Basically you map an external port to an internal IP and port.

For example I mapped port 7373 to local 192.168.175:8080 so whenever I connect to my home network’s external IP on port 7373 I get my Calibre user interface.

I recommend running the server with the --username and --password flags so that only authenticated users can browse your library.

[Optional] Step 6: Set up Dynamic DNS

If you have a static IP you don’t need this step at all, but personal broadband connections generally don’t come with static IPs. My solution for this was using AWS Route 53 and updating the DNS record using the AWS Python SDK (Boto).

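First I had to install pip to be able to install boto. The script below is written for Python 2 (it uses urllib2), so the Python 2 package is the one to get:

sudo apt-get install python-pip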

Then boto itself

sudo pip install boto

I created an IAM user that only has access to a single domain which I use for this kind of stuff on Route 53 and added its credentials to the AWS credentials file as explained in the nice and brief tutorial here

The script calls AWS’s external IP checker and stores the result in currentIP. Then it gets the hosted zone and loops through all the record sets. When it finds the subdomain I’m going to use for Calibre (‘calibre.volki.info.’) it updates the IP address with currentIP and commits the change. Thanks to AWS that’s all it takes to create a Dynamic DNS application.

import boto.route53
import urllib2

# checkip returns the address followed by a newline, so strip it
currentIP = urllib2.urlopen("http://checkip.amazonaws.com/").read().strip()

# Credentials are read from the AWS credentials file
conn = boto.connect_route53()
zone = conn.get_zone("volki.info.")
change_set = boto.route53.record.ResourceRecordSets(conn, '{HOSTED_ZONE_ID}')

for rrset in conn.get_all_rrsets(zone.id):
    if rrset.name == 'calibre.volki.info.':
        # UPSERT creates the record or overwrites the existing one
        u = change_set.add_change("UPSERT", rrset.name, rrset.type, ttl=60)
        rrset.resource_records[0] = currentIP
        u.add_value(rrset.resource_records[0])
        results = change_set.commit()

Of course this script needs to be added to crontab and should run every 5-10 minutes. If the external IP changes there might be a short disruption to the service, but it should only take a few minutes before it’s resolved.

With this script now we can access our library over the Internet and we don’t have to worry about changes in the IP address.
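Since the cron job runs every few minutes, most runs will find the same IP as before. Here is a small sketch of how the script could validate the checker’s response and skip the Route 53 call when nothing has changed. It is written for Python 3 with the standard library only, and the helper names are hypothetical, not part of boto.

```python
import ipaddress


def normalize_ip(raw):
    """Strip whitespace from the IP checker's response and validate it as IPv4."""
    if isinstance(raw, bytes):
        raw = raw.decode("ascii")
    # IPv4Address raises ValueError for anything that is not a valid address,
    # e.g. an HTML error page returned instead of the IP.
    return str(ipaddress.IPv4Address(raw.strip()))


def needs_update(current_ip, record_values):
    """Return True unless the record already holds exactly the current IP."""
    return list(record_values) != [current_ip]
```

In the loop above, the UPSERT would then only be committed when needs_update(currentIP, rrset.resource_records) returns True.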
