Disassembling And Decompiling .NET Assemblies

July 22, 2015 dev csharp

Generally we don’t need to understand how every single line of code is compiled but sometimes it helps to have a look at what’s going on under the hood. The code we write isn’t always translated directly to IL (Intermediate Language) by the compiler, especially when optimizations come into play. It also helps to see what really happens when we use shorthand notations in the language. So I decided to explore the IL and see how simple constructs are compiled. Some of them are very simple and commonly known but some of them were new to me, so I compiled all of them in this post.

Toolkit

I used a disassembler and decompilers to analyze the output. When we compile a .NET program, it’s converted into IL code. Using a disassembler we can view the IL. Decompiling is the process of regenerating C# code from that IL. IL is hard to read, so getting the C# code back makes the analysis easier. I used the following tools to view the IL and C# code:

ILDASM: The main tool I used is ILDASM (IL Disassembler). It’s installed with Visual Studio so it doesn’t require anything special. IL is hard to make sense of, but it’s generally helpful to get an idea about what’s going on.
Telerik JustDecompile: A decompiler takes an assembly and regenerates .NET code from it. It may not reproduce the original source code exactly, and sometimes it optimizes things, so it may skew the results a little bit. But it’s generally helpful when you need to see a cleaner representation.
ILSpy: Same functionality as JustDecompile. I used it to compare the decompiled results.

So without further ado, here’s how some language basics we use are compiled:

01. Constructor

When you create a new class the template in Visual Studio creates a simple blank class like this:

namespace DecompileWorkout
{
    class TestClass
    {
    }
}

and we can create instances of this class as usual:

var newInstance = new TestClass();

It works even though we don’t have a constructor in our source code. The reason it works is that the compiler adds the default constructor for us. When we decompile the executable we can see it’s in there:

But when we add a constructor that accepts parameters the default constructor is removed:

It makes sense, otherwise all classes would have a parameterless constructor no matter what you do.
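The same behaviour can be checked without a decompiler by asking reflection whether a public parameterless constructor exists. This is only a small sketch: TestClassWithParameter is a made-up name for the second variant of the class.

```csharp
using System;

class TestClass
{
}

// Hypothetical variant that declares its own constructor.
class TestClassWithParameter
{
    public TestClassWithParameter(string value)
    {
    }
}

static class Program
{
    static void Main()
    {
        foreach (Type type in new[] { typeof(TestClass), typeof(TestClassWithParameter) })
        {
            // GetConstructor(Type.EmptyTypes) looks for a public constructor with no parameters.
            bool hasDefault = type.GetConstructor(Type.EmptyTypes) != null;
            Console.WriteLine(type.Name + " has parameterless constructor: " + hasDefault);
        }
    }
}
```

The first line reports True and the second False, which matches what the decompiler shows.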

02. Properties

Properties have been supported since forever and they are extremely handy. In the olden days we used to create a field and create getter and setter methods that access that field such as:

private string _name;
public void SetName(string name)
{
    _name = name;
}
public string GetName()
{
    return _name;
}

Now we can get the same result with an auto-implemented property:

public string Name { get; set; }

What happens under the hood is the compiler generates the backing field and getter/setter methods. For example if we disassemble the class with the above we’d get something like this:

Apart from the weird naming (I guess it’s to avoid naming collisions) it’s exactly the same code.
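If you don’t have a disassembler at hand, reflection can reveal the generated members as well. The sketch below assumes a minimal Person class with a single auto-property. It should list a private field named <Name>k__BackingField plus the get_Name and set_Name methods.

```csharp
using System;
using System.Reflection;

class Person
{
    public string Name { get; set; }
}

static class Program
{
    static void Main()
    {
        // Include private members, but only those declared on Person itself.
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public |
                                   BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

        foreach (FieldInfo field in typeof(Person).GetFields(flags))
        {
            Console.WriteLine("field:  " + field.Name);
        }

        foreach (MethodInfo method in typeof(Person).GetMethods(flags))
        {
            Console.WriteLine("method: " + method.Name);
        }
    }
}
```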

The backing field is only generated when this shorthand form is used. If we implement the getter/setter like this

public string Name 
{
    get
    {
        throw new NotImplementedException();
    }
    set
    {
        throw new NotImplementedException();
    }
}

Then the generated IL only contains the methods and not the backing field.

One last thing to note is that we cannot have fields on interfaces but we are allowed to have properties. That’s because on an interface the compiler again generates only the getter and setter methods, with no backing field.

03. Using statement

A handy shortcut to use when dealing with IDisposable objects is the using statement. To see what happens when we use using I came up with this simple code:

After decompiling:

So basically it just surrounds the code with a try-finally block and calls the Dispose method of the object if it’s not null. It’s much handier to use the using statement than to write this block by hand. But in the end there is nothing magical about it, it’s just a shortcut to dispose of objects safely.
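To make the expansion concrete, here is a minimal sketch that puts the shorthand and a hand-written equivalent side by side. MemoryStream is only a stand-in for any IDisposable, and the second half is an approximation of what the compiler emits rather than actual decompiler output.

```csharp
using System;
using System.IO;

static class Program
{
    static void Main()
    {
        // Shorthand form.
        using (var first = new MemoryStream())
        {
            first.WriteByte(1);
        }

        // Roughly what the compiler generates for the block above.
        MemoryStream second = new MemoryStream();
        try
        {
            second.WriteByte(1);
        }
        finally
        {
            if (second != null)
            {
                ((IDisposable)second).Dispose();
            }
        }
    }
}
```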

04. var keyword and anonymous types

The var keyword is used to implicitly specify a type. The compiler infers the type from the context and generates the code accordingly, so variables defined with var are still strongly typed. Initially I resisted using it as a shortcut because its main purpose is not to save us the trouble of typing out type names. It was introduced at the same time as anonymous types, which makes sense because without such a keyword you cannot store anonymous objects. But I find myself using it more and more for implicitly specifying the types. Looking around I see that I’m not the only one so if it’s just laziness at least I’m not the only one to blame!

So let’s check out how this simple code with one var and an anonymous type is compiled:

public class UsingVarAndAnonymousTypes
{
    public UsingVarAndAnonymousTypes()
    {
        var someType = new SomeType();
        var someAnonymousType = new { id = 1, name = "whatevs" };
    }
}

Decompiling the code doesn’t show the anonymous type side of the story but we can see that var keyword is simply replaced with the class type.

To see what happens with anonymous types let’s have a look at the IL:

For the anonymous type the compiler created a class for us with a very user-friendly name: <>f__AnonymousType0`2. It also generated private readonly fields and properties with only getters, which means the objects are immutable and we cannot set those values once the object is initialized.
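The read-only nature of the generated properties can also be confirmed at runtime. The following sketch reuses the anonymous object from the sample above and prints the generated type name along with whether each property has a setter. The exact type name depends on the compiler, so treat it as illustrative.

```csharp
using System;
using System.Reflection;

static class Program
{
    static void Main()
    {
        var someAnonymousType = new { id = 1, name = "whatevs" };
        Type type = someAnonymousType.GetType();

        // Compiler-generated name, something like <>f__AnonymousType0`2.
        Console.WriteLine(type.Name);

        foreach (PropertyInfo property in type.GetProperties())
        {
            // CanWrite is false because no setter is generated.
            Console.WriteLine(property.Name + " CanWrite: " + property.CanWrite);
        }
    }
}
```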

06. Async and await

Async/await makes asynchronous programming much easier for us. It hides a ton of complexity, so I think it makes sense to invest some time in investigating how it works in order to use it properly. First of all, marking a method async doesn’t make everything asynchronous automagically. If you don’t use await inside the method it will run synchronously (the compiler will generate a warning about it). When you call an async method, it runs until it hits the first await on an operation that hasn’t finished yet and then returns to the caller without blocking the calling thread. If the call is not awaited you get back a Task instead of the actual expected result. The Task represents the ongoing operation, which completes later without blocking the main flow. This is great because you can fire up a bunch of tasks and then wait for the results whenever you need all of them. For example:

public async void DownloadImageAsync(string imageUrl, string localPath)
{
    using (var webClient = new WebClient())
    {
        Task task = webClient.DownloadFileTaskAsync(imageUrl, localPath);

        var hashCalculator = new SynchronousHashCalculator();

        // Do some more work while download is running

        await task;
        using (var fs = new FileStream(localPath, FileMode.Open, FileAccess.Read))
        {
            byte[] testData = new byte[new FileInfo(localPath).Length];
            int bytesRead = await fs.ReadAsync(testData, 0, testData.Length);
            
            // Do something with testData
        }
    }
}

In this example, a file is downloaded asynchronously. The execution continues after the call has been made so we can do other unrelated work while the download is in progress. Only when we need the actual value, to open the file in this instance, we await the task. If we didn’t await we would try to access the file before it’s completely written and closed resulting in an exception.

One confusing aspect may be calling await on the same line as the call, as with fs.ReadAsync in the above example. It looks synchronous as we are still waiting on the same line but the difference is that the main thread is not blocked. If it’s a GUI-based application, for example, it would still be responsive.
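The order of execution around an await is easier to see in a tiny console program. This is a sketch with made-up names (ComputeAsync, gate). A TaskCompletionSource stands in for a slow operation such as the download, so the example doesn’t need the network.

```csharp
using System;
using System.Threading.Tasks;

static class Program
{
    static async Task<int> ComputeAsync(TaskCompletionSource<int> gate)
    {
        Console.WriteLine("1: before await, still running on the caller's thread");
        int value = await gate.Task; // not completed yet, so control returns to Main here
        Console.WriteLine("3: after await, the method has been resumed");
        return value * 2;
    }

    static void Main()
    {
        var gate = new TaskCompletionSource<int>();

        Task<int> pending = ComputeAsync(gate);
        Console.WriteLine("2: caller continues, IsCompleted = " + pending.IsCompleted);

        gate.SetResult(21); // the simulated operation finishes
        Console.WriteLine("4: result = " + pending.Result);
    }
}
```

The lines are printed in the order 1, 2, 3, 4: the caller gets control back at the await, and the rest of the method runs only after the awaited task has a result.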

So let’s decompile the assembly to see how it works behind the scenes. This is one of the times I like having redundancy: JustDecompile failed to generate C# code for the class using async/await so I had to switch to ILSpy.

It generates a struct implementing IAsyncStateMachine interface. The “meaty” part is the MoveNext method:

It’s very hard to read compiler-generated code but I think this part is interesting:

switch (this.<>1__state)
{
case 0:
	taskAwaiter = this.<>u__$awaiter7;
	this.<>u__$awaiter7 = default(TaskAwaiter);
	this.<>1__state = -1;
	break;
case 1:
	goto IL_102;
default:
	this.<task>5__2 = this.<webClient>5__1.DownloadFileTaskAsync(this.imageUrl, this.localPath);
	this.<hashCalculator>5__3 = new SynchronousHashCalculator();
	taskAwaiter = this.<task>5__2.GetAwaiter();
	if (!taskAwaiter.IsCompleted)
	{
		this.<>1__state = 0;
		this.<>u__$awaiter7 = taskAwaiter;
		this.<>t__builder.AwaitUnsafeOnCompleted<TaskAwaiter, AsyncMethods.<DownloadImageAsync>d__0>(ref taskAwaiter, ref this);
		flag = false;
		return;
	}
	break;
}
taskAwaiter.GetResult();
taskAwaiter = default(TaskAwaiter);

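If the task has not completed yet, the state is set to 0, the awaiter is saved, and AwaitUnsafeOnCompleted registers MoveNext as the continuation before the method returns. When the task completes, MoveNext is called again; this time the switch enters case 0, restores the saved awaiter and falls through to GetResult. So the method doesn’t poll or loop: it is left once and re-entered once for each await that actually had to wait.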
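The same pattern can be written by hand to make the flow easier to follow. The class below is a heavily simplified sketch, not the compiler’s output: ManualStateMachine and its members are made-up names, it handles a single await, and it uses the awaiter’s OnCompleted method instead of the method builder.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

sealed class ManualStateMachine
{
    private readonly Task<int> _task;
    private int _state = -1;
    private TaskAwaiter<int> _awaiter;

    // Signalled when the "method body" has run to the end.
    public readonly ManualResetEventSlim Finished = new ManualResetEventSlim();

    public ManualStateMachine(Task<int> task)
    {
        _task = task;
    }

    public void MoveNext()
    {
        TaskAwaiter<int> awaiter;
        if (_state == 0)
        {
            // Second call: we were suspended, so restore the saved awaiter.
            awaiter = _awaiter;
            _awaiter = default(TaskAwaiter<int>);
            _state = -1;
        }
        else
        {
            // First call: start awaiting.
            awaiter = _task.GetAwaiter();
            if (!awaiter.IsCompleted)
            {
                _state = 0;
                _awaiter = awaiter;
                awaiter.OnCompleted(MoveNext); // call MoveNext again when the task completes
                return;
            }
        }

        Console.WriteLine("Result: " + awaiter.GetResult());
        Finished.Set();
    }
}

static class Program
{
    static void Main()
    {
        var source = new TaskCompletionSource<int>();
        var machine = new ManualStateMachine(source.Task);

        machine.MoveNext(); // task is incomplete: registers the continuation and returns
        Console.WriteLine("Suspended, control is back in Main");

        source.SetResult(42);    // completion triggers the second MoveNext call
        machine.Finished.Wait(); // make sure the continuation has run before exiting
    }
}
```

MoveNext runs exactly twice here: once to register the continuation and once to pick up the result.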

07. New C# 6.0 Features

C# 6.0 was released just a few days ago. There aren’t many groundbreaking features, as the team at Microsoft stated. Mostly the new features are shorthand notations to make the code easier to read and write. I decided to have a look at the decompiled code of some of the new constructs.

Null-conditional operators

This is a great remedy to ease the never-ending null checks. For example in the code below

var people = new List<Person>();
var name = people.FirstOrDefault()?.FullName;

If the list is empty, name will be null. Without this operator the code would throw a NullReferenceException, so we would have to do the null checks on our own. The decompiled code for the above block is like this:

private static void Main(string[] args)
{
    string fullName;
    Program.Person person = (new List<Program.Person>()).FirstOrDefault<Program.Person>();
    if (person != null)
    {
        fullName = person.FullName;
    }
    else
    {
        fullName = null;
    }
}

As we can see, the generated code is graciously carrying out the null check for us!

String interpolation

This is one of my favourites: now we can simply put the parameters directly inside a string instead of using placeholders. Expanding on the example above, imagine the Person class is like this:

class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string FullName => $"{FirstName} {LastName}";
}

The decompiled code looks familiar though:

public string FullName
{
    get
    {
        return string.Format("{0} {1}", this.FirstName, this.LastName);
    }
}

This is exactly how I would do it if we didn’t have this new feature.

Auto-property initializers

We can set the default value for a property on the declaration line now, like this:

public string FirstName { get; set; } = "Volkan";

And the IL generated for the code is like this:

Decompiled code varies. JustDecompile just displays the exact same statement as above. ILSpy on the other hand is closer to the IL:

public Person()
{
	this.<FirstName>k__BackingField = "Volkan";
	this.<BirthDate>k__BackingField = new DateTime(1966, 6, 6);
	base..ctor();
}

So what it does is set the private backing field to the default value we assigned, and it does so in the constructor before the base constructor is called.

Expression bodied members

We can now provide the body of the function as an expression following the declaration like this:

public int GetAge() => (int)((DateTime.Now - BirthDate).TotalDays / 365);

The decompiled code is not very fancy:

public int GetAge()
{
    TimeSpan now = DateTime.Now - this.BirthDate;
    return (int)(now.TotalDays / 365);
}

It simply puts the expression back in the body.

Index initializers

It’s a shorthand for index initialization such as

Dictionary<int, string> dict = new Dictionary<int, string>
{
    [1] = "string1",
    [2] = "string2",
    [3] = "string3" 
};

And the decompiled code is no different from what we would write without the new syntax:

Dictionary<int, string> dictionary = new Dictionary<int, string>();
dictionary[1] = "string1";
dictionary[2] = "string2";
dictionary[3] = "string3";

As I mentioned there is nothing fancy about the new features in terms of the generated IL but some of them will help save a lot of time for sure.

08. Compiler code optimization

It is interesting to see how decompilers apply optimizations of their own. Before I went into this I thought they would just reverse engineer whatever is in the assembly, but it turns out that’s not the case.

First, let’s have a look at the C# compiler’s behaviour. In Visual Studio, the Optimize code flag can be found under Project Properties -> Build tab

In order to test code optimization I created a dummy method like this:

public class UnusedVariables
{
    public void VeryImportantMethod()
    {
        var testClass = new TestClass("some value");
        TestClass testClass2 = null;
        var x = "xyz";
        var n = 123;
    }
}

First I compiled it by default values:

Then disassembled the output:

I’m no expert in reading IL code but it’s obvious that the locals section contains 2 TestClass variables, a string and an integer.

When I compiled it with the optimization option (/o):

I got the following when disassembled:

There’s no trace of the string and the int locals. Also the TestClass variable with the null value is gone. But the TestClass instance is still created on the heap even though nothing references it, presumably because the compiler cannot prove that the constructor has no side effects. It’s interesting to see that difference.

When I decompiled both versions with ILSpy and JustDecompile unused local variables were removed all the time. Apparently they do their own optimization.

Conclusion

It was a fun exercise to investigate and see the actual generated code. It helped me better understand complex constructs like async/await, and it was also good to see there’s nothing to fear about the new features of C# 6.0!


RSS Feed Generation with C#

July 11, 2015 dev , aws csharp, s3, rss

Recently I got burned by another Windows update and my favorite podcatcher on the desktop, Miro, stopped functioning. I was already planning to develop something on my own so that I wouldn’t have to manually back up OPMLs (I’m sure there must be neat solutions already but again I couldn’t resist the temptation of DIY!). So I started exploring RSS feeds and the ways to produce and consume them. My first toy project is a feed generator.

Implementation

I developed a console application in C# that can be scheduled to generate the feed and upload it to AWS S3. Source code is here

Apparently the .NET Framework has had a System.ServiceModel.Syndication namespace since version 3.5 that contains all the tools needed to consume and create an RSS feed with a few lines of code. The core part of the application is the part that generates the actual feed:

public SyndicationFeed GetFeed(List<Article> articles)
{
    SyndicationFeed feed = new SyndicationFeed(_feedServiceSettings.Title, _feedServiceSettings.Description, new Uri(_feedServiceSettings.BaseUri));
    feed.Title = new TextSyndicationContent(_feedServiceSettings.Title);
    feed.Description = new TextSyndicationContent(_feedServiceSettings.Description);
    feed.BaseUri = new Uri(_feedServiceSettings.BaseUri);
    feed.Categories.Add(new SyndicationCategory(_feedServiceSettings.Category));

    var items = new List<SyndicationItem>();
    foreach (var article in articles
        .Where(a => a.ispublished)
        .Where(a => a.ispubliclyvisible)
        .OrderByDescending(a => a.publisheddate))
    {
        var item = new SyndicationItem(article.title, 
            article.bodysnippet,
            new Uri (string.Format(_feedServiceSettings.ArticleUrl, article.slug)),
            article.articleid.ToString(),
            article.publisheddate);
        
        item.Authors.Add(new SyndicationPerson("", article.authorname, string.Format(_feedServiceSettings.UserUrl, article.user.username)));
        items.Add(item);
    }
    
    feed.Items = items;
    return feed;
}

The feed itself is independent from the format (RSS or Atom). The classes that do the actual formatting are derived from the abstract SyndicationFeedFormatter class: Rss20FeedFormatter and Atom10FeedFormatter. The format is read from the config file so the application supports both formats.

public SyndicationFeedFormatter CreateFeedFormatter(SyndicationFeed feed)
{
    string feedFormat = _feedSettings.FeedFormat;
    switch (feedFormat.ToLower())
    {
        case "atom": return new Atom10FeedFormatter(feed);
        case "rss": return new Rss20FeedFormatter(feed);
        default: throw new ArgumentException("Unknown feed format");
    }
}

and in the publisher service it gets the output feed:

var memStream = new MemoryStream();
var settings = new XmlWriterSettings(){ Encoding = Encoding.UTF8 };
using (var writer = XmlWriter.Create(memStream, settings))
{
    feedFormatter.WriteTo(writer);
}
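Putting the pieces together, a stripped-down end-to-end run might look like the sketch below. The feed title, URLs and the single item are made up for illustration, and the project is assumed to reference the assembly (or package) that provides System.ServiceModel.Syndication. An encoding without a byte order mark is used so that the printed string starts directly with the XML declaration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.ServiceModel.Syndication;
using System.Text;
using System.Xml;

static class Program
{
    static void Main()
    {
        // Hypothetical feed data, only for illustration.
        var feed = new SyndicationFeed("Sample feed", "A hypothetical feed", new Uri("http://example.com/"));
        feed.Items = new List<SyndicationItem>
        {
            new SyndicationItem("First post", "Summary text",
                new Uri("http://example.com/first-post"), "1", DateTimeOffset.UtcNow)
        };

        SyndicationFeedFormatter formatter = new Rss20FeedFormatter(feed);
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false), Indent = true };

        using (var memStream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(memStream, settings))
            {
                formatter.WriteTo(writer);
            }

            // The writer is flushed by now, so the stream holds the complete document.
            Console.WriteLine(Encoding.UTF8.GetString(memStream.ToArray()));
        }
    }
}
```

Swapping Rss20FeedFormatter for Atom10FeedFormatter is the only change needed to emit Atom instead.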

I added the output of the API call as a JSON file under samples. I also implemented a fake API client called OfflineFeedClient. It reads the response from a file instead of actually making an API call. It comes in handy if you don’t have an Internet connection or a valid API key. To use it in offline mode you have to change the line that creates the client from this

var client = new FeedClient(configFactory.GetApiSettings());

to this

var client = new OfflineFeedClient(configFactory.GetOfflineClientSettings());

Lessons learned / Implementation Notes

Since the motivation behind the project was to learn more about manipulating RSS by myself, here are a few things that I’ve learned and used:

I created an IAM account that has write-only access to the bucket the feed will be stored in. It works with default permissions, which means private access. But since the RSS reader service will need access to the feed I had to upload the file with public access. Apparently changing the ACL requires a different permission, namely s3:PutObjectAcl. The weird thing is that just replacing s3:PutObject with s3:PutObjectAcl didn’t work either. Both had to be allowed. So after a few retries the final policy shaped up to be like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1418647210000",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::{BUCKET_NAME}/*"
            ]
        }
    ]
}

In this implementation I used Visual Studio’s neat feature Paste JSON As Classes.

First I captured the API response with Fiddler. Then created a blank .cs file and using this option created the class to deserialize the response. Using strongly typed objects can easily be a daunting task if you are wrapping a whole API so I’d prefer to use dynamic objects like this:

var response = client.Execute<dynamic>(request);
var latestArticles = ((IEnumerable) response.Data.payload.articles).Cast<dynamic>()
                        .OrderByDescending(a => a.publisheddate)
                        .Select(a => new 
                        { 
                            slug = a.slug,
                            id = a.articleid,
                            title = a.title,
                            publishedDate = a.publisheddate,
                            ispublished = a.ispublished,
                            isvisible = a.ispubliclyvisible
                        });

This works fine but the problem in this case was that some of the JSON property names contain characters such as hyphens and question marks, which are not valid in C# identifiers. I can get around it if I use strongly typed objects and specify the property name explicitly, such as:

[JsonProperty("is-published?")]
public bool ispublished { get; set; }

But I cannot do it in the dynamic version. I’ll put a pin into it and move on for now but have a feeling it will come back and haunt me in the future!
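One possible way around this on the dynamic side is to read such properties through a string indexer. The sketch uses Json.NET’s JObject rather than the REST client’s dynamic deserializer used above, so it is an alternative to explore and not what the project does. The JSON literal is made up; only the is-published? name comes from the attribute example.

```csharp
using System;
using Newtonsoft.Json.Linq;

static class Program
{
    static void Main()
    {
        // Hypothetical payload, only for illustration.
        string json = "{ \"is-published?\": true, \"title\": \"Hello\" }";

        JObject article = JObject.Parse(json);

        // The indexer takes the raw property name, so any character is allowed.
        bool isPublished = (bool)article["is-published?"];
        string title = (string)article["title"];

        Console.WriteLine(title + " published: " + isPublished);
    }
}
```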

The default output of the RSS feed passes the validation but gets 3 warnings. I’m sure they can be safely ignored but just out of curiosity I researched a little bit to see if I could pass with flying colors. Two of the three warnings were

line 1, column 39: Avoid Namespace Prefix: a10 <?xml version="1.0" encoding="utf-8"?><rss xmlns:a10="http://www.w3.org/200 …

line 12, column 5302: Missing atom:link with rel="self" … encompassing the Ch</description></item></channel></rss>

I found the solution on StackOverflow (not surprisingly!)

I made a few changes in the formatter factory

case "rss":
{
    var formatter = new Rss20FeedFormatter(feed);
    formatter.SerializeExtensionsAsAtom = false;
    XNamespace atom = "http://www.w3.org/2005/Atom";
    feed.AttributeExtensions.Add(new XmlQualifiedName("atom", XNamespace.Xmlns.NamespaceName), atom.NamespaceName);
    feed.ElementExtensions.Add(new XElement(atom + "link", new XAttribute("href", _feedSettings.FeedUrl), new XAttribute("rel", "self"), new XAttribute("type", "application/rss+xml")));
    return formatter;
}

I liked the fact that I only had to make changes in one place, so the factory could return a customized formatter instead of the default one and the rest of the application didn’t care at all. But unfortunately the fix required the publish URL of the feed. I got around it by adding it to FeedSettings in the config but now the S3 settings and Feed settings need to be changed at the same time.

My idea was to make it like a pipeline so that the feed generator didn’t have to care how and where it’s published but this fix contradicted that approach a little bit. Unfortunately it doesn’t look possible to use variables in the config files so that I could generate Feed.Url using the other settings.

The 3rd warning was encoding-related. If you don’t explicitly specify it, the API uses the ISO-8859-1 charset. I tried playing around with a few headers to get the response in UTF-8 but the solution came from a friend: the Accept-Charset header. So adding the header fixed that issue as well:

request.AddHeader("Accept-Charset", "UTF-8");

Conclusion

The generated Atom feed doesn’t pass the validation but I will handle it later on. Since Atom is a newer format I think I’ll go with that in the future but so far it’s good to know that it’s fairly easy to play with RSS/Atom feeds with C# so it was a fun experiment after all…


C# XML Serialization Tips

June 10, 2015 dev xml, csharp

Even though XML is a noisy data format it’s still commonly used, and I often end up in situations where I need to use .NET to serialize and deserialize XML documents. Here are a few problems that I had to tackle in the past. All the sample source code can be found in a GitHub repository.

01. Skip serializing unassigned values

Let’s assume we have a hypothetical Player class that has a few fields that looks like this:

public class Player
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int TotalGoalsScored { get; set; }
    public double AverageGoalsPerGame { get; set; }
    public Team Team { get; set; }
}

public class Team
{
    public string Name { get; set; }
    public int YearEstablished { get; set; }
}

So it contains both value-type and reference-type properties.

Now let’s create two random players and serialize them:

XmlSerializer serializer = new XmlSerializer(typeof(List<Player>));
Player player1 = new Player() { Id = 1, FirstName = "John", LastName = "Smith", TotalGoalsScored = 50, AverageGoalsPerGame = 0.7, Team = new Team() { Name = "Arsenal" } };
Player player2 = new Player() { Id = 2, FirstName = "Jack" };
using (StringWriter writer = new StringWriter())
{
    serializer.Serialize(writer, new List<Player>() { player1, player2 });
    Console.WriteLine(writer.ToString());
}

This code yields the following XML:

<?xml version="1.0" encoding="utf-16"?>
<ArrayOfPlayer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Player>
    <Id>1</Id>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <TotalGoalsScored>50</TotalGoalsScored>
    <AverageGoalsPerGame>0.7</AverageGoalsPerGame>
    <Team>
      <Name>Arsenal</Name>
      <YearEstablished>0</YearEstablished>
    </Team>
  </Player>
  <Player>
    <Id>2</Id>
    <FirstName>Jack</FirstName>
    <TotalGoalsScored>0</TotalGoalsScored>
    <AverageGoalsPerGame>0</AverageGoalsPerGame>
  </Player>
</ArrayOfPlayer>

The thing to note here is that reference types were not serialized when they were unassigned, i.e. the Team object and the LastName property of player2. But the same didn’t go for value types: the TotalGoalsScored, AverageGoalsPerGame and YearEstablished properties were serialized as 0. Of course this might be the desired outcome depending on your business requirements but in my case I didn’t want this because it might mislead the client consuming this data. At the very least I find it inconsistent as some unassigned values are serialized and some aren’t.

So to change the behaviour all we have to do is set the DefaultValue attribute (from the System.ComponentModel namespace) for the numeric values like this:

public class Player
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }

    [DefaultValue(0)]
    public int TotalGoalsScored { get; set; }

    [DefaultValue(0)]
    public double AverageGoalsPerGame { get; set; }
    
    public Team Team { get; set; }
}

public class Team
{
    public string Name { get; set; }

    [DefaultValue(0)]
    public int YearEstablished { get; set; }
}

With this change the output becomes:

<?xml version="1.0" encoding="utf-16"?>
<ArrayOfPlayer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Player>
    <Id>1</Id>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <TotalGoalsScored>50</TotalGoalsScored>
    <AverageGoalsPerGame>0.7</AverageGoalsPerGame>
    <Team>
      <Name>Arsenal</Name>
    </Team>
  </Player>
  <Player>
    <Id>2</Id>
    <FirstName>Jack</FirstName>
  </Player>
</ArrayOfPlayer>

Since the int and double values default to 0 and we explicitly declared 0 as the default value, they won’t be serialized unless they are assigned a value other than zero.

In case you are wondering, making the int and double nullable doesn’t produce the same result. In that case they are serialized as nil elements:

<TotalGoalsScored xsi:nil="true" />
<AverageGoalsPerGame xsi:nil="true" />

I think this if-you-set-it-you-get-it-back approach is consistent and makes the most sense to me. I created a test project to fiddle with these classes. It has all 3 versions of the classes under different names and a console application displaying the outputs. If you want to play around you can get the source code here

02. Deserialize straight to list

Sometimes what you get in an XML document is just a list of items, and the list is just a container so that the XML is well-formed. In the following example we have a list of players:

<PlayerList>
  <Player>
    <Id>1</Id>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <TotalGoalsScored>50</TotalGoalsScored>
    <AverageGoalsPerGame>0.7</AverageGoalsPerGame>
    <Team>
      <Name>Arsenal</Name>
      <YearEstablished>0</YearEstablished>
    </Team>
  </Player>
  <Player>
    <Id>2</Id>
    <FirstName>Jack</FirstName>
    <TotalGoalsScored>0</TotalGoalsScored>
    <AverageGoalsPerGame>0</AverageGoalsPerGame>
  </Player>
</PlayerList>

The sole purpose of PlayerList tag is to act as root and contain multiple objects. Other than that it has no function. When we deserialize this to C# objects we would normally need 2 objects like this:

public class Player
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int TotalGoalsScored { get; set; }
    public double AverageGoalsPerGame { get; set; }
}

and

[XmlRoot]
public class PlayerList
{
    [XmlElement("Player")]
    public List<Player> Players { get; set; }
}

and we can get the list of objects by:

string inputXmlPath1 = @".\InputXml.xml";
using (StreamReader reader = new StreamReader(inputXmlPath1))
{
    XmlSerializer playerListSerializer = new XmlSerializer(typeof(PlayerList));
    PlayerList playerList = (PlayerList)playerListSerializer.Deserialize(reader);
}

In such cases I generally tend to eliminate the “middle man”. I don’t want a container class which only holds a List. So I’d like to deserialize this XML directly into a List.

What I want to do is actually this:

using (StreamReader reader = new StreamReader(inputXmlPath1))
{
    XmlSerializer playerListSerializer = new XmlSerializer(typeof(List<FinalPlayer>));
    List<FinalPlayer> playerList = (List<FinalPlayer>)playerListSerializer.Deserialize(reader);
}

But without any modifications to our classes it throws an exception:

By eliminating PlayerList class we actually stopped providing XmlRoot info to the serializer. But that can quickly be remedied by using a constructor overload of XmlSerializer:

using (StreamReader reader = new StreamReader(inputXmlPath1))
{
    XmlSerializer playerListSerializer = new XmlSerializer(typeof(List<FinalPlayer>), new XmlRootAttribute("PlayerList"));
    List<FinalPlayer> playerList = (List<FinalPlayer>)playerListSerializer.Deserialize(reader);
}     

This works when the class name matches the XmlElement name. If you need to customize your class’s name (like FinalPlayer in my example) you need to decorate the class with XmlType and supply the element name so that the serializer can do the mapping.

[XmlType("Player")]
public class FinalPlayer
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int TotalGoalsScored { get; set; }
    public double AverageGoalsPerGame { get; set; }
}

So now we can have any class name mapping to corresponding elements and deserialized straight to a List.
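For reference, here is a self-contained sketch of the whole approach. It reads the XML from an inline string instead of a file, and the class is trimmed to two properties to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

[XmlType("Player")]
public class FinalPlayer
{
    public int Id { get; set; }
    public string FirstName { get; set; }
}

public static class Program
{
    public static void Main()
    {
        const string xml =
            "<PlayerList>" +
            "<Player><Id>1</Id><FirstName>John</FirstName></Player>" +
            "<Player><Id>2</Id><FirstName>Jack</FirstName></Player>" +
            "</PlayerList>";

        // The root attribute tells the serializer which element wraps the list.
        var serializer = new XmlSerializer(typeof(List<FinalPlayer>), new XmlRootAttribute("PlayerList"));

        using (var reader = new StringReader(xml))
        {
            var players = (List<FinalPlayer>)serializer.Deserialize(reader);
            Console.WriteLine(players.Count);        // 2
            Console.WriteLine(players[1].FirstName); // Jack
        }
    }
}
```

One design note: as far as I know, serializers created through this constructor overload are not cached by the framework, so it’s worth keeping the instance in a static field instead of creating a new one for every call.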

03. Remove namespaces

I know using namespaces is considered a good practice as it helps avoid name conflicts but honestly I never suffered from such a problem before so I don’t mind removing them from my XML documents to clean up the clutter. XML is already a noisy data format, so there is no need to bloat it any further. I think namespaces might help when you are working with data from different sources but if you are only working with your own classes and data structures, name conflicts are generally not something to worry about (assuming you name your objects properly).

So say you have a simple Player class and you serialize it with an out-of-the-box XmlSerializer:

Player player = new Player() { Id = 102, FirstName = "Danny", LastName = "TopScorer", AverageGoalsPerGame = 3.5, TotalGoalsScored = 150 };
XmlSerializer serializer = new XmlSerializer(typeof(Player));
XmlWriterSettings settings = new XmlWriterSettings() { OmitXmlDeclaration = true, Indent = true, Encoding = Encoding.UTF8 };
StringBuilder output = new StringBuilder();
XmlWriter writer = XmlWriter.Create(output, settings);
serializer.Serialize(writer, player);
Console.WriteLine(output.ToString());

This yields the following output:

<Player xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Id>102</Id>
  <FirstName>Danny</FirstName>
  <LastName>TopScorer</LastName>
  <TotalGoalsScored>150</TotalGoalsScored>
  <AverageGoalsPerGame>3.5</AverageGoalsPerGame>
</Player>

So in order to get rid of the namespaces we have to specify our own namespaces, which are empty in this case, and use the Serialize overload that accepts XmlSerializerNamespaces to pass them in:

XmlSerializerNamespaces xns = new XmlSerializerNamespaces();
xns.Add(string.Empty, string.Empty);
serializer.Serialize(writer, player, xns);

to get the XML without any namespaces:

<Player>
  <Id>102</Id>
  <FirstName>Danny</FirstName>
  <LastName>TopScorer</LastName>
  <TotalGoalsScored>150</TotalGoalsScored>
  <AverageGoalsPerGame>3.5</AverageGoalsPerGame>
</Player>

04. Change output encoding

When you serialize to a string with the default options and include the XML declaration, the declared encoding is UTF-16. Oddly enough, there is no straightforward option to change it: as the example below shows, setting Encoding on XmlWriterSettings has no effect when the writer targets a StringBuilder. Sometimes you do need to change it, though. For example, a 3rd party was expecting UTF-8 in my case, so the default value didn’t cut it for me.

So using the same Player class from the last example, the following code produces output with UTF-16:

Player player = new Player() { Id = 102, FirstName = "Danny", LastName = "TopScorer", AverageGoalsPerGame = 3.5, TotalGoalsScored = 150 };
XmlSerializer serializer = new XmlSerializer(typeof(Player));
XmlWriterSettings settings = new XmlWriterSettings() { OmitXmlDeclaration = false, Indent = true, Encoding = Encoding.UTF8 };
StringBuilder output = new StringBuilder();
XmlWriter writer = XmlWriter.Create(output, settings);
XmlSerializerNamespaces xns = new XmlSerializerNamespaces();
xns.Add(string.Empty, string.Empty);
serializer.Serialize(writer, player, xns);
Console.WriteLine(output.ToString());

Output:

<?xml version="1.0" encoding="utf-16"?>
<Player>
  <Id>102</Id>
  <FirstName>Danny</FirstName>
  <LastName>TopScorer</LastName>
  <TotalGoalsScored>150</TotalGoalsScored>
  <AverageGoalsPerGame>3.5</AverageGoalsPerGame>
</Player>

I found the solution here on StackOverflow.

So the solution is to extend StringWriter and add a new constructor that accepts an Encoding. StringWriter already has an Encoding property, but unfortunately it doesn’t have a public setter, so we need a subclass to fiddle with it.

StringWriterWithEncoding simply overrides the Encoding property:

public sealed class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding encoding;

    public StringWriterWithEncoding(Encoding encoding)
    {
        this.encoding = encoding;
    }

    public override Encoding Encoding
    {
        get { return encoding; }
    }
}

By using the new class the following code produces the desired outcome:

StringWriterWithEncoding utf8StringWriter = new StringWriterWithEncoding(Encoding.UTF8);
Player player = new Player() { Id = 102, FirstName = "Danny", LastName = "TopScorer", AverageGoalsPerGame = 3.5, TotalGoalsScored = 150 };
XmlSerializer serializer = new XmlSerializer(typeof(Player));
XmlWriterSettings settings = new XmlWriterSettings() { OmitXmlDeclaration = false, Indent = true, Encoding = Encoding.UTF8 };
XmlWriter writer = XmlWriter.Create(utf8StringWriter, settings);
XmlSerializerNamespaces xns = new XmlSerializerNamespaces();
xns.Add(string.Empty, string.Empty);
serializer.Serialize(writer, player, xns);
Console.WriteLine(utf8StringWriter.ToString());
Console.ReadLine();

Output:

<?xml version="1.0" encoding="utf-8"?>
<Player>
  <Id>102</Id>
  <FirstName>Danny</FirstName>
  <LastName>TopScorer</LastName>
  <TotalGoalsScored>150</TotalGoalsScored>
  <AverageGoalsPerGame>3.5</AverageGoalsPerGame>
</Player>
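
One thing to keep in mind is that the approach above only changes the encoding named in the declaration; the string itself is still an ordinary in-memory .NET string. If what the 3rd party needs is actual UTF-8 bytes, a different technique is to serialize to a MemoryStream, where XmlWriterSettings.Encoding is taken into account. The following is a minimal sketch of that alternative, not the solution from the StackOverflow answer. The Utf8XmlDemo class, the SerializeToUtf8Bytes method and the minimal Player class are hypothetical names used only for illustration:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the Player class used throughout this post.
public class Player
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int TotalGoalsScored { get; set; }
    public double AverageGoalsPerGame { get; set; }
}

public static class Utf8XmlDemo
{
    // Hypothetical helper: returns the serialized player as UTF-8 encoded bytes.
    public static byte[] SerializeToUtf8Bytes(Player player)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Player));
        XmlWriterSettings settings = new XmlWriterSettings()
        {
            OmitXmlDeclaration = false,
            Indent = true,
            // UTF8Encoding(false) leaves out the byte order mark.
            Encoding = new UTF8Encoding(false)
        };
        XmlSerializerNamespaces xns = new XmlSerializerNamespaces();
        xns.Add(string.Empty, string.Empty);

        using (MemoryStream stream = new MemoryStream())
        {
            // The writer targets a Stream here, so settings.Encoding decides
            // both the declaration and the bytes that are written.
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                serializer.Serialize(writer, player, xns);
            }
            return stream.ToArray();
        }
    }
}
```

The returned bytes can be written to a file or sent over the wire as they are. To inspect them on the console, decode them first with Encoding.UTF8.GetString(bytes).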

05. Fail deserialization when unexpected elements are encountered

By default XmlSerializer is very fault tolerant. It just salvages whatever it can and ignores the values it cannot match. Sometimes you may need to be stricter. For example, I had a case where the external source returned completely different XML when there was an error on its end. When that happened, I wanted to be notified about it instead of quietly getting null objects.

For example, assume we have the usual PlayerList that we are deserializing to List<Player>. If for some reason we get a weird Player list like this:

<PlayerList>
  <Customer>
    <Id>1</Id>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <TotalGoalsScored>50</TotalGoalsScored>
    <AverageGoalsPerGame>0.7</AverageGoalsPerGame>
  </Customer>
</PlayerList>

When we deserialize it with the following code block:

try
{
    using (MemoryStream memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(rawXml)))
    {
        XmlSerializer serializer = new XmlSerializer(typeof(List<Player>), new XmlRootAttribute("PlayerList"));
        var result = (List<Player>)serializer.Deserialize(memoryStream);
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
    if (ex.InnerException != null)
    {
        Console.WriteLine(ex.InnerException.Message);    
    }
}

XmlSerializer doesn’t complain at all. Instead it just returns an empty list because it cannot find any Player objects. In order to change this behaviour we can use the events it exposes, such as UnknownNode, UnknownElement and UnknownAttribute. UnknownNode fires for any unknown node, so it covers both of the other two events. In my case I didn’t want to be too strict: I didn’t want an exception in case of an unexpected attribute, so I only hooked into the UnknownElement event:

try
{
    using (MemoryStream memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(rawXml)))
    {
        XmlSerializer serializer = new XmlSerializer(typeof(List<Player>), new XmlRootAttribute("PlayerList"));
        serializer.UnknownElement += serializer_UnknownElement;
        var result = (List<Player>)serializer.Deserialize(memoryStream);
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
    if (ex.InnerException != null)
    {
        Console.WriteLine(ex.InnerException.Message);
    }
}

and added the event handler:

void serializer_UnknownElement(object sender, XmlElementEventArgs e)
{
    throw new ArgumentException(string.Format("Unknown element: {0}", e.Element.LocalName));
}

So now at least I can distinguish a weird list from a really empty list.
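
If the same strictness is needed in more than one place, the pattern above can be wrapped in a small generic helper. The following is only a sketch under a few assumptions: StrictXml and DeserializeStrict are hypothetical names, the Player class is a minimal stand-in, and the helper reads from a StringReader instead of the MemoryStream used above. It also adds the line number from the event arguments to the message:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the Player class used throughout this post.
public class Player
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int TotalGoalsScored { get; set; }
    public double AverageGoalsPerGame { get; set; }
}

public static class StrictXml
{
    // Hypothetical helper: deserializes rawXml into T and throws as soon as
    // the serializer reports an element it does not expect.
    public static T DeserializeStrict<T>(string rawXml, string rootName)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T), new XmlRootAttribute(rootName));
        serializer.UnknownElement += (sender, e) =>
        {
            throw new ArgumentException(
                string.Format("Unknown element: {0} (line {1})", e.Element.LocalName, e.LineNumber));
        };

        using (StringReader reader = new StringReader(rawXml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

A call such as StrictXml.DeserializeStrict<List<Player>>(rawXml, "PlayerList") would then replace the body of the try block, and the catch block that prints the exception and its InnerException stays the same.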
