C# XML Serialization Tips

xml, programming, csharp comments edit

Even though it’s a noisy data format it’s still commonly used and I happen to end up in situations that I need to use .NET to serialize and deserialize to and from XML documents. Here are a few problems that I had to tackle in the past. All the sample source code can be found in a GitHub repository.

01. Skip serializing unassigned values

Let’s assume we have a hypothetical Player class that has a few fields that looks like this:

public class Player
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int TotalGoalsScored { get; set; }
    public double AverageGoalsPerGame { get; set; }
    public Team Team { get; set; }
}

public class Team
{
    public string Name { get; set; }
    public int YearEstablished { get; set; }
}

So it contains both value and reference fields.

Now let’s create two random players and serialize them:

XmlSerializer serializer = new XmlSerializer(typeof(List<Player>));
Player player1 = new Player() { Id = 1, FirstName = "John", LastName = "Smith", TotalGoalsScored = 50, AverageGoalsPerGame = 0.7, Team = new Team() { Name = "Arsenal" } };
Player player2 = new Player() { Id = 2, FirstName = "Jack" };
using (StringWriter writer = new StringWriter())
{
    serializer.Serialize(writer, new List<Player>() { player1, player2 });
    Console.WriteLine(writer.ToString());
}

This code yields the following XML:

<?xml version="1.0" encoding="utf-16"?>
<ArrayOfPlayer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="
http://www.w3.org/2001/XMLSchema">
  <Player>
    <Id>1</Id>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <TotalGoalsScored>50</TotalGoalsScored>
    <AverageGoalsPerGame>0.7</AverageGoalsPerGame>
    <Team>
      <Name>Arsenal</Name>
      <YearEstablished>0</YearEstablished>
    </Team>
  </Player>
  <Player>
    <Id>2</Id>
    <FirstName>Jack</FirstName>
    <TotalGoalsScored>0</TotalGoalsScored>
    <AverageGoalsPerGame>0</AverageGoalsPerGame>
  </Player>
</ArrayOfPlayer>

The thing to note here is that reference types were not serialized when they were unassigned, i.e Team object and LastName field in Player2. But same didn’t go for values fields. TotalScore, AverageScorePerGame and YearEstablished fields were serialized as 0. Of course this might be the desired outcome depending on your business requirements but in my case I didn’t want this because it might mislead the client consuming this data. At the very least I find it inconsistent as some unassigned values are serialized and some aren’t.

So to change the behaviour all we have to do is set the DefaultValue attribute for the numeric values like this:

public class Player
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }

    [DefaultValue(0)]
    public int TotalGoalsScored { get; set; }

    [DefaultValue(0)]
    public double AverageGoalsPerGame { get; set; }
    
    public Team Team { get; set; }
}

public class Team
{
    public string Name { get; set; }

    [DefaultValue(0)]
    public int YearEstablished { get; set; }
}

With this change the output becomes:

<?xml version="1.0" encoding="utf-16"?>
<ArrayOfPlayer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="
http://www.w3.org/2001/XMLSchema">
  <Player>
    <Id>1</Id>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <TotalGoalsScored>50</TotalGoalsScored>
    <AverageGoalsPerGame>0.7</AverageGoalsPerGame>
    <Team>
      <Name>Arsenal</Name>
    </Team>
  </Player>
  <Player>
    <Id>2</Id>
    <FirstName>Jack</FirstName>
  </Player>
</ArrayOfPlayer>

So as the int and double values defaulted to 0 and we explicitly set the default value they won’t be serialized unless they are assigned a value other than zero.

In case you are wondering making the int and double nullable doesn’t produce the same result. In that case they are serialized with null values:

<TotalScore xsi:nil="true" />
<AverageScorePerGame xsi:nil="true" />

I think this if-you-set-it-you-get-it-back approach is consistent and makes most sense to me. I created a test project to fiddle with these classes. It has all 3 versions of the classes under different names and a console application displaying the outputs. If you want to play around you can get the source code here

02. Deserialize straight to list

Sometimes what you get in an XML document is just a list of items and the list is just a container so that the XML is well-formed. For example in the following example we have a list of players:

<PlayerList>
  <Player>
    <Id>1</Id>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <TotalGoalsScored>50</TotalGoalsScored>
    <AverageGoalsPerGame>0.7</AverageGoalsPerGame>
    <Team>
      <Name>Arsenal</Name>
      <YearEstablished>0</YearEstablished>
    </Team>
  </Player>
  <Player>
    <Id>2</Id>
    <FirstName>Jack</FirstName>
    <TotalGoalsScored>0</TotalGoalsScored>
    <AverageGoalsPerGame>0</AverageGoalsPerGame>
  </Player>
</PlayerList>

The sole purpose of PlayerList tag is to act as root and contain multiple objects. Other than that it has no function. When we deserialize this to C# objects we would normally need 2 objects like this:

public class Player
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int TotalGoalsScored { get; set; }
    public double AverageGoalsPerGame { get; set; }
}

and

[XmlRoot]
public class PlayerList
{
    [XmlElement("Player")]
    public List<Player> Players { get; set; }
}

and we can get the list of objects by:

string inputXmlPath1 = @".\InputXml.xml";
using (StreamReader reader = new StreamReader(inputXmlPath1))
{
    XmlSerializer playerListSerializer = new XmlSerializer(typeof(PlayerList));
    PlayerList playerList = (PlayerList)playerListSerializer.Deserialize(reader);
}

In such cases I generally tend to eliminate the “middle man”. I don’t want a container class which only holds a List. So I’d like to deserialize this XML directly into List.

What I want to do is actually this:

using (StreamReader reader = new StreamReader(inputXmlPath1))
{
    XmlSerializer playerListSerializer = new XmlSerializer(typeof(List<FinalPlayer>));
    List<FinalPlayer> playerList = (List<FinalPlayer>)playerListSerializer.Deserialize(reader);
}

But without any modifications to our classes it throws an exception:

By eliminating PlayerList class we actually stopped providing XmlRoot info to the serializer. But that can quickly be remedied by using a constructor overload of XmlSerializer:

using (StreamReader reader = new StreamReader(inputXmlPath1))
{
    XmlSerializer playerListSerializer = new XmlSerializer(typeof(List<FinalPlayer>), new XmlRootAttribute("PlayerList"));
    List<FinalPlayer> playerList = (List<FinalPlayer>)playerListSerializer.Deserialize(reader);
}     

This works when the class name matches the XmlElement name. If you need to customize your class’s name (like FinalPlayer in my example) you need to decorate the class with XmlType and supply the element name so that the serializer can do the mapping.

[XmlType("Player")]
public class FinalPlayer
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int TotalGoalsScored { get; set; }
    public double AverageGoalsPerGame { get; set; }
}

So now we can have any class name mapping to corresponding elements and deserialized straight to a List.

03. Remove namespaces

I know using namespaces is considered a good practice as it helps avoid name conflicts but honestly I never suffered from such a problem before so I don’t mind removing them from my XML documents and clean the clutter. XML is already a noisy data format no need to bloat it any further. I think it might help when you are working with data from different sources but if you are only working with your own classes and data structures name conflict is generally not something to worry about (assuming you name your objects properly)

So say you have a simple Player class and you serialize it with a out-of-the-box XmlSerializer:

Player player = new Player() { Id = 102, FirstName = "Danny", LastName = "TopScorer", AverageGoalsPerGame = 3.5, TotalGoalsScored = 150 };
XmlSerializer serializer = new XmlSerializer(typeof(Player));
XmlWriterSettings settings = new XmlWriterSettings() { OmitXmlDeclaration = true, Indent = true, Encoding = Encoding.UTF8 };
StringBuilder output = new StringBuilder();
XmlWriter writer = XmlWriter.Create(output, settings);
serializer.Serialize(writer, player);
Console.WriteLine(output.ToString());

This yields the following output:

<Player xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://
www.w3.org/2001/XMLSchema">
  <Id>102</Id>
  <FirstName>Danny</FirstName>
  <LastName>TopScorer</LastName>
  <TotalGoalsScored>150</TotalGoalsScored>
  <AverageGoalsPerGame>3.5</AverageGoalsPerGame>
</Player>

So in order to get rid of namespaces we have to specify our custom namespaces, which is empty in this case and use the overloaded XmlSerializer constructor to pass it in:

XmlSerializerNamespaces xns = new XmlSerializerNamespaces();
xns.Add(string.Empty, string.Empty);
serializer.Serialize(writer, player, xns);

to get the XML without any namespaces:

<Player>
  <Id>102</Id>
  <FirstName>Danny</FirstName>
  <LastName>TopScorer</LastName>
  <TotalGoalsScored>150</TotalGoalsScored>
  <AverageGoalsPerGame>3.5</AverageGoalsPerGame>
</Player>

04. Change output encoding

When you serialize with the default options and with the XML declaration the encoding is UTF-16. Oddly enough there is no option to specify the output encoding. In order to achieve that and sometimes you may need to change it. For example a 3rd party was expecting UTF-8 in my case so the default value didn’t cut it for me.

So using the same Player class from the last example the following code produces an output with UTF-16

Player player = new Player() { Id = 102, FirstName = "Danny", LastName = "TopScorer", AverageGoalsPerGame = 3.5, TotalGoalsScored = 150 };
XmlSerializer serializer = new XmlSerializer(typeof(Player));
XmlWriterSettings settings = new XmlWriterSettings() { OmitXmlDeclaration = false, Indent = true, Encoding = Encoding.UTF8 };
StringBuilder output = new StringBuilder();
XmlWriter writer = XmlWriter.Create(output, settings);
XmlSerializerNamespaces xns = new XmlSerializerNamespaces();
xns.Add(string.Empty, string.Empty);
serializer.Serialize(writer, player, xns);
Console.WriteLine(output.ToString());

Output:

<?xml version="1.0" encoding="utf-16"?>
<Player>
  <Id>102</Id>
  <FirstName>Danny</FirstName>
  <LastName>TopScorer</LastName>
  <TotalGoalsScored>150</TotalGoalsScored>
  <AverageGoalsPerGame>3.5</AverageGoalsPerGame>
</Player>

I found the solution here on StackOverflow.

So the solution is to extend from StringWriter and add a new constructor that accepts Encoding. StringWriter already has an Encoding property as shown below, but unfortunately it doesn’t have a public setter so we need a subclass to fiddle with it.

StringWriterWithEncoding simply overrides th encoding field:

public sealed class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding encoding;

    public StringWriterWithEncoding(Encoding encoding)
    {
        this.encoding = encoding;
    }

    public override Encoding Encoding
    {
        get { return encoding; }
    }
}

By using the new class the following code produces the desired outcome:

StringWriterWithEncoding utf8StringWriter = new StringWriterWithEncoding(Encoding.UTF8);
Player player = new Player() { Id = 102, FirstName = "Danny", LastName = "TopScorer", AverageGoalsPerGame = 3.5, TotalGoalsScored = 150 };
XmlSerializer serializer = new XmlSerializer(typeof(Player));
XmlWriterSettings settings = new XmlWriterSettings() { OmitXmlDeclaration = false, Indent = true, Encoding = Encoding.UTF8 };
XmlWriter writer = XmlWriter.Create(utf8StringWriter, settings);
XmlSerializerNamespaces xns = new XmlSerializerNamespaces();
xns.Add(string.Empty, string.Empty);
serializer.Serialize(writer, player, xns);
Console.WriteLine(utf8StringWriter.ToString());
Console.ReadLine();

Output:

<?xml version="1.0" encoding="utf-8"?>
<Player>
  <Id>102</Id>
  <FirstName>Danny</FirstName>
  <LastName>TopScorer</LastName>
  <TotalGoalsScored>150</TotalGoalsScored>
  <AverageGoalsPerGame>3.5</AverageGoalsPerGame>
</Player>

05. Fail deserialization when unexpected elements are encountered

By default XMLSerializer is very fault tolerant. It just salvages whatever it can and leaves alone the unmatching values. Sometimes you may need to be stricter. For example I had a case when the external source returned a whole different XML when there was an error on its end. So when that happened I wanted to be notified about it instead of getting null objects quietly.

For example, assume we have the usual PlayerList that we are deserializing to List. If for some reason we get a weird Player list like this:

<PlayerList>
  <Customer>
    <Id>1</Id>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <TotalGoalsScored>50</TotalGoalsScored>
    <AverageGoalsPerGame>0.7</AverageGoalsPerGame>
  </Customer>
</PlayerList>

When we deserialize it with the following code block

try
{
    using (MemoryStream memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(rawXml)))
    {
        XmlSerializer serializer = new XmlSerializer(typeof(List<Player>), new XmlRootAttribute("PlayerList"));
        var result = (List<Player>)serializer.Deserialize(memoryStream);
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
    if (ex.InnerException != null)
    {
        Console.WriteLine(ex.InnerException.Message);    
    }
}

XmlSerializer doesn’t complain at all. Instead it just returns an empty list because it cannot find any Player objects. In order to change this behaviour we can use the events it exposes such as UnknownNode, UnknownElement, UnknownAttribute. UnknownNode is just the combination of the first two events. In my case I didn’t want to be too strict so I didn’t want an exception in case of a missing attribute but hooked into the UnknownElement event:

try
{
    using (MemoryStream memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(rawXml)))
    {
        XmlSerializer serializer = new XmlSerializer(typeof(List<Player>), new XmlRootAttribute("PlayerList"));
        serializer.UnknownElement += serializer_UnknownElement;
        var result = (List<Player>)serializer.Deserialize(memoryStream);
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
    if (ex.InnerException != null)
    {
        Console.WriteLine(ex.InnerException.Message);
    }
}

and added the event handler:

void serializer_UnknownElement(object sender, XmlElementEventArgs e)
{
    throw new ArgumentException(string.Format("Unknown element: {0}", e.Element.LocalName));
}

So now at least I can distinguish a weird list from a really empty list.

Resources

Comments