Verifying Data Integrity with AWS S3

devaws s3, csharp comments

When it comes to transferring files over network, there’s always a risk of ending up with corrupted files. To prevent this on transfers to and from S3, AWS provides us with some tools we can leverage to guarantee correctness of the files.

Verifying files while uploading

In order to verify the file is uploaded successfully, we need to provide AWS the MD5 hash value of our file. Once upload has been completed, AWS calculates the MD5 hash on their end and compares the both values. If they match, it means it went through successfully. So our request looks like this:

var request = new PutObjectRequest
{
    MD5Digest = md5,
    BucketName = bucketName,
    Key =  key,
    FilePath = inputPath,
};

where we calculate MD5 hash value like this:

using (var stream = new FileStream(fullPath, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    using (var md5 = MD5.Create())
    {
        var hash = md5.ComputeHash(stream);
        return Convert.ToBase64String(hash);
    }
}

In my tests, it looks like if you don’t provide a valid MD5 hash, you get a WinHttpException with the inner exception message “The connection with the server was terminated abnormally”

If you provide a valid but incorrect MD5, the exception thrown is of type AmazonS3Exception with the message “The Content-MD5 you specified did not match what we received”.

Amazon SDK comes with 2 utility methods named GenerateChecksumForContent and GenerateChecksumForStream. At the time of this writing, GenerateChecksumForStream wasn’t available in the AWS SDK for .NET Core. So the only method worked for me to calculate the hash was the way as shown above.

Verifying files while downloading

When downloading we use EtagToMatch property of GetObjectRequest to have the verification:

var request = new GetObjectRequest
{
	BucketName = bucketName,
    Key =  key,
    EtagToMatch = "\"278D8FD9F7516B4CA5D7D291DB04FB20\"".ToLower() // Case-sensitive
};

using (var response = await _s3Client.GetObjectAsync(request))
{
    await response.WriteResponseStreamToFileAsync(outputPath, false, CancellationToken.None);
}

When we request the object this way and if the the MD5 hash we send doesn’t match the one on the server we get an exception with the following message: “At least one of the pre-conditions you specified did not hold”

Once important point to keep in mind is that AWS keeps the hashes in lowerc-ase and the comparison is case-sensitive so make sure to convert everything to lower-case before you send it out.

Resources

Comments