MongoDB is a NoSQL database that organizes data as JSON documents in collections instead of as rows in tables. One major difference between a document in a collection and a row in a table is that while all rows in a table share the same schema, each document in a collection can be completely different. For all the advantages this approach brings, there is friction when strongly typed domain objects are deserialized from a collection: a document needs to match the shape of the object that it is deserialized into.
The “official” MongoDB driver for .NET (the one offered and supported by 10gen, the company behind MongoDB) offers a serializer that can be told quite exactly how tolerant it needs to be when deserializing JSON documents into objects:
- When a JSON document defines extra elements that are not present in the class that it is deserialized into, the serializer can either ignore them, stash them into a “catch-all” property, or throw an exception.
- When properties on the class are missing in the JSON document, the serializer can set the property to null, it can set a default value, or it can throw an exception.
These options are well described in the official documentation.
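As a rough sketch of what these options can look like in code, the driver's mapping attributes from `MongoDB.Bson.Serialization.Attributes` cover both cases. The classes `LenientProduct` and `CatchAllProduct` and their properties are hypothetical names made up for this illustration, and the snippet assumes the MongoDB.Bson assembly is referenced:

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Bson.Serialization;
using MongoDB.Bson.Serialization.Attributes;

// Hypothetical class: extra elements in the document are silently ignored.
[BsonIgnoreExtraElements]
public class LenientProduct
{
    public string Name { get; set; }

    // Used when the document has no "Currency" element.
    [BsonDefaultValue("EUR")]
    public string Currency { get; set; }
}

// Hypothetical class: extra elements are kept, a missing name is an error.
public class CatchAllProduct
{
    // Deserialization throws when the document has no "Name" element.
    [BsonRequired]
    public string Name { get; set; }

    // Elements without a matching property are stashed here.
    [BsonExtraElements]
    public BsonDocument ExtraElements { get; set; }
}

public static class Demo
{
    public static void Main()
    {
        // "Colour" has no matching property on either class.
        var doc = new BsonDocument { { "Name", "Kettle" }, { "Colour", "red" } };

        var lenient = BsonSerializer.Deserialize<LenientProduct>(doc);
        Console.WriteLine(lenient.Currency);

        var catchAll = BsonSerializer.Deserialize<CatchAllProduct>(doc);
        Console.WriteLine(catchAll.ExtraElements["Colour"]);
    }
}
```

Without any of these attributes, an element that has no matching property makes the deserialization fail, which is the strict end of the scale.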
Handling deserialization errors in collections
The Mongo C# driver supports operations to retrieve single elements from the database as well as collections. While, as we’ve seen, there is a lot of flexibility in defining a level of tolerance that matches your application’s requirements for the deserialization of single objects, the policy for collections is very simple: if any element fails, the whole collection is not loaded. This is the right behaviour for scenarios that rely on data integrity, but applications that focus on availability are threatened by a complete loss of functionality when just a single document in a collection does not match the deserialization requirements.
- A scenario that relies on data integrity is a form in a line-of-business application that edits a list of closely related data, like the line items of an invoice. The line items all need to provide the same set of information, otherwise the invoice cannot be calculated reliably.
- A scenario that focuses on availability is a search in the product catalogue of an online shop. If a single product is not correctly formatted in the database, it’s better to just not show that single product than to let the whole search page become unavailable and basically close the shop until the ill-formatted product is identified and corrected.
In the case of the search page, a better behaviour would be to exclude any document that cannot be deserialized, and to provide a callback that identifies the problematic documents. The approach I’ve taken is to copy the default implementation of the collection serializer, and augment the bit that iterates over the elements of the collection with an event handler that keeps the iteration running:
```csharp
while (bsonReader.ReadBsonType() != BsonType.EndOfDocument)
{
    // Resolve the concrete type of the next element and its serializer
    var elementType = discriminatorConvention.GetActualType(bsonReader, typeof(T));
    var serializer = BsonSerializer.LookupSerializer(elementType);
    T element;
    try
    {
        element = (T)serializer.Deserialize(bsonReader, typeof(T), elementType, null);
        list.Add(element);
    }
    catch (FileFormatException exc)
    {
        // Pass the exception to the provided callback
        if (HandleDeserializationError != null)
        {
            HandleDeserializationError(exc);
        }

        // Move the cursor to the next element after the faulted one
        while (bsonReader.State != BsonReaderState.EndOfDocument)
        {
            if (bsonReader.State == BsonReaderState.Value) bsonReader.SkipValue();
            if (bsonReader.State == BsonReaderState.Type) bsonReader.ReadBsonType();
        }
        bsonReader.ReadEndDocument();
    }
}
bsonReader.ReadEndArray();
```
You can download the full source from Bitbucket. Please note that it’s not production quality – it has not been used in production, but that could change over the next few weeks, and then it’s going to get an update with the real thing.
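If copying the serializer feels too heavy, the same skip-and-report policy can be approximated one level higher for top-level query results: fetch the raw documents as `BsonDocument` and deserialize them one at a time. This is only a minimal sketch of that different, simpler technique, not the code from the repository. `TolerantLoader`, `DeserializeAll`, `onError` and `Product` are hypothetical names, and the catch clause is deliberately broad because I'm assuming the concrete exception type may differ between driver versions:

```csharp
using System;
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Bson.Serialization;

// Hypothetical domain class: no attribute, so extra elements are an error.
public class Product
{
    public string Name { get; set; }
}

public static class TolerantLoader
{
    // Hypothetical helper: deserializes each raw document on its own,
    // so a single malformed document does not abort the whole load.
    public static List<T> DeserializeAll<T>(
        IEnumerable<BsonDocument> documents,
        Action<BsonDocument, Exception> onError)
    {
        var result = new List<T>();
        foreach (var document in documents)
        {
            try
            {
                result.Add(BsonSerializer.Deserialize<T>(document));
            }
            catch (Exception exc)
            {
                // Report the faulty document and carry on with the next one.
                if (onError != null) onError(document, exc);
            }
        }
        return result;
    }

    public static void Main()
    {
        var documents = new[]
        {
            new BsonDocument { { "Name", "Kettle" } },
            // "Colour" has no matching property, so this document is rejected.
            new BsonDocument { { "Name", "Toaster" }, { "Colour", "red" } }
        };

        var products = DeserializeAll<Product>(
            documents,
            (doc, exc) => Console.WriteLine("Skipped " + doc + ": " + exc.Message));

        Console.WriteLine(products.Count + " product(s) loaded");
    }
}
```

The trade-off is that every document is first materialized as a `BsonDocument`, and that this only helps with whole documents returned by a query, not with arrays embedded inside a document.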
