DAM and Flexible Data Models Using Document Databases

In my last post, I demonstrated a “flexible” way to store digital asset metadata inside a relational database. The results were not perfect. Relational databases require a strict data schema so that all information fits into predefined two-dimensional tables. This makes it very difficult to achieve “flexibility”. In other words, it is hard in a relational database to allow different assets to have different metadata and to allow metadata to change over time. If the metadata is very complex, with multiple values and hierarchical nesting of information (think multiple people associated with one or more assets, each with multiple addresses and phone numbers), the “flexible” relational database approach I presented in my last blog post simply doesn’t work.

An alternative approach is to store your digital asset metadata inside a document database. Instead of the paradigm of one-record-equals-one-row and one-attribute-equals-one-column, a single record and all its attendant attributes can be conceived of as a single document. For example, you could store information about an image and a video in two different documents as shown here:


Figure 1. Example of documents that can be stored for different asset types in a document database.

Only the data applicable to an asset needs to be listed in its document. Fields can also be inserted or deleted at any time. While the structure of the document (in other words, the schema) can be enforced to ensure that specific fields are present for specific types of assets, data validation is optional and documents are not restricted to a predefined form.
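As a rough illustration of the points above, here is a minimal Python sketch that models documents as plain dictionaries. The field names, the sample values, and the REQUIRED_FIELDS rule set are all assumptions made up for this example; they are not taken from Figure 1 or from any particular document database.

```python
# Hypothetical metadata documents; field names and values are illustrative only.
image_doc = {
    "type": "image",
    "title": "Harbor at dawn",
    "width": 4000,
    "height": 3000,
}
video_doc = {
    "type": "video",
    "title": "Harbor timelapse",
    "duration_seconds": 95,
    "frame_rate": 24,
}

# Fields can be inserted or deleted at any time without touching other documents.
image_doc["color_space"] = "sRGB"
del video_doc["frame_rate"]

# Optional validation: an assumed set of required fields per asset type.
REQUIRED_FIELDS = {
    "image": {"title", "width", "height"},
    "video": {"title", "duration_seconds"},
}


def missing_fields(doc):
    """Return the required fields that the document lacks, sorted by name."""
    required = REQUIRED_FIELDS.get(doc.get("type"), set())
    return sorted(required - set(doc))
```

Calling missing_fields on either document returns an empty list, while a document of type "image" that lacks width and height would be flagged. Because the check is a separate step, an application can skip it entirely, which is the sense in which validation is optional.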

Figure 1 shows image and video metadata documents as abstract entities containing fields and values. In reality, documents are encoded using a specific serialization format, with the most popular currently being Extensible Markup Language (XML) and JavaScript Object Notation (JSON). Both formats are used because they accommodate ordered, multipart metadata of arbitrary complexity. The following examples show how the same information can be encoded in XML and in JSON:


Figure 2. Example of XML encoding


Figure 3. Example of JSON encoding
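To make the two encodings concrete, the following Python sketch serializes one record to both JSON and XML using only the standard library. The record itself, the element names Image, Title, Photographers, and Photographer, and the helper functions to_json and to_xml are assumptions chosen for illustration; they may not match the structure used in Figures 2 and 3.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record: one image credited to two photographers.
record = {
    "Title": "Harbor at dawn",
    "Photographers": [
        {"FirstName": "John", "LastName": "Doe"},
        {"FirstName": "Jane", "LastName": "Smith"},
    ],
}


def to_json(rec):
    """Encode the record as a JSON string."""
    return json.dumps(rec, indent=2)


def to_xml(rec):
    """Encode the same record as an XML string with nested elements."""
    root = ET.Element("Image")
    ET.SubElement(root, "Title").text = rec["Title"]
    photographers = ET.SubElement(root, "Photographers")
    for person in rec["Photographers"]:
        node = ET.SubElement(photographers, "Photographer")
        ET.SubElement(node, "FirstName").text = person["FirstName"]
        ET.SubElement(node, "LastName").text = person["LastName"]
    return ET.tostring(root, encoding="unicode")


json_text = to_json(record)
xml_text = to_xml(record)
```

Both outputs carry the same nested, multi-valued information: the list of photographers becomes a JSON array in one case and a sequence of repeated Photographer elements in the other.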

XML is particularly useful if the ordering of the metadata properties is significant. XML parsers preserve document order, so the FirstName element will always be reported before the LastName element, and John Doe will always appear before Jane Smith.
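A small Python sketch can show this ordering behavior, here using the standard library's ElementTree parser. The XML snippet is an assumed example built from the names mentioned above, not a copy of Figure 2.

```python
import xml.etree.ElementTree as ET

# Assumed sample document; element names follow the ones discussed in the text.
xml_text = """
<Photographers>
  <Photographer><FirstName>John</FirstName><LastName>Doe</LastName></Photographer>
  <Photographer><FirstName>Jane</FirstName><LastName>Smith</LastName></Photographer>
</Photographers>
"""

root = ET.fromstring(xml_text)

# Iterating over an element yields its children in document order.
names = [
    " ".join(child.text for child in photographer)
    for photographer in root
]

# The child elements of the first photographer, in the order they were written.
child_tags = [child.tag for child in root[0]]
```

Here names lists John Doe before Jane Smith, and child_tags lists FirstName before LastName, matching the order in the source text.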

JSON can also preserve some ordering information, but not all. The sequence of photographer names will be preserved because they are stored inside a JSON array (denoted by square brackets). Object properties like FirstName and LastName, however, do not have an inherent sequence, and JSON parsers are free to reorder them. In other words, the JSON object { "FirstName": "John", "LastName": "Doe" } is equivalent to { "LastName": "Doe", "FirstName": "John" }.
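The difference between arrays and objects can be checked with a short Python sketch using the standard json module. Note one caveat: Python's parser happens to keep object keys in the order it read them, but comparison between the resulting dictionaries ignores that order, which is the behavior that matters here.

```python
import json

# Two objects with the same properties written in a different order.
first = json.loads('{"FirstName": "John", "LastName": "Doe"}')
second = json.loads('{"LastName": "Doe", "FirstName": "John"}')
objects_equal = first == second

# Two arrays with the same items written in a different order.
list_a = json.loads('["John Doe", "Jane Smith"]')
list_b = json.loads('["Jane Smith", "John Doe"]')
arrays_equal = list_a == list_b
```

The two objects compare as equal, while the two arrays do not, because position is part of an array's meaning.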

The flexibility of documents means that any type of asset can be stored in the database and the designer does not need to anticipate the form of the metadata in the future. As new metadata becomes available, it can be added to the documents with the assurance that even the most complex and deeply nested information can be represented without loss.

Document databases tend to struggle with relationships between documents. For example, consider the case where you have contact information about a photographer. The most natural way to capture the relationship between the asset, the photographer and the contact information is to embed all the metadata inside a single document. Using an XML encoding, the document could look like this:


Figure 4. Example of image metadata document with embedded photographer contact information

This creates a problem when the same photographer is associated with more than one image. If you repeat the photographer information in every image document, you waste storage space and create a challenge when information needs to be updated. Instead of updating the photographer’s information in one document, every document needs to be found and updated.
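The update problem can be sketched in a few lines of Python. The documents, the phone numbers, and the update_phone helper below are hypothetical and exist only to illustrate the point; a real document database would expose its own query and update operations.

```python
import copy

# Hypothetical contact details, embedded separately in every image document.
john = {"name": "John Doe", "phone": "555-0100"}
jane = {"name": "Jane Smith", "phone": "555-0101"}

image_docs = [
    {"id": "img-1", "title": "Harbor at dawn", "photographer": copy.deepcopy(john)},
    {"id": "img-2", "title": "Harbor at dusk", "photographer": copy.deepcopy(john)},
    {"id": "img-3", "title": "City skyline", "photographer": copy.deepcopy(jane)},
]


def update_phone(docs, name, new_phone):
    """Rewrite the phone number in every document embedding this photographer.

    Returns the number of documents that had to be changed.
    """
    touched = 0
    for doc in docs:
        if doc["photographer"]["name"] == name:
            doc["photographer"]["phone"] = new_phone
            touched += 1
    return touched


updated = update_phone(image_docs, "John Doe", "555-0199")
```

A single change to one photographer's phone number touches every image document that embeds it, and the cost grows with the number of images that photographer is credited on.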

A solution to this problem is to create a single document for each photographer and to embed references to the photographer documents inside the asset documents:


Figure 5. Examples of Asset and Photographer documents.

In this case, the asset document contains an ID for the photographer, which can then be used to query the database to find the corresponding photographer document. Unfortunately, this solution creates performance problems when you need to query a large number of documents. For example, if you need to display hundreds of images, each listing all the contact information about the photographer, the system first needs to retrieve the image documents, examine each to determine the photographer ID, and then make a separate query for each related photographer document. These query operations tend to be slow and point to a fundamental problem with document databases: they work best when the information is contained within a single document. If information needs to be drawn from related documents, performance suffers.
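The query pattern described above can be sketched in Python with a tiny in-memory stand-in for a document database. The DocumentStore class, its find_by_type and get methods, and the sample documents are all assumptions invented for this example; the sketch only counts queries and says nothing about how a real product would execute them.

```python
class DocumentStore:
    """Hypothetical in-memory document store that counts the queries it serves."""

    def __init__(self, documents):
        self._documents = {doc["id"]: doc for doc in documents}
        self.query_count = 0

    def find_by_type(self, doc_type):
        """Return all documents of one type (counts as one query)."""
        self.query_count += 1
        return [d for d in self._documents.values() if d["type"] == doc_type]

    def get(self, doc_id):
        """Return a single document by ID (counts as one query)."""
        self.query_count += 1
        return self._documents[doc_id]


store = DocumentStore([
    {"id": "p1", "type": "photographer", "name": "John Doe", "phone": "555-0100"},
    {"id": "p2", "type": "photographer", "name": "Jane Smith", "phone": "555-0101"},
    {"id": "a1", "type": "image", "title": "Harbor at dawn", "photographer_id": "p1"},
    {"id": "a2", "type": "image", "title": "City skyline", "photographer_id": "p2"},
    {"id": "a3", "type": "image", "title": "Harbor at dusk", "photographer_id": "p1"},
])


def list_images_with_contacts(db):
    """Fetch every image, then follow each photographer reference separately."""
    rows = []
    for image in db.find_by_type("image"):          # one query for the images
        person = db.get(image["photographer_id"])   # one more query per image
        rows.append((image["title"], person["name"], person["phone"]))
    return rows


rows = list_images_with_contacts(store)
```

With three images the store serves four queries: one for the images plus one per image for its photographer. Displaying hundreds of images in this way would mean hundreds of follow-up lookups, which is the slowdown the paragraph above describes.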

In the next post I will look at a third option for implementing a flexible data model: graph and semantic databases.

Note: This is the third in a series of blog posts discussing the need for flexible data models when managing digital asset metadata. The series is based on Demian Hess’ article “Managing digital asset metadata”, Journal of Digital Media Management, Vol. 3, No. 2 (November 2014).

About Demian Hess

Demian Hess is Avalon Consulting, LLC's Director of Digital Asset Management and Publishing Systems. Demian has worked in online publishing since 2000, specializing in XML transformations and content management solutions. He has worked at Elsevier, SAGE Publications, Inc., and PubMed Central. After studying American Civilization and Computer Science at Brown University, he went on to complete a Master's in English at Oregon State University, as well as a Master's in Information Systems at Drexel University.
