Note: This is the first in a series of blog posts discussing the need for flexible data models when managing digital asset metadata. The series is based on Demian Hess’ article “Managing digital asset metadata,” Journal of Digital Media Management, Vol. 3, No. 2 (November 2014).
I recently worked with a government agency that was building a massive repository of digital assets. The project’s data architect was struggling to define a table structure to hold all the associated asset metadata. When I asked whether he had considered using so-called “NoSQL” technologies, such as document or graph databases, he said he had never considered this approach and wasn’t sure how to incorporate it. That’s when I realized there was a real need to familiarize users and technologists in the Digital Asset Management (DAM) space with these options.
Over the last several years, developers have increasingly shifted away from relational databases toward NoSQL solutions. The change is not simply a move away from a specific technology. It represents a change in the way we model and manage information. In the past, we always created fixed data schemas that precisely defined all of the data that was going to be collected. These schemas defined the tables we built in our relational databases and in turn were the basis of our validation strategies. If data did not fit the schema, it was rejected as incomplete or incorrect.
Digital asset metadata provides a good example of why this approach no longer works. First, asset metadata is extremely varied. An image file requires different metadata than a video file, and an asset delivered to a video streaming service needs different metadata than an asset sold directly from a company’s website. Second, data requirements keep changing as business models evolve. We no longer know in advance what information will be available or what information will be required.
Fixed schemas are not workable in such a fluid data environment. We need flexible models to manage this data: models that allow different assets to have different metadata, and that recognize that metadata requirements cannot be precisely known in advance and will evolve over time.
To understand the need for and advantages of flexible data models, it is helpful to consider a traditional data model as it would be implemented in a relational database, such as this table containing technical metadata about a digital asset.
| AssetId | File Name | File Size | Mime Type |
Table 1. Examples of attributes common to all file-based digital assets
The table works well because all file-based assets share a common set of attributes. In other words, every asset fits neatly into a single row of the table, with no empty columns and no ambiguity over how to populate the data.
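As a rough sketch of what such a fixed schema looks like in practice, here is Table 1 expressed as a relational table. I am assuming SQLite through Python's standard `sqlite3` module; the table name, the snake_case column names, and the three sample rows are made up for illustration and do not come from a real system.

```python
import sqlite3

# Hypothetical table name, column names, and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE asset (
        asset_id  INTEGER PRIMARY KEY,
        file_name TEXT    NOT NULL,
        file_size INTEGER NOT NULL,  -- size in bytes
        mime_type TEXT    NOT NULL
    )
    """
)

rows = [
    (1, "logo.png", 48213, "image/png"),
    (2, "promo.mp4", 73400320, "video/mp4"),
    (3, "jingle.mp3", 5242880, "audio/mpeg"),
]
conn.executemany("INSERT INTO asset VALUES (?, ?, ?, ?)", rows)

# Every column is declared NOT NULL, so each stored row is fully populated
# regardless of whether the asset is an image, a video, or an audio file.
stored = conn.execute("SELECT * FROM asset ORDER BY asset_id").fetchall()
for row in stored:
    print(row)
```

Because every column applies to every file, the `NOT NULL` constraints can be enforced without excluding any asset type.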
The situation gets murkier when you move beyond basic file information. Different types of assets have different properties, such as height and width for images, frame rate for video, number of tracks for audio, etc. Each new piece of data requires adding a new column to the table.
| AssetId | File Name | File Size | Mime Type | Height | Width | Color Space | Tracks | Frame Rate |
Table 2. Example of a table that attempts to hold format metadata for all asset types
As you can see, many of the columns are left blank since they only apply to a subset of assets. The large number of empty columns creates confusion for users. Is a given column empty because it is not applicable or because the value is not available? Are height and width appropriate for a video, or should they only be used for images?
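To make the ambiguity concrete, here is a minimal sketch of the "one wide table" design, again assuming SQLite via Python's `sqlite3` module. The table name `asset_wide`, the column names, and the sample values are hypothetical. Note that the video row leaves height and width empty even though a video does have dimensions, which is exactly the "not applicable or not available?" question.

```python
import sqlite3

# Hypothetical table name, column names, and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE asset_wide (
        asset_id    INTEGER PRIMARY KEY,
        file_name   TEXT    NOT NULL,
        file_size   INTEGER NOT NULL,
        mime_type   TEXT    NOT NULL,
        height      INTEGER,  -- nullable: only filled in for some assets
        width       INTEGER,
        color_space TEXT,
        tracks      INTEGER,
        frame_rate  REAL
    )
    """
)

wide_rows = [
    (1, "logo.png", 48213, "image/png", 600, 800, "sRGB", None, None),
    (2, "promo.mp4", 73400320, "video/mp4", None, None, None, None, 29.97),
    (3, "jingle.mp3", 5242880, "audio/mpeg", None, None, None, 2, None),
]
conn.executemany(
    "INSERT INTO asset_wide VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", wide_rows
)

# Count the empty cells in each format-specific column.
format_columns = ["height", "width", "color_space", "tracks", "frame_rate"]
null_counts = {}
for column in format_columns:
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM asset_wide WHERE {column} IS NULL"
    ).fetchone()
    null_counts[column] = count

print(null_counts)
print(sum(null_counts.values()), "of", len(format_columns) * len(wide_rows), "cells are empty")
```

The database cannot say why a cell is NULL. The schema treats a missing video height and an inapplicable audio height identically, so that distinction has to live in documentation or application code.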
An alternative approach is to create specialized tables for every type of asset. This has the benefit of limiting the size of the main Asset table, but it leads to a proliferation of tables that are difficult to query together. Furthermore, as new asset types are introduced, you need to restructure the database by adding new tables or new columns. Database restructuring requires expensive and disruptive changes to queries and application-layer logic.
Figure 1. Entity Relationship Diagram showing specialized entities for different asset types.
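The specialized-table design can be sketched the same way. This is only an illustration under assumptions: SQLite via Python's `sqlite3` module, invented table and column names, invented sample rows, and a hypothetical new asset type (a document with a page count) standing in for whatever type shows up next.

```python
import sqlite3

# Hypothetical table names, column names, and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE asset (
        asset_id  INTEGER PRIMARY KEY,
        file_name TEXT    NOT NULL,
        file_size INTEGER NOT NULL,
        mime_type TEXT    NOT NULL
    );
    CREATE TABLE image_asset (
        asset_id    INTEGER PRIMARY KEY REFERENCES asset(asset_id),
        height      INTEGER NOT NULL,
        width       INTEGER NOT NULL,
        color_space TEXT    NOT NULL
    );
    CREATE TABLE video_asset (
        asset_id   INTEGER PRIMARY KEY REFERENCES asset(asset_id),
        frame_rate REAL NOT NULL
    );
    CREATE TABLE audio_asset (
        asset_id INTEGER PRIMARY KEY REFERENCES asset(asset_id),
        tracks   INTEGER NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO asset VALUES (?, ?, ?, ?)",
    [
        (1, "logo.png", 48213, "image/png"),
        (2, "promo.mp4", 73400320, "video/mp4"),
        (3, "jingle.mp3", 5242880, "audio/mpeg"),
    ],
)
conn.execute("INSERT INTO image_asset VALUES (?, ?, ?, ?)", (1, 600, 800, "sRGB"))
conn.execute("INSERT INTO video_asset VALUES (?, ?)", (2, 29.97))
conn.execute("INSERT INTO audio_asset VALUES (?, ?)", (3, 2))

# Seeing everything about every asset takes one LEFT JOIN per specialized table.
full_view = conn.execute(
    """
    SELECT a.asset_id, a.file_name, i.height, i.width, v.frame_rate, s.tracks
    FROM asset AS a
    LEFT JOIN image_asset AS i ON i.asset_id = a.asset_id
    LEFT JOIN video_asset AS v ON v.asset_id = a.asset_id
    LEFT JOIN audio_asset AS s ON s.asset_id = a.asset_id
    ORDER BY a.asset_id
    """
).fetchall()
for row in full_view:
    print(row)

# A new asset type cannot be stored until the schema itself changes, and the
# query above then needs yet another join to stay complete.
conn.execute(
    """
    CREATE TABLE document_asset (
        asset_id   INTEGER PRIMARY KEY REFERENCES asset(asset_id),
        page_count INTEGER NOT NULL
    )
    """
)
tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

The empty columns are gone from storage, but they reappear in the joined result, and every new asset type costs a schema change plus an update to each query that is supposed to cover all assets.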
The limitations of this database design become more apparent when you consider that we have only begun to scratch the surface of the metadata that needs to be managed. We also need rights and restrictions metadata, workflow metadata, descriptive metadata, and all the other information needed by the myriad business processes that interact with these assets.
The fundamental flaw is that we are attempting to define all the attributes for every type of digital asset in our data model in advance. In other words, we are imposing an inflexible data model.
In the rest of this series, I will show you how to implement a more flexible data model using different technologies, including traditional relational databases, document stores, and graph databases.