A sturdy web content management system (CMS) is one of the essential factors for the success of ecommerce-oriented sites. A lot of emphasis is placed on having the right architecture, infrastructure and IT development centered around the CMS. But an efficient business process that organizes, structures and delivers content through various channels is equally important. An effective content management life cycle that tracks and keeps relevant content while removing content that is no longer needed makes the system more manageable. This article discusses how to keep the CMS well oiled and efficient by removing unused and unwanted content from it.
One of the more interesting consulting assignments I faced came when one of our clients asked us to help them identify and remove old and unused content from their web content management system. While this looked like an easy task at the outset, the challenges I encountered were quite different from what I expected and sometimes difficult.
What is Content?
There are usually two types of content that reside in the CMS ecosystem.
a) Content that is displayed on the site. Static files such as images, videos and PDF files, and content instances belonging to a specific content type/schema, fall under this category.
b) Supporting content: Content that is used to organize and support the content which is displayed on the site. Examples are Projects or Folders in which content resides, Channels where the content is published, Content Type Definitions / Schema, Configuration files for different processes etc.
Why retire content?
I have encountered users who ask why this is considered to be a problem and why it is important. To answer both questions, I usually give a simple analogy. We use computers to do our jobs. They have a fixed hard drive and memory capacity. As we start using these systems, we end up storing all sorts of files. Some are stored on the desktop, and some are organized in different folders and hard drive partitions. When we notice the system is slow, one of the tasks we perform is a disk cleanup. We delete all the unwanted files, empty the recycle bin, and reorganize the remaining folders and files to make the system more usable.
In the case of content management, before we get into the why, let us take a step back and understand the content management process. Maintaining content in any web content management system has the following steps:
a) Enter content (additions, deletions and updates).
b) Review, version, translate, publish and unpublish content.
c) Work with the IT team for maintenance and for new project ventures (comprises tasks a and b mentioned above).
I have only highlighted the basic operations. Different organizations have different team and project structures but ultimately they end up doing the tasks mentioned above.
In my experience, when a new initiative is started, most of the discussions and processes are centered on the content that needs to be entered, but I rarely hear anything about the content that needs to be retired. The initiative starts at full pace, and once the desired content is published and passes all QA and end-user acceptance testing, everything is good to go. But hardly anyone takes notice of the content that had to be replaced or removed. Here are some of the cases I have observed:
- Content was unpublished but was not marked for retirement or deletion.
- Orphaned content: if content instances A and B refer to each other and content instance A needs to be deleted, content instance B either needs to refer to A's replacement content or should be deleted along with A. Deleting content while references to it are still intact is a recipe for trouble.
- Content was replaced, but the replacement was created in a different folder under the same name.
- Content was versioned for a specific need, but after the need was met the version was not reverted.
I could go on citing more examples, but the message I am trying to convey is that content that is not maintained not only causes technical problems but can also impact troubleshooting and decision making in projects and initiatives. Sometimes we end up running in circles while trying to determine whether the content can be safely retired and deleted. While the content that is not deleted may not look huge for a single project or initiative, multiply it by the number of projects or initiatives launched and suddenly you are looking at a huge digital landfill! And this is when the problem starts. When the content authors are approached about the possibility of retiring their content, here are a few excuses I have heard from them:
- I don’t know whether this content is still used or not. I need to verify with the other team.
- Let me check with Bob, who is on vacation, so you will need to wait until he gets back.
- The person who knows about this content has left the firm. I need x number of people for y number of hours to investigate. Do you have a schedule and a time entry code for our team to charge this effort? And I need an approval from your manager and my manager to proceed with the work.
- I don’t know. Let us keep it because I don’t want to be held responsible if it breaks something.
...and so on and so forth.
In the end, the CMS ends up carrying a huge deadweight around its neck, which often eats disk space and affects response times. Too much data makes the system unusable for users. CMS systems need to be kept lean, and the philosophy is simple: keep content that is needed; remove content that is not needed. If in doubt, either stage the content in a retired folder along with the other content that can be retired or deleted, or mark the targeted content for analysis. Such content can be reviewed during the yearly content review process.
When to retire content?
The short answer is that it is a daily process. It should be part of daily operations (for maintenance teams) and part of the project lifecycle for new initiatives and projects. One of the questions that needs to be asked during the launch of an initiative is whether any existing content will be replaced or updated. If yes, identify that content and make its retirement part of the launch process. If for some reason the content needs to be removed after x number of days, flag the content and either revisit it after x days or take it up during the yearly review process.
Where to retire content?
There are usually two ways to perform this task depending on the nature of the CMS.
Some CMS systems allow content to be moved between folders without impacting the URLs or pages of the site. In that case a central archive or retired folder can be created where content to be retired or deleted can be staged for x period. To make this more user friendly, the full folder structure should be recreated so that it is easy for users to understand where the content was retired from. For example, if the content in the folders /Corporate/Communications/CEOSpeech/2008 and /Sales/Revenue/AsiaPacific/2009 needs to be removed, or staged for some time before deletion, create the same folder structure under the Retired folder and move it there. The Retired folder structure would now look as follows:
/Retired/Corporate/Communications/CEOSpeech/2008
/Retired/Sales/Revenue/AsiaPacific/2009
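The mirrored folder layout described above can be sketched as a small path-mapping helper. This is only an illustration: the function name `retired_path` and the `/Retired` root are assumptions, and a real CMS would perform the move through its own API rather than through string paths.

```python
from pathlib import PurePosixPath

# Hypothetical name for the central archive folder; adjust to the CMS in use.
RETIRED_ROOT = PurePosixPath("/Retired")


def retired_path(original: str, retired_root: PurePosixPath = RETIRED_ROOT) -> str:
    """Return the folder under the retired root that mirrors the original folder."""
    path = PurePosixPath(original)
    if not path.is_absolute():
        raise ValueError(f"expected an absolute CMS path, got {original!r}")
    # Drop the leading "/" so the original structure nests under the retired root.
    return str(retired_root / path.relative_to("/"))


for folder in (
    "/Corporate/Communications/CEOSpeech/2008",
    "/Sales/Revenue/AsiaPacific/2009",
):
    print(folder, "->", retired_path(folder))
```

Keeping the original path intact below the retired root means the source location of any staged item can be read straight off its new location.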
Other CMS systems do not tolerate moving content within the folder structure without impacting the URL or the sitemap. In that case every content type or schema should have an attribute indicating whether an instance can be deleted or archived, and this attribute needs to be set on content that is to be retired. A report or a script can then be developed that searches for all content with this attribute set and deletes it after unpublishing it from the Sites/Channels.
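A minimal sketch of such a script is shown below, using an in-memory dictionary in place of a real repository. The `ContentItem` class, the `retire` attribute name and the `purge_flagged` function are all hypothetical; an actual CMS would expose its own search, unpublish and delete calls.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    # Hypothetical stand-in for a CMS content instance.
    item_id: str
    content_type: str
    retire: bool = False  # the "can be deleted or archived" attribute
    channels: set[str] = field(default_factory=set)  # channels the item is published to


def purge_flagged(repository: dict[str, ContentItem]) -> list[str]:
    """Unpublish and then delete every item whose retire flag is set.

    Returns the ids of the deleted items so the run can be reported.
    """
    # Collect first so the dictionary is not modified while iterating over it.
    flagged = [item for item in repository.values() if item.retire]
    deleted = []
    for item in flagged:
        item.channels.clear()         # unpublish from all sites/channels first
        del repository[item.item_id]  # then remove the item itself
        deleted.append(item.item_id)
    return deleted


repo = {
    "promo-1": ContentItem("promo-1", "promotion", retire=True, channels={"web"}),
    "bio-1": ContentItem("bio-1", "executive_bio", channels={"web"}),
}
print(purge_flagged(repo))  # ids of the items that were removed
```

Returning the list of deleted ids gives the content administrator something to log or review after each run.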
How to retire content?
Some organizations have a workflow or other utilities that unpublish content. Depending on the nature of the content and where it is present, the Content Administrator can come up with a script to perform mass unpublishing and physical deletion or archival of content. Otherwise the content team can perform those steps manually. The basic steps are:
a) Identify content to be retired.
b) Unpublish content (if the content instance is published).
c) Remove channel(s) associated with the content instance, if any.
d) Move the content instance to a folder marked for retired content, or mark the content instance by setting an attribute (creating it if it does not exist already) that indicates it is ready to be retired.
e) Delete the content present in the retired folder after x number of days.
f) For content that is not part of any channel, make sure it is not referred to in any other wrapper or container content types.
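The steps above can be sketched as two small functions: one that stages an item (refusing if live content still refers to it) and one that deletes staged items once the waiting period has passed. Everything here is an assumption made for illustration, including the `Item` fields, the `/Retired` root and the function names; it models the process, not any particular CMS API.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional


@dataclass
class Item:
    # Hypothetical content instance; field names are illustrative only.
    item_id: str
    folder: str
    published: bool = False
    channels: set = field(default_factory=set)
    references: set = field(default_factory=set)  # ids this item refers to
    retired_on: Optional[date] = None


def live_referrers(items: dict, target_id: str) -> list:
    """Ids of items that are not retired themselves and still refer to target_id."""
    return sorted(
        i.item_id for i in items.values()
        if target_id in i.references and i.item_id != target_id and i.retired_on is None
    )


def stage_for_retirement(items: dict, item_id: str, today: date, retired_root: str = "/Retired") -> None:
    """Unpublish, remove channels and move the item to the retired folder."""
    item = items[item_id]
    if item.retired_on is not None:
        return  # already staged
    blocking = live_referrers(items, item_id)
    if blocking:
        raise ValueError(f"{item_id} is still referred to by {blocking}")
    item.published = False
    item.channels.clear()
    item.folder = retired_root + item.folder
    item.retired_on = today


def purge_retired(items: dict, today: date, retention_days: int) -> list:
    """Delete items that have been staged for at least retention_days."""
    cutoff = today - timedelta(days=retention_days)
    expired = sorted(k for k, i in items.items() if i.retired_on is not None and i.retired_on <= cutoff)
    for key in expired:
        del items[key]
    return expired
```

Raising an error when live references exist forces the orphaned-content question to be answered before anything moves, instead of after a page breaks.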
Content can be static or dynamic. The rate at which a content instance is updated or replaced depends on the nature of the content. A deal or a promotion can be either static or dynamic depending on different factors. Long-standing deals, promotions and offers can remain in the system for a long period. Static content such as a policy statement or an executive bio remains unchanged for a period of time. Static files such as digital media (images, Flash, video files) and documents (Word, Excel, PDF) are replaced or updated either regularly or periodically. In this case the client I was working with made pretty heavy use of imagery, which ran to roughly 600 to 800 updates per week. One of the side effects of these updates was that many of the static files ended up with multiple versions, which impacted disk space.
Depending on the nature of the business, the volume and volatility of content and the CMS operation processes involved, content retirement can either be a simple and straightforward effort or a long-drawn, complicated affair. But there are some simple guidelines which can be followed to make it more streamlined, such as:
a) Include content retirement as part of the project milestone. Any project that involves content in any manner needs to include it as one of the closing steps to get the project successfully signed off. Allocate time and resources for the initiative.
b) Organize content in an efficient manner so that it is easy to access whenever needed. For example, let us say that the marketing team has a couple of folders (projects). They can organize all dynamic content within separate folders suffixed by year. During 2012 they can get rid of all content present in the 2008 folder (if it exists), or apply whatever other criteria they come up with.
c) Either mark the content that can be removed as 'Retired' or move the content to a separate folder that is earmarked for retirement rather than deleting it physically, unless the chances of revisiting the content later are nil.
d) Have specific CMS reports designed and executed that show the usage of content across different project folders and channels as well as the Content Type of the content instances. The reports can be designed to run periodically (daily, weekly, monthly, quarterly, half-yearly, annually etc.).
e) Establish a yearly process to do the following:
- Revisit all items that are in the parking lot (content that has been marked as to be deleted but not yet deleted).
- Delete content that is either marked for retirement or that is in the retirement folder.
- Review the findings of the CMS reports and take appropriate action based on them. For example, if zero instances of a specific content type show up, and the business and the IT team have determined that no such content will be created, the content type can be removed from the CMS.
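Two of the guidelines above lend themselves to a short sketch: a report that counts instances per content type (so that unused types stand out) and a selector for year-suffixed folders that are due for cleanup. The function names, the `(folder, content_type)` tuple layout and the sample data are assumptions for illustration; a real report would query the CMS itself.

```python
import re
from collections import Counter


def instances_per_type(defined_types, items):
    """Count content instances per defined content type, including types with none.

    items is an iterable of (folder, content_type) pairs.
    """
    counts = Counter(content_type for _, content_type in items)
    return {t: counts.get(t, 0) for t in sorted(defined_types)}


def unused_types(report):
    """Content types with zero instances: candidates for removal after review."""
    return [t for t, n in report.items() if n == 0]


# Matches folders whose last path segment is a four-digit year, e.g. /Marketing/Deals/2008
YEAR_SUFFIX = re.compile(r"/(\d{4})$")


def folders_up_to_year(folders, cutoff_year):
    """Select year-suffixed folders whose year is at or before cutoff_year."""
    selected = []
    for folder in folders:
        match = YEAR_SUFFIX.search(folder)
        if match and int(match.group(1)) <= cutoff_year:
            selected.append(folder)
    return selected


items = [
    ("/Marketing/Deals/2008", "promotion"),
    ("/Marketing/Deals/2011", "promotion"),
    ("/Corporate/About", "executive_bio"),
]
report = instances_per_type({"promotion", "executive_bio", "press_release"}, items)
print(report)
print(unused_types(report))
print(folders_up_to_year([folder for folder, _ in items], 2008))
```

The report only lists candidates; whether a content type or folder is actually removed remains a decision for the business and IT teams during the yearly review.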
Who retires content?
The simple answer is everyone who has edit level access. That being said, in some organizations, not all users are given the capability to delete content. In such cases, they can move the content to the Retired folder to which all users must be provided access. The yearly review process would then make sure that the content from the Retired folder is deleted after checking with the relevant teams.
There is obviously no silver bullet for the problems of a bloated content management system. But if the steps mentioned above are taken and adopted as part of the process, the CMS can stay lean and efficient for the business users.