Archive for the ‘Generic’ Category

Retiring Web Content-What, Why, When, Where, How & Who

Thursday, February 16th, 2012

Introduction:
A sturdy web content management system (CMS) is one of the essential factors in the success of ecommerce-oriented sites. A lot of emphasis is placed on having the right architecture, infrastructure and IT development centered around the CMS. But an efficient business process that organizes, structures and delivers content through various channels is equally important. An effective content management life cycle that tracks and keeps relevant content while removing content that is no longer needed makes the system more manageable. This article discusses how to keep the CMS well oiled and efficient by removing unused and unwanted content from it.

One of the more interesting consulting assignments I have faced came when a client asked us to help identify and remove old and unused content from their web content management system. While this looked like an easy task at the outset, the challenges I encountered were different from what I expected and sometimes difficult.

What is Content?
There are usually two types of content that reside in the CMS ecosystem.

a) Content that is displayed on the site. Static files such as images, videos, pdf files etc. and Content instances belonging to a specific content type/schema fall under this category.

b) Supporting content: Content that is used to organize and support the content which is displayed on the site. Examples are Projects or Folders in which content resides, Channels to which the content is published, Content Type Definitions / Schema, Configuration files for different processes, etc.

Why retire content?
I have encountered users who ask why this is considered a problem and why it is important. To answer both whys, I usually give a simple analogy. We use computers to do our jobs. They have a fixed hard drive and memory capacity. As we use these systems, we end up storing all sorts of files. Some are stored on the desktop, and some are organized in different folders and hard drive partitions. When we notice the system is slow, one of the first tasks we perform is a disk cleanup. We delete all the unwanted files, empty the recycle bin, and reorganize the remaining folders and files to make the system more usable.

In the case of content management, before we get into the why, let us take a step back and understand the content management process. Maintaining content in any web content management system has the following steps:

a) Enter content (additions, deletions and updates).

b) Review, version, translate, publish and unpublish content.

c) Work with the IT team for maintenance and for new project ventures (comprises tasks a and b mentioned above).

I have only highlighted the basic operations. Different organizations have different team and project structures but ultimately they end up doing the tasks mentioned above.

In my experience, when a new initiative is started most of the discussions and processes are centered on the content that needs to be entered, but I rarely hear anything about the content that needs to be retired. The initiative starts at full pace, and once the desired content is published and passes all QA and end user acceptance testing validations, everything is good to go. But hardly anyone takes notice of the content that had to be replaced or removed. Here are some of the cases I have observed:

  • Content was unpublished but was not marked for retirement or deletion.
  • Orphaned content – If content instances A and B refer to each other and content instance A needs to be deleted, content instance B should either be updated to refer to A’s replacement or be deleted along with A. Deleting content while references to it are still intact is a recipe for trouble.
  • Content was replaced but the replaced content was created in a different folder with the same name.
  • Content was versioned for a specific need but after the need was met the version was not reverted.

I could go on citing more examples, but the message I am trying to convey is that content that is not maintained not only causes technical problems but can also impact troubleshooting and decision making in projects and initiatives. Sometimes we end up running in circles while trying to determine if the content can be safely retired and deleted. While the content left undeleted by a single project or initiative may not look like much, multiply it by the number of projects or initiatives launched and suddenly you are looking at a huge digital landfill! And this is when the problem starts. When the content authors are approached about the possibility of retiring their content, here are a few excuses I have heard from them:

  • I don’t know whether this content is still used or not. I need to verify with the other team.
  • Let me check with Bob, who is on vacation, so you need to wait till Bob gets back.
  • The person who knows about this content has left the firm. I need x number of people for y number of hours to investigate. Do you have a schedule and a time entry code for our team to charge this effort? And I need an approval from your manager and my manager to proceed with the work.
  • I don’t know. Let us keep it because I don’t want to be held responsible if it breaks something.

...and so on and so forth.

In the end, the CMS ends up carrying a huge deadweight around its neck, which often eats disk space and slows response times. Too much data makes the system unusable for users. CMS systems need to be kept lean and the philosophy is simple – keep content that is needed; remove content that is not needed. If in doubt, either stage the content to a retired folder along with the other content that can be retired/deleted or mark the targeted content for analysis. Such content can be reviewed during the yearly content review process.

When to retire content?
The short answer is that it is a daily process. It should be part of daily operations (for maintenance teams) and part of the project lifecycle for new initiatives and projects. One of the questions that needs to be asked during the launch of an initiative is whether any existing content will be replaced or updated. If so, identify that content and make its retirement part of the launch process. If for some reason the content needs to be removed after x number of days, flag the content and either revisit it after x days or take it up during the yearly review process.

Where to retire content?
There are usually two ways to perform this task depending on the nature of the CMS.

Some CMS systems allow content to be moved between folders without impacting the URLs or pages of the site. In that case a central archive or retired folder can be created where content to be retired or deleted can be staged for x period. To make this more user friendly, the full folder structure should be recreated so that it is easy for users to understand where the content was retired from. For example, if the content in the folders /Corporate/Communications/CEOSpeech/2008 and /Sales/Revenue/AsiaPacific/2009 needs to be removed or staged for some time before deletion, create the same folder structure under the Retired folder and move the content there. The Retired folder structure would now look as follows:
/Retired/Corporate/Communications/CEOSpeech/2008
/Retired/Sales/Revenue/AsiaPacific/2009
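
The mirrored layout above can be computed mechanically. The following is a minimal Python sketch that is not tied to any particular CMS: the names retired_path, original_path and RETIRED_ROOT are my own assumptions for illustration, and only the folder names come from the example above.

```python
from pathlib import PurePosixPath

# Assumed name of the central archive folder, taken from the example above.
RETIRED_ROOT = PurePosixPath("/Retired")


def retired_path(original: str) -> str:
    """Return the location under the Retired folder that mirrors `original`."""
    path = PurePosixPath(original)
    if not path.is_absolute():
        raise ValueError(f"expected an absolute CMS path, got {original!r}")
    if path == RETIRED_ROOT or RETIRED_ROOT in path.parents:
        return str(path)  # already staged; do not nest it a second time
    # Drop the leading "/" and re-anchor the remaining folders under /Retired.
    return str(RETIRED_ROOT.joinpath(*path.parts[1:]))


def original_path(retired: str) -> str:
    """Inverse mapping: where did this retired content live before?"""
    relative = PurePosixPath(retired).relative_to(RETIRED_ROOT)
    return str(PurePosixPath("/").joinpath(*relative.parts))


for folder in ("/Corporate/Communications/CEOSpeech/2008",
               "/Sales/Revenue/AsiaPacific/2009"):
    print(folder, "->", retired_path(folder))
```

Because the mapping is reversible, content that turns out to be needed after all can be moved back to the folder it came from.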

Some CMS systems do not tolerate moving content within the folder structure without impacting the URL or the sitemap. In that case every content type or schema should have an attribute that indicates whether an instance can be deleted or archived, and this attribute needs to be set on content that is ready to go. A report or script can then be developed that searches for all content with this attribute set, unpublishes it from the Sites/Channels, and deletes it.
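
As a rough illustration of such a script, here is a small Python sketch that works on an in-memory stand-in for the repository. Everything in it is hypothetical: the ContentItem class, the ready_to_retire attribute and the retire_flagged function are names I made up, and a real script would replace the two marked lines with whatever unpublish and delete calls your CMS provides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContentItem:
    path: str
    ready_to_retire: bool = False                     # hypothetical retirement attribute
    channels: set[str] = field(default_factory=set)   # sites/channels it is published to


def retire_flagged(repository: dict[str, ContentItem]) -> list[str]:
    """Unpublish, then delete, every flagged item; return the deleted paths."""
    flagged = [item for item in repository.values() if item.ready_to_retire]
    deleted = []
    for item in flagged:
        item.channels.clear()        # stand-in for the CMS unpublish call
        del repository[item.path]    # stand-in for the CMS delete call
        deleted.append(item.path)
    return sorted(deleted)


repo = {
    "/news/2008-launch": ContentItem("/news/2008-launch", True, {"www"}),
    "/news/2012-launch": ContentItem("/news/2012-launch", False, {"www"}),
}
print(retire_flagged(repo))   # paths that were unpublished and deleted
print(sorted(repo))           # paths that remain in the repository
```

Unpublishing before deleting matters: deleting first would leave a published page pointing at content that no longer exists.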

How to retire content?
Some organizations have a workflow or other utilities that unpublish the content. Depending on the nature of the content and where it lives, the Content Administrator can come up with a script to perform mass unpublishing and physical deletion/archival of content. Otherwise the content team can perform those steps manually. The basic steps are:

a) Identify content to be retired.

b) Unpublish content (if the content instance is published).

c) Remove channel(s) associated with the content instance, if any.

d) Move the content instance to a folder marked for retired content.
Or
Mark the content instance by creating an attribute (if it does not exist already) that would indicate that it is ready to be retired.

e) Delete the content present in the retired folder after x number of days.

f) For content that is not part of any channel, make sure it is not referenced by any other wrapper or container content types.
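
Steps e) and f) lend themselves to a small check before anything is physically deleted. The Python sketch below is one way to express it under my own assumptions: StagedItem, its retired_on and references fields, and the safe_to_delete function are illustrative names, not part of any CMS API. An item is reported as deletable only if it has been staged for at least the retention period and no live item still refers to it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class StagedItem:
    content_id: str
    retired_on: date | None = None                      # None means the item is still live
    references: set[str] = field(default_factory=set)   # ids this item refers to


def safe_to_delete(items: list[StagedItem], today: date, retention_days: int) -> list[str]:
    """Ids of retired items past the retention period that no live item references."""
    cutoff = today - timedelta(days=retention_days)
    live = [item for item in items if item.retired_on is None]
    still_referenced = set().union(*(item.references for item in live))
    return sorted(
        item.content_id
        for item in items
        if item.retired_on is not None
        and item.retired_on <= cutoff
        and item.content_id not in still_referenced
    )


items = [
    StagedItem("old-banner", retired_on=date(2012, 1, 1)),
    StagedItem("old-logo", retired_on=date(2012, 1, 1)),
    StagedItem("recent-promo", retired_on=date(2012, 2, 10)),
    StagedItem("home-page", references={"old-logo"}),   # live wrapper still using old-logo
]
print(safe_to_delete(items, today=date(2012, 2, 16), retention_days=30))
```

In this sample data only old-banner qualifies: old-logo is still referenced by a live page, and recent-promo has not yet been staged for 30 days.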

Content can be static or dynamic. The rate at which a content instance is updated or replaced depends on the nature of the content. A deal or a promotion can be either static or dynamic depending on different factors. Long-standing deals, promotions and offers can remain in the system for a long period. Static content such as a policy statement or an executive bio remains unchanged for a period of time. Static files such as digital media (images, flash, video files) and documents (word, excel, pdf) are replaced or updated either regularly or periodically. The client I was working with made heavy use of imagery, which ran to roughly 600 to 800 updates per week. One of the side effects of these updates was that many of the static files ended up with multiple versions, which consumed disk space.

Depending on the nature of the business, the volume and volatility of content, and the CMS operation processes involved, content retirement can either be a simple and straightforward effort or a long-drawn, complicated affair. But there are some simple guidelines which can be followed to make it more streamlined, such as:

a) Include content retirement as part of the project milestone. Any project that involves content in any manner needs to include it as one of the closing steps to get the project successfully signed off. Allocate time and resources for the initiative.

b) Organize content in an efficient manner so that it is easy to access whenever needed. For example, let us say that the marketing team has a couple of folders (projects). They can organize all dynamic content within separate folders suffixed by year. During 2012 they can get rid of all content present in the 2008 folder (if it exists), or apply whatever other criteria they come up with.

c) Either mark the content that can be removed as ‘Retired’ or move the content to a separate folder that is earmarked for retirement rather than deleting it physically, unless the chance of revisiting the content later is nil.

d) Have specific CMS reports designed and executed that show the usage of content across different project folders and channels as well as the Content Type of the content instances. The reports can be designed to run periodically (daily, weekly, monthly, quarterly, half yearly, annually, etc.).

e) Establish a yearly process to do the following:

  1. Revisit all items that are in the parking lot (content that has been marked as to be deleted but not yet deleted).
  2. Delete content that is either marked for retirement or that is in the retirement folder.
  3. Review the findings of the CMS reports and implement an appropriate action based on it. For example if 0 instances of a specific content type show up, and if it has been determined by the business and the IT team that there won’t be any such content created, the content type can be removed from the CMS.
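
The content type check in item 3 is easy to automate once a report can list the content type of every instance. The Python sketch below assumes exactly that input; the function name unused_content_types and the sample type names are invented for illustration. It only finds candidates, and the decision to remove a content type still rests with the business and the IT team as described above.

```python
from collections import Counter


def unused_content_types(defined_types, instance_types):
    """Content types that are defined in the CMS but have zero instances."""
    counts = Counter(instance_types)   # a missing type counts as 0
    return sorted(t for t in set(defined_types) if counts[t] == 0)


# Hypothetical report data: all defined types, and the type of each content instance.
defined = ["Article", "PressRelease", "Promotion", "Survey"]
instances = ["Article", "Article", "Promotion", "Article"]

print(unused_content_types(defined, instances))   # candidates for review
```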

Who retires content?
The simple answer is everyone who has edit level access. That being said, in some organizations, not all users are given the capability to delete content. In such cases, they can move the content to the Retired folder to which all users must be provided access. The yearly review process would then make sure that the content from the Retired folder is deleted after checking with the relevant teams.

Conclusion:
There is obviously no silver bullet for the problems of a bloated content management system. But if the steps mentioned above are taken and adopted as part of the process, the CMS can stay lean and efficient for the business users.

Taxonomies, Content Management and Governance

Friday, July 8th, 2011

Good governance is on everyone’s minds these days.  It’s a concern that extends well beyond the Washington Beltway.  As applied to managing your enterprise content, including taxonomies, it is not just an abstraction.

Good governance drives the overall performance of your content program, including:

  • How easy it is for users to find information
  • How users look for information
  • How users store and retrieve information
  • How to clean up redundant content
  • What metadata is available
  • What templates are used

The need for a well-planned and well-run governance program will only increase.  The growth of unstructured information, demands for greater efficiency and cost savings, and privacy concerns are all motivations.

Are you wondering how to set up a governance program?  Are you questioning whether your existing content governance is right?  Avalon and our partner PPC are sponsoring a free webinar series that will help you Cultivate Content Management Success through Planned, Managed, and Implemented Taxonomies. For more information and to register, click here.

How to explain MarkLogic to a business user

Monday, June 13th, 2011

It is no secret we here at Avalon are enamored with MarkLogic technology. Our consultants have regular discussions that involve topics like the best way to use Java code with XQuery or how to integrate HTML5 with WebSockets to create a multi-publisher capability for MarkLogic (and no, we do not use pocket-protectors or wear hats with propellers). Now I understand these are important topics that yield very cool applications but they don’t really resonate with a business user. The typical business user (IMHO) who is being introduced to MarkLogic sometimes has a hard time wrapping his or her head around what the heck it is. When I encounter this confusion I point them to a simple analogy:

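[Embedded video: the Saturday Night Live “New Shimmer” sketch, in which a single product is both a floor wax and a dessert topping.]

I do love old-school SNL.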

So how is this analogous to MarkLogic?… very simple. Business users typically understand technology on a 1 to 1 basis. They understand that the search engine is used for searching documents and the content management system is used to change content on their web site and the database is used to store, well… data. MarkLogic simply does not fit the 1 to 1 model in the way most business users have been trained to understand technology; it is a “disruptive technology”. MarkLogic is really a platform on which to build countless applications that leverage any unstructured content. So what does that mean? Think about all the content/“stuff” you have that is valuable but would not naturally be a fit to be managed in a spreadsheet/database (e.g. it would probably not make sense to put your meeting notes, videos, mp3s, family photos, or this blog into a spreadsheet/database). So let’s take a look at some practical MarkLogic use case examples:

Publishing - This is clearly MarkLogic’s sweet spot. After all… who has more unstructured content than a publisher? Now publishers not only have a good way to store and manage their books/magazines/journals/etc. but they can now easily create content “mashups”. What is a content mashup? Think of a student being able to buy individual chapters (or paragraphs for that matter) across multiple books instead of wasting money on content that he/she doesn’t need.

The “S” word - If you go to the MarkLogic website, you will not (at least at the time of this blog post) see Search as one of the categories under their solutions tab. This is really too bad as MarkLogic is an extremely powerful search engine. For instance, we were engaged by a large Association recently that was already using MarkLogic for publishing. This Association realized the power of MarkLogic’s search capabilities and asked us to develop a roadmap for replacing their Lucene/Solr search implementation with MarkLogic (I called it Lucene bypass surgery). They not only saw the value of using MarkLogic for search but how they could reduce costs from collapsing a layer of infrastructure, reducing support and training costs, and eliminating risk from an overly complex system. If MarkLogic ever takes on the other search vendors head to head – watch out Endeca, Autonomy, Fast, etc., etc.

Web Content Management (WCM) - WCM on MarkLogic is simply a natural extension of how to leverage your content and software investment. Avalon has been working with MarkLogic on developing a simple WCM interface to abstract all of the technical mumbo jumbo and put a straightforward WYSIWYG interface on managing a web site with content stored in MarkLogic. More info here: http://avalonconsult.com/solutions/tools/wcm_for_marklogic

If I told you I would have to kill you - Life would be much simpler for our security agencies if al-Qaeda and the Mexican drug cartels would establish data centers; we could just hack into them and know what they were up to. It seems the bad guys tend to shy away from structured data (can’t imagine why). Now I don’t claim to know the exact use case(s) of how the US “three letter agencies” use MarkLogic but it is obviously valuable to be able to manage and analyze a ton of unstructured “intel”.

This is just a small sample of the applications MarkLogic can power. Geospatial, mobile, and metadata applications are just a few others that deserve attention for a MarkLogic solution.

So, for all you business users out there: don’t stress when someone in your organization comes up to you and says “I have an idea for a product that will serve our (insert your unstructured content need here) and it might also work as a (insert your other unstructured content need here).” MarkLogic is the “New Shimmer” of technology… it simply works well for multiple applications.

Taxonomies and Content Management

Friday, June 3rd, 2011

My formal introduction to Vignette Content Management began with this statement:  “Vignette has three hierarchical organizations for content.”   Put another way, you need three taxonomies to make Vignette work its best.

So what does that mean?  One definition of taxonomy is: “A defined hierarchy of categories; a tree-like structure of terminology that defines how categories relate to one another. Taxonomy provides a conceptual framework for discussion, analysis, or information retrieval.”

Vignette uses taxonomic structures for organizing where content is stored and managed, how it is navigated on websites, and  how it is categorized or classified for a variety of uses.   Other content platforms leverage taxonomies.  The facets in a faceted search experience are another example of a taxonomy.

A taxonomy is a powerful model for organizing information.  In the vast sea of information found inside every enterprise, the correct use of taxonomies can make content findable and ultimately help you, your employees, partners, and customers become more productive.

Interested in learning more?  Avalon and our partner PPC are sponsoring a free webinar series that will help you Cultivate Content Management Success through Planned, Managed, and Implemented Taxonomies. For more information and to register, click here.