Sunday, April 29, 2007

Building & Implementing a Specialized CMS

I have been involved in building a content management system (CMS) for the telecom and web content industry. I admit, it hasn't been easy. The key to designing a CMS is designing the content metadata and taxonomy. The next most important thing is to design the system for optimal content reuse. And to do a good job of either, I quickly realized, one must first understand the industry the CMS is intended for!

Choosing a CMS:
An off-the-shelf CMS that exactly matches your needs? I don't think there is any such thing. Unless the requirements are straightforward, a CMS implementation will need some customization. If things still don't work out, build your own!

Here are my impressions of some of the CMS products I briefly checked out.

    CRX (by DAY)
    JSR 170 Level 2 enabled, which is probably why the UI seemed complex, with very few abstractions provided. To be of use in the field, it needs a UI built on top of it.

    Apache Jackrabbit
    The reference implementation of JSR 170. It is an underlying engine with a set of APIs; one needs to build a user interface over it to make it usable.

    Alfresco
    It appeared to be more of a document management system, oriented towards enterprise content management and collaboration. It is JSR 170 Level 2 certified and provides an online demo if you want to try it out.

    Magnolia
    Though Magnolia is a JSR 170 compliant CMS (based on Apache Jackrabbit), its interface is very basic and it is suited only to HTML-based web sites. It seems inadequate for any moderately complex CMS requirement.

    Vignette
    This seems to be a very comprehensive CMS package. We did contact them for more information and an evaluation. CMSWatch has published a review of it.

    Volantis
    The Volantis solution is tailored to the mobile industry, and it does have a good foothold there. Though it can store some content for small sites, it is NOT a CMS.

    OpenSesame
    OpenSesame is an RDF storage and retrieval engine written in Java. RDF has the potential to become the basis for next-generation content management applications.

    The good:
    1. Can maintain complex relationships
      1. schemas
      2. properties as triples
      3. anything can reference anything
    2. Flexible query to retrieve contents
      1. n levels of depth
      2. inferencing
    3. Standard way of representing content

    Difficulties:
    1. Few stable open source tools
    2. Complex query language
    3. A lot of rework
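To make "properties as triples", "anything can reference anything" and "n levels of depth" a little more concrete, here is a minimal in-memory sketch in plain Java. It does not use the OpenSesame API: the `TripleStore` class, its methods, and the `partOf` predicate are hypothetical, and a real RDF store would use full URIs and a query language rather than hand-written traversal.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory triple store: every fact is (subject, predicate, object),
// so any resource can point at any other resource.
class TripleStore {
    private final List<String[]> triples = new ArrayList<>();

    void add(String subject, String predicate, String object) {
        triples.add(new String[] {subject, predicate, object});
    }

    // Objects linked directly from `subject` via `predicate`.
    List<String> objects(String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (String[] t : triples) {
            if (t[0].equals(subject) && t[1].equals(predicate)) {
                result.add(t[2]);
            }
        }
        return result;
    }

    // Everything reachable from `start` by following `predicate` for at most
    // `depth` hops: the "n levels of depth" style of query.
    Set<String> reachable(String start, String predicate, int depth) {
        Set<String> found = new LinkedHashSet<>();
        Set<String> frontier = new LinkedHashSet<>();
        frontier.add(start);
        for (int level = 0; level < depth && !frontier.isEmpty(); level++) {
            Set<String> next = new LinkedHashSet<>();
            for (String node : frontier) {
                for (String obj : objects(node, predicate)) {
                    if (found.add(obj)) { // skip nodes already seen (handles cycles)
                        next.add(obj);
                    }
                }
            }
            frontier = next;
        }
        return found;
    }
}

class TripleDemo {
    public static void main(String[] args) {
        TripleStore store = new TripleStore();
        store.add("ringtone:42", "partOf", "bundle:summer");
        store.add("bundle:summer", "partOf", "catalog:music");
        store.add("catalog:music", "partOf", "catalog:root");
        store.add("ringtone:42", "title", "Summer Hit");
        System.out.println(store.reachable("ringtone:42", "partOf", 2)); // [bundle:summer, catalog:music]
    }
}
```

Following a transitive property like `partOf` to any depth is roughly the simplest kind of inferencing an RDF engine can do for you; the engine's value is doing this declaratively over much richer schemas.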

    OpenCms 6.2
    In our case, we decided to build our CMS on OpenCms. OpenCms had quite a few features already built in: authentication, access control, export and import of data, a templating engine, search through Lucene, and a useful user interface to wrap it all together. It gave us quite a head start, and since it is extremely flexible, we could change or add almost anything we wanted. It has an embedded workflow engine, but it was very basic, so we plugged OSWorkflow into it.
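At its core, a publishing workflow is a state machine over content items: each state allows only certain actions, and each action moves the item to a new state. The sketch below is plain Java and does not use the OpenCms or OSWorkflow APIs (OSWorkflow itself typically declares its steps and actions in an XML workflow descriptor). The states, actions and class names here are hypothetical examples.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical states and actions for a simple review-then-publish workflow.
enum WorkflowState { DRAFT, IN_REVIEW, APPROVED, PUBLISHED }

enum WorkflowAction { SUBMIT, REJECT, APPROVE, PUBLISH }

class ContentWorkflow {
    // Allowed transitions: (current state, action) -> next state.
    private static final Map<WorkflowState, Map<WorkflowAction, WorkflowState>> TRANSITIONS =
            new EnumMap<>(WorkflowState.class);

    static {
        allow(WorkflowState.DRAFT, WorkflowAction.SUBMIT, WorkflowState.IN_REVIEW);
        allow(WorkflowState.IN_REVIEW, WorkflowAction.REJECT, WorkflowState.DRAFT);
        allow(WorkflowState.IN_REVIEW, WorkflowAction.APPROVE, WorkflowState.APPROVED);
        allow(WorkflowState.APPROVED, WorkflowAction.PUBLISH, WorkflowState.PUBLISHED);
    }

    private static void allow(WorkflowState from, WorkflowAction action, WorkflowState to) {
        TRANSITIONS.computeIfAbsent(from, s -> new EnumMap<>(WorkflowAction.class)).put(action, to);
    }

    private WorkflowState state = WorkflowState.DRAFT; // every item starts as a draft

    WorkflowState state() { return state; }

    // Applies `action`, or throws if it is not allowed in the current state.
    WorkflowState perform(WorkflowAction action) {
        Map<WorkflowAction, WorkflowState> allowed = TRANSITIONS.get(state);
        WorkflowState next = (allowed == null) ? null : allowed.get(action);
        if (next == null) {
            throw new IllegalStateException(action + " not allowed in state " + state);
        }
        state = next;
        return state;
    }
}

class WorkflowDemo {
    public static void main(String[] args) {
        ContentWorkflow wf = new ContentWorkflow();
        wf.perform(WorkflowAction.SUBMIT);
        wf.perform(WorkflowAction.APPROVE);
        System.out.println(wf.perform(WorkflowAction.PUBLISH)); // PUBLISHED
    }
}
```

A dedicated engine earns its keep once you need what this sketch leaves out: per-step permissions, multiple reviewers, history, and changing the flow without recompiling.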


Now comes the hard part: designing the content types, and the toil of migrating content into them! Some lessons learned along the way:

Content Reuse:
Design for reuse. Reuse increases consistency, reduces maintenance cost, and lets you rapidly re-categorize content or change the taxonomy if required. If content needs translation, reuse also reduces translation effort. But overdoing reuse increases complexity and makes content difficult to manage. Remember, content may need different attributes in different contexts!
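One way to reuse content without copying it, while still allowing different attributes in different contexts, is to store each item once and reference it from every context, with optional per-context overrides. This is a hypothetical plain-Java sketch; `ContentItem`, `ContentRef` and the "title"/"price" attributes are illustrative and not part of OpenCms.

```java
import java.util.HashMap;
import java.util.Map;

// A piece of content stored exactly once; an edit here shows up in every context.
class ContentItem {
    final String id;
    final Map<String, String> attributes = new HashMap<>();

    ContentItem(String id) { this.id = id; }
}

// A reference to shared content from one context (site, channel, bundle, ...),
// with attribute overrides that apply only in that context.
class ContentRef {
    final ContentItem item;
    final Map<String, String> overrides = new HashMap<>();

    ContentRef(ContentItem item) { this.item = item; }

    // The context-specific value wins; otherwise fall back to the shared value.
    String get(String name) {
        String local = overrides.get(name);
        return local != null ? local : item.attributes.get(name);
    }
}

class ReuseDemo {
    public static void main(String[] args) {
        ContentItem game = new ContentItem("game-17");
        game.attributes.put("title", "Space Race");
        game.attributes.put("price", "4.99");

        ContentRef webStore = new ContentRef(game);
        ContentRef mobileDeck = new ContentRef(game);
        mobileDeck.overrides.put("price", "2.99"); // only the mobile context differs

        game.attributes.put("title", "Space Race II"); // one edit, visible everywhere
        System.out.println(webStore.get("title") + " / " + webStore.get("price"));     // Space Race II / 4.99
        System.out.println(mobileDeck.get("title") + " / " + mobileDeck.get("price")); // Space Race II / 2.99
    }
}
```

The overrides map is also where overdoing reuse becomes visible: if most attributes of a shared item end up overridden in most contexts, the item probably should not be shared at all.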

Taxonomy:
The taxonomy is the hierarchy used to organize content, and deciding on one is very difficult. I often felt the need to categorize content into multiple taxonomies, or to implement a search-based system instead. I'm sure that even once we settle on a taxonomy, we will need to revisit it and reclassify content once in a while, as new classification criteria crop up or the importance of existing criteria changes! The lesson learnt: don't try to get the perfect taxonomy, but prepare your applications to absorb taxonomy changes easily.
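One way to prepare for taxonomy changes is to keep classification in an index separate from where content is stored, so an item can sit in several taxonomies at once and a whole branch can be renamed or moved without touching the content. This is a hypothetical plain-Java sketch; the `Classifier` class, the "/"-separated category paths and the sample taxonomies are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical classification index kept apart from content storage.
class Classifier {
    // taxonomy name -> category path (e.g. "music/rock") -> content ids
    private final Map<String, Map<String, TreeSet<String>>> index = new HashMap<>();

    void classify(String taxonomy, String category, String contentId) {
        index.computeIfAbsent(taxonomy, t -> new HashMap<>())
             .computeIfAbsent(category, c -> new TreeSet<>())
             .add(contentId);
    }

    // Content in `category` or any of its subcategories.
    TreeSet<String> find(String taxonomy, String category) {
        TreeSet<String> result = new TreeSet<>();
        Map<String, TreeSet<String>> tree = index.get(taxonomy);
        if (tree == null) {
            return result;
        }
        for (Map.Entry<String, TreeSet<String>> e : tree.entrySet()) {
            String path = e.getKey();
            if (path.equals(category) || path.startsWith(category + "/")) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }

    // Moves a branch to a new path; only the index changes, never the content.
    void move(String taxonomy, String oldPath, String newPath) {
        if (newPath.startsWith(oldPath + "/")) {
            throw new IllegalArgumentException("cannot move a branch into itself");
        }
        Map<String, TreeSet<String>> tree = index.get(taxonomy);
        if (tree == null) {
            return;
        }
        for (String path : new ArrayList<>(tree.keySet())) { // iterate over a snapshot
            if (path.equals(oldPath) || path.startsWith(oldPath + "/")) {
                TreeSet<String> ids = tree.remove(path);
                String moved = newPath + path.substring(oldPath.length());
                tree.computeIfAbsent(moved, p -> new TreeSet<>()).addAll(ids);
            }
        }
    }
}

class TaxonomyDemo {
    public static void main(String[] args) {
        Classifier c = new Classifier();
        c.classify("genre", "music/rock", "ringtone-1");
        c.classify("genre", "music/pop", "ringtone-2");
        c.classify("device", "phones/color-screen", "ringtone-1"); // a second taxonomy
        c.move("genre", "music", "audio");
        System.out.println(c.find("genre", "audio"));   // [ringtone-1, ringtone-2]
        System.out.println(c.find("device", "phones")); // [ringtone-1]
    }
}
```

Because applications ask the index for "everything under audio" rather than walking a fixed folder tree, a reclassification becomes a data change instead of a content migration.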

Metadata:
Metadata describes content. Build in a set of predefined metadata names instead of leaving it loose and letting content authors define their own metadata vocabulary. Otherwise you'll end up with too many terms that don't fit all content, because different authors will very likely come up with different terms for similar attributes at different times.
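A predefined metadata vocabulary can be enforced with nothing more than a closed set of names that every write is checked against. This is a hypothetical plain-Java sketch; the `MetaKey` names and the `Metadata` class are illustrative, not a real CMS API.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical fixed vocabulary: authors choose from these names instead of
// inventing their own ("Artist", "singer", "performer", ...).
enum MetaKey { TITLE, ARTIST, LANGUAGE, CONTENT_TYPE, RELEASE_DATE }

class Metadata {
    private final Map<MetaKey, String> values = new EnumMap<>(MetaKey.class);

    // Accepts author-typed names but maps them onto the fixed vocabulary;
    // unknown names are rejected instead of silently creating a new term.
    void set(String name, String value) {
        MetaKey key;
        try {
            key = MetaKey.valueOf(name.trim().toUpperCase(Locale.ROOT).replace(' ', '_'));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unknown metadata name: " + name, e);
        }
        values.put(key, value);
    }

    String get(MetaKey key) { return values.get(key); }

    Map<MetaKey, String> asMap() { return Collections.unmodifiableMap(values); }
}

class MetadataDemo {
    public static void main(String[] args) {
        Metadata m = new Metadata();
        m.set("title", "Summer Hit");
        m.set("Release Date", "2007-04-29");
        System.out.println(m.asMap()); // {TITLE=Summer Hit, RELEASE_DATE=2007-04-29}
        try {
            m.set("singer", "Somebody");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unknown metadata name: singer
        }
    }
}
```

In a real system the vocabulary would more likely live in configuration than in an enum, so it can grow without a code change, but the principle is the same: new terms are added deliberately, not by whoever happens to be typing.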


Of course, after that comes building the rest of the pieces: content delivery, tracking, billing and reporting.

2 comments:

Pankaj said...

hey gr8 post..although you could've been a bit more explanatory..for eg what is the JSR 170 level?

Tanmay said...

@Pankaj Thanks for your comments.

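JSR 170 (the Content Repository for Java Technology API) is a specification for content repository APIs in Java, developed through the regular Java Community Process (JCP). Level 1 of the specification covers read-only access to a repository, and Level 2 adds the ability to write content, so a "Level 2" product supports both. A Google search turns up plenty of links about the JCP process, the various JSR specifications, and the API specification document for JSR 170.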

I probably should have mentioned this earlier in the post, but here it is anyway!