CMS architecture: Part 1 – Still Breathing

I’ve been doing a lot of thinking about the architecture of content management systems (CMSs) recently. Little wonder: that’s my full-time focus at the moment. By “architecture” I mean pretty much everything to do with the planning and development of a CMS. This blog post is the first in a series that explores some of the elements to think about if you’re going to create a CMS from scratch.

This is unashamedly going to be at an advanced level – I’m not talking about a simple system just to keep a few pages updated. I’ll try to stay as technology-agnostic as possible, but I will be coding at least part of this system to make sure that what I say is technically feasible.

The areas I’m going to tackle, in no particular order, and almost certainly incomplete, are:

Data storage

Any serious CMS needs a database, but is a relational database (MySQL, PostgreSQL, SQL Server) a better choice than a NoSQL database? What about extensibility, making complex queries possible for reporting purposes, performance, versioning? How about scalability and data security?

System security

Unless you want everyone to be able to do everything, you need to be able to secure aspects of the system. So you need user accounts with authentication mechanisms. Securing individual parts of the system (particular modules, or specific related data) needs to be possible, and what about SSL? There’s also the question of authenticating third-party systems, for example users of APIs.

Extensibility

WordPress, which I love, has a fantastic API that enables developers to write plugins for almost every conceivable use. Plugins are cool, and the hooks and filters that power them are a must. But what about cutting a little deeper than that and allowing entire subsystems and modules to be swapped out? What about an API?
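
To make the hooks-and-filters idea concrete, here is a minimal sketch in Python. The HookRegistry class is hypothetical; the method names add_filter and apply_filters borrow the names of WordPress’s functions, but this is not WordPress’s implementation, just one way the idea could look. Plugins register callbacks against a named hook, and the core passes a value through them in priority order.

```python
from collections import defaultdict
from typing import Any, Callable


class HookRegistry:
    """Hypothetical filter registry; an illustration, not WordPress's code."""

    def __init__(self) -> None:
        # Maps a hook name to a list of (priority, callback) pairs.
        self._filters: dict[str, list[tuple[int, Callable[[Any], Any]]]] = defaultdict(list)

    def add_filter(self, name: str, callback: Callable[[Any], Any], priority: int = 10) -> None:
        """Register a callback; lower priority numbers run first."""
        self._filters[name].append((priority, callback))
        # list.sort is stable, so equal priorities keep registration order.
        self._filters[name].sort(key=lambda entry: entry[0])

    def apply_filters(self, name: str, value: Any) -> Any:
        """Pass value through every callback registered for this hook."""
        for _, callback in self._filters.get(name, []):
            value = callback(value)
        return value


# A "plugin" changes the title without touching core code.
hooks = HookRegistry()
hooks.add_filter("the_title", str.title, priority=20)
hooks.add_filter("the_title", str.strip)  # default priority 10, runs first
print(hooks.apply_filters("the_title", "  still breathing "))  # Still Breathing
```

Swapping out whole subsystems is a bigger design problem than this, but the same principle applies: the core asks a registry for behaviour rather than hard-coding it.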

Output

Obviously a CMS will have some form of HTML output. But how do you architect the system to hit the sweet spot between allowing front-end developers a large degree of control over the HTML and the system producing what it needs to run? How about themes and templates? Repurposing content is going to become increasingly important, so how do you handle microformats and data schemas? What about alternative outputs: PDF, XML, JSON, etc.? Then there’s the tiny matter of internationalisation.
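
One way to think about alternative outputs is to keep the content itself format-neutral and hang renderers off it. The sketch below is an assumption about how that might look, not a finished design: the Article class, the RENDERERS table and the render function are all made-up names for illustration.

```python
import json
from dataclasses import asdict, dataclass
from html import escape


@dataclass
class Article:
    """Hypothetical format-neutral content record."""
    title: str
    body: str
    language: str = "en"


def render_html(article: Article) -> str:
    # Escape every field so stored content cannot inject markup.
    return (
        f'<article lang="{escape(article.language)}">'
        f"<h1>{escape(article.title)}</h1>"
        f"<p>{escape(article.body)}</p>"
        "</article>"
    )


def render_json(article: Article) -> str:
    return json.dumps(asdict(article), ensure_ascii=False)


# New formats (XML, PDF, ...) would be added here without touching Article.
RENDERERS = {"html": render_html, "json": render_json}


def render(article: Article, fmt: str) -> str:
    """Look up the renderer for fmt and apply it to the article."""
    try:
        renderer = RENDERERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported output format: {fmt}") from None
    return renderer(article)
```

Calling render(article, "json") and render(article, "html") on the same Article gives two views of one piece of content, which is the repurposing idea in miniature. A real system would hand the HTML side to a template layer so front-end developers control the markup.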

Assets

Assets are a big part of any CMS. Storing files securely is just one aspect of this, but how do you handle versioning and repurposing of assets (PDFs also available in Word and ODF, for example)? And with images getting more complex thanks to high-DPI displays, how do you handle resizing imagery?
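
As a small illustration of the high-DPI question, here is a sketch of working out which widths to generate for an image. It assumes a layout width in CSS pixels and a set of pixel densities to support; the function names and the "name-width.ext" file naming scheme are made up for the example. The resulting string uses the width descriptors of the HTML srcset attribute.

```python
def variant_widths(css_width: int, original_width: int, densities=(1, 2, 3)) -> list[int]:
    """Return the pixel widths to generate, never larger than the original."""
    widths: list[int] = []
    for density in densities:
        width = min(round(css_width * density), original_width)  # never upscale
        if width not in widths:  # skip duplicates once the original is the cap
            widths.append(width)
    return widths


def build_srcset(base: str, ext: str, widths: list[int]) -> str:
    """Build a srcset value, assuming files are named like photo-800.jpg."""
    return ", ".join(f"{base}-{width}.{ext} {width}w" for width in widths)


widths = variant_widths(css_width=400, original_width=1000)
print(widths)                               # [400, 800, 1000]
print(build_srcset("photo", "jpg", widths))
```

The actual resizing, and deciding when to regenerate variants after a new version of the asset is uploaded, is where the harder questions start.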

Performance and scalability

Caching is key, but what do you do when you grow from 1 server to 10, to 100, to 1000?

I don’t pretend to have all the answers to this stuff; it’s just an area that interests me and that I want to explore. If I end up with an experimental CMS at the end of this that handles a few of these thorny issues, then I’ll be a slightly better developer than I am now. And even if I don’t, I’ll still have done some serious thinking about these issues.
