SDL Tridion and JCR: A marriage made on Java

I have been quiet on my blog for some time, which I never like because I often feel the urge to type and to rant. However, I have been channeling my mental efforts into looking at the combination of SDL Tridion R5.3 and a JCR repository.

The Content Repository API for Java (JSR-170) is a Java API specification that allows for uniform access to content repositories. Content Management Systems incorporate JCRs to store their content and metadata and a number of vendors are using a JCR as part of their offering. So, I have decided to look at how SDL Tridion could connect to such a repository.

Looking at the architecture for the most likely place to integrate a JCR, I settled on the Content Delivery side of SDL Tridion R5 (R5.3 SP1 to be exact). R5 uses a distributed content delivery model, where the CMS and websites are typically on separate systems, and this is where customers usually integrate third-party products into the Content Delivery environment. Typically these integrations include things like search, but they rarely change how content is stored. R5 has two basic options for storing content, a database or a file system, and these cover pretty much any requirement you could have. The Broker is the layer which abstracts this from the presentation layer (typically a website). In addition to presenting content to a web page, the Broker is also responsible for storing the content in the Content Data Store, which is normally a database or file system.

Stripped to the bare facts, the Broker is a Java API and as such we can modify it for our own ends. Typically you would extend the functionality, but you can also replace the existing functionality with your own. In this case I intend to replace the storage mechanism to use a JCR instead of the standard file system or database. To make a complete change I would have to create 10-15 extensions to cover all the functionality of Content Delivery. I will not do that here; I will simply create one extension.

So why would you want to do this? There are three basic use cases for this:

You have chosen a JCR as a content repository for your website, maybe inherited from another CMS
You are publishing content to a single JCR from multiple CMS systems
You are an SDL Tridion Consultant with too much private time on your hands

To implement my new storage classes for the Broker I am going to publish blog posts from my R5 environment to a JCR. The aim is to prove this in as simple a way as possible. I am not an expert Java programmer, nor am I an expert in the JCR API or the JCR implementation which I will use, and as such I will trample in the face of best practices and laugh in the direction of standards. But seriously, the perfect implementation is not the point; the idea is to make it work so that someone else can take the experience forward to something better.

The Content Management part

In R5 all content is defined by a Schema. A blog post schema would normally include content fields like title, body, date and time, and then also metadata fields such as author, trackbacks, categories, keywords etc. My schema will include just two content fields, Title and Body. From this I want to publish an XML Dynamic Component Template.

So my schema looks as follows:

So I have a simple schema and I can now create some components against this schema. I created four components that I will test with and, just to be neat and tidy, I put them in their own folder:

Any time you include content Components on some sort of presentation (static or dynamic page) you will need to combine the Component with a Component Template (CT) to create what is known as a Component Presentation (CP). The CP is a combination of the template and the content and more than one CT can be used to create different CPs from the same content. A typical usage would be on a news site where the component, our news item, would be represented as both a front page teaser and the full article. Both the front page teaser and the main article could have a different CT but the same Component.

My CT will just output some XML:

<?xml version="1.0"?>
<blogpost>
<title>First Post!</title>
<body>Welcome to my blog, I hope I can write lots of interesting things</body>
</blogpost>

And my CT code will look like:

<?xml version="1.0"?>
<blogpost>
     <title>[% WriteOut Component.Fields(1).value(1) %]</title>
     <body>[% WriteOut Component.Fields(2).value(1) %]</body>
</blogpost>

In this code I have paid no attention to how you would really work with XML so if I were to put, for example, un-encoded HTML characters in my content (e.g. &) then the whole thing will go belly up, so normally a template would be a little more robustly coded.
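
To illustrate the kind of encoding that is missing, here is a minimal sketch of escaping field values before they are written into XML. It is plain Java rather than template code, and the class and method names (XmlText, escape) are hypothetical; they are not part of any SDL Tridion API.

```java
// Illustrative helper (hypothetical names, not part of any SDL Tridion API):
// escapes the characters that would otherwise break XML text content.
final class XmlText {
    private XmlText() {}

    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

With something like this in place, a title such as "Fish & Chips" would be written as "Fish &amp; Chips" and the resulting document stays well-formed.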

So now I can publish XML to my website as a Dynamic Component Presentation (DCP) and it is currently residing on the file system (the default option). The resulting XML DCP can now be used on my website.

Where to put the content?

I already indicated that I am going to publish the content to a JCR. The challenge and interest here was that I know nothing about how to use a JCR. Knowing that Day has their freely available JCR implementation, CRX, I downloaded it with the documentation. The documentation is pretty good and some time with a Diet Coke and a print out of the setup and developers guides meant I already had a good idea of what to do.

Installation was really easy and within a few minutes (plus some time to get a license key) it was up and running. With CRX you get management tools that enable you to browse and administer the repository, so I set about defining the Node Types that I need to define my content. An explanation of how you define Node Types and their properties is best left to the documentation; suffice to say that I will need a BlogPost Node Type to be able to store any content in the JCR.

Now, if I were to implement full storage for content from R5 then maybe I would need to model the Broker storage model in the JCR. Looking at this, this would be possible. The fully dynamic Broker is a relational database (file system is not as flexible as a database) and as such I can imagine that this would be possible. For the time being though, we are going to store some content in the repository which means I will create a Node Type that is similar to my Schema. In there I will place my content.

For organization purposes I created a BlogPosts Node in the JCR and it is within that Node that I will save my content. My Node Type, called BlogPost, is defined with the fields:

Title
Body
TCMURI
XML

Once I have done that I can then create multiple BlogPost nodes in my node BlogPosts.

Making them work together

For this I am going to make a custom storage binding. Effectively I am going to rewrite how _all_ XML DCPs are stored. Once content is published, the Deployer will deploy the content and in the process request the Broker to store the content. My class will be the one called to store the content.
To do this, my class will implement XMLComponentPresentationHome and as such will implement the following methods:

Create
Remove
Update
GetComponentPresentation

Hopefully these are all clear as to what they do, however, it should be noted that GetComponentPresentation is needed even when you do not plan to get DCPs from the JCR via the Broker – i.e. you are retrieving them via another method. The method is needed to decide whether or not you need to create a new Node or update an already published one. Without implementing this method fully you will always create a new component.

Storing my DCP

Opening a session to the JCR is the first thing I need to do. This is done on the first call to the class and is performed over RMI. RMI was the most logical choice for me and it did not take much to get CRX to work with RMI. However, you can choose from JNDI, HTTP or WebDAV rather than RMI if you wish.

To do this we create a repository object:

// Only load RMI classes from the local classpath, never from a remote codebase
System.setProperty("java.rmi.server.useCodebaseOnly", "true");
// url is the RMI address of the CRX repository
ClientRepositoryFactory factory = new ClientRepositoryFactory();
Repository repository = factory.getRepository(url);

// Log in to the crx.default workspace (demo credentials)
SimpleCredentials creds = new SimpleCredentials("admin", "admin".toCharArray());
session = repository.login(creds, "crx.default");

Get the blogPosts node where we will store our data:

root = session.getRootNode();
blogPosts = root.getNode("BlogPosts");

From there I can store my content. I will need to load my content as a DOM, get the values and create a new node from those values.

OK, get the XML content and extract the values:

content = cp.getContent();
Document doc = stringToDom(content);
doc.getDocumentElement().normalize();
nodeList = doc.getElementsByTagName("title");
title = nodeList.item(0).getTextContent();
nodeList = doc.getElementsByTagName("body");
body = nodeList.item(0).getTextContent();
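
The stringToDom helper used above is not shown in this post. As a rough sketch of what such a helper might look like using only the JDK's own XML parser, consider the following. The class name DomHelper and the firstText convenience method are my own assumptions, and the real helper in the downloadable sources may well differ.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch only: DomHelper and firstText are hypothetical names; the post does
// not show how stringToDom is really implemented.
final class DomHelper {
    private DomHelper() {}

    // Parses an XML string into a DOM Document using the JDK's default parser.
    static Document stringToDom(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Content is not well-formed XML", e);
        }
    }

    // Returns the text of the first element with the given tag name,
    // or null when the document has no such element.
    static String firstText(Document doc, String tagName) {
        NodeList list = doc.getElementsByTagName(tagName);
        return list.getLength() == 0 ? null : list.item(0).getTextContent();
    }
}
```

Checking the length of the node list before calling item(0) avoids a NullPointerException when a DCP arrives without the expected element.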

Then add the new node:

javax.jcr.Node blogpost = blogPosts.addNode(title, "BlogPost");
blogpost.setProperty("title", title);
blogpost.setProperty("body", body);
blogpost.setProperty("tcmuri", "tcm:" + publicationId + "-" + componentId);
blogpost.setProperty("xml", content);
session.save();

Now you may notice I am storing two other values with this content: “tcmuri” and a field called “xml”. The TCM URI is the unique URI of every item in SDL Tridion. I will need this later to work out whether or not a specific piece of content already exists in the JCR. I chose to name the node after the title rather than the TCM URI to keep the JCR repository human friendly. The XML field is a copy of the complete XML content, which I have stored in this example so that my GetComponentPresentation method can return it without having to rebuild an XML string from the separate values, and so that I can later transform it with XSLT on my website. I therefore have both options when using the data again.

Now that content goes into my JCR, I need to see if I can update or remove it. The GetComponentPresentation method will need to be implemented. To get a DCP I will need to find it in the JCR and return a CP object.

OK, so open a workspace and a query manager and then use an XPath query to find my item based upon the TCM URI that I want to get:

Workspace workspace = session.getWorkspace();
QueryManager qm = workspace.getQueryManager();
Query query = qm.createQuery("//BlogPosts/*[@tcmuri='tcm:" + publicationId + "-" + componentId + "']", Query.XPATH);
QueryResult queryResult = query.execute();

Then, select the first node (I can only ever have one published) and return a CP object:

NodeIterator nodes = queryResult.getNodes();
// nextNode() throws NoSuchElementException when the query matched nothing
Node blogpost = nodes.nextNode();
// Read the stored copy of the complete XML as a string
Property content = blogpost.getProperty("xml");
return new XMLComponentPresentationImpl(publicationId, componentId, componentTemplateId, content.getString());

Now it can be decided whether or not my DCP already exists in the JCR and if it does exist I update it in more or less the same way as I did when I created it, only I start with an existing node.
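
The create-or-update decision can be sketched without a repository at all. The example below uses a plain Map keyed by TCM URI as a stand-in for the JCR, purely so that it runs with no dependencies. All names are hypothetical, and treating null as "not published yet" is my assumption; the post does not say how the Broker expects a missing DCP to be reported.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the JCR: a Map keyed by TCM URI. Hypothetical names throughout;
// this only illustrates the lookup-then-create-or-update flow.
final class UpsertSketch {
    private final Map<String, String> store = new HashMap<>();

    // Same "tcm:<publication>-<component>" form used for the tcmuri property.
    static String tcmUri(int publicationId, int componentId) {
        return "tcm:" + publicationId + "-" + componentId;
    }

    // Plays the role of GetComponentPresentation.
    // Assumption: null means the DCP has not been published yet.
    String find(int publicationId, int componentId) {
        return store.get(tcmUri(publicationId, componentId));
    }

    // Returns "created" or "updated" so the caller can see which path was taken.
    String publish(int publicationId, int componentId, String xml) {
        boolean exists = find(publicationId, componentId) != null;
        store.put(tcmUri(publicationId, componentId), xml);
        return exists ? "updated" : "created";
    }

    // Returns true when something was actually removed.
    boolean unpublish(int publicationId, int componentId) {
        return store.remove(tcmUri(publicationId, componentId)) != null;
    }
}
```

Publishing the same component twice takes the "created" path first and the "updated" path second, which is the behaviour that depends on the lookup being implemented properly.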

To remove, I once again retrieve the existing node and then remove it:

QueryResult queryResult = findJcrComponent(publicationId, componentId);
NodeIterator nodes = queryResult.getNodes();
Node blogpost = nodes.nextNode();
blogpost.remove();
// Persist the removal; until save() is called the change exists only in this session
session.save();

I can now publish, re-publish and un-publish my XML content to and from the JCR. It would only be one small step further to use the content on my website.

Conclusion

It is clear I have implemented a far from perfect integration; I just built something to see what would happen. It was never really going to be a problem to integrate SDL Tridion and a JCR; it was just something that has never been done. Both have the correct integration points to make any integration possible so it is certainly possible to do pretty much anything you can imagine.

Whilst doing this it became clear that under certain circumstances you should either extend or override the functionality of the Broker. These cases would be general to any extension or deployment of SDL Tridion’s Broker storage functionality and not just to the JCR.

Case 1: Publishing all XML DCPs to the JCR

This means overriding the existing Broker functionality to move every XML DCP to the JCR. The biggest issue I have here is one I did not solve: different types of XML DCP have different XML structures, so if you would like to use the values from the XML (rather than storing the entire XML) you will need to work out how to scale the parsing so that new XML structures can be stored in the JCR. Of course, storing the entire XML string means that you do not need to do anything special, but you may also want to consider storing any associated metadata too.
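
One possible way to make the parsing scale, offered as a sketch rather than something I built, is to stop looking for specific tag names and instead turn every direct child element of the root into a name/value pair. The class and method names below are hypothetical, and the approach assumes flat XML like my blog post example; nested structures would simply be flattened to their text.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: structure-agnostic extraction of top-level fields.
final class GenericFields {
    private GenericFields() {}

    // Maps each direct child element of the root to name -> text, in document
    // order. A repeated element name overwrites the earlier value.
    static Map<String, String> extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes and comments between elements
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(child.getNodeName(), child.getTextContent());
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Content is not well-formed XML", e);
        }
    }
}
```

Each entry could then be written with setProperty(name, value) on the new node, provided the Node Type in the JCR allows properties that were not defined up front.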

Case 2: Publishing some XML DCPs to the JCR and others in the standard Broker storage

Here you would extend the storage rather than replace. Catch those you want to place into the JCR and then allow the parent classes to continue processing all the others as normal. Again you might need to make the processing of the XML scalable but there is a reduced need to do so. Typically, if you are publishing only some DCPs to the JCR then you know which DCPs and therefore it can be a little more hardcoded.
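
The routing idea can be sketched as follows. To keep the example free of Tridion classes I use delegation to a second store instead of a call to a parent class, and I pick the component template id as the routing key; both are my assumptions, and DcpStore, RecordingStore and RoutingStore are hypothetical names rather than Broker types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal storage contract; the real Broker interface has more methods.
interface DcpStore {
    void create(int publicationId, int componentId, int templateId, String xml);
}

// Records what it was asked to store, standing in for a real back end.
final class RecordingStore implements DcpStore {
    final List<String> stored = new ArrayList<>();

    public void create(int publicationId, int componentId, int templateId, String xml) {
        stored.add("tcm:" + publicationId + "-" + componentId);
    }
}

// Sends DCPs rendered by the listed templates to the JCR store and
// everything else to the standard store.
final class RoutingStore implements DcpStore {
    private final DcpStore jcr;
    private final DcpStore standard;
    private final Set<Integer> jcrTemplateIds;

    RoutingStore(DcpStore jcr, DcpStore standard, Integer... jcrTemplateIds) {
        this.jcr = jcr;
        this.standard = standard;
        this.jcrTemplateIds = new HashSet<>(Arrays.asList(jcrTemplateIds));
    }

    public void create(int publicationId, int componentId, int templateId, String xml) {
        DcpStore target = jcrTemplateIds.contains(templateId) ? jcr : standard;
        target.create(publicationId, componentId, templateId, xml);
    }
}
```

In a real extension the "standard" branch would be the call through to the parent class, and the set of template ids is the part that can stay a little more hardcoded.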

Case 3: Storing the XML DCPs in both the standard Broker storage and in the JCR

If you want to store in two places, then extend the Content Deployer. It is one stage before the Broker and allows you to deploy the DCPs into an additional location. The code would be the same, just in a different class. Processing the XML also has the same challenges as before.

Case 4: Replacing the Content Data Store with the JCR

I would like to see this, even if just for fun. However, the task is not necessarily so easy. The Broker data model is not massive, but there are a large number of classes to override and develop, so it would take a reasonable amount of time to deliver, and I am not sure of the overall benefit of fitting an existing proprietary data model into a JCR.

In addition to the above I learnt that I can still program Java. It was a bit of a shock that it all went so easily, but I was helped greatly by the good documentation from both SDL Tridion and Day. I am sure there are better ways to do this, but this one worked for this experiment.

I think publishing content to a JCR makes a lot of sense. Whether or not it makes real sense to move the entire Broker data model over to a JCR is up for debate. Why use a JCR when a database works just as well? Reality says you will probably only publish to a JCR when you are sharing the JCR with another application, in which case the Broker data model is not applicable because you will need to fit into an existing or common structure. I would like to see a project to create such an implementation for the community; maybe with this post I can encourage others to help. If you want to, then contact me via my contact page.

On a final thought, I would like to hear your opinion on the merits of this experiment and maybe other use cases that I have not thought of. Feel free to post comments in the provided space below.

Download the sources

SDL Tridion to JCR example

6 comments / Add your comment below

Adriaan Bloem says:

2 May 2009 at 12:26 pm

That’s so… pointless (I think your use case #3 is probably the most likely). But cool 🙂

Gertjan Assies says:

2 May 2009 at 3:14 pm

Nice Article Jules, JCR has been on my list for a while now to see what we can do there, thanks for this

Frank van Puffelen says:

3 May 2009 at 4:42 pm

Great experiment Jules. I’d love to hear from the first person that actually does something like this because they need it, not because it can be done.

It sounds to me like mapping Tridion schema’s to CRX node types is the biggest challenge (after making the initial connection that is). How would you go about keeping them in sync?

Jules says:

4 May 2009 at 9:16 am

@Frank, looking at the APIs briefly, this would depend on the version of the API that is used. JSR-283 seems to have the appropriate methods to manage them programmatically should you need to: http://jackrabbit.apache.org/api/1.5/org/apache/jackrabbit/api/jsr283/nodetype/package-summary.html.

But of course you would only need to do this if Tridion was the Master of the schema, if the JCR was shared, you would have to manage schema changes in a more controlled way.

Michael Marth says:

4 May 2009 at 11:21 am

Jules,

very cool experiment. I can think of some use cases for this, e.g. to be able to run Apache Sling applications on top of Tridion-managed content or to be able to use Jackrabbit’s Webdav and upcoming CMIS capabilities.

Re the issue of schema mapping: you are right that jsr-283 will make it possible to change node types programmatically. In the meantime one could also consider:

– if your content is XML you could store it in the JCR as node type XML directly (try uploading an XML-document into CRX via Webdav, it will get exploded automatically)

– it is surprisingly often sufficient to *not* use node types, but just store all content in nt:unstructured nodes. The structure is just defined by the node hierarchy (and possibly some properties that act as “identifiers”). This approach looks a bit alien to relational-model people (where one cannot do anything before defining tables), but works quite well in my experience. For example the discussion forums on dev.day.com are structured solely through hierarchy.

Cheers
Michael

Jules says:

4 May 2009 at 3:47 pm

@ Michael, thanks for the comment 🙂

Certainly the automatic exploding of XML means that the schema/structure changes would be avoided as you do not need to parse any XML when deploying. I would like to know more about that so I guess I will open the docs again 🙂

Unstructured storage seems very flexible and can mimic taxonomy based structures that we commonly come across.
