In the next of my video series I am going to cover the basic publishing flow from Content Management to Content Delivery. Last time we looked at connecting the server types and this time we are diving into what flows within.
Recently, Chris Morgan of Building Blocks (a UK SDL partner), penned an article about scaling out the deployer. It is a good article and it is good to see more scaled publishing scenarios being implemented with customers.
In SDL Tridion 2009 in particular, and 2011 less so, the deployer can be a bottleneck when you are trying to publish large volumes of content to your websites. Since the outset of SDL Tridion, customers have been growing in volume demand for updates to their websites. Moreover, the timeliness of those updates is becoming more and more important. The major difference between 2009 and 2011 is that 2011 is multi-threaded, meaning that a single deployer can deploy more than one item at a time. That does not sound like much, but the multi-threaded change allows an ever larger amount of content to be published. However, it does need to be set up well and, moreover, tested. If you want the best out of your publishing you must test the setup thoroughly.
Typically, when you ran up against the deployer bottleneck in 2009 you implemented multiple deployers. You can still do this in 2011 to get yet more throughput (because you combine multi-threading with multiple deployers). However, there are various things you need to take into account to ensure that you don’t run into problems.
If you read Chris’ article you get the idea of how to do this. However, I must make some corrections. The overall setup is fine but, as Chris notes, you can run into race conditions with multiple deployers trying to update the same content at the same time. For this reason, the setup described is not supported; you will get failures. However, there is hope! To avoid the race conditions you need to modify the setup as described below.
What is described in Chris’ post is that you have multiple deployers that all deploy content from all publications. Instead you must have multiple deployers, each dedicated to deploying the content of a given set of publications. For instance, if you have 10 deployers you must split the publications you are publishing over each of the ten with no duplication. For example:
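A hypothetical split over three deployers (the publication names are illustrative, not from any real setup) might look like:

```
Deployer 1: Website UK, Website US
Deployer 2: Website Germany, Website France
Deployer 3: Website Japan, all remaining publications
```

The key point is that every publication is handled by exactly one deployer, so no two deployers ever write the same item at the same time.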
You do this configuration in your publishing targets on the Content Manager and, for simplicity’s sake, each of your deployers can be configured 100% the same.
The general view of publishing that most of us see is the publishing queue, a long list of jobs that get processed and change between such statuses as “Waiting for Publish”, “In Progress” and, if you are unlucky, “Failed”. However, there is a lot of additional information lurking under the hood that could be considered pretty useful.
There are a lot of use cases that might be applicable to you for the information stored with all publishing jobs, so it might be worthwhile picking up the documentation and taking a look. Between the 2009 and 2011 .NET APIs there have been a lot of improvements, but the main one is that in 2011 you can access all data through API calls rather than loading up the publish transaction XML and surgically picking out what you want to know.
One use case that I know the best here is measuring the performance of the content being rendered. For a customer, we wanted to know how quickly everything was being rendered on a per-item basis. We also wanted to gather data for longer-term analysis (e.g. are we improving the overall performance on a day-to-day basis). To do this, we extract the data from the queue for a given day in a pipe-separated format for import into Excel. Over time we have built a very complete picture of the growth and performance of publishing.
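To give an idea of the shape of that export, a hypothetical pipe-separated line (the columns are illustrative, picked from the transaction fields discussed below; the IDs and values are made up) could look like:

```
TransactionId|ItemId|ItemType|Title|Priority|Publisher|StartTime|Duration
tcm:0-3456-66560|tcm:5-123-64|Page|Product Overview|Normal|jbloggs|2012-05-14 09:15:02|00:00:12
```

Excel splits this cleanly on the pipe character, which makes day-to-day comparisons trivial.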
Now, before I dive into code I have to declare that I am not a programmer; I am a Technical Account Manager, which was likened last week to being retired from being a consultant. So my skills are not as good as some people I could mention (in fact, those same people who implied that I was retired from useful things). However, it works! And for me, that is the most important thing.
So, to look into the queue items we need to start a session, get the list of publish transactions, get the XML document behind that list and then select the transaction nodes. This is the only XML-related work you have to do, which is the plus over the 2009 API. In detail this looks something like…
Session TridionSession = new Session(RemoteUser);
XmlElement QueueTransactionElement = TridionSession.GetList(typeof(PublishTransaction), QueueFilter);
XmlDocument QueueItems = QueueTransactionElement.OwnerDocument;
XmlNodeList QueueNodes = QueueItems.SelectNodes("tcm:ListPublishTransactions/tcm:Item", GetNamespace(QueueItems.NameTable));
Lastly, we loop over the QueueNodes in a nice for loop.
To get the transactions we need the TCM URI of the publish transaction (job) and then get the transaction object:
string transactionId = QueueNodes.Item(i).Attributes["ID"].Value;
PublishTransaction publishTransaction = (PublishTransaction)TridionSession.GetObject(transactionId);
So now we can see what we can get from a transaction. Let’s start with the basics and let’s assume we are just looking at successful publishing jobs.
So starting with some general details:
The Transaction ID: publishTransaction.Id
The ID of the item being published: publishTransaction.Items.First().Id – This reveals the development path of the API: this only ever contains one item, but there is still a collection of items.
The Item Type being published: publishTransaction.Items.First().Id.ItemType
The title of the item being published: publishTransaction.Items.First().Title.ToString()
The priority: publishTransaction.Priority.ToString()
The purpose: publishTransaction.Instruction.ResolveInstruction.Purpose.ToString() – This is either publish or unpublish. According to the documentation, it should also have a “re-publish” state but I can’t seem to get this to work
Who published: publishTransaction.Creator.Title.ToString()
DateTime dateTransactionStart = publishTransaction.Instruction.StartAt;
DateTime dateLastStatusChange = publishTransaction.StateChangeDateTime;
TimeSpan tsDuration = dateLastStatusChange - dateTransactionStart;
The tsDuration is now how long our job took to complete from start (the time it went into the queue) to the end (the time its status was changed to “Success”). If you submitted a lot of jobs at once, then for some of them this time will be long because it includes queuing time.
The job itself
Within the transaction is the context. The context holds the actual job itself; for instance, it contains details of the items the job resolved to.
The contexts are available as publishTransaction.PublishContexts
We can then…
Get the count of the processed items: transactionContext.ProcessedItems.Count
The publication: transactionContext.Publication.Title.ToString()
The Publication Target name: transactionContext.PublicationTarget.Title.ToString()
The processed items
Then within the context we have processed items which we can loop around and get yet more details:
The processed item id: processedItem.ResolvedItem.Item.Id
The time it took to render: processedItem.RenderTime.TotalMilliseconds – RenderTime is a TimeSpan, so .Milliseconds alone would only return the milliseconds component rather than the whole duration
The template id it was rendered against (if applicable): processedItem.ResolvedItem.Template.Id
The item type of the processed item: processedItem.ResolvedItem.Item.Id.ItemType.ToString()
We can of course do things like add all the render times up to make some more numbers and, if we subtract the total from the duration I mentioned higher up, we get an estimate of how much time was taken to deploy (everything except rendering).
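As a sketch (assuming the publishTransaction and tsDuration variables from the earlier snippets, that the transaction completed successfully, and that PublishContext and ProcessedItem are the TOM.NET types behind the properties used above), the render-time roll-up could look like this:

```csharp
// Sum the render time of every processed item in every publish context
TimeSpan totalRenderTime = TimeSpan.Zero;
foreach (PublishContext transactionContext in publishTransaction.PublishContexts)
{
    foreach (ProcessedItem processedItem in transactionContext.ProcessedItems)
    {
        totalRenderTime += processedItem.RenderTime;
    }
}

// Everything that was not rendering: deployment, transport and so on
TimeSpan estimatedDeployTime = tsDuration - totalRenderTime;
```

Remember that tsDuration includes queuing time, so on a busy day this estimate will overstate the deployment portion.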
As you can see there is a wealth of information in the publishing transaction data, and this was just the detail I needed for my purposes. I suspect there is a lot more in there, and playing around with the API is somewhat like digital archaeology. To help you out I’ve added the scripts I use for measuring publishing on SDL Tridion 2011 SP1, which you can download, play with and even use to collect your ready-made statistics!
Download QueueView2011_v0.3. This is an alpha release and requires additional work to make it production ready.
This is a topic that is raised with me from time to time, mostly because of my connection with a large product company that I work with on a daily basis. A colleague prompted me to write this down properly: so, how do you integrate a Product Information Management system, or PIM, with SDL Tridion?
It is fair to say that SDL Tridion, like most CMS systems, is not a PIM and should not be used as one. It is a Web Content Management System and as such its purpose is to allow any organization to create, manage and publish marketing content. PIMs hold a specific type of content, which is product data; they vary in what they store, but almost certainly the minimum that a PIM stores is the product combination, or SKU. The number of SKUs an organization has will depend on the amount and type of products. Simple products, e.g. cups, tend to have fewer SKUs than, say, laptops; but in both cases it could amount to many thousands or many millions of possible product combinations. Add to this that SKUs change over time as products are updated, manufacturing parts change or new products are added.
When looking at integrating a PIM you need to have a careful look at the content stored and how it should be used. Additional content stored in the PIM could include things like the product description, and this content will need to be looked at in detail. Is it needed on the website? Is it translated? Is it localized? On your website you will want to present a combination of this product content and your marketing content; in different places on your website the mixture of the two will vary. But overall you will want to show a uniform brand and content experience.
How you integrate this content together should be a matter of careful decisions and I will run through three simple scenarios and some basic pros and cons that will help guide that choice.
Importing Product Data into Tridion
Importing product data into Tridion should be seen as the bottom rung of choices for an integration. We assume that the PIM is the master of the product content; if we import this content there will be two copies of the same content, so in this scenario we should really decide if we are going to do something with this imported content (e.g. translation). If we are, this scenario might make sense, but importing PIM content into Tridion only really makes sense if the number of products is low and the number of updates (the delta) is also low.
Integration at publish time
Rather than importing content, we can consider the approach of merging the product content with our marketing content when it is published. For this, our marketing content must be tagged or referenced in some way to relate it to the product it belongs to. The templates then render the combination of marketing content and product content together as the pages are published to the website. To relate content to product there are two choices: 1) a product taxonomy in Tridion, either created by hand or imported from the PIM, or 2) manual entry of the product ID in the content metadata, either completely by hand or helped along with a lookup calling out to the PIM. I personally prefer the taxonomy approach, but the automated import of a complex hierarchy will have to be thought out well in advance.
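A publish-time merge, stripped to its essence, could be sketched like this. Everything here (PimClient, ProductRecord, the placeholder tokens) is an illustrative stand-in, not a real Tridion or PIM API; in a real implementation this logic would live inside a template building block:

```csharp
// Illustrative stand-ins: your real PIM client and record type will differ
public class ProductRecord { public string Name; public string Description; }

public static class PimClient
{
    // In reality this would call out to the PIM's own API or database
    public static ProductRecord GetProduct(string productId)
    {
        return new ProductRecord { Name = "Example Cup", Description = "A sturdy cup." };
    }
}

public static class PublishTimeMerge
{
    // Merge marketing content with product data as the page is rendered for publish;
    // the product ID comes from the taxonomy or metadata tagging described above
    public static string RenderProductBlock(string marketingHtml, string productId)
    {
        ProductRecord product = PimClient.GetProduct(productId);
        return marketingHtml
            .Replace("{ProductName}", product.Name)
            .Replace("{ProductDescription}", product.Description);
    }
}
```

The trade-off is freshness: because the product data is baked into the published page, the website reflects PIM changes only after the next publish.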
Integration at runtime
The last major choice would be to integrate the marketing and product content at runtime. This requires us to have some application logic on our website and we still need a way of linking the content we are looking at to the product; we could still use our product taxonomy idea, but we could also use other means such as the URL. In essence, we do the same level of integration as we did at publish time, only one stage later.
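The URL-based linking could be sketched as follows (the URL convention is hypothetical; in practice you would also cache the PIM response per request to keep page load times sane):

```csharp
public static class RuntimeMerge
{
    // Illustrative URL convention: the product ID is the trailing segment
    // after the last dash, e.g. /products/sturdy-cup-1234 -> "1234"
    public static string GetProductIdFromUrl(string url)
    {
        int dash = url.LastIndexOf('-');
        return dash >= 0 ? url.Substring(dash + 1) : null;
    }
}
```

The web application then fetches the product record at request time and merges it into the page, so the website always shows the latest PIM data at the cost of a runtime dependency on the PIM.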
Did I miss some pros or cons or another scenario completely? Let me know in the space below!
A while back I wrote about how to fall in love with publishing in SDL Tridion. It’s true; you will fall in love with it when it is working well. In that article I described the more end-user aspects of publishing, but typically there are a lot of aspects to publishing that the end user cannot control and that are controlled by the IT or hosting organization.
So there I am, it’s 5 PM on a Friday and I am trying to get out that article I wrote in the afternoon. I am waiting for it to publish and it is taking longer than I want, because I have better things to do. It’s Friday after all! I have to wait in that queue for my job to be processed after all the other jobs that were submitted before mine. But it is a queue, right? That’s what happens: I join the back and I wait until I get to the front for my turn. (At least that’s how we do it in the UK. Also, we mumble.)
So what determines how long the queue takes?
Certainly not the user! The user determines how many jobs are in the queue but not how long it takes to complete the jobs. The duration is mostly determined by a) the task, b) the templates, c) the servers and d) the configuration.
I am sure many of you have queued in the bank. (I am sure that is also why many of you turned to online banking.) There are 5 people in front of you, but how long are you waiting? 5 minutes or 10 minutes? The answer is, you have no idea… it depends on what those people want from the bank. Some might want information, some cash and others might want a loan. Each person represents a complexity that will take x number of minutes of handling by the servers (and yes, I expressly used the word “server” rather than “clerk”). Publishing jobs submitted by the users are the same, in that the jobs vary in complexity and that complexity cannot be determined by looking superficially at the job (or person) in question.
The complexity is determined in two ways: firstly, what the user tried to publish and secondly, how the data model is constructed. Now, I know I said the user does not determine the queue duration and here I contradict myself. But I don’t believe it is the user’s job to determine whether or not something should be published; they need to publish what they need to publish. However, it is important to note that some Tridion items, when published, take more items along for the ride. A Structure Group, for example, has pages and nested Structure Groups which need to be published as well.
The data model determines the relationships between items. So when I publish an item, additional items will be taken along because they complete that item. This is typically a small number of items, but the data model could need attention if publishing a single item leads to excessive numbers of additional items.
When I get to the front of the queue at the bank, I am most likely going to be presented with some forms to fill in. Those legal documents that let me get the money to buy a car or get a new credit card. How long it takes me to fill out those forms will determine how long it will be before I am finished. The smaller and simpler the form, the better! My publishing job will execute templates to create some sort of output (e.g. HTML, XML, Java or ASP.NET code). The templates take time to execute and many templates may have to be executed for one job. The larger and more complex these templates, the longer it will be before I see my publish job completed.
The speed the servers work at, and the number of simultaneous activities they can complete, affects the overall speed of publishing a job. The servers must therefore be scaled to meet the load requirements of the environment. Much like the bank, the overall throughput affects the waiting time of any job: with a single server I will wait the longest; with multiple parallel servers the waiting time will be reduced. In most scaled environments the tendency is to separate publishing from other server functions (database, management interface); this dedication to a task means that the server can concentrate on the same repetitive work without being interrupted by other business, which improves the overall publishing throughput of that server.
The configuration options within SDL Tridion allow you to manipulate how the queue is managed; in essence all publishing jobs are equal, but with Publishing Priorities and Filters some jobs are more equal than others. Priorities can be set by the end user at the time of publish or as a rule on a given Publication Target. The priority (high, normal or low) allows the most important tasks to go first (or the least important tasks to go last) and works like any priority system would.
Filtering adds an extra dimension to this and to the overall way items are removed from the queue for publishing. Many banks have separate desks for different tasks: if you go to deposit some money you use a different desk to the one where you get a loan. Filtering does the same thing, in that it allows us to specify that certain servers complete certain jobs, depending upon their configuration. Filtering is possible by Publication (e.g. German Website), Publication Target (e.g. Live) or Priority (e.g. High) – or a combination of multiple filters with multiple values. So for certain areas of your business you could, for example, dedicate servers to complete just those tasks; in times of lots of house buying, we put more servers on the loans desk and divide our throughput unevenly across our customers.
4 actions to improve publishing
I have not encountered a single organization yet that could not do with faster publishing. Even when you think it is as fast as it can be, there will still be room for improvement somewhere. In summary, I have four points you can act upon that can help you love publishing that little bit more: look critically at what you publish (the task), keep your templates lean, scale your servers to the load, and tune the configuration with priorities and filters.
SDL Tridion 2011 has been released and I decided to list five things that I think are very important to the release. Mostly the talk has been about the new Content Manager Explorer (and its cross-browser functionality), but this is just the surface. Underneath there are many changes, of which these five are just a small snippet of what you can find in the box…
To meet the demands of a large-scale enterprise, the deployer is now much more scalable than before, allowing organizations to grow their environment continually to meet the demands placed upon it by a growing content organization. The scalable deployer allows multiple processes to process deployments simultaneously, as well as updating the publisher on how much load they can handle to avoid overloading.
The new Content Delivery storage layer is based upon the Java Persistence API (JPA) and its concrete implementation, Hibernate. With this you are able to expand the single Content Delivery storage layer to encompass multiple different data sources (e.g. product information or user generated content) into one single layer.
Online Marketing Explorer
Drive customer impact with the new Online Marketing Explorer, which gives an overview of your marketing activities with a centralized model of campaigns, reporting and actionable insights.
Content Services is the new RESTful web service on Content Delivery, based upon the OData and OAuth standards. You now have your published content available through a web service to any application: a mobile app, an affiliate, a white-label site, a 3rd party… the list goes on.
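By way of illustration (the host name is hypothetical, and which collections are exposed depends on your installation), querying the service is a plain HTTP GET against the OData endpoint:

```
# Hypothetical host; odata.svc exposes collections of published content
curl "http://cd-server.example.com/odata.svc/Pages"
```

The response is a standard OData feed, which is why practically any platform can consume it out of the box.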
The new fully .NET event system is modular. That does not sound like much, but you can add one or more separate event systems to the same Content Management environment. Each event system can work alone, hooking into different CM activities, or they can work together as part of coordinated event-driven activities.
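As a minimal sketch (the class and handler names are my own inventions; TcmExtension and EventSystem.Subscribe come from the TOM.NET API, so check the 2011 documentation for the exact signatures and event phases), one such module looks roughly like this:

```csharp
// Hypothetical event system module: reacts to every Component save
[TcmExtension("ExampleEventHandler")]
public class ExampleEventHandler : TcmExtension
{
    public ExampleEventHandler()
    {
        // Subscribe to the Save event on Components, after the transaction commits
        EventSystem.Subscribe<Component, SaveEventArgs>(
            OnComponentSaved, EventPhases.TransactionCommitted);
    }

    private void OnComponentSaved(Component component, SaveEventArgs args, EventPhases phase)
    {
        // Your handling logic here; several modules like this can be
        // registered side by side in the same CM environment
    }
}
```

Each module is an independent assembly registered with the Content Manager, which is what makes the "one or more separate event systems" setup possible.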