Bulk replacement of URLs in Liferay web content

21.02.2013.

In a larger organizations portals can easily reach hundreds or thousands of web articles. These web articles will contain organization specific news, blogs, wiki entries and so on. In organizations they will frequently contain hyperlinks leading to various documents that can be scattered across different repositories, shared folders and other web sites inside the organization.

Unfortunately such hyperlinks are subjects to change, due to the various reasons (such as changing DNS names, moving to different port, enabling SSL, moving from the file system based access to HTTP based, reorganizing directory structure etc. The effect will be that many hyperlinks will become broken, thus limiting usability of such content. The usage example in a web article will look similar to this:

Click <a href=http://somelink…>here</a> to download the document.

Now, imagine that the port or the hostname of the target system has changed. You will now have to fix all urls which lead to the document original. Of course, it can be done manually if there are no too many occurrences of it. But what if you have hundreds or thousands of occurrences? In file-based web sites you could do a bulk search-replace script using tools like find and sed, but Liferay does not support such bulk search and replace mechanism. Therefore it is useful to give a look “under the hood” to see how are the web articles are stored in the Liferay database. Although it’s not recommended to modify Liferay database by yourself, it can be very powerful if you know what you’re doing.

If you examine Liferay database schema, you will notice that the information about web articles is stored in the JOURNALARTICLE table and that the web article content is stored in the CONTENT column. To solve the bulk replacement problem, the simple SQL can be used:

UPDATE journalarticle

SET content = regexp_replace(content, 

'oldURLPrefix','newURLPrefix');

It is recommended to perform such operations while the portal is offline, in order to avoid caching issues. Different database systems have different capabilities regarding regular expressions, but on most of them it is possible to construct similar commands.

So to conclude, although Liferay is a really powerful platform, most likely it will not always suit your needs. In such situations, it’s worthwhile to take a look at what’s under the hood, isn’t it?