Enfinity URL rewrite mechanism pitfalls

03.08.2012.

Due to the long history of its architecture, Intershop has traditionally encoded a lot of additional information in its URLs: items like server group, currency, locale, site and pipeline entry point. This led to hyperlinks that were very long and unsuitable for printing on advertising material (such as leaflets) or showing to users in commercials. After all, it is far better to use a URL like http://mywebshop.com/iphone than a long sausage like http://mywebshop.com/is-bin/INTERSHOP.enfinity/SRV/MyStore-Site/en_EN/-/USD/ProductDetails-View?uuid=9afd1478-1cc8-4a8b-b56d-a368a7cce937.

As the importance of Internet search increased, especially with the popularity of the Google search engine and its PageRank algorithm, one of the recommended practices for Search Engine Optimization (SEO) became generating short, descriptive links that search engine crawlers can parse more easily. To address this, Enfinity Suite (starting with version 6.2) offered an improved URL rewriting mechanism which makes it possible to rewrite URLs and make them “search-engine friendly” and more informative.

Since Intershop supports more than one channel (site) on the same infrastructure, it is recommended that the rewrite rules are planned and developed together with the site templates and business logic. Once the site is in production, rule tweaking should be avoided as much as possible, as wrong rules could make certain areas of your site unreachable and break the storefront functionality.

The main goal of this article is to point out certain pitfalls and restrictions of the mechanism that developers should keep in mind while defining rewrite rules.

URL Rewrite Mechanism

The URL rewrite mechanism in Enfinity Suite provides two general features:

  1. Descriptive URL Rendering: transforms standard URLs into custom URLs which are sent to the client. Those URLs are typically shorter, more meaningful and free of explicit parameters, and are therefore usually called descriptive URLs.
  2. Descriptive URL Expansion: transforms descriptive URLs sent by a client back into standard Enfinity URLs so that the Enfinity system can address the correct processing pipeline.

As part of the standard API, Intershop provides a URL rewrite handler which makes it possible to define or optimize descriptive URLs via predefined pattern-matching and text-processing rules without the need for custom Java code. (If desired, it is possible to extend the default mechanism with a custom rewriter, but this is out of scope for this blog post.)

Both rendering and expansion are done in a number of steps. When descriptive URLs are written, usually as a result of a url() call in ISML, the rewriting mechanism first checks the domain and then calls the compact() method, which renders the descriptive URL based on the provided set of rules. For compact() to succeed, the descriptive URL has to be unique and must be transformable back into the standard Enfinity URL form. Otherwise, fallback URL rendering is used and the URL is not shortened.
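
The reversibility requirement can be illustrated with a small, self-contained sketch. It is a toy model and not the Enfinity API: the class, the /product/ short form and the "SomeOther-Pipeline" name are made up for illustration, and only the method names compact() and expand() are borrowed from the description above. The idea is simply that a short URL is used only when expanding it yields exactly the original standard URL; otherwise the standard URL is kept as the fallback.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of descriptive URL rendering; none of this is the Enfinity API.
class RoundTripSketch {
    // Hypothetical standard URL prefix, modelled on the long example URL above.
    static final String PREFIX =
            "/is-bin/INTERSHOP.enfinity/SRV/MyStore-Site/en_EN/-/USD/";
    static final Pattern STANDARD = Pattern.compile(
            "^" + Pattern.quote(PREFIX) + "ProductDetails-View\\?uuid=([0-9a-f-]+)$");
    // Hypothetical descriptive form: /product/<uuid>
    static final Pattern DESCRIPTIVE = Pattern.compile("^/product/([0-9a-f-]+)$");

    // Shorten a standard URL, or return empty if no rule applies.
    static Optional<String> compact(String standardUrl) {
        Matcher m = STANDARD.matcher(standardUrl);
        return m.matches() ? Optional.of("/product/" + m.group(1)) : Optional.empty();
    }

    // Turn a descriptive URL back into the standard form, or return empty.
    static Optional<String> expand(String descriptiveUrl) {
        Matcher m = DESCRIPTIVE.matcher(descriptiveUrl);
        return m.matches()
                ? Optional.of(PREFIX + "ProductDetails-View?uuid=" + m.group(1))
                : Optional.empty();
    }

    // Use the short form only if it expands back to exactly the input URL;
    // otherwise fall back to the unmodified standard URL.
    static String render(String standardUrl) {
        return compact(standardUrl)
                .filter(s -> expand(s).map(standardUrl::equals).orElse(false))
                .orElse(standardUrl);
    }

    public static void main(String[] args) {
        String product = PREFIX
                + "ProductDetails-View?uuid=9afd1478-1cc8-4a8b-b56d-a368a7cce937";
        System.out.println(render(product));                        // short form
        System.out.println(render(PREFIX + "SomeOther-Pipeline"));  // fallback, unchanged
    }
}
```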

When descriptive URLs (rendered as described above) are accessed by HTTP agents (browsers or crawlers), Intershop has to expand them into the standard form to be able to produce the appropriate response. Since an incoming descriptive URL is not in the standard (long) format, the web adapter will not recognize it and will forward it to the pre-configured SLDSystem/URLMapping-Resolve pipeline, which invokes the expand() methods of all configured URLRewriteHandler implementations. The first implementation that recognizes the provided URL will expand it. Since all received URLs are processed this way, in a multi-site environment it is very important that a site's rules match only the domain of the site on which they are defined.
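
The "first handler that recognizes the URL wins" behaviour can be sketched as follows. This is an illustrative model only: the Handler interface, the host names, the site names and the shape of the generated standard URL (including the "path" parameter) are assumptions made for the example and do not reflect the real URLRewriteHandler API.

```java
import java.net.URI;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Toy model of the expansion chain; not the Enfinity API.
class ExpandChainSketch {
    // Hypothetical stand-in for a rewrite handler: returns empty when
    // the handler does not recognise the URL.
    interface Handler extends Function<String, Optional<String>> {}

    // Made-up standard URL shape, used only to show which site answered.
    static String standardUrl(String site, String url) {
        return "/is-bin/INTERSHOP.enfinity/SRV/" + site
                + "/en_EN/-/USD/ViewStandardCatalog-Browse?path="
                + URI.create(url).getPath();
    }

    // Well-behaved handler: only answers for its own host.
    static Handler forHost(String host, String site) {
        return url -> host.equals(URI.create(url).getHost())
                ? Optional.of(standardUrl(site, url))
                : Optional.empty();
    }

    // Greedy handler: ignores the host and claims every URL.
    static Handler anyHost(String site) {
        return url -> Optional.of(standardUrl(site, url));
    }

    // The first handler that returns a result wins; later ones are never asked.
    static Optional<String> resolve(List<Handler> handlers, String url) {
        for (Handler h : handlers) {
            Optional<String> result = h.apply(url);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String url = "http://shop-b.example/phones";
        // Host-bound handlers: site B answers for its own URL.
        System.out.println(resolve(List.of(
                forHost("shop-a.example", "SiteA-Site"),
                forHost("shop-b.example", "SiteB-Site")), url));
        // A greedy handler registered first swallows site B's URL.
        System.out.println(resolve(List.of(
                anyHost("SiteA-Site"),
                forHost("shop-b.example", "SiteB-Site")), url));
    }
}
```

In the second call the URL of site B is expanded in the context of site A, which is the kind of cross-site mix-up that host-bound rules are meant to prevent.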

Rule-Based Rewriting

In addition to the standard URLRewriteHandler, Enfinity Suite also provides the RewriteRuleHandler, which renders and expands URLs based on a given set of rules. It retrieves the rules from a plain text configuration file named urlrewrite.properties. When enabled in the site settings, this handler rewrites all incoming and outgoing URLs as defined in the properties file. The rules are typically a combination of regular expressions and placeholders that define the structure of the URLs to be matched (standard URLs to be rendered and descriptive URLs to be expanded) and the form into which they should be rendered or expanded. Depending on the rule type, it is possible to match either only the path component of the descriptive URL or the whole URL (the latter is typically used when the infrastructure hosts more than one sales site).

Rule-based rewriting restrictions and pitfalls

E-commerce channels based on Intershop usually contain numerous links and have a lot of publicly accessible pages, so defining URL rewrite rules usually takes a lot of planning and testing, since wrong URLs can easily break one or more sites.

There are various limitations and pitfalls in the rewriting mechanism. Knowing them in advance can shorten the time spent on implementation and debugging. First of all, there are certain documented (but often forgotten) limitations of the Enfinity Suite URL rewriting mechanism:

  • Rewriting is not supported for the HTTPS protocol. This is usually not a problem, because secure areas are used for payment or personal data entry and should not be indexed at all.
  • URLs that trigger a POST request SHOULD NOT be rewritten. The URL rewrite mechanism doesn’t know how to work with POST parameters and is not meant to be used with POST requests in the first place. That means that even though a URL that triggers a POST request will be correctly expanded into its standard form, all the parameters sent with the request will be lost during the URL expansion and the generated standard URL will not contain them, which can have a significant impact on site functionality. One possible solution is to use URLs that trigger GET requests instead, which encode the parameters in the URL. For example, ISML forms should be written like this:
<form class="search_form" method="get" action="#URL(Action('ViewStandardCatalog-Browse'))#" name="phonesearch">
  • URLs in the <ISINCLUDE> ISML tag should never be rewritten, as this will break the application logic (those URLs are used internally by the application).
  • Reserved and escaped characters should be avoided in rule definitions and in URL data that is going to be rewritten. Even though the escaping mechanism handles basic escaping, there is no guarantee that it will work correctly in every internal layer and in all external systems.
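
One way to follow the last recommendation is to reduce any data that ends up in a descriptive URL to characters that never need escaping. The sketch below is only an illustration under that assumption; the helper names are hypothetical and nothing here is part of Enfinity. It checks a segment against the "unreserved" character set of RFC 3986 (letters, digits, "-", ".", "_", "~") and derives a conservative slug from free text. Note that such a slug is lossy, so a rule that uses it still needs some unique identifier to stay expandable.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helpers for keeping URL data free of reserved characters.
class UrlDataSketch {
    // RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
    static final Pattern UNRESERVED = Pattern.compile("[A-Za-z0-9._~-]+");

    // True if the segment can be placed in a URL path without any escaping.
    static boolean isUnreserved(String segment) {
        return UNRESERVED.matcher(segment).matches();
    }

    // Lossy conversion of free text into a lower-case, hyphen-separated slug.
    static String slug(String text) {
        // Decompose accented letters and drop the combining marks.
        String ascii = Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
        return ascii.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")  // runs of other characters become one hyphen
                .replaceAll("^-+|-+$", "");     // trim leading and trailing hyphens
    }

    public static void main(String[] args) {
        String name = "Caf\u00e9 & Bar 50%";
        System.out.println(isUnreserved(name));        // false
        System.out.println(slug(name));                // cafe-bar-50
        System.out.println(isUnreserved(slug(name)));  // true
    }
}
```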

Besides those restrictions, it is a good idea to clean up the code before starting with the implementation and testing of rewrite rules. It is much easier to implement a set of rules if the dictionary data names and aliases are unified, variable names are meaningful and there are no superfluous query parameters.

Also note that debug mode can be enabled separately for every rewrite rule. When enabled, the rendering and expansion of every URL processed by that rule is written to the console/logs.

Another important thing to be careful about is cross-site conflicts between channels hosted on the same Intershop installation. Such problems arise from the fact that, while URL shortening knows which rules to process (since it happens in a specific site context), URL expansion cannot be bound to a site context. Instead, RewriteRuleHandler has to examine every URL expansion rule. Also, the already mentioned urlrewrite.properties file is defined at cartridge level, which means that every cartridge can have its own set of rewrite rules (and since cartridges are assigned to sites, every site effectively gets its own set of rules). Where channels are deployed in a multichannel environment with URL rewriting enabled on more than one site, you should be careful not to pick up URLs from other sites. For example, we encountered the following rule (amongst others) in one of the deployed sites:

rule.overview_xx.shortPathMatch = ^[^.]$ 

This rule matches every URL that ends with “/”, regardless of hostname. Since it matches URLs without checking their hostnames (and a lot of URLs end with “/”), most of the URLs on our site were matched and processed by the rewriting rules defined on a completely different site. That made quite a mess, since users were directed to a completely wrong page on the wrong site when we turned on URL rewriting on the affected site. Even though the problem seems trivial, it can take a lot of time to figure out, especially when it is investigated from the perspective of the whole site and not solely URL rewriting.

The solution to this problem is to use the shortURLMatch attribute instead of shortPathMatch. This way the whole URL, including the hostname, can be specified, which reduces the risk of matching URLs from different sites to a minimum. For example, a rule:

rule.overview_xx.shortURLMatch = ^http://[^\\.]*shop.xyz.net(:[0-9]+)?/$

will only match URLs whose hostname ends with “shop.xyz.net” (optionally preceded by a prefix that contains no dots). Unless another site on the same installation shares the very same hostname suffix, there is no possibility that its URLs will ever be caught by your rewriting rules.
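
The difference between the two rules can be checked outside of Enfinity with plain java.util.regex. The sketch below rests on a few assumptions: that shortPathMatch is tested against the path component only and shortURLMatch against the complete URL, that both behave like a regex search, and that the doubled backslash in the properties file denotes a single backslash in the actual pattern (which is what standard Java properties parsing does). The class and the host names other than shop.xyz.net are made up.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Standalone check of the two example patterns; not the Enfinity API.
class HostBoundRuleSketch {
    // Pattern from the shortPathMatch rule above.
    static final Pattern SHORT_PATH = Pattern.compile("^[^.]$");
    // Pattern from the shortURLMatch rule above. The Java literal "\\." is the
    // regex \. , the same pattern the properties file yields after unescaping.
    static final Pattern SHORT_URL =
            Pattern.compile("^http://[^\\.]*shop.xyz.net(:[0-9]+)?/$");

    // Assumption: only the path is tested, so the hostname is never seen.
    static boolean pathRuleMatches(String url) {
        return SHORT_PATH.matcher(URI.create(url).getPath()).find();
    }

    // Assumption: the complete URL, including scheme and host, is tested.
    static boolean urlRuleMatches(String url) {
        return SHORT_URL.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://myshop.xyz.net/",
            "http://myshop.xyz.net:8080/",
            "http://othershop.example/",   // hypothetical foreign site
            "http://www.shop.xyz.net/"     // extra dot before the suffix
        };
        for (String url : urls) {
            System.out.println(url + "  path rule: " + pathRuleMatches(url)
                    + "  URL rule: " + urlRuleMatches(url));
        }
    }
}
```

Under these assumptions the path-only rule accepts the root URL of every host, including the foreign one, while the host-bound rule accepts only the xyz.net shop URLs. Two details are worth noticing: [^\\.]* cannot cross a dot, so a host such as www.shop.xyz.net is not matched, and the dots inside shop.xyz.net are unescaped, so they match any character; escaping them makes the rule stricter.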

Besides conflicts between sites, it is also possible to have conflicting or overlapping rules on the same site, in the same set of rules. All the rules in the urlrewrite.properties file are processed in alphabetical order of their names, which means rule.ab.select will be processed before rule.cd.select. To avoid conflicts, a good practice is to use numbers in rule naming (as mentioned before in this blog entry). For example, by naming rules like:

rule.10catalog.select
rule.20catalog.select

it is possible to ensure that the first rule will be processed before the second one. Another technique is to place more specific rules before more general ones, so that a general rule does not claim the URL before the more specific rules get a chance to match it. The most general rules should be used as a fallback in case no other rule matches (usually for URLs with no parameters).
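
The effect of name-based ordering can be tried out with a small sketch. It assumes that "alphabetical" means plain lexicographic string ordering, which is what a sorted Java map gives; the rule names and patterns below are hypothetical. It also shows a trap with numeric prefixes: compared as strings, "100" sorts before "10c" and "20", so a fallback rule named rule.100fallback would be tried first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Toy model of first-match rule processing in name order; not the Enfinity API.
class RuleOrderSketch {
    // Rule names in the order a lexicographic sort would process them.
    static List<String> order(Map<String, String> rules) {
        return new ArrayList<>(new TreeMap<>(rules).keySet());
    }

    // Name of the first rule (in sorted order) whose pattern matches the path,
    // or null if no rule matches.
    static String firstMatch(Map<String, String> rules, String path) {
        for (Map.Entry<String, String> rule : new TreeMap<>(rules).entrySet()) {
            if (Pattern.compile(rule.getValue()).matcher(path).find()) {
                return rule.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Specific rule numbered before the general one.
        Map<String, String> rules = Map.of(
                "rule.10catalogitem", "^/catalog/[^/]+$",
                "rule.20catalog", "^/catalog");
        System.out.println(firstMatch(rules, "/catalog/phones")); // rule.10catalogitem
        System.out.println(firstMatch(rules, "/catalog"));        // rule.20catalog

        // A three-digit prefix sorts before the two-digit ones.
        Map<String, String> withFallback = Map.of(
                "rule.10catalogitem", "^/catalog/[^/]+$",
                "rule.20catalog", "^/catalog",
                "rule.100fallback", "^/");
        System.out.println(order(withFallback));
        System.out.println(firstMatch(withFallback, "/catalog/phones")); // rule.100fallback
    }
}
```

If the ordering really is lexicographic, zero-padding the numbers to a fixed width (010, 020, 100) keeps the numeric and the alphabetical order in agreement.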

Besides using the rewrite rule engine supplied by Intershop, it is worth considering writing your own URL rewrite handler for more specific URL rewriting needs. That said, with careful planning and testing, the existing rule-based URL rewriter has been sufficient for the requirements that we have encountered.