SEO Canonicalisation Issues – How to Spot Them
The subject of canonicalisation is literally too big a topic to cover in just one article. As such, I will avoid the topics of parameters, session IDs and other wholesale duplicate content issues, each of which would require an article of its own. Instead I will focus on some less obvious canonicalisation issues that are sometimes tougher to spot.
Canonicalisation is basically the process of determining an authoritative page where the same page exists on two different URLs. Without some form of canonicalisation, these pages could both be indexed by a major search engine and end up competing against each other for a given query that each page targets. One example of a canonicalisation issue is the trailing slash. Webmasters may opt to place a trailing slash at the end of their URLs, or this may be pre-determined by their particular CMS. However, this URL:
Another duplicate content issue is that of the non-WWW version of a page competing against the WWW version of the same page. Both versions of the URL can be indexed, and thus every page on your site could potentially compete against its own duplicate. Once a search engine crawler accesses your website through a non-WWW version of your URL, it may continue to crawl following this pattern, i.e. it may crawl every page on your site via the non-WWW versions of your URLs. Two further examples of canonicalisation problems are the index version of your webpage, e.g. index.html or index.php, and the https: or secure pages on your website.
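One way to spot these four variants (trailing slash, WWW versus non-WWW, index files and https) in a crawl export or a list of indexed URLs is to reduce each URL to a comparison key and see which URLs share one. The following is only a rough Python sketch under my own assumptions: the function names normalise and find_duplicates, the list of index filenames and the example.com URLs are all illustrative, and the key it builds is for grouping only, not a recommendation of which version to prefer.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

# Assumption: these filenames serve the same content as the bare directory URL.
INDEX_FILES = ("index.html", "index.htm", "index.php")


def normalise(url: str) -> str:
    """Reduce a URL to a comparison key that ignores scheme, WWW prefix,
    index filename and trailing slash."""
    parts = urlsplit(url)

    # Treat the WWW and non-WWW hosts as the same site.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]

    # Treat /dir/index.html and /dir/ as the same page.
    path = parts.path or "/"
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_FILES:
        path = head + "/"

    # Treat /dir/ and /dir as the same page (the root stays "/").
    if len(path) > 1:
        path = path.rstrip("/") or "/"

    # Treat http and https as the same page by fixing the scheme in the key.
    return urlunsplit(("http", host, path, parts.query, ""))


def find_duplicates(urls):
    """Group URLs by comparison key and keep only keys with more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    crawled = [
        "http://example.com/blog/",
        "http://www.example.com/blog",
        "https://www.example.com/blog/index.php",
        "http://www.example.com/contact",
    ]
    for key, found in find_duplicates(crawled).items():
        print(key, "<-", found)
```

Any group this reports is only a candidate: the URLs still need checking by hand to confirm that they really do return the same content.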
All of these issues require tailored solutions. In some instances you can simply set up a 301 redirect to pass any authority held by one page on to another. In some instances you can disallow certain pages (e.g. https) via robots.txt. In other instances, you may need to pursue a different strategy altogether, e.g. the canonical tag, the noindex tag, or parameter handling via Google Webmaster Tools.
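To make the 301 redirect and canonical tag options a little more concrete, here is a minimal Python sketch of the decision involved. It assumes, purely for illustration, that the preferred version of every page is the http, WWW version without an index filename. PREFERRED_SCHEME, PREFERRED_HOST and the three function names are hypothetical, and in practice this logic usually lives in your server or CMS configuration rather than in a script.

```python
from html import escape
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: choose whichever version you prefer.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.example.com"
INDEX_FILES = ("index.html", "index.htm", "index.php")


def preferred_url(url: str) -> str:
    """Rewrite a URL onto the preferred scheme and host, dropping any index file."""
    parts = urlsplit(url)
    path = parts.path or "/"
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_FILES:
        path = head + "/"
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


def redirect_for(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) if the request should be redirected, else None."""
    target = preferred_url(url)
    if target == url:
        return None
    return (301, target)


def canonical_tag(url: str) -> str:
    """Build the canonical link element for the page served at this URL."""
    href = escape(preferred_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    print(redirect_for("https://example.com/index.php"))
    print(canonical_tag("http://example.com/about/index.html"))
```

The redirect suits cases where the duplicate URL never needs to be served at all, while the canonical tag is the fallback when both URLs have to keep responding. This sketch leaves the trailing slash untouched, since which form to prefer depends on your CMS.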