Thursday 26 August 2010

How to Use the Canonical Tag

Google, Yahoo & Microsoft Unite On “Canonical Tag” To Reduce Duplicate Content Clutter

The web is full of duplicate content. Search engines try to index and display the original or “canonical” version. Searchers only want to see one version in results. And site owners worry that if search engines find multiple versions of a page, their link credit will be diluted and they’ll lose ranking.

Today, Google, Yahoo and Microsoft (links are to their separate announcements) have united to offer a way to reduce duplicate content clutter and make things easier for everyone. Webmasters rejoice! Worried about duplicate content on your site? Want to know what “canonical” means? Read on for more details.

Multiple URLs, one page

Duplicate content comes in different forms, but a major scenario is multiple URLs that point to the same page. This can come up for lots of reasons. An ecommerce site might allow various sort orders for a page (by lowest price, highest rated…), the marketing department might want tracking codes added to URLs for analytics. You could end up with 100 pages, but 10 URLs for each page. Suddenly search engines have to sort through 1,000 URLs.

This can be a problem for a couple of reasons.

* Less of the site may get crawled. Search engine crawlers use a limited amount of bandwidth on each site (based on numerous factors). If the crawler only is able to crawl 100 pages of your site in a single visit, you want it to be 100 unique pages, not 10 pages 10 times each.

* Each page may not get full link credit. If a page has 10 URLs that point to it, then other sites can link to it 10 different ways. One link to each URL dilutes the value the page could have if all 10 links pointed to a single URL.

Using the new canonical tag

Specify the canonical version using a tag in the head section of the page as follows:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"/>

That’s it!

* You can only use the tag on pages within a single site (subdomains and subfolders are fine).
* You can use relative or absolute links, but the search engines recommend absolute links.

This tag will operate in a similar way to a 301 redirect for all URLs that display the page with this tag.

* Links to all URLs will be consolidated to the one specified as canonical.
* Search engines will consider this URL a “strong hint” as to the one to crawl and index.

Canonical URL best practices

The search engines use this as a hint, not as a directive, (Google calls it a “suggestion that we honor strongly”) but are more likely to use it if the URLs use best practices, such as:

* The content rendered for each URL is very similar or exact
* The canonical URL is the shortest version
* The URL uses easy to understand parameter patterns (such as using ? and %)

Can this be abused by spammers? They might try, but Matt Cutts of Google told me that the same safeguards that prevent abuse by other methods (such as redirects) are in place here as well, and that Google reserves the right to take action on sites that are using the tag to manipulate search engines and violate search engine guidelines.

For instance, this tag will only work with very similar or identical content, so you can’t use it to send all of the link value from the less important pages of your site to the more important ones.

If tags conflict (such as pages point to each other as canonical, the URL specified as canonical redirects to a non-canonical version, or the page specified as canonical doesn’t exist), search engines will sort things out just as they do now, and will determine which URL they think is the best canonical version.

For more info visit
http://searchengineland.com/canonical-tag-16537
http://googlewebmastercentral.blogspot.com/2007/09/google-duplicate-content-caused-by-url.html
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
How to Use the Canonical Tag

No comments:

Post a Comment