What exactly is a canonical URL and why would you need one?
It can be a tough job maintaining a website and all its content, especially when the site gets big.
As the number of developers and content editors working on a site goes up, so does the complexity.
Keeping track of all the technical and content changes is a full-time job.
Then people start linking to your site, some with HTTP and some with HTTPS, some with www and some without.
In the end, different URLs will show the same content.
For a busy SEO, the canonical URL is a useful tool to control which URL is the “preferred” URL, helping the search engines know which URL they should be using.
Why use a Canonical Link?
URLs are great, but they do have one problem: it is easy to have many URLs that point to the same page.
For example, these URLs all point to the same page:
This means that the content returned for this URL:
Is the same as this URL:
To a search engine like Google, these are two different pages even though they point to the same content.
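To make the problem concrete, here is a minimal Python sketch. The URLs are hypothetical variants on the placeholder domain example.com, and the `preferred_form` helper is an illustration only. It assumes that, for this site, neither the scheme, the www prefix, nor the query string changes the content.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical variants of one page; example.com is a placeholder domain.
variants = [
    "http://example.com/page.html",
    "https://example.com/page.html",
    "https://www.example.com/page.html",
    "https://example.com/page.html?utm_source=newsletter",
]


def preferred_form(url: str) -> str:
    """Collapse scheme, www prefix, and query string into one preferred form.

    Assumption: on this site none of the query parameters change the content.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return urlunsplit(("https", host, parts.path, "", ""))


# Compared as plain strings, every variant is a different URL.
distinct_urls = set(variants)

# After normalisation they all collapse to a single preferred URL.
preferred_urls = {preferred_form(u) for u in variants}
```

A crawler starts from the first view (four distinct strings). The canonical link is how you tell it about the second.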
Duplicate content can affect your search engine rankings, as Google is not sure which URL to show.
It is best practice to tell the search engines that these two pages are the same and to identify the original.
This is why we use a canonical link.
In 2009, Google, Yahoo and Microsoft said that they would support the canonical link, allowing you to make it clear to the search engines which page is the original and preferred version.
Let’s look at some ways you may get duplicate content.
Why would you have Duplicate Content?
Duplicate content can happen for many technical reasons. Part of technical SEO is to identify which page should be set as the “preferred” page with the canonical tag.
Here are some examples of technical issues that can cause duplication:
- As we have seen in the example above, any URL parameter can create a duplicate page.
- An error in a CMS can produce the same content on many URLs.
- HTTP and HTTPS versions of a site can cause duplicate content if the HTTP site is not redirected to HTTPS.
- PDF or print versions of the site can cause duplication of content.
- www and non-www URLs will point to the same content. For example,
Search Engines and Canonical URL
So what happens when there is no canonical link?
When Google sends its web crawler (Googlebot) to your site to read the content, it will index the pages it finds. If it discovers identical content on several URLs, it may index all of those pages.
When a user then searches on Google, it will show the result that best matches the query, which splits the search traffic across the duplicate URLs.
This dilutes the effect of inbound links and link juice, because they are spread across several URLs instead of one page.
So how do search engines use the canonical link?
When a search engine discovers a canonical link, it treats it as a strong hint and will usually show that URL in its search results.
When not to use a Canonical Link
Earlier we mentioned two cases where duplicate content comes from URLs that could simply redirect:
- HTTP to HTTPS
- www and non-www URLs
In both of these cases, it is better to set up a 301 redirect.
Choosing whether to redirect or to use a canonical link can be a tough decision.
As a rule, if you want to permanently redirect the user, for example because you want all your pages to be served over HTTPS, use a 301 redirect.
If you want both URLs to remain available, use a canonical link.
If you need another reason, Google has said that it prefers 301 redirects.
One reason to follow that advice is that Google can choose to ignore a canonical link.
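As a rough illustration of the redirect side of that rule, the sketch below works out where a request should be 301-redirected. It assumes the preferred form is HTTPS without the www prefix. The `redirect_target` function name and the example.com URLs are hypothetical, and real sites would normally configure this in the web server or CDN rather than in application code.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if no redirect is needed.

    Assumption: the preferred form is HTTPS without the www prefix.
    The path, query string, and fragment are passed through unchanged.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    target = urlunsplit(("https", host, parts.path, parts.query, parts.fragment))
    return None if target == url else target
```

When the function returns a URL, the server would answer with status 301 and a `Location` header set to that value. When it returns `None`, the page is served as normal.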
How to set up a Canonical URL
You can set up the canonical link in two ways:
- The first is using a canonical link element in the <head> section of your website.
- The second is using an HTTP header returned from your webserver.
Let’s look at how we can set up each of these:
This is useful if you can make changes to the <head> of your webpage.
Here is an example of using the canonical link tag from within the HTML.
We are going to set up this URL:
So that it has a canonical link tag pointing to the preferred version:
This will tell Google to return the preferred page in the search results.
<link rel="canonical" href="https://example.com/page.html" />
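If your pages are built from templates, the tag can be generated rather than typed by hand. The following is a minimal sketch, not a feature of any particular CMS. The `canonical_link_tag` helper is a hypothetical name, and it uses Python's standard `html.escape` so that characters such as `&` in the URL are safe inside the attribute.

```python
from html import escape


def canonical_link_tag(preferred_url: str) -> str:
    """Build a canonical link element for the <head> of a page."""
    # quote=True also escapes quotation marks, so the URL cannot break
    # out of the href attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}" />'


tag = canonical_link_tag("https://example.com/page.html")
```

Here `tag` holds the same element shown above, ready to be placed inside the `<head>` of every variant of the page.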
The other option is to use an HTTP header. This option is useful when you do not have access to the HTML or the <head> section of a webpage.
Instead, you can use the webserver or CDN to return a Link header.
So this time on this page:
We can return an HTTP header showing that the preferred link is:
HTTP/1.1 200 OK
Link: <https://example.com/page.html>; rel="canonical"
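How you add the header depends on your webserver or CDN. As one illustration, here is a sketch that uses Python's built-in `http.server` module. The `canonical_header` helper, the `CanonicalHandler` class, and the `PREFERRED_URL` value are all hypothetical names chosen for this example.

```python
from http.server import BaseHTTPRequestHandler
from typing import Tuple

# Placeholder for the preferred version of the page.
PREFERRED_URL = "https://example.com/page.html"


def canonical_header(preferred_url: str) -> Tuple[str, str]:
    """Return the (name, value) pair for a canonical Link header."""
    return ("Link", f'<{preferred_url}>; rel="canonical"')


class CanonicalHandler(BaseHTTPRequestHandler):
    """Serve a small page and attach the canonical Link header to it."""

    def do_GET(self) -> None:
        body = b"<html><body>Example page</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        name, value = canonical_header(PREFERRED_URL)
        self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, pass `CanonicalHandler` to `http.server.HTTPServer` and call `serve_forever()`. In production the same header is usually set in the webserver or CDN configuration instead.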
Both the HTTP and HTML setups are valid and there is no one way to add a canonical link. Choose the option that makes the most sense for your website.
Can a canonical link point to itself?
Yes, and you should point a canonical link to the same page if it is the original. Using the same example from above on the page:
We could have the following HTML:
<link rel="canonical" href="https://example.com/page.html" />
This is good practice as there are often many links to your homepage written in different ways.
Auditing your site for Canonical Links
To check if your site has canonical links set up you can use Chrome Dev Tools.
To do this, right-click the page and choose Inspect. Click on the Elements tab, then open the search bar (Ctrl+F, or Cmd+F on a Mac) and search for “canonical”.
Here is a screenshot of what the BBC canonical tag looks like:
If you can’t find the canonical link in the HTML, then it may have been set as a header. Lush, the cosmetics website, uses HTTP headers. To find the header, open the “Network” tab in Chrome Dev Tools and reload the page.
Click on the first request in the list (the HTML page) and then click the “Headers” tab.
You are looking for a Link header in the “Response Headers” like this:
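The same two checks can be scripted if you need to audit many pages. The sketch below uses only the Python standard library. The function and class names are hypothetical, and the header parsing is deliberately simplified: it assumes the common `<url>; rel="canonical"` form and does not cover every valid Link header.

```python
import re
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def canonical_from_html(html_text: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


def canonical_from_link_header(header_value: str) -> Optional[str]:
    """Return the canonical URL from a Link header value, or None.

    Simplified: splits entries on commas outside the <...> part and does
    not handle commas inside quoted parameter values.
    """
    for match in re.finditer(r"<([^>]*)>\s*;([^,]*)", header_value):
        url, params = match.group(1), match.group(2)
        if re.search(r'rel\s*=\s*"?[^";]*\bcanonical\b', params, re.IGNORECASE):
            return url
    return None
```

The HTML text and the header value can come from any HTTP client. With the standard library, for example, `urllib.request.urlopen` returns a response whose `headers.get("Link")` gives the header value, or `None` when the header is absent.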
Wrapping Up: What is a Canonical URL?
We have covered how there are many ways to access the same HTML via different URLs, such as:
- HTTP or HTTPS
- www or non-www
- URLs with parameters
- pages with index.html at the end
A canonical link lets us tell search engines which URL is the preferred one.
Search engines such as Google read the canonical link and favor that URL over its duplicates in the index.
If the URL should be permanently moved, such as from HTTP to HTTPS or from www to non-www, use a 301 redirect.
This is the preferred solution, as search engines may choose to ignore a canonical link.