Robots txt for SEO
Robots.txt is a little text file that exists on almost all websites.
The good news is that this little file can give your site an SEO boost, yet many people are not even aware it exists.
Once you know what to look for, robots.txt has great power and is easy to understand.
Optimizing this file often ends up at the bottom of the to-do list, behind more exotic SEO practices.
That is a shame, because an optimized robots.txt file helps your SEO, and it is a simple change you should make on your site.
Simply put, an optimized robots.txt helps Google crawl your pages more efficiently.
Let's take a look at how you can optimize your robots.txt file.
What is Robots.txt?
Robots.txt is a file that a “robot” reads when it visits your site.
These robots are software programs such as spiders or crawlers. Search engines created these crawlers to read the contents of your site. Googlebot and Bingbot are two examples of these types of crawlers.
When Googlebot visits your site, it reads all the pages. Once finished, it adds them to the Google index, which is what makes your pages appear in Google Search results.
Googlebot will first have a look at your robots.txt file before it visits any other pages on the site.
This little text file tells the crawler what the “house rules” are. Rules like:
- Which pages Googlebot is allowed to read
- Which pages Googlebot is not allowed to visit
- Where Googlebot can find all the pages of the site
The crawler will then follow all the rules listed.
To find a robots.txt file, all you need to do is add /robots.txt to the end of a website's domain.
Here are some examples of robots.txt files:
- https://google.com/robots.txt
- https://youtube.com/robots.txt
- https://facebook.com/robots.txt
- https://wikipedia.org/robots.txt
- https://yahoo.com/robots.txt
- https://amazon.com/robots.txt
- https://netflix.com/robots.txt
- https://reddit.com/robots.txt
- https://blogspot.com/robots.txt
Have a look at some of the examples above to get familiar with the types of rules each site has.
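If you want to fetch a robots.txt file with code instead of a browser, a few lines of Python's standard library will do it. This is just a quick sketch using one of the example sites above:
from urllib.request import urlopen

# Fetch a robots.txt file by appending /robots.txt to the domain
with urlopen("https://www.google.com/robots.txt") as response:
    print(response.read().decode("utf-8")[:500])  # show only the first 500 characters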
Why use a Robots.txt for SEO?
So why should we use this file, and how does it give us an SEO boost?
When Googlebot crawls your website, it has a limited budget, often called the crawl budget.
Once this budget runs out, Googlebot leaves your site.
We need to be efficient about where Googlebot can go, so that it visits as many important pages as possible within that budget.
For example, many WordPress sites have an admin page at /wp-admin, such as https://example.com/wp-admin.
This admin page is of no use to the customers of the site; it is only used by the admins to make changes to the site.
Yet Googlebot will still spend time looking at this page, unless you tell it not to.
This is where robots.txt comes in.
We can disallow areas of the website: areas that you know are of no interest to a customer, and that Google should not waste the budget on.
Next, let's take a look at an example robots.txt file.
How to Set Up a Robots.txt
Now that we have a good understanding of what a robots.txt file is and why it is useful, let's take a look at an example file.
User-agent: *
Allow: /
Disallow: /wp-admin
Sitemap: https://example.com/sitemap.xml
This is a very simple robots.txt file that you would see after a WordPress installation.
Let's break down each rule and look at what we are telling the crawlers.
- User-agent: * lets you target specific crawlers. The * means the rules apply to all crawlers.
- Allow: / tells the crawler which pages it may visit. This rule allows the bot to crawl all pages on the site.
- Disallow: /wp-admin stops the crawler from visiting the admin area of the site.
- Sitemap: https://example.com/sitemap.xml is a link to the sitemap URL. Crawlers use the sitemap to find all the pages on a site. Notice how this is a full URL including the https:// rather than just a path like the Allow and Disallow rules.
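You can also give different crawlers different rules by adding a separate User-agent group for each one. Here is a small, hypothetical sketch (the paths are made up for illustration) with one group for Googlebot and another for every other crawler:
# Rules for Googlebot only
User-agent: Googlebot
Disallow: /wp-admin

# Rules for every other crawler
User-agent: *
Disallow: /wp-admin
Disallow: /internal-search
Each crawler follows the group that matches its own name and falls back to the * group if there is no match.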
There is one more rule you may see which looks like this:
Crawl-delay: 10
This tells the crawler how many seconds it should wait between requests. However, most crawlers do not look at this rule and will ignore it; even Googlebot ignores it.
As a site grows, the robots.txt file can get large. You can add as many Allow and Disallow rules as you need.
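For example, a growing online store might end up with a file like this (every path here is made up purely for illustration):
User-agent: *
Allow: /
Disallow: /wp-admin
Disallow: /cart
Disallow: /checkout
Disallow: /internal-search
Sitemap: https://example.com/sitemap.xml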
As the size of the file increases, so does the complexity, so we need to test the file to make sure there are no errors.
Testing your Robots.txt
Once you have created your robots.txt file, you will want to test it to make sure that it is valid.
Any errors in your file can stop the crawler from following the rules.
So it is a good idea to test your robots.txt file after every change.
We have a Robots.txt file checker tool available that will check that crawlers like Googlebot can read it.
To check that your robots.txt is valid, copy and paste the rules into the text box and click the “TEST ROBOTS.TXT” button.
This will test the contents for the following errors:
- Pattern should either be empty, start with "/" or "*": this error will show if any of your Allow or Disallow rules start with something other than a "/" or "*".
- "$" should only be used at the end of the pattern: if any of your rules have a "$" before the end of the line you will see this error.
- No user-agent specified: if you fail to specify any user agents you will see this error.
- Invalid sitemap URL protocol: the valid protocols are HTTPS, HTTP or FTP. Anything else and you will see this error.
- Invalid sitemap URL: if the sitemap URL is not a valid absolute URL you will see this error.
- Unknown directive: if the test finds a line that starts with an unknown rule, such as the misspelling Alow instead of Allow, it will cause this error.
- Syntax not understood: every line must have a colon ":". If a line does not, you will see this error.
There is more information on how to fix these errors on the Robots.txt tester tool page.
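If you also like to sanity-check your rules with a script, Python's built-in urllib.robotparser module can parse a robots.txt and tell you how a crawler would treat a given URL. Here is a rough sketch using the example rules from earlier; it is only a quick check, not a replacement for the tester tool:
from urllib.robotparser import RobotFileParser

# The example rules from above. urllib.robotparser applies rules in the order
# they appear, so the Disallow line is listed before the broad Allow here.
rules = """\
User-agent: *
Disallow: /wp-admin
Allow: /
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# How would a crawler such as Googlebot treat these URLs?
print(parser.can_fetch("Googlebot", "https://example.com/wp-admin"))   # False: blocked
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))  # True: crawlable
print(parser.site_maps())  # ['https://example.com/sitemap.xml'] (Python 3.8 and later)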
Robots txt for SEO, Final Thoughts
This blog post has given you a quick overview of exactly what robots.txt is: a small text file that sets the rules for crawlers on your site.
It’s a powerful and often overlooked SEO best practice that gives you control over what crawlers do when they visit.
By optimizing your robots.txt you will not waste crawl budget on areas of your site that are of no interest to your customers and that you do not want to show in Google Search results.
Once you have created a robots.txt file make sure there are no errors by checking it with the Robots.txt tester tool.
This will give you feedback on any errors and how to fix them.
Once it has passed all the tests, add it to your site at the root, for example, https://example.com/robots.txt.
Crawlers will start using it on their next visit.