Robots.txt is a little text file that exists on almost all websites.
The good news is this little file can give your site an SEO boost!
Yet, many people are not aware this file exists.
Robots.txt is powerful, and it is easy to understand once you know what to look for.
Optimizing this file often ends up at the bottom of the to-do list, behind more exotic SEO practices.
Yet an optimized robots.txt file helps SEO, and it is a simple change that you should make on your site.
Simply put, an optimized robots.txt can help Google crawl your pages faster.
Let's take a look at how you can optimize your robots.txt file.
What is Robots.txt?
Robots.txt is a file that is read by a “robot” when visiting your site.
These robots are software programs such as spiders or crawlers. Search engines created these crawlers to read the contents of your site. Googlebot and Bingbot are two examples of these types of crawlers.
When Googlebot visits your site, it reads the pages it is allowed to crawl. Once finished, it adds them to the Google index, which makes them eligible to appear in Google Search results.
Googlebot will first have a look at your robots.txt file before it visits any other pages on the site.
This little text file tells the crawler what the “house rules” are. Rules like:
Which pages Googlebot is allowed to read
Which pages Googlebot is not allowed to visit
Where Googlebot can find all the pages of the site
The crawler will then follow all the rules listed.
To find the robots.txt file, all you need to do is add /robots.txt to the end of a website's domain, for example https://example.com/robots.txt.
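If you want to do this from a script, the sketch below builds the robots.txt address from any page URL on a site. The function name `robots_url` and the example URL are made up for illustration; only the standard library is used.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?id=1"))
# https://example.com/robots.txt
```

Whatever page you start from, the file always sits at the root of the site.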
Below is a very simple robots.txt file, like the one you would see after a WordPress installation:
    User-agent: *
    Allow: /
    Disallow: /wp-admin
    Sitemap: https://example.com/sitemap.xml
Let's break down each rule and look at what we are telling the crawlers.
User-agent: * lets you target specific crawlers. The * means all crawlers.
Allow: / tells the crawler which pages it may visit. This rule allows the bot to crawl all pages on the site.
Disallow: /wp-admin stops the crawler from visiting the admin area of the site.
Sitemap: https://example.com/sitemap.xml is a link to the sitemap URL. Crawlers use the sitemap to find all the pages on a site. Notice that this is a full URL, including the https://, rather than just a path like the Allow and Disallow rules.
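To see how a crawler might apply these four rules, here is a rough sketch in Python. It assumes the "longest matching path wins, and Allow wins a tie" behavior that Google documents and that the Robots Exclusion Protocol standard describes. The helper names `parse_rules` and `is_allowed` are made up for this example, and the code is deliberately simplified: it only reads the `*` group and does not handle wildcards inside paths.

```python
def parse_rules(robots_txt):
    """Collect (kind, path) rules for the '*' group plus any Sitemap URLs."""
    rules, sitemaps, applies = [], [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            applies = value == "*"  # simplification: only the catch-all group
        elif field in ("allow", "disallow") and applies and value:
            rules.append((field, value))
        elif field == "sitemap":
            sitemaps.append(value)
    return rules, sitemaps


def is_allowed(rules, path):
    """Longest matching rule wins; Allow wins a tie; no match means allowed."""
    best_kind, best_len = "allow", -1
    for kind, prefix in rules:
        if path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and kind == "allow"
            if longer or tie_for_allow:
                best_kind, best_len = kind, len(prefix)
    return best_kind == "allow"


SAMPLE = """\
User-agent: *
Allow: /
Disallow: /wp-admin
Sitemap: https://example.com/sitemap.xml
"""

rules, sitemaps = parse_rules(SAMPLE)
print(is_allowed(rules, "/blog/hello-world"))      # True
print(is_allowed(rules, "/wp-admin/options.php"))  # False
print(sitemaps)                                    # ['https://example.com/sitemap.xml']
```

Because /wp-admin is a longer match than /, the Disallow rule wins for the admin area even though Allow: / covers the whole site.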
There is one more rule you may see: Crawl-delay, followed by a number.
This tells the crawler how fast it should crawl the site. Yet many crawlers ignore this rule, and Googlebot is one of them.
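As a small illustration, Python's standard `urllib.robotparser` module can read a crawl speed rule, commonly written as Crawl-delay. The value 10 below is an assumption picked for the example, not a recommendation. The parser only reports the number; whether a crawler respects it is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file contents; the delay of 10 is an assumed value.
WITH_DELAY = """\
User-agent: *
Crawl-delay: 10
Disallow: /wp-admin
"""

WITHOUT_DELAY = """\
User-agent: *
Disallow: /wp-admin
"""


def read_crawl_delay(robots_txt, user_agent="*"):
    """Return the Crawl-delay for user_agent, or None if the file sets none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


print(read_crawl_delay(WITH_DELAY))     # 10
print(read_crawl_delay(WITHOUT_DELAY))  # None
```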
As a site grows the robots.txt file can get large. You can add as many Allow and Disallow rules as you need.
As the size of the file increases so does the complexity. So we need to make sure to test the file to see if there are any errors.
Testing your Robots.txt
Once you have created your robots.txt file you will want to test it to make sure that it is valid.
An error in your file can cause a crawler to ignore or misread your rules.
So it is a good idea to test your robots.txt file after every change.
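For a quick sanity check between changes, a few lines of Python can catch the most common slips, such as a misspelled rule name or a Sitemap line that is not a full URL. This is a rough sketch, not a replacement for a proper testing tool: the function `lint_robots` and its list of known fields are assumptions made for this example and cover only the rules discussed in this article.

```python
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots(robots_txt):
    """Return a list of (line_number, message) pairs for suspicious lines."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank lines and comments are fine
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field == "user-agent":
            seen_user_agent = True
        elif field in ("allow", "disallow"):
            if not seen_user_agent:
                problems.append((number, "rule appears before any User-agent line"))
            if value and not value.startswith(("/", "*")):
                problems.append((number, "path should start with '/'"))
        elif field == "sitemap" and not value.startswith(("http://", "https://")):
            problems.append((number, "Sitemap should be a full URL"))
    return problems


BROKEN = """\
User-agent: *
Allow: /
Dissallow: /wp-admin
Sitemap: /sitemap.xml
"""

for number, message in lint_robots(BROKEN):
    print(f"line {number}: {message}")
# line 3: unknown field 'dissallow'
# line 4: Sitemap should be a full URL
```

An empty result only means none of these simple checks fired, so it is still worth testing the live file against the pages you care about.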