Robots txt File Checker
Use our robots.txt file checker below to test that your robots.txt file is working.
Copy and paste your robots.txt file into the textbox below. You can find your robots.txt file by adding /robots.txt to your domain, such as https://example.com/robots.txt.
To create this tool we analyzed over 5,000 robots.txt files. We found 7 common errors during our research.
Once we found these errors, we then learned how to fix them. Below, you will find detailed instructions on how to fix each of them.
Continue reading to find out why we built this tool and how we completed the research.
Why we Built the Tool
When a crawler such as Googlebot visits your site, it will read the robots.txt file before it looks at any other page.
It will use the robots.txt file to check where it can go and where it can't.
It will also look for your sitemap, which will list all the pages on your site.
Each line in a robots.txt file is a rule that the crawler should follow.
If a rule has an error, then the crawler will ignore it.
This tool provides an easy way to quickly check if the robots.txt file has any errors.
We also show you how to fix each one.
For a more detailed look at how important the robots.txt file is, have a look at the Robots txt for SEO post.
How we Analyzed 5000+ Robots.txt
We grabbed a list of the top 1 million websites according to Alexa.
They have a CSV you can download with a list of all the URLs.
We found that not every site has or needs a robots.txt file.
In order to collect over 5,000 robots.txt files we had to look at over 7,500 websites.
This means that of the top 7,541 websites on the internet, 24% have no robots.txt file.
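If you want to run a similar crawl yourself, here is a rough Python sketch of the approach. The file name top-sites.csv and the helper function are our own placeholders rather than part of the tool, and the script assumes the requests library is installed.

```python
# Rough sketch of fetching robots.txt files for a list of domains.
# "top-sites.csv" is a hypothetical file with rows like "1,google.com".
import csv
import requests

def fetch_robots(domain):
    """Return the robots.txt body for a domain, or None if it is missing."""
    try:
        resp = requests.get(f"https://{domain}/robots.txt", timeout=10)
        if resp.status_code == 200:
            return resp.text
    except requests.RequestException:
        pass
    return None

robots_files = {}
with open("top-sites.csv", newline="") as f:
    for row in csv.reader(f):
        domain = row[-1]  # assumes the domain is the last column of each row
        body = fetch_robots(domain)
        if body is not None:
            robots_files[domain] = body

print(f"Collected {len(robots_files)} robots.txt files")
```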
Of the 5,000+ robots.txt files we did analyze, we found 7 common errors:
- Pattern should either be empty, start with “/” or “*”
- “$” should only be used at the end of the pattern
- No user-agent specified
- Invalid sitemap URL protocol
- Invalid sitemap URL
- Unknown directive
- Syntax not understood
We will go through each of these errors and how to fix them below.
But first, here is what we found from our analysis.
Of the 5,732 robots.txt files we analyzed, only 188 had errors.
We also found that 51% of those had more than one error, often the same error repeated.
Let's look at how many times each error occurred:
| Error | Count |
|---|---|
| Pattern should either be empty, start with “/” or “*” | 11660 |
| “$” should only be used at the end of the pattern | 15 |
| No user-agent specified | 461 |
| Invalid sitemap URL protocol | 0 |
| Invalid sitemap URL | 29 |
| Unknown directive | 144 |
| Syntax not understood | 146 |
As you can see, the “Pattern should either be empty, start with “/” or “*”” error is by far the most common.
Once we had the data we were able to understand and fix the errors.
Pattern should either be empty, start with “/” or “*”
This was the most common error we found during the analysis, and that is not surprising.
This error refers to both the Allow and Disallow rules, which are the most common rules found in a robots.txt file.
If you get this error, it means that the first character after the colon is not a “/” or a “*”.
For example, Allow: admin would cause this error.
The correct way to format this would be Allow: /admin.
The wildcard (*) is used to allow all or disallow all. For example, it is common to see this when you want to stop the site from being crawled:
Disallow: *
To fix this error, make sure that you have either a “/” or a “*” after the colon.
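If you want to reproduce this check in your own scripts, here is a minimal Python sketch of the rule the error describes. The function name is our own placeholder and is not part of the tool.

```python
# Minimal sketch of the "empty, / or *" rule for Allow/Disallow patterns.
def pattern_is_valid(value):
    """A pattern must be empty or start with "/" or "*"."""
    value = value.strip()
    return value == "" or value.startswith("/") or value.startswith("*")

print(pattern_is_valid("admin"))   # False - triggers the error
print(pattern_is_valid("/admin"))  # True
print(pattern_is_valid("*"))       # True
```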
“$” should only be used at the end of the pattern
You may have a dollar sign in your robots.txt file.
You can use this to block a particular file type.
For example, if you wanted to block the crawling of all .xls files, you could use:
User-agent: *
Disallow: /*.xls$
The $ sign tells the crawler that this is the end of the URL. So this rule would disallow:
https://example.com/pink.xls
But it would still allow:
https://example.com/pink.xlsocks
If you do not have the dollar sign at the end of the pattern, for example:
User-agent: *
Disallow: /*$.xls
This will cause the error message. To fix it, move the $ to the end:
User-agent: *
Disallow: /*.xls$
So only use the $ sign at the end of the pattern to match file types.
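Here is a minimal Python sketch of this rule, in case you want to test patterns in your own scripts; again, the helper name is our own and not part of the tool.

```python
# Sketch of the "$ only at the end of the pattern" rule.
def dollar_is_valid(pattern):
    """A "$" is only allowed as the very last character of a pattern."""
    return "$" not in pattern or (pattern.count("$") == 1 and pattern.endswith("$"))

print(dollar_is_valid("/*.xls$"))  # True
print(dollar_is_valid("/*$.xls"))  # False - triggers the error
```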
No user-agent specified
In your robots.txt file you must specify at least one User-agent. You use User-agent to identify and target specific crawlers.
If you wanted to target only the Googlebot crawler, you would use:
User-agent: Googlebot
Disallow: /
There are quite a few crawlers in use:
- Googlebot
- Bingbot
- Slurp
- DuckDuckBot
- Baiduspider
- YandexBot
- facebot
- ia_archiver
If you want to have different rules for each crawler, you can list them like this:
User-agent: Googlebot
Disallow: /
User-agent: Bingbot
Allow: /
You can also use a “*”. This is a wildcard and means that the rule will match all crawlers.
User-agent: *
Allow: /
Make sure that you have at least one User-agent set.
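As a rough illustration, this is the kind of check the error implies, sketched in Python; the helper is our own simplification rather than the tool's actual code.

```python
# Sketch of checking that at least one User-agent line exists.
def has_user_agent(robots_txt):
    for line in robots_txt.splitlines():
        directive, _, _ = line.partition(":")
        if directive.strip().lower() == "user-agent":
            return True
    return False

print(has_user_agent("User-agent: *\nAllow: /"))  # True
print(has_user_agent("Allow: /"))                 # False - triggers the error
```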
Invalid sitemap URL protocol
When you link to your sitemap from your robots.txt file, you must include the full URL.
This URL must be an absolute URL, such as https://www.example.com/sitemap.xml.
The protocol is the https part of the URL. For a sitemap URL you can use HTTPS, HTTP or FTP. If you have anything else, you will see this error.
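Here is a small Python sketch of that protocol check, assuming HTTPS, HTTP and FTP are the only allowed schemes; it uses the standard urllib.parse module and is only an illustration.

```python
# Sketch of the sitemap protocol check.
from urllib.parse import urlparse

def sitemap_protocol_is_valid(url):
    return urlparse(url).scheme.lower() in ("https", "http", "ftp")

print(sitemap_protocol_is_valid("https://www.example.com/sitemap.xml"))  # True
print(sitemap_protocol_is_valid("gopher://example.com/sitemap.xml"))     # False - triggers the error
```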
Invalid sitemap URL
You can link to a sitemap from your robots.txt file. It must be a full (absolute) URL. For example, https://www.example.com/sitemap.xml would be an absolute URL.
If you do not use an absolute URL, for example:
User-agent: *
Allow: /
Sitemap: /sitemap.xml
This will cause the error. To fix it, change the sitemap to an absolute URL:
User-agent: *
Allow: /
Sitemap: https://www.example.com/sitemap.xml
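As an illustration, an absolute URL is one that has both a scheme and a host. A minimal Python sketch of that check (our own, not the tool's code) looks like this:

```python
# Sketch of the absolute-URL check for the Sitemap value.
from urllib.parse import urlparse

def sitemap_url_is_absolute(url):
    parts = urlparse(url)
    return bool(parts.scheme) and bool(parts.netloc)

print(sitemap_url_is_absolute("https://www.example.com/sitemap.xml"))  # True
print(sitemap_url_is_absolute("/sitemap.xml"))                         # False - triggers the error
```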
Unknown directive
When writing a rule, there is only a fixed number of “directives” you can use. These are the commands you type before the colon “:”. Allow and Disallow are both directives.
Here is a list of all the valid directives:
- Sitemap
- User-agent
- Allow
- Disallow
- Crawl-delay
- Clean-param
- Host
- Request-rate
- Visit-time
- Noindex
If you use anything outside the list above, you will see this error.
From our research, the most common cause of this issue is a typo in the spelling of the directive.
Fix the typo and retest.
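If you want to check directives yourself, here is a rough Python sketch that compares the directive against the list above; the set and function names are our own, not the tool's.

```python
# Sketch of the "unknown directive" check, using the list of valid directives above.
VALID_DIRECTIVES = {
    "sitemap", "user-agent", "allow", "disallow", "crawl-delay",
    "clean-param", "host", "request-rate", "visit-time", "noindex",
}

def directive_is_known(line):
    directive, _, _ = line.partition(":")
    return directive.strip().lower() in VALID_DIRECTIVES

print(directive_is_known("Disallow: /admin"))  # True
print(directive_is_known("Disalow: /admin"))   # False - a typo triggers the error
```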
Syntax not understood
You will see this error when there is no colon on the line.
There must be a colon on every line to separate the directive from the value.
This would cause the error:
User-agent: *
Allow /
To fix it, add the colon (spot the difference):
User-agent: *
Allow: /
Place the colon after the directive to fix the issue.
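Here is a minimal Python sketch of that colon check, ignoring blank lines and comments; it is only an illustration of the rule, not the tool's actual code.

```python
# Sketch of the "Syntax not understood" check: every rule line needs a colon.
def line_has_colon(line):
    line = line.strip()
    return line == "" or line.startswith("#") or ":" in line

print(line_has_colon("Allow: /"))  # True
print(line_has_colon("Allow /"))   # False - triggers the error
```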
Wrapping Up, Robots txt File Checker
This tool can help you check for the most common errors found in robots.txt files.
By copying and pasting your robots.txt file into the tool above, you can check that your file is error-free.
We check for 7 errors including:
- Pattern should either be empty, start with “/” or “*”
- “$” should only be used at the end of the pattern
- No user-agent specified
- Invalid sitemap URL protocol
- Invalid sitemap URL
- Unknown directive
- Syntax not understood
Once you find out which line the error is on, you can fix it using the tips provided.