You wake up in the morning and pour a cup of coffee. The first job is to check your email.
And there it is, a message from Google Search Console telling you that you have:
Submitted URL Blocked by Robots txt
You open up the search console and see this:
Ready? Caffeinated? Let's get this sorted.
What is a Robots.txt file?
What is a robots.txt and why do you have one?
When a web crawler such as Googlebot visits your site it starts the visit by looking at your robots.txt file.
You can find this file at the root of your domain, for example at https://www.example.com/robots.txt.
Here are a few examples that you can take a look at:
The robots.txt file is a list of rules that Googlebot follows when it visits your site.
The rule we want to focus on here is the Disallow rule. This tells Googlebot not to visit an area of your site.
For example, this robots.txt would tell Googlebot to ignore the admin area:
Notice also that the User-agent here is ‘*’. This is a wildcard that matches all web crawlers, including Googlebot.
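If you want to see how a crawler reads a rule like this, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content and the example.com URLs are made up for illustration, and Python's parser is not Googlebot, so treat the result as an approximation of what Google would do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /admin.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch(user_agent, url) returns False when a Disallow rule matches the URL.
admin_allowed = parser.can_fetch("Googlebot", "https://www.example.com/admin/login")
blog_allowed = parser.can_fetch("Googlebot", "https://www.example.com/blog/post")

print("admin allowed:", admin_allowed)
print("blog allowed:", blog_allowed)
```

The admin URL comes back as blocked, while the blog URL is still allowed because no rule matches it.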
If you wanted to have rules for Googlebot only you could add this:
This means that Disallow: /admin will only be followed by Googlebot.
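To check that a Googlebot-only rule really leaves other crawlers alone, you could run a sketch like the one below. The robots.txt content, the example.com URL, and the second crawler name are assumptions chosen for the example, and Python's urllib.robotparser only approximates Google's own matching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the rule sits in a group that names Googlebot only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /admin
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://www.example.com/admin/login"
googlebot_allowed = parser.can_fetch("Googlebot", url)
other_allowed = parser.can_fetch("Bingbot", url)  # no group applies to this crawler

print("Googlebot allowed:", googlebot_allowed)
print("Other crawler allowed:", other_allowed)
```

Only Googlebot is blocked here. A crawler with no matching group and no wildcard group has no rules to follow.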
Now that we know how to block Googlebot from reading parts of your site, let's fix this error.
How to Solve the Submitted URL Blocked by Robots txt Error
To solve this issue we need to open up Google Search Console and go to the coverage section.
This will show a list of any errors found when Google last crawled your site.
In the example below, you can see that this site has two pages with the “Submitted URL Blocked by Robots txt” error.
Click on the error and it will take you to a screen listing all the pages with that error.
To investigate a page we can click on it, which opens the side menu.
This displays the links to tools that we will use to investigate the error.
- Inspect URL - This allows you to run a manual Googlebot test of the page to see if it is available.
- Test Robots.txt Blocking - This allows you to test your robots.txt file. It also can test if the robots file is blocking a page.
Let's continue the investigation by using the “Test Robots.txt Blocking” tool.
This will open up a new page like this:
Google usually caches your robots.txt file for up to 24 hours.
You can see above the robots.txt file that there is a date. This shows you when Google last fetched your robots.txt file and cached it.
The first thing we can do is check to make sure that the cached version is the same as the live version.
There is a link in the top right to open the “live robots.txt file”.
Click on this link and it will open a new tab with your live robots.txt file. Make sure that Google's cached version and the live version match.
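Comparing the two versions by eye gets tricky with a long file. If you copy both versions into text, a small sketch like this can list the differences for you. It uses Python's standard difflib module. The function name and the two sample files are made up for the example, and you would need to paste in the cached copy yourself.

```python
import difflib


def robots_diff(cached_text, live_text):
    """Return unified-diff lines; an empty list means the two copies match."""
    return list(
        difflib.unified_diff(
            cached_text.splitlines(),
            live_text.splitlines(),
            fromfile="cached",
            tofile="live",
            lineterm="",
        )
    )


# Hypothetical contents: the cached copy still blocks /blog, the live one does not.
cached = "User-agent: *\nDisallow: /blog\n"
live = "User-agent: *\nDisallow: /admin\n"

for line in robots_diff(cached, live):
    print(line)
```

Lines starting with a minus sign exist only in the cached copy, and lines starting with a plus sign exist only in the live file.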
If the live version is different then you can press the submit button on the “Test Robots.txt Blocking” tool.
This will bring up a menu:
Press the submit button again and it will fetch the live version giving you the message:
Success! Reload the Tester page in a minute to confirm the timestamp.
Close this menu and refresh the tester page.
You will see that the robots.txt file and the date of the cache have both been updated.
Now that we are sure that Google has the latest robots.txt file we can use the second tool.
Jump back to Google Search Console.
Click on “Inspect URL”.
This will open the inspect page giving you details about any issues Google is having.
In the example above, the robots.txt file has blocked the indexed page and Google cannot update it.
Now that we have updated the robots.txt, click on the “Test Live URL” button to see if the new robots.txt file has helped.
As you can see below, Googlebot is still blocked by the robots.txt file from reading the page. So we need to investigate whether there is a rule in the robots.txt file that is blocking Googlebot.
Copy the URL and jump back to the “Robots.txt Tester” tool.
This tool has one more feature: it allows you to test a URL against the robots.txt file.
Paste the URL into the text box and hit the “Test” button. The text box takes a path, so make sure to remove your domain name from the URL and enter only the part that comes after it.
When a rule in the robots.txt file is blocking Googlebot, the tool highlights that rule in red.
If a rule matches and it is causing the error, you should fix this by removing the rule from the file.
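If you would like to track down the offending rule outside the tester, the sketch below lists the Disallow rules that match a URL, along with their line numbers. It is a simplified approximation and not Google's matcher: it only does prefix matching, and it ignores Allow rules and the * and $ wildcards inside paths. The function name, the sample file, and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit


def candidate_blocking_rules(robots_txt, url, agent="googlebot"):
    """Return (line_number, rule_path) for Disallow rules that prefix-match the URL.

    Simplified: ignores Allow rules and * / $ wildcards inside rule paths.
    """
    path = urlsplit(url).path or "/"
    groups = []  # each group: {"agents": [...], "rules": [(line_no, path)]}
    current = None
    reading_agents = False
    for line_no, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:  # a new group starts here
                current = {"agents": [], "rules": []}
                groups.append(current)
                reading_agents = True
            current["agents"].append(value.lower())
        elif field in ("allow", "disallow"):
            reading_agents = False
            if field == "disallow" and current is not None and value:
                current["rules"].append((line_no, value))
    # Prefer groups that name the crawler; otherwise fall back to the wildcard group.
    chosen = [g for g in groups if agent in g["agents"]]
    if not chosen:
        chosen = [g for g in groups if "*" in g["agents"]]
    return [(n, rule) for g in chosen for n, rule in g["rules"] if path.startswith(rule)]


# Hypothetical robots.txt and URL.
robots = "User-agent: *\nDisallow: /admin\nDisallow: /private\n"
print(candidate_blocking_rules(robots, "https://www.example.com/admin/users"))
```

Each result pairs a line number in the robots.txt file with the rule found on that line, which makes it easier to find the rule you need to remove.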
Once you have removed the rule you can upload the new robots.txt file to your web server.
At this point, you can use the Inspect URL tool in Google Search Console to check the URL again.
You should see this:
What if I want to Stop Indexing
You may also get this error when you have submitted a page in your sitemap that you do want to block.
This can happen when you have a disallow rule in your robots.txt file and the same URL is in your sitemap.
If you don't want the URL to appear in the Google Index then you need to remove it from the sitemap.
To easily locate your sitemap, look for the Sitemap: rule in your robots.txt.
If there is no Sitemap: rule in your robots file, then take a look at our find your sitemap guide.
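For a long robots file, you could also pull out the Sitemap: lines with a few lines of Python. This sketch uses the site_maps() method of urllib.robotparser, which needs Python 3.8 or later. The robots.txt content and the sitemap URL are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one Sitemap: line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns None when the file has no Sitemap: lines.
sitemaps = parser.site_maps() or []
print(sitemaps)
```

An empty list here means the robots file does not point to a sitemap at all.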
Once you have located the sitemap, you can search it for the URL.
If the URL is there, remove it from your sitemap and then resubmit your sitemap to Google Search Console.
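If your sitemap is large, searching it by hand is slow. The sketch below lists every sitemap URL that your robots.txt blocks, which are the URLs likely to trigger this error. It assumes a plain XML sitemap (not a sitemap index) that you have already downloaded as text. The function name and sample data are hypothetical, and Python's robots parser only approximates Googlebot.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(robots_txt, sitemap_xml, agent="Googlebot"):
    """Return the sitemap URLs that the robots.txt rules disallow for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt and sitemap.
ROBOTS = "User-agent: *\nDisallow: /admin\n"
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/admin/login</loc></url>
</urlset>
"""

print(blocked_sitemap_urls(ROBOTS, SITEMAP_XML))
```

Every URL in the result is both submitted in the sitemap and blocked by a rule, so you need to either remove it from the sitemap or remove the rule.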
It's Still Not Fixed
It is possible to block Googlebot from reading your entire site if your server is returning an error.
When Googlebot requests your robots.txt file, your server should return a 200 status, meaning that the request was successful.
But if your server returns a 503 error, then Googlebot cannot read your robots.txt file and will not crawl any of your site.
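You can also check the status code yourself. The sketch below fetches a robots.txt URL with Python's standard urllib and describes the result. The function names and the User-Agent string are made up, and the descriptions reflect my reading of Google's published documentation on how it handles robots.txt status codes, so treat them as assumptions and not as a guarantee.

```python
import urllib.error
import urllib.request


def robots_status(url, timeout=10.0):
    """Return the HTTP status code the server sends for a robots.txt URL."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "robots-status-check"}  # hypothetical agent name
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx and 5xx responses arrive as exceptions


def describe_status(code):
    """Describe how a crawler is likely to react to the status code (assumption)."""
    if 200 <= code < 300:
        return "ok: the rules are read as served"
    if code == 429 or 500 <= code < 600:
        return "server error: crawling may stop until the file is reachable"
    if 400 <= code < 500:
        return "client error: treated as if there is no robots.txt"
    return "other: check for redirects"


# Example call (needs network access, so it is left commented out):
# print(describe_status(robots_status("https://www.example.com/robots.txt")))
print(describe_status(503))
```

Keep in mind that this request does not come from Googlebot. If your server or firewall treats Googlebot differently, the result can differ from what Google sees, so the Inspect URL tool remains the better test.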
We can use the Inspect URL tool in Google Search Console to check for this.
Usually, you use this tool to inspect a URL that you want to index. Here we are using it to test that Googlebot can read the robots file.
Enter the URL of your robots.txt file into the search box:
It will show that the “URL is not on Google”. This is fine, because we want to test the live URL.
Clicking the “TEST LIVE URL” button will make the Googlebot fetch the page. We want to make sure that it can.
Once it's fetched and if all is ok you will see:
If Googlebot cannot read the file then you need to find out why by clicking the “VIEW TESTED PAGE” button. This will give you more details about any error found.
Submitted URL Blocked by Robots txt, Final Thoughts
We have covered the common and not so common solutions to this problem:
- The indexed URL is unavailable because of a rule in the robots file
- The URL was indexed by mistake and should be removed from the sitemap
- The robots file is being blocked by the web server for Googlebot
Using the tools provided by Google Search Console we can zoom in on the problem:
- Inspect URL to make sure that Googlebot can read the site
- Robots txt Tester tool to check that the Google cache and the rules are correct
With this information, you can fix the pages and remove the errors.
For help with other common Google Search Console errors, check out the submitted URL has crawl issue page.