Robots txt File Example: 10 Templates To Use
We are going to look at 10 robots.txt file examples.
You can either copy them to your site or combine the templates to make your own.
Remember that your robots.txt affects your SEO, so be sure to test any changes you make.
Let's get started.
1) Disallow All
The first template will stop all bots from crawling your site. This is useful for many reasons. For example:
- The site is not ready yet
- You do not want the site to appear in Google Search results
- It is a staging website used to test changes before they go to production
Whatever the reason, this is how you would stop all web crawlers from reading your pages:
User-agent: *
Disallow: /
Here we have introduced two “rules”:
- User-agent - Targets a specific bot, or use * as a wildcard to match all bots
- Disallow - Tells a bot that it cannot go to this area of the site. By setting this to a /, the bot will not crawl any of your pages
What if we want the bot to crawl the whole site?
2) Allow All
If you do not have a robots.txt file on your site, then by default a bot will crawl the entire website. One option, then, is to not create a robots.txt file, or to remove the one you have.
Yet, sometimes this is not possible and you have to add something. In this case, we would add the following:
User-agent: *
Disallow:
At first, this seems strange as we still have the Disallow rule in place. Yet, it is different as it does not contain the /. When a bot reads this rule, it will see that no URLs are disallowed.
In other words, the whole site is open.
3) Block a Folder
There are times when you need to block an area of a site but allow access to the rest. A good example of this is the admin area of a site.
The admin area may allow admins to log in and change the content of the pages. We don't want bots looking in this folder, so we can disallow it like this:
User-agent: *
Disallow: /admin/
Now the bot will ignore this area of the site.
4) Block a File
The same is true for files. There may be a specific file that you don't want ending up in Google Search. Again, this could be an admin page or something similar.
To block bots from this file, you would use this robots.txt:
User-agent: *
Disallow: /admin.html
This will allow the bot to crawl all of the website except the /admin.html file.
5) Disallow a File Extension
What if you want to block all files with a specific file extension? For example, you may want to block the PDF files on your site from ending up in Google Search. Or you have spreadsheets that you don't want Googlebot to waste time reading.
In this case, you can use two special characters to block these files:
- * - This is a wildcard and will match any text
- $ - The dollar sign represents the end of the URL and stops the matching there
When used together you can block PDF files like this:
User-agent: *
Disallow: /*.pdf$
Or .xls files like this:
User-agent: *
Disallow: /*.xls$
Notice how the disallow rule has /*.xls$. This means that it will match all these URLs:
https://example.com/files/spreadsheet1.xls
https://example.com/files/folder2/profit.xls
https://example.com/users.xls
Yet, it would not match this URL:
https://example.com/pink.xlsocks
Because the URL does not end with .xls.
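If you want to block more than one file type, you could stack Disallow rules under the same User-agent group. For example, combining the two rules above would block both PDF and Excel files:
User-agent: *
Disallow: /*.pdf$
Disallow: /*.xls$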
6) Allow Only Googlebot
You can also add rules that apply to a specific bot. You can do this using the User-agent rule; so far we have used a wildcard that matches all bots.
If we wanted to allow only Googlebot to view the pages on the site, we could add this robots.txt:
User-agent: *
Disallow: /
User-agent: Googlebot
Disallow:
7) Disallow a Specific Bot
Like the above example, we can allow all bots but disallow a single bot. This is what the robots.txt file would look like if we wanted to block Googlebot only:
User-agent: Googlebot
Disallow: /
User-agent: *
Disallow:
There are many bot user agents. Here is a list of common ones that you can create rules for (there is a combined example after the list):
- Googlebot - Used for Google Search
- Bingbot - Used for Bing Search
- Slurp - Yahoo's web crawler
- DuckDuckBot - Used by the DuckDuckGo search engine
- Baiduspider - Used by Baidu, a Chinese search engine
- YandexBot - Used by Yandex, a Russian search engine
- facebot - Used by Facebook
- Pinterestbot - Used by Pinterest
- TwitterBot - Used by Twitter
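For example, if you wanted to block Bingbot and YandexBot but still allow every other bot, you could combine the groups like this:
User-agent: Bingbot
Disallow: /

User-agent: YandexBot
Disallow: /

User-agent: *
Disallow: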
8) Link to your Sitemap
When a bot visits your site, it needs to find all the links on your pages. A sitemap lists all the URLs on your site. By adding your sitemap to your robots.txt, you make it easier for a bot to find all the links on your site.
To do this, you need to use the Sitemap rule:
User-agent: *
Sitemap: https://pagedart.com/sitemap.xml
The above is from the PageDart robots.txt file. You can also list more than one sitemap if you have different sitemaps for each language.
The sitemap URL must be the full URL with https:// at the front for it to work.
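If you do have more than one sitemap, you can list each one on its own Sitemap line. For example (the URLs here are placeholders), a robots.txt with an English and a French sitemap could look like this:
Sitemap: https://example.com/sitemap-en.xml
Sitemap: https://example.com/sitemap-fr.xml

User-agent: *
Disallow: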
9) Slow the Crawl Speed
It is possible to control the speed at which a bot will look at pages on your site. This can be useful if your web server struggles under high traffic.
Bing, Yahoo and Yandex all support the Crawl-delay rule. This allows you to set a delay between each page view like this:
User-agent: *
Crawl-delay: 10
In the example above, the bot would wait 10 seconds before requesting the next page. You can set a delay from 1 to 30 seconds.
Google does not support this rule as it is not part of the original robots.txt specification.
10) Draw a Robot
This last template is for a bit of fun. You can add ASCII art to add a robot to your robots.txt file like this:
#       _
#      [ ]
#      ( )
#      |>|
#   __/===\__
#  //| o=o |\\
# <] | o=o | [>
#    \=====/
#   / / | \ \
#  <_________>
If someone does come and take a look at your robots.txt file, then it might make them smile.
Some companies already do this. Airbnb has an advert in their robots.txt file:
https://www.airbnb.co.uk/robots.txt
NPM has a robot in its robots.txt:
https://www.npmjs.com/robots.txt
Avvo.com has an ASCII drawing of Grumpy Cat:
https://www.avvo.com/robots.txt
But my favorite is Robinhood.com:
https://robinhood.com/robots.txt
Wrapping Up: Robots txt File Examples
We have looked at 10 different robots.txt templates that you can use on your site.
These examples include:
- Disallow all bots from the whole site
- Allow all bots everywhere
- Block a folder from the crawl
- Block a file from the crawl
- Disallow a file extension
- Allow only a single bot
- Disallow a specific bot
- Link to your sitemap
- Slow the rate at which a bot crawls your site
- Draw some artwork in your robots.txt file
Remember that you can combine parts of these templates any way you like, as long as the rules are valid. To check that your robots.txt is valid, you can use our robots.txt checker.
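If you prefer to test rules yourself, Python's built-in urllib.robotparser module can parse a robots.txt file and tell you whether a given user agent may fetch a URL. Here is a small sketch using the folder-blocking template from section 3 (note that this standard-library parser follows the original specification, so it does not understand the * and $ wildcards from section 5):
from urllib.robotparser import RobotFileParser

# The folder-blocking template from section 3: block /admin/ for every bot
rules = [
    "User-agent: *",
    "Disallow: /admin/",
]

parser = RobotFileParser()
parser.parse(rules)

# can_fetch(user_agent, url) is True when the rules allow the URL to be crawled
print(parser.can_fetch("Googlebot", "https://example.com/admin/settings"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))       # True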