The first template will stop all bots from crawling your site. This is useful in several situations. For example:
The site is not ready yet
You do not want the site to appear in Google Search results
It is a staging website used to test changes before they go to production
Whatever the reason, this is how you would stop all web crawlers from reading the pages:
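# Applies to all bots
User-agent: *
# Blocks every URL on the site
Disallow: /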
Here we have introduced two “rules”. They are:
User-agent - Targets a specific bot. Name one bot, or use * as a wildcard to match all bots
Disallow - Tells a bot that it cannot visit this area of the site. Setting this to / stops the bot from crawling any of your pages
What if we want the bot to crawl the whole site?
2) Allow All
If your site has no robots.txt file, a bot will crawl the entire website by default. One option, then, is to not create a robots.txt file at all, or to remove the one you have.
Sometimes, though, this is not possible and you have to put something in place. In that case, we would add the following:
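User-agent: *
# An empty Disallow value matches no URLs, so nothing is blocked
Disallow: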
At first this seems strange, as we still have the Disallow rule in place. The difference is that it no longer contains the /. When a bot reads this rule, it sees that no URLs are disallowed.
In other words, the whole site is open.
3) Block a Folder
There are times when you need to block one area of a site but allow access to the rest. A good example of this is the admin area of a site.
The admin area may allow admins to log in and change the content of the pages. We don't want bots looking in this folder, so we can disallow it like this:
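# /admin/ is only an example path; swap in the folder you want to block
User-agent: *
Disallow: /admin/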
Now the bot will ignore this area of the site.
4) Block a file
The same is true for files. There may be a specific file that you don't want ending up in Google Search. Again, this could be an admin page or something similar.
To block bots from this file, you would use this robots.txt:
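User-agent: *
Disallow: /admin.html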
This will allow the bot to crawl the whole website except the /admin.html file.
5) Disallow a File Extension
What if you want to block all files with a specific file extension? For example, you may want to stop the PDF files on your site from ending up in Google Search. Or you may have spreadsheets that you don't want Googlebot to waste time reading.
In this case, you can use two special characters to block these files:
* - A wildcard that matches any sequence of characters
$ - Marks the end of the URL, so matching stops there
When used together you can block PDF files like this:
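User-agent: *
# Blocks any URL that ends with .pdf
Disallow: /*.pdf$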
Or .xls files like this:
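User-agent: *
Disallow: /*.xls$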
Notice how the disallow rule has /*.xls$. This means that it will match all these URLs:
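For instance (with example.com standing in for your own domain):
https://example.com/file.xls
https://example.com/folder/file.xls
https://example.com/reports/sales.xls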
Yet, it would not match this URL:
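https://example.com/file.xls?download=true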
Because the URL does not end with .xls.
6) Allow Only Googlebot
You can also add rules that apply only to a specific bot, using the User-agent rule. So far we have used the * wildcard, which matches all bots.
If we wanted to allow only Googlebot to view the pages on the site, we could add this robots.txt:
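# Googlebot can crawl everything (the empty Disallow blocks nothing)
User-agent: Googlebot
Disallow:

# Every other bot is blocked from the whole site
User-agent: *
Disallow: /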
Googlebot is not the only bot out there. Other common bots include:
DuckDuckBot - Used by the DuckDuckGo search engine
Baiduspider - Used by Baidu, a Chinese search engine
YandexBot - Used by Yandex, a Russian search engine
facebot - Used by Facebook
Pinterestbot - Used by Pinterest
TwitterBot - Used by Twitter
7) Link to your Sitemap
When a bot visits your site, it needs to find all of your pages by following links. A sitemap lists every URL on your site, so adding it to your robots.txt makes it easier for a bot to find them all.
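You can do this with the Sitemap rule. A minimal example, with example.com standing in for your own domain:
Sitemap: https://example.com/sitemap.xml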