Google Crawler User Agent
Do you know when Google crawls your site?
How often Google crawls is one of the strongest indicators of your site's importance.
The more important your site, the more often Google will crawl it.
So it is fundamental that you track this metric in 2020.
But how can you track when Google crawls?
To do that, you need to track Googlebot, Google's web crawler.
We can identify Googlebot by its user agent.
Let's take a look.
What are the Google crawler user agents?
Googlebot has three user agents.
One for smartphones and two for desktop. They are:
Googlebot Mobile User Agent:
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Googlebot Desktop User Agents:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
OR
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36
Depending on your site, you may see only one of these, or you may see all three crawl your site.
The user agents above all contain Googlebot/2.1, letting you know that they are Googlebot.
You may also notice the W.X.Y.Z placeholder. Googlebot replaces this with the version of Chrome it is using.
So you won't actually see this:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36
If Googlebot were using Chrome version 76.0.3809.100, the user agent would instead look like this:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/76.0.3809.100 Safari/537.36
So the W.X.Y.Z is replaced with 76.0.3809.100.
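If you want to see which Chrome build Googlebot is running, you can pull the version out of the user agent string with a small regular expression. This is a sketch, not anything Google provides; the user agent below is the desktop example from above with a version filled in:

```javascript
// Example Googlebot desktop user agent; in real traffic, W.X.Y.Z is an
// actual Chrome build number like the one used here.
const ua = 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ' +
  'Googlebot/2.1; +http://www.google.com/bot.html) Chrome/76.0.3809.100 Safari/537.36';

// Capture the digits-and-dots build number after "Chrome/".
const match = ua.match(/Chrome\/([\d.]+)/);
const chromeVersion = match ? match[1] : null;

console.log(chromeVersion); // "76.0.3809.100"
```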
Recently, Google announced that it would keep Googlebot up to date with Chrome releases.
So this version number will continually update to the latest version of Chrome.
This is great news!
It means Googlebot can run JavaScript, which has been a challenge in the past.
Google keeps a full list of all Google User Agents.
What exactly is a user agent?
What is a User Agent?
A user agent identifies the browser requesting a web page, including its name and version. The full piece of text is called a user agent string.
Here is my user agent string:
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
As you can see, I am on an Apple Mac (macOS 10.14.5) and running Chrome version 78.
This user agent string is sent with every request.
When Googlebot crawls the web it also has a user agent string. This is what the Googlebot for mobile looks like:
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
This user agent string shows us a few things:
- the device is a smartphone (Nexus 5X)
- the device is running Google Chrome
- the device is a bot (Googlebot)
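As a rough sketch, each of these facts can be read straight out of the string with a simple check. The user agent here is the mobile example with the Chrome version filled in for illustration:

```javascript
// Hypothetical Googlebot mobile user agent (version number filled in
// for illustration; real traffic carries an actual Chrome build).
const ua = 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) ' +
  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Mobile Safari/537.36 ' +
  '(compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

const isSmartphone = ua.includes('Nexus 5X');  // device model
const usesChrome = /Chrome\/[\d.]+/.test(ua);  // Chrome build present
const isGooglebot = ua.includes('Googlebot');  // bot identifier

console.log(isSmartphone, usesChrome, isGooglebot); // true true true
```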
There is also something called a user agent token, which is used in the robots.txt file.
What is a User Agent Token?
So far, we have discussed user agent strings but there are also user agent tokens.
User agent tokens are used by the robots.txt file to control where a bot can visit on your site.
For more information on robots.txt check out our robots.txt file checker.
Google has four user agent tokens:
- Googlebot - Used by Google Search
- Googlebot-Image - Used by Google Images
- Googlebot-News - Used by Google News
- Googlebot-Video - Used by Google Video
This gives you the option to write rules for each of these bots.
For example, if we want Googlebot to read all the pages on our site we would add this to our robots.txt:
User-agent: Googlebot
Disallow:
Yet, you may have images on your site that you do not want added to Google Images. In this case, you can block Googlebot-Image from viewing those pages:
User-agent: Googlebot-Image
Disallow: /personal
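Putting the two rules together, a single robots.txt covering both tokens might look like this (the /personal path is just the example directory from above):

```text
# Let Google Search crawl everything
User-agent: Googlebot
Disallow:

# Keep Google Images out of the example /personal directory
User-agent: Googlebot-Image
Disallow: /personal
```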
Unlike user agent strings, user agent tokens do not distinguish between mobile and desktop, so you cannot write separate rules for each.
If you don't have a robots.txt file, you can add a meta tag to each page instead. The following meta tag will stop Googlebot from indexing the page:
<meta name="googlebot" content="noindex">
You could write the same for Googlebot-Image:
<meta name="googlebot-image" content="noindex">
Now that you have allowed Googlebot access to your site, let's look at tracking Googlebot visits using Google Analytics.
Tracking Googlebot with Google Analytics
With Googlebot running the latest version of Chrome, it can run JavaScript. We can use this to create a custom event that tracks the user agent and sends it to Google Analytics.
To do this, we add some JavaScript to the page that checks the user agent, like this:
if (navigator.userAgent.toLowerCase().includes('googlebot')) {
…
}
This code simply checks whether the user agent includes the string googlebot.
If it does, we can then send a custom event like this:
if (navigator.userAgent.toLowerCase().includes('googlebot')) {
ga('send', 'event', 'Googlebot', 'view', navigator.userAgent, {
nonInteraction: true
});
}
The above will send an event to Google Analytics each time Googlebot views a page. (The ga() function here comes from the classic analytics.js snippet; sites using the newer gtag.js snippet would use its equivalent event call.)
We have talked about how to identify and track Googlebot. Now I want to touch on bad bots: bots that pretend to be Googlebot.
Let's look at this next.
Be Careful of Bad Bots
Often, Googlebot has special access to websites so that their content appears in Google Search.
This special access will allow Googlebot to crawl the pages and read the content.
Bad bots pretend to be Googlebot. They can then read your pages and take your content. This is known as web scraping.
It is possible to detect the real Googlebot by doing a reverse DNS lookup.
If this is too advanced, all is not lost. There are companies like onCrawl that will track bots for you.
The important takeaway is that not all "Googlebot" requests to your site are really from Googlebot. Some will be from these bad bots.
Wrapping Up, Google Crawler User Agent
We have seen that there are three Google crawler user agents.
For mobile this is:
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
For desktop these are:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
OR
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36
You can use the user agent token in your robots.txt to allow or disallow access to parts of your site.
Once set up, you can track Googlebot visits using a Google Analytics event.
Remember that bad bots will pretend to be Googlebot so they can view your content.
For advanced users, you can do a reverse DNS lookup to confirm that the bot is from Google.