How To Find The Sitemap of a Website
Can't find the sitemap of a website?
Sometimes it can be difficult to find the sitemap as there is more than one way to create one.
Then there are all the different file types: XML, TXT, RSS and ATOM. Where do you start?
I will give you 5 ways to locate the sitemap on any site.
Before we jump in, let's define what a sitemap is. Knowing this will help us find where the sitemap is hiding.
You can't hide sitemap, we are on to you!
Once you have found the sitemap, don't forget to submit it to Google and Bing.
What is a Sitemap?
A sitemap is a list of all the pages on your website.
A crawler, which is software that visits your website, uses this sitemap to find its way around.
A good example of a crawler is Googlebot. Googlebot visits your pages to read them and adds the content to Google Search.
This is why a sitemap is so useful. Rather than Googlebot searching for all the pages, the sitemap does the hard work, making Googlebot faster at finding your pages.
As well as having a sitemap you should also tell Google where you keep it. Googlebot can then use this sitemap on its next visit.
To tell Google, add the sitemap to the Google Search Console.
But, what do we submit exactly?
There are a few formats of sitemaps, like:
- XML
- RSS / ATOM File
- Plain Text file
The most common is XML as it is the most flexible. It allows you to link sitemaps together and link to language-specific pages.
The file looks similar to HTML. For example:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/foo.html</loc>
<lastmod>2018-06-04</lastmod>
</url>
...
This example lists the pages on the site and also when each page last changed, using the lastmod tag.
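If you want to read a sitemap like this with a script, here is a minimal sketch using Python's standard library. The function name parse_urlset and the SAMPLE string are made up for illustration; the sample is a completed version of the snippet above, closed with a urlset end tag so that it parses.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the example sitemap.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_urlset(xml_text):
    """Return a list of (loc, lastmod) tuples; lastmod is None when missing."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


# Hypothetical sample: the article's snippet with the closing tag added.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/foo.html</loc>
<lastmod>2018-06-04</lastmod>
</url>
</urlset>"""

for loc, lastmod in parse_urlset(SAMPLE):
    print(loc, lastmod)
```

The namespace matters here: without it, ElementTree will not match the url and loc tags.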
You can also do the same with RSS or ATOM feeds. These are XML feeds in a different format.
Almost all blog software can create an RSS or ATOM feed. But be careful with this.
Some blog software will only produce a list of recent blog posts. You should list all of the pages of your website in your sitemap.
The last type is a plain text file. Although this is less common, it is still in use. Here is a sample of the sitemap from Starbucks:
https://www.starbucks.com/menu
https://www.starbucks.com/menu/drinks
https://www.starbucks.com/menu/drinks/hot-coffees
...
The text file only has a list of the site's pages, one page on each line. There is no extra information such as the last changed date. It's the simplest but also the least flexible.
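Because the format is just one URL per line, reading it takes very little code. A small sketch, assuming you already have the file contents as a string (parse_text_sitemap is a name chosen for this example):

```python
def parse_text_sitemap(text):
    """Return the URLs from a plain text sitemap, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# The three lines shown in the Starbucks sample above.
SAMPLE_TXT = """https://www.starbucks.com/menu
https://www.starbucks.com/menu/drinks

https://www.starbucks.com/menu/drinks/hot-coffees
"""

print(parse_text_sitemap(SAMPLE_TXT))
```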
If you need to create a sitemap for a new site then use the XML format.
Many Sitemaps
When you visit large sites such as the BBC, you will find that they have many sitemaps. This is because a single sitemap file is limited to 50,000 URLs and a file size of 50MB (uncompressed).
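To see how the 50,000 limit leads to many files, here is a rough sketch that splits a list of URLs into sitemap-sized groups. It only handles the URL count, not the 50MB size limit, and the names are invented for this example.

```python
MAX_URLS_PER_SITEMAP = 50_000  # the per-file URL limit mentioned above


def split_into_sitemaps(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks that each fit in one sitemap file."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


# Hypothetical site with 120,000 pages.
pages = [f"https://example.com/page-{n}" for n in range(120_000)]
chunks = split_into_sitemaps(pages)
print(len(chunks), [len(chunk) for chunk in chunks])
```

A site with 120,000 pages would need at least three sitemap files under this rule.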
To link sitemaps together it is possible to have a master sitemap that links to all the others.
To do this you must use XML and the sitemap tag, as we have below:
...
<sitemap>
<loc>
https://example.com/hats/sitemap.xml
</loc>
</sitemap>
...
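If you find a master sitemap, you can pull out the linked sitemaps with a few lines of Python. This is a sketch only: the INDEX_SAMPLE string wraps the snippet above in a sitemapindex root element (the root tag the sitemap protocol uses for index files), and child_sitemaps is a made-up name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_xml):
    """Return the sitemap URLs listed in a sitemap index file."""
    root = ET.fromstring(index_xml)
    return [
        loc.text.strip()  # the URL may be surrounded by whitespace or newlines
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical complete index built around the snippet above.
INDEX_SAMPLE = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>
https://example.com/hats/sitemap.xml
</loc>
</sitemap>
</sitemapindex>"""

print(child_sitemaps(INDEX_SAMPLE))
```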
We have covered what a sitemap is and how sitemaps can be split into many files. Let's look at some ways you can find a sitemap.
How to Find the Sitemap?
There are 5 ways to find the sitemap on a site. These are:
- Use Robots.txt
- Manually
- Use Google Search
- Find RSS Sitemap in Source
- Google Search Console
The list runs from the easiest to the most difficult. Let's start with robots!
Use Robots.txt
If you are lucky the sitemap will be in the robots.txt file.
Like the sitemap, the robots file is also used by crawlers.
The Googlebot crawler uses the robots.txt file to see where it can and can't go.
You will find this file by adding /robots.txt to the end of the domain name, such as:
https://example.com/robots.txt
Each line in this file is a “rule” and the crawler follows each rule listed.
The crawler needs to know where to go and the Sitemap: rule shows the crawler where all the pages are.
For example, if we look at the robots.txt file from Airbnb we can see the sitemaps listed at the end of the file:
Sitemap: https://www.airbnb.com/sitemap-master-index.xml.gz
Sitemap: https://www.airbnb.com/sitemap-p2-urls-index.xml.gz
Sitemap: https://www.airbnb.com/sitemap-p2_poi-urls-index.xml.gz
Sitemap: https://www.airbnb.com/sitemap-homes_filters_expansion-urls-index.xml.gz
Sitemap: https://www.airbnb.com/sitemap-homes_pdp-urls-index.xml.gz
Sitemap: https://www.airbnb.com/sitemap-things_to_do_cities_and_categories-urls-index.xml.gz
Sitemap: https://www.airbnb.com/sitemap-places_pdp-urls-index.xml.gz
Sitemap: https://www.airbnb.com/sitemap-experiences_p2-urls-index.xml.gz
Sitemap: https://www.airbnb.com/sitemap-experiences_pdp-urls-index.xml.gz
Sitemap: https://www.airbnb.com/sitemap-additional_things_to_do-urls-index.xml.gz
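If you would rather not read the file by eye, the Sitemap: lines are easy to pick out with a script. Here is a minimal sketch that works on the text of a robots.txt file you have already downloaded. The function name and the shortened sample are assumptions for illustration.

```python
def sitemaps_from_robots(robots_txt):
    """Return the URLs from every 'Sitemap:' line in a robots.txt file."""
    found = []
    for line in robots_txt.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        # The rule name is matched case-insensitively, e.g. "sitemap:" also counts.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


# Shortened, hypothetical robots.txt using two of the Airbnb lines above.
ROBOTS_SAMPLE = """User-agent: *
Disallow: /private/

Sitemap: https://www.airbnb.com/sitemap-master-index.xml.gz
Sitemap: https://www.airbnb.com/sitemap-p2-urls-index.xml.gz
"""

print(sitemaps_from_robots(ROBOTS_SAMPLE))
```

Python 3.8 and later also ship urllib.robotparser.RobotFileParser, whose site_maps() method returns the same information after it has read the file.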
If you are lucky you now have your sitemap. If not, let's look at a manual check.
Manually
One way to find the sitemap is to try different common URLs. The most common sitemap location is at sitemap.xml. So by adding this to the end of the domain name, you can test to see if the sitemap exists:
https://example.com/sitemap.xml
If you get a 404 then the sitemap was not found and we can try another filename. sitemap.xml is the most common but not the only one.
To get a list of other common sitemap filenames we looked at the sitemaps of 7000+ websites.
Here is a list of common filenames for the sitemap based on this research:
- /sitemap.xml
- /sitemap_index.xml
- /sitemap-index.xml
- /sitemap/
- /post-sitemap.xml
- /sitemap/sitemap.xml
- /sitemap/index.xml
- /rss/
- /rss.xml
- /sitemapindex.xml
- /sitemap.xml.gz
- /sitemap_index.xml.gz
- /sitemap.php
- /sitemap.txt
- /atom.xml
If none of these work, add a capital letter, for example /Sitemap.xml. Try adding a capital to any of the filenames above.
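Trying every filename by hand gets tedious, so here is a rough sketch that builds the full list of URLs to test, lowercase versions first and then the capitalised ones. The helper names are invented for this example. url_exists sends a HEAD request and is only one possible check, since some servers do not answer HEAD requests properly.

```python
import urllib.request
from urllib.parse import urljoin

# The common filenames listed above, in the same order.
COMMON_PATHS = [
    "/sitemap.xml", "/sitemap_index.xml", "/sitemap-index.xml", "/sitemap/",
    "/post-sitemap.xml", "/sitemap/sitemap.xml", "/sitemap/index.xml",
    "/rss/", "/rss.xml", "/sitemapindex.xml", "/sitemap.xml.gz",
    "/sitemap_index.xml.gz", "/sitemap.php", "/sitemap.txt", "/atom.xml",
]


def capitalised(path):
    """Turn '/sitemap.xml' into '/Sitemap.xml'."""
    return "/" + path[1:2].upper() + path[2:]


def candidate_urls(base_url):
    """Return every URL worth trying: lowercase paths, then capitalised ones."""
    paths = COMMON_PATHS + [capitalised(p) for p in COMMON_PATHS]
    return [urljoin(base_url, path) for path in paths]


def url_exists(url, timeout=10):
    """Return True if the server answers 200 to a HEAD request (needs network)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers 404s, DNS failures and timeouts
        return False


urls = candidate_urls("https://example.com")
print(len(urls), urls[0])
```

To run the check against a real site you could loop over candidate_urls(...) and stop at the first URL where url_exists returns True.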
If you still have nothing then let's look at using Google Search.
Use Google Search
We can use some Google Search magic to search for XML files. We can also narrow down the search to a particular site.
To search the BBC for XML files we would use the search:
inurl:bbc.co.uk filetype:xml
Doing this returns a sitemap in the search results.
If you get a lot of pages in the results, then you can narrow these down by searching for sitemap in the URL, like this:
site:example.com inurl:sitemap filetype:xml
Don't forget that sitemaps can also be text files. Doing the same for Starbucks you can find the sitemap:
site:starbucks.com inurl:sitemap filetype:txt
The search above returns the sitemap in the results.
If you still can’t find it, let's see if it is an RSS feed.
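If you check many sites, you can generate these searches instead of typing them. The sketch below builds the query string and a Google search link for it. The function names are hypothetical, and the link is simply the normal search page with the query in the q parameter.

```python
from urllib.parse import urlencode


def sitemap_query(domain, filetype="xml"):
    """Build the search text used above, e.g. for XML or TXT sitemaps."""
    return f"site:{domain} inurl:sitemap filetype:{filetype}"


def google_search_url(query):
    """Return a Google search link for the given query text."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = sitemap_query("starbucks.com", "txt")
print(query)
print(google_search_url(query))
```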
Find RSS Sitemap in Source
You can use an RSS feed as a sitemap and many blogs will create this file by default.
To find the files look at the source code of the HTML page.
For example, open the Chrome browser and go to the SpaceX News page. We can inspect the page source by right-clicking the page and choosing the Inspect option.
If you are on the Elements tab you can search the code for:
application/rss+xml
This shows that there is an RSS feed:
<link rel="alternate" type="application/rss+xml" title="" href="https://www.spacex.com/news.xml">
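The same check can be done with a script once you have the page's HTML. Here is a minimal sketch using Python's built-in HTML parser. The class and function names are made up, and looking for application/atom+xml as well is an addition here, on the assumption that some sites publish an ATOM feed instead of RSS.

```python
from html.parser import HTMLParser

# RSS is the type searched for above; ATOM is added as an assumption.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects the href of every <link rel="alternate"> that points to a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        link_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and link_type in FEED_TYPES and href:
            self.feeds.append(href)


def find_feeds(html):
    """Return the feed URLs advertised in a page's HTML."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


# Hypothetical page containing the SpaceX link tag shown above.
PAGE = """<html><head>
<link rel="alternate" type="application/rss+xml" title="" href="https://www.spacex.com/news.xml">
</head><body></body></html>"""

print(find_feeds(PAGE))
```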
If there is no RSS feed and we still can't find the sitemap, there is one last place to look.
Google Search Console
If all the above have failed, then get access to the site's Google Search Console.
To help Google crawl your website you can submit your sitemap to Google Search Console.
If there is a sitemap added to Google Search Console then you will find it listed, along with any other submitted sitemaps, by going to Index > Sitemaps.
How To Find The Sitemap of a Website: Final Thoughts
By now you have found the sitemap. Well done!
We have a shared understanding of what a sitemap is and that they come in 3 different formats:
- XML
- RSS / ATOM File
- Plain Text file
You have 5 ways to find the sitemaps as they are not always in the same place.
- Use Robots.txt
- Manually
- Use Google Search
- Find RSS Sitemap in Source
- Google Search Console
If these 5 ways did not work or you have another way to try then please comment below.