Today we are going to fix the “Submitted URL Seems to be a Soft 404” error.
If you have opened Google Search Console and seen this error, then we are going to look at the most common causes of the issue and how you can fix them.
Grab a coffee and let's get started.
What is a 404?
When you remove content from your website, other sites may still link to the page you removed.
When a user clicks on one of these links, the request goes to your web server and the web server tries to find the page.
If the page no longer exists, then the web server should return a 404 HTTP status code.
This is part of the HTTP standard, and a 404 tells the browser that the server could not find the page.
Along with the 404 status code, your web server will also return a webpage. This lets the user know that the page is no longer available.
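To make this concrete, here is a minimal sketch of a web server that sends both the 404 status code and a friendly "not found" page. It uses Python's built-in http.server module. The PAGES dictionary, the respond helper, and port 8000 are made-up names for illustration, not something your own server will have.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content: path -> HTML body.
PAGES = {"/": "<h1>Home</h1>"}

NOT_FOUND_HTML = "<h1>Page not found</h1><p>Sorry, this page is no longer available.</p>"


def respond(path):
    """Return (status_code, html_body) for a requested path."""
    if path in PAGES:
        return 200, PAGES[path]
    # The page is missing: send the 404 status AND a page the user can read.
    return 404, NOT_FOUND_HTML


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path is the raw request target (it may include a query string).
        status, html = respond(self.path)
        body = html.encode("utf-8")
        self.send_response(status)  # the status code the browser receives
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # the webpage the user sees


def run(port=8000):
    """Serve on localhost until interrupted (call this yourself to try it)."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

If you call run() and request a path that is not in PAGES, the browser should show the "Page not found" message while the network tab shows a 404 status.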
So how does a soft 404 differ?
What is a soft 404?
A soft 404 happens when the user sees a webpage saying that the page is no longer available, yet the browser receives a 200-level HTTP status code instead of a 404.
A 200-level HTTP status code tells the browser that the page request was successful. In other words, the web server found the page.
For a soft 404, two things are happening:
The user is being told that the page is not found.
The user's browser is being told that the page was found.
And this is the reason for the error in Google Search Console. The two are contradicting each other.
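The contradiction can be sketched as a small check. This is only a rough heuristic for spotting the mismatch on your own pages. It is not how Google detects soft 404s, and the phrase list and the is_soft_404 name are assumptions made for this example.

```python
# Hypothetical phrases that suggest the visible page is an error page.
NOT_FOUND_PHRASES = ("page not found", "no longer available", "does not exist")


def is_soft_404(status_code, body_text):
    """Return True when the body reads like 'not found' but the status says success."""
    text = body_text.lower()
    looks_missing = any(phrase in text for phrase in NOT_FOUND_PHRASES)
    is_success = 200 <= status_code < 300
    return is_success and looks_missing
```

For example, is_soft_404(200, "<h1>Page Not Found</h1>") is True, while the same body sent with a 404 status gives False, because the status and the page then agree.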
So if the user already sees that the page is not available, why do we need to fix it?
Why should you fix a soft 404?
If the page should be a 404, then it is bad practice to return a 200-level code.
A 200-level success code tells the browser and search engines that the page exists and is valid. This means that the page can end up in Google Search results.
Google has a web crawler called Googlebot that it uses to read your pages. As you can imagine, reading every web page on the internet is a big job. Googlebot can't get around to all the pages every day.
So Google assigns a crawl budget to each website. The crawl budget is how much time and resources Googlebot will spend on your site. Once the budget runs out, Googlebot moves on to the next site.
You don't want to waste your crawl budget on pages that are no longer available.
To tell Googlebot that the page has gone you can either:
Send a 404 Not Found status code, which lets Googlebot know the page is not found.
Send a 410 Gone status code, meaning that the page has gone and is not coming back.
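One way to picture the choice between the two codes is a small lookup. This is a sketch under assumptions: the LIVE_PATHS and REMOVED_FOR_GOOD sets and the status_for function are invented for illustration, and a real site would get this information from its CMS or database.

```python
from http import HTTPStatus

# Hypothetical data: pages that exist, and pages deliberately retired for good.
LIVE_PATHS = {"/", "/blog/current-post"}
REMOVED_FOR_GOOD = {"/blog/retired-post"}


def status_for(path):
    """Pick the HTTP status code to send for a requested path."""
    if path in LIVE_PATHS:
        return HTTPStatus.OK  # 200: the page exists
    if path in REMOVED_FOR_GOOD:
        return HTTPStatus.GONE  # 410: removed on purpose, not coming back
    return HTTPStatus.NOT_FOUND  # 404: nothing known at this address
```

Here status_for("/blog/retired-post") gives 410, while an unknown address such as a mistyped URL falls through to 404.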