Google Explains Why Google Does Not Crawl & Index Every URL

Mar 22, 2022 - 7:31 am


John Mueller of Google wrote a very detailed and honest explanation of why Google (and third-party SEO tools) do not crawl and index every URL or link on the web. He explained that crawling is not objective, it is expensive, it can be inefficient, the web changes constantly, and there is spam and junk; all of that has to be taken into account.

John wrote this detailed response on Reddit, answering the question "Why don't SEO tools show all backlinks?" He answered it from a Google Search perspective. He said:

There's no objective way to crawl the web properly.

It's theoretically impossible to crawl it all, since the number of actual URLs is effectively infinite. Since nobody can afford to keep an infinite number of URLs in a database, all web crawlers make assumptions, simplifications, and guesses about what is realistically worth crawling.

And even then, for practical purposes, you can't crawl all of that all the time, the internet doesn't have enough connectivity & bandwidth for that, and it costs a lot of money if you want to access a lot of pages regularly (for the crawler, and for the site's owner).

Past that, some pages change quickly, others haven't changed for 10 years -- so crawlers try to save effort by focusing more on the pages that they expect to change, rather than those that they expect not to change.

And then, we touch on the part where crawlers try to figure out which pages are actually useful. The web is filled with junk that nobody cares about, pages that have been spammed into uselessness. These pages may still regularly change, they may have reasonable URLs, but they're just destined for the landfill, and any search engine that cares about their users will ignore them. Sometimes it's not just obvious junk either. More & more, sites are technically ok, but just don't reach "the bar" from a quality point of view to merit being crawled more.

Therefore, all crawlers (including SEO tools) work on a very simplified set of URLs, they have to work out how often to crawl, which URLs to crawl more often, and which parts of the web to ignore. There are no fixed rules for any of this, so every tool will have to make their own decisions along the way. That's why search engines have different content indexed, why SEO tools list different links, why any metrics built on top of these are so different.
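To make the "effectively infinite" point concrete, consider how quickly faceted navigation alone multiplies URLs. The sketch below is illustrative only; the facet names and the shop.example domain are made up, but the arithmetic is not. Add a date archive or a session parameter and the URL space becomes unbounded for practical purposes.

```python
from itertools import product

# Hypothetical facets for a single product-listing page.
facets = {
    "color": ["red", "blue", "green", "black"],
    "size": ["s", "m", "l", "xl"],
    "sort": ["price", "rating", "newest"],
    "page": [str(n) for n in range(1, 51)],  # 50 pagination pages
}

def facet_urls(base="https://shop.example/widgets"):
    """Yield every distinct URL this faceted navigation can produce."""
    keys = list(facets)
    for values in product(*facets.values()):
        query = "&".join(f"{k}={v}" for k, v in zip(keys, values))
        yield f"{base}?{query}"

# 4 colors x 4 sizes x 3 sort orders x 50 pages = 2,400 URLs
# for one listing page, and every one of them returns valid HTML.
print(sum(1 for _ in facet_urls()))  # prints 2400
```

Every one of those 2,400 URLs is crawlable, and most show near-duplicate content, which is exactly why crawlers have to make the simplifying assumptions John describes.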

I felt it would be good to highlight this because it is useful for SEOs to read and understand.
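To illustrate the scheduling and quality-bar ideas from John's quote, here is a minimal sketch of an adaptive recrawl queue. This is not Google's algorithm: the quality threshold, the halve/double heuristic, and the interval bounds are all invented for illustration.

```python
import heapq
from dataclasses import dataclass, field

QUALITY_BAR = 0.3            # hypothetical score a URL must clear to be crawled at all
MIN_INTERVAL = 3600.0        # seconds: never recrawl more than hourly
MAX_INTERVAL = 30 * 86400.0  # seconds: still check stale pages monthly

@dataclass(order=True)
class CrawlTask:
    next_crawl: float                       # timestamp when the URL is due again
    url: str = field(compare=False)
    interval: float = field(compare=False)  # current revisit interval in seconds

def enqueue(frontier: list, url: str, quality: float, now: float) -> None:
    """Admit a URL only if it clears the quality bar; junk never enters the frontier."""
    if quality >= QUALITY_BAR:
        heapq.heappush(frontier, CrawlTask(now + 86400.0, url, 86400.0))

def reschedule(frontier: list, task: CrawlTask, changed: bool, now: float) -> None:
    """After fetching, halve the interval if the page changed and double it if not,
    so crawl effort drifts toward pages that actually update."""
    factor = 0.5 if changed else 2.0
    task.interval = min(MAX_INTERVAL, max(MIN_INTERVAL, task.interval * factor))
    task.next_crawl = now + task.interval
    heapq.heappush(frontier, task)

# Usage: pop the task with the earliest next_crawl, fetch the page, then
# hand it back: reschedule(frontier, heapq.heappop(frontier), changed, now)
```

Because every tool picks its own values for exactly these knobs, two crawlers given the same web will end up with different indexes, which is the point of John's last paragraph.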

Forum discussion at Reddit.

 
