What Is Google's Indexing Limit?

Apr 19, 2006 • 11:40 am | comments (6) | Filed Under Google Search Engine Optimization
 

Found a nice thread at Cre8asite Forums that is worth catching up on. It concerns the indexing limits Google has for certain websites and pages. When does Google decide to stop spidering all your pages? Why does it grab the most important pages first?

I have had good experience getting large numbers of pages spidered. In my opinion, discovery date and how effectively pages are linked, both internally and externally, are a large part of what assigns importance to certain pages and keeps Googlebot coming back for more daily.

The original poster of the thread is trying to understand why Google has only spidered 20% of his 800 pages. The first clue from the thread is his comment: "we only seem to have about 100 pages indexed." Okay, that is a start.

Next clue: "We have added about 2000 pages recently and changed the menu". Hmmm... that would probably have something to do with it.

And finally: "the only theory i can think of is the amount of links on a page as the menu alone is about 100 links"

Well, given that Google has spidered only 100 pages and there are only 100 links in the navigation menu, I would say that Google is not having trouble listing the pages; it just can't find them all! This comes down to an information architecture problem relating specifically to the menu organization, and, in my opinion, to some beliefs that may be keeping the webmaster from fully utilizing the navigation. The first myth to bust: you can have more than 100 links in a navigation menu and get by just fine. The prevailing thought for a long time was that Google would only spider the first 100 links, and any more risked a penalty. That is no longer true; times have changed. However, there are still inherent problems with more than 100 links, such as page size, which can cap the number of spiderable links, and so on.
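As a quick sanity check on a menu like this, you can count how many links a page actually exposes to a crawler. Here is a minimal sketch (stdlib Python only; the HTML snippet is a made-up example, not the poster's site) of a link counter:

```python
# Minimal sketch: count the <a href="..."> links a page exposes to a crawler.
# Useful for auditing whether a navigation menu is ballooning past ~100 links.
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Collects href values from <a> tags while parsing an HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

# Hypothetical navigation fragment for illustration.
html = '<nav><a href="/home">Home</a><a href="/about">About</a></nav>'
counter = LinkCounter()
counter.feed(html)
print(len(counter.links))  # 2
```

Running something like this over a real page would tell you whether the menu alone is pushing the page toward the practical limits discussed above.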

To understand a bit more about how Google spiders pages and which ones it favors the most, the Cre8asiteforums admin, bragadocchio, posted some excerpts from the Stanford paper Efficient Crawling Through URL Ordering:

In this paper we study in what order a crawler should visit the URLs it has seen, in order to obtain more "important" pages first. Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire Web in a reasonable amount of time. We define several importance metrics, ordering schemes, and performance evaluation measures for this problem. We also experimentally evaluate the ordering schemes on the Stanford University Web. Our results show that a crawler with a good ordering scheme can obtain important pages significantly faster than one without.

Bragadocchio goes on to explain a little further: "Importance metrics, like those defined in the paper, can be combined, so on a site that has a number of pages with higher pageranks, or more inbound links, those might help combat the weakness of a page like that when it comes to an importance metric based upon location and distance from the root directory."
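To make the idea concrete, here is a toy sketch of a crawl frontier ordered by backlink count, one of the importance metrics the paper discusses. The link graph here is entirely hypothetical, and a real crawler would combine several metrics rather than use backlinks alone:

```python
# Toy sketch: visit URLs in order of a simple importance metric
# (known inbound-link count), in the spirit of the backlink-count
# ordering scheme from "Efficient Crawling Through URL Ordering".
import heapq

def crawl_order(link_graph, seed):
    """Visit pages reachable from `seed`, always taking the frontier URL
    with the most known inbound links first (ties broken alphabetically)."""
    # Count inbound links for every URL mentioned in the graph.
    backlinks = {}
    for src, targets in link_graph.items():
        for t in targets:
            backlinks[t] = backlinks.get(t, 0) + 1

    visited, order = set(), []
    # heapq is a min-heap, so push negated counts to pop "highest first".
    frontier = [(-backlinks.get(seed, 0), seed)]
    while frontier:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        for nxt in link_graph.get(url, []):
            if nxt not in visited:
                heapq.heappush(frontier, (-backlinks.get(nxt, 0), nxt))
    return order

# Hypothetical site: "/products" has two inbound links, so it is
# crawled before "/about" even though both sit one click from "/".
graph = {
    "/": ["/products", "/about"],
    "/about": ["/products"],
    "/products": ["/widget"],
}
print(crawl_order(graph, "/"))  # ['/', '/products', '/about', '/widget']
```

The takeaway matches bragadocchio's point: a deep page with many inbound links can outrank a shallow page on the crawl schedule, which is why internal linking matters so much for getting pages spidered.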

Excellent thread, for continued discussion about Google Indexing Limits visit Cre8asite Forums.

Comments:

Chris Beasley

04/19/2006 05:24 pm

"Google would only spider the first 100 links, and any more was risk for penalty. Not true anymore, times have changed. " That was never true, people just misunderstood a usability guideline for penalty or SEO criteria.

SEOJunkie

04/19/2006 07:20 pm

>>being that Google has spidered only 100 pages, and there is only 100 links in the navigation menu. Well, I totally agree with Chris here.

Ben

04/20/2006 03:30 am

Chris, I agree; it's why I stated it. I have heard from way too many people who believe that, however.

Chetan

04/24/2006 09:54 am

I am facing the same problem with my news blog site; Google has only indexed 8 pages out of 68 posts. Can anybody review my problem for the 'URL' http://www.software-outsourcing.netfirms.com/outsourcing-news/

Birendra kumar

05/03/2010 10:43 am

I am facing the same problem with my news blog site; Google has not indexed my posts.

TigerJefferson

10/30/2012 06:18 am

You can simply enjoy the same benefits by using the services of a software outsourcing company in some third world country.
