
We have known for a long time that Google crawls web pages up to the first 15MB, but now Google has updated some of its help documentation to clarify that it will crawl the first 64MB of a PDF file and the first 2MB of other supported file types.
The 64MB and 2MB figures might not be new, but I don't think I covered them before. I know I covered that Google will crawl up to 2MB of your disavow file, but there are no other mentions of 2MB in my coverage.
This help document was updated to now read:
When crawling for Google Search, Googlebot crawls the first 2MB of a supported file type, and the first 64MB of a PDF file. From a rendering perspective, each resource referenced in the HTML (such as CSS and JavaScript) is fetched separately, and each resource fetch is bound by the same file size limit that applies to other files (except PDF files). Once the cutoff limit is reached, Googlebot stops the fetch and only sends the already downloaded part of the file for indexing consideration. The file size limit is applied on the uncompressed data. Other Google crawlers, for example Googlebot Video and Googlebot Image, may have different limits.
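Since the limit applies to the uncompressed data, one rough way to see whether a page is anywhere near the 2MB HTML cutoff is to fetch it and measure the decompressed byte count. Here is a minimal sketch in Python; the function names and the check are illustrative, and this only approximates the documented Googlebot behavior rather than reproducing it:

```python
import gzip
import urllib.request

# Documented Googlebot limit for supported file types, applied to uncompressed data.
GOOGLEBOT_HTML_LIMIT = 2 * 1024 * 1024  # 2MB

def uncompressed_html_size(url: str) -> int:
    """Fetch a page and return its uncompressed size in bytes."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        # The limit counts uncompressed bytes, so decompress before measuring.
        if resp.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)
    return len(body)

def within_googlebot_limit(size_bytes: int) -> bool:
    """True if the uncompressed size fits inside the 2MB cutoff."""
    return size_bytes <= GOOGLEBOT_HTML_LIMIT
```

Given the HTTP Archive numbers John cites below (a 151KB 90th percentile), almost every real-world page should pass this check with room to spare.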
Then Google also updated this document to add the 15MB limit, but that was not new - it now says:
By default, Google's crawlers and fetchers only crawl the first 15MB of a file. Any content beyond this limit is ignored. Individual projects may set different limits for their crawlers and fetchers, and also for different file types. For example, a Google crawler may set a larger file size limit for a PDF than for HTML.
Google explained that "While moving over the information about the default file size limits of Google's crawlers and fetchers to the crawler documentation, we also updated the Googlebot documentation about its own file size limits." Google added, "The original location of the default file size limits was not the most logical place as it applies to all of Google's crawlers and fetchers, and the move enabled us to be more precise about Googlebot's limits."
The more precise details are useful to know.
There is some confusion around the 15MB versus 2MB limits for HTML files, so I asked John Mueller, who replied on Bluesky saying, "In short (gotta run), Googlebot is one of Google's crawlers, but not all of them." He added, "Google has a lot of crawlers, which is why we split it. It's extremely rare that sites run into issues in this regard, 2MB of HTML (for those focusing on Googlebot) is quite a bit. The way I usually check is to search for an important quote further down on a page - usually no need to weigh bytes." "Sorry for missing this - like I mentioned in the other thread, we have a bunch of different crawlers (I know SEOs focus on Googlebot, but there's life outside of textual web-search :-)), so we have the general limit + the Googlebot-specifics documented," he added later.
Forum discussion at X.
Update: John also commented on Reddit saying:
FWIW you can look at the distribution of HTML byte counts in the HTTP Archive Web Almanac (search for [http almanac Page weight] and scroll down to the "HTML bytes" section). The median on mobile is at 33kb, the 90-percentile is at 151kb. This means 90% of the pages out there have less than 151kb HTML. What generally happens when you go past a limit like this is that the rest just isn't used for indexing, the part above the limit is though. In practice, if you're working with Warren Peas making web pages, make sure to have important stuff - that you think people will want to find - in a reasonable place, and not only on the bottom. You should be doing this regardless, nobody's going to read 1000 pages of text (equivalent of 2mb) in search of something that's on page 1001. If you want to publish a novel, make it a PDF.
And:
If you're curious about the 2MB Googlebot HTML fetch limit, here's a way to check.
Thanks, @tamethebots.com!
— John Mueller (@johnmu.com) February 6, 2026 at 6:48 AM
And the 2MB limit seems to be specific to HTML; John added, "this is specific to HTML & co, which is indexed for websearch. Files focused on machine processing can have other limits, for example, robots.txt also has other limits - as defined in the robots.txt docs. None of these recently changed, we just wanted to document them in more detail."
Update again: Google updated the docs once more to clarify things a bit further.

