With the Google monopoly remedies ruling from the other day, we now have even more court documents that discuss Google's search index, spam score, PageRank, page quality, Glue and more.
This is in addition to the DOJ documents we covered earlier and that big search leak, which Google did end up responding to. We also covered the Google FastSearch bit on grounding for Gemini yesterday, and user interactions and data today.
Most of these were spotted by Marie Haynes, but I dug a bit deeper to pull out more references.
I should note that just because these court documents contain these statements, it doesn't mean they are used in Google Search today. These statements were also given by non-Googlers:
Google Search Index
What is stored in Google's search index? Document ID, URL map, time stamps, spam scores, etc:
Super interesting information here on what is stored in Google's search index.
— Marie Haynes (@Marie_Haynes) September 3, 2025
- each document has a DocID
- there is a DocID to URL map
- each DocID has a set of signals, attributes or metadata, some derived from user data
These include:
- popularity as measured by user… pic.twitter.com/MlabMDu8r3
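To make that structure a bit more concrete, here is a minimal Python sketch of a DocID to URL map with a per-document bundle of signals. This is only a toy illustration of the shape described in the ruling, not how Google actually stores anything. The class names, field names and the example URL are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DocSignals:
    """Per-document signals. Field names are hypothetical; the ruling
    only names the kinds of data (spam score, popularity, time stamps)."""
    spam_score: Optional[float] = None
    popularity: Optional[float] = None
    timestamps: dict = field(default_factory=dict)


class ToyIndex:
    """Toy index: every document gets a DocID, a URL and a set of signals."""

    def __init__(self):
        self.docid_to_url = {}   # the DocID to URL map
        self.signals = {}        # DocID -> DocSignals
        self._next_id = 1

    def add(self, url, **signals):
        """Assign the next DocID to a URL and store its signals."""
        doc_id = self._next_id
        self._next_id += 1
        self.docid_to_url[doc_id] = url
        self.signals[doc_id] = DocSignals(**signals)
        return doc_id

    def lookup(self, doc_id):
        """Return the (url, signals) pair for a DocID."""
        return self.docid_to_url[doc_id], self.signals[doc_id]


index = ToyIndex()
doc_id = index.add("https://example.com/", spam_score=0.1)
url, signals = index.lookup(doc_id)
```

The point is simply that signals hang off the DocID rather than the URL, so the URL map and the signal data can be looked up separately.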
Spam Score vs Page Quality
Google determines what to crawl based not just on spam score but also quality and popularity signals:
Not getting crawled? It could be related to your spam score.
— Marie Haynes (@Marie_Haynes) September 3, 2025
Quality and popularity signals help Google determine how frequently to crawl web pages. pic.twitter.com/Fn8wfGBVdk
PageRank vs Webpage
PageRank is a key quality signal that is one component of the quality score but "most of Google's quality signal is derived from the webpage itself."
Now this is interesting!
— Marie Haynes (@Marie_Haynes) September 3, 2025
PageRank is a key quality signal that is one component of the quality score.
However, it turns out that "most of Google's quality signal is derived from the webpage itself." pic.twitter.com/3w6CBNIx8C
Glue
Glue logs the query and user data to help with signals and ranking:
Glue is a query log that collects data about a query and the user's interaction with the response.
— Marie Haynes (@Marie_Haynes) September 3, 2025
The data includes:
- text of the query, language, user location and device type
- what appears on the SERP
- what the user clicked on hovered over and how long they stayed on… pic.twitter.com/MnS1pTc4Vq
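If it helps to picture what a single Glue log entry might look like, here is a rough Python sketch built only from the fields listed above. The record layout, the field names and the click-counting helper are my own assumptions for illustration. The ruling does not describe Google's schema or how the data is aggregated.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class GlueRecord:
    """One hypothetical query-log entry. All field names are assumptions."""
    query_text: str
    language: str
    user_location: str
    device_type: str
    serp_items: list = field(default_factory=list)  # what appeared on the SERP
    clicked: list = field(default_factory=list)     # what the user clicked on
    hovered: list = field(default_factory=list)     # what the user hovered over
    # The quoted text is cut off after "how long they stayed on", so what
    # the duration is measured against is left open here.
    stay_seconds: dict = field(default_factory=dict)


def click_counts(records):
    """Count clicks per result across records (an illustrative aggregate)."""
    counts = Counter()
    for record in records:
        counts.update(record.clicked)
    return counts


log = [
    GlueRecord("blue widgets", "en", "US", "mobile",
               serp_items=["a.example", "b.example"], clicked=["a.example"]),
    GlueRecord("blue widgets", "en", "US", "desktop",
               serp_items=["a.example", "b.example"], clicked=["a.example"]),
]
totals = click_counts(log)
```

Counting clicks per result is just one simple way interaction logs could be turned into a signal. It is shown here as an example, not as the method Google uses.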
RankEmbed BERT
Google has RankEmbed BERT, which is a deep learning ranking model that uses 70 days of search logs plus scores generated by human quality raters:
Oooh, next is RankEmbed, now called RankEmbed BERT.
— Marie Haynes (@Marie_Haynes) September 3, 2025
It's a deep learning ranking model that uses 70 days of search logs plus scores generated by human quality raters.
It has strong natural language understanding which allows it to more efficiently identify the best documents… pic.twitter.com/oxJKkCTRyr
What else did you find in the court ruling PDF?
Forum discussion at X.