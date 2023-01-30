Yandex had a boatload of its source code, across all its technology, allegedly leaked by a disgruntled employee, and part of that was the source code for Russia's largest search engine, Yandex Search. As you can imagine, SEOs and others are diving in to see what they can learn from the source code.

I personally did not download the source code, so I did not go through it myself, but I wanted to share what people found in their own investigations of the source code and posted on Twitter.

Here's the alpha version of an explorer tool for the leaked #Yandex Search code.



It lets you browse through the ranking factors, view by tags, etc, and start to find connections.



Easy to add new features if there's anything you want to see! https://t.co/AjbYnrDl9P pic.twitter.com/pQ4scOkP6w — Rob Ousbey : @RobOusbey@mastodon.social (@RobOusbey) January 28, 2023

I downloaded the code, analyzed it and there is a lot of useful information for Google SEO as well. pic.twitter.com/RWrgnnlpj6 — Alex Buraks (@alex_buraks) January 27, 2023

Theoretically, what is the difference between algorithms used in Google and in Yandex?



They are quite similar:

- there is RankBrain analogue - MatrixNet;

- they are using PageRank (almost the same as in Google);

- a lot of text algorithms are the same. pic.twitter.com/Djjl8Bmjwn — Alex Buraks (@alex_buraks) January 27, 2023

According to Statcounter Yandex is close to Yahoo and Bing by market share: pic.twitter.com/5GKIvKIvAo — Alex Buraks (@alex_buraks) January 27, 2023

Main insights after analysing this list:



#1 Age of links is a ranking factor. pic.twitter.com/U47uWvEq9w — Alex Buraks (@alex_buraks) January 27, 2023

#3 Numbers in URLs are bad for rankings pic.twitter.com/ECgwGeGUfb — Alex Buraks (@alex_buraks) January 27, 2023

#5 Hard pessimization equals PR=0 pic.twitter.com/RRbhuJyZr1 — Alex Buraks (@alex_buraks) January 27, 2023

#7 Fun fact - there is a separate ranking factor for uplifting Wikipedia pic.twitter.com/799F8KFpkE — Alex Buraks (@alex_buraks) January 27, 2023

#9 Document age and last update both are ranking factors. pic.twitter.com/ay1GTMVEtJ — Alex Buraks (@alex_buraks) January 27, 2023

Right now I checked ~40% of the list, there are a lot more (about text relevancy, behavior factors, page rank, internal links, etc.).



Will continue this thread after some time. — Alex Buraks (@alex_buraks) January 27, 2023

The first thread got a lot of impressions (500k views for the moment, thanks for your retweets and likes!), so I decided to finalize. https://t.co/UQiQsnpWd2 — Alex Buraks (@alex_buraks) January 28, 2023

#2 Additionally: ranking factor for orphan pages.



You can easily find them via Screaming Frog or other crawlers. pic.twitter.com/zIPwAelpD0 — Alex Buraks (@alex_buraks) January 28, 2023
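For readers without a crawler at hand, the idea of an orphan page can be sketched in a few lines of Python. This is only an illustration: the function name `find_orphan_pages` and the input shapes (a list of known URLs, for example from a sitemap, plus a list of internal link pairs from a crawl) are assumptions for this example, not anything taken from the leaked code.

```python
from collections import defaultdict


def find_orphan_pages(all_urls, internal_links):
    """Return the URLs that no other page on the site links to.

    all_urls: iterable of every known URL (for example, from a sitemap).
    internal_links: iterable of (source_url, target_url) pairs from a crawl.
    """
    inbound = defaultdict(set)
    for source, target in internal_links:
        if source != target:  # a self-link does not make a page reachable
            inbound[target].add(source)
    return sorted(url for url in all_urls if not inbound.get(url))


urls = ["/", "/about", "/blog", "/old-landing"]
links = [("/", "/about"), ("/", "/blog"), ("/blog", "/"), ("/about", "/about")]
print(find_orphan_pages(urls, links))
```

Note that a homepage nobody links to would also be reported here, so a real audit would probably treat entry points separately.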

#4 Number of search queries of your site/url is a ranking factor.



Obviously more = better. pic.twitter.com/xXQ6FMDghP — Alex Buraks (@alex_buraks) January 28, 2023

#6 If your url would be the last for search session (user will find what he needs) - it would impact rankings.



There are strict factors for this and predictable factors as well. pic.twitter.com/Zx3sBZORCs — Alex Buraks (@alex_buraks) January 28, 2023

#8 Special ranking factors for short videos (tiktok, shorts, reels) pic.twitter.com/oKPzL09MID — Alex Buraks (@alex_buraks) January 28, 2023

#10 Keywords in URL is a ranking factor.



As we can see from the description - the optimal would be to include up to 3 words from the search query. pic.twitter.com/Q1euKWSiST — Alex Buraks (@alex_buraks) January 28, 2023
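To make the "up to 3 words from the search query" description concrete, here is a rough Python sketch that counts how many distinct query words show up in a URL path and caps the count at three. The function name `query_words_in_url`, the tokenization, and the capping behavior are assumptions for illustration; the tweet does not show how Yandex actually computes this factor.

```python
import re
from urllib.parse import unquote, urlparse

# Runs of letters and digits in any script; underscores and hyphens split words.
WORD = re.compile(r"[^\W_]+")


def query_words_in_url(query, url, cap=3):
    """Count distinct query words found in the URL path, capped at `cap`."""
    path = unquote(urlparse(url).path).lower()
    url_tokens = set(WORD.findall(path))
    query_tokens = set(WORD.findall(query.lower()))
    return min(len(query_tokens & url_tokens), cap)


print(query_words_in_url("best running shoes for men",
                         "https://example.com/best-running-shoes-men"))
print(query_words_in_url("yandex leak", "https://example.com/news/12345"))
```

The first URL shares four words with the query, so the capped count is 3; the second shares none.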

#14 One more ranking factor for content quality - broken embedded video on the page.



Embed videos - good for rankings.

Broken embed videos - bad. pic.twitter.com/2SUys65PHp — Alex Buraks (@alex_buraks) January 28, 2023

#16 If your backlink anchors contain all words from the keywords - it's good for SEO.



If it is in one link - it's more beneficial. Especially if the order of words is the same. pic.twitter.com/WrbESJ8Da5 — Alex Buraks (@alex_buraks) January 28, 2023

#18 The quality rank of texts on the domain is a ranking factor.



Pages with low quality content affect the entire domain. pic.twitter.com/MJUCTVB9CH — Alex Buraks (@alex_buraks) January 28, 2023

#20 Funny, there is a random as a separate ranking factor.



When you don't understand why some page is on top - it could be just random (to test behavior factors). pic.twitter.com/TGtzFrmBOV — Alex Buraks (@alex_buraks) January 28, 2023

#22 Backlinks from the top 100 best websites by PageRank impact rankings.



That's not news. pic.twitter.com/ikxldWLJqy — Alex Buraks (@alex_buraks) January 28, 2023

Wow, I just found the list with initial weights of Yandex ranking factors.



Do you need one more thread? 😁



P.S. final weights calculated by AI (matrixnet), but initial values are useful as well. pic.twitter.com/WeroYQy7Yu — Alex Buraks (@alex_buraks) January 28, 2023

That said, I've been digging into the codebase myself to find things of interest.



I'm doing this live, so I don't know how long it will take between tweets. — Mic King (@iPullRank) January 27, 2023

A lot of the code related to Yandex Search lives in the Kernel, ExtSearch, Search, and Robot archives, but again I won't be able to be comprehensive here until I've looked through everything. — Mic King (@iPullRank) January 27, 2023

Some really interesting things in the web_meta_factors_info/factors_gen.in file as it relates to content features and factors.



For instance, some things that we'd expect like a minimum expectation of the proximity of words in a title to the words in the query. pic.twitter.com/YRsrCpVsqU — Mic King (@iPullRank) January 27, 2023

Interestingly, there are a lot of scrapers in here: Google News, Shopping, YouTube and even other Yandex services. — Mic King (@iPullRank) January 27, 2023

Hmm...this might be the structure of how Yandex stores documents in their version of a doc server.



Still looking for an idea of how they structure their inverted index. pic.twitter.com/1lwTbOirnx — Mic King (@iPullRank) January 27, 2023

Here's a protobuf of link factors. pic.twitter.com/1RM6o1xzRg — Mic King (@iPullRank) January 27, 2023

In the "link prioritizer code" they talk about decreasing the priority of links with the same text from the same host. In other words, don't count the links from duplicate content. pic.twitter.com/dQTUnScCUy — Mic King (@iPullRank) January 27, 2023
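The idea described in that tweet, lowering the priority of links that repeat the same text from the same host, can be sketched roughly as follows. Everything here is hypothetical: the function name `downweight_duplicate_links`, the input shape, and the `penalty` value of 0.5 are made up for illustration and are not taken from the leaked code.

```python
from urllib.parse import urlparse


def downweight_duplicate_links(links, penalty=0.5):
    """Assign a priority to each (source_url, anchor_text) link.

    The first link with a given (host, anchor text) pair keeps priority 1.0.
    Each repeat from the same host with the same text is multiplied by
    `penalty` once more. The penalty value is an arbitrary illustration.
    """
    seen = {}
    scored = []
    for source_url, anchor in links:
        key = (urlparse(source_url).netloc.lower(), anchor.strip().lower())
        count = seen.get(key, 0)
        scored.append((source_url, anchor, penalty ** count))
        seen[key] = count + 1
    return scored


links = [
    ("https://a.com/1", "Buy shoes"),
    ("https://a.com/2", "buy shoes"),   # same host, same text: downweighted
    ("https://b.com/1", "Buy shoes"),   # different host: full priority
]
for source, anchor, priority in downweight_duplicate_links(links):
    print(source, anchor, priority)
```

A sitewide footer link repeated on thousands of pages would, under this sketch, quickly approach a priority of zero, which matches the "don't count the links from duplicate content" reading.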

How did y'all come up with that number of ranking factors?



I see 481 factors just related to "Rapid Clicks" pic.twitter.com/sw5A3ia3Bk — Mic King (@iPullRank) January 28, 2023

Similar to the Googs, Yandex has multiple ranking models to choose from.



In this select_ranking_models.cpp file, they talk about having different models for different languages and locations. pic.twitter.com/m210tpOUDb — Mic King (@iPullRank) January 28, 2023

I'm gonna go watch TV, but I obviously have to add this to my book so I'm gonna add more over the next couple days — Mic King (@iPullRank) January 28, 2023

Been digging into how this robot archive is structured.



It looks like the Zora directory is where a lot of interesting things are happening. There's a limits.pb.txt file that stores the requests per second rate for the host and the IP address for 204k hosts. pic.twitter.com/0oulKm58dx — Mic King (@iPullRank) January 28, 2023

Here's where the Document and Query factors are collected and scored.



Looks like it goes to storage after this tho. pic.twitter.com/qJAiLfSrsU — Mic King (@iPullRank) January 29, 2023

Ok, real quick, top 5 most positively and negatively weighted ranking factors and their coefficients in the initial weighting in Yandex's document relevance calculation. Negatives first



#1 FI_ADV: -0.2509284637



This factor determines that there is advertising on the site. — Mic King (@iPullRank) January 29, 2023

#3 FI_QURL_STAT_POWER: -0.1943768768



Factor is the number of URL impressions for the request — Mic King (@iPullRank) January 29, 2023

#5 FI_GEO_CITY_URL_REGION_COUNTRY: -0.168645758



Factor is the geographical coincidence of the document and the country that the user searched from.



Ok, now for the top 5 positively weighted factors. — Mic King (@iPullRank) January 29, 2023
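As a toy illustration of what an "initial weighting" could mean, the three negative coefficients quoted above can be plugged into a simple weighted sum. This is a sketch under a big assumption: that the initial score is a linear combination of factor values normalized to the 0 to 1 range. The tweets do not confirm that, and another tweet quoted earlier notes that the final weights are calculated by MatrixNet. The function name `initial_score` is hypothetical; only the factor names and coefficients come from the thread.

```python
# Coefficients as quoted in the thread; the scoring formula itself is assumed.
INITIAL_WEIGHTS = {
    "FI_ADV": -0.2509284637,
    "FI_QURL_STAT_POWER": -0.1943768768,
    "FI_GEO_CITY_URL_REGION_COUNTRY": -0.168645758,
}


def initial_score(factor_values, weights=INITIAL_WEIGHTS):
    """Weighted sum over the factors present in both mappings.

    factor_values: mapping of factor name to a value assumed to lie in [0, 1].
    Factors without a known weight are ignored.
    """
    return sum(
        weights[name] * value
        for name, value in factor_values.items()
        if name in weights
    )


# A hypothetical page that carries advertising and has a mid-range
# impression statistic for the query.
print(initial_score({"FI_ADV": 1.0, "FI_QURL_STAT_POWER": 0.5}))
```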

Here is a starting point for link related factors. https://t.co/fwP8TxuOrM — Christoph C. Cemper 🇺🇦 🧡 SEO (@cemper) January 30, 2023

Will this help you do SEO on Google? Probably not, but hey, it is super interesting.

Ah, but once they find the optimal word count ...



BOOM — John Mueller is watching out for Google+ 🐀 (@JohnMu) January 29, 2023

Forum discussion at WebmasterWorld.