On January 27, 2023, Russia’s biggest and the world’s 4th largest search engine by market share, Yandex, had its source code repository leaked by a former employee.
The Yandex leak was shared in a file that contained what’s now believed to be 17,854 search ranking factors.
Naturally, the Yandex SEO leak became a trending topic in the SEO community.
The SEO industry is built on determining and optimizing for factors search engines like Yandex and Google use to rank web pages. Knowing what factors influence rankings and what weight each carries is literally the cheat code for ranking higher.
Many so-called ranking factors are assumptions drawn from algorithm updates, official guidelines, and the little information engineers at companies like Google drip out. The list of ranking factors itself is a closely guarded secret. That’s why this leak is so significant.
It’s now been a few days since the Yandex source code leak. Enough time to decipher all the insights different industry professionals have drawn from the leaked information. Before we share our major learnings, let’s put a few issues into perspective.
Yandex is Not Google
Much has been made of the vital detail that former Google engineers built the Yandex search engine. Other engineers have moved between the two companies since.
The assumption is that since Yandex was built by and has on its engineering team former Google employees, there will be some similarities in how the two search engines function.
The main interest in all this is, of course, to get some idea of how Google’s index works. Make no mistake, that’s the prize here.
But since the two search engines have evolved since Yandex was first built, it’s hardly Google’s clone any longer. Yandex and Google also both admit that their results are now largely driven by machine learning, which means they are less reliant on any assumed list of ranking factors.
What the Yandex Leak Has Taught Us
Yes, Yandex is not Google, but how the two work is not too dissimilar. So the leaked source code files offer important insights into how search engines in general work.
As Alex Buraks, an SEO expert who also translated some of the documents from Russian to English, has said, search results between Yandex and Google match 70 percent of the time.
So there is quite a bit to be learned from this leaked Yandex data even if you are primarily focused on Google SEO. The first insight is that Yandex has negatively and positively weighted ranking factors. Let’s start with the negative.
Negative Ranking Factors
We concentrate a lot on factors that we have to optimize for, perhaps neglecting those we shouldn’t optimize for as much.
There are also things we can’t do much about, like the age of the content, which this leak has shown to be a Yandex ranking factor. Here we will look at the factors that you can control.
Content freshness trumps content velocity
Yandex prefers fresher content. No surprises there. Yandex will punish your page if it’s older than 10 years or its publishing date can’t be determined.
There are two things to note here. One is you have to update your content regularly. Yes, you want to maintain a high content velocity, but it’s just as important to maintain a high standard of quality and to keep that content fresh.
The second is to ensure that, when you update your content, search crawlers can easily see it. We will talk about content some more when we look at the positive ranking factors.
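As a rough way to audit your own pages against the freshness rule described above, here is a minimal Python sketch. The 10-year threshold comes from the article; the function name, the labels it returns, and the idea of checking a single publish date are assumptions for illustration, not anything taken from the leaked code.

```python
from datetime import date
from typing import Optional

# Threshold mentioned in the article; not a confirmed Yandex constant.
MAX_AGE_YEARS = 10


def freshness_flag(published: Optional[date], today: date) -> str:
    """Label a page 'undated', 'stale', or 'ok' (hypothetical labels)."""
    if published is None:
        # No publishing date could be determined for the page.
        return "undated"
    try:
        cutoff = today.replace(year=today.year - MAX_AGE_YEARS)
    except ValueError:
        # today is Feb 29 and the cutoff year is not a leap year.
        cutoff = today.replace(year=today.year - MAX_AGE_YEARS, day=28)
    return "stale" if published < cutoff else "ok"


if __name__ == "__main__":
    today = date(2023, 2, 1)
    print(freshness_flag(None, today))
    print(freshness_flag(date(2012, 1, 1), today))
    print(freshness_flag(date(2020, 5, 1), today))
```

Pages flagged as undated are usually the quicker fix: exposing a clear publish or last-updated date is easier than rewriting the content.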
Watch your commercial anchor text
Another negatively weighted ranking factor that stands out is inbound links with commercial anchor text. Anchor text is a big topic for SEOs but perhaps not from the perspective we have learned from these Yandex files.
With Yandex, you have to keep a close eye on how many of your backlinks have commercial anchor text. If their percentage exceeds 50 percent, they will dilute the juice of your good links. So pay attention to anchor text distribution when building links.
Yandex’s biggest peeve, though, is banner ads. Those get the heaviest penalty, which is sensible from a user experience perspective.
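To check your own anchor text distribution against the 50 percent figure mentioned above, a sketch along these lines may help. The list of commercial terms and the function name are hypothetical; how Yandex actually classifies an anchor as commercial is not described in the article.

```python
from typing import Iterable

# Hypothetical trigger words; adjust for your own niche and language.
COMMERCIAL_TERMS = {"buy", "cheap", "price", "discount", "order", "deal"}


def commercial_anchor_share(anchors: Iterable[str]) -> float:
    """Return the fraction of anchors containing a commercial term."""
    anchors = list(anchors)
    if not anchors:
        return 0.0
    commercial = sum(
        1 for anchor in anchors
        if COMMERCIAL_TERMS & set(anchor.lower().split())
    )
    return commercial / len(anchors)


if __name__ == "__main__":
    backlink_anchors = [
        "buy running shoes",
        "Acme blog",
        "cheap shoes online",
        "this guide",
    ]
    share = commercial_anchor_share(backlink_anchors)
    print(f"Commercial anchor share: {share:.0%}")
    print("Over the 50% mark" if share > 0.5 else "Within the 50% mark")
```

The matching here is a simple whole-word check, so anchors with punctuation attached to a term would be missed. Treat the result as a rough indicator, not a verdict.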
Positive Ranking Factors
Let’s now look at the positively weighted ranking factors.
Clean your site of poor-quality links
Even though Yandex does not define what bad links are, a high percentage of them among the links pointing to your site can dilute the power of your good links.
We can all agree that links from low DR sites and pages that cover a topic unrelated to what your page is about are the bad links being referred to. Also not as valuable are links from websites that cover multiple topics.
Yandex obviously cares a lot about the quality of the links pointing to your site. Apparently, they care as much about your outbound links too. Enough to reward sites that link out to Wikipedia.
Not that it ever was, but link-building no longer needs to be a priority for Wikipedia. Perhaps its focus should shift to disavowing links from the low-quality sites that will now be scrambling to link to it.
The age of your links is another positive ranking factor. Essentially, older links have more value. So you don’t want to leave it late to start building links. Unless, of course, building links is not part of your strategy or you prefer to get them naturally.
More traffic equals higher-ranking power
We typically look at traffic as the prize for good SEO. But an insight from the Yandex leak is that traffic itself is a ranking factor. The more traffic you attract the more ranking power you will gain.
But a caveat is that if you go hard looking for traffic and target keywords you have little chance of ranking for, you are going to sabotage yourself.
That’s because another ranking factor is your site’s average ranking for all its queries, which means ranking for many keywords without necessarily getting traffic for them will hurt your site.
The advice is to only target keywords you are confident of ranking for. It’s OK to target lower volume keywords and build towards the higher volume ones as your ranking power grows.
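The arithmetic behind that advice can be sketched in a few lines of Python. The queries and positions below are made up, and a plain mean is an assumption; the article does not say how Yandex computes a site’s average ranking.

```python
from statistics import mean
from typing import Dict


def average_position(positions: Dict[str, int]) -> float:
    """Plain mean of a site's ranking positions across its queries."""
    return float(mean(positions.values()))


if __name__ == "__main__":
    # Hypothetical site ranking well for three lower-volume queries.
    realistic = {"blue widget care": 3, "widget cleaning tips": 5, "fix widget": 4}
    print(average_position(realistic))

    # Same site after also targeting two queries it cannot compete for.
    overreach = dict(realistic, **{"widgets": 80, "buy widgets": 68})
    print(average_position(overreach))
```

In this made-up example, two out-of-reach keywords move the site’s average position from 4 to 32, even though nothing about the original three rankings changed.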
Another interesting traffic-related ranking factor is the diversity of your traffic sources. You don’t want all your traffic to come from Yandex/Google. That will raise suspicion of possible sneaky SEO tactics.
Ideally, some of that traffic should come from other channels: paid if need be, but preferably direct or social. In fact, Yandex rewards direct and social traffic with higher ranking power.
Direct traffic, especially by a returning visitor, says there must be something you are doing right that keeps people coming back. That’s an important user signal and positive ranking factor, but there are others that Yandex also rewards your site for.
Search engines are happiest with your site when users end their search on your site. It shows that a user has found what they came to a search engine for.
More users ending their search on your site along with other user behavior signals like CTR and dwell time also validates the search engine’s ranking criteria.
Pogo-sticking behavior, where searchers bounce from a page they have clicked through to and back to the SERPs, shows the pages the search engine is ranking aren’t satisfying search intent. You would assume this is where the machine learning algorithm kicks into gear.
Twitter Blue is a positive ranking factor
Maybe that subhead is a little misleading. But Yandex does indeed consider sites with verified social pages to be more trustworthy. So overall, it is a smart strategy to up your social game.
Make your content as shareable as possible. Put some effort into your content, adding original images and, of course, making sure that the content is helpful and as comprehensible as possible.
People will naturally share and recommend your content if it’s good. But don’t only wait for your readers to share your content. Share links to your content on your social pages for those followers who don’t use Google as much.
Speaking of content quality:
Publish more quality content
Good quality content of course performs better on Google. But you certainly don’t want to have three pages with outstandingly thorough, well-optimized content and 20 very thin ones.
What will happen is that those poor-quality pages will dilute the rankability of the few good pages on your site. You want to have a higher proportion of good than bad content on your site, which means putting quality over quantity.
As well as sharing it on social media, readers are also more likely to bookmark your content if it’s good. They may do that because they want to come back when they have more time to digest it, or because it’s good enough that they want to reference it in the future.
Yandex sees it as a quality signal and gives you higher ranking power if more people bookmark your pages.
Of course, you want to regularly go back into your old pages to improve their quality and update them for freshness. As we have discussed already, that’s a positive ranking factor in its own right.
Even though they have not declared it a ranking factor, we know Google wants your content to be helpful for users. They have gone so far as to publish their guidelines for helpful content, even advising deleting unhelpful content to improve your chances of ranking.
Yandex, as we assume Google does too, rewards longer content. So you want to shoot for long-form articles that cover a topic fully. Avoid fluff, though, because word count will not matter much if the content is not helpful to users.
From an on-page optimization angle, there are several issues you have to pay attention to. Having numbers and too many slashes in your URL, for example, reflects badly on your site.
As we already knew, putting keywords in your URL is an on-page SEO best practice, but now we know it’s a Yandex ranking factor. Or at least it used to be, since that ranking factor is marked as deprecated in the leaked source code.
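If you want to scan your own URLs for the two issues mentioned above (numbers and too many slashes), a minimal Python sketch could look like this. The slash limit is a hypothetical value, since the article only says “too many”, and the function name is ours.

```python
import re
from typing import List
from urllib.parse import urlparse

# Hypothetical limit; the article does not give a number.
MAX_SLASHES = 3


def url_issues(url: str) -> List[str]:
    """List the URL-hygiene issues found in the path of a URL."""
    path = urlparse(url).path
    issues = []
    if re.search(r"\d", path):
        issues.append("digits in path")
    # Ignore a trailing slash so /blog/ and /blog count the same.
    if path.rstrip("/").count("/") > MAX_SLASHES:
        issues.append("too many slashes")
    return issues


if __name__ == "__main__":
    for url in (
        "https://example.com/blog/seo-tips",
        "https://example.com/a/b/c/d/post-123",
    ):
        print(url, url_issues(url))
```

Only the path is inspected, so digits in the domain or query string are not flagged. That is a design choice for this sketch, not something the leak specifies.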
An odd positive on-page SEO ranking factor we have found from the Yandex leak is the number of capital letters in the SEO title.
Could a title in all CAPS be considered over-optimization? Maybe indirectly as it would likely hurt your CTR.
Prioritize good quality hosting
Your website’s host determines how accessible your web pages are. If your pages take too long to load or are constantly down, users will be annoyed. It turns out Yandex feels that has to be discouraged with a ranking penalty.
Another notable technical SEO issue you want to stay on top of is how many clicks it takes to get to your pages from your home page. You don’t want any pages that you want to rank to be buried deep inside the site.
Neither should you have any orphaned pages, so make sure to add internal links to every page on your site. Reduce crawl depth by making sure that every page is no more than two clicks from the homepage.
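Click depth and unreachable pages can both be checked with a breadth-first walk over your internal links. The sketch below assumes you already have a map of each page to the pages it links to (for example, from a crawl or sitemap export); the function names and sample site are hypothetical, and the two-click limit is the one suggested above.

```python
from collections import deque
from typing import Dict, List, Tuple


def click_depths(links: Dict[str, List[str]], home: str) -> Dict[str, int]:
    """Breadth-first search: fewest clicks from the homepage to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def audit(
    links: Dict[str, List[str]], home: str, max_depth: int = 2
) -> Tuple[List[str], List[str]]:
    """Return (pages deeper than max_depth, pages unreachable from home)."""
    depths = click_depths(links, home)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    too_deep = sorted(p for p, d in depths.items() if d > max_depth)
    unreachable = sorted(all_pages - set(depths))
    return too_deep, unreachable


if __name__ == "__main__":
    site = {
        "/": ["/blog", "/about"],
        "/blog": ["/blog/post"],
        "/blog/post": ["/blog/post/appendix"],
        "/orphan": ["/"],  # known from the sitemap, but nothing links to it
    }
    too_deep, unreachable = audit(site, "/")
    print("Too deep:", too_deep)
    print("Unreachable:", unreachable)
```

Orphaned pages only show up as unreachable if they appear as keys in the link map, so build the map from a full page inventory and not from a crawl that starts at the homepage.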
How Much of All This Applies to Google?
We do not know how many of the Yandex ranking factors we now know apply to Google. And, if any of them do apply to the search engine that really matters, how much weight does Google assign to them?
There is also Yandex’s response to this leak. The company was quick to downplay the effect of this leak on the integrity of its search results saying:
‘appears to be old fragments differing from the current version of the company’s repository’.
The company goes on to say some of the leaked ranking factors were only ever used for internal testing purposes, meaning they aren’t actually a part of their algorithm.
You would expect the company to come out with this line as it seeks to head off criticism over how this could have happened. The leaked data also shows 988 of the ranking factors as deprecated and 244 as unused.
Poring over the leaked data, Michael King noted in an article for Search Engine Land that Yandex possibly has tens of thousands of ranking factors, of which only a fraction have been exposed by this leak. The ranking environment, like Google’s, appears to be quite dynamic.
That said, there are a lot of similarities between the two search engines’ ranking infrastructure. For example, Yandex has separate ranking factors for Your Money or Your Life (YMYL) topics, which have become a major subject within the Google SEO community.
MatrixNet, which was later superseded by CatBoost, was also similar to the early version of Google’s RankBrain algorithm.
Helpful Content and Good User Experience Equals Higher Rankings
More than learning how Google ranks web pages, this leak has helped broaden our understanding of modern search engines. It’s helped us understand how to target traffic and what according to search engines is helpful, user-first content and intuitive information presentation.
We have learned that Yandex, as Google possibly does, penalizes manipulation of CTR, user behavior signals, and links. This leak has also shown Yandex shares Google’s disdain for pages where adverts make it hard to consume your content.
The Yandex source code leak has, therefore, confirmed the one clue Google has never been shy to share about how it ranks web pages: its algorithms are purposely engineered to recognize and reward good user experience, and to punish the inverse!
Take a Look at the Yandex Source Code Leak
The Yandex source code leak makes for a fascinating look inside a search engine algorithm and how search engines rank and value content.
Several ex-Googlers actually work on the Yandex search team, so it is reasonable to expect that several of the factors outlined in this article are also part of Google’s algorithm.
The longstanding belief of 270+ search ranking factors was always suspected to be on the light side. And now, with early coverage of the leak citing 1,922 Yandex ranking factors (a count since put at 17,854), it is practically confirmed that 270+ is dramatically lower than the actual count.
At the end of the day, this is a treasure trove of information to help you in your SEO and content strategy. We will continue to adapt these strategies for how we produce content with the Content at Scale platform as well.