Google Algorithms

Learning to Rank (L2R): Understanding It for Your SEO

Learning to Rank (L2R) uses machine learning to build ranking models for search engines. Discover how it works, why it makes universal ranking rules impossible, and how CTR influences rankings.

Definition of Learning to Rank (L2R)

Learning to Rank (L2R), also called machine-learned ranking (MLR), is machine learning applied to building ranking models, such as those used in information retrieval systems (IRS).

Early approaches to this kind of algorithm can be found at AltaVista:

An AltaVista patent on a Learning to Rank algorithm

As you can see in this patent, the algorithm determines which features matter for ranking and then weights them. It checks whether the new weights produce better results, then adjusts the weights again, and so on.
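The weight, check, reweight loop described above can be sketched in a few lines. This is a minimal illustration using a simple hill-climbing search, not the method from the patent: the function names, the two features (content and popularity), and the toy relevance labels are all assumptions made up for the example.

```python
def correctly_ordered_pairs(weights, examples):
    """Count page pairs where the more relevant page gets the higher score."""
    def score(features):
        return sum(w * x for w, x in zip(weights, features))

    count = 0
    for i, (feat_a, rel_a) in enumerate(examples):
        for feat_b, rel_b in examples[i + 1:]:
            if rel_a == rel_b:
                continue  # equally relevant pages impose no ordering
            better, worse = (feat_a, feat_b) if rel_a > rel_b else (feat_b, feat_a)
            if score(better) > score(worse):
                count += 1
    return count


def learn_weights(examples, n_features, rounds=10, step=0.5):
    """Nudge one weight at a time and keep the change only if ranking improves."""
    weights = [0.0] * n_features
    best = correctly_ordered_pairs(weights, examples)
    for _ in range(rounds):
        for i in range(n_features):
            for delta in (step, -step):
                candidate = list(weights)
                candidate[i] += delta
                quality = correctly_ordered_pairs(candidate, examples)
                if quality > best:  # better results: adopt the new weights
                    weights, best = candidate, quality
    return weights, best


# Hypothetical training data: ([content, popularity], human relevance label).
examples = [([0.9, 0.1], 2), ([0.5, 0.8], 1), ([0.1, 0.9], 0)]
weights, best = learn_weights(examples, n_features=2)
print(weights, best)
```

On this toy data, relevance follows the content feature, so the search ends up giving weight to content and none to popularity. Real systems optimise much richer ranking metrics over far more features.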

Understanding Learning to Rank for Your SEO

How do you weight hundreds of ranking signals to decide the score of a page for a given query? What if a page has good content but no backlinks? And what if a page has lots of backlinks but weak content? Answering these questions is the whole principle behind Learning to Rank.

A Learning to Rank algorithm weights all the signals for a given page and a given query. For example, an e-commerce page targeting a transactional query like “buy coffee machine” will not be judged in the same way as an informational page that explains how to use a coffee machine.

The ranking criteria themselves do not change, but their coefficients do. It is like school: if you are in a scientific stream, the coefficient of your maths grade counts for more than that of your English grade.

So, ultimately, the page for “buy coffee machine” will not need thousands of words to rank for the query, unlike the page that needs to detail and explain how to use the coffee machine.

Thus, one might imagine that for the first page, content would have a coefficient of 1 while popularity (backlinks) would have a coefficient of 9. For the second page it could be the reverse: “how to use a coffee machine” would have a coefficient of 9 for content and a coefficient of 1 for popularity.
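To make the coefficient idea concrete, here is a minimal sketch of a weighted score. The 1 and 9 coefficients come from the illustration above; the signal values for the two pages, the intent labels, and the function name are invented for the example and do not reflect any real ranking system.

```python
# Hypothetical coefficients per search intent (the 1-versus-9 illustration).
INTENT_WEIGHTS = {
    "transactional": {"content": 1.0, "popularity": 9.0},
    "informational": {"content": 9.0, "popularity": 1.0},
}


def page_score(signals, intent):
    """Weighted sum of normalised signals (each assumed to lie between 0 and 1)."""
    weights = INTENT_WEIGHTS[intent]
    return sum(weights[name] * signals[name] for name in weights)


# Two made-up pages: one rich in content, one rich in backlinks.
long_guide = {"content": 0.9, "popularity": 0.2}
shop_page = {"content": 0.2, "popularity": 0.9}

for intent in INTENT_WEIGHTS:
    print(
        intent,
        round(page_score(long_guide, intent), 2),
        round(page_score(shop_page, intent), 2),
    )
```

The same two pages swap places depending on the intent: the shop page wins on the transactional query and the long guide wins on the informational one, even though none of their signals changed.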

Whether the claim comes from Google or from an SEO professional, be sceptical when someone says that one criterion is more important than another. Because of this algorithm, nobody can know exactly what needs to be done for a given query, not even people who work at Google.

Learning to Rank is based on machine learning; modern implementations often use deep learning with neural networks (for example, TensorFlow Ranking). As a result, nothing is written in black and white, and nobody can claim that one metric will always matter more than another. Statistically, though, commercial queries typically do not need revolutionary content. Reusing the same product description as competitors is often even preferable, so that internet users are not confused by different wording for the same product reference. Google has understood this very well and does not penalise it. Instead, it looks at other ranking criteria to judge whether one page selling the product is better than another.

You can imagine any ranking criterion; a Learning to Rank algorithm could weight every signal in this way. To name a few:

  • Popularity: trust, topical PageRank, spam mass, etc.
  • Backlinks from authority sites in .edu, .gov
  • TF-IDF, Salton cosine, word2vec, fastText (is the content good?)
  • Presence of keywords in H1, H2 titles, in URLs…
  • TTFB, loading speed
  • Content duplication (which, as noted above, tends not to count against purely commercial queries)
  • Age of the page, of the domain
  • etc.

Learning to Rank Learns from Rankings

Learning to Rank, as its name suggests, is an algorithm that learns to rank pages. Just as it weights ranking criteria for a given page and query, it could also analyse how users interact with the results.

For example, it could estimate whether your site is relevant by analysing its CTR (click-through rate) in the search results.

The CTR is the number of clicks relative to the number of impressions. An impression is a display. If a site is displayed 100 times for a query and its number of clicks is 10, the CTR is 10%.
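The calculation above is a simple ratio. A minimal sketch, with a hypothetical function name:

```python
def click_through_rate(clicks, impressions):
    """CTR = clicks divided by impressions; 0.0 when there are no impressions."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


# 10 clicks for 100 impressions, as in the example above.
print(f"{click_through_rate(10, 100):.0%}")  # prints "10%"
```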

When an internet user arrives on a SERP (search engine results page), which site will they want to visit? Depending on rich snippets and the attractiveness of titles and meta descriptions, they will click on one link rather than another. If nobody clicks on a site even though it is on the first page, Learning to Rank will learn that the site may not be relevant and will lower its ranking. Conversely, if everyone clicks on the link, it climbs in the search results.
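One way to picture this feedback is to compare the CTR a result actually gets with the CTR one would expect at its position. The sketch below is purely illustrative: the baseline CTR values, the thresholds, and the function name are invented for the example and are not real Google data or a known Google mechanism.

```python
# Invented baseline CTR per position, for illustration only.
EXPECTED_CTR = {1: 0.30, 2: 0.15, 3: 0.10}


def click_signal(position, clicks, impressions):
    """Return 'promote', 'demote' or 'keep' by comparing observed and expected CTR."""
    observed = clicks / impressions if impressions else 0.0
    expected = EXPECTED_CTR[position]
    if observed > expected * 1.5:  # far more clicks than expected
        return "promote"
    if observed < expected * 0.5:  # far fewer clicks than expected
        return "demote"
    return "keep"


print(click_signal(3, 1, 100))   # almost nobody clicks
print(click_signal(3, 30, 100))  # many more clicks than expected
```

Comparing against a per-position baseline matters because a result in third place naturally gets fewer clicks than the first one, whatever its quality.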

This point is much more nuanced and controversial, but it is possible that the algorithm also analyses the time spent on your page after a click. If that time is out of line with the average for pages that match the search intent of the query, your page may be judged not relevant.

For example, for the query “how to grind coffee beans”, let us imagine that the average time spent on the ranked pages is 3 minutes. If the majority of your internet users leave your site within 20 seconds, there is a problem.
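You can run this kind of check on your own analytics data. The sketch below assumes a list of visit durations in seconds and treats a visit as "short" when it lasts less than a quarter of the typical time; the sample durations, the threshold, and the function name are all assumptions chosen for the example.

```python
def short_visit_share(visit_seconds, typical_seconds, threshold=0.25):
    """Share of visits shorter than `threshold` times the typical time on page."""
    if not visit_seconds:
        return 0.0
    cutoff = typical_seconds * threshold
    short = sum(1 for seconds in visit_seconds if seconds < cutoff)
    return short / len(visit_seconds)


# Made-up visit durations for a page whose competitors average 3 minutes (180 s).
visits = [15, 20, 18, 200, 12, 25, 190, 10]
share = short_visit_share(visits, typical_seconds=180)
print(f"{share:.0%} of visits are much shorter than the typical time")
```

If most visits fall under the cutoff, as they do in this sample, the page probably does not answer the search intent as well as the pages ranked around it.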

For reasons of computational cost, Learning to Rank is thought to affect only pages that are already in the top 10. It is therefore preferable to optimise CTR once your pages have reached 7th, 8th, or 9th position. That is where working on meta descriptions, for example, can truly have an impact on rankings.

I briefly discuss optimisations for this algorithm in the article on technical SEO, such as adding video to retain visitors or adding schema markup to increase CTR.