
SEO Optimisation by Understanding BERT: The Complete Guide

BERT is Google's NLP algorithm that understands search queries bidirectionally. Discover how it works (attention, MLM, NSP), its real impact on SEO, and how to optimize your content for it.

Published on July 4, 2022. Reading time: 12 min. By Stan De Jesus Oliveira

Definition: SEO Optimisation by Understanding BERT

BERT (Bidirectional Encoder Representations from Transformers) is a machine learning model dedicated to natural language processing (NLP).

Google released it as open source to the scientific community in 2018.

On 25 October 2019, Google officially announced that BERT had been integrated into some of its services for businesses (Cloud, TensorFlow).

On that same date, Pandu Nayak (Google Fellow and Vice President of Search) stated that the Mountain View company was now using it in Search, and described this change as the biggest leap forward for the Google algorithm in five years, that is, since the launch of RankBrain.

What Is BERT Used For at Google?

BERT moves Google closer to being a truly semantic search engine: it helps match search intent more closely, improves the handling of voice queries and, more generally, makes Google a more sophisticated search and answer engine.

Google is increasingly focused on understanding language and aims to give users exactly the answer they need. Users have noticed this and now type increasingly sophisticated, or “exotic”, queries, to borrow the word used by Google engineers.

Closely linked to semantic SEO, BERT addresses several of the problems that stood in the way of this kind of understanding of language and intent.

These include, among others:

Understanding “textual cohesion” and disambiguating expressions or sentences, particularly when polysemous words (words with multiple meanings) could change the contextual meaning of a sentence. This also covers other linguistic problems such as homonyms and the resolution of anaphora and cataphora.
Understanding which entities pronouns refer to, which is particularly useful in long paragraphs with several entities. A concrete application: the automatic generation of featured snippets and voice/conversational search.
Determining which named entities a text refers to.
Predicting the next sentence.
Answering questions directly in the SERPs.
Coreference resolution.

How Does BERT Work?

BERT is a natural language processing (NLP) technique, based on neural networks.

B for “Bidirectional”, E for “Encoder”, R for “Representations” and T for “Transformers”: the name says it all.

A Transformer is a neural network architecture built on an attention mechanism, which learns the contextual relationships between the words (and even sub-words) of a text.

A Transformer consists of two distinct components: an encoder and a decoder. The encoder reads the input, while the decoder produces the prediction for the task. BERT keeps only the encoder, hence the “E” in its name.

Task prediction with the Google BERT algorithm

Here, the input is a sentence in which some words have been hidden; BERT then predicts the missing words.

Unlike directional models, which read text input sequentially (left to right or right to left), a Transformer encoder reads the entire sequence at once, so each word can draw on the words on both sides of it. Hence the term bidirectional.

What earlier models, including Transformer-based ones such as GPT, did not do:

The difference in NLP models between BERT, GPT and ELMo

With BERT, the context of a word is bidirectional: it comes from the words both to its left and to its right in the sentence. The model learns the context of a word from its entire surroundings.
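To make the difference concrete, here is a toy Python sketch (the function name visible_context is hypothetical, and no real model is involved) listing which words are available when building the representation of “bank”. A left-to-right model never sees “deposit money”, the very words that disambiguate “bank”, whereas a bidirectional model does.

```python
# Toy illustration (not BERT itself): which words a model may "see"
# when building the representation of the word at position i.

def visible_context(tokens, i, mode):
    """Return the context tokens available for position i.

    mode="left_to_right": only the preceding tokens (GPT-style).
    mode="bidirectional": every other token, left and right (BERT-style).
    """
    if mode == "left_to_right":
        return tokens[:i]
    if mode == "bidirectional":
        return tokens[:i] + tokens[i + 1:]
    raise ValueError(f"unknown mode: {mode}")


sentence = "he went to the bank to deposit money".split()
i = sentence.index("bank")
print(visible_context(sentence, i, "left_to_right"))
# ['he', 'went', 'to', 'the']
print(visible_context(sentence, i, "bidirectional"))
# ['he', 'went', 'to', 'the', 'to', 'deposit', 'money']
```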

In this example, we can see that even with short inputs, BERT is able to understand the meaning of words.

In this example, BERT is able to tell from the context that the second sentence does not logically follow the first one.

This allows Google to assign content a score that reflects reality more faithfully than older context-vector methods did.

BERT: Attention!

BERT relies on an attention mechanism, which is the most important concept to understand. Attention-based models are “aware” of everything around a word but focus on what matters most.

Focusing on what is important allows them to work the way a human reader does, or rather to mimic one. And because a Transformer processes all the words of a sequence in parallel instead of one after another, as recurrent networks did, it is also more efficient to train at Google's scale.

Transformers are neural networks that are based on attention.
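The core computation can be sketched in a few lines of plain Python. This is scaled dot-product attention, softmax(QK^T / sqrt(d)) V, as defined in the Transformer paper. It is a minimal sketch under simplifying assumptions: in BERT, the queries, keys and values are learned linear projections of the token vectors and there are several attention heads (12 per layer in BERT-base), whereas here the toy vectors in x are used directly, with a single head.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Each argument is a list of vectors (one per token). Returns the output
    vectors and the attention weights (one row per query token).
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # How strongly this token "looks at" every token of the sequence.
        row = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        weights.append(row)
        # Output = attention-weighted average of the value vectors.
        outputs.append([sum(p * v[j] for p, v in zip(row, values))
                        for j in range(len(values[0]))])
    return outputs, weights


# Three hypothetical 2-d token vectors; self-attention uses them as Q, K and V.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = attention(x, x, x)
for row in w:
    print([round(p, 2) for p in row])
```

Each row of weights sums to 1: this is the “thickness” of the lines in attention visualisations such as the one below.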

This requires “tokenising” texts, that is, cutting them into pieces (words and sub-words), so that the model can find the words that matter most for determining the context of a given word.
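BERT's own tokenizer uses WordPiece with a greedy longest-match-first strategy: it takes the longest vocabulary entry that starts the word, then repeats on the remainder, marking continuation pieces with “##”. The sketch below reproduces that strategy in plain Python; the function name and the tiny vocabulary are hypothetical (BERT-base's real vocabulary has around 30,000 entries), and the text normalisation BERT applies before this step is left out.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word.

    Continuation pieces carry the "##" prefix, as in BERT's vocabulary.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matches: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary.
vocab = {"play", "##play", "##ing", "##able", "un", "token", "##is"}
print(wordpiece("playing", vocab))     # ['play', '##ing']
print(wordpiece("tokenising", vocab))  # ['token', '##is', '##ing']
print(wordpiece("unplayable", vocab))  # ['un', '##play', '##able']
```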

View of the attention mechanism of the BERT algorithm

Here, the thinner a line, the less attention is paid.

The sentence is as follows: “The girl ran to a local pub to escape the din of her city.”

If we raise the attention threshold so that only the strongest links remain, we can see more clearly what BERT focuses on in this sentence.

Focus on the attention of the BERT algorithm

The strongest attention link here connects “to” with “escape”.

This shows that “stop words”, supposedly “useless” words such as “the”, “a”, “an” or “to”, now matter to Google’s algorithms, and that content spinning is therefore no longer a viable tactic.

How BERT Was Pre-Trained

The technical workings of BERT

BERT is a language model that acquired its knowledge by being pre-trained on huge text corpora (English Wikipedia and the BooksCorpus) before being used at Google as a cutting-edge natural language processing technology.

Pre-training: MLM & NSP

Masked Words: MLM

Before word sequences are fed into BERT, 15% of the tokens in each sequence are selected for prediction. Most of them (80%) are replaced by a [MASK] token, 10% by a random token and 10% are left unchanged. The model then attempts to predict the original value of the selected words, based on the context provided by the other words in the sequence. Technically, predicting the output words requires:

Adding a classification layer on top of the encoder output.
Multiplying the output vectors by the embedding matrix, transforming them into the vocabulary dimension.
Calculating the probability of each word in the vocabulary with softmax.
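The last two steps can be sketched in plain Python. The sketch assumes that hidden is the output vector of the masked position after the extra classification layer (not shown; in Google's reference implementation it is a dense layer with layer normalisation) and that the output layer reuses the input embedding matrix, as the list describes. The function name, the three-word vocabulary and all the numbers are hypothetical.

```python
import math


def mlm_predict(hidden, embeddings, bias):
    """Turn one encoder output vector into a probability per vocabulary word.

    hidden:     output vector of the masked position (after the extra layer)
    embeddings: dict word -> input embedding vector (the embedding matrix)
    bias:       dict word -> output bias
    """
    # Multiplying by the embedding matrix gives one score (logit) per word.
    logits = {word: sum(h * e for h, e in zip(hidden, vec)) + bias[word]
              for word, vec in embeddings.items()}
    # Softmax turns the logits into probabilities that sum to 1.
    m = max(logits.values())
    exps = {word: math.exp(s - m) for word, s in logits.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


# Hypothetical 3-d embeddings and a hidden vector pointing towards "pub".
embeddings = {"pub": [1.0, 0.0, 0.0], "city": [0.0, 1.0, 0.0], "hill": [0.0, 0.0, 1.0]}
bias = {"pub": 0.0, "city": 0.0, "hill": 0.0}
probs = mlm_predict([2.0, 0.5, 0.1], embeddings, bias)
print(max(probs, key=probs.get))  # pub
```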

MLM stands for Masked Language Modeling.
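The masking step itself can be sketched as follows. It follows the procedure described in the original BERT paper (about 15% of positions selected; of those, 80% replaced by [MASK], 10% by a random token, 10% left unchanged), but it is a simplified illustration: the function name is hypothetical, the real implementation also skips special tokens such as [CLS] and [SEP], and here the sentence's own words stand in for the vocabulary used for random replacements.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """BERT-style masking: pick ~15% of positions as prediction targets.

    Of the chosen positions, 80% become [MASK], 10% a random token and
    10% stay unchanged. Returns the corrupted tokens and a dict
    position -> original token (the labels the model must predict).
    """
    corrupted = list(tokens)
    labels = {}
    n_to_mask = max(1, round(len(tokens) * mask_prob))
    for pos in rng.sample(range(len(tokens)), n_to_mask):
        labels[pos] = tokens[pos]
        r = rng.random()
        if r < 0.8:
            corrupted[pos] = MASK
        elif r < 0.9:
            corrupted[pos] = rng.choice(vocab)
        # else: keep the original token unchanged
    return corrupted, labels


rng = random.Random(0)  # fixed seed so the example is reproducible
tokens = "the girl ran to a local pub to escape the din of her city".split()
# Simplification: the sentence's own words serve as the random-token vocabulary.
corrupted, labels = mask_tokens(tokens, vocab=tokens, rng=rng)
print(corrupted)
print(labels)
```

Keeping some selected tokens unchanged or randomised means the model cannot rely on seeing [MASK], a token that never appears when BERT is later used on real text.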

Next Sentence Prediction: NSP

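In NSP, the model receives pairs of sentences and learns to predict whether the second sentence actually follows the first in the original text (in half of the training pairs it does; in the other half it is a random sentence). To help the model distinguish between the two sentences during training, the input is processed as follows before entering the model: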

A [CLS] token is inserted at the beginning of the first sentence and a [SEP] token is inserted at the end of each sentence.
A sentence embedding indicating Sentence A or Sentence B is added to each token. Sentence embeddings are similar in concept to token embeddings with a vocabulary of 2.
A positional embedding is added to each token to indicate its position in the sequence. The concept and implementation of positional embedding are presented in the Transformer paper.
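The steps above can be sketched as a small Python function that assembles the token sequence, the segment ids (0 for Sentence A, 1 for Sentence B) and the position ids for one sentence pair. The function name is hypothetical and tokenisation is assumed to be already done; in BERT, each of these three ids is looked up in its own embedding table and the three embeddings are summed to form the input vector of each token.

```python
def build_pair_input(sentence_a, sentence_b):
    """Assemble BERT's input for a sentence pair.

    Returns the token sequence, the segment ids (0 = sentence A,
    1 = sentence B) and the position ids (0, 1, 2, ...).
    """
    tokens = ["[CLS]"] + sentence_a + ["[SEP]"] + sentence_b + ["[SEP]"]
    # [CLS] and the first [SEP] belong to sentence A; the rest to sentence B.
    segment_ids = [0] * (len(sentence_a) + 2) + [1] * (len(sentence_b) + 1)
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids


tokens, segments, positions = build_pair_input(
    ["the", "man", "went", "to", "the", "store"],
    ["he", "bought", "milk"],
)
print(tokens)
print(segments)  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```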

Next Sentence Prediction (NSP)

Learn more: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding – https://arxiv.org/pdf/1810.04805.pdf

NSP stands for Next Sentence Prediction.

BERT Fine-Tuning

BERT can be used for a wide variety of language tasks by adding only a small layer on top of the base model:

Classification tasks such as sentiment analysis are performed in the same way as Next Sentence classification, by adding a classification layer on top of the Transformer output for the [CLS] token.
In question answering tasks, the software receives a question about a text sequence and must mark the answer in the sequence. Using BERT, a Q&A model can be trained by learning two additional vectors that mark the beginning and end of the answer.
In Named Entity Recognition (NER), the software receives a text sequence and must mark the different types of entities (Person, Organisation, Date, etc.) that appear in the text. Using BERT, an NER model can be trained by feeding the output vector of each token into a classification layer that predicts the NER label.
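For question answering, the BERT paper scores a candidate answer running from token i to token j as S·T_i + E·T_j, where S and E are the two learned start and end vectors and T_i is the output vector of token i, and keeps the highest-scoring span with j ≥ i. The sketch below assumes those per-token scores have already been computed; the scores shown are hypothetical values, not the output of a real model, and max_len is an illustrative cap on answer length.

```python
def best_span(start_scores, end_scores, max_len=30):
    """Pick the answer span (i, j) maximising start_scores[i] + end_scores[j].

    start_scores[i] stands for S . T_i and end_scores[j] for E . T_j.
    Only spans with i <= j and at most max_len tokens are considered.
    """
    best, best_score = None, float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


passage = "if there is no curb turn your wheels toward the road".split()
# Hypothetical scores: high start on "turn", high end on "road".
start = [0.1, 0.0, 0.0, 0.2, 0.1, 2.0, 0.3, 0.1, 0.0, 0.0, 0.1]
end = [0.0, 0.0, 0.1, 0.0, 0.3, 0.0, 0.1, 0.4, 0.2, 0.1, 2.5]
i, j = best_span(start, end)
print(" ".join(passage[i:j + 1]))  # turn your wheels toward the road
```

NER works similarly, except that each token's output vector goes through its own classification layer instead of being scored as a span boundary.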

BERT is used for NER (Named Entity Recognition) tasks to identify entities in a text

Impact of BERT on Semantic Search

All of this can also change the way you target “keywords”.
Source: https://blog.google/products/search/search-language-understanding-bert/

In this example, the word “to” gives the query its meaning (Google’s example query was “2019 brazil traveler to usa need a visa”), so Google now displays different search results.

The impact of the BERT algorithm on semantic search and search results in general

Featured snippet example. Here is an example from Google showing a more relevant featured snippet for the query “Parking on a hill with no curb”. In the past, a query like this would confuse Google’s systems. Google stated: “We were giving too much weight to the word ‘curb’ and were ignoring the word ‘no’, not understanding how essential this word was to appropriately answering this query. So we would return results for parking on a hill with a curb.”

Example of the impact of the BERT algorithm on semantic search

What does this mean for Google? How does it concern you?

Because attention concentrates on what matters, Google does not need to weigh every word of a page equally. If you crack jokes in your cooking article, Google may well ignore them. For your SEO, the lesson is to focus on what is important: always open your content with the essentials, write only what is useful, avoid gibberish and jumping from topic to topic, and structure your content.

This improves your content score and your visibility in semantic search.

SEO Optimisation for BERT

How semantic systems such as BERT and the Knowledge Graph, and NLP techniques in general, work provides valuable clues for many semantic SEO optimisations.

To optimise for BERT and help it disambiguate your content, you must match search intent, because that is what BERT focuses on. You should also create FAQs, since answering questions is one of BERT’s applications.

But you should not optimise for BERT, for RankBrain, or for the Knowledge Graph or Knowledge Vault. At most, you could single out a semantic form of E-A-T optimisation, and even that is debatable.

No, you should do semantic optimisation.

What does this mean concretely for SEOs?
Keywords are dead (not really)! But SEOs must optimise for topics and entities rather than keywords: things, not strings. Move away from strings of characters, at least progressively.

Create FAQs.

Focus on search intent, going well beyond the simple distinction between commercial and informational queries.

The simplest thing is to use semantic SEO tools.

To go further, you can run fill-in-the-blank tests on your content to check whether the context of each sentence is obvious. Do this on the opening of your content, which is its most important and most sensitive part.
A technical SEO who knows Python can run these tests algorithmically and automatically.
Start here: https://colab.research.google.com/github/google-research/albert/blob/master/albert_glue_fine_tuning_tutorial.ipynb – ALBERT (“A Lite BERT”) is a lighter, improved version of BERT.

Last tip regarding BERT: if you wish to calculate BERT scores for your pages you can refer to this page: https://www.anakeyn.com/2019/12/18/score-bert-referencement-seo/

It is also worth noting that BERT makes it possible to relate entities to one another through verbs. You should therefore think in terms of the subject, predicate, object triples used by knowledge graphs such as Google’s Knowledge Graph: when you mention an entity, is it linked to something else by a verb?

Example: Tomáš Mikolov, […]
Tomáš Mikolov is the inventor of Word2vec, an NLP (Natural Language Processing) method; he is also one of the authors of FastText, a library similar to Word2vec that goes further by building word representations from character n-grams rather than treating each word as a single unit. […]
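One way to check a text against this advice is to write its facts down as triples and look for entities that nothing links. The triples below are written by hand from the example sentence (they are not extracted by BERT), the helper names are hypothetical, and “Python” is a hypothetical entity assumed to be mentioned in passing without any verb linking it to the rest.

```python
# Hand-written triples for the example above: each fact links
# two entities through a verb-like predicate.
triples = [
    ("Tomáš Mikolov", "invented", "Word2vec"),
    ("Word2vec", "is a", "NLP method"),
    ("Tomáš Mikolov", "co-authored", "FastText"),
    ("FastText", "is similar to", "Word2vec"),
    ("FastText", "uses", "character n-grams"),
]


def facts_about(entity, triples):
    """Return every triple in which the entity is the subject or the object."""
    return [t for t in triples if entity in (t[0], t[2])]


def unlinked(entities, triples):
    """Entities mentioned in the text that no triple connects to anything."""
    linked = {t[0] for t in triples} | {t[2] for t in triples}
    return [e for e in entities if e not in linked]


for subject, predicate, obj in facts_about("Tomáš Mikolov", triples):
    print(subject, predicate, obj)
print(unlinked(["Tomáš Mikolov", "Word2vec", "Python"], triples))  # ['Python']
```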

Avoid writing sentences that a human can only understand by guessing. Always make things explicit.
Clear writing also spares Google from having to work out what you are talking about, which is costly in resources. This is therefore also linked to your crawl budget: good code quality and page speed matter, but so does clear, descriptive language.

The Controversy Over BERT Optimisation for SEO

BERT is not an algorithmic update like Penguin or Panda, because it does not judge web pages negatively or positively; instead, it improves Google Search’s understanding of human language. As a result, Google understands much better both the meaning of the pages it encounters and user queries, by taking the full context of each word into account.
BERT is mainly about resolving the linguistic ambiguity of natural language; it captures the textual cohesion that often comes from the small details of a sentence that give it structure and meaning.

FAQ

What Does BERT Act On?

BERT acts on language understanding (NLP); its applications span several areas, such as refining query understanding, ranking search results, understanding the text of web pages and selecting featured snippets.

What Is the Difference Between BERT and RankBrain?

RankBrain, launched in 2015, was Google’s first artificial intelligence method for understanding queries. It also looks at queries and at how pages rank for them. BERT does not replace RankBrain; it is an additional method for understanding content and queries, added to Google’s ranking system. RankBrain is still used for certain queries, and a single query may be understood using several methods, including BERT.

Can You Optimise for BERT?

Not really. It is possible, however, to calculate a BERT score. In my view, this is more useful for deciding which content to optimise first (prioritising a poorly written page) than for truly optimising a page. It also teaches you about semantic SEO writing, semantic search and semantic content optimisation. Think instead about how everything you now know can support a holistic SEO mindset.