
SEO in a Semantic Web: The Complete Guide to Semantic SEO

Semantic SEO means optimizing for entities, relationships and meaning — not just keywords. This comprehensive guide covers the Knowledge Graph, Wikipedia/Wikidata, BERT, RankBrain, MUM, vector spaces and how to optimize for Google's semantic search engine.

Published on August 11, 2022. Reading time: 44 min. By Stan De Jesus Oliveira

Definition of Search Engine Optimisation (SEO) in a Semantic Web

Search engine optimisation in a semantic web consists of creating a network of content in a relevant and meaningful structure for each entity around a topic. Semantic SEO connects terms, entities, and facts together with factual accuracy and relational relevance. By focusing on entities and what revolves around them rather than on keywords, it aims to better satisfy the user’s search intent and to appear relevant on a topic.

Modern search engines like Google or Bing are semantic search engines that understand relationships between entities; they look for intentions and increasingly understand information on the Web by structuring it. Thus, creating a content structure already organised with clearly connected entities is important for the semantic search engine and therefore for semantic SEO.

The semantic web also aims to allow indexing robots to better grasp the meaning of web pages and to be more efficient in their indexing approach and therefore their information retrieval.

In an extreme vision, the world can be seen only through connections, nothing else. We consider a dictionary as the repository of meaning, but it only defines words in terms of other words. Information is truly defined only by what it is linked to and how it is linked.

There are billions of neurons in our brain, but what are neurons? Just cells. The brain has no knowledge until connections are established between neurons. Everything we know, everything we are, comes from the way our neurons are connected.

There is nothing else to signify.

Tim Berners-Lee

History of the Semantic WWW

Do you know a fascinating man named Tim Berners-Lee?

A British computer scientist, he is the principal inventor of the World Wide Web (WWW).

He chairs the World Wide Web Consortium (W3C), an organisation he founded.

The objective of his original proposal, written at CERN in 1989, was the sharing of computer documents, which Tim Berners-Lee had the idea of achieving by combining the principle of hypertext with the use of the Internet.

It was in May 1990 that he adopted the expression World Wide Web to name his project.

Everyone knows the rest of the story — “the internet” as we know it.

But why do I mention him?

Because the birth of the W3C, at the time of the first international WWW conference, coincided with a new idea from Tim Berners-Lee: the semantic web.

At that conference, he was already explaining that “the web needs semantics”: he was seeking to go beyond the logic of hypertext with the aim of connecting the Web to the real world through semantics.

The Idea Behind the Semantic Web

The Web was a sea of unstructured data from the start. That is how it was invented.

The idea of the semantic web is a system that allows machines to “understand” words and their relationships. Such “understanding” requires that relevant sources of information have been semantically structured beforehand.

The semantic web is often presented as the purest definition of Web 3.0. Its standards, maintained by the World Wide Web Consortium (W3C), encourage the use of standardised data formats and exchange protocols on the Web, drawing notably on the Resource Description Framework (RDF) model.

Applied to the Web and search engines, this model extends the hypertext link network of human-readable web pages by also inserting structured data that is much clearer for machines.

Indeed, there are two categories: human-readable documents and machine-readable data.

Here is an example of a classic HTML page:

<item>cat</item>

Here is an example of a semantic HTML page:
<item rdf:about="http://dbpedia.org/resource/Cat">Cat</item>

However, self-declared metadata has been widely criticised, a consequence of abusive SEO practices since 1999.

Semantics in JSON for robots vs HTML for humans

Google, a Semantic Search Engine

How did Google make its way towards a semantic search engine (structured search engine)?

It all began with the idea of a knowledge base (graph), an idea that actually dated back to 1997.

In the meantime, Sergey Brin wrote a patent on “extracting patterns and relations from scattered databases such as the world wide web” in 1999. This appears to be Google’s first attempt to organise data in a machine-readable form.

But the early stirrings began taking real shape in 2010 following the purchase of a knowledge base called Freebase, created by the company Metaweb. Freebase would form the beginning of Google’s knowledge graph, known as the Knowledge Graph.

Freebase is a knowledge graph that was created and structured manually by volunteer humans.

That said, it was in 2011 that everything started falling into place, with the birth of schema.org founded by Bing, Google, Yahoo and Yandex. The idea was to present webmasters with a single vocabulary. This is how structured data such as the JSON-LD format came into being to organise the information of web pages.

In the same year (2011), Google announced its “Structured Search Engine”, designed to structure the information of the Web.

In 2012, Amit Singhal, then Senior Vice President of Engineering at Google, introduced the Knowledge Graph for a better understanding of the Web.

The Knowledge Graph: things, not strings
Amit Singhal

Here is a timeline of the major milestones of Google towards a semantic web:
Google timeline towards a semantic search engine
Given that everything started with Freebase and the Knowledge Graph, and that this is ESSENTIAL to understand, that is where we will begin.

How Is a Knowledge Graph Built?

Knowledge graphs, including Google’s called the Knowledge Graph, are made up of entities connected to other entities through relationships.

Simply explained, here is how a Knowledge Graph is built:
We have an entity called “Leonardo da Vinci”, he was born in 1452, he is an artist, he painted the Mona Lisa, etc.

Semantic SEO is purely a question of entities, connections, relationships.

In macro view, here is a small knowledge graph.

Node A and Node B are two different entities. These nodes are connected by an edge that represents the relationship between the 2 nodes. This is the smallest knowledge graph we can build — it is also known as a triple.

Semantic triple

For example, the Wikidata knowledge graph has approximately 100,000,000 nodes in 2022.

Although I will not teach you to build knowledge graphs as that is not the subject, it is still important to understand a little more about how this works.

If Node A = Putin and Node B = Russia, then it is highly likely that the edge is “president of”:
Subject Predicate Object - triple
A node, an entity, can obviously have several relationships. Putin is not only the president of Russia; he also worked for the security agency of the Soviet Union, the KGB.

Which gives us:

This is how knowledge graphs work: whatever their names, their specificities, they are composed of entities connected to other entities through a relationship that links them together.
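As a minimal sketch of the idea, the triples described above can be stored as plain (subject, predicate, object) tuples and indexed by subject. The predicate labels and the helper names `build_graph` and `relations_of` are illustrative choices, not the vocabulary of any real knowledge graph.

```python
from collections import defaultdict

# Each fact is one (subject, predicate, object) triple.
# The predicate labels are illustrative, not an official vocabulary.
triples = [
    ("Leonardo da Vinci", "born in", "1452"),
    ("Leonardo da Vinci", "occupation", "artist"),
    ("Leonardo da Vinci", "painted", "Mona Lisa"),
    ("Putin", "president of", "Russia"),
    ("Putin", "worked for", "KGB"),
    ("KGB", "security agency of", "Soviet Union"),
]


def build_graph(facts):
    """Index triples by subject: node -> list of (edge label, target node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in facts:
        graph[subject].append((predicate, obj))
    return graph


def relations_of(graph, entity):
    """Return every outgoing (predicate, object) pair of an entity."""
    return graph.get(entity, [])


graph = build_graph(triples)
for predicate, obj in relations_of(graph, "Putin"):
    print(f"Putin --{predicate}--> {obj}")
```

One node can carry as many edges as there are known facts about it, which is exactly what makes the structure a graph rather than a flat list of keywords.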

What Is an Entity?

In SEO, an entity concerns topics that can be linked to search engine knowledge graphs, such as the Google Knowledge Graph.

Wikipedia has acted and continues to act as a primary trusted base for the Knowledge Graph. Thus, to simplify, we can call an entity any topic that can be attached to a Wikipedia article. The reality is obviously more complex, because the Knowledge Graph draws on other knowledge bases and Google may also be able to generate triples automatically through the Knowledge Vault.

Wikipedia Entities

Google has its own knowledge graph, the Google Knowledge Graph, but uses other knowledge graphs and/or knowledge bases such as those of Wikipedia and Wikidata to provide enriched Knowledge Panels on search results.

Wikipedia knowledge panels and Google Knowledge Graph

Google also uses Wikipedia for other things such as training its models.

For example, Google leverages the Wikidata Knowledge Graph for KELM and REALM.

The workings of the Knowledge-Enhanced Language Model (KELM) for semantics at Google

TEKGEN is made up of the following components: a large training corpus of heuristically aligned Wikipedia text and Wikidata KG triplets, a text-to-text generator (T5) for converting KG triplets into text, an entity subgraph creator to generate groups of triplets to be verbalised together, and finally, a post-processing filter to remove poor quality outputs.

The result is a corpus containing the entirety of Wikidata KG in natural text form, which we call the Knowledge-Enhanced Language Model (KELM) corpus. It consists of ~18 million sentences covering ~45 million triplets and ~1,500 relations.

In short, what is it for? This has real-world applications for knowledge-intensive tasks, such as question answering.

Moreover, such corpora can be applied in the pre-training of large language models, and can potentially reduce toxicity and improve factuality.

Wikimedia and Google do have a form of partnership. Source: https://meta.wikimedia.org/wiki/Overview_of_Wikimedia_Foundation_and_Google_Partnership

Wikipedia and Wikidata

Wikipedia is one of the projects connected to Wikidata. Each Wikipedia article now has a unique identifier in the form of an IRI and constitutes an entity in Wikidata. Each entity is composed of several properties each having one or more values (triplets).

The value of these properties can be another entity, but also a string, a number, a date, etc.
The data thus structured is reusable in various formats (XML, JSON, Turtle…) and can ultimately be used to feed Wikipedia’s infoboxes, thereby avoiding having to manually modify them in all languages since every modification to Wikidata updates all infoboxes simultaneously.

Theoretically, you could link to a Wikipedia page to designate the entity you are referring to in your text, or you could indicate the Wikidata entity URI for better semantic SEO.

<span property="birthPlace" typeof="Place" href="http://www.wikidata.org/entity/Q1731">

<span property="name">Dresden</span>

</span>

Composition of the Google Knowledge Graph

The Google Knowledge Graph, originally derived from the Freebase knowledge base, has today been greatly expanded, mixing datasets for artificial intelligence training, sites without structured data, Wikipedia, Wikidata and many other sources.

How Google Extracts Unstructured Data

Data types are multiple and include texts, video, audio (unstructured data) and structured data.

To extract the information needed to build the graph from unstructured text, Google needs natural language processing (NLP) techniques.

Google therefore uses NLP technology to extract entities present in your texts.

For example, if we take the sentence:
“It is important to understand how the Knowledge Graph works and entities in order to understand why and how to optimise it for SEO.”

Here is what Google understands from our sentence:

Google NLP API
It will retrieve entities (named or not) from our text, and will match them against its knowledge graphs to understand the various relationships with entities present in your sentence.
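To reproduce this kind of analysis yourself, a request to the Cloud Natural Language API can be prepared as in the sketch below. It assumes the public REST endpoint `documents:analyzeEntities` and API-key authentication; the helper names `build_entity_request` and `summarise_entities` are hypothetical, and the network call is left commented out because it needs a valid key.

```python
import json
import urllib.request

# Public REST endpoint of the Cloud Natural Language API (v1) for entity analysis.
ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeEntities"


def build_entity_request(text, api_key):
    """Prepare (without sending) the HTTP request asking for the entities in `text`."""
    body = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }
    return urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarise_entities(response):
    """Keep the name, type and salience of each entity in a parsed JSON response."""
    return [
        (entity["name"], entity["type"], entity["salience"])
        for entity in response.get("entities", [])
    ]


sentence = (
    "It is important to understand how the Knowledge Graph works and entities "
    "in order to understand why and how to optimise it for SEO."
)
request = build_entity_request(sentence, api_key="YOUR_API_KEY")  # placeholder key

# To actually call the API (requires a valid key and network access):
# with urllib.request.urlopen(request) as resp:
#     print(summarise_entities(json.load(resp)))
```

The salience value returned for each entity indicates how central the API considers that entity to be within the text.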

Image from a Google patent showing entity extraction using NLP linked to the Knowledge Graph:

Entity extraction with NLP technology and the Knowledge Graph
It will then understand the sentiment coming from our sentence, how the sentence is built and the related category (BERT).
Sentiment and syntax of content with Google NLP

All Google NLP categories are accessible via: https://cloud.google.com/natural-language/docs/categories
Patents concerning entity extraction can be found at: https://gofishdigital.com/blog/entity-extractions-knowledge-graphs/

Google Knowledge Graph API

After NLP has extracted the entity from your text, you would presumably want to know what it understands about this entity.

Using only the NLP API you will get a short description. Using the one dedicated to the Google Knowledge Graph you will be able to discover what they know around an entity.

If we take the entity “Search Engine Optimization”, here is what it understands about search engine optimisation:

Using the Google Knowledge Graph API for the SEO entity
Each result comes with a score indicating how well it matches the query. In this case, it understood what search engine optimisation is:

This is also linked to an entity “Search engine optimization metrics”

"articleBody": "A number of metrics are available to marketers interested in search engine optimization. Search engines and software creating such metrics all use their own crawled data to derive at a numeric conclusion on a website's organic search potential. ",

But I explicitly searched for the full term and not the acronym, because at first glance Google understands ‘SEO’ as:

"description": "Capital of South Korea", "detailedDescription": {"articleBody": "Seoul, officially known as the Seoul Special City, is the capital and largest metropolis of South Korea. According to the 2020 census, Seoul has a population of 9.9 million people, and forms the heart of the Seoul Capital Area with the surrounding Incheon metropolis and Gyeonggi province. ","url": "https://en.wikipedia.org/wiki/Seoul",

And in fact, if we go back to the NLP API, here is what it said about “SEO”:

The entity of the word SEO for Google
With the Knowledge Graph API you can explore all the details. We can therefore understand why Google detected it as an organisation.
Entity score results for SEO with the KG API
It understands many things: “organisation” is simply the interpretation with the highest score, and Google also thinks the string can be linked to many other things.

A character string like “SEO” is assigned to an ID in the knowledge graph. This applies to all entities.
For example: @id: /g/11fw71_nbj equals the string Jason Barnard. Jason Barnard being an entity.
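The lookup described in this section can be sketched as follows, assuming the Knowledge Graph Search API endpoint `entities:search` and its `itemListElement` / `resultScore` response layout. The helper names and the sample response (IDs and scores included) are invented for illustration and do not come from a real API call.

```python
from urllib.parse import urlencode

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_kg_url(query, api_key, limit=5, languages="en"):
    """Build the GET URL for a Knowledge Graph Search API query."""
    params = {"query": query, "key": api_key, "limit": limit, "languages": languages}
    return f"{KG_ENDPOINT}?{urlencode(params)}"


def top_entities(response):
    """Return (id, name, description, score) tuples, highest resultScore first."""
    rows = []
    for item in response.get("itemListElement", []):
        result = item.get("result", {})
        rows.append((
            result.get("@id"),
            result.get("name"),
            result.get("description"),
            item.get("resultScore", 0.0),
        ))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hand-written sample in the shape of an API response.
# The IDs and scores are made up for this example.
sample_response = {
    "itemListElement": [
        {"result": {"@id": "kg:/m/EXAMPLE_A", "name": "Search engine optimization",
                    "description": "Hypothetical description"},
         "resultScore": 120.0},
        {"result": {"@id": "kg:/m/EXAMPLE_B", "name": "Seoul",
                    "description": "Capital of South Korea"},
         "resultScore": 950.0},
    ]
}

print(build_kg_url("SEO", "YOUR_API_KEY"))  # placeholder key
for row in top_entities(sample_response):
    print(row)
```

Sorting by score makes it easy to see which interpretation of an ambiguous string the graph favours, and which alternatives it still keeps in play.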

You can also search with the Kalicube tool to see whether your name (or any other entity) is known to the graph, without having to create a Google API key as I did above:

Searching a KG entity with the Kalicube SEO tool

Otherwise, there is also the Merkle SEO toolkit including Google Knowledge Graph entity search, exportable to an Excel file.

Searching a KG entity with the Merkle SEO tool

You can also use my SEO tool to explore Google Knowledge Graph entities:

SEO tool to explore Google Knowledge Graph entities

The Google Knowledge Graph API is accessible at: https://developers.google.com/knowledge-graph#typical_use_cases

Knowledge Vault

The Knowledge Vault is a fusion of all data, combining knowledge graph, text (unstructured data), and structured data.

We use supervised machine learning methods to fuse these distinct information sources. The Knowledge Vault is considerably larger than any previously published structured knowledge repository, and has a probabilistic inference system that computes calibrated probabilities of factual accuracy. We report the results of several studies that explore the relative utility of different information sources and extraction methods.

Google

The Knowledge Vault could therefore allow Google to validate information and then, if this is the case, integrate it into their Knowledge Graph.

Summary of how the Knowledge Vault works:

How the Knowledge Vault works

Learn more: https://cikm2013.org/slides/kevin.pdf

Difference Between the Knowledge Vault and the Knowledge Graph?

The Knowledge Graph consists of two parts: a knowledge base and an inference engine (deriving conclusions from a base of facts and a knowledge base). The knowledge base is a dataset with formal semantics that can contain different types of knowledge, for example rules, facts, axioms, definitions, primitive statements.

The Knowledge Vault cannot be classified as a true knowledge base, because it extends the idea of a pure semantic store with reasoning capabilities and therefore resembles more of a knowledge-based system.

This resource is interesting: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1054.8298&rep=rep1&type=pdf

Summary of how the Knowledge Vault and the Knowledge Graph work:

Knowledge Graph vs Knowledge Vault

Knowledge Based Trust (KBT)

Knowledge Based Trust (KBT) focuses on the accuracy of information on the open web, not on PageRank.

Knowledge Based Trust involves triplets, fact extraction, accuracy verification and text comprehension through disambiguation.

Knowledge Based Trust can be acquired by providing semantic content networks (masses of semantically linked content) that have strongly connected components in the article, based on different but relevant contextual layers.

Knowledge Based Trust. PageRank vs KBT
https://www.youtube.com/watch?v=Z6tmDdrBnpU
Above, you will see an example of a Knowledge Based Trust presentation by Luna Dong. She shows how a search engine can focus on “internal ranking factors” rather than external ranking factors.

A high PageRank alone cannot represent high quality and precision of content.

EAT

E-A-T, which stands for expertise, authoritativeness and trustworthiness, is a concept introduced by Google in 2014.

Everyone knows more or less what it corresponds to, so we will rather explore EAT in a semantic SEO context.

That said, if you do not have the basics of EAT, Google E-A-T is a Google CONCEPT to simply explain a multitude of algorithms. In the same way that one can say doing user experience is doing SEO, EAT is having a site that is “expert, trustworthy and authoritative”.

Here is how we could simply break it down:
The factors of Google’s EAT concept

Entities and Google E-A-T

The relationships between entities, people and topics are important to Google, because this is how they can algorithmically determine contextual relationships, the quality or strength of the relationship, and therefore, authority and expertise.

Example:
An article on “how to cure cancer” is written by a very well-known and award-winning oncologist expert. His name is an entity, linked to many mentions on the WWW about cancer.

The entity and “EAT” can therefore be considered linked: because the relationship between this entity and the topic of cancer is strong and of high quality, this article should rank better than one written by a journalism expert.
How Google can evaluate an author through semantics

Here we are not talking about backlinks but about entities, mentions and relationships, and therefore, to a certain extent, about EAT linked to semantic SEO.

Summary of Google’s Knowledge Systems

To synthesise what we need to see, here is what is important to remember:

Google uses the “Knowledge Graph”: a vast knowledge graph that connects entities through relationships
Google uses other knowledge networks such as: Wikipedia / Wikidata / CIA World Factbook and many others in addition to unstructured data.
Google extracts entities from your texts using NLP and connects them to different knowledge graphs, allowing it to “understand the entities and therefore the semantics of your texts”
Google can also extract the semantics of your page through the structured data you manually insert in JSON-LD.
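For the last point, here is a minimal sketch of generating JSON-LD that ties a page to an unambiguous entity. It assumes the schema.org `Article`, `about` and `sameAs` terms and reuses the Wikidata URI for Dresden quoted earlier; the headline, the author name and the helper `article_jsonld` are hypothetical.

```python
import json


def article_jsonld(headline, author_name, entities):
    """Build a schema.org Article whose `about` entries point at Wikidata entities."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "about": [
            {"@type": "Thing", "name": name, "sameAs": uri}
            for name, uri in entities
        ],
    }


markup = article_jsonld(
    "A weekend in Dresden",  # hypothetical headline
    "Jane Doe",              # hypothetical author
    [("Dresden", "http://www.wikidata.org/entity/Q1731")],
)

# Wrap the JSON in the script tag that would be placed in the page's HTML.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(markup, indent=2)
    + "\n</script>"
)
print(script_tag)
```

Pointing `sameAs` at a knowledge base URI is one way to state which entity a name refers to instead of leaving the disambiguation entirely to NLP.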

Vector Space and Semantics

Representation of a vector space
For a while now we have been talking about the most important algorithms for understanding semantics. But as you probably already know, the word semantics is generally used by SEO practitioners to talk about a lexical field around a keyword to appear more relevant.

Algorithmically, this comes from the fact that by representing each word of the language in a vector space, it would be possible to capture the semantic meaning of a word. Indeed, by placing all the words of a language in a space, it is then possible to compare the vectors of words against each other by measuring the angle between vectors. This then allows predicting that the word “dog” is closer to the word “cat” than it is to the word “skyscraper”. A vector space would also allow answering equations such as king – man + woman = queen or the equation Paris – France + Spain = Madrid.
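The angle comparison and the king/queen equation can be illustrated with a few hand-made vectors. This is only a toy sketch: the five dimensions and every value below are invented so that the arithmetic is easy to follow, whereas real embeddings are learned from text and have hundreds of dimensions.

```python
import math

# Hand-made toy vectors; dimensions are [royalty, male, female, animal, building].
vectors = {
    "king":       [1.0, 1.0, 0.0, 0.0, 0.0],
    "queen":      [1.0, 0.0, 1.0, 0.0, 0.0],
    "man":        [0.0, 1.0, 0.0, 0.0, 0.0],
    "woman":      [0.0, 0.0, 1.0, 0.0, 0.0],
    "dog":        [0.0, 0.1, 0.0, 1.0, 0.0],
    "cat":        [0.0, 0.0, 0.1, 1.0, 0.0],
    "skyscraper": [0.0, 0.0, 0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def analogy(a, b, c):
    """Solve a - b + c by returning the nearest word that is not part of the query."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (word for word in vectors if word not in (a, b, c))
    return max(candidates, key=lambda word: cosine(vectors[word], target))


print(cosine(vectors["dog"], vectors["cat"]))         # close to 1: similar words
print(cosine(vectors["dog"], vectors["skyscraper"]))  # 0: unrelated words
print(analogy("king", "man", "woman"))
```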

This is how the word semantics is generally used. It designates semantic proximity, i.e. the distance of semantically close words in the vector space.

This is therefore the recurring topic when one hears about semantic SEO. But it is only one branch of semantics among many others.

But an interesting point I wanted to share here is that during my reading of the original Google paper explaining the Knowledge Vault, mention is made that semantic representation in a vector space and semantics in graph theory (Knowledge Vault) could be correlated.

Here is the extract from the relevant paper (lightly adapted):

To illustrate that the neural network model learns a meaningful “semantic” representation of entities and predicates, we can calculate the nearest neighbours of various elements in the K-dimensional space. It is known from previous work (e.g., https://arxiv.org/pdf/1301.3781.pdf) that related entities cluster together in space. We see that the model learns to place semantically related (but not necessarily similar) predicates next to each other. For example, we see that the closest predicates (in the embedding space w) to the predicate ‘children’ are ‘parents’, ‘spouse’ and ‘place of birth’.

Source: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45634.pdf

If you are confused about the semantic representation of words in a vector space or if you wish to create models for technical SEO, I invite you to explore the TensorFlow docs.

https://www.tensorflow.org/text/guide/word_embeddings

Here is a short video showing Word2Vec using PCA (similar to T-SNE)

Here are the semantic algorithms using the representation of words in a vector space:

Hummingbird and Semantics

Previously, Google evaluated the topic of a page almost entirely on the basis of keywords.

But since 2013, page ranking has been done more intelligently thanks to the Hummingbird algorithm.

Hummingbird consists of better understanding the topic of a page through term-based synonym identification and better understanding the entities being evoked.

To better illustrate my point, we can extract a Google patent related to this algorithm, where the way they try to understand human language is described fairly precisely:

Hummingbird and semantics, the impact on Google search results
The (likely) Hummingbird patent tells us that a co-occurrence measure is used to evaluate candidate term/synonym pairs based on how often these terms (or words or compound expressions) appear together or in associated user queries (for example, in consecutive queries within a query session) or that tend to appear together in associated query results.

Google can take into account many synonyms from a synonym database to see how well these fit into the context of the whole query.
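To make the idea of a co-occurrence measure over consecutive queries more concrete, here is a rough sketch. It is not the formula from the patent: the sessions are invented, and the scoring (counting how often one term is swapped for another between two consecutive queries) is a simplifying assumption.

```python
from collections import Counter

# Hypothetical query sessions: each inner list is one user's consecutive queries.
sessions = [
    ["best place chicago style pizza", "best restaurant chicago style pizza"],
    ["place to eat pizza", "restaurant to eat pizza", "pizza restaurant near me"],
    ["place of birth da vinci", "da vinci birth town"],
]


def substitution_counts(sessions):
    """Count how often a term dropped from one query is paired with a term
    added in the next query of the same session."""
    counts = Counter()
    for queries in sessions:
        for first, second in zip(queries, queries[1:]):
            first_terms, second_terms = set(first.split()), set(second.split())
            for dropped in first_terms - second_terms:
                for added in second_terms - first_terms:
                    counts[frozenset((dropped, added))] += 1
    return counts


def synonym_confidence(counts, term, candidate):
    """Share of the substitutions involving `term` that pair it with `candidate`."""
    involving = sum(n for pair, n in counts.items() if term in pair)
    return counts[frozenset((term, candidate))] / involving if involving else 0.0


counts = substitution_counts(sessions)
print(synonym_confidence(counts, "place", "restaurant"))
print(synonym_confidence(counts, "place", "town"))
```

In this toy data, "restaurant" replaces "place" more often than "town" does, so it receives the higher confidence. A real system would also weigh the context of the whole query, as the text describes.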

But rather than continuing to describe this infographic on how their algorithms work, let us focus on an example they give us, still from the patent in question.
How Hummingbird changes SERPs with semantics
In this image, you can see the query “What is the best place to find and eat Chicago style pizza?” Google thus determines that the word “Place” is equal to the word “restaurant” depending on the context of the query. This is what you can see at the bottom right, with an index called confidence, evaluated here as high.

Semantic Search

Let us talk about the impact of what we have just seen through your keyword analysis.

A few years ago, you could have created multiple pieces of content about the same topic but expressed differently by internet users.

For example, one article on “the best places to eat a pizza” and another article on “the best pizza restaurants”.

Today, Google perfectly identifies that this is the same topic, making this unnecessary.

It will return the same results for these keyword variations.

And this is indeed the problem with using long-tail keywords. If a keyword is identified as low competition on your SEO tool, it is always essential to search that keyword on Google and interpret the results.

Another example: long-tail keywords such as “link building tips” and “link building techniques” are low competition, but Google will always show the same ranked pages regardless of your phrasing, as well as for the parent topic “link building”, and even “linkbuilding”.

The impact of the Hummingbird algorithm on SEO

What Is RankBrain?

RankBrain does many things for semantics.
The 1st version of RankBrain was confirmed by Google on 26 October 2015.
But to understand it, let us extract once again the most likely patent behind this algorithm:
The RankBrain algorithm behind a Google patent
Behind this patent, Google indicates that according to one implementation, a process includes receiving a query that includes at least three sequential query terms; determining that the sequential query terms represent a concept; and in response to the determination that the sequential query terms represent a concept, collecting query term substitution data for one or more query terms that appear in queries that include the concept.
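The three steps of that process (receive a query, detect that three sequential terms form a concept, collect substitution data from queries containing the concept) can be sketched as below. The concept list, the query log and both function names are hypothetical; the patent does not specify this implementation.

```python
from collections import Counter

# Hypothetical inputs: a set of known concepts and a small query log.
KNOWN_CONCEPTS = {("new", "york", "times")}
QUERY_LOG = [
    "new york times puzzle",
    "new york times crossword",
    "new york times crossword",
    "new york times sudoku",
    "chicago style pizza",
]


def find_concept(query):
    """Return the first run of three sequential terms that is a known concept, if any."""
    terms = query.lower().split()
    for i in range(len(terms) - 2):
        trigram = tuple(terms[i:i + 3])
        if trigram in KNOWN_CONCEPTS:
            return trigram
    return None


def substitution_candidates(query, log):
    """Count the terms that other queries use alongside the same concept."""
    concept = find_concept(query)
    if concept is None:
        return Counter()
    own_terms = set(query.lower().split())
    counts = Counter()
    for other in log:
        if find_concept(other) == concept:
            counts.update(t for t in other.lower().split() if t not in own_terms)
    return counts


print(substitution_candidates("New York Times Puzzle", QUERY_LOG).most_common())
```

This only gathers candidate substitutions. Deciding that one term can really stand in for another would need further signals, such as the confidence measures discussed for Hummingbird.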

Example:
An internet user searches: “New York Times Puzzle”
RankBrain decides: “Puzzle” = “Crossword”

Which gives:
The impact of RankBrain on SERPs
The difference between Hummingbird and RankBrain is not so easily perceptible at first; in fact it is mainly about arriving at the same idea: understanding how things are connected and why.

At its core, RankBrain is a machine learning system that builds on Hummingbird, which took Google from a “strings” environment to “entities and relationships”.

The continuous vector representation of a given title is close, in the vector representation space, to that of a similar title, even if their labels are different. For example, the sentences “the team is ready to win the football match” and “the team is ready to win the victory in the football game” have the same meaning but share almost no common vocabulary. However, they should be close to each other in the vector representation space, because their semantic encoding is very similar.

BERT – Introduction

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a machine learning model dedicated to natural language processing (NLP). This is the algorithm that helps detect the famous “search intent”, but it does many other things as well.

BERT is capable of predicting which intent a user is going to focus on rather than another.
For example, in the sentence “She is eating a green apple”, the algorithm detects that the user is going to focus on reading “apple” after having read “eating”, rather than focusing on the adjective “green”.

How the BERT algorithm works
BERT tries to mimic human behaviour.

BERT is also capable of predicting which word is going to appear in a context. Perhaps even more interesting, BERT is capable of resolving ambiguities in a query through contextual analysis.
For example “the problem has no solution” and “heat the solution to 78 degrees”, the word solution does not have the same meaning depending on the context of the query.

Simplified composition of BERT:

The Masked Language Model (MLM), whose principle is to predict a word that has been masked in a sentence from the words around it.
The Next Sentence Prediction (NSP) task, which must predict whether one sentence actually follows another.
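The masked-word objective can be illustrated with a deliberately naive sketch. This is not BERT and not a transformer: it simply counts, in a tiny invented corpus, which words appear between the same left and right neighbours as the gap, to show what "using context on both sides" means.

```python
from collections import Counter

# Tiny hypothetical corpus standing in for training text.
CORPUS = [
    "she is eating a green apple",
    "he is eating a red apple",
    "she is eating a green salad",
    "he is painting a green door",
    "she is eating a ripe apple",
]


def predict_masked(tokens, corpus=CORPUS):
    """Rank candidates for the [MASK] token using the words on both sides of it."""
    i = tokens.index("[MASK]")
    left = tokens[i - 1] if i > 0 else None
    right = tokens[i + 1] if i + 1 < len(tokens) else None
    scores = Counter()
    for sentence in corpus:
        words = sentence.split()
        for j, word in enumerate(words):
            prev_word = words[j - 1] if j > 0 else None
            next_word = words[j + 1] if j + 1 < len(words) else None
            # One point for matching the left context, one for the right context.
            scores[word] += (prev_word == left) + (next_word == right)
    return scores.most_common()


ranking = predict_masked("she is [MASK] a green apple".split())
print(ranking[:2])
```

A real masked language model replaces these counts with learned contextual representations of the whole sentence, but the training question is the same: which word best fits the gap given everything around it?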

BERT is a transformer encoder and has been very successful for natural language processing tasks. Transformer encoders calculate vector representations of natural language that can be used in deep learning models. BERT is not a standalone algorithm; it is a family of models.
Since BERT (2018), verbs, adverbs, adjectives are also important for determining context. By identifying relationships between tokens, references can be established and thus personal pronouns can also be interpreted. Other natural language processing (NLP) tasks, such as question answering and sentiment analysis, can also be handled by models from the BERT family.

BERT, NER and Semantics

Beyond the semantic representation of words and its ability to more easily resolve the context of polysemous words compared to Word2Vec, BERT can also detect entities when it is fine-tuned for Named Entity Recognition (NER), a task in which each token receives an entity label.

Thanks to verbs, relationships between entities can be established.
Adjectives, meanwhile, can be used to identify a sentiment around an entity.

Before natural language processing, Google depended on manually managed structured and semi-structured information or databases. With BERT, it is possible to extract entities and their relationships from unstructured data sources and store them in a graph index. A major step in data mining for the knowledge graph.

For this, Google can use data already verified from (semi-)structured databases such as the Knowledge Graph, Wikipedia… as training data to learn to assign unstructured information to existing models or classes and to recognise new models. This is where natural language processing in the form of BERT and MUM plays a crucial role.

Moreover, through natural language processing, Google is able to access a vast range of unstructured information from across the entire crawlable Web.

MUM

On 18 May 2021, Google’s Vice-President of Search, Pandu Nayak, announced the arrival of Google MUM, a new model that Google describes as 1,000 times more powerful than BERT.

MUM is at the heart of Google’s question answering.
MUM is the acronym for Multitask Unified Model.

Summary of Google’s Functioning as a Semantic Search Engine

How Google works as a semantic search engine
Thanks to these advances, Google can examine a piece of content and understand not only the topic it covers, but also related sub-topics, terms and entities and how all these different concepts are interdependent.

What Is Semantic SEO?

All of this teaches us that semantic SEO consists of writing content optimised around topics, entities, and not only on the basis of keywords.

Beyond content, semantic SEO is also creating strategically linked content around a set of topics/entities covering the overall topic.

From an off-page SEO perspective, semantic SEO consists of creating a brand identity and/or creating mentions for EAT rather than simple links (backlinks).

If you are more technically minded about SEO, you can go even further, such as calculating the BERT score of your pages:

Calculating the BERT score of your content
You could also check whether your ten most used keywords actually correspond to the entities of the theme.

For this you could calculate entity keyword density.
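The article does not define a formula for entity keyword density, so the sketch below assumes a simple one: mentions of each entity per 100 words of text. The function name and the sample text are hypothetical.

```python
import re


def entity_density(text, entities):
    """Return, for each entity, its number of mentions per 100 words of text."""
    lowered = text.lower()
    total_words = len(re.findall(r"\w+", lowered))
    density = {}
    for entity in entities:
        # Match the entity as a whole phrase, case-insensitively.
        pattern = r"\b" + re.escape(entity.lower()) + r"\b"
        mentions = len(re.findall(pattern, lowered))
        density[entity] = 100 * mentions / total_words if total_words else 0.0
    return density


text = (
    "Link building is part of SEO. Good link building earns mentions, "
    "and SEO rewards mentions."
)
for entity, value in entity_density(text, ["link building", "SEO", "Knowledge Graph"]).items():
    print(f"{entity}: {value:.1f} mentions per 100 words")
```

Comparing these figures with your ten most used keywords shows whether the entities of your theme are actually present on the page, or missing altogether as "Knowledge Graph" is here.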

And many other things.

Performing an Entity Audit

The first step to achieving entity optimisation is to perform an entity audit, in the same way that one does not write an article without having analysed the potential of a keyword.
It ensures that your website uses the appropriate and known entities associated with your theme and helps you write content on topics your website should really talk about. You can also do what is called a semantic analysis.

Audit Your Site

When auditing your website, you will want to find all the entities that are part of your theme. Following the principle of a semantic cocoon and of connecting supply and demand, I would even suggest covering the entities around your brand.

Example:

If you are an SEO agency, you can target the topic of semantic SEO; it is a topic you would want to talk about to rank better.

But if you are an SEO agency specialised in link building, perhaps you should rather do an entity audit around link building. Moreover, if that is the case and you have never talked about the entity Brian Dean, you are not technically doing semantic SEO.

When you try to discover entities, it is important to keep in mind that they should not be just any old generic keyword you would identify in a keyword research project.

Instead, they should all be nouns — ideas, places, people, things, dates…

Document the facts associated with these entities.

After the Entity Audit

Once you have audited the known entities of your website, create nice mindmaps to link them logically. Then publish the content.

Pay attention to technical and on-page SEO.

All of this counts.

But above all, think about internal linking, and organise the website architecture around your most targeted entities.
Information architecture for SEO

Google Misidentifies Semantics

We saw at the start of the article that Google uses NLP to understand the meaning of a text using artificial intelligence connected in one way or another to the Knowledge Graph.

This NLP technology allows text to be understood and words to be linked to entities.
However, the text is not always interpreted correctly, nor are the entities always connected correctly.

To illustrate the point, InLinks has conducted numerous studies across several sectors, and they suggest that Google does not connect entities as well as it could.

Here is an example of Google’s understanding in the finance sector:

Australia (seen 8 times) => detected by Google
Cryptocurrency (8) => detected by Google
Service (economy) (7) => NOT detected by Google
Currency (7) => NOT detected by Google
Investment (6) => NOT detected by Google
Asset (6) => NOT detected by Google
Market (economy) (6) => NOT detected by Google
Interest (6) => NOT detected by Google
Payment (6) => NOT detected by Google
Bitcoin (6) => detected by Google
Information (6) => NOT detected by Google
Finance (5) => NOT detected by Google
Profit (economy) (5) => NOT detected by Google
Money (5) => NOT detected by Google
Digital currency (5) => NOT detected by Google

The link to studies across different sectors: https://inlinks.net/en/industry-report

Structured Data

By using structured data, also called schema markup, you give Google precise semantic information about your content.

Many types of structured data can be added.

Adding well-defined schemas can also make your pages eligible for rich results, which could increase your CTR, a probable ranking factor because of RankBrain.

But above all, you make your texts understandable to Google and reduce the risk that it misinterprets the information.
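To make this concrete, here is a minimal sketch that builds schema.org Article markup in JSON-LD and wraps it in the script tag that goes into the page. The helper names and the URL are hypothetical, and the properties shown are only a small subset of what schema.org allows.

```python
import json

def article_jsonld(headline, date_published, url, language="en"):
    """Build a minimal schema.org Article description as a Python dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,
        "inLanguage": language,
        "mainEntityOfPage": url,
    }

def as_script_tag(data):
    """Wrap the JSON-LD in the script tag that is placed in the page's HTML."""
    body = json.dumps(data, ensure_ascii=False, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"

# Placeholder values; the URL is hypothetical.
markup = article_jsonld(
    headline="SEO in a Semantic Web",
    date_published="2022-08-11",
    url="https://example.com/semantic-seo",
)
print(as_script_tag(markup))
```

Generating the markup from the same data that feeds the page keeps the structured data and the visible content in sync.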

Structured Data and Wikipedia

Rather than trying again and again to trick Google, I think it is better to help it.

Think about how we can help Google understand our site.

This is intrinsically linked to internal linking and semantic SEO, and doing it means adopting a sustainable strategy.

No more waking up in the morning to check whether a new patent will reduce your site’s visibility.

It is therefore impossible to write this blog post without mentioning Dixon Jones, and more specifically his tool, inlinks.net.

Dixon Jones, formerly of Majestic, now focuses on semantic SEO rather than backlink optimisation.

Among other things, his tool automatically creates semantic structured data based on your web page and extracts all the entities or definitions you use in your content. For example:
Adding semantic structured data for SEO (schema)
When you give Wikipedia definitions, you optimise your site for semantic SEO (this is what is called Wikification). This is why we took the time at the beginning of the article to explain how all of this was connected.

Imagine that your sentence contains the word Paris. In context, is Paris the city or the first name, as in Paris Hilton? Specify this in your structured data.
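One way to express that disambiguation, sketched here under my own assumptions, is to attach an "about" entity whose "sameAs" links point to the Wikipedia and Wikidata entries for the city. The helper name and the page title are hypothetical.

```python
import json

# Wikipedia and Wikidata identifiers for Paris, the city.
PARIS_CITY = {
    "@type": "Place",
    "name": "Paris",
    "sameAs": [
        "https://en.wikipedia.org/wiki/Paris",
        "https://www.wikidata.org/wiki/Q90",
    ],
}

def with_about(page_markup, entity):
    """Return a copy of the page markup with `entity` appended to its "about" list."""
    updated = dict(page_markup)
    updated["about"] = list(page_markup.get("about", [])) + [entity]
    return updated

page = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "name": "A weekend in Paris",  # hypothetical page title
}
print(json.dumps(with_about(page, PARIS_CITY), indent=2))
```

A page about the person would point "sameAs" at that person's own entries instead, which is what removes the ambiguity.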

InLinks understands and clarifies the topic of your content and connects it to the Knowledge Graphs of the main search engines.


The important topics of your pages are disambiguated and linked to entities that Google understands and uses in its web services such as search, news and voice search.
Allow your content to be understood by search engines.

But that is not all: inLinks.net can also build internal links for semantic SEO, enriched with semantic anchors, and suggest entity clusters.

If you wish, you can explore the tool for yourself.

Example: imagine you are writing about the entity Tomáš Mikolov. You can hyperlink his name to an article where you discuss him, but you can also reference his Wikipedia biography in your structured data.

How to Optimise for Semantic Search?

Of course, it is possible to optimise your content for semantic search and, more broadly, for semantic SEO.

You probably already do this without realising it when you use semantic optimisation tools such as 1.fr, YourTextGuru, SEOQuantum and many others.

For example, for the keyword “seo”, here is what SEOQuantum suggests in terms of semantic content optimisation:

Semantic content optimisation with the SEOQuantum SEO tool

The tool lists the words generally used for the topic, to be added to your text, each with a frequency and an importance measure.

It also lists other things, such as named entities and verbs to include in your content.

Identifying verbs and entities for semantics with SEOQuantum

By using these tools, not only will your content become more relevant in Google’s eyes, but you will also rank more easily for the keyword variations that internet users search for (Hummingbird and RankBrain).

Semantic Site Structure

Semantic HTML

Semantic HTML for SEO
Semantic HTML is the use of HTML elements that have meaning in the DOM structure of the page.

Search engine crawling robots can recognise semantic HTML elements.

With semantic HTML, the main purpose of a web page, its main content section, the “Additional Content” section (with tags like <aside>), the author and the navigation areas (with tags like <nav>) can all be identified easily.

This does not mean that your pages need to be perfectly structured semantically to be on the first page of search results.

This could, for example, give Google clear hints (a hint is not a directive) about how to distribute PageRank under the reasonable surfer model, although this is far from certain.

But above all, using a correct HTML structure for lists and tables will be useful for the search engine to select the relevant part of the content and interpret it correctly for a potential position 0.

Thus, it must be said that there is a link between the use of semantic HTML and hunting for featured snippets. And, given that Google’s functioning as an answer engine is intrinsically linked to semantics, this is important.

Moreover, by doing so you will come much closer to the technical grail: the number of crawlable pages equals the number of crawled pages, which equals the number of indexable pages, which equals the number of indexed pages. Semantic code helps because it consumes less crawl budget.

Good code quality allows this.
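If you want a quick idea of how semantic a template is, the sketch below counts semantic elements against generic div and span containers using Python's standard html.parser module. The list of tags treated as semantic and the sample page are my own choices for the example, not an official checklist.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: the sectioning elements considered "semantic" for this audit.
SEMANTIC_TAGS = {"header", "nav", "main", "article", "section", "aside", "footer"}

class SemanticTagCounter(HTMLParser):
    """Count semantic elements versus generic <div>/<span> containers."""

    def __init__(self):
        super().__init__()
        self.semantic = Counter()
        self.generic = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lowercase.
        if tag in SEMANTIC_TAGS:
            self.semantic[tag] += 1
        elif tag in ("div", "span"):
            self.generic[tag] += 1

def audit(html):
    """Return (semantic tag counts, generic tag counts) for an HTML string."""
    parser = SemanticTagCounter()
    parser.feed(html)
    parser.close()
    return parser.semantic, parser.generic

SAMPLE = """
<body>
  <nav><a href="/">Home</a></nav>
  <main>
    <article><h1>Semantic SEO</h1><p>Main content.</p></article>
    <aside><p>Additional content.</p></aside>
  </main>
  <div class="footer">Not marked up as a footer</div>
</body>
"""

semantic, generic = audit(SAMPLE)
print("semantic:", dict(semantic))
print("generic:", dict(generic))
```

In the sample, the footer sits in a plain div, which is exactly the kind of block a semantic element would describe better.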

Semantic Sitemap.xml

It has been proven that dividing sitemaps into small chunks increases indexing speed and the number of indexed pages.

The main reason for this is that these smaller sitemaps are downloaded more frequently by search engines.

In summary: the semantic categorisation of your sitemaps in a sitemap index file can make the coverage report in Google Search Console easier to analyse.
Semantic compatibility between the crawl queue, the internal site tree and the semantic sitemap index file could potentially be a facilitating signal for the semantic search engine.

In the same way that a URL structure linked to semantic linking is relevant, having a semantically structured sitemap is equally relevant.
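Here is a minimal sketch of that idea. It assumes the first path segment of each URL reflects its theme, groups URLs on that basis, and builds one small sitemap per theme plus a sitemap index. The URLs and the sitemap file naming are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def group_by_section(urls):
    """Group URLs by their first path segment (assumed to reflect the theme)."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        groups[segments[0] if segments else "root"].append(url)
    return dict(groups)

def build_urlset(urls):
    """Serialise one thematic sitemap listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

def build_index(base_url, sections):
    """Serialise the sitemap index pointing at one sitemap per section."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for section in sections:
        loc = ET.SubElement(ET.SubElement(index, "sitemap"), "loc")
        loc.text = f"{base_url}/sitemap-{section}.xml"  # hypothetical naming scheme
    return ET.tostring(index, encoding="unicode")

URLS = [
    "https://example.com/semantic-seo/entities",
    "https://example.com/semantic-seo/knowledge-graph",
    "https://example.com/link-building/anchors",
]

groups = group_by_section(URLS)
print(build_index("https://example.com", sorted(groups)))
for section in sorted(groups):
    print(section, build_urlset(groups[section]))
```

With one sitemap per theme, the coverage report can be filtered by sitemap, so indexing problems show up per topic rather than for the site as a whole.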

Semantic Internal Linking

You have certainly heard of the semantic cocoon, one technique among others for optimising semantic internal linking.

Tools such as SEOQuantum, YourTextGuru, cocon.se and InLinks allow you to create semantic cocoons with content ideas:

Create a semantic cocoon (internal linking) with SEOQuantum
SEOQuantum offers semantic/thematic internal linking
Put simply, the idea here, beyond the concept of the semantic cocoon, is to build what is called a topic cluster: take a topic, identify the overall keyword being targeted, then create other content around the same topic to develop each point of the first piece, all linked together by internal links.

Still with SEOQuantum, it is also possible to optimise your semantic linking by calculating the semantic proximity between two linked pages.

Calculating semantic proximity between 2 pages with SEOQuantum
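How the tool computes this score is not described here. As a rough stand-in, the sketch below measures the proximity between two pages with cosine similarity over simple word counts. It is a lexical approximation only; real tools most likely rely on richer representations such as embeddings. The page texts are invented for the example.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Bag-of-words vector: token -> number of occurrences."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (0.0 when nothing is shared)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical page texts.
page_a = term_vector("link building techniques for earning backlinks")
page_b = term_vector("how to earn backlinks with link building outreach")
page_c = term_vector("chocolate cake recipe with vanilla cream")

print("a vs b:", round(cosine_similarity(page_a, page_b), 2))
print("a vs c:", round(cosine_similarity(page_a, page_c), 2))
```

Pages that score close to each other are natural candidates for internal links; a pair that scores near zero probably belongs in different clusters.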

Semantic Anchors

Using semantic anchors means using anchors that are synonyms or close variants of the main targeted keyword. The practice is closely linked to thematic PageRank optimisation.

For example, if you are targeting a keyword like netlinking, anchors such as “link building”, “link building techniques”, “best link building tips”, “linkbuilding”, “how to do linkbuilding”, etc. are ideal for semantic SEO optimisation.

This anchor text optimisation can be done both inside your content (i.e. your internal linking) and in your netlinking strategy (i.e. your external links), if that is possible obviously.

A Zyppy case study noted that the use of varied anchor text was strongly correlated with Google search traffic, so much so that they retested their study many times:
Case study on the use of semantic anchors by Zyppy
In any case, URLs with a greater number of anchor text variations from internal links are strongly correlated with more Google search traffic.
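To see where your own site stands, you could count the distinct anchor texts pointing at each URL. The sketch below assumes you already have your internal links as (target URL, anchor text) pairs, for example from a crawler export; the data shown is invented.

```python
from collections import defaultdict

def anchor_variations(links):
    """Map each target URL to the set of distinct anchor texts (case-insensitive) pointing at it."""
    variations = defaultdict(set)
    for target, anchor in links:
        variations[target].add(anchor.strip().lower())
    return dict(variations)

# Hypothetical internal links as (target URL, anchor text) pairs.
LINKS = [
    ("/netlinking", "link building"),
    ("/netlinking", "Link building"),
    ("/netlinking", "link building techniques"),
    ("/netlinking", "how to do linkbuilding"),
    ("/semantic-seo", "semantic SEO"),
]

for target, anchors in sorted(anchor_variations(LINKS).items()):
    print(target, len(anchors), sorted(anchors))
```

URLs that receive many links but only one anchor variation are the first places to diversify.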

inLinks.net handles this part too, automatically.

Semantic Writing

It is possible to do semantic writing and entity writing from Google Docs by clicking on the small icon at the bottom right:
Exploring Google Docs topics for semantic writing
Once clicked, you will have an overview of the main topics:
Identifying related topics with Google Docs
You can also click on “More” and browse all related topics:
Browse all related topics with Google Docs
What Google suggests here is for one of my articles on RankBrain. We can see that it mentions topics like click-through rate and bounce rate, which is extremely relevant. So I should make sure to mention them if I have not already.

You can also use InLinks which ALSO does semantic writing; here is an example:

Semantic writing with inLinks

Here is another screenshot of entity detection in content on a more complete screen:

Entity writing with InLinks
I strongly invite you to play with InLinks; there is so much to say!

WordLift is another excellent tool for semantic writing. It will allow you to classify your content, create entities, generate structured data among other things.

Semantic writing with the WordLift SEO tool

Otherwise, I also invite you to take a look at semantic SEO tools. Semantic writing can also be done via YourTextGuru or SEOQuantum.

When doing semantic writing, the workings of natural language processing (NLP) must be integrated into your process.

Indeed, you can run tests with NLP models to see whether the entities and key phrases of the article are detectable, whether the article focuses sufficiently on a topic, or whether its linguistic and emotional structure is in line with what is usual in the relevant industry.

Checking the sentiment of SEO writing via the Google NLP API

Write Long Texts

Google will struggle to understand the semantics of your content if it consists of only 200 words, because it will have little context with which to disambiguate the ideas, topics and entities.

The longer the text you provide, the better Google will understand it, and the more relevant it will therefore be in its eyes. Moreover, the frequency of the terms used will increase the relevance of your text.

Synonyms, the various query formats and TF-IDF analysis all relate to semantic SEO. There is a strong link between semantic SEO and the different spellings of words and their synonyms, because users may use different words to search for the same topic. It is important to use synonyms or similar words naturally within a topic, so as to satisfy all users with the relevant search intent and to help the search engine reconcile difficult concepts.

Also, the longer your text, the more the internet user’s search intent will be fulfilled as you will potentially provide everything they could expect to have after reading it.

For example, an internet user searches “backlinks”. When they understand what they are and what they are for, they will surely wonder:

How can I get backlinks?
Can I buy them?
Etc…

You could directly answer these questions right after having defined what backlinks are.

There are an enormous number of reasons (so I will not list them all) why content should be long, provided you know why you are writing at length and do not produce gibberish.

Frequently Asked Questions.

Related questions, known in English as PAA (People Also Ask), are, as the name suggests, frequently asked questions related to the query that Google displays on the results page.
By answering them in your content, you will not only be able to get more organic traffic by appearing in this small box, but you will also be able to rank better thanks to this semantic optimisation.

AlsoAsked is a good tool for this.
The AlsoAsked tool for questions asked and therefore semantic SEO
https://alsoasked.com/

Semantic Off-Page SEO

Becoming an Entity

Becoming an entity on Wikipedia is the best-known way to be listed in Google’s Knowledge Graph and generally defines you as an entity.

If you cannot be present on Wikipedia, you may be able to become an entity in the knowledge graph by associating yourself with an existing entity on Wikipedia.

There are an enormous number of different techniques for doing this, which I will not detail here:

Being listed on Wikipedia
Becoming an Entity by association
Using an Edge strategy
And other approaches to becoming an entity

Your first strategic decision in semantic SEO is to decide whether you want to try to become a fully defined entity in your own right.

Once you are an entity in Google’s Knowledge Graph, the KG will be continuously updated with your actions. If you are an organisation such as a band, the organic marketing of your new album becomes much easier than it would be for a record shop marketing the same album. The knowledge base will simply be updated to display the new album. This immediately creates a short vector between the album and the band, and the relationship between the album, your other albums and your organisation is established. The record shop, by contrast, may have more difficulty and will need an aggressive strategy.

In any case, make sure that:

The information on your site indicates who you are and what you do.
You have added Schema.org author markup (your name with a link to your biography).
This information is confirmed on several independent, trustworthy and authoritative third-party sites.
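The author markup from the checklist above could look something like the sketch below: an Article whose author is a Person with a link to a biography page and "sameAs" links to third-party profiles. The names and URLs are placeholders, and the helper function is hypothetical.

```python
import json

def article_with_author(headline, author_name, bio_url, profiles):
    """Article JSON-LD whose author is a Person linked to a biography page."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {
            "@type": "Person",
            "name": author_name,
            "url": bio_url,            # link to the author's biography
            "sameAs": list(profiles),  # third-party pages describing the same person
        },
    }

# Placeholder author and URLs.
markup = article_with_author(
    headline="What Is Semantic SEO?",
    author_name="Jane Doe",
    bio_url="https://example.com/about/jane-doe",
    profiles=["https://example.org/profiles/jane-doe"],
)
print(json.dumps(markup, indent=2))
```

The "sameAs" links only help if the third-party pages say the same thing about you as your own site does.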

I also invite you to consult the list of sources that Google can use for Knowledge Panels, accessible here: https://kalicube.pro/trusted-sources

Sites recognised by Google

The Semantic Web, a Web Where Everything Is Connected.

Who is related to what and why?

Link yourself to people who are semantically close to you, or at least in the same theme, whether for backlinks or for mentions.

Understanding how a theme is interconnected with entities
Babbar is a very good tool for semantic netlinking, thanks to metrics and features such as thematic PageRank, induced strength and the spot finder.

Frequently Asked Questions for Semantic SEO

What Is Semantics?

The word semantics is heavily overused, so it has an enormous number of meanings depending on the context. Each profession, or each specialisation within the same field, has its own definition of semantics.

In the field of SEO, depending on the context of the sentence, it can be related to:

the meaning of words, simple or compound;
meaning relationships between words (homonymy, paronymy, synonymy, antonymy, polysemy, hypernymy, hyponymy relations, etc.);
the proximity of a word to others in a vector space;
the meaning of a thing as an entity;
the structure of a page in HTML code (markup languages).

What Is Semantics For?

Semantics fills the gaps in information retrieval because it allows Google, and search engines in general, to map information. The machine would therefore be capable, to a certain extent, of reasoning about the actual relevance of content rather than relying on lexical statistics like TF-IDF and BM25.
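To show the kind of gap that purely lexical statistics leave, here is a minimal sketch of the standard Okapi BM25 formula with commonly used default parameters (k1 = 1.5, b = 0.75). The query and documents are invented; the point is only that a document using synonyms of the query terms gets a score of zero.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = (sum(len(d) for d in docs) / n_docs) or 1
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no exact match, no contribution
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

QUERY = "buy backlinks"
DOCS = [
    "how to buy backlinks safely",
    "purchasing inbound links from other websites",  # same meaning, different words
    "chocolate cake recipe",
]

for doc, score in zip(DOCS, bm25_scores(QUERY, DOCS)):
    print(f"{score:.3f}  {doc}")
```

The second document answers the same intent as the first, yet it scores exactly like the cake recipe. Mapping words to entities and meanings is what lets a search engine tell those two apart.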

Semantics: What Importance in SEO?

Semantic optimisation has long been controversial because search engine algorithms took years to interconnect everything. When Singhal presented the KG in 2012, it did not truly have an impact. In 2022, it matters. A recent patent, granted on 4 January 2022 (US-11,216,503), shows that Google can, rather than sorting results by how well documents match the query terms, group topics and relationships between entities as part of its decision on what to include in the SERPs.

Summary on Semantic SEO:

We have seen many things, but as you might have suspected, this is a guide for getting started in semantic SEO.

We could talk about semantic SEO for weeks without interruption, so I hope this serves as a base of resources for your future research. That said, if you have questions, comments or anything at all, do not hesitate to leave a comment.

Does all of this mean that SEO has changed? That keyword analysis should no longer be done?

Imagine that you had listed the 100 best series in 2015.
A few years later, in 2022, you update this list and add new incredible series.

You need to make changes and remove some series because your article remains a top 100.

Some series will need to be removed from the list; this does not mean they have become bad.