Knowledge Graph

The Google Knowledge Graph and SEO

The Google Knowledge Graph is a massive database of 5 billion entities and 500 billion facts that powers Google's semantic understanding. Discover how it works, how to optimize your SEO for it, and how to become a recognized entity.

Knowledge Graph - Semantic SEO

Definition of the Google Knowledge Graph

In computer science, a knowledge graph is a knowledge base built on graph theory: it connects each entity to other entities through relationships. Each entity-relationship-entity statement is called a triple (or triplet). The Google Knowledge Graph works in the same way. This allows Google’s machines to move from matching character strings to genuinely understanding what words mean.

Semantic triple
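
As a rough illustration of the triple idea, here is a minimal Python sketch. The `Triple` type, the predicate names, and the helper functions are assumptions made for this example; they say nothing about how Google actually stores its data.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


# A tiny, hand-written store of triples (illustrative only).
triples = [
    Triple("Taj Mahal", "located_in", "Agra"),
    Triple("Agra", "located_in", "India"),
    Triple("Taj Mahal", "instance_of", "Mausoleum"),
]


def facts_about(entity, store):
    """Return every triple whose subject is the given entity."""
    return [t for t in store if t.subject == entity]


def objects(subject, predicate, store):
    """Return the objects linked to a subject by a given predicate."""
    return [t.obj for t in store
            if t.subject == subject and t.predicate == predicate]
```

With this store, `objects("Taj Mahal", "located_in", triples)` returns the entities the Taj Mahal is directly located in, which is the "things, not strings" idea in miniature.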

Today we will cover everything surrounding Google’s Knowledge Graph, the knowledge graph at the foundation of the semantic Web and thus of semantic SEO.

The History of the Google Knowledge Graph

The Knowledge Graph: things, not strings
-Amit Singhal

For years, Google’s search simply consisted of matching keywords to queries. For a search engine, words like [taj mahal] were just that — two words.

Using a knowledge base makes it possible to truly understand words, their meanings, and their relationships.

In the spirit of Web 3.0 in its purest definition (the semantic Web), Google set out to structure information so that things (names, dates, objects) become understandable through their relationships with other entities.

The Knowledge Graph is Google’s semantic database.

It is where entities are placed in relation to one another, assigned attributes, and placed in a thematic or ontological context.

Example taken from a Google patent on their Knowledge Graph

Knowledge graphs are factual in nature because their information is generally extracted from reliable sources, and post-processing filters and human editors ensure that inappropriate or incorrect content is removed. Freebase, the origin of the Google Knowledge Graph, was built manually by volunteers before being acquired by Google.

Google’s Knowledge Graph is not only rooted in public sources such as Freebase, Wikipedia, and the CIA World Factbook. It is also fed at a much larger scale, as it is able to evolve on its own. Today, it has amassed more than 500 billion facts about five billion entities. All entities are adjusted based on what people search for and what Google discovers on the Web.

Composition of the Google Knowledge Graph

What Is an Ontology?

In the context of knowledge graphs, an ontology describes the factual relationships between entities, that is, between the nodes of the graph.

For example, a cat is a feline. The relationship linking the individual animal (the cat) to its class (feline) is a factual, ontological one.

In philosophy, an ontology is a theory about the nature of existence, about the types of things that exist; ontology as a discipline studies these theories.
Artificial intelligence researchers use this term in their own jargon, and for them an ontology is a document that formally defines the relationships between terms.

An ontology is typically made up of the following components:

  • Individuals: things that can be named in the data
  • Classes: collections of individuals
  • Properties: links between an individual and a value
  • Relations: definitions of how two individuals are related to one another
  • Axioms: an integral part of ontologies, they help us derive hypotheses from data and make inferences
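
The components listed above can be sketched in a few lines of Python. Everything here (the names `Felix` and `Alice`, the dictionaries, the `classes_of` helper) is a made-up mini-ontology for illustration, not a real ontology language such as OWL.

```python
# Classes, with one axiom: each class may be a subclass of another.
subclass_of = {"Cat": "Feline", "Feline": "Animal"}

# Individuals, each declared as a member of one class.
instance_of = {"Felix": "Cat"}

# Properties: link an individual to a plain value.
properties = {("Felix", "colour"): "black"}

# Relations: link two individuals to one another.
relations = [("Felix", "owned_by", "Alice")]


def classes_of(individual):
    """Apply the subclass axiom: class membership is inherited upwards.

    Assumes the subclass hierarchy has no cycles.
    """
    result = []
    current = instance_of.get(individual)
    while current is not None:
        result.append(current)
        current = subclass_of.get(current)
    return result
```

Calling `classes_of("Felix")` walks up the hierarchy, so the statement "a cat is a feline" lets the system conclude that Felix is also an animal without that fact ever being written down.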

What Is an Inference?

The most typical type of ontology for the Web has a taxonomy and a set of inference rules.

In the field of artificial intelligence, an inference engine is a component of the system that applies logical rules to the knowledge base to deduce new information.

In Google’s context, the Knowledge Graph (KG) can be seen as the knowledge base and the Knowledge Vault (KV) as the inference engine: the KV inspects the KG to work out relationships between entities that have not yet been clearly defined.

An inference engine works in one of two main modes. The first is forward chaining, which starts from known facts and applies rules to derive new ones. The second, backward chaining, proceeds in the opposite direction: it starts from a goal and works back to the facts that support it. As a loose analogy, think of CBOW and Skip-gram in Word2vec, which approach the same problem from opposite directions.

Take the classic rule Human(x) => Mortal(x). In forward chaining, the inference engine finds all facts in the knowledge base that match Human(x) and, for each one found, adds the new information Mortal(x) to the knowledge base. So, if it finds an object called Socrates that turns out to be human, it deduces that Socrates is mortal. In backward chaining, the system is given a goal, for example answering the question “Is Socrates mortal?” It searches the knowledge base to determine whether Socrates is human and, if so, asserts that he is also mortal.
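
The Socrates example can be written as a small Python sketch. It assumes facts are stored as (predicate, subject) pairs and rules as (premise, conclusion) pairs with a single variable; real inference engines are far more general.

```python
# Facts are (predicate, subject) pairs, e.g. Human(Socrates).
facts = {("Human", "Socrates"), ("Human", "Plato"), ("Cat", "Felix")}

# Each rule reads "premise(x) => conclusion(x)".
rules = [("Human", "Mortal")]


def forward_chain(facts, rules):
    """Start from the facts and keep applying rules until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, subject in list(known):
                new_fact = (conclusion, subject)
                if predicate == premise and new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known


def backward_chain(goal, facts, rules):
    """Start from a goal and work back to supporting facts.

    Assumes the rules contain no cycles, otherwise this would recurse forever.
    """
    if goal in facts:
        return True
    predicate, subject = goal
    return any(
        conclusion == predicate
        and backward_chain((premise, subject), facts, rules)
        for premise, conclusion in rules
    )
```

Forward chaining enriches the whole knowledge base in advance, while `backward_chain(("Mortal", "Socrates"), facts, rules)` only does the work needed to answer one question.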

How Is the Google Knowledge Graph Built?

In semantics, an entity is described unambiguously by an identifier and by its characteristics (attributes or properties). The identifier (URI), which usually consists of a sequence of numbers, is used by machines to identify the entity, whereas humans recognise entities by their characteristics.

To represent semantic structures, it is useful to use graph theory. This theory is at the foundation of the Knowledge Graph and many other things at Google.

Graphs are made up of nodes and edges. In the context of semantics, nodes represent entities and edges represent the relationships between them. These relationships can also be assigned values as a “relational context”. For example, Larry Page and Steve Jobs each have a “founder” relationship (edge) to the company they started, Google and Apple respectively.
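
Here is a minimal sketch of such a graph in Python, with entities as nodes and labelled edges as relationships. The `EntityGraph` class and the `founder_of` label are assumptions chosen for this example.

```python
from collections import defaultdict


class EntityGraph:
    """Directed graph whose edges carry a relationship label."""

    def __init__(self):
        # node -> list of (relation, target node)
        self._edges = defaultdict(list)

    def add_edge(self, source, relation, target):
        self._edges[source].append((relation, target))

    def neighbours(self, source, relation=None):
        """Targets reachable from source, optionally filtered by relation."""
        return [target for rel, target in self._edges.get(source, [])
                if relation is None or rel == relation]

    def sources_with(self, relation):
        """All nodes that have at least one outgoing edge with this label."""
        return sorted(node for node, out in self._edges.items()
                      if any(rel == relation for rel, _ in out))


graph = EntityGraph()
graph.add_edge("Larry Page", "founder_of", "Google")
graph.add_edge("Sergey Brin", "founder_of", "Google")
graph.add_edge("Steve Jobs", "founder_of", "Apple")
```

Here `graph.sources_with("founder_of")` groups Larry Page and Steve Jobs together because they share the same type of edge, even though they point at different companies.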

A graph contains all relevant entities, regardless of their ontology. In addition to showing the existence of a relationship between entities, edges can also be used to indicate the values of these relationships, for example through their length and thickness.

A particularly thick connecting edge could represent an intense relationship between the two entities. The relationship distance, indicated by the length of the edge, can also be used to represent how closely the two entities are linked. This is where graphs connect to vector spaces: the Euclidean distance between two entity vectors can serve as the length of the edge between them. In other words, a graph structure can be derived from statistical methods such as vector space analyses.
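
To make the link between vector spaces and graphs concrete, the sketch below turns entity vectors into weighted edges using Euclidean distance. The two-dimensional vectors and the distance threshold are invented for illustration; real embeddings have hundreds of dimensions.

```python
import math
from itertools import combinations

# Hypothetical 2-D entity vectors (real embeddings are much larger).
vectors = {
    "cat": (1.0, 1.0),
    "lion": (1.0, 2.0),
    "car": (8.0, 9.0),
}


def build_edges(vectors, max_distance):
    """Connect every pair of entities closer than max_distance.

    The edge weight is the Euclidean distance: a shorter edge
    means a closer relationship between the two entities.
    """
    edges = {}
    for a, b in combinations(sorted(vectors), 2):
        distance = math.dist(vectors[a], vectors[b])
        if distance <= max_distance:
            edges[(a, b)] = distance
    return edges
```

With a threshold of 2.0, only the cat and the lion end up connected, which mirrors the idea that edge length reflects how closely two entities are linked.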

To display a knowledge panel, Google checks whether the entity has an entry in Wikidata or a page on Wikipedia. If it has neither, Google may still display a panel when the company has a Google My Business listing, but this does not create an entry in the Knowledge Graph.

In a research project in which a Google employee participated, entities are equated with Wikipedia entries. Indeed, Wikipedia articles are a central source of information for many Knowledge Panel entries in the Knowledge Graph. Along with Wikidata entries, Google uses them as evidence of an entity’s relevance. No Wikipedia article and no Wikidata entry means you will not be an entity.

Source: https://research.google.com/pubs/archive/40749.pdf

It is possible to have a Knowledge Panel, i.e. a box to the right of the search results, without it being linked to the KG, although social networks and other well-known sites can sometimes be integrated into it.

The importance of Wikipedia in identifying entities and their thematic context is studied in the scientific article Using Encyclopedic Knowledge for Named Entity Disambiguation. (http://www.cs.utexas.edu/~ml/papers/encyc-eacl-06.pdf)

One way for Google to identify the relationships between entities could be to analyse annotations and links in Wikipedia.

An annotation is the link from a mention to an entity. A tag is the annotation of a text with an entity that captures a topic (explicitly mentioned) in the input text.

Developing a semantic understanding of search queries and documents depends on the ability to identify entities and the relationships between them, and to place them in a context or ontology. This is possible with the help of verified data sources such as Wikipedia. However, the enormous volume of search queries and documents created every day makes a purely manual process impractical. This is one of the reasons why Google has, for several years now, been driving the development of self-learning and machine learning algorithms.
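
The idea of annotating a mention with an entity can be sketched as follows. This is a deliberately naive approach, assuming a hand-made dictionary of candidate entities and picking the candidate whose context words overlap most with the text; it is not the method of the cited paper, and the candidate lists are invented for the example.

```python
import re

# Hypothetical candidates for one ambiguous mention, each with context words.
CANDIDATES = {
    "taj mahal": {
        "Taj Mahal (monument)": {"agra", "india", "mausoleum", "marble"},
        "Taj Mahal (musician)": {"blues", "album", "guitar", "singer"},
    },
}


def annotate(text):
    """Return (start, end, entity) for each known mention found in the text.

    Disambiguation is a simple word-overlap count; on a tie, the first
    candidate listed wins.
    """
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    annotations = []
    for mention, candidates in CANDIDATES.items():
        start = lowered.find(mention)
        if start == -1:
            continue
        best = max(candidates, key=lambda name: len(candidates[name] & words))
        annotations.append((start, start + len(mention), best))
    return annotations
```

The same two words, "Taj Mahal", resolve to the monument in a sentence about Agra and to the musician in a sentence about a blues album, which is why the surrounding context matters so much.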

What Does Google Consider an Entity?

Entities are particularly important for information retrieval systems, as they allow for the inference of additional information regarding the context of a search query, a sentence, or a text.

The unambiguous identification of entities is important for Google because it facilitates a number of tasks:

  • Interpreting search queries
  • Disambiguating terms with multiple meanings
  • Identifying relationships between entities and their meaning in terms of ontologies or themes
  • Interpreting documents
  • Identifying relevant entities in a thematic context

Theoretically, there is a long list of possible entity types, including:

  • Books
  • Educational institutes
  • Events
  • State institutions
  • Companies
  • Films
  • TV series
  • Bands
  • Organisations
  • People
  • Places
  • ….

A look at the entity types listed on schema.org gives us a comprehensive overview of everything that can be evaluated as an entity. It is not entirely straightforward to assess what Google actually classifies as an entity and what it does not.

In a patent description that Google references in one of its own patents, the following definition is found:

A named entity is a group of one or more words (a text element) that identifies an entity by its name. For example, named entities can include people (such as a person’s first name or role), organisations (such as the name of a company, institution, association, government, or private organisation), places (such as a country, state, city, geographical region, named building, etc.), artefacts (such as names of consumer products, such as cars), temporal expressions, such as specific dates, events (which can be past, present, or future, such as World War II; the 2012 Olympics) and monetary expressions.

https://www.google.com/patents/US20100082331

How Does Google Use the Knowledge Graph? What Is It Used For?

Google uses the Knowledge Graph to act as an answer engine and to bring its search results closer to what users actually expect.

Because it understands the entity, it can now display knowledge panels (Knowledge Panel) built from Wikipedia, and it can also generate even more complex results.

Semantics for Google's algorithms

I have identified the patent that enables this: it is called Generating Insightful Connections Between Graph Entities (patents.google.com/patent/US20140280044).
Today, with all the new semantic algorithms, the Knowledge Graph is also used for many other things, such as helping Google’s crawlers understand the meaning and topic of a page when they visit it.

The Google Knowledge Panel

Knowledge panels come from the information in the Google Knowledge Graph. They provide quick and factual information to internet users. It is the panel to the right of search results.

Example of a knowledge panel on Google (Knowledge Panel)

Knowledge panels can for example include:

  • Title and brief summary of the topic
  • A longer description of the topic
  • A photo or photos of the person, place, or thing
  • Key facts, such as the birth date of a notable figure or the location of something
  • Links to social profiles and official websites
  • Songs from musical artists
  • Upcoming episodes of TV shows
  • Lists of sports teams

On mobile, multiple knowledge panels can provide facts:

On mobile, multiple knowledge panels can be displayed

Carousels at the very top of the page show things like events, films, and TV shows, and these are closely tied to your schema.org structured data.

SEO Optimisation for the Google Knowledge Graph

  • Become an entity (Wikipedia, Wikidata, and others) and obtain mentions.
  • Mention entities.
  • Define the words you use in structured data rather than letting AI natural language processing systems choose for you.
  • Add more structured data.

Regarding the author biography, I strongly advise trying to add as many relationships as possible so that Google understands who you are in the best possible way:

SEO: linking the entities of your biography to the Google Knowledge Graph

You can easily try this with Diffbot’s natural language demo: https://demo.nl.diffbot.com
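
The relationships in an author biography can also be declared explicitly in structured data. The sketch below builds a schema.org `Person` object as JSON-LD. The person, employer, and profile URLs are placeholders, and the choice of properties (`worksFor`, `sameAs`, `knowsAbout`) is one reasonable selection, not a list Google has confirmed it uses.

```python
import json


def author_jsonld(name, job_title, employer, profile_urls, topics):
    """Build a schema.org Person description as a plain dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "jobTitle": job_title,
        "worksFor": {"@type": "Organization", "name": employer},
        # sameAs links the author to other profiles of the same person.
        "sameAs": list(profile_urls),
        # knowsAbout lists the topics (entities) the author is linked to.
        "knowsAbout": list(topics),
    }


# Placeholder values for illustration only.
author = author_jsonld(
    name="Jane Doe",
    job_title="SEO consultant",
    employer="Example Agency",
    profile_urls=["https://www.example.com/jane-doe"],
    topics=["Semantic SEO", "Knowledge Graph"],
)

# The JSON body to place inside a <script type="application/ld+json"> tag.
json_body = json.dumps(author, indent=2)
```

Each property is effectively a triple about the author (Jane Doe, worksFor, Example Agency), so the more relationships you declare, the less the search engine has to guess.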

Finding Entities Linked to the Google Knowledge Graph

Talking about entities linked to your topic is the basic principle of semantic SEO. To do this, you need to know the entities that Google knows and that gravitate around your topic.

Three main techniques are available to you for this.

The first is to use Wikipedia (preferably in English) and visit the related articles.

Predicting Knowledge Graph entities by looking at Wikipedia's related articles

The second is to use Google’s Knowledge Graph Search API, which anyone can access by creating an API key in Google Cloud, in order to explore what the KG contains:

Using the Google Knowledge Graph API for entity SEO
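
A query to the Knowledge Graph Search API can be sketched with the Python standard library alone. The endpoint and the `query`, `key`, `limit`, and `languages` parameters are those of the public API; the helper names and the choice of fields to keep are assumptions for this example, and you need your own API key to actually run `search`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_url(query, api_key, limit=10, language="en"):
    """Assemble the request URL for an entity search."""
    params = {"query": query, "key": api_key,
              "limit": limit, "languages": language}
    return ENDPOINT + "?" + urlencode(params)


def extract_entities(payload):
    """Keep a few readable fields from the JSON-LD response."""
    rows = []
    for item in payload.get("itemListElement", []):
        result = item.get("result", {})
        rows.append({
            "id": result.get("@id"),
            "name": result.get("name"),
            "types": result.get("@type", []),
            "description": result.get("description"),
            "score": item.get("resultScore"),
        })
    return rows


def search(query, api_key, limit=10):
    """Call the API (network access and a valid key are required)."""
    with urlopen(build_url(query, api_key, limit)) as response:
        return extract_entities(json.load(response))
```

Calling `search("shoe", "YOUR_API_KEY")` returns one row per entity, which you can then sort by score or export to a spreadsheet.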

The third is to use an SEO tool such as Kalicube or Merkle, which extracts information from Google’s API in a more readable and easily manipulable format (exportable as an Excel file, for example):

Searching for a KG entity with the Merkle SEO tool

How to Use Google’s Entities to Optimise Your SEO?

Once you have found the most important entities for the pillar topic, go on to inspect what Google considers relevant for the sub-topics of those entities, in order to maximise what some call thematic authority.

To conclude this overview of the concept of thematic authority, look at how closely the search engine is linked to its KG:

The entities around the word shoe
Taking the word “shoe”, Google informs us about the most important entities for this query, and if we take a look, these are also the things it mentions in its semantic “bubbles” or even in its shopping tab (bottom left).

Knowledge Graph entities present in Google search results
Knowledge Graph entities are clearly present in the search engine’s semantic results. I therefore recommend using what it has defined as entities to create pages and semantic site structures. I then invite you to explore the entities related to each of those entities, and to go as deep as possible in this way.

FAQ

Google Knowledge Graph: What Is It Used For?

Google uses a Knowledge Graph, i.e. a vast knowledge base, in order to understand the true meaning of words and their relationships. This also helps it assess whether the information on a web page is factual and truthful, in order to limit misinformation.

How to Appear in the Google Knowledge Graph?

If you want to become an entity in Google’s Knowledge Graph, you will necessarily need to have a Wikipedia page or be associated with a Wikipedia page.

What Is the Usefulness of the Knowledge Graph for an SEO Professional?

The Knowledge Graph is an invaluable source for creating a good SEO strategy.