
Knowledge Based Trust (KBT): How Google Evaluates Web Source Trustworthiness

Knowledge Based Trust is a largely unknown algorithm that's deeply connected to E-E-A-T and semantic SEO. Discover how it works, how it differs from PageRank, and what it means for your SEO strategy.

Definition of Knowledge Based Trust: Evaluating the Trustworthiness of a Web Source

Knowledge Based Trust is a relatively unknown algorithm, yet it is closely linked to E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) and semantic SEO. It is therefore an important concept for SEO professionals to understand.

The paper “Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources” was published in February 2015. As the title says, it proposes estimating how reliable a web source is from the knowledge (the facts) it contains.

It was written by eight Google researchers, including first author Xin Luna Dong:
https://arxiv.org/pdf/1502.03519.pdf

Xin Luna Dong also gave an excellent seminar at Stanford on KBT and the Knowledge Vault:
https://www.youtube.com/watch?v=Z6tmDdrBnpU

What is Knowledge Based Trust Used For?

The primary goal of KBT is to combat misinformation, which is one of Google’s main challenges in displaying the best search results. The web is a mine of misinformation, and that’s a fact. The idea behind KBT is to assign each web source a “score (Accu)” so that its pages rank prominently only if the source is genuinely trustworthy and, crucially, does not spread misinformation.

Knowledge Based Trust score

The Trustworthiness of a Web Source

Google uses the term “web source” to refer to a specific web page or an entire website. I’ll use “web source” here to refer to both.

Isaac Watts: “Learning to trust is one of life’s most difficult tasks.”

KBT focuses primarily on verifying facts associated with an entity. For example, an entity might be “Paris.” In this example, if a web source claims that Paris is the capital of Luxembourg, Google understands that the source is sharing incorrect information.
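
To make the Paris example concrete, here is a minimal sketch of checking a claim against stored facts. The `KNOWN_FACTS` dictionary, the `capital_of` predicate name, and the `check_claim` helper are all hypothetical names I made up for illustration, and the sketch assumes each (subject, predicate) pair has a single correct value. It is not how Google implements the check.

```python
# Toy knowledge base: (subject, predicate) -> object. Illustrative data only.
KNOWN_FACTS = {
    ("Paris", "capital_of"): "France",
    ("Luxembourg City", "capital_of"): "Luxembourg",
}


def check_claim(subject, predicate, obj):
    """Return True or False when the stored facts can judge the claim, else None."""
    known = KNOWN_FACTS.get((subject, predicate))
    if known is None:
        return None  # nothing stored for this pair, so the claim cannot be judged
    return known == obj


print(check_claim("Paris", "capital_of", "Luxembourg"))  # False
print(check_claim("Paris", "capital_of", "France"))      # True
print(check_claim("Paris", "population", "2.1M"))        # None
```

The `None` case matters: a claim about something the knowledge base has never seen is neither confirmed nor contradicted by lookup alone.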

Knowledge Based Trust and the Knowledge Vault

Knowledge Based Trust relies on both the Knowledge Graph and the Knowledge Vault.

The Knowledge Vault is a massive database that allows Google to understand “entities” as real, well-defined things rather than simple character strings.

The Knowledge Vault is not just a simple knowledge base like the Knowledge Graph: it also automatically creates new entities through what’s called an inference engine.

Example of a Knowledge Graph (set of triples)

With such a powerful knowledge database available, Knowledge Based Trust can be applied. When a web source shares false information, KBT can know or predict that the information is false.

Knowledge Based Trust score in triple form

There are two cases. Either a triple already defined in the Knowledge Vault (subject, predicate, object) directly confirms or contradicts the claim, or the relationship hasn’t been defined, in which case the Knowledge Vault can predict whether the source’s claim is likely to be true.

Imagine the Knowledge Vault has a triple stating that Socrates is a human. So “Socrates” is defined as an entity and “Human” is also an entity, linked by the relationship “Socrates is Human.” If a web source claims that Socrates is mortal — since the entity Socrates is connected to the entity Human, the Knowledge Vault can predict that Socrates is mortal because he is human. That’s essentially what the Knowledge Vault does.
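
The Socrates example can be sketched as a tiny rule-based inference over triples. This is a deliberately deterministic simplification: the predicate names (`is_a`, `has_property`), the single hand-written rule, and the `infer` function are assumptions for illustration, whereas the Knowledge Vault is described as predicting facts probabilistically.

```python
# Known triples (subject, predicate, object). Toy data for illustration.
triples = {("Socrates", "is_a", "Human")}

# Hypothetical rule: anything that is_a Human also has the property Mortal.
rules = [(("is_a", "Human"), ("has_property", "Mortal"))]


def infer(triples, rules):
    """Apply each rule once and return the known plus the inferred triples."""
    inferred = set(triples)
    for (if_pred, if_obj), (then_pred, then_obj) in rules:
        for subj, pred, obj in triples:
            if pred == if_pred and obj == if_obj:
                inferred.add((subj, then_pred, then_obj))
    return inferred


facts = infer(triples, rules)
print(("Socrates", "has_property", "Mortal") in facts)  # True
```

A source claiming that Socrates is mortal would then be consistent with the inferred facts, even though that exact triple was never stored.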

Knowledge Based Trust is a probabilistic algorithm that relies on the Knowledge Vault (and related systems such as the Knowledge Graph and Freebase).
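
One way to picture a probabilistic trust score is as the average probability that the facts stated by a source are correct. The sketch below assumes those per-triple probabilities are already available. The `source_accuracy` function and the numbers are hypothetical, and the paper's actual model is considerably more involved.

```python
def source_accuracy(triple_probabilities):
    """Average probability that the triples a source provides are correct."""
    if not triple_probabilities:
        return None  # no extracted facts, so no score can be estimated
    return sum(triple_probabilities) / len(triple_probabilities)


# Hypothetical probabilities that each triple stated by a source is true.
reliable_site = [0.98, 0.95, 0.99, 0.90]
gossip_site = [0.40, 0.15, 0.80, 0.25]

print(round(source_accuracy(reliable_site), 3))  # 0.955
print(round(source_accuracy(gossip_site), 3))    # 0.4
```

A source with no extracted facts gets no score at all in this sketch, rather than a score of zero.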

KBT vs PageRank

Knowledge Based Trust vs PageRank

The chart shows KBT score on the x-axis and PageRank on the y-axis.

This graph comes directly from the KBT paper. The red signal shows that the two scores are almost orthogonal, meaning that PageRank and Knowledge Based Trust are largely uncorrelated: knowing one tells you little about the other.

But this graph also shows the advantage of KBT — it can identify credible sources even without high PageRank (shown in green).

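The authors manually reviewed a sample of 100 websites with a KBT score above 0.9 and judged 85 of them trustworthy. Only 20 of those 85 trustworthy sites have a PageRank greater than 0.5. This shows that KBT can identify sources with reliable data even if they are tail sources with low PageRank.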
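
To illustrate what it means for the two signals to disagree, here is a small sketch that places sources into quadrants of the KBT/PageRank plane. The site names and scores are invented, the `classify` function is hypothetical, and the 0.9 KBT cutoff is an assumption chosen for the sketch (the 0.5 PageRank cutoff mirrors the figure quoted above).

```python
# Hypothetical (kbt, pagerank) scores, both normalized to [0, 1]. Not real data.
sources = {
    "niche-reference.example": (0.95, 0.20),
    "major-encyclopedia.example": (0.93, 0.90),
    "celebrity-gossip.example": (0.30, 0.85),
    "spam-farm.example": (0.10, 0.05),
}


def classify(kbt, pagerank, kbt_min=0.9, pr_min=0.5):
    """Place a source in one of four quadrants of the KBT/PageRank plane."""
    trusted = kbt >= kbt_min
    popular = pagerank >= pr_min
    if trusted and popular:
        return "trusted and popular"
    if trusted:
        return "trusted but overlooked by links"
    if popular:
        return "popular but unreliable"
    return "neither"


for name, (kbt, pr) in sources.items():
    print(name, "->", classify(kbt, pr))
```

The "trusted but overlooked by links" quadrant is the interesting one: these are the sources a purely link-based signal would miss.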

Knowledge Based Trust is a metric that could potentially deliver better results for information retrieval — and therefore better rankings in Google search results — compared to PageRank alone.

PageRank is ultimately just an algorithm that ranks web pages based on authority and thus the perceived trustworthiness of a source through its inbound links. While hyperlinks are a fairly reliable indicator of a web source’s importance, PageRank fails to capture the true essence of trust — a website can be well-known and popular while still spreading misinformation, as is the case with gossip sites:

Impact of Knowledge Based Trust on gossip sites

Knowledge Based Trust is therefore an additional measure alongside PageRank to rank web sources that truly deserve it.

From the paper:

“Web search has traditionally been evaluated using exogenous signals such as hyperlinks and browsing history. However, these signals primarily reflect the popularity of a web page. For example, the listed gossip websites mostly have high PageRank but are generally not considered trustworthy. On the other hand, some less popular websites still contain very accurate information. In this paper, we address the fundamental question of estimating the trustworthiness of a given web source.”
“We discuss new research opportunities to improve it and use it in conjunction with existing signals like PageRank.”

“Exogenous” means something that comes from the outside, the opposite of “endogenous.” The Google researchers are explaining that their approach evaluates a site’s quality by its own internal factors rather than external ones. For example, PageRank is an “off-page” algorithm that scores your website through external signals such as links, while KBT evaluates your website based on the intrinsic quality of its content.

KBT and E-E-A-T

Understanding KBT clarified for me the distinction Google makes between trust and expertise in its E-E-A-T (experience, expertise, authoritativeness, trust) concept.

KBT verifies that a web source does not propagate misinformation, meaning that it is trustworthy.

However, this doesn’t necessarily make that source an expert or authoritative source.

Authority relates, among other things, to PageRank.

Expertise relates to the author of the web source (Google Author Rank, the Google Agent Rank patent, etc.).

Conclusion on Knowledge Based Trust

Much work has been done to evaluate whether a web source is high quality.

PageRank and authority-hub analysis (HITS) consider quality signals from link analysis.

EigenTrust and TrustMe consider behavior signals from a source within a Peer-to-Peer network.

TrustRank and AntiTrust detect web spam.

KBT is a knowledge-based reliability measure — its goal is to resolve conflicts from data provided by multiple sources and find truths that are consistent with the real world.
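
The conflict-resolution idea can be sketched as a loop that alternates between two estimates: how likely each claimed value is, and how accurate each source is. This is a simplified weighted-voting scheme under my own assumptions (the `fuse` function, the starting accuracy of 0.8, and the toy claims are all hypothetical). It is not the multi-layer model from the paper, which also separates extraction errors from source errors.

```python
from collections import defaultdict

# Toy claims: (source, data item, value). Illustrative data only.
claims = [
    ("site_a", "capital_of_France", "Paris"),
    ("site_b", "capital_of_France", "Paris"),
    ("site_c", "capital_of_France", "Lyon"),
    ("site_a", "capital_of_Luxembourg", "Luxembourg City"),
    ("site_b", "capital_of_Luxembourg", "Luxembourg City"),
    ("site_c", "capital_of_Luxembourg", "Paris"),
]


def fuse(claims, rounds=10, initial_accuracy=0.8):
    """Alternate between estimating value beliefs and source accuracies."""
    accuracy = {source: initial_accuracy for source, _, _ in claims}
    belief = {}
    for _ in range(rounds):
        # Step 1: each value's vote is the summed accuracy of the sources claiming it.
        votes = defaultdict(lambda: defaultdict(float))
        for source, item, value in claims:
            votes[item][value] += accuracy[source]
        belief = {
            item: {v: w / sum(values.values()) for v, w in values.items()}
            for item, values in votes.items()
        }
        # Step 2: a source's accuracy is the mean belief in the values it claimed.
        totals, counts = defaultdict(float), defaultdict(int)
        for source, item, value in claims:
            totals[source] += belief[item][value]
            counts[source] += 1
        accuracy = {s: totals[s] / counts[s] for s in totals}
    return belief, accuracy


belief, accuracy = fuse(claims)
best = {item: max(values, key=values.get) for item, values in belief.items()}
print(best)
print(accuracy)
```

On this toy data, the two sources that agree with each other end up with a higher accuracy than the one that contradicts them, and the majority values win. That also shows the weakness of pure voting between websites: if most sources repeat the same error, the error wins.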

The idea behind KBT had already been explored to measure the reliability of a web source at open-web scale, but previous fusion methods compared websites against one another without a true ontological system like the KG and KV. As a result, they couldn’t truly distinguish an unreliable source from a trustworthy one.

It’s also interesting to see how Xin Luna Dong has since distinguished between a knowledge graph and a product graph — after working at Amazon. She describes the challenges of building a product Knowledge Graph for an e-commerce site. Part of the answer seems to involve using structured data sources and entity resolution to find information for a product graph. She also highlights the use of semi-structured data, such as DOM extraction from web pages and leveraging information from Amazon product profiles — a goldmine of insights.

That work is here: All You Need to Know to Build a Product Knowledge Graph (KDD 2021 Tutorial): https://naixlee.github.io/Product_Knowledge_Graph_Tutorial_KDD2021/

If you want to learn more about KG, KV, KBT, information retrieval (IR) or the connection of these systems with artificial intelligence, I recommend visiting her website: http://lunadong.com