
Robots Meta Tag & X-Robots-Tag: The Complete Guide to Noindex

The robots.txt file, sitemap.xml and canonical tags are only suggestions to Google. The robots meta tag lets you control indexation precisely — page by page or in bulk. This guide covers all directives, WordPress setup (Yoast) and X-Robots-Tag for non-HTML resources.


The robots.txt file, like sitemap.xml and canonical tags, is only a suggestion to Google: if it decides, for one reason or another, to index your pages anyway, there is nothing you can do about it.

Well… almost nothing.

What is a meta robots tag?

The robots meta tag, also called the meta robots tag, is an HTML element placed in the <head> section of a page. It tells search engines how to index that page and is most often used to deindex it.

It looks like this:

<meta name="robots" content="noindex" />

Why is the meta robots tag important for SEO?

The meta robots tag is most often used to keep pages out of search results, that is, out of Google’s index, although it has other uses.

You might want to prevent search engines from indexing different types of content:

  • Pages with no SEO value and/or no value for the user
  • Staging pages (development environment)
  • Internal search engine pages
  • Landing pages dedicated solely to conversion optimisation for Google Ads (PPC) paid campaigns
  • Promotional or contest pages
  • Duplicate content (also use canonical tags to suggest the “original” URL)

This lets you shape what search engines index on your site and avoid wasting your crawl budget. Combined with technical SEO measures such as robots.txt and sitemaps, it helps your rankings.

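Note: Do not use the disallow directive in robots.txt for the page types listed above. Disallow only blocks crawling, so a disallowed URL can still be indexed if other pages link to it, and Google will never see a noindex tag on a page it cannot crawl. Use noindex instead.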

Implementing the no-index tag

Meta Robots tags are composed of two attributes: name and content. You must specify values for each of these attributes. Let’s look at this in detail.

Meta name

The “name” attribute specifies which robot the directive applies to. These robot names are the same user-agents (UAs) used in robots.txt.
The value that targets all robots is:

<meta name="robots" content="noindex" />

To target Googlebot specifically:

<meta name="googlebot" content="noindex" />

Note: The different UAs are: Googlebot / Googlebot-Image / Bingbot / Slurp (Yahoo) / Baiduspider / DuckDuckBot
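To audit which robots a page addresses, the name and content attributes can be read programmatically. The following is a minimal sketch using only Python's standard library; the class name RobotsMetaParser, the function robots_directives and the KNOWN_AGENTS shortlist are illustrative assumptions, not part of any SEO tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects <meta name="..." content="..."> tags aimed at crawlers."""

    # Hypothetical shortlist of crawler names, taken from the note above.
    KNOWN_AGENTS = {"robots", "googlebot", "googlebot-image", "bingbot",
                    "slurp", "baiduspider", "duckduckbot"}

    def __init__(self):
        super().__init__()
        self.directives = {}  # crawler name -> set of directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        if name not in self.KNOWN_AGENTS:
            return  # e.g. <meta name="description">, not a robots tag
        content = attributes.get("content") or ""
        values = {v.strip().lower() for v in content.split(",") if v.strip()}
        self.directives.setdefault(name, set()).update(values)


def robots_directives(html):
    """Return a dict mapping each crawler name to its directives."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

For a page whose <head> contains a robots tag with "noindex, nofollow" and a googlebot tag with "noarchive", the function returns one entry per crawler name. The simple comma split is a simplification: it would break an unavailable_after date that itself contains a comma.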

The different content values

  • <meta name="robots" content="all" /> – Default value: no restriction on indexing
  • <meta name="robots" content="noindex" /> – Deindexes the page
  • <meta name="robots" content="nofollow" /> – Tells robots not to follow the links on the page
  • <meta name="robots" content="none" /> – Shorthand for noindex + nofollow (redundant with them)
  • <meta name="robots" content="noarchive" /> – Prevents Google from showing a cached copy of the page in the SERP (useful if you want to modify the content)
  • <meta name="robots" content="notranslate" /> – Prevents Google from offering a translation of the page in search results
  • <meta name="robots" content="noimageindex" /> – Prevents the images on the page from being indexed
  • <meta name="robots" content="unavailable_after: Sunday, 01-Sep-19 12:34:56 GMT" /> – Time-delayed noindex: the page stops appearing in search results after the given date
  • <meta name="robots" content="nosnippet" /> – Prevents a text snippet or video preview from being displayed for the page.
    To exclude only part of a page, apply “data-nosnippet” to a div, span, or section. It is a boolean attribute, meaning it is valid with or without a value, so <div data-nosnippet>this will not appear in a snippet</div> is equivalent to <div data-nosnippet="true">neither will this</div>
  • <meta name="robots" content="max-snippet:-1, max-image-preview:large, max-video-preview:-1" /> – Sets the maximum size of text snippets, image previews and video previews. The values shown here (-1 and large) mean no limit and are included automatically if you use Yoast SEO. If, for copyright reasons, you want to restrict previews, use lower values such as max-snippet:0 or max-image-preview:none.
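Because “all” and “none” are shorthands, it can help to normalise a content value before reasoning about it. Here is a small sketch, assuming simple comma-separated values; the function names expand_directives and is_indexable are hypothetical, and values such as an unavailable_after date containing a comma are deliberately not handled.

```python
def expand_directives(content):
    """Turn a robots content value into a set of explicit directives."""
    directives = set()
    for raw in content.split(","):
        value = raw.strip().lower()
        if not value or value == "all":
            continue  # "all" is the default and adds no restriction
        if value == "none":
            # "none" is shorthand for noindex + nofollow
            directives.update({"noindex", "nofollow"})
        else:
            directives.add(value)
    return directives


def is_indexable(content):
    """True unless the content value asks for the page to be deindexed."""
    return "noindex" not in expand_directives(content)
```

With this sketch, is_indexable("nofollow, noarchive") is true, while is_indexable("none") is false because “none” expands to include noindex.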

Directive support

Not all directives are recognised by search engines other than Google, so here is a summary table:

[Table: Google and Bing support for the directives all, noindex, nofollow, none, noarchive, nosnippet, notranslate, noimageindex and unavailable_after]

Note: other directives exist for other search engines. For example, “noyaca” tells Yandex not to use its own directory to generate search result snippets.

Noindex on WordPress

If you use a CMS like WordPress, it will be more complex to modify your <head> directly. The procedure with WordPress and Yoast SEO is as follows:

Select your page → Edit → Go to Yoast settings → Advanced → Do not allow search engines to show this page in search results:

Adding noindex on WordPress using Yoast SEO

The “Advanced meta robots” line gives you the option to implement other directives such as noimageindex.

If you want a time-delayed noindex (unavailable_after), you will need to add a few lines of PHP to your child theme, from the WordPress admin panel, so that the tag is output in your <head>.

You can also deindex groups of pages by going to: Yoast → SEO Settings:

Deindexing groups of pages on WordPress using Yoast SEO

If you select “disabled” here, Yoast will add noindex tags to your posts and also remove their URLs from sitemap.xml.

What is X-Robots-Tag?

Meta robots noindex tags are ideal for deindexing pages structured in HTML. However, if there are PDFs you want to deindex, use X-Robots-Tag instead.

X-Robots-Tag is an HTTP response header. On an Apache server, it is set with the following directive:

Header set X-Robots-Tag "noindex"

How to implement the X-Robots-Tag HTTP header?

Place this directive in the .htaccess file at the root of your site (or in httpd.conf), just as you would for redirect rules. Apache's mod_headers module must be enabled for the Header directive to work.

For example, to deindex all PDFs:

<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex"
</Files>

If you need to deindex a subdomain, a subdirectory, or anything else requiring a bulk modification, use X-Robots-Tag headers.

Note: If you are unsure whether resources have a noindex HTTP header, you can use a browser extension such as “Live HTTP Headers”.
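The same check can be scripted. The sketch below uses Python's standard library to read the X-Robots-Tag values of a response and decide whether they ask a given crawler not to index the resource. The function names, the default crawler "googlebot" and the VALUED_DIRECTIVES shortlist are assumptions made for illustration, and the parsing is deliberately simplified.

```python
import urllib.request

# Directives that carry a value after a colon, so "name: value" is not a crawler prefix.
VALUED_DIRECTIVES = {"unavailable_after", "max-snippet",
                     "max-image-preview", "max-video-preview"}


def fetch_x_robots_tag(url):
    """Send a HEAD request and return every X-Robots-Tag value of the response."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers.get_all("X-Robots-Tag") or []


def header_has_noindex(header_values, agent="googlebot"):
    """True if any value asks `agent` (or all robots) not to index the resource."""
    for value in header_values:
        prefix, sep, rest = value.partition(":")
        target = prefix.strip().lower()
        # "googlebot: noindex" addresses a single crawler; "noindex" addresses all.
        is_agent_prefix = bool(sep) and "," not in prefix and target not in VALUED_DIRECTIVES
        if is_agent_prefix:
            if target != agent.lower():
                continue  # rule is meant for another crawler
            value = rest
        directives = {d.strip().lower() for d in value.split(",")}
        if "noindex" in directives or "none" in directives:
            return True
    return False
```

A typical use would be header_has_noindex(fetch_x_robots_tag(url)) for the URL of a PDF you expect to be deindexed; the URL itself is whatever resource you want to verify.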

A reminder of the basic rules

Here are some common mistakes to avoid:

  • Adding noindex directives to pages that are disallowed in robots.txt (if Google cannot crawl the page, it cannot see that you do not want it indexed)
  • Poor sitemap management (the ideal approach is to remove the page from the sitemap after Google has understood it should be deindexed)
  • Forgetting to remove the noindex directives used on staging when the site goes to production.
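The first mistake in this list can be caught before it causes trouble. As a rough sketch, Python's standard urllib.robotparser can tell you whether a URL that relies on noindex is blocked by robots.txt; the function name noindex_is_hidden, the default crawler name and the example rules are assumptions for illustration. Note that this parser follows the original robots.txt convention and may not reproduce every Google-specific matching rule.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_hidden(robots_txt, url, agent="Googlebot"):
    """True when robots.txt blocks `url`, so a noindex on that page can never be read."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Hypothetical rules: internal search pages are disallowed for every crawler.
EXAMPLE_RULES = "User-agent: *\nDisallow: /search/\n"

# A noindex tag on this page would be invisible to crawlers.
blocked = noindex_is_hidden(EXAMPLE_RULES, "https://example.com/search/?q=shoes")

# This page is crawlable, so its noindex tag can be seen and applied.
visible = not noindex_is_hidden(EXAMPLE_RULES, "https://example.com/contact/")
```

If the function returns true for a page carrying noindex, remove the disallow rule first and let the page be crawled so the directive can take effect.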