
Regular Expressions (RegEx) for SEO

Regular expressions (regex) are a powerful tool for SEO data analysis. Discover how regex works, the key syntax, and how to use it in Google Search Console, Screaming Frog and more.


RegEx is not always as complicated as it first appears. See for yourself: ([0-9]+(\.[0-9]*)?) Crystal clear, right?

More seriously, a regex can look like a cat walked across the keyboard, and it can be very hard to read at first. In reality, it takes only a little practice to start using a few regular expressions in your workflow.

For example, to quickly identify pages or queries that match the commercial search intent of your site, use this regex in Google Search Console: .*(best|top|vs|reviews?).*

Your client thinks they have been hacked, or suspects content injection? Use the following regex: .*viagra.*|.*cialis.*|.*levitra.*|.*drugs.*|.*porn.*|.*www.*www.*

Rather simple to understand, isn’t it?

Wait — what is RegEx? RegEx, short for Regular Expressions, is a way of matching strings (essentially pieces of text). You create an expression that is a combination of characters and metacharacters, and a string is then compared against it.

So in the previous example, each query or page URL is compared against your regular expression: if it matches, it is included in the report; if it does not, it is rejected. Google Search Console therefore shows you only the pages or queries that contain those words, or nothing at all when none match.
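The match-or-reject behaviour described above can be sketched in a few lines of Python. This is only an illustration: the query list is hypothetical sample data, and the pattern is the commercial-intent regex from earlier, written with "reviews?" so that both "review" and "reviews" match.

```python
import re

# Hypothetical sample of search queries, standing in for a Search Console export.
queries = [
    "best running shoes",
    "running shoes review",
    "nike vs adidas",
    "how to tie running shoes",
    "marathon training plan",
]

# Commercial-intent pattern: matches if any of the listed words appears in the query.
pattern = re.compile(r".*(best|top|vs|reviews?).*")

# A query is kept when the pattern matches, and rejected otherwise.
matching = [q for q in queries if pattern.search(q)]
rejected = [q for q in queries if not pattern.search(q)]

print("Included in the report:", matching)
print("Rejected:", rejected)
```

Each tool runs its own regex engine, so treat this as a way to experiment with a pattern before pasting it into a filter, not as a reproduction of what Google Search Console does internally.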

RegEx has many uses beyond Google Search Console. In Screaming Frog, for example, it is often indispensable for extracting data efficiently.

The Different RegEx Expressions

Here are the different characters you can use:

Wildcards

Syntax  Meaning
.       Matches any single character
*       Matches the preceding character or group 0 or more times
?       Matches the preceding character or group 0 or 1 time
+       Matches the preceding character or group 1 or more times
|       OR: matches either the expression before it or the one after it
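To see the characters from the table above in action, here is a small Python sketch. The patterns and test strings are made up for illustration, and the helper name matches is not part of any library.

```python
import re

def matches(pattern, text):
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

examples = [
    (r"s.o", "seo"),          # "." stands for any single character
    (r"go*gle", "ggle"),      # "*" allows zero occurrences of "o"
    (r"reviews?", "review"),  # "?" makes the final "s" optional
    (r"go+gle", "ggle"),      # "+" requires at least one "o"
    (r"seo|sea", "sea"),      # "|" accepts either alternative
]

results = [matches(pattern, text) for pattern, text in examples]
for (pattern, text), ok in zip(examples, results):
    print(f"{pattern!r} vs {text!r}: {ok}")
```

Only the fourth example fails to match, because "+" needs at least one "o" between the two "g" characters.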

Anchors

Syntax  Meaning
^       Start of the string: the pattern that follows must appear at the very beginning
$       End of the string: the pattern that precedes must appear at the very end
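A short Python sketch of how the two anchors change what gets matched. The URL paths are hypothetical examples.

```python
import re

# Hypothetical URL paths from a crawl.
paths = ["/blog/regex-seo", "/shop/blog", "/blog", "/about/"]

# "^" pins the pattern to the start of the string.
starts_with_blog = [p for p in paths if re.search(r"^/blog", p)]

# "$" pins the pattern to the end of the string.
ends_with_blog = [p for p in paths if re.search(r"/blog$", p)]

# Using both anchors requires the whole string to match.
exactly_blog = [p for p in paths if re.search(r"^/blog$", p)]

print(starts_with_blog)
print(ends_with_blog)
print(exactly_blog)
```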

Groups

Syntax  Meaning
( )     Groups the enclosed characters, which are matched in the exact order given
[ ]     Matches any single one of the enclosed characters
-       Inside [ ], matches any character in the specified range (for example [0-9] or [a-z])
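The difference between parentheses and square brackets is easier to see with an example. Below is a minimal Python sketch; the strings and the product URL are invented for illustration.

```python
import re

# ( ) groups characters in a fixed order: "ab" repeated as a unit.
group_matches_abab = re.fullmatch(r"(ab)+", "abab") is not None
group_matches_baab = re.fullmatch(r"(ab)+", "baab") is not None

# [ ] accepts any one of the listed characters at each position.
class_matches_baab = re.fullmatch(r"[ab]+", "baab") is not None

# A range inside [ ] covers every character between the two ends.
range_matches_year = re.fullmatch(r"[0-9]+", "2024") is not None

# Parentheses also capture what they matched, which is handy for extraction.
found = re.search(r"/product/([0-9]+)", "/product/4521/red-shoes")
product_id = found.group(1) if found else None

print(group_matches_abab, group_matches_baab, class_matches_baab, range_matches_year)
print(product_id)
```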

Escape

Syntax  Meaning
\       Treats the next character literally rather than as a metacharacter (for example, \. matches a real dot)
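Forgetting to escape a dot is probably the most common regex slip when filtering on domains. Here is a small Python sketch of the difference, using made-up strings; re.escape is a standard-library helper that adds the backslashes for you.

```python
import re

candidates = ["example.com", "exampleXcom"]

# Unescaped: "." matches any character, so both strings pass.
unescaped = [c for c in candidates if re.fullmatch(r"example.com", c)]

# Escaped: "\." matches only a literal dot.
escaped = [c for c in candidates if re.fullmatch(r"example\.com", c)]

# re.escape builds the escaped version of a literal string automatically.
auto_escaped = re.escape("example.com")

print(unescaped)
print(escaped)
print(auto_escaped)
```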

Why Use RegEx?

You first need to learn how operators work, but once you do, RegEx is extremely useful for SEO professionals, and even more so for those who manage large sites.

So if you are working for a client, you can quickly filter using RegEx for pages that focus on particular search intents, filter by country, see single-word queries (short-tail keywords), and… anything you want, really.

There are many tools that allow filtering with RegEx, such as Ahrefs, SEMrush, crawlers like Screaming Frog and Oncrawl, as well as other tools like Google Search Console and Google Analytics.

If RegEx seems hard to get to grips with at first, start with very simple expressions and gradually add complexity. It comes with practice: the more you use it, the more intuitive it becomes and the more powerful your SEO analyses will be.

For example, in the context of a site audit with Screaming Frog, you can use RegEx to extract all title tags that contain a specific keyword, or to identify all URLs that follow a particular pattern. This automates tasks that would otherwise take hours of manual work in a spreadsheet.
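To give an idea of what these two audit tasks look like outside a crawler, here is a rough Python sketch. Everything in it is an assumption made for illustration: the pages dictionary stands in for crawl output, the example.com URLs are placeholders, and the patterns are not taken from Screaming Frog.

```python
import re

# Hypothetical crawl output: URL -> raw HTML of the page.
pages = {
    "https://example.com/blog/regex-guide":
        "<html><head><title>Regex Guide for SEO</title></head></html>",
    "https://example.com/blog/link-building":
        "<html><head><title>Link Building Basics</title></head></html>",
    "https://example.com/product/123":
        "<html><head><title>SEO Toolkit</title></head></html>",
}

# Capture the text between <title> and </title>.
title_re = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
# Keyword to look for inside the title, whatever its case.
keyword_re = re.compile(r"seo", re.IGNORECASE)
# URL pattern: anything directly under /blog/ made of lowercase letters, digits and hyphens.
blog_url_re = re.compile(r"^https://example\.com/blog/[a-z0-9-]+$")

titles_with_keyword = []
for url, html in pages.items():
    found = title_re.search(html)
    if found and keyword_re.search(found.group(1)):
        titles_with_keyword.append(found.group(1).strip())

blog_urls = [url for url in pages if blog_url_re.search(url)]

print(titles_with_keyword)
print(blog_urls)
```

Regex works well for quick extractions like this one, but it can break on unusual markup, so a real HTML parser is the safer choice for anything more complex than a single tag.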

RegEx is also extremely useful in Google Analytics to create segments, filters, or views that isolate specific traffic. For instance, you can exclude internal traffic, isolate mobile traffic, or filter visits from a specific country.

Practical RegEx Examples for SEO

Here are some practical examples to use for SEO. You can directly use them in Google Search Console, Screaming Frog, or Ahrefs.

Spam / hacking detection: .*viagra.*|.*cialis.*|.*levitra.*|.*drugs.*|.*porn.*|.*www.*www.*

Commercial search intent: .*(best|top|vs|reviews?).*

Informational search intent: .*(how|what|why|guide|tutorial).*

Short-tail queries (1 word): ^[^\s]+$

Queries containing a specific domain: .*createur2site\.fr.*
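The patterns above can also be combined to label a list of queries in one pass. The Python sketch below is one possible way to do it, with a few assumptions: the function name label_query and the sample queries are invented, the single-word check uses \s (whitespace), "reviews?" is used so that "review" and "reviews" both match, and the \b word boundaries are an addition so that "top" does not match inside a word such as "laptop".

```python
import re

# Patterns adapted from the list above; the leading and trailing ".*" are
# dropped because re.search already looks for a match anywhere in the string.
PATTERNS = {
    "spam": re.compile(r"viagra|cialis|levitra|drugs|porn|www.*www"),
    "commercial": re.compile(r"\b(best|top|vs|reviews?)\b"),
    "informational": re.compile(r"\b(how|what|why|guide|tutorial)\b"),
    "short_tail": re.compile(r"^[^\s]+$"),
}

def label_query(query):
    """Return the name of every pattern that matches the lowercased query."""
    text = query.lower()
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]

# Hypothetical queries, standing in for a keyword export.
for query in ["Best SEO tools", "how to write a title tag", "backlinks", "laptop stand"]:
    print(query, "->", label_query(query))
```

A query can receive several labels, or none at all, which is often what you want when exploring a keyword export rather than forcing every query into a single bucket.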

Note: ["']item["']: *\{["']@id["']: *["'].*?["'], *["']name["']: *["'](.*?)["'] extracts the name of each item declared in a page's JSON-LD schema markup (breadcrumb entries, for example), and the page does not have to be on your own site 😉
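To show what that extraction pattern captures, here is a hedged Python sketch run against an invented JSON-LD breadcrumb snippet. The pattern is written with straight quotes and an escaped opening brace; the HTML and the example.com URLs are assumptions made for the demonstration.

```python
import re

# Hypothetical page fragment containing JSON-LD breadcrumb markup.
html = '''<script type="application/ld+json">
{"@type": "BreadcrumbList", "itemListElement": [
 {"@type": "ListItem", "position": 1, "item": {"@id": "https://example.com/", "name": "Home"}},
 {"@type": "ListItem", "position": 2, "item": {"@id": "https://example.com/blog/", "name": "Blog"}}
]}
</script>'''

# ["'] accepts either quote style; the only capturing group is the value of "name".
item_name_re = re.compile(
    r'''["']item["']: *\{["']@id["']: *["'].*?["'], *["']name["']: *["'](.*?)["']'''
)

# findall returns the captured group for every match on the page.
names = item_name_re.findall(html)
print(names)
```

The pattern depends on "@id" coming directly before "name" with this exact spacing, so it is best suited to quick checks. When you control the script, parsing the JSON itself is more reliable.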

Resources to Consult

As I am not expert enough on this subject and do not wish to write multiple articles about RegEx, here are plenty of sites that can help you.

Google explains RegEx to you:
https://support.google.com/analytics/answer/1034324?hl=en

Plenty of ready-made RegEx tables to use in your Google Search Console:
https://www.jcchouinard.com/regex-in-google-search-console/

Plenty of tables and RegEx explanations for Screaming Frog:
https://uproer.com/articles/screaming-frog-custom-extraction-xpath-regex/

And finally, here is an excellent tool to create and test your regex:
https://regex101.com/