Regular Expressions in Grep (Regex)

Introduction

Before we discuss regular expressions in grep, let's briefly look at what grep is.

For text processing, grep is one of the most helpful and powerful programs in Linux. grep looks for lines that match a regular expression in one or more input files and outputs each matching line to standard output.

A regular expression in Grep (Regex) is a pattern used to search and match specific text within a given input. It allows for advanced search capabilities by defining complex search patterns using a combination of characters, meta characters, and quantifiers.

In this tutorial, you will use regular expressions in the GNU version of grep, which comes installed by default on most Linux systems. We will also address a few FAQs on Regular Expressions in Grep (Regex).

Grep Regular Expression

A regex, or regular expression, is a pattern that matches a set of strings. A pattern is built from operators, literal characters, and meta-characters, which have a special meaning. GNU grep supports three regular expression syntaxes: Basic, Extended, and Perl-compatible.

When no regular expression type is specified, grep interprets search patterns as basic regular expressions. Use the -E (or --extended-regexp) option to interpret the pattern as an extended regular expression.

In GNU's implementation of grep, there is no functional difference between the basic and extended regular expression syntaxes. The only distinction is that in basic regular expressions the meta-characters ?, +, {, |, (, and ) are treated as literal characters. To keep their special meanings when using basic regular expressions, these characters must be escaped with a backslash (\).

Later, we'll go into the meanings of these and other meta-characters.

To prevent the shell from interpreting and expanding the meta-characters, you should always enclose the regular expression in single quotes.
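The difference between the two syntaxes can be sketched with a few sample lines. The input below is made up for illustration, the variable names are arbitrary, and the commands assume GNU grep:

```shell
# Hypothetical sample input: three short lines.
sample=$(printf '%s\n' 'ac' 'abc' 'a+c')

# Basic syntax: an unescaped + is a literal plus sign.
bre_literal=$(printf '%s\n' "$sample" | grep 'a+c')

# Basic syntax: \+ acts as the "one or more" quantifier in GNU grep.
bre_escaped=$(printf '%s\n' "$sample" | grep 'ab\+c')

# Extended syntax: + is a quantifier without a backslash.
ere_plain=$(printf '%s\n' "$sample" | grep -E 'ab+c')

printf '%s\n' "basic a+c: $bre_literal" \
              "basic ab\\+c: $bre_escaped" \
              "extended ab+c: $ere_plain"
```

With this input, the first command should report only the line "a+c", while the other two should report only "abc". Every pattern is wrapped in single quotes so the shell passes it to grep unchanged.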

Literal Matches

The grep command's most basic use is to search a file for a literal character or series of characters. For example, to see all the entries in the /etc/passwd file that contain the string "bash", use the following command:

grep bash /etc/passwd

You will get an output like below:

root:x:0:0:root:/root:/bin/bash
vega:x:1000:1000:vega:/home/vega:/bin/bash

In our example, the string "bash" is a basic regular expression that consists of four literal characters. This tells grep to look for a string in which "b" is immediately followed by "a", "s", and "h".

The grep command is case-sensitive by default, which means that uppercase and lowercase characters are treated as distinct.

Use the -i option (or --ignore-case) to ignore case while searching.

It's worth noting that grep searches for the pattern as a string rather than as a word. If you search for "gnu", grep will also report lines where "gnu" is embedded in larger words, such as "cygnus" or "magnum".

If the search string contains spaces, you must wrap it in single or double quotes.

grep "Gnome Display Manager" /etc/passwd
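As a small sketch of the substring and case behavior described above, the following uses made-up input lines instead of a real file; the variable names are only for illustration:

```shell
# Hypothetical sample lines (not taken from any real file).
sample=$(printf '%s\n' 'gnu' 'GNU tools' 'cygnus' 'magnum' 'linux')

# Case-sensitive substring search: "gnu" is also found inside larger words.
sensitive=$(printf '%s\n' "$sample" | grep 'gnu')

# -i ignores case, so the line "GNU tools" is reported as well.
insensitive=$(printf '%s\n' "$sample" | grep -i 'gnu')

printf 'case-sensitive:\n%s\n\n' "$sensitive"
printf 'case-insensitive:\n%s\n' "$insensitive"
```

The case-sensitive search should list "gnu", "cygnus", and "magnum"; adding -i should add "GNU tools" to that list.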

Anchoring

Anchors are meta-characters that allow you to indicate where the match should be found in the line.

The ^ (caret) sign matches the empty string at the start of a line. In the following example, the string "linux" will only match if it appears at the very beginning of a line.

grep '^linux' file.txt

The $ (dollar) sign matches the empty string at the end of a line. To find lines that end with the string "linux", use the following command:

grep 'linux$' file.txt

Both anchors can be combined in one regular expression. For example, to find lines that contain only the word "linux", type:

grep '^linux$' file.txt

The ^$ pattern, which matches all empty lines, is another useful example.
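The anchors can be tried out on a handful of invented lines. This is only a sketch: the sample text and variable names are assumptions, and -c (count matching lines) is used just to make the empty-line result visible:

```shell
# Hypothetical sample input, including one empty line.
sample=$(printf '%s\n' 'linux' 'linux is free' 'I use linux' '' 'unix')

starts=$(printf '%s\n' "$sample" | grep '^linux')       # lines beginning with "linux"
ends=$(printf '%s\n' "$sample" | grep 'linux$')         # lines ending with "linux"
exact=$(printf '%s\n' "$sample" | grep '^linux$')       # lines that are exactly "linux"
empty_count=$(printf '%s\n' "$sample" | grep -c '^$')   # number of empty lines

printf 'starts with linux:\n%s\n\n' "$starts"
printf 'ends with linux:\n%s\n\n' "$ends"
printf 'exactly linux: %s\n' "$exact"
printf 'empty lines: %s\n' "$empty_count"
```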

Matching Single Character

The symbol . (period) is a meta-character that can be used to match any single character. For example, you could use the following pattern to match anything that starts with "kan," then has two characters, and ends with the string "roo":

grep 'kan..roo' file.txt
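A quick way to see that each period stands for exactly one character is to run the pattern against a few invented strings (the sample lines are assumptions made for this sketch):

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'kangaroo' 'kanroo' 'kanxroo' 'kan--roo')

# Each . must match exactly one character, so exactly two characters
# are required between "kan" and "roo".
dot_matches=$(printf '%s\n' "$sample" | grep 'kan..roo')

printf '%s\n' "$dot_matches"
```

Only "kangaroo" and "kan--roo" should be reported, because "kanroo" and "kanxroo" have fewer than two characters between "kan" and "roo".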

Bracket Expressions

Bracket expressions allow you to match any one character from a group of characters by enclosing them in brackets []. For example, to find the lines that contain "accept" or "accent", you can use the expression:

grep 'acce[np]t' file.txt

If the caret is the first character inside the brackets, the expression matches any single character that is not listed in the brackets. The following pattern matches any string that contains "co", followed by any single character other than "l", followed by "a". It matches "coca", "cobalt", and so on, but not "cola":

grep 'co[^l]a' file.txt

Instead of listing characters one by one, you can define a range of characters inside the brackets. A range expression is created by separating the first and last characters of the range with a hyphen. For instance, [a-e] is the same as [abcde], while [1-3] is the same as [123].

The following expression matches each line that begins with a capital letter:

grep '^[A-Z]' file.txt
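The three bracket patterns above can be checked against a small invented word list. The sample data and variable names are assumptions. The LC_ALL=C prefix is also an addition of this sketch: it pins the C locale so that [A-Z] means the ASCII uppercase letters, since range behavior can depend on the locale.

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'accept' 'accent' 'access' 'coca' 'cola' 'cobalt' 'Linux')

# One character out of a set: "n" or "p".
set_matches=$(printf '%s\n' "$sample" | grep 'acce[np]t')

# Negated set: any character except "l" between "co" and "a".
negated_matches=$(printf '%s\n' "$sample" | grep 'co[^l]a')

# Range: lines that begin with an ASCII uppercase letter (C locale pinned).
range_matches=$(printf '%s\n' "$sample" | LC_ALL=C grep '^[A-Z]')

printf 'acce[np]t:\n%s\n\n' "$set_matches"
printf 'co[^l]a:\n%s\n\n' "$negated_matches"
printf '^[A-Z]: %s\n' "$range_matches"
```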

grep also supports predefined classes of characters, which must themselves be placed inside a bracket expression, for example [[:digit:]]. The table below lists some of the most common character classes:

Character Class	Description
`[:alnum:]`	Alphanumeric characters.
`[:alpha:]`	Alphabetic characters.
`[:blank:]`	Space and tab.
`[:digit:]`	Digits.
`[:lower:]`	Lowercase letters.
`[:upper:]`	Uppercase letters.

Check out the grep manual for a complete list of all character classes.
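A short sketch of the classes in use follows. The input lines are invented, and the key point is the double brackets: the class name [:digit:] goes inside a bracket expression, giving [[:digit:]].

```shell
# Hypothetical sample input; the third line consists of three spaces.
sample=$(printf '%s\n' 'abc123' 'ABC' '   ' '42')

# Lines containing at least one uppercase letter.
has_upper=$(printf '%s\n' "$sample" | grep '[[:upper:]]')

# Lines that start with a digit.
starts_digit=$(printf '%s\n' "$sample" | grep '^[[:digit:]]')

# Count of lines made up only of spaces and tabs.
blank_count=$(printf '%s\n' "$sample" | grep -c '^[[:blank:]][[:blank:]]*$')

# Count of lines made up only of letters and digits.
alnum_count=$(printf '%s\n' "$sample" | grep -c '^[[:alnum:]][[:alnum:]]*$')

printf 'has uppercase: %s\n' "$has_upper"
printf 'starts with digit: %s\n' "$starts_digit"
printf 'blank-only lines: %s\n' "$blank_count"
printf 'alphanumeric-only lines: %s\n' "$alnum_count"
```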

Quantifiers

Quantifiers let you specify the number of times an item must appear in order for a match to occur. The quantifiers supported by GNU grep are listed in the table below:

Quantifier	Description
`*`	Match the preceding item zero or more times.
`?`	Match the preceding item zero or one time.
`+`	Match the preceding item one or more times.
`{n}`	Match the preceding item exactly n times.
`{n,}`	Match the preceding item at least n times.
`{,m}`	Match the preceding item at most m times.
`{n,m}`	Match the preceding item from n to m times.

The * (asterisk) character matches the preceding item zero or more times. The following will match "right", "sright", "ssright", and so on:

grep 's*right' file.txt

A more complex pattern is shown below, which matches all lines that begin with a capital letter and terminate with a period or comma. The .* regex is used to match any number of characters:

grep -E '^[A-Z].*[.,]$' file.txt

The ? (question mark) character makes the preceding item optional, so it can match at most once. The following will match both "bright" and "right". Because we're using basic regular expressions, the ? character is escaped with a backslash:

grep 'b\?right' file.txt

Here's the same regex written as an extended regular expression:

grep -E 'b?right' file.txt

The + (plus) character matches the preceding item one or more times. The following will match "sright" and "ssright", but not "right":

grep -E 's+right' file.txt
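The three quantifiers can be compared side by side on a few invented lines. This is a sketch with assumed sample data; the ? example is anchored with ^ and $ here so that the optional "b" is the only thing that varies:

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'right' 'sright' 'ssright' 'bright')

# s* allows zero "s" characters, so every line containing "right" matches.
star_count=$(printf '%s\n' "$sample" | grep -c 's*right')

# s+ needs at least one "s" directly before "right".
plus_matches=$(printf '%s\n' "$sample" | grep -E 's+right')

# b? allows zero or one "b"; the anchors restrict the match to the whole line.
optional_matches=$(printf '%s\n' "$sample" | grep -E '^b?right$')

printf 's*right matches %s lines\n' "$star_count"
printf 's+right:\n%s\n\n' "$plus_matches"
printf '^b?right$:\n%s\n' "$optional_matches"
```

Note that the unanchored pattern 's*right' reports all four lines, including "bright", because grep only needs the pattern to match somewhere in the line.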

You can use the brace characters {} to define an exact number, an upper or lower bound, or a range of occurrences that must be present for a match to occur.

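The following expression matches sequences of 3 to 9 digits. Because the pattern is not anchored, a line with a longer run of digits is also reported, since part of that run satisfies the pattern: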

grep -E '[[:digit:]]{3,9}' file.txt
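To illustrate how the brace quantifier behaves with and without anchors, here is a sketch on invented input. The sample lines are assumptions; the last command shows the same bounds in basic syntax, where the braces are escaped:

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' '12' '123' '1234567890' 'id 4567 ok')

# Unanchored: any line containing a run of at least 3 digits is reported.
any_run=$(printf '%s\n' "$sample" | grep -E '[[:digit:]]{3,9}')

# Anchored: the whole line must consist of 3 to 9 digits.
whole_line=$(printf '%s\n' "$sample" | grep -E '^[[:digit:]]{3,9}$')

# Same anchored pattern in basic syntax, with escaped braces.
whole_line_bre=$(printf '%s\n' "$sample" | grep '^[[:digit:]]\{3,9\}$')

printf 'unanchored:\n%s\n\n' "$any_run"
printf 'anchored (extended): %s\n' "$whole_line"
printf 'anchored (basic): %s\n' "$whole_line_bre"
```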

Alternation

Alternation is a simple "OR". The | (pipe) operator allows you to specify several possible matches, which can be literal strings or expression sets. Of all the regular expression operators, this one has the lowest precedence.

In the example below, we're looking for all occurrences of the words fatal, error, and critical in the Nginx error log file:

grep 'fatal\|error\|critical' /var/log/nginx/error.log

When you use extended regular expressions, the | operator should not be escaped, as shown below:

grep -E 'fatal|error|critical' /var/log/nginx/error.log
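If no Nginx log is at hand, the two forms can be compared on a few invented log-like lines. The sample text below is hypothetical and does not reflect a real Nginx log format:

```shell
# Hypothetical log-like lines, made up for illustration.
sample=$(printf '%s\n' \
  '[error] connection refused' \
  '[notice] worker started' \
  'fatal: cannot bind to port' \
  'critical failure in module' \
  '[info] ok')

# Basic syntax: the pipe is escaped.
bre_matches=$(printf '%s\n' "$sample" | grep 'fatal\|error\|critical')

# Extended syntax: the pipe is written as-is.
ere_matches=$(printf '%s\n' "$sample" | grep -E 'fatal|error|critical')

printf '%s\n' "$ere_matches"
```

Both commands should report the same three lines: the ones containing "error", "fatal", and "critical".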

Grouping

Grouping is a feature of regular expressions that allows you to group patterns together and refer to them as a single item. Groups are created with parentheses ().

When using basic regular expressions, the parentheses must be escaped with a backslash: \( and \).

The following example matches both "fearless" and "less". The ? quantifier makes the (fear) group optional:

grep -E '(fear)?less' file.txt
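The effect of the group is easier to see when compared with the ungrouped pattern. The following sketch uses invented input and adds ^ and $ anchors (not part of the example above) so that only whole-line matches are reported:

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'fearless' 'less' 'fear' 'careless')

# Extended syntax: ? applies to the whole group "fear".
grouped_ere=$(printf '%s\n' "$sample" | grep -E '^(fear)?less$')

# Basic syntax: the parentheses and the ? are escaped.
grouped_bre=$(printf '%s\n' "$sample" | grep '^\(fear\)\?less$')

# Without the group, ? applies only to the "r", so "less" no longer matches.
ungrouped=$(printf '%s\n' "$sample" | grep -E '^fear?less$')

printf 'grouped:\n%s\n\n' "$grouped_ere"
printf 'ungrouped: %s\n' "$ungrouped"
```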

Special Backslash Expressions

Several meta-characters in GNU grep are made up of a backslash followed by a regular character. Some of the most common special backslash expressions are listed in the table below:

Expression	Description
`\b`	Match a word boundary.
`\<`	Match an empty string at the beginning of a word.
`\>`	Match an empty string at the end of a word.
`\w`	Match a word character (a letter, digit, or underscore).
`\s`	Match a whitespace character.

The pattern below will match "abject" and "object" only when they appear as separate words. If they are embedded in larger words, they will not match:

grep '\b[ao]bject\b' file.txt
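The word-boundary behavior can be checked on a few invented lines. This sketch assumes GNU grep, since these backslash expressions are GNU extensions; the sample data is hypothetical:

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'object' 'abject failure' 'objective' 'subject' 'an object here')

# \b on both sides: "object"/"abject" must stand alone as a word.
boundary=$(printf '%s\n' "$sample" | grep '\b[ao]bject\b')

# \< and \> express the same idea as start-of-word and end-of-word.
start_end=$(printf '%s\n' "$sample" | grep '\<[ao]bject\>')

# \w matches one word character; count lines made up only of word characters.
word_only_count=$(printf '%s\n' "$sample" | grep -c '^\w\w*$')

printf 'whole-word matches:\n%s\n\n' "$boundary"
printf 'lines of word characters only: %s\n' "$word_only_count"
```

"objective" and "subject" should not appear among the whole-word matches, because the match would be embedded in a larger word.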

FAQs on Regular Expressions in Grep (Regex)

How do I use regular expressions with Grep?

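To use regular expressions with Grep, pass the pattern as the first argument, ideally enclosed in single quotes so the shell does not interpret it. For example, grep 'pattern' file.txt searches for occurrences of "pattern" in "file.txt". Unlike sed or awk, grep does not use forward slashes as pattern delimiters, so a slash in the pattern is matched literally.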

What are metacharacters in regular expressions?

Metacharacters are special characters in regular expressions that have reserved meanings. Some commonly used metacharacters in Grep include ".", "*", "+", "^", "$", and "[ ]" (in basic regular expressions, "+" must be written as "\+"). They define positions, classes of characters, quantifiers, and more.

How do I use metacharacters in regular expressions?

To match a metacharacter literally, you need to escape it with a backslash ("\"). For example, to match a period (.), you would use "\." in your regular expression.
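As a small sketch of the difference escaping makes, the following runs an unescaped and an escaped period against invented input. The bracket form [.] is shown as an alternative way to make the period literal:

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'file.txt' 'filextxt' 'v1.0')

# Unescaped: . matches any character, so "filextxt" is reported too.
unescaped=$(printf '%s\n' "$sample" | grep 'file.txt')

# Escaped: \. matches only a literal period.
escaped=$(printf '%s\n' "$sample" | grep 'file\.txt')

# Inside a bracket expression the period is literal as well.
bracketed=$(printf '%s\n' "$sample" | grep 'file[.]txt')

printf 'unescaped:\n%s\n\n' "$unescaped"
printf 'escaped: %s\n' "$escaped"
```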

Can I search for multiple patterns using Grep?

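Yes, you can search for multiple patterns using Grep by separating them with the alternation operator. In basic regular expressions the pipe must be escaped, as in grep 'pattern1\|pattern2' file.txt; with the -E option it is written unescaped, as in grep -E 'pattern1|pattern2' file.txt. Both commands search for occurrences of either "pattern1" or "pattern2" in "file.txt".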

What are character classes in regular expressions?

Character classes allow you to specify a group or range of characters to match. They are represented using square brackets ([]) and can include individual characters or ranges. For example, [a-z] matches any lowercase letter.

How can I make a regular expression case-insensitive in Grep?

To make a regular expression case-insensitive in Grep, you can use the "-i" option. For example, grep -i 'pattern' file.txt ignores the case of characters while searching for "pattern" in "file.txt".

How do I use quantifiers in regular expressions?

Quantifiers specify the number of times a character or group of characters should appear in a match. Some commonly used quantifiers in Grep include "*", "+", "?", "{n}", "{n,}", and "{n,m}". For example, "a*" matches zero or more occurrences of the letter "a".

Conclusion

Text editors, programming languages, and command-line tools like grep, sed, and awk all use regular expressions. When searching text files, writing scripts, or filtering command output, knowing how to construct regular expressions can be extremely useful.

If you have any queries, please leave a comment below and we’ll be happy to respond to them.