Awk Command in Linux with Examples

Introduction

Before we discuss the awk command in Linux with examples, let's first understand what awk is.

awk is a text processing scripting language that can be used for a variety of tasks. It's primarily used for reporting and analysis.

Unlike other procedural programming languages, awk is data-driven, which means you specify a series of actions to be done on the incoming text. It takes the incoming data, alters it, and outputs the result to standard output.

This tutorial covers the fundamentals of the awk programming language. Knowing them will significantly improve your ability to manipulate text files on the command line. We will also address a few FAQs on the awk command.

How awk Works

awk has several implementations. We'll use gawk, which is the GNU implementation of awk. On many Linux systems, the awk command is simply a symlink to gawk.

Records and Fields

awk can read and write text files and streams. The input data is organized into records and fields. awk operates on one record at a time until it reaches the end of the input. Records are separated by a character called the record separator. The default record separator is the newline character, which means that each line in the text data is a record. A different record separator can be set using the RS variable.

The field separator separates the fields that make up a record. Fields are separated by whitespace by default, which includes one or more tab, space, and newline characters.

Each record's fields are referenced by a dollar sign ($) followed by the field number, starting with 1. The first field is $1, the second is $2, and so on. The built-in variable NF holds the number of fields in the record, so $NF refers to the last field. $0 refers to the entire record.

The following diagram shows how records and fields map onto two lines of input:

tmpfs      788M  1.8M  786M   1% /run/lock 
/dev/sda1  234G  191G   31G  87% /
|-------|  |--|  |--|   |--| |-| |--------| 
   $1       $2    $3     $4   $5  $6 ($NF) --> fields
|-----------------------------------------| 
                    $0                     --> record
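As a minimal sketch of the field references described above, the following commands write the two lines from the diagram to a file and print the field count, the first field, and the last field of each record. The file name disks.txt is only an assumption made for this illustration.

```shell
# Write the two sample lines from the diagram to a scratch file
# (disks.txt is a hypothetical name used only for this sketch).
printf '%s\n' 'tmpfs 788M 1.8M 786M 1% /run/lock' '/dev/sda1 234G 191G 31G 87% /' > disks.txt

# NF is the number of fields, $1 the first field, $NF the last field.
awk '{ print NF, $1, $NF }' disks.txt
```

Each record has six fields, so $NF and $6 refer to the same value here.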

Awk Program

To process text with awk, you write a program that tells the command what to do. The program consists of a set of rules and user-defined functions. Each rule contains one pattern and one action. Rules are separated by newlines or semicolons (;). An awk program typically looks like this:

pattern { action }
pattern { action }
...

If the pattern matches the record, awk takes the specified action on that record while processing data. When there is no pattern in the rule, it matches all records (lines).

The statements of an awk action are enclosed in braces ({}). Each statement specifies an operation to perform. An action can contain multiple statements, separated by newlines or semicolons (;). If a rule has no action, awk prints the entire record by default.

awk can handle a variety of statements, such as expressions, conditionals, input, output, and more. The following are the most common awk statements:

  • exit - Stops the entire program from running and exits.
  • next - Stops the current record from being processed and moves on to the next record in the input data.
  • print - Print custom text, fields, variables, and records.
  • printf - Like C and bash printf, gives you more control over the output format.

When writing awk programs, everything from the hash mark (#) to the end of the line is treated as a comment. Long lines can be split across multiple lines using the continuation character, backslash (\).
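The pieces described so far (several rules, comments, and the next, exit, and print statements) can be combined as in the sketch below. It assumes a sample file named teams.txt with five whitespace-separated columns; the rules themselves are made up purely for illustration.

```shell
# Create the sample data used by this sketch.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

awk '
# Skip the record that starts with "76ers" and move on to the next one.
/^76ers/ { next }
# Stop the whole program once the third field drops below 49.
$3 < 49 { exit }
# A rule without a pattern runs for every record that reaches it.
{ print $1; print "  wins:", $3 }
' teams.txt
```

The rules are evaluated in order for each record, so a record skipped by next never reaches the final rule.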

Executing awk Programs

There are various ways to run an awk program. If the program is short and basic, it can be run directly from the command line using the awk interpreter:

awk 'program' input-file...

When running the program from the command line, single quotes ('') should be used to prevent the shell from interpreting it.

If the program is long and complex, it's better to save it to a file and feed it to the awk command using the -f option:

awk -f program-file input-file...

The examples in the following sections use this sample file, teams.txt:

Bucks Milwaukee    60 22 0.732 
Raptors Toronto    58 24 0.707 
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585

Awk Patterns

In awk, patterns determine whether or not the related action should be executed.

awk supports several types of patterns: regular expressions, relational expressions, range patterns, and special expression patterns.

When a rule has no pattern, it matches every input record. Here's an example of a rule that contains only an action:

awk '{ print $3 }' teams.txt

The third field of each record will be printed by the program:

Output

60
58
51
49
48

Regular Expression Patterns

A regular expression, or regex, is a pattern that matches a set of strings. In awk, regular expression patterns are enclosed in slashes (//):

/regex pattern/ { action }

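The most basic example is a literal character or string match. For example, to display the first field of each record that contains "0.5", you would run the following command. Strictly speaking, the unescaped dot matches any single character; write /0\.5/ to match a literal dot: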

awk '/0.5/ { print $1 }' teams.txt
Output

Celtics
Pacers

The pattern can be any extended regular expression. This example prints the first field of each record that begins with two or more digits:

awk '/^[0-9][0-9]/ { print $1 }' teams.txt
Output

76ers

Relational Expressions Patterns

Relational expression patterns are generally used to match the content of a specific field or variable.

By default, regular expression patterns are matched against the whole record. To match a regex against a single field, specify the field and use the match operator (~) against the pattern.

For example, to display the first field of each record with "ia" in the second field, type:

awk '$2 ~ /ia/ { print $1 }' teams.txt
Output

76ers
Pacers

Use the !~ operator to match fields that don't contain a given pattern:

awk '$2 !~ /ia/ { print $1 }' teams.txt
Output

Bucks
Raptors
Celtics

You can also compare strings or numbers using relational operators such as greater than, less than, and equal. The following command prints the first field of all records whose third field is greater than 50:

awk '$3 > 50 { print $1 }' teams.txt
Output

Bucks
Raptors
76ers
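Relational patterns work on strings as well as numbers. The sketch below, which assumes the same teams.txt sample data, shows a string equality test followed by a numeric comparison on the fifth field.

```shell
# Create the sample data used by this sketch.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

# String comparison: print the city and ratio of the record whose first field is "Celtics".
awk '$1 == "Celtics" { print $2, $5 }' teams.txt

# Numeric comparison: print the teams whose fifth field is at least 0.7.
awk '$5 >= 0.7 { print $1 }' teams.txt
```

String constants must be enclosed in double quotes inside the awk program; an unquoted word would be treated as a variable name.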

Range patterns

Two patterns are separated by a comma in a range pattern:

pattern1, pattern2

A range pattern matches all records starting with a record that matches the first pattern and ending with the next record that matches the second pattern.

Here's an example of how to print the first field of all records, starting with the "Raptors" record and ending with the "Celtics" record:

awk '/Raptors/,/Celtics/ { print $1 }' teams.txt
Output

Raptors
76ers
Celtics

Relational expressions can also be used in range patterns. The following command prints all records starting with the one whose fourth field equals 31 and ending with the one whose fourth field equals 33:

awk '$4 == 31, $4 == 33 { print $0 }' teams.txt
Output

76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598

Range patterns cannot be combined with other pattern expressions.

Special Expression Patterns

awk has the following special patterns:

  • BEGIN - Used to execute actions before any records are processed.
  • END - Used to execute actions after all records have been processed.

The BEGIN pattern is commonly used to set variables, while the END pattern is typically used to process data gathered from the records, such as the result of a calculation.

"Start Processing." is printed first, followed by the third field of each record, and finally "End Processing." :

awk 'BEGIN { print "Start Processing." }; { print $3 }; END { print "End Processing." }' teams.txt
Output

Start Processing.
60
58
51
49
48
End Processing.

When a program has only a BEGIN pattern, its actions are executed and the input is not processed. When a program has only an END pattern, the input is processed before the rule's actions are performed.

The GNU version of awk also provides two more special patterns, BEGINFILE and ENDFILE, which let you perform actions at the start and end of each input file.

Combining Patterns

Using the logical AND (&&) and logical OR (||) operators in awk, you can combine two or more patterns.

Here's an example of how the && operator may be used to print the first field of records with a third field greater than 50 and a fourth field less than 30:

awk '$3 > 50 && $4 < 30 { print $1 }' teams.txt
Output

Bucks
Raptors
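The logical OR operator works the same way. The sketch below assumes the same teams.txt sample data and also uses the negation operator (!), which is standard in awk but not otherwise covered in this tutorial.

```shell
# Create the sample data used by this sketch.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

# Logical OR: third field greater than 55, or fourth field greater than 33.
awk '$3 > 55 || $4 > 33 { print $1 }' teams.txt

# Negation: records whose third field is NOT greater than 50.
awk '!($3 > 50) { print $1 }' teams.txt
```

The first command selects Bucks, Raptors, and Pacers; the second selects Celtics and Pacers.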

Built-in Variables

awk comes with a variety of built-in variables that provide useful data and allow you to modify how the program is executed. Some of the most common built-in variables are listed below:

  • NF - The number of fields in the record.
  • NR - The number of the current record.
  • FILENAME - The name of the current input file being processed.
  • FS - The field separator.
  • RS - The record separator.
  • OFS - The output field separator.
  • ORS - The output record separator.

An example of how to print the file name and the number of lines (records) is as follows:

awk 'END { print "File", FILENAME, "contains", NR, "lines." }' teams.txt
Output

File teams.txt contains 5 lines.

Variables in awk can be set on any line of the program. To define a variable for the entire program, set it in a BEGIN rule.
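As a small sketch of setting a built-in variable in a BEGIN rule, the following command changes the output field separator (OFS) to a comma before any record is read. It assumes the same teams.txt sample data.

```shell
# Create the sample data used by this sketch.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

# OFS is placed between the comma-separated items of every print statement.
awk 'BEGIN { OFS = "," } { print $1, $3, $4 }' teams.txt
```

The first output line is Bucks,60,22; the remaining records follow the same comma-separated layout.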

Changing the Field and Record Separator

By default, the field separator is any run of space or tab characters. You can change it by setting the FS variable.

For example, to set the field separator to a dot (.), type:

awk 'BEGIN { FS = "." } { print $1 }' teams.txt
Output

Bucks Milwaukee    60 22 0
Raptors Toronto    58 24 0
76ers Philadelphia 51 31 0
Celtics Boston     49 33 0
Pacers Indiana     48 34 0

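The field separator can be longer than one character. In that case awk treats it as a regular expression, so the ".." in the following command matches any two characters rather than two literal dots: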

awk 'BEGIN { FS = ".." } { print $1 }' teams.txt

You may also use the -F option to modify the field separator when running awk one-liners from the command line:

awk -F "." '{ print $1 }' teams.txt
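The -F option is most useful with delimited data. The sketch below uses a made-up colon-separated file (users.txt is a hypothetical name) and then a bracket expression as a separator, which awk interprets as a regular expression.

```shell
# Hypothetical colon-separated sample data for this sketch.
printf '%s\n' 'alice:1001:/home/alice' 'bob:1002:/home/bob' > users.txt

# Split on a colon and print the first and third fields.
awk -F ':' '{ print $1, $3 }' users.txt

# A regular expression as separator: split on either a comma or a semicolon.
printf '%s\n' 'a,b;c' | awk -F '[,;]' '{ print $2, NF }'
```

The second command prints "b 3", since the line is split into three fields.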

The record separator is set to a newline character by default, but it can be altered using the RS variable.

The following example changes the record separator to a dot (.) and prints each record:

awk 'BEGIN { RS = "." } { print $0 }' teams.txt
Output

Bucks Milwaukee    60 22 0
732 
Raptors Toronto    58 24 0
707 
76ers Philadelphia 51 31 0
622
Celtics Boston     49 33 0
598
Pacers Indiana     48 34 0
585
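A simpler way to see the effect of RS is to split a single line into several records. The sketch below uses made-up input with a semicolon as the record separator and prints the record number in front of each record.

```shell
# The input has no newlines; each semicolon ends a record.
printf 'red;green;blue' | awk 'BEGIN { RS = ";" } { print NR, $0 }'
```

This prints three numbered lines, one per record: 1 red, 2 green, and 3 blue.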

Awk Actions

Awk actions are enclosed in braces ({}) and executed when the pattern matches. An action can contain zero or more statements. Multiple statements are executed in the order in which they appear and must be separated by newlines or semicolons (;).

In awk, there are numerous forms of action statements that can be used:

  • Expressions - Variable assignment, arithmetic operators, and increment and decrement operators.
  • Control statements - Control the flow of the program (if, for, while, switch, and more).
  • Output statements - Such as print and printf.
  • Compound statements - Group other statements.
  • Input statements - Control how the input is processed.
  • Deletion statements - Remove array elements.
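Several of these statement types can appear in one short program. The sketch below uses made-up input and an associative array named count (an illustrative name) to combine an expression, a deletion statement, a control statement, and an output statement.

```shell
printf '%s\n' 'apple 3' 'pear 5' 'apple 4' | awk '
{ count[$1] += $2 }        # expression: add the second field to a per-name total
END {
  delete count["pear"]     # deletion statement: remove one array element
  for (k in count)         # control statement: loop over the remaining keys
    print k, count[k]      # output statement
}'
```

Only the apple entry remains after the delete, so the program prints "apple 7".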

The print statement is probably the most commonly used awk statement. It prints formatted output of text, records, fields, and variables.

When printing multiple items, separate them with commas. Here's an example:

awk '{ print $1, $3, $5 }' teams.txt

The printed items are separated by single spaces:

Output

Bucks 60 0.732
Raptors 58 0.707
76ers 51 0.622
Celtics 49 0.598
Pacers 48 0.585

If you don't use commas, there will be no space between the items:

awk '{ print $1 $3 $5 }' teams.txt

The printed items are concatenated:

Output

Bucks600.732
Raptors580.707
76ers510.622
Celtics490.598
Pacers480.585

When print is used without an argument, it defaults to print $0, and the current record is printed.

To print a custom text, use double-quote characters to quote the content:

awk '{ print "The first field:", $1}' teams.txt
Output

The first field: Bucks
The first field: Raptors
The first field: 76ers
The first field: Celtics
The first field: Pacers

Special characters, such as newline, can also be printed:

awk 'BEGIN { print "First line\nSecond line\nThird line" }'
Output

First line
Second line
Third line

The printf statement gives you more control over the output format. Here's an example that inserts line numbers:

awk '{ printf "%3d. %s\n", NR, $0 }' teams.txt

printf doesn't append a newline automatically, so we add \n explicitly:

Output

  1. Bucks Milwaukee    60 22 0.732
  2. Raptors Toronto    58 24 0.707
  3. 76ers Philadelphia 51 31 0.622
  4. Celtics Boston     49 33 0.598
  5. Pacers Indiana     48 34 0.585

The sum of the values stored in the third field of each line is calculated with the following command:

awk '{ sum += $3 } END { printf "%d\n", sum }' teams.txt
Output

266
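The same accumulator can be used to compute an average by dividing by NR in the END rule. This is a sketch that assumes the same teams.txt sample data; the guard against an empty input is an extra precaution, not something the original command needs.

```shell
# Create the sample data used by this sketch.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

# In the END rule, NR holds the total number of records read.
awk '{ sum += $3 } END { if (NR > 0) printf "Average wins: %.1f\n", sum / NR }' teams.txt
```

With a sum of 266 over 5 records, this prints "Average wins: 53.2".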

Here's another example of how to print the squares of numbers from 1 to 5 using expressions and control statements:

awk 'BEGIN { i = 1; while (i < 6) { print "Square of", i, "is", i*i; ++i } }'
Output

Square of 1 is 1
Square of 2 is 4
Square of 3 is 9
Square of 4 is 16
Square of 5 is 25

One-line commands like the one above are harder to read and maintain. For longer programs, it's better to create a separate program file:

BEGIN { 
  i = 1
  while (i < 6) { 
    print "Square of", i, "is", i*i; 
    ++i 
  } 
}

Run the program by passing the file name to the awk interpreter with the -f option:

awk -f prg.awk

You can also run an awk program as an executable by adding a shebang line that points to the awk interpreter:

#!/usr/bin/awk -f
BEGIN { 
  i = 1
  while (i < 6) { 
    print "Square of", i, "is", i*i; 
    ++i 
  } 
}

Save the file and make it executable:

chmod +x prg.awk

You can now run the program by typing:

./prg.awk

Using Shell Variables in Awk Programs

If you're using the awk command in a shell script, chances are you'll need to pass a shell variable to the awk program. One option is to enclose the program in double quotes instead of single quotes and substitute the variable directly in the program. However, this makes the awk program more complex, because you then have to escape the awk variables.

The recommended way to use shell variables in awk programs is to assign the shell variable to an awk variable with the -v option. Here's an example:

num=51
awk -v n="$num" 'BEGIN {print n}'
Output

51
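A variable passed with -v can be used anywhere in the program, including in patterns. The sketch below assumes the same teams.txt sample data; the names min_wins and min are made up for this illustration.

```shell
# Create the sample data used by this sketch.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

# min_wins is a shell variable; min is the awk variable that receives its value.
min_wins=50
awk -v min="$min_wins" '$3 > min { print $1 }' teams.txt
```

Because the program stays in single quotes, $3 is left for awk to interpret and only the -v assignment is expanded by the shell.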

FAQs on Awk Command in Linux

How does the awk command work in Linux?

awk reads input files (or standard input) line by line, matching patterns and executing corresponding actions. It splits lines into fields based on a delimiter, and you can manipulate and print these fields as desired.

How do I use the awk command to process data in a file?

To use awk to process data in a file, you typically provide a set of patterns and corresponding actions. For example, awk '/pattern/ { action }' file.txt will execute the action for each line that matches the pattern.

Can I use awk to extract specific columns from a file?

Yes, awk can extract specific columns from a file. By using the field separator (FS) and accessing fields by their index, you can print selected columns. For example, awk -F',' '{ print $1, $3 }' file.csv will print the first and third columns, assuming the file is comma-separated.

How can I use awk to perform calculations on numeric data?

awk has built-in variables and arithmetic operators that allow you to perform calculations on numeric data. You can create formulas and use the printf function to format the output. For example, awk '{ total += $1 } END { printf "Sum: %.2f\n", total }' file.txt calculates the sum of the first column.

Can I use conditionals with awk to process data selectively?

Yes, you can use conditionals (if, else, else if) within awk to process data selectively. By specifying conditions, you can determine which actions to take based on the input. For example, awk '{ if ($1 > 10) print $2; else print $3 }' file.txt prints the second field if the first field is greater than 10; otherwise, it prints the third field.

How can I specify the field separator in awk?

By default, awk uses whitespace as the field separator. However, you can specify a different field separator using the -F flag. For example, -F',' sets the field separator to a comma.

Is it possible to process multiple files with awk?

Yes, you can process multiple files with awk by providing multiple filenames as arguments. awk processes each file sequentially, applying patterns and actions for each file.
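As a brief sketch with two made-up files, the command below prints the file name and record numbers for each line. It relies on FNR, a standard built-in variable not covered above, which counts records within the current file, while NR keeps counting across all files.

```shell
# Two small hypothetical input files.
printf '%s\n' 'a' 'b' > first.txt
printf '%s\n' 'c' > second.txt

# FNR restarts at 1 for each file; NR does not.
awk '{ print FILENAME, FNR, NR, $0 }' first.txt second.txt
```

The last line of output is "second.txt 1 3 c": the first record of the second file, but the third record overall.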

Conclusion

awk is one of the most powerful text manipulation tools available.

This tutorial barely scratches the surface of the awk programming language. To learn more about awk, check out the official Gawk documentation.

If you have any queries, please leave a comment below and we’ll be happy to respond to them.