Awk Command in Linux with Examples
Introduction
Before working through the examples, it helps to understand what awk is.
awk is a text-processing scripting language that can be used for a variety of tasks. It is primarily used for reporting and analysis.
Unlike most procedural programming languages, awk is data-driven: you specify a set of actions to perform on the input text. It reads the input data, transforms it, and sends the result to standard output.
This tutorial covers the fundamentals of the awk programming language. Knowing them will greatly improve your ability to manipulate text files on the command line. We will also address a few FAQs on the awk command in Linux.
How awk Works
awk has a number of alternative implementations. We'll use gawk, which is the GNU implementation of awk. On most Linux systems, the awk interpreter is simply a symlink to gawk.
Records and Fields
awk can process text files and streams. The input data is divided into records and fields. awk works on one record at a time until it reaches the end of the input. Records are separated by a character called the record separator. The default record separator is the newline character, which means that each line in the text data is a record. A different record separator can be set with the RS variable.
The field separator separates the fields that make up a record. Fields are separated by whitespace by default, which includes one or more tab, space, and newline characters.
Each record's fields are identified by a dollar symbol ($) followed by a field number, starting with 1. The first field is $1, the second is $2, and so on. The special variable $NF can also be used to refer to the last field. $0 refers to the entire record.
Here is a visual representation of how records and fields are referenced:
tmpfs      788M  1.8M  786M   1% /run/lock
/dev/sda1  234G  191G   31G  87% /
|-------|  |--|  |--|  |--|  |-| |--------|
   $1       $2    $3    $4   $5   $6 ($NF)   --> fields
|-----------------------------------------|
                    $0                       --> record
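To make the field variables concrete, here is a minimal sketch that pipes one line like those above into awk. The line is hard-coded and the shell variable names are made up for illustration.

```shell
# One df-style line, hard-coded for illustration.
line='/dev/sda1 234G 191G 31G 87% /'

# NF holds the number of fields, so $NF refers to the last field.
fields=$(printf '%s\n' "$line" | awk '{ print $1, $5, $NF, NF }')
echo "$fields"   # /dev/sda1 87% / 6
```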
Awk Program
To process text with awk, you first write a program that tells the command what to do. The program consists of a set of rules and user-defined functions. Each rule has one pattern and one action. Rules are separated by newlines or semicolons (;). An awk program usually looks like this:
pattern { action }
pattern { action }
...
While processing data, awk performs the specified action on every record that matches the pattern. When a rule has no pattern, it matches all records (lines).
An awk action is enclosed in braces ({}) and consists of statements. Each statement specifies an operation to perform. Multiple statements in an action are separated by newlines or semicolons (;). If the rule has no action, it prints the entire record by default.
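As a small sketch of the two defaults just described, the commands below run one rule with only a pattern and one rule with only an action. The three input lines and the shell variable names are invented for the example.

```shell
# Three made-up input lines, used only for illustration.
input='alpha 1
beta 2
gamma 3'

# Pattern without an action: matching records are printed in full.
pattern_only=$(printf '%s\n' "$input" | awk '/beta/')

# Action without a pattern: the action runs for every record.
action_only=$(printf '%s\n' "$input" | awk '{ print $2 }')

echo "$pattern_only"   # beta 2
echo "$action_only"    # 1, 2 and 3 on separate lines
```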
awk can handle a variety of statements, such as expressions, conditionals, input, output, and more. The following are the most common awk statements:
- exit: Stops the entire program from running and exits.
- next: Stops the current record from being processed and moves on to the next record in the input data.
- print: Prints custom text, fields, variables, and records.
- printf: Gives you more control over the output format, like printf in C and bash.
When writing awk programs, everything from the hash mark (#) to the end of the line is treated as a comment. The continuation character, backslash (\), can be used to break long lines into multiple lines.
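The following sketch combines next, exit, and comments in one multi-line program. The input lines are invented, and the "stop" marker is only an assumption made for the example.

```shell
# Made-up input: a comment line, two data lines, a stop marker, one more line.
input='# header line
first
second
stop
third'

result=$(printf '%s\n' "$input" | awk '
/^#/     { next }    # skip comment lines and move on to the next record
/^stop$/ { exit }    # stop reading input at the "stop" line
         { print NR ": " $0 }
')
echo "$result"
```

Only "2: first" and "3: second" are printed: the first record is skipped by next, and exit ends the program before "third" is read.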
Executing awk Programs
There are various ways to run an awk program. If the program is short and simple, it can be passed directly to the awk interpreter on the command line:
awk 'program' input-file...
When running the program from the command line, enclose it in single quotes ('') to prevent the shell from interpreting it.
If the program is long and complex, it is better to save it to a file and pass it to the awk command with the -f option:
awk -f program-file input-file...
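As a rough sketch of the -f workflow, the snippet below writes a small program to a temporary file and runs it. The temporary file, the variable names, and the program itself are made up for illustration.

```shell
# Create a throwaway program file (hypothetical; any file name works).
prog=$(mktemp)
cat > "$prog" <<'EOF'
BEGIN {
    for (i = 1; i <= 3; i++)
        print "Square of", i, "is", i * i
}
EOF

# The program has only a BEGIN rule, so no input file is needed.
result=$(awk -f "$prog")
rm -f "$prog"
echo "$result"
```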
The examples below use the following sample file, saved as teams.txt:
Bucks Milwaukee 60 22 0.732
Raptors Toronto 58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston 49 33 0.598
Pacers Indiana 48 34 0.585
Awk Patterns
In awk, patterns determine whether or not the associated action is executed.
awk supports regular expression patterns, relational expression patterns, range patterns, and special expression patterns.
When a rule has no pattern, every input record is matched. Here is an example of a rule that contains only an action:
awk '{ print $3 }' teams.txt
The third field of each record will be printed by the program:
Output
60
58
51
49
48
Regular Expression Patterns
A regex, or regular expression, is a pattern that matches a set of strings. In awk, regular expression patterns are enclosed in slashes (//):
/regex pattern/ { action }
A literal character or string match is the most basic example. For example, to see the first field of each record that contains "0.5", run the following command. The dot is escaped because an unescaped . in a regular expression matches any single character:
awk '/0\.5/ { print $1 }' teams.txt
Output
Celtics
Pacers
Any sort of extended regular expression can be used as the pattern. If the record begins with two or more digits, this example outputs the first field:
awk '/^[0-9][0-9]/ { print $1 }' teams.txt
Output
76ers
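Because the dot is a regular expression metacharacter, a pattern such as /0.5/ matches more than the literal text "0.5". The sketch below, using three made-up input lines, contrasts the unescaped and escaped forms.

```shell
# Made-up input lines for illustration.
input='0.5
0x5
1.5'

# Unescaped dot: matches "0", then any single character, then "5".
unescaped=$(printf '%s\n' "$input" | awk '/0.5/')

# Escaped dot: matches only the literal string "0.5".
escaped=$(printf '%s\n' "$input" | awk '/0\.5/')

echo "$unescaped"   # 0.5 and 0x5
echo "$escaped"     # 0.5
```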
Relational Expression Patterns
Relational expression patterns are generally used to match the content of a specific field or variable.
By default, regular expression patterns are matched against the whole record. To match a regex against a field, specify the field and use the match operator (~) with the pattern.
For example, to display the first field of each record with "ia" in the second field, type:
awk '$2 ~ /ia/ { print $1 }' teams.txt
Output
76ers
Pacers
Use the !~ operator to match fields that do not match a given pattern:
awk '$2 !~ /ia/ { print $1 }' teams.txt
Output
Bucks
Raptors
Celtics
You can compare strings or numbers for relationships such as greater than, less than, and equal. The following command prints the first field of all records whose third field is greater than 50:
awk '$3 > 50 { print $1 }' teams.txt
Output
Bucks
Raptors
76ers
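Relational patterns work for string equality as well as numeric comparison. The sketch below assumes the sample data from teams.txt, copied into a shell variable (named teams here only for the example) so that it runs without the file.

```shell
# Sample data copied from teams.txt.
teams='Bucks Milwaukee 60 22 0.732
Raptors Toronto 58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston 49 33 0.598
Pacers Indiana 48 34 0.585'

# String comparison: second field equals "Boston".
by_city=$(printf '%s\n' "$teams" | awk '$2 == "Boston" { print $1 }')

# Numeric comparison: fifth field is at least 0.7.
top=$(printf '%s\n' "$teams" | awk '$5 >= 0.7 { print $1, $5 }')

echo "$by_city"   # Celtics
echo "$top"       # Bucks 0.732 and Raptors 0.707
```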
Range Patterns
A range pattern consists of two patterns separated by a comma:
pattern1, pattern2
All records are matched, starting with a record that matches the first pattern and ending with the next record that matches the second pattern.
Here's an example of how to print the first field of all records, starting with the "Raptors" record and ending with the "Celtics" record:
awk '/Raptors/,/Celtics/ { print $1 }' teams.txt
Output
Raptors
76ers
Celtics
Relational expressions can also be used as range patterns. The following command prints all records, starting with the one whose fourth field equals 31 and ending with the one whose fourth field equals 33:
awk '$4 == 31, $4 == 33 { print $0 }' teams.txt
Output
76ers Philadelphia 51 31 0.622
Celtics Boston 49 33 0.598
Other pattern expressions cannot be coupled with range patterns.
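To illustrate where a range stops, here is a minimal sketch with made-up input that contains the end marker twice. The range closes at the first line matching the second pattern, so the later "end" line is not printed.

```shell
# Made-up input; "end" appears twice on purpose.
input='a
start
b
end
c
end'

result=$(printf '%s\n' "$input" | awk '/start/,/end/')
echo "$result"   # start, b, end
```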
Special Expression Patterns
awk has the following special patterns:
- BEGIN: Used to execute actions before records are processed.
- END: Used to execute actions after records have been processed.
The BEGIN pattern is commonly used to set variables, while the END pattern is typically used to process data gathered from the records, such as calculations.
The following example prints "Start Processing." first, then the third field of each record, and finally "End Processing.":
awk 'BEGIN { print "Start Processing." }; { print $3 }; END { print "End Processing." }' teams.txt
Output
Start Processing.
60
58
51
49
48
End Processing.
When a program has only a BEGIN pattern, it executes the actions but does not process any input. When a program has only an END pattern, the input is processed before the rule actions are performed.
The GNU version of awk has two more special patterns, BEGINFILE and ENDFILE, which allow you to perform actions when processing files.
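The sketch below shows both behaviours: a BEGIN-only program that runs without input, and a program that initializes a counter in BEGIN and reports it in END. The sample data is copied from teams.txt into a shell variable, and the variable names and message wording are made up.

```shell
# Sample data copied from teams.txt.
teams='Bucks Milwaukee 60 22 0.732
Raptors Toronto 58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston 49 33 0.598
Pacers Indiana 48 34 0.585'

# BEGIN-only program: runs without reading any input.
begin_only=$(awk 'BEGIN { print "no input needed" }')

# BEGIN sets a variable, the middle rule updates it, END reports it.
summary=$(printf '%s\n' "$teams" | awk '
BEGIN   { count = 0 }
$3 > 50 { count++ }
END     { print count, "of", NR, "records have a third field above 50" }
')

echo "$begin_only"
echo "$summary"   # 3 of 5 records have a third field above 50
```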
Combining Patterns
awk lets you combine two or more patterns using the logical AND (&&) and logical OR (||) operators.
Here is an example that uses the && operator to print the first field of records whose third field is greater than 50 and whose fourth field is less than 30:
awk '$3 > 50 && $4 < 30 { print $1 }' teams.txt
Output
Bucks
Raptors
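For comparison, here is a sketch of the || operator on the same data, along with the logical NOT operator (!), which the text above does not cover but awk also provides. The data is copied from teams.txt into a shell variable so the example is self-contained.

```shell
# Sample data copied from teams.txt.
teams='Bucks Milwaukee 60 22 0.732
Raptors Toronto 58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston 49 33 0.598
Pacers Indiana 48 34 0.585'

# OR: third field above 58, or fourth field above 33.
either=$(printf '%s\n' "$teams" | awk '$3 > 58 || $4 > 33 { print $1 }')

# NOT: records whose third field is not greater than 50.
negated=$(printf '%s\n' "$teams" | awk '!($3 > 50) { print $1 }')

echo "$either"    # Bucks and Pacers
echo "$negated"   # Celtics and Pacers
```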
Built-in Variables
awk comes with a variety of built-in variables that provide useful information and allow you to control how the program is processed. Some of the most common built-in variables are listed below:
- NF: The number of fields in the record.
- NR: The number of the current record.
- FILENAME: The name of the input file currently being processed.
- FS: The field separator.
- RS: The record separator.
- OFS: The output field separator.
- ORS: The output record separator.
An example of how to print the file name and the number of lines (records) is as follows:
awk 'END { print "File", FILENAME, "contains", NR, "lines." }' teams.txt
Output
File teams.txt contains 5 lines.
Variables in awk can be set on any line of the program. To define a variable for the entire program, set it in a BEGIN pattern.
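As a quick sketch of NR and NF in action, the command below prints the record number, the field count, and the last field of each line. The input lines are made up for the example.

```shell
# Made-up input with one, two, and three fields per line.
input='one
two words
three more words'

result=$(printf '%s\n' "$input" | awk '{ print NR, NF, $NF }')
echo "$result"
```

This prints "1 1 one", "2 2 words", and "3 3 words", one per line.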
Changing the Field and Record Separator
The field separator's default value is any number of space or tab characters. It can be changed by setting the FS variable.
For example, to set the field separator to . you would type:
awk 'BEGIN { FS = "." } { print $1 }' teams.txt
Output
Bucks Milwaukee 60 22 0
Raptors Toronto 58 24 0
76ers Philadelphia 51 31 0
Celtics Boston 49 33 0
Pacers Indiana 48 34 0
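The field separator can also be more than one character, in which case it is interpreted as a regular expression. To split on two literal dots, escape them:
awk 'BEGIN { FS = "\\.\\." } { print $1 }' teams.txt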
When running awk one-liners from the command line, you can also use the -F option to change the field separator:
awk -F "." '{ print $1 }' teams.txt
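The difference between a single-character and a multi-character separator can be sketched as follows. The input string is made up, and [.][.] is just one way to write a regular expression that matches two literal dots.

```shell
# A single character other than a space is taken literally.
single=$(printf 'a.b..c\n' | awk -F '.' '{ print $1 }')

# A longer separator is a regular expression; [.][.] matches two literal dots.
double=$(printf 'a.b..c\n' | awk -F '[.][.]' '{ print $1 }')

echo "$single"   # a
echo "$double"   # a.b
```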
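The record separator is a newline character by default, and it can be changed using the RS variable.
Here is an example that changes the record separator to . and prints each resulting record:
awk 'BEGIN { RS = "." } { print $0 }' teams.txt
Output
Bucks Milwaukee 60 22 0
732
Raptors Toronto 58 24 0
707
76ers Philadelphia 51 31 0
622
Celtics Boston 49 33 0
598
Pacers Indiana 48 34 0
585
Because newlines no longer separate records, every record after the first spans two lines. If the file ends with a newline, the last record includes it, so an empty line follows the final "585".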
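The output separators listed among the built-in variables work the same way on the output side. The sketch below sets FS and OFS together. The colon-separated input and the " -> " separator are made up for illustration.

```shell
# Made-up colon-separated input.
input='root:x:0
daemon:x:1'

# FS splits the input on colons; OFS is placed between the items
# that print receives as comma-separated arguments.
result=$(printf '%s\n' "$input" | awk 'BEGIN { FS = ":"; OFS = " -> " } { print $1, $3 }')
echo "$result"
```

This prints "root -> 0" and "daemon -> 1".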
Awk Actions
awk actions are enclosed in braces ({}) and executed when the pattern matches. An action can contain zero or more statements. Multiple statements are executed in the order in which they appear and must be separated by newlines or semicolons (;).
awk supports several types of action statements:
- Expressions, such as variable assignment, arithmetic operators, and increment and decrement operators.
- Control statements, used to control the flow of the program (if, for, while, switch, and more).
- Output statements, such as print and printf.
- Compound statements, used to group other statements.
- Input statements, used to control the processing of the input.
- Deletion statements, used to remove array elements.
The print statement is probably the most commonly used awk statement. It prints text, records, fields, and variables in a formatted output.
When printing multiple items, separate them with commas. Here is an example:
awk '{ print $1, $3, $5 }' teams.txt
Single spaces are used to divide the printed items:
Output
Bucks 60 0.732
Raptors 58 0.707
76ers 51 0.622
Celtics 49 0.598
Pacers 48 0.585
If you don't use commas, there will be no space between the items:
awk '{ print $1 $3 $5 }' teams.txt
The printed items are concatenated:
Output
Bucks600.732
Raptors580.707
76ers510.622
Celtics490.598
Pacers480.585
When print is used without an argument, it defaults to print $0, so the current record is printed.
To print custom text, enclose it in double quotes:
awk '{ print "The first field:", $1}' teams.txt
Output
The first field: Bucks
The first field: Raptors
The first field: 76ers
The first field: Celtics
The first field: Pacers
Special characters, such as newline, can also be printed:
awk 'BEGIN { print "First line\nSecond line\nThird line" }'
Output
First line
Second line
Third line
The printf statement gives you more control over the output format. Here is an example that inserts line numbers:
awk '{ printf "%3d. %s\n", NR, $0 }' teams.txt
printf does not add a newline after each line, so we use \n. The %3d format pads each line number to a width of three characters:
Output
  1. Bucks Milwaukee 60 22 0.732
  2. Raptors Toronto 58 24 0.707
  3. 76ers Philadelphia 51 31 0.622
  4. Celtics Boston 49 33 0.598
  5. Pacers Indiana 48 34 0.585
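printf can also align columns. The sketch below left-aligns the first field in an eight-character column and prints the fifth field as a percentage. Treating the fifth field as a ratio is an assumption made for the example. The data is copied from teams.txt.

```shell
# Sample data copied from teams.txt.
teams='Bucks Milwaukee 60 22 0.732
Raptors Toronto 58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston 49 33 0.598
Pacers Indiana 48 34 0.585'

# %-8s  left-aligns a string in 8 columns
# %5.1f prints a number in 5 columns with one decimal place
# %%    prints a literal percent sign
result=$(printf '%s\n' "$teams" | awk '{ printf "%-8s %5.1f%%\n", $1, $5 * 100 }')
echo "$result"
```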
The following command calculates the sum of the values stored in the third field of each line:
awk '{ sum += $3 } END { printf "%d\n", sum }' teams.txt
Output
266
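Building on the sum, a hedged sketch of an average divides the total by NR in the END rule. The guard against empty input is an addition for safety, and the data is copied from teams.txt.

```shell
# Sample data copied from teams.txt.
teams='Bucks Milwaukee 60 22 0.732
Raptors Toronto 58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston 49 33 0.598
Pacers Indiana 48 34 0.585'

# In the END rule, NR holds the total number of records read.
average=$(printf '%s\n' "$teams" | awk '{ sum += $3 } END { if (NR > 0) printf "%.1f\n", sum / NR }')
echo "$average"   # 53.2
```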
Here's another example of how to print the squares of numbers from 1 to 5 using expressions and control statements:
awk 'BEGIN { i = 1; while (i < 6) { print "Square of", i, "is", i*i; ++i } }'
Output
Square of 1 is 1
Square of 2 is 4
Square of 3 is 9
Square of 4 is 16
Square of 5 is 25
One-line commands like the one above are harder to understand and maintain. For longer programs, it is better to create a separate program file, here named prg.awk:
BEGIN {
    i = 1
    while (i < 6) {
        print "Square of", i, "is", i*i;
        ++i
    }
}
Pass the file name to the awk interpreter to run the program:
awk -f prg.awk
You can also run an awk program as an executable by using the shebang directive to set the awk interpreter:
#!/usr/bin/awk -f
BEGIN {
    i = 1
    while (i < 6) {
        print "Square of", i, "is", i*i;
        ++i
    }
}
Save the file and make it executable:
chmod +x prg.awk
You can now run the program by typing:
./prg.awk
Using Shell Variables in Awk Programs
If you are using the awk command in a shell script, you will almost certainly need to pass a shell variable to the awk program. One option is to enclose the program in double quotes instead of single quotes and substitute the variable in the program. However, this makes your awk program more complicated, because you have to escape the awk variables.
The recommended way to use shell variables in awk programs is to assign the shell variable to an awk variable. Here is an example:
num=51
awk -v n="$num" 'BEGIN {print n}'
Output
51
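A variable passed with -v can also be used inside a pattern. In the sketch below, the shell variable threshold and the awk variable min are hypothetical names, and the data is copied from teams.txt.

```shell
# Sample data copied from teams.txt.
teams='Bucks Milwaukee 60 22 0.732
Raptors Toronto 58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston 49 33 0.598
Pacers Indiana 48 34 0.585'

threshold=50

# The shell value is assigned to the awk variable min before the program runs.
result=$(printf '%s\n' "$teams" | awk -v min="$threshold" '$3 >= min { print $1 }')
echo "$result"   # Bucks, Raptors, 76ers
```

Keeping the program in single quotes means the shell never touches $3 or $1, so nothing needs to be escaped.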
FAQs on Awk Command in Linux
How does the awk command work in Linux?
awk reads input files (or standard input) line by line, matching patterns and executing corresponding actions. It splits lines into fields based on a delimiter, and you can manipulate and print these fields as desired.
How do I use the awk command to process data in a file?
To use awk to process data in a file, you typically provide a set of patterns and corresponding actions. For example, awk '/pattern/ { action }' file.txt will execute the action for each line that matches the pattern.
Can I use awk to extract specific columns from a file?
Yes, awk can extract specific columns from a file. By using the field separator (FS) and accessing fields by their index, you can print selected columns. For example, awk -F',' '{ print $1, $3 }' file.csv will print the first and third columns, assuming the file is comma-separated.
How can I use awk to perform calculations on numeric data?
awk has built-in variables and arithmetic operators that allow you to perform calculations on numeric data. You can create formulas and use the printf function to format the output. For example, awk '{ total += $1 } END { printf "Sum: %.2f\n", total }' file.txt calculates the sum of the first column.
Can I use conditionals with awk to process data selectively?
Yes, you can use conditionals (if, else, else if) within awk to process data selectively. By specifying conditions, you can determine which actions to take based on the input. For example, awk '{ if ($1 > 10) print $2; else print $3 }' file.txt prints the second field if the first field is greater than 10; otherwise, it prints the third field.
How can I specify the field separator in awk?
By default, awk uses whitespace as the field separator. However, you can specify a different field separator using the -F flag. For example, -F',' sets the field separator to a comma.
Is it possible to process multiple files with awk?
Yes, you can process multiple files with awk by providing multiple filenames as arguments. awk processes each file sequentially, applying patterns and actions for each file.
Conclusion
awk is one of the most powerful text manipulation tools available.
This tutorial barely scratches the surface of the awk programming language. To learn more about awk, check out the official Gawk documentation.