Cut Command in Linux

Introduction

Many utilities for processing and filtering text files are available on Linux and Unix platforms. cut is a command-line tool that cuts sections out of each line of a file or of piped data and prints the result to standard output. It can select parts of a line by delimiter-separated field, by byte position, or by character position.

In this tutorial we'll show you how to use the cut command, with examples and explanations of the most common options. We will also address a few FAQs on the cut command in Linux.

How to Use the `cut` Command

The cut command has the following syntax:

cut OPTION... [FILE]...

The options that tell cut whether to select parts of each line by field, by byte position, or by character position are:

-f (--fields=LIST) - Select a field, a set of fields, or a range of fields. This is by far the most popular option.
-b (--bytes=LIST) - Choose a byte, a group of bytes, or a range of bytes.
-c (--characters=LIST) - Select by specifying a character, a set of characters, or a range of characters.

Exactly one of the options listed above must be used.

Other options include:

-d (--delimiter) - Replaces the default "TAB" delimiter with a delimiter of your choice.
--complement - Complements the selection. When this option is used, all bytes, characters, or fields are displayed except the selected ones.
-s (--only-delimited) - By default, cut prints lines that contain no delimiter character unchanged. When this option is used, cut does not print lines that contain no delimiter.
--output-delimiter - Cut uses the input delimiter as the output delimiter by default. You can define an alternative output delimiter string with this option.
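
The effect of -s is easiest to see on input where only some lines contain the delimiter. The following is a minimal sketch; the sample lines and variable names are made up for illustration:

```shell
# Two sample lines: only the first contains the ':' delimiter.
sample='one:two
no delimiter here'

# Without -s, the line lacking a delimiter is printed unchanged.
without_s=$(printf '%s\n' "$sample" | cut -d ':' -f 1)

# With -s, lines lacking the delimiter are dropped.
with_s=$(printf '%s\n' "$sample" | cut -d ':' -s -f 1)

printf 'without -s:\n%s\n' "$without_s"
printf 'with -s:\n%s\n' "$with_s"
```

The first call prints both lines ("one" and the untouched second line), while the second prints only "one".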

The cut command can take one or more FILE names as input. If no FILE is supplied, or if FILE is -, cut reads from standard input.
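
As a quick illustration of the two ways of reading standard input, here is a small sketch using a made-up tab-separated line:

```shell
# No FILE operand: cut reads standard input.
from_pipe=$(printf 'a\tb\tc\n' | cut -f 2)

# FILE given as "-": cut also reads standard input.
from_dash=$(printf 'a\tb\tc\n' | cut -f 2 -)

echo "$from_pipe $from_dash"   # prints: b b
```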

The LIST argument passed to the -f, -b, and -c options can be a single integer, several integers separated by commas, a range of integers, or several ranges separated by commas. Each item in the list takes one of the following forms:

N is the Nth field, byte, or character starting from 1.
N- from the Nth field, byte or character, to the end of the line.
N-M from the Nth to the Mth field, byte, or character.
-M from the first to the Mth field, byte, or character.
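
The LIST forms above can be tried on a single line of input. This sketch assumes a hypothetical comma-separated sample and uses -f, but the same forms apply to -b and -c:

```shell
line='a,b,c,d,e'   # hypothetical sample with five comma-separated fields

nth=$(echo "$line" | cut -d ',' -f 2)          # N     -> b
from_n=$(echo "$line" | cut -d ',' -f 2-)      # N-    -> b,c,d,e
n_to_m=$(echo "$line" | cut -d ',' -f 2-4)     # N-M   -> b,c,d
up_to_m=$(echo "$line" | cut -d ',' -f -3)     # -M    -> a,b,c
mixed=$(echo "$line" | cut -d ',' -f 1,4-5)    # list  -> a,d,e

printf '%s\n' "$nth" "$from_n" "$n_to_m" "$up_to_m" "$mixed"
```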

How to Cut by Field

The -f option specifies the fields to cut. If no delimiter is supplied, "TAB" is used as the delimiter.

The following file will be used in the examples below. Tabs are used to separate the fields.

245:789	4567	M:4540	Admin	01:10:1980
535:763	4987	M:3476	Sales	11:04:1978
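
Because tabs are easily lost when copying text from a page, one way to recreate the sample file is with printf, which turns \t into a literal tab. This is only a sketch: it writes test.txt into the current directory, and the awk check is an extra step added here for illustration.

```shell
# Recreate the sample file; \t writes a literal tab between fields.
printf '245:789\t4567\tM:4540\tAdmin\t01:10:1980\n'  > test.txt
printf '535:763\t4987\tM:3476\tSales\t11:04:1978\n' >> test.txt

# Sanity check: count the tab-separated fields on each line (5 expected).
awk -F '\t' '{ print NF }' test.txt
```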

To display the first and third fields, for example, you would type:

cut -f 1,3 test.txt

Output

245:789	M:4540
535:763	M:3476

Alternatively, if you wish to show data from the first to the fourth field, type:

cut -f -4 test.txt

Output

245:789	4567	M:4540	Admin
535:763	4987	M:3476	Sales

How to cut based on a delimiter

To cut using a delimiter, use the command with the -d option and the delimiter you want to use.

You would type the following command to display the first and third fields using ":" as a delimiter:

cut -d ':' -f 1,3 test.txt

Output

245:4540	Admin	01
535:3476	Sales	11

As a delimiter, you can use any single character. The space character is used as a delimiter in the following example, and the second field is printed:

echo "Lorem ipsum dolor sit amet" | cut -d ' ' -f 2

Output

ipsum
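
One point worth knowing when cutting on spaces: cut treats every occurrence of the delimiter as a field boundary, so consecutive delimiters produce empty fields. The sketch below shows this with a made-up input and, as one possible workaround, squeezes repeated spaces with tr -s before cutting:

```shell
# Two spaces between "a" and "b" create an empty second field.
empty=$(echo 'a  b' | cut -d ' ' -f 2)

# Squeezing repeated spaces first makes "b" the second field.
squeezed=$(echo 'a  b' | tr -s ' ' | cut -d ' ' -f 2)

printf '[%s] [%s]\n' "$empty" "$squeezed"   # prints: [] [b]
```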

How to complement the selection

Use the --complement option to invert the selection. Only the fields not chosen with the -f option will be printed.

All fields except the first and third will be printed with the following command:

cut -f 1,3 --complement test.txt

Output

4567	Admin	01:10:1980
4987	Sales	11:04:1978
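
The same option also works together with -d and with byte selections. A short sketch, assuming GNU cut (--complement is a GNU extension) and made-up input strings:

```shell
# Drop the second comma-separated field and keep the rest.
rest_fields=$(echo 'a,b,c,d' | cut -d ',' -f 2 --complement)

# Drop the first two bytes and keep the rest.
rest_bytes=$(echo 'abcdef' | cut -b 1-2 --complement)

echo "$rest_fields"   # a,c,d
echo "$rest_bytes"    # cdef
```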

How to specify an output delimiter

The --output-delimiter option is used to specify the output delimiter. To set the output delimiter to _ for example, type:

cut -f 1,3 --output-delimiter='_' test.txt

Output

245:789_M:4540
535:763_M:3476
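
Combined with -d, this option can also be used to convert one delimiter into another. The following sketch assumes GNU cut (--output-delimiter is a GNU extension) and a made-up colon-separated line:

```shell
# Select every field (1-) and join them with commas instead of colons.
csv=$(echo 'a:b:c' | cut -d ':' -f 1- --output-delimiter=',')

# The output delimiter may be a string of several characters.
spaced=$(echo 'a:b:c' | cut -d ':' -f 1,3 --output-delimiter=' | ')

echo "$csv"      # a,b,c
echo "$spaced"   # a | c
```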

How to Cut by Bytes and Characters

Before we proceed any further, it's important to understand the difference between bytes and characters.

A byte is 8 bits and can hold 256 distinct values. The ASCII standard was designed around the letters, digits, and symbols needed for English: its table has 128 characters, each represented by one byte. As computers became more widely available, vendors developed new character encodings for other languages. A simple one-to-one mapping of characters to bytes was not practical for languages with more than 256 characters, and the many incompatible encodings caused problems when sharing documents or browsing websites. The Unicode standard was created to cover most of the world's writing systems, and UTF-8 was introduced as an encoding for it. In UTF-8, not every character is represented by a single byte: a character takes between one and four bytes.

The -b (--bytes) option instructs the command to cut chunks from each line based on byte positions specified.

The ü character, which occupies two bytes in UTF-8, is used in the following examples.

Choose the fifth byte:

echo 'drüberspringen' | cut -b 5

Output

b

Choose the 5th, 9th, and 13th bytes:

echo 'drüberspringen' | cut -b 5,9,13

Output

bpg

Choose the range from 1st to 5th byte:

echo 'drüberspringen' | cut -b 1-5

Output

drüb

At the time of writing, the version of cut included with GNU coreutils does not handle multi-byte characters with the -c option: -c behaves the same as -b.
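
The difference between bytes and characters can be checked directly. In this sketch the word is built with octal escapes (\303\274 is the UTF-8 encoding of ü) so that it does not depend on the encoding of the script file; the variable names are only illustrative.

```shell
# Build the sample word; \303\274 are the two UTF-8 bytes of ü.
word=$(printf 'dr\303\274berspringen')

# The word has 14 characters but 15 bytes, because ü takes two bytes.
byte_count=$(printf '%s' "$word" | wc -c | tr -d ' ')
echo "$byte_count"   # 15

# Byte 3 on its own is only the first half of ü. od shows it in hex:
# c3 is that byte, 0a is the trailing newline added by cut.
half=$(printf '%s\n' "$word" | cut -b 3 | od -An -tx1 | tr -d ' \n')
echo "$half"         # c30a
```

Cutting in the middle of a multi-byte character therefore produces a sequence that is not valid UTF-8 on its own, which is why byte ranges need care with non-ASCII text.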

Cut Examples

Typically, the cut command is used in conjunction with other commands via piping. Listed below are a few examples:

Get a list of all users

The output of the getent passwd command is piped to cut, which prints the first field using : as the delimiter.

getent passwd | cut -d ':' -f1

The output is a list of all users on the system.
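
The same approach can pull out several fields at once. The sketch below runs on two hypothetical entries in passwd format (name:password:UID:GID:comment:home:shell) rather than on the real user database, so the result is predictable:

```shell
# Hypothetical entries in passwd format.
entries='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/zsh'

# Fields 1 and 7 are the user name and the login shell.
name_shell=$(printf '%s\n' "$entries" | cut -d ':' -f 1,7)

echo "$name_shell"
```

This prints root:/bin/bash and alice:/bin/zsh, one per line.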

View 10 most frequently used commands

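In the following example, cut strips the first 7 characters from each line of the history command output (in Bash these hold the entry number and its padding), keeping everything from the 8th character onward. The remaining commands count identical lines and list the most frequent ones first.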

history | cut -c8- | sort | uniq -c | sort -rn | head
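
Since history is only useful in an interactive shell, the pipeline can be tried on sample data instead. This sketch assumes the usual Bash layout (a 5-character right-aligned number, two spaces, then the command); the sample commands are made up.

```shell
# Hypothetical lines laid out like Bash history output.
sample=$(printf '%5d  %s\n' 1 'ls -la' 2 'cd /tmp' 3 'ls -la')

# Keep only the command text, starting at the 8th character.
commands=$(printf '%s\n' "$sample" | cut -c8-)

# Count identical commands and show the most frequent one.
top=$(printf '%s\n' "$commands" | sort | uniq -c | sort -rn | head -n 1)

echo "$top"
```

Here the most frequent command is ls -la with a count of 2. If your shell formats history differently, the starting position passed to -c needs to be adjusted.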

FAQs on cut Command in Linux

How can I extract a specific column from a text file using the `cut` command?

To extract a specific column, use the -f option followed by the column number(s) or range(s) of columns to extract. For example, cut -f3 myfile.txt will extract the third column from the text file.

Can I extract multiple columns at once with the `cut` command?

Yes, you can extract multiple columns at once by specifying a list of column numbers or ranges separated by commas. For example, cut -f1,3,5 myfile.txt will extract the first, third, and fifth columns.

How do I change the output delimiter of the `cut` command?

To change the output delimiter, use the --output-delimiter option followed by the desired character or string. For example, cut -d',' -f1,3 --output-delimiter=';' myfile.txt reads comma-separated fields and joins the selected ones with a semicolon.

Can I use the `cut` command to extract characters from a specific position in each line?

Yes, you can use the -c option followed by a comma-separated list of character positions or ranges. For example, cut -c1-5 myfile.txt will extract the first five characters from each line.

How do I specify a specific range of characters to extract using the `cut` command?

To specify a range of characters, use the -c option with a range of the form N-M. For example, cut -c3-8 myfile.txt will extract characters 3 to 8 from each line. To select by byte position instead, use the -b option with the same range syntax.

What if my text file contains different field lengths?

The cut command does not require fields to have the same length. With the -f option, each line is split at the delimiter, so fields can be of any length. Lines that contain no delimiter at all are printed unchanged by default; use the -s option to suppress them.

Is it possible to count the number of fields or characters extracted using the `cut` command?

The cut command does not count anything itself, but you can pipe its output to the wc command. For example, cut -f3 myfile.txt | wc -l counts the lines that cut printed, that is, the number of values extracted from the third column. Replacing wc -l with wc -m counts the characters in the output, including newlines.
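
As a small sketch of both kinds of counting, the example below uses made-up comma-separated data. Counting the fields on a line is not something cut can do, so awk is used for that part here as a substitute tool:

```shell
# Hypothetical sample: two lines with three comma-separated fields each.
data='a,b,c
d,e,f'

# Number of values extracted from the third column (one per line).
values=$(printf '%s\n' "$data" | cut -d ',' -f 3 | wc -l | tr -d ' ')

# Number of fields on the first line, counted with awk rather than cut.
fields=$(printf '%s\n' "$data" | head -n 1 | awk -F ',' '{ print NF }')

echo "$values $fields"   # prints: 2 3
```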