How to Install Apache Kafka on Ubuntu 22.04

Introduction

Before installing Apache Kafka on Ubuntu 22.04, let’s briefly look at what Apache Kafka is.

Apache Kafka is a popular distributed streaming platform used for real-time data processing. It enables high-throughput, fault-tolerant, and scalable data streaming by providing a publish-subscribe model.

Kafka allows seamless integration of various systems and applications, making it ideal for use cases like data pipelines, event-driven architectures, and real-time analytics. With its reliable messaging system and efficient data handling, Apache Kafka has become a cornerstone in modern data processing and is widely adopted across industries.

In this tutorial, you will install Apache Kafka on Ubuntu 22.04. We will also address a few FAQs on how to install Apache Kafka on Ubuntu 22.04.

Advantages of Apache Kafka

Scalability: Apache Kafka can handle high-volume data streams, making it scalable and capable of accommodating growing data requirements.
Fault-tolerance: It ensures data durability by replicating data across multiple nodes, providing fault-tolerant data processing.
Real-time processing: Kafka enables real-time data streaming, allowing instant analysis and decision-making based on up-to-date information.
Seamless integration: It integrates smoothly with existing systems, applications, and frameworks, making it easy to incorporate into various data pipelines.
High throughput: Apache Kafka can handle thousands of messages per second, ensuring efficient data processing and minimizing processing delays.

Prerequisites to Install Apache Kafka on Ubuntu 22.04

An Ubuntu 22.04 server and a non-root user with sudo privileges
At least 4GB of RAM on the server. With less memory, the Kafka service may fail, with the Java virtual machine (JVM) throwing an “Out Of Memory” exception during startup.
OpenJDK 17 installed on the server. Kafka is written in Java, so it requires a JVM.
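
The prerequisites above can be checked quickly before going further. The following is a minimal sketch, assuming a Linux system that exposes /proc/meminfo; the helper name kb_to_mib is made up for this example and is not part of any tool.

```shell
# Hypothetical helper: convert a kB figure (as reported in /proc/meminfo) to MiB.
kb_to_mib() {
  echo $(( $1 / 1024 ))
}

# Report total RAM; this tutorial recommends at least 4GB.
if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  echo "Total memory: $(kb_to_mib "$total_kb") MiB"
else
  echo "Cannot read /proc/meminfo; check the memory size manually."
fi

# Report whether a java binary is on the PATH; Kafka needs a JVM to run.
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
else
  echo "java not found; install OpenJDK 17 first (for example: sudo apt install openjdk-17-jdk)."
fi
```

A server sold as 4GB usually reports slightly less than 4096 MiB here, because part of the memory is reserved by the kernel and firmware.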

Step 1 – Create a user for Kafka

1) A dedicated user for Kafka is recommended because Kafka handles requests over a network. Running it under its own account limits the damage to your server if the Kafka service is ever compromised. You will create the dedicated kafka user in this step; once the Kafka setup is done, use a different non-root user for other tasks on the server.

With the help of the useradd command, create a user called kafka:

sudo useradd kafka -m -s /bin/bash

2) The -m flag ensures that a home directory is created for the user, and the -s flag sets /bin/bash as the user's login shell. The home directory, /home/kafka, will act as the workspace directory for executing the commands that follow.

3) Then, use the passwd command to set a password:

sudo passwd kafka

4) After that, add the Kafka user to the sudo group with the help of the adduser command. This is done so that it has privileges to install Kafka's dependencies.

sudo adduser kafka sudo

5) Next, log in with the help of su:

su -l kafka
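
If you want to confirm that the account was set up as described in this step, a short check like the following can help. This is only a sketch; the helper name user_exists is chosen for this example.

```shell
# Hypothetical helper: succeed if the named account exists.
user_exists() {
  id -u "$1" >/dev/null 2>&1
}

if user_exists kafka; then
  # Show the UID, primary group, and supplementary groups; "sudo" should be listed.
  id kafka
  # Show the home directory and login shell recorded for the account.
  getent passwd kafka | cut -d: -f6,7
else
  echo "User kafka does not exist yet."
fi
```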

Step 2 – Download and Extract Kafka Binaries

1) First, create a directory called Downloads in /home/kafka to store the download:

mkdir ~/Downloads

2) With the help of the wget command, download the Kafka binaries:

wget https://downloads.apache.org/kafka/3.5.0/kafka_2.13-3.5.0.tgz -O ~/Downloads/kafka.tgz

3) Next, create a directory called kafka, which will be the base directory of the Kafka installation, and change into it:

mkdir ~/kafka && cd ~/kafka

4) Now, extract the downloaded archive using the tar command:

tar -xvzf ~/Downloads/kafka.tgz --strip 1

5) The --strip 1 flag ensures that the archive's contents are extracted directly into ~/kafka/ and not into a subdirectory such as ~/kafka/kafka_2.13-3.5.0/.
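
To see what stripping the leading path component does, you can try it on a throwaway archive. The sketch below builds a tiny stand-in archive in a scratch directory (it does not touch the real download) and extracts it with and without the option. It uses the full spelling --strip-components=1, which GNU tar also accepts in the abbreviated form --strip 1 used above.

```shell
# Build a small stand-in archive with a single top-level directory.
demo=$(mktemp -d)
mkdir -p "$demo/src/kafka_2.13-3.5.0/bin"
touch "$demo/src/kafka_2.13-3.5.0/bin/kafka-server-start.sh"
tar -czf "$demo/kafka.tgz" -C "$demo/src" kafka_2.13-3.5.0

# Extract once as-is and once with the first path component removed.
mkdir "$demo/plain" "$demo/stripped"
tar -xzf "$demo/kafka.tgz" -C "$demo/plain"
tar -xzf "$demo/kafka.tgz" -C "$demo/stripped" --strip-components=1

ls "$demo/plain"      # prints: kafka_2.13-3.5.0
ls "$demo/stripped"   # prints: bin

# Remove the scratch directory when you are done: rm -rf "$demo"
```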

Step 3 – Configure the Kafka Server

1) A Kafka topic is the category, group, or feed name to which messages can be published. Whether topics can be deleted is controlled by the delete.topic.enable setting. Recent Kafka releases enable it by default, but older ones did not, so this step sets it explicitly in the configuration file.

2) Open the server.properties file with the help of a text editor:

nano ~/kafka/config/server.properties

3) Add the following line to the bottom of the file to allow Kafka topics to be deleted:

delete.topic.enable = true

4) Save and exit the text editor.
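
If you prefer to script this change instead of editing the file by hand, something like the following could be used. It is a sketch under a few assumptions: the helper name append_property_if_missing is made up for this example, it only recognises the key=value form of a properties line, and it is demonstrated on a scratch file rather than on the real server.properties.

```shell
# Hypothetical helper: append KEY=VALUE to a properties file unless an
# uncommented KEY=... line is already present.
append_property_if_missing() {
  file=$1
  key=$2
  value=$3
  # Compare the text before the first "=" literally, so commented-out
  # lines such as "#key=..." do not count as a match.
  if awk -F= -v k="$key" '
       { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name); if (name == k) found = 1 }
       END { exit !found }' "$file"; then
    echo "$key is already set in $file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Demonstration on a scratch file.
props=$(mktemp)
printf '%s\n' 'broker.id=0' '#delete.topic.enable=false' > "$props"
append_property_if_missing "$props" delete.topic.enable true
append_property_if_missing "$props" delete.topic.enable true   # second call changes nothing
cat "$props"
```

For the real file you would call the helper with ~/kafka/config/server.properties as the first argument.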

Step 4 – Create a Systemd Unit File and Start the Kafka Server

Systemd files help us in performing common service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services.

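Recent versions of Kafka can run in KRaft mode, which does not use ZooKeeper to manage cluster state and configuration. Note, however, that in the 3.5.0 archive config/server.properties is the ZooKeeper-mode configuration (it sets zookeeper.connect=localhost:2181), so a broker started with that file expects a ZooKeeper instance at that address. The KRaft-mode configuration ships as config/kraft/server.properties; to use it, reference that file in the unit definition instead and format its storage directory once with bin/kafka-storage.sh before the first start.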
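
For readers who want to run the broker without ZooKeeper, the storage directory has to be formatted once before the first start. The following is a sketch, assuming Kafka 3.5.0 was extracted to ~/kafka as in Step 2 and that the sample KRaft configuration config/kraft/server.properties is used. KAFKA_HOME and KRAFT_CONFIG are just variable names chosen for this example. Run it as the kafka user; the guard simply skips the formatting when the Kafka scripts are not found.

```shell
# Assumption: Kafka lives in ~/kafka (Step 2) unless KAFKA_HOME says otherwise.
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka}"
KRAFT_CONFIG="$KAFKA_HOME/config/kraft/server.properties"

if [ -x "$KAFKA_HOME/bin/kafka-storage.sh" ] && [ -f "$KRAFT_CONFIG" ]; then
  # Generate a cluster ID, then format the log directories named in the KRaft config.
  cluster_id=$("$KAFKA_HOME/bin/kafka-storage.sh" random-uuid)
  "$KAFKA_HOME/bin/kafka-storage.sh" format -t "$cluster_id" -c "$KRAFT_CONFIG"
else
  echo "Kafka not found under $KAFKA_HOME; nothing was formatted."
fi
```

With this approach, the ExecStart line of the unit file would reference config/kraft/server.properties, and the delete.topic.enable setting from Step 3 would go into that file as well.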

1) Create the systemd service file for Kafka:

sudo nano /etc/systemd/system/kafka.service

2) Next, you have to enter the below-mentioned unit definition into the kafka.service file:

[Unit]

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

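This unit definition tells systemd to use the kafka-server-start.sh and kafka-server-stop.sh shell scripts to start and stop the service, and to run it as the kafka user. The broker's output is redirected to /home/kafka/kafka/kafka.log. The [Service] section also specifies that Kafka should be restarted if it exits abnormally.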

3) Now, start Kafka with the following command:

sudo systemctl start kafka

4) Next, check the status of the kafka unit to ensure that the server has started successfully:

sudo systemctl status kafka

5) You will get output similar to the following:

Output

● kafka.service
     Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-07-06 11:50:15 UTC; 5min ago
   Main PID: 23706 (sh)
      Tasks: 71 (limit: 4618)
     Memory: 328.0M
        CPU: 9.034s
     CGroup: /system.slice/kafka.service
             ├─23706 /bin/sh -c "/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1"
             └─23708 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.hea>

Jul 06 11:50:15 ip-172-31-90-32 systemd[1]: Started kafka.service.

The Kafka server is now listening on port 9092.

6) The kafka service won't start automatically when the server reboots. To enable it at boot, run the following command:

sudo systemctl enable kafka
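
To double-check that something is actually listening on port 9092, you could inspect the listening sockets. The sketch below assumes the ss utility is available, as it normally is on Ubuntu 22.04; the helper name has_listener is made up for this example. It only shows that the port is open, not that the process behind it is Kafka.

```shell
# Hypothetical helper: read "ss -ltn" style output on stdin and succeed
# if the local address column of any line ends in :PORT.
has_listener() {
  awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

if command -v ss >/dev/null 2>&1; then
  if ss -ltn | has_listener 9092; then
    echo "Something is listening on port 9092."
  else
    echo "Nothing is listening on port 9092 yet."
  fi
else
  echo "ss is not installed; skipping the port check."
fi
```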

Step 5 – Test the Installation

You can begin by publishing the message “Hello, World” to ensure that the Kafka service is behaving correctly. This requires:

A producer, which publishes records and data to topics.
A consumer, which reads messages and data from topics.

1) Now, for that, you need to create a topic TutorialTopic:

~/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic TutorialTopic

With the help of the kafka-console-producer.sh script, you'll be able to create a producer. It expects the Kafka server’s hostname and port, along with a topic name, as arguments.

2) Then, publish the string "Hello, World" to the TutorialTopic topic:

echo "Hello, World" | ~/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic TutorialTopic > /dev/null

3) After that, create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka server’s hostname and port, along with a topic name, as arguments.

The following command consumes messages from TutorialTopic.

~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning

4) You will get Hello, World as an output.

Output

Hello, World

The script will carry on running, waiting for more messages to be published. You may open a new terminal and start a producer to publish a few more messages. All of them will appear in the consumer's output.

Then, with the help of CTRL+C, stop the consumer script.
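
The manual test above can also be wrapped in a small script that publishes one message and checks that it comes back. This is a sketch under some assumptions: Kafka is installed in ~/kafka, the broker listens on localhost:9092, and the variable names are chosen for this example. The consumer's --timeout-ms option makes it exit by itself once no new message has arrived for the given interval, so no CTRL+C is needed.

```shell
# Assumptions: install location, broker address, and topic name from this tutorial.
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka}"
BOOTSTRAP="localhost:9092"
TOPIC="TutorialTopic"

if [ -x "$KAFKA_HOME/bin/kafka-console-producer.sh" ]; then
  # Use a message that is unlikely to be in the topic already.
  message="smoke-test-$(date +%s)"
  echo "$message" | "$KAFKA_HOME/bin/kafka-console-producer.sh" \
    --bootstrap-server "$BOOTSTRAP" --topic "$TOPIC" > /dev/null

  # Read the topic from the start and look for the message just published.
  if "$KAFKA_HOME/bin/kafka-console-consumer.sh" \
       --bootstrap-server "$BOOTSTRAP" --topic "$TOPIC" \
       --from-beginning --timeout-ms 10000 2>/dev/null | grep -qx "$message"; then
    echo "Round trip OK: $message"
  else
    echo "Message was not received within the timeout."
  fi
else
  echo "Kafka scripts not found under $KAFKA_HOME; skipping the test."
fi
```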

Step 6 – Set Up a Multi-Node Cluster (Optional)

To create a multi-broker cluster using more Ubuntu 22.04 machines, repeat Steps 1 through 4 on each of the new machines. Also, make the following change to the server.properties file on each machine.

The value of the broker.id property needs to be changed so that it is unique throughout the cluster. This property identifies each server in the cluster and must be an integer, for instance 0, 1, 2, and so on.
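
Since broker.id is an integer setting in Kafka, a small helper can guard against typos when you change it on several machines. The following is a sketch only: the function name set_broker_id is made up for this example, and it is demonstrated on a scratch file rather than on the real server.properties.

```shell
# Hypothetical helper: set broker.id in a properties file, rejecting non-integers.
set_broker_id() {
  file=$1
  id=$2
  case $id in
    ''|*[!0-9]*)
      echo "broker.id must be a non-negative integer, got: '$id'" >&2
      return 1
      ;;
  esac
  tmp=$(mktemp)
  # Replace an existing broker.id line, keep all other lines, append if absent.
  awk -v id="$id" '
    /^[ \t]*broker\.id[ \t]*=/ { print "broker.id=" id; seen = 1; next }
    { print }
    END { if (!seen) print "broker.id=" id }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demonstration on a scratch file.
props=$(mktemp)
printf '%s\n' 'broker.id=0' 'num.partitions=1' > "$props"
set_broker_id "$props" 2
cat "$props"
```

On each additional machine you would call the helper with that machine's ~/kafka/config/server.properties and its own number.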

Step 7 – Restrict the Kafka User

1) You can now proceed to remove all admin privileges from the Kafka user. Make sure to log out and then log in as any other non-root sudo user before you begin. In case you're running the same shell session, simply use exit.

Then, remove the kafka user from the sudo group:

sudo deluser kafka sudo

2) You can lock the kafka user's password using the passwd command, which ensures that nobody can log in to the server directly with this account.

sudo passwd kafka -l

3) Only a root or a sudo user can log in as kafka at this point, with the following command:

sudo su - kafka

If you need to unlock the password again later, use passwd with the -u option:

sudo passwd kafka -u

FAQs to Install Apache Kafka on Ubuntu 22.04

What is Apache Kafka, and what is it used for?

Apache Kafka is an open-source distributed streaming platform that is used to publish and subscribe to streams of records in real-time, effectively handling large amounts of data.

What are the system requirements for installing Apache Kafka on Ubuntu 22.04?

To install Apache Kafka on Ubuntu 22.04 as described in this tutorial, you will need a 64-bit operating system with at least 4 GB of RAM, a minimum of 2 CPU cores, and OpenJDK 17 installed.

What is the default directory for installing Apache Kafka on Ubuntu 22.04?

Kafka has no fixed installation directory; it runs from wherever the downloaded archive is extracted. In this tutorial, that directory is /home/kafka/kafka.

What is the configuration file for Apache Kafka on Ubuntu 22.04?

The broker configuration file is config/server.properties inside the installation directory. In this tutorial, that is /home/kafka/kafka/config/server.properties.

How do I start the Apache Kafka server on Ubuntu 22.04?

To start the Apache Kafka server on Ubuntu 22.04, use the command: sudo systemctl start kafka.

How do I check if Apache Kafka is running on Ubuntu 22.04?

To check if Apache Kafka is running on Ubuntu 22.04, use the command: sudo systemctl status kafka.

How do I stop the Apache Kafka server on Ubuntu 22.04?

To stop the Apache Kafka server on Ubuntu 22.04, use the command: sudo systemctl stop kafka.

How do I uninstall Apache Kafka from Ubuntu 22.04?

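In this tutorial Kafka is installed from a downloaded archive, not from an apt package, so apt-get cannot remove it. To uninstall it, stop and disable the service with sudo systemctl stop kafka and sudo systemctl disable kafka, delete the unit file /etc/systemd/system/kafka.service, and remove the /home/kafka/kafka directory.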

Conclusion

We hope this detailed tutorial helped you understand how to install Apache Kafka on Ubuntu 22.04 server. To learn more about Apache Kafka installation on Ubuntu 22.04 server, check out the official installation document.

If you have any queries, please leave a comment below, and we’ll be happy to respond to them.