
How to Install Apache Kafka on Ubuntu 20.04

Install Apache Kafka on Ubuntu 20.04 in just a few steps with this tutorial. Kafka is an open-source distributed streaming platform that is well suited to handling large amounts of real-time data.

Introduction

Before we begin talking about how to install Apache Kafka on Ubuntu 20.04, let’s briefly understand – What is Apache Kafka?

Apache Kafka is a well-known message broker capable of handling large volumes of real-time data. Compared with ActiveMQ and RabbitMQ, a Kafka cluster offers higher throughput along with built-in scalability and fault tolerance. It is generally used as a publish/subscribe messaging system, though some organizations also use it for log aggregation because it offers persistent storage for published messages.

In this tutorial, you will install Apache Kafka on Ubuntu 20.04.

Prerequisites

  • An Ubuntu 20.04 server and a non-root user with sudo privileges.
  • At least 4 GB of RAM on the server; with less, the Kafka service may fail to start, with the Java Virtual Machine (JVM) throwing an “Out Of Memory” exception during startup.
  • OpenJDK 17 installed on the server, since Kafka runs on the JVM (see the install command after this list).
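
If Java is not yet installed, you can typically get it from Ubuntu's repositories. A minimal sketch, assuming the openjdk-17-jre-headless package is available to your Ubuntu 20.04 installation via focal-updates:

sudo apt update
sudo apt install openjdk-17-jre-headless
java -version

The last command should print the installed Java version, confirming that a JVM is available.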

Step 1 – Creating a user for Kafka

1) Since Kafka can handle requests over a network, it is recommended to create a dedicated user for it. This limits the damage to the rest of the system if the Kafka server is compromised. You will create a dedicated kafka user in this step; once the Kafka setup is done, you should create a different non-root user to perform other tasks on the server.

With the useradd command, create a user called kafka:

sudo useradd kafka -m

2) The -m flag ensures that a home directory is created for the user. This home directory, /home/kafka, will act as your workspace for executing the commands that follow.

3) Then, use the passwd command to set a password:

sudo passwd kafka

4) After that, add the kafka user to the sudo group using the adduser command, so that it has the privileges required to install Kafka's dependencies.

sudo adduser kafka sudo

5) Next, log in to the kafka account using su:

su -l kafka

Step 2 – Downloading and Extracting the Kafka Binaries

1) First, create a directory in /home/kafka called Downloads to store the downloads:

mkdir ~/Downloads

2) Download the Kafka binaries using the wget command:

wget https://downloads.apache.org/kafka/3.4.0/kafka_2.12-3.4.0.tgz -O ~/Downloads/kafka.tgz
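
Optionally, verify the integrity of the download before extracting it. A minimal sketch, assuming Apache publishes a matching .sha512 checksum file alongside the release (compare the two printed hashes manually):

wget https://downloads.apache.org/kafka/3.4.0/kafka_2.12-3.4.0.tgz.sha512 -O ~/Downloads/kafka.tgz.sha512
sha512sum ~/Downloads/kafka.tgz
cat ~/Downloads/kafka.tgz.sha512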

3) Next, create a directory called kafka, which will be the base directory of the Kafka installation, and change into it:

mkdir ~/kafka && cd ~/kafka

4) Now, extract the downloaded archive using the tar command:

tar -xvzf ~/Downloads/kafka.tgz --strip 1

5) The --strip 1 flag ensures the archive's contents are extracted into ~/kafka/ itself, and not into a nested directory such as ~/kafka/kafka_2.12-3.4.0/.
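
To confirm the extraction worked, list the base directory; you should see Kafka's bin, config, and libs directories, among others:

ls ~/kafka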

Step 3 – Configuring the Kafka Server

1) Kafka's default settings don't allow you to delete a topic. A Kafka topic is the category, group, or feed name to which messages can be published. To change this, edit the configuration file.

2) Open the server.properties file with the help of a text editor:

nano ~/kafka/config/server.properties

3) Add the following line to the bottom of the file to allow deleting Kafka topics:

delete.topic.enable = true

4) Save and exit the text editor.
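
Once the server is running and a topic exists (you will create TutorialTopic in Step 5), this setting lets you delete it. A sketch, assuming a topic named TutorialTopic and a broker listening on localhost:9092:

~/kafka/bin/kafka-topics.sh --delete --bootstrap-server localhost:9092 --topic TutorialTopic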

Step 4 – Creating Systemd Unit Files and Starting the Kafka Server

Systemd unit files help in performing common service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services.

Kafka uses ZooKeeper to manage its cluster state and configurations. ZooKeeper is an integral component of many distributed systems.

1) Now, create a unit file for zookeeper:

sudo nano /etc/systemd/system/zookeeper.service

2) After that, enter the following unit definition into the file:

[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

3) The [Unit] section specifies that ZooKeeper requires networking and the file system to be ready before it can start.

The [Service] section specifies that systemd should use zookeeper-server-start.sh and zookeeper-server-stop.sh shell files for initiating and halting the service. Also, it specifies that Zookeeper should be restarted automatically if it exits abnormally.
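
If you want to verify the ZooKeeper unit on its own before wiring up Kafka, you can start it and inspect its status (the kafka unit you create next will also pull it in automatically via Requires=):

sudo systemctl start zookeeper
sudo systemctl status zookeeper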

4) You then need to create the systemd service file for kafka:

sudo nano /etc/systemd/system/kafka.service

5) Next, enter the following unit definition into the kafka.service file:

[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

6) The [Unit] section specifies that this unit file depends on zookeeper.service, which ensures that ZooKeeper gets started automatically when the kafka service starts.

The [Service] section specifies that systemd should use the kafka-server-start.sh and kafka-server-stop.sh shell files for starting and stopping the service, and that Kafka should be restarted if it exits abnormally.
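
If your server has less than the 4 GB of RAM recommended in the prerequisites, you can cap the JVM heap so the broker does not exhaust memory at startup. A minimal sketch, assuming a 512 MB heap suits your workload: add this line to the [Service] section of kafka.service (kafka-server-start.sh passes KAFKA_HEAP_OPTS through to the JVM), then run sudo systemctl daemon-reload:

Environment="KAFKA_HEAP_OPTS=-Xmx512M -Xms512M"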

7) Finally, start Kafka with the following command: 

sudo systemctl start kafka

8) Next, check the journal logs for the kafka unit to ensure that the server has started successfully.

sudo journalctl -u kafka

9) You will see output similar to the following:

Output

Jul 17 18:38:59 kafka-ubuntu systemd[1]: Started kafka.service.

The Kafka server is now listening on port 9092.
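
You can also confirm that the broker is accepting connections on that port, for example with the ss utility shipped with Ubuntu 20.04:

sudo ss -tlnp | grep 9092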

10) The kafka service won't start automatically if you reboot the server. To enable it on boot, run:

sudo systemctl enable kafka

Step 5 – Testing the Installation

You can begin by publishing the message “Hello, World” to make sure the Kafka server is behaving correctly. Publishing messages in Kafka requires:

  • A producer, which enables the publication of records and data to topics.
  • A consumer, which reads messages and data from topics.

1) Now, for that, you need to create a topic named TutorialTopic. Note that in Kafka 3.x the kafka-topics.sh script no longer accepts a --zookeeper flag, so pass the broker address with --bootstrap-server instead:

~/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic TutorialTopic

You can create a producer using the kafka-console-producer.sh script. It expects the Kafka server's hostname, a port, and a topic name as arguments.

2) Then, publish the string "Hello, World" to the TutorialTopic topic:

echo "Hello, World" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null

3) After that, create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka server's hostname and port, along with a topic name, as arguments.

The following command consumes messages from TutorialTopic. The --from-beginning flag makes the consumer read messages that were published before it started:

~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning

4) You will see Hello, World as the output:

Output

Hello, World

The script will keep running, waiting for more messages to be published. You can open a new terminal and start a producer to publish a few more messages; all of them will appear in the consumer's output.
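
For example, in a second terminal logged in as the kafka user, you could publish another message while the consumer keeps running:

echo "Hello again" | ~/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic TutorialTopic > /dev/null

The string Hello again should appear immediately in the consumer's terminal.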

When you are done, stop the consumer script by pressing CTRL+C.

Step 6 – Install KafkaT (Optional)

1) KafkaT is a tool from Airbnb that makes it easy to view details about your Kafka cluster and perform certain administrative tasks from the command line. Because it is a Ruby gem, you will need Ruby to use it. You will also need the build-essential package to build the other gems it depends on. Install them using apt:

sudo apt install ruby ruby-dev build-essential

2) Then, install KafkaT using the gem command:

sudo gem install kafkat

KafkaT uses .kafkatcfg as its configuration file to determine the installation and log directories of your Kafka server, along with an entry pointing KafkaT to your ZooKeeper instance.

3) Create a new file .kafkatcfg:

nano ~/.kafkatcfg

4) Next, add the following lines to specify the required information about your Kafka server and ZooKeeper instance:

{
  "kafka_path": "~/kafka",
  "log_path": "/tmp/kafka-logs",
  "zk_path": "localhost:2181"
}

5) Use the following command to view details about all Kafka partitions:

kafkat partitions
Output

Topic                 Partition   Leader   Replicas   ISRs
TutorialTopic         0           0        [0]        [0]
__consumer_offsets    0           0        [0]        [0]
...

You will see TutorialTopic, as well as __consumer_offsets, an internal topic Kafka uses for storing client-related information. You can safely ignore lines starting with __consumer_offsets.

Step 7 – Setting Up a Multi-Node Cluster (Optional)

If you want to create a multi-broker cluster using more Ubuntu 20.04 machines, repeat Steps 1, 4, and 5 on each of the new machines, and make the following changes in the server.properties file of each (an example follows this list):

  • The value of the broker.id property should be changed such that it is unique throughout the cluster. This property is an integer that uniquely identifies each broker, for instance 0, 1, 2, and so on.
  • The value of the zookeeper.connect property should be changed such that every node points to the same ZooKeeper instance. This property specifies the ZooKeeper instance's address and follows the <HOSTNAME/IP_ADDRESS>:<PORT> format, for instance 203.0.113.0:2181.

If you want to have multiple ZooKeeper instances for your cluster, the value of the zookeeper.connect property on each node should be an identical, comma-separated string listing the IP addresses and port numbers of all the ZooKeeper instances.
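
As an illustration, the overrides in server.properties on a hypothetical second broker of a three-node setup (the IP addresses below are examples) might look like this:

broker.id=1
zookeeper.connect=203.0.113.0:2181,203.0.113.1:2181,203.0.113.2:2181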

Step 8 – Restricting the Kafka User

1) You can now remove the kafka user's admin privileges. Before you begin, make sure to log out and log back in as any other non-root sudo user. If you are still running the same shell session you started this tutorial with, simply type exit.

Then, remove the kafka user from the sudo group:

sudo deluser kafka sudo

2) To further tighten security, lock the kafka user's password using the passwd command, which ensures that nobody can log in to the server directly with this account:

sudo passwd kafka -l
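
You can confirm that the account is locked by checking its status with passwd -S; an L in the second field of the output indicates a locked password:

sudo passwd -S kafka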

3) At this point, only root or a sudo user can log in as kafka, using the following command:

sudo su - kafka

Should you need it later, you can unlock the password using passwd with the -u option:

sudo passwd kafka -u

FAQs on Installing Apache Kafka on Ubuntu 20.04

What is Apache Kafka, and what is it used for?

Apache Kafka is an open-source distributed streaming platform that is used to publish and subscribe to streams of records in real-time, effectively handling large amounts of data.

What are the system requirements for installing Apache Kafka on Ubuntu 20.04?

To install Apache Kafka on Ubuntu 20.04, you will need a 64-bit operating system with at least 4 GB of RAM (as noted in the prerequisites) and a minimum of 2 CPU cores.

What is the default directory for installing Apache Kafka on Ubuntu 20.04?

There is no single default directory; this tutorial installs Kafka in /home/kafka/kafka.

What is the configuration file for Apache Kafka on Ubuntu 20.04?

In this tutorial, the Apache Kafka configuration file is located at /home/kafka/kafka/config/server.properties.

How do I start the Apache Kafka server on Ubuntu 20.04?

To start the Apache Kafka server on Ubuntu 20.04, use the command: sudo systemctl start kafka.

How do I check if Apache Kafka is running on Ubuntu 20.04?

To check if Apache Kafka is running on Ubuntu 20.04, use the command: sudo systemctl status kafka.

How do I stop the Apache Kafka server on Ubuntu 20.04?

To stop the Apache Kafka server on Ubuntu 20.04, use the command: sudo systemctl stop kafka.

How do I uninstall Apache Kafka from Ubuntu 20.04?

Since this tutorial installs Kafka from a binary archive rather than via apt, uninstall it by stopping and disabling the kafka and zookeeper services, removing their unit files from /etc/systemd/system/, and deleting the /home/kafka/kafka directory.

Conclusion

We hope this detailed guide helped you understand how to install Apache Kafka on an Ubuntu 20.04 server. To learn more, check out the official Apache Kafka documentation.

If you have any queries, please leave a comment below and we'll be happy to respond to them.
