How To Install Apache Kafka on Debian 10

Introduction

Before we begin talking about how to install Apache Kafka on Debian 10, let's briefly understand – What is Apache Kafka?

Apache Kafka is a distributed streaming platform that enables real-time processing of high volumes of data streams. Installing Apache Kafka on Debian 10 allows you to set up a scalable and fault-tolerant messaging system.

A publish/subscribe messaging system allows one or more producers to publish messages without regard to the number of consumers or how those consumers will process the messages.

The goal of this tutorial is to teach you how to securely install and set up Apache Kafka 3.4.1 on a Debian 10 server.

Advantages of Installing Apache Kafka on Debian 10

  1. Scalability: Apache Kafka allows you to scale your data streams by distributing them across multiple broker nodes in a cluster, ensuring high throughput and low latency.
  2. Fault-Tolerance: Kafka's distributed nature and support for replication enable fault-tolerant data processing and resilience against hardware failures.
  3. Real-time Stream Processing: Apache Kafka provides a platform for building real-time stream processing systems, allowing you to react to events as they happen and process data in near real-time.
  4. Integration with Big Data Ecosystem: Kafka integrates well with other components of the big data ecosystem, such as Apache Spark, Apache Storm, and Hadoop, enabling seamless data ingestion and processing.
  5. Reliable Messaging System: Kafka offers persistent storage and durable message queues, ensuring reliable and guaranteed delivery of messages in the order they were sent.

Prerequisites

To follow along, you will need:

  • One Debian 10 server with at least 4 GB of RAM and a non-root user with sudo privileges.
  • A Java runtime environment installed on the server, since Kafka is written in Java (on Debian 10, OpenJDK 11 from the default repositories works well; see the FAQ below).

ℹ️
Note: Installations with less than 4 GB of RAM may cause the Kafka service to fail, with the Java Virtual Machine (JVM) throwing an Out Of Memory exception during startup.

Step 1 — Making a User for Kafka

Because Kafka can handle requests over a network, it is recommended to create a dedicated user for it. This limits the damage to your Debian machine in the event that the Kafka server is compromised. In this step, you will create the dedicated user kafka.

Create a user named kafka using the useradd command while logged in as your non-root sudo user:

sudo useradd kafka -m

The -m flag ensures that a home directory is created for the user. This home directory, /home/kafka, will serve as your workspace for executing commands later in this tutorial.

Using passwd, set the password:

sudo passwd kafka

Enter the password you want to use for this user.

To give the kafka user the permissions necessary to install Kafka's dependencies, use the adduser command to add the user to the sudo group:

sudo adduser kafka sudo
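
If you'd like to verify that kafka was added to the sudo group, you can list its groups; this is an optional check:

groups kafka

The output should now include sudo.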

Your kafka user is now ready. Log in to the account using su:

su -l kafka

Now that you have created a user specifically for Kafka, you can download and extract the Kafka binaries.

Step 2 — Download and Extract the Kafka Binaries

In this step, you will download the Kafka binaries and extract them into dedicated directories in your kafka user's home directory.

To get started, create a directory called Downloads in /home/kafka to store your downloads:

mkdir ~/Downloads

Then, in order to download files from remote locations, install curl using apt-get:

sudo apt-get update && sudo apt-get install curl

If prompted, type Y to confirm the curl download.

Once curl is installed, use it to download the Kafka binaries:

curl "https://downloads.apache.org/kafka/3.4.1/kafka_2.13-3.4.1.tgz" -o ~/Downloads/kafka.tgz

Make a directory called kafka and switch to it. This will be the Kafka installation's root directory:

mkdir ~/kafka && cd ~/kafka

Use the tar command to extract the archive you downloaded:

tar -xvzf ~/Downloads/kafka.tgz --strip 1

You specified the --strip 1 flag so that the archive's contents are extracted directly into ~/kafka/ rather than into a subdirectory such as ~/kafka/kafka_2.13-3.4.1/.
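
Optionally, you can list the contents of ~/kafka to confirm that the extraction worked as expected; you should see directories such as bin, config, and libs:

ls ~/kafka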

Now that you have downloaded and extracted the binaries successfully, you can move on to configuring Kafka so that topics can be deleted.

Step 3 — Configuring the Kafka Server

By default, Kafka will not allow you to delete a topic, that is, the category, group, or feed name to which messages can be published. To change this, you will edit the configuration file.

Kafka's configuration options are specified in server.properties. Open this file with nano or your favorite editor:

nano ~/kafka/config/server.properties

Let's add a setting that allows us to delete Kafka topics. Add the following line to the bottom of the file:

...
group.initial.rebalance.delay.ms

delete.topic.enable = true

Save and close the file. Now that you have configured Kafka, you can create systemd unit files for running and enabling Kafka on server startup.
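
Before moving on, you can optionally confirm that the new setting is in place by searching the file:

grep delete.topic.enable ~/kafka/config/server.properties

The command should print the delete.topic.enable = true line you just added.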

Step 4 — Generating Systemd Unit Files and Starting the Kafka Server

In this step, you will create systemd unit files for the Kafka service. These will help you perform common service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services.

Kafka uses ZooKeeper to manage its cluster state and configuration. ZooKeeper is used in many distributed systems as an integral component. In this tutorial, you will use it to manage these aspects of Kafka. If you want to know more about it, visit the official ZooKeeper docs.

First, create the unit file for zookeeper:

sudo nano /etc/systemd/system/zookeeper.service

Fill out the file with the following unit definition:

[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

The [Unit] section specifies that ZooKeeper requires networking and the filesystem to be ready before it can start.

The [Service] section specifies that systemd should use the zookeeper-server-start.sh and zookeeper-server-stop.sh shell scripts to start and stop the service. It also states that ZooKeeper should be restarted automatically if it exits abnormally.
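
At this point you can optionally start and inspect the ZooKeeper service on its own before defining the Kafka unit; this is an extra sanity check rather than a required step:

sudo systemctl start zookeeper
sudo systemctl status zookeeper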

Next, create the systemd service file for kafka:

sudo nano /etc/systemd/system/kafka.service

In the file, enter the following unit definition:

[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

The [Unit] section specifies that this unit file depends on zookeeper.service. This ensures that zookeeper is started automatically when the kafka service starts.

The [Service] section specifies that systemd should use the kafka-server-start.sh and kafka-server-stop.sh shell scripts to start and stop the service. It also states that Kafka should be restarted automatically if it exits abnormally.

Now that you have defined the units, start Kafka with the following command:

sudo systemctl start kafka

To make sure that the server has started successfully, check the journal logs for the kafka unit:

sudo journalctl -u kafka

You will see output similar to this:

Output
Mar 23 13:31:48 kafka systemd[1]: Started kafka.service.

You now have a Kafka server listening on port 9092, the default port for Kafka.
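
The kafka service will not start automatically after a reboot unless you enable it. Since the unit file's [Install] section targets multi-user.target, you can enable it with a standard systemctl command; the Requires=zookeeper.service directive ensures ZooKeeper is started along with it:

sudo systemctl enable kafka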

Step 5 — Testing the Installation

To make sure the Kafka server is working correctly, let's publish and consume a "Hello World" message. Publishing messages in Kafka requires:

  • a producer, which enables the publication of records and data to topics.
  • a consumer, which reads messages and data from topics.

First, create a topic named TutorialTopic by typing:

~/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic TutorialTopic
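
To confirm that the topic was created, you can optionally list all topics known to the broker:

~/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092

TutorialTopic should appear in the list.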

You can create a producer from the command line using the kafka-console-producer.sh script. It expects the Kafka server's hostname, port, and a topic name as arguments.

Type the following to publish the string Hello, World to the TutorialTopic topic:

echo "Hello, World" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null

The message will be sent to a list of message brokers, in this case localhost:9092, depending on the --broker-list flag. the topic is designated as TutorialTopic by the --topic flag.

The next step is to create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka server's hostname and port, along with a topic name, as arguments.

The following command consumes messages from TutorialTopic. Note the use of the --from-beginning flag, which allows the consumption of messages that were published before the consumer was started:

~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning

The --bootstrap-server flag provides a list of entry points into the Kafka cluster. In this case, you are using localhost:9092.

You will see the message Hello, World in your terminal:

Output
Hello, World

The script will continue to run, waiting for more messages to be published to the topic. Feel free to open a new terminal and start a producer to publish a few more messages; an example follows below. You should be able to see them all in the consumer's output. See the official Kafka documentation for further information on using the software.
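
For example, from a second terminal you could publish one more message with the same producer script used earlier (the message text here is arbitrary):

echo "Hello again, World" | ~/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic TutorialTopic > /dev/null

The new message should appear in the consumer's output almost immediately.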

When you are done testing, press CTRL+C to stop the consumer script. Now that you have tested the installation, you can move on to installing KafkaT to better administer your Kafka cluster.

Step 6 — Installing KafkaT (Optional)

KafkaT is a tool from Airbnb that makes it easier to view details about your Kafka cluster and perform certain administrative tasks from the command line. Because it is a Ruby gem, you will need Ruby to use it. You will also need the build-essential package in order to build the other gems it depends on. Install them using apt:

sudo apt install ruby ruby-dev build-essential

Using the gem command, you can now install KafkaT:

sudo CFLAGS=-Wno-error=format-overflow gem install kafkat

The CFLAGS=-Wno-error=format-overflow option disables format overflow warnings and is required for the ZooKeeper gem, which is a dependency of KafkaT.

KafkaT uses .kafkatcfg as its configuration file to determine the installation and log directories of your Kafka server. It should also have an entry pointing KafkaT to your ZooKeeper instance.

Create a new file called .kafkatcfg:

nano ~/.kafkatcfg

Add the following lines to specify the required information about your Kafka server and ZooKeeper instance:

{
  "kafka_path": "~/kafka",
  "log_path": "/tmp/kafka-logs",
  "zk_path": "localhost:2181"
}

You can now start using KafkaT. Here's an example of how you could use it to view information on all Kafka partitions:

kafkat partitions

The following output will appear:

Output
Topic                 Partition   Leader      Replicas        ISRs
TutorialTopic         0           0           [0]             [0]
__consumer_offsets    0           0           [0]             [0]
...

This output displays TutorialTopic as well as the internal Kafka topic __consumer_offsets, which is used to store client-related data. You may safely disregard lines beginning with __consumer_offsets.
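
KafkaT offers several other subcommands besides partitions. For instance, listing the brokers registered in ZooKeeper is typically done as follows; note that subcommand names can vary between KafkaT versions, so treat this as an illustration and consult the KafkaT documentation for the exact set supported by your install:

kafkat brokers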

Visit the KafkaT GitHub repository to find out more information.

Now that you have installed KafkaT, you can optionally set Kafka up on a cluster of Debian 10 machines to make a multi-node cluster.

Step 7 — Setting Up a Multi-Node Cluster (Optional)

If you want to create a multi-broker cluster using more Debian 10 servers, repeat Steps 1, 3, and 5 on each of the new machines. Additionally, make the following changes in the ~/kafka/config/server.properties file for each:

  • Change the value of the broker.id property so that it is unique throughout the cluster. This property is an integer that uniquely identifies each broker in the cluster; for example, 0, 1, 2, and so on (see the example snippet after this list).
  • Change the value of the zookeeper.connect property so that every node points to the same ZooKeeper instance. This property specifies the ZooKeeper instance's address in the format <HOSTNAME/IP_ADDRESS>:<PORT>. For this tutorial, you would use your_first_server_IP:2181, replacing your_first_server_IP with the IP address of the Debian 10 server you already set up.
  • If you want to have multiple ZooKeeper instances for your cluster, the value of the zookeeper.connect property on each node should be an identical, comma-separated string listing the IP addresses and port numbers of all the ZooKeeper instances.
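
For example, on a hypothetical second broker the relevant lines in ~/kafka/config/server.properties might look like this (placeholder values; substitute your first server's IP address):

broker.id=1
zookeeper.connect=your_first_server_IP:2181
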
ℹ️
Note: If you have a firewall running, be sure to open port 2181 on the Debian 10 server that has ZooKeeper installed so that incoming requests from other nodes in the cluster can get through.

Step 8 — Restricting the Kafka User

Now that all of the installations are complete, you can remove the kafka user's admin privileges. Before you do so, log out and log back in as any other non-root sudo user. If you are still running the same shell session that you started this tutorial with, simply type exit.

Remove the kafka user from the sudo group:

sudo deluser kafka sudo

To further improve your Kafka server's security, lock the kafka user's password using the passwd command. This makes sure that nobody can log into the server directly using this account:

sudo passwd kafka -l
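
You can optionally verify the lock by checking the account's password status; the second field of the output should show L for a locked password:

sudo passwd -S kafka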

At the moment, only root or a sudo user can log in as kafka by typing the following command:

sudo su - kafka

In the future, use passwd with the -u option to unlock it:

sudo passwd kafka -u

The administrator privileges of the kafka user have now been successfully restricted.

FAQs on Installing Apache Kafka on Debian 10

Is Apache Kafka compatible with Debian 10?

Yes, Apache Kafka is compatible with Debian 10.

What are the system requirements to install Apache Kafka on Debian 10?

To install Apache Kafka on Debian 10, you need a 64-bit system with a Java runtime environment installed and at least 4 GB of RAM.

Why is ZooKeeper required for Apache Kafka?

ZooKeeper is a centralized coordination service that Kafka uses for managing and maintaining its metadata and state information.

How can I install Java 8 on Debian 10?

Debian 10's default repositories ship OpenJDK 11 rather than Java 8; you can install it with sudo apt update && sudo apt install openjdk-11-jre. If you specifically need Java 8, you will have to use a third-party distribution such as Eclipse Temurin (formerly AdoptOpenJDK).

What are the steps to install Apache Kafka on Debian 10?

The installation process involves downloading and extracting Apache Kafka, configuring the ZooKeeper connection, and starting the Kafka server.

How can I start the ZooKeeper server for Apache Kafka on Debian 10?

You can start the ZooKeeper server by running the appropriate script included in the Apache Kafka distribution.
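
For example, using the directory layout from this tutorial (paths assume the kafka user's home directory from Step 2), you could run:

~/kafka/bin/zookeeper-server-start.sh ~/kafka/config/zookeeper.properties

If you followed Step 4, however, ZooKeeper is managed by systemd, so running sudo systemctl start zookeeper is the more convenient option.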

Do I need to configure any environment variables for Apache Kafka on Debian 10?

No environment variables are strictly required. However, you may want to modify Kafka's configuration files to suit your specific requirements.

Conclusion

By installing Apache Kafka, you gain the advantages of scalability, fault tolerance, real-time stream processing, integration with the big data ecosystem, and reliable messaging.

This tutorial walked you through installing and configuring Apache Kafka on a Debian 10 server and covered frequently asked questions about the installation process.