How To Install Apache Kafka on Debian 10
Introduction
Before we begin talking about how to install Apache Kafka on Debian 10, let's briefly understand – What is Apache Kafka?
Apache Kafka is a distributed streaming platform that enables real-time processing of high volumes of data streams. Installing Apache Kafka on Debian 10 allows you to set up a scalable and fault-tolerant messaging system.
A publish/subscribe messaging system allows one or more producers to publish messages without regard for the number of consumers or how they will process the messages.
The goal of this tutorial is to teach you how to securely install and configure Apache Kafka 3.4.1 on a Debian 10 server.
Advantages of Installing Apache Kafka on Debian 10
- Scalability: Apache Kafka allows you to scale your data streams by distributing them across multiple broker nodes in a cluster, ensuring high throughput and low latency.
- Fault-Tolerance: Kafka's distributed nature and support for replication enable fault-tolerant data processing and resilience against hardware failures.
- Real-time Stream Processing: Apache Kafka provides a platform for building real-time stream processing systems, allowing you to react to events as they happen and process data in near real-time.
- Integration with Big Data Ecosystem: Kafka integrates well with other components of the big data ecosystem, such as Apache Spark, Apache Storm, and Hadoop, enabling seamless data ingestion and processing.
- Reliable Messaging System: Kafka offers persistent storage and durable message queues, ensuring reliable and guaranteed delivery of messages in the order they were sent.
Prerequisites
You'll need the following to follow along:
- A single Debian 10 server with a minimum of 4 GB of RAM and a non-root user with sudo access. If you do not already have a non-root user set up, follow the instructions in our Initial Server Setup tutorial for Debian 10.
- OpenJDK 11 installed on your server. To install this specific version, follow the instructions in How To Install Java with Apt on Debian 10. Kafka is written in Java, so it requires a JVM.
Note: Installations without 4 GB of RAM may cause the Kafka service to fail, with the Java Virtual Machine (JVM) throwing an Out Of Memory exception upon startup.
Step 1 — Making a User for Kafka
Since Kafka can handle requests over a network, creating a dedicated user for it is recommended. This limits the damage to your Debian machine in the event that the Kafka server is compromised. In this step, you will create the dedicated user kafka.
Logged in as your non-root sudo user, create a user named kafka with the useradd command:
sudo useradd kafka -m
The -m flag ensures that a home directory is created for the user. This home directory, /home/kafka, will act as your workspace for executing commands later on.
Set the password using passwd:
sudo passwd kafka
Enter the password you wish to use for this user.
To give the kafka user the permissions necessary to install Kafka's dependencies, use the adduser command to add the user to the sudo group:
sudo adduser kafka sudo
Your kafka user is now ready. Log into this account using su:
su -l kafka
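If you want to confirm that the account was set up as expected, you can list its group membership (an optional check using the standard groups command):
groups kafka
You should see sudo among the groups in the output.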
Now that you have created the Kafka-specific user, you can move on to downloading and extracting the Kafka binaries.
Step 2 — Downloading and Extracting the Kafka Binaries
In this step, you will download and extract the Kafka binaries into dedicated folders in your kafka user's home directory.
To get started, create a directory in /home/kafka called Downloads to store your downloads:
mkdir ~/Downloads
Then, install curl using apt-get so that you can download files from remote locations:
sudo apt-get update && sudo apt-get install curl
When prompted, type Y to confirm the curl download.
Once curl is installed, use it to download the Kafka binaries:
curl "https://downloads.apache.org/kafka/3.4.1/kafka_2.13-3.4.1.tgz" -o ~/Downloads/kafka.tgz
Make a directory called kafka and switch to it. This will be the root directory of the Kafka installation:
mkdir ~/kafka && cd ~/kafka
Extract the archive you downloaded using the tar command:
tar -xvzf ~/Downloads/kafka.tgz --strip 1
The --strip 1 flag ensures that the archive's contents are extracted into ~/kafka/ itself and not into a subdirectory inside it, such as ~/kafka/kafka_2.13-3.4.1/.
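Optionally, you can list the contents of ~/kafka to confirm the extraction worked as expected:
ls ~/kafka
You should see directories such as bin, config, and libs at the top level rather than a nested version-specific folder.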
Now that you have downloaded and extracted the binaries successfully, you can move on to configuring Kafka so that topics can be deleted.
Step 3 — Configuring the Kafka Server
Kafka's default behavior will not allow you to delete a topic, that is, the category, group, or feed name to which messages can be published. To modify this, you will edit the configuration file.
Kafka's configuration options are specified in server.properties. Open this file in nano or any editor of your choice:
nano ~/kafka/config/server.properties
Let's add a setting that will allow us to delete Kafka topics. Add delete.topic.enable = true to the bottom of the file, below the existing settings:
...
group.initial.rebalance.delay.ms=0
delete.topic.enable = true
After saving the file, exit nano.
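If you would like to confirm that the setting was saved, you can search the configuration file for it (a quick optional check):
grep delete.topic.enable ~/kafka/config/server.properties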
Now that you have configured Kafka, you can create systemd unit files for running and enabling Kafka on startup.
Step 4 — Generating Systemd Unit Files and Starting the Kafka Server
In this section, you will create systemd unit files for the Kafka service. This will help you perform common service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services.
Kafka uses ZooKeeper to manage its cluster state and configurations. ZooKeeper is often used as an integral component in distributed systems. In this tutorial, you will use it to manage these aspects of Kafka. If you would like to know more, visit the official ZooKeeper docs.
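Kafka ships with a default ZooKeeper configuration that this tutorial relies on. If you are curious, you can inspect it; based on the stock config/zookeeper.properties shipped with Kafka, it stores data in /tmp/zookeeper and listens on port 2181:
cat ~/kafka/config/zookeeper.properties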
First, create the unit file for zookeeper:
sudo nano /etc/systemd/system/zookeeper.service
Fill out the file with the following unit definition:
[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
The [Unit] section specifies that ZooKeeper requires networking and the filesystem to be ready before it can start.
The [Service] section specifies that systemd should use the zookeeper-server-start.sh and zookeeper-server-stop.sh shell scripts for starting and stopping the service. It also specifies that ZooKeeper should be restarted automatically if it exits abnormally.
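The kafka unit you will create next declares a dependency on zookeeper.service, so starting Kafka will also start ZooKeeper. If you ever want to manage ZooKeeper on its own, you can use the standard systemctl commands, for example:
sudo systemctl start zookeeper
sudo systemctl status zookeeper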
Next, create the systemd service file for kafka:
sudo nano /etc/systemd/system/kafka.service
In the file, enter the following unit definition:
[Unit]
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
The [Unit] section specifies that this unit file depends on zookeeper.service. This ensures that zookeeper gets started automatically when the kafka service starts.
The [Service] section specifies that systemd should use the kafka-server-start.sh and kafka-server-stop.sh shell scripts for starting and stopping the service. It also specifies that Kafka should be restarted automatically if it exits abnormally.
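Note that if you later modify either unit file, systemd will need to re-read its configuration before the change takes effect. You can do that with:
sudo systemctl daemon-reload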
Now that the units have been defined, start Kafka with the following command:
sudo systemctl start kafka
To ensure that the server has started successfully, check the journal logs for the kafka unit:
sudo journalctl -u kafka
You will see output similar to this:
Output
Mar 23 13:31:48 kafka systemd[1]: Started kafka.service.
You now have a Kafka server listening on port 9092, which is the default port for Kafka.
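If you would also like Kafka and ZooKeeper to start automatically when the server boots, you can enable both units (an optional step using standard systemctl commands):
sudo systemctl enable zookeeper
sudo systemctl enable kafka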
Step 5 — Testing the Installation
To make sure the Kafka server is behaving correctly, let's publish and consume a Hello World message. Publishing messages in Kafka requires:
- A producer, which enables the publication of records and data to topics.
- A consumer, which reads messages and data from topics.
To begin, create a topic named TutorialTopic by typing:
~/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic TutorialTopic
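To verify that the topic was created, you can list the topics known to the broker (an optional check using the same kafka-topics.sh script):
~/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
TutorialTopic should appear in the output.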
You can create a producer from the command line using the kafka-console-producer.sh script. It expects the Kafka server's hostname, port, and a topic name as arguments.
Publish the string Hello, World to the TutorialTopic topic by typing:
echo "Hello, World" | ~/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic TutorialTopic > /dev/null
The --bootstrap-server flag specifies the address of the Kafka broker that will receive the message, in this case localhost:9092. The --topic flag designates the topic as TutorialTopic.
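If you want to publish another message for testing, you can repeat the same pattern (the message text here is just an illustration):
echo "Hello again, Kafka" | ~/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic TutorialTopic > /dev/null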
Next, create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka server's hostname and port, along with a topic name, as arguments.
The following command consumes messages from TutorialTopic. Note the use of the --from-beginning flag, which allows the consumption of messages that were published before the consumer was started:
~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning
The --bootstrap-server flag provides a list of entry points into the Kafka cluster. In this case, you are using localhost:9092.
You will see Hello, World in your terminal:
Output
Hello, World
The script will continue to run, waiting for more messages to be published to the topic. Feel free to open a new terminal and start a producer to publish a few more messages. You should be able to see them all in the consumer's output. For more information on using Kafka, see the official Kafka documentation.
When you are done testing, press CTRL+C to stop the consumer script. Now that you have tested the installation, you can move on to installing KafkaT to better administer your Kafka cluster.
Step 6 — Installing KafkaT (Optional)
KafkaT is a tool from Airbnb that lets you view details about your Kafka cluster and carry out certain administrative tasks from the command line. Because it is a Ruby gem, you will need Ruby to use it. You will also need the build-essential package to build the other gems it depends on. Install them using apt:
sudo apt install ruby ruby-dev build-essential
You can now install KafkaT using the gem command:
sudo CFLAGS=-Wno-error=format-overflow gem install kafkat
The CFLAGS=-Wno-error=format-overflow option suppresses format overflow warnings, which is required for the ZooKeeper gem, a dependency of KafkaT.
KafkaT uses .kafkatcfg as its configuration file to determine the installation and log directories of your Kafka server. It should also have an entry pointing KafkaT to your ZooKeeper instance.
Create a new file called .kafkatcfg:
nano ~/.kafkatcfg
Add the following lines to specify the required information about your Kafka server and ZooKeeper instance:
{
"kafka_path": "~/kafka",
"log_path": "/tmp/kafka-logs",
"zk_path": "localhost:2181"
}
You can now start using KafkaT. Here's an example of how you could use it to view information on all Kafka partitions:
kafkat partitions
The following output will appear:
Output
Topic Partition Leader Replicas ISRs
TutorialTopic 0 0 [0] [0]
__consumer_offsets 0 0 [0] [0]
...
This output displays TutorialTopic
as well as the internal Kafka topic __consumer_offsets
, which is used to store client-related data. You may safely disregard lines beginning with __consumer_offsets
.
Visit the KafkaT GitHub repository to find out more information.
Now that you have installed KafkaT, you can optionally set up Kafka on a cluster of Debian 10 machines to create a multi-node cluster.
Step 7 — Setting Up a Multi-Node Cluster (Optional)
If you want to create a multi-broker cluster using more Debian 10 servers, repeat Steps 1, 3, and 5 on each of the new machines. Additionally, make the following changes in the ~/kafka/config/server.properties file for each:
- Change the value of the broker.id property so that it is unique throughout the cluster. This property uniquely identifies each server in the cluster; Kafka expects an integer value here, for example 1 for the first server, 2 for the second, and so on.
- Change the value of the zookeeper.connect property so that every node points to the same ZooKeeper instance. This property specifies the ZooKeeper instance's address in the <HOSTNAME/IP_ADDRESS>:<PORT> format. For this tutorial, you would use your_first_server_IP:2181, replacing your_first_server_IP with the IP address of your first Debian 10 server (see the example snippet after this list).
- If you want to have multiple ZooKeeper instances for your cluster, the value of the zookeeper.connect property on each node should be an identical, comma-separated string listing the IP addresses and port numbers of all the ZooKeeper instances.
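For example, on a hypothetical second broker, the relevant lines in ~/kafka/config/server.properties might look like this (a sketch only; your_first_server_IP is the placeholder used above for the server running ZooKeeper):
broker.id=2
zookeeper.connect=your_first_server_IP:2181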
Note: If you have a firewall running, make sure that port 2181 is open on the Debian 10 server where ZooKeeper is installed, so that requests from the other nodes in the cluster can get through.
Step 8 — Restricting the Kafka User
Now that all of the installations are done, you can remove the kafka user's admin privileges. Before you do so, log out and log back in as any other non-root sudo user. If you are still running the same shell session you started this tutorial with, simply type exit.
Remove the kafka user from the sudo group:
sudo deluser kafka sudo
To further improve your Kafka server's security, lock the kafka user's password using the passwd command. This makes sure that nobody can log into the server using this account directly:
sudo passwd kafka -l
At this point, only root or a sudo user can log in as kafka by typing the following command:
sudo su - kafka
In the future, if you want to unlock it, use passwd with the -u option:
sudo passwd kafka -u
You have now successfully restricted the kafka user's admin privileges.
FAQs on Installing Apache Kafka on Debian 10
Is Apache Kafka compatible with Debian 10?
Yes, Apache Kafka is compatible with Debian 10.
What are the system requirements to install Apache Kafka on Debian 10?
To install Apache Kafka on Debian 10, you need a 64-bit architecture, a Java runtime environment (this tutorial uses OpenJDK 11), and at least 4 GB of RAM.
Why is ZooKeeper required for Apache Kafka?
ZooKeeper is a centralized coordination service that Kafka uses for managing and maintaining its metadata and state information.
How can I install Java on Debian 10?
To install OpenJDK 11 on Debian 10, you can use the apt package manager by running sudo apt update && sudo apt install openjdk-11-jre.
What are the steps to install Apache Kafka on Debian 10?
The installation process involves downloading and extracting Apache Kafka, configuring the ZooKeeper connection, and starting the Kafka server.
How can I start the ZooKeeper server for Apache Kafka on Debian 10?
You can start the ZooKeeper server by running the appropriate script included in the Apache Kafka distribution.
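For example, with the layout used in this tutorial, you could either run the bundled script directly or use the systemd unit created in Step 4:
~/kafka/bin/zookeeper-server-start.sh ~/kafka/config/zookeeper.properties
sudo systemctl start zookeeper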
Do I need to configure any environment variables for Apache Kafka on Debian 10?
No environment variables are required for a basic installation, though you may need to modify the Kafka configuration files to suit your specific requirements.
Conclusion
By installing Apache Kafka, you gain the advantages of scalability, fault tolerance, real-time stream processing, integration with the big data ecosystem, and reliable messaging.
In this tutorial, you securely installed and configured Apache Kafka on a Debian 10 server, and the FAQ above covered frequently asked questions regarding the installation process.