Introduction
Before we begin talking about how to install Apache Kafka on CentOS 7, let's briefly understand – What is Apache Kafka?
Apache Kafka is a distributed streaming platform that enables the real-time processing of high volumes of data streams. Installing Apache Kafka on CentOS 7 allows you to set up a scalable and fault-tolerant messaging system.
A publish/subscribe messaging system allows one or more producers to publish messages without regard to the number of consumers or how they will process the messages.
This tutorial will walk you through the process of installing Apache Kafka on CentOS 7. We will also address a few FAQs on how to install Apache Kafka on CentOS 7.
Advantages of Installing Apache Kafka on CentOS 7
- Scalability: Apache Kafka allows you to scale your data streams by distributing them across multiple broker nodes in a cluster, ensuring high throughput and low latency.
- Fault-Tolerance: Kafka's distributed nature and support for replication enable fault-tolerant data processing and resilience against hardware failures.
- Real-time Stream Processing: Apache Kafka provides a platform for building real-time stream processing systems, allowing you to react to events as they happen and process data in near real-time.
- Integration with Big Data Ecosystem: Kafka integrates well with other components of the big data ecosystem, such as Apache Spark, Apache Storm, and Hadoop, enabling seamless data ingestion and processing.
- Reliable Messaging System: Kafka offers persistent storage and durable message queues, ensuring reliable and guaranteed delivery of messages in the order they were sent.
Prerequisites
You'll need the following to follow along:
- One CentOS 7 server and a non-root user with sudo access.
- A server with a minimum of 4 GB of RAM. Installations with less RAM than this could result in the Kafka service failing and the Java virtual machine (JVM) producing an "Out Of Memory" exception when it first starts.
- OpenJDK 8 installed on your server. Kafka is written in Java, so it requires a JVM; however, a version-detection bug in its startup shell script prevents it from starting with JVM versions above 8. A minimal install command is shown after this list.
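If OpenJDK 8 is not present yet, a minimal way to install and verify it on CentOS 7 (assuming the default yum repositories) is:
# Install OpenJDK 8 from the default CentOS 7 repositories
sudo yum install java-1.8.0-openjdk
# Confirm that the installed Java version reports 1.8.x
java -version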
Step 1 — Making a User for Kafka
Since Kafka can handle requests over a network, you should create a dedicated user for it. This minimizes the damage to your CentOS machine should the Kafka server be compromised. Once you have finished configuring Kafka, it is a good idea to use a separate non-root user to perform other tasks on this server. In this step, we will create a dedicated kafka user.
Logged in as your non-root sudo user, create a user called kafka with the useradd command:
sudo useradd kafka -m
The -m flag ensures that a home directory will be created for the user. This home directory, /home/kafka, will act as our workspace directory for executing the commands in the sections below.
Set the password using passwd:
sudo passwd kafka
Add the kafka user to the wheel group with the usermod command, so that it has the privileges required to install Kafka's dependencies:
sudo usermod -aG wheel kafka
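If you want to confirm that the new account received sudo privileges, you can optionally list them (the -l -U combination is supported by the sudo version shipped with CentOS 7):
# List the sudo privileges granted to the kafka user
sudo -l -U kafka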
Your kafka user is now ready. Log in to this account using su:
su -l kafka
Now that we've created the Kafka-specific user, we can move on to downloading and extracting the Kafka binaries.
Step 2 — Downloading and Extracting the Kafka Binaries
Let's download and extract the Kafka binaries into dedicated folders in the home directory of our kafka user.
To start, create a directory in /home/kafka called Downloads to store your downloads:
mkdir ~/Downloads
To download the Kafka binaries, use curl:
curl "https://downloads.apache.org/kafka/3.4.1/kafka_2.13-3.4.1.tgz" -o ~/Downloads/kafka.tgz
Create a directory called kafka and change to this directory. This will be the base directory of the Kafka installation:
mkdir ~/kafka && cd ~/kafka
Extract the archive you downloaded using the tar command:
tar -xvzf ~/Downloads/kafka.tgz --strip 1
We specify the --strip 1 flag to ensure that the archive's contents are extracted in ~/kafka/ itself and not in another directory (such as ~/kafka/kafka_2.13-3.4.1/) inside of it.
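At this point the base directory should contain Kafka's bin, config, and libs folders, which you can confirm with a quick listing:
# Verify that the archive's contents landed directly in ~/kafka
ls ~/kafka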
Now that we've downloaded and extracted the binaries successfully, we can move on to configuring Kafka so that topics can be deleted.
Step 3 — Configuring the Kafka Server
Kafka's default behavior will not allow us to delete a topic, the category, group, or feed name to which messages can be published. To modify this, let's edit the configuration file.
Kafka's configuration options are specified in server.properties. Open this file with vi or your favorite editor:
vi ~/kafka/config/server.properties
Let's add a setting that will allow us to delete Kafka topics. Press the i key to insert text, and add the following to the bottom of the file:
delete.topic.enable = true
When you are done, press ESC to exit insert mode and type :wq to write the changes to the file and quit. Now that we've configured Kafka, we can create systemd unit files for running it and enabling it on startup.
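As a side note, if you ever want to apply this setting non-interactively (for example, in a provisioning script), the same line can be appended directly from the shell instead of using an editor; skip this if you already added it in vi:
# Append the topic-deletion setting to Kafka's configuration
echo "delete.topic.enable = true" >> ~/kafka/config/server.properties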
Step 4 — Making Systemd Unit Files and Starting the Kafka Server
In this section, we will create systemd unit files for the Kafka service. This will help us perform common service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services.
Kafka uses ZooKeeper to manage its cluster state and configurations. It is employed in many distributed systems as an integral component. If you would like to know more about it, visit the official ZooKeeper docs.
Create the unit file for zookeeper:
sudo vi /etc/systemd/system/zookeeper.service
Put the unit definition listed below into the file:
[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
The [Unit] section specifies that ZooKeeper requires networking and the filesystem to be ready before it can start.
The [Service] section specifies that systemd should use the zookeeper-server-start.sh and zookeeper-server-stop.sh shell files for starting and stopping the service. It also specifies that ZooKeeper should be restarted automatically if it exits abnormally.
Close the file after you've finished editing.
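Although starting the kafka service later will also start ZooKeeper (through the Requires= directive), you can manage and inspect the new unit on its own if you like:
# Start ZooKeeper by itself and check that the unit is active
sudo systemctl start zookeeper
sudo systemctl status zookeeper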
Next, create the systemd service file for kafka:
sudo vi /etc/systemd/system/kafka.service
Put the unit definition listed below into the file:
[Unit]
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
The [Unit] section specifies that this unit file depends on zookeeper.service. This will ensure that zookeeper gets started automatically when the kafka service starts.
The [Service] section specifies that systemd should use the kafka-server-start.sh and kafka-server-stop.sh shell files for starting and stopping the service.
When you're done editing, save the file and close it.
Once the units have been established, run the following command to launch Kafka:
sudo systemctl start kafka
To ensure that the server has started successfully, check the journal logs for the kafka unit:
journalctl -u kafka
You should observe output resembling this:
Output
Jul 17 18:38:59 kafka-centos systemd[1]: Started kafka.service.
You now have a Kafka server listening on port 9092.
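If you would like additional confirmation that the broker is accepting connections, you can inspect the listening socket; the ss utility is part of the iproute package that ships with CentOS 7:
# Show the process listening on Kafka's default port
sudo ss -tlnp | grep 9092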
While the kafka service is now running, if we were to reboot our server, it would not start automatically. To enable kafka on server boot, run:
sudo systemctl enable kafka
Let's check the installation now that we have started and enabled the services.
Step 5 — Testing the Installation
To make sure the Kafka server is behaving correctly, let's publish and consume a "Hello World" message. Publishing messages in Kafka requires:
- A producer, which enables the publication of records and data to topics.
- A consumer, which reads messages and data from topics.
First, create a topic named TutorialTopic by typing:
~/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic TutorialTopic
The output will look like this:
Output
Created topic "TutorialTopic".
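You can also confirm that the topic exists by listing all topics known to the broker:
# List all topics; TutorialTopic should appear in the output
~/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092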
You can create a producer from the command line using the kafka-console-producer.sh script. It expects the Kafka server's hostname, a port, and a topic name as arguments.
Publish the string "Hello, World" to the TutorialTopic topic by typing:
echo "Hello, World" | ~/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic TutorialTopic > /dev/null
Next, you can create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka server's hostname and port, along with a topic name, as arguments.
The following command consumes messages from TutorialTopic. Note the use of the --from-beginning flag, which allows the consumption of messages that were published before the consumer was started:
~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning
If there are no configuration issues, you should see Hello, World in your terminal:
Output
Hello, World
The script will continue to run, waiting for more messages to be published to the topic. Feel free to open a new terminal and start a producer to publish a few more messages. You should be able to see them all in the consumer's output.
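For example, from a second terminal (logged in as the kafka user) you could publish one more message while the consumer is still running:
# Publish an additional message to the topic the consumer is reading
echo "Hello again" | ~/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic TutorialTopic > /dev/null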
When you are done testing, press CTRL+C to stop the consumer script. Now that we have tested the installation, let's move on to installing KafkaT.
Step 6 — Installing KafkaT (Optional)
KafkaT is a tool from Airbnb that makes it easier for you to view details about your Kafka cluster and perform certain administrative tasks from the command line. Because it is a Ruby gem, you will need Ruby to use it. You will also need ruby-devel and build-related packages such as make and gcc to be able to build the other gems it depends on. Install them using yum:
sudo yum install ruby ruby-devel make gcc patch
You can now install KafkaT using the gem command:
sudo gem install kafkat
KafkaT uses .kafkatcfg as the configuration file to determine the installation and log directories of your Kafka server. It should also have an entry pointing KafkaT to your ZooKeeper instance.
Create a new file called .kafkatcfg:
vi ~/.kafkatcfg
Add the following lines to specify the required information about your Kafka server and ZooKeeper instance:
{
  "kafka_path": "~/kafka",
  "log_path": "/tmp/kafka-logs",
  "zk_path": "localhost:2181"
}
When you're done making changes, save the file and close it.
You are now ready to use KafkaT. For a start, here's how you would use it to view details about all Kafka partitions:
kafkat partitions
The output should resemble the following:
Output
Topic Partition Leader Replicas ISRs
TutorialTopic 0 0 [0] [0]
__consumer_offsets 0 0 [0] [0]
...
...
You will see TutorialTopic, as well as __consumer_offsets, an internal topic used by Kafka for storing client-related information. You can safely ignore lines starting with __consumer_offsets.
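KafkaT provides several other subcommands as well. For example, assuming the standard command set documented in the gem's README, you can list the brokers registered in ZooKeeper:
# List all brokers in the cluster
kafkat brokers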
To learn more about KafkaT, refer to its GitHub repository.
Step 7 — Configuring a Multi-Node Cluster (Optional)
If you want to create a multi-broker cluster using more CentOS 7 servers, repeat Steps 1 through 5 on each of the new machines. Additionally, make the following changes in the server.properties file of each:
- Change the value of the broker.id property so that it is unique throughout the cluster. This property uniquely identifies each server in the cluster; its value must be an integer, for example 0, 1, 2, and so on (see the sample configuration below).
- Change the value of the zookeeper.connect property so that all nodes point to the same ZooKeeper instance. This property specifies the ZooKeeper instance's address and follows the <HOSTNAME/IP_ADDRESS>:<PORT> format, for example "203.0.113.0:2181", "203.0.113.1:2181", etc.
If you want to have multiple ZooKeeper instances for your cluster, the value of the zookeeper.connect property on each node should be an identical, comma-separated string listing the IP addresses and port numbers of all the ZooKeeper instances.
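As an illustration, on a second broker the relevant server.properties lines might look like the following (the broker id and the ZooKeeper address below are placeholder values; substitute your own):
# Unique integer id for this broker within the cluster
broker.id=1
# All brokers point at the same ZooKeeper instance
zookeeper.connect=203.0.113.0:2181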
Step 8 — Restricting the Kafka User
Now that all of the installations are done, you can remove the kafka user's admin privileges. Before you do so, log out and log back in as any other non-root sudo user. If you are still running the same shell session that you started this tutorial with, simply type exit.
Remove the kafka user from the wheel group:
sudo gpasswd -d kafka wheel
To further improve your Kafka server's security, lock the kafka user's password using the passwd command. This makes sure that nobody can directly log into the server using this account:
sudo passwd kafka -l
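You can verify that the account is locked by querying its password status; on CentOS 7, passwd -S reports LK for a locked password:
# Show the password status of the kafka account
sudo passwd -S kafka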
At this point, only root or a sudo user can log in as kafka by typing in the following command:
sudo su - kafka
In the future, if you want to unlock it, use passwd with the -u option:
sudo passwd kafka -u
You have now successfully restricted the kafka user's admin privileges.
FAQs on Installing Apache Kafka on CentOS 7
Is Apache Kafka compatible with CentOS 7?
Yes, Apache Kafka is compatible with CentOS 7.
What are the system requirements to install Apache Kafka on CentOS 7?
To install Apache Kafka on CentOS 7, you need a 64-bit operating system, Java 8 or higher version, and a sufficient amount of memory and disk space.
How can I install Java 8 on CentOS 7?
To install Java 8 on CentOS 7, you can use the yum package manager by running sudo yum install java-1.8.0-openjdk.
What are the steps to install Apache Kafka on CentOS 7?
The installation process involves downloading and extracting Apache Kafka, configuring the ZooKeeper connection, and starting the Kafka server.
Do I need to configure any environment variables for Apache Kafka on CentOS 7?
It is not necessary to configure environment variables for Apache Kafka, but you might need to modify the Kafka configuration to suit your specific requirements.
How can I start the ZooKeeper server for Apache Kafka on CentOS 7?
You can start the ZooKeeper server by running the appropriate script included in the Apache Kafka distribution.
Can I run multiple Apache Kafka brokers on CentOS 7?
Yes, you can run multiple Kafka brokers to form a Kafka cluster for achieving higher availability, fault tolerance, and scalability.
Conclusion
Installing Apache Kafka on CentOS 7 enables you to set up a powerful and scalable distributed streaming platform for real-time data processing.
This tutorial walked you through the installation process and also covered frequently asked questions about it.