Introduction
Before we begin talking about how to install Apache Kafka on CentOS 7, let's briefly understand – What is Apache Kafka?
Apache Kafka is a distributed streaming platform that enables the real-time processing of high volumes of data streams. Installing Apache Kafka on CentOS 7 allows you to set up a scalable and fault-tolerant messaging system.
A publish/subscribe messaging system allows one or more producers to publish messages without regard to the number of consumers or how they will process the messages.
This tutorial will walk you through the process of installing Apache Kafka on CentOS 7. We will also address a few FAQs on how to install Apache Kafka on CentOS 7.
Advantages of Installing Apache Kafka on CentOS 7
- Scalability: Apache Kafka allows you to scale your data streams by distributing them across multiple broker nodes in a cluster, ensuring high throughput and low latency.
- Fault-Tolerance: Kafka's distributed nature and support for replication enable fault-tolerant data processing and resilience against hardware failures.
- Real-time Stream Processing: Apache Kafka provides a platform for building real-time stream processing systems, allowing you to react to events as they happen and process data in near real-time.
- Integration with Big Data Ecosystem: Kafka integrates well with other components of the big data ecosystem, such as Apache Spark, Apache Storm, and Hadoop, enabling seamless data ingestion and processing.
- Reliable Messaging System: Kafka offers persistent storage and durable message queues, ensuring reliable and guaranteed delivery of messages in the order they were sent.
Prerequisites
You'll need the following to follow along:
- One CentOS 7 server and a non-root user with sudo access.
- A server with a minimum of 4 GB of RAM. Installations with less RAM than this could result in the Kafka service failing and the Java virtual machine (JVM) producing an "Out Of Memory" exception when it first starts.
- OpenJDK 8 installed on your server. Kafka is written in Java, so it requires a JVM; however, a version-detection bug in its startup shell script prevents it from starting with JVM versions above 8. A minimal install command is shown after this list.
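If OpenJDK 8 is not present yet, a minimal way to install and verify it on CentOS 7 (assuming the default yum repositories) is:
# Install OpenJDK 8 from the default CentOS 7 repositories
sudo yum install java-1.8.0-openjdk
# Confirm that the installed Java version reports 1.8.x
java -version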
Step 1 — Making a User for Kafka
Since Kafka can handle requests over a network, you should create a dedicated user for it. This minimizes the damage to your CentOS machine should the Kafka server be compromised. Once you have finished configuring Kafka, it is a good idea to use a separate non-root user to perform other tasks on this server. In this step, we will create a dedicated kafka user.
Logged in as your non-root sudo user, create a user called kafka with the useradd command:
sudo useradd kafka -m
The -m flag ensures that a home directory will be created for the user. This home directory, /home/kafka, will act as our workspace directory for executing the commands in the sections below.
Set the password using passwd:
sudo passwd kafka
Add the kafka user to the wheel group with the usermod command, so that it has the privileges required to install Kafka's dependencies:
sudo usermod -aG wheel kafka
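If you want to confirm that the new account received sudo privileges, you can optionally list them (the -l -U combination is supported by the sudo version shipped with CentOS 7):
# List the sudo privileges granted to the kafka user
sudo -l -U kafka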
Your kafka user is now ready. Log in to this account using su:
su -l kafka
Now that we've created the Kafka-specific user, we can move on to downloading and extracting the Kafka binaries.
Step 2 — Downloading and Extracting the Kafka Binaries
Let's download and extract the Kafka binaries into dedicated folders in the home directory of our kafka user.
To start, create a directory in /home/kafka called Downloads to store your downloads:
mkdir ~/Downloads
To download the Kafka binaries, use curl:
curl "https://downloads.apache.org/kafka/3.4.1/kafka_2.13-3.4.1.tgz" -o ~/Downloads/kafka.tgz
Create a directory called kafka and change to this directory. This will be the base directory of the Kafka installation:
mkdir ~/kafka && cd ~/kafka
Extract the archive you downloaded using the tar command:
tar -xvzf ~/Downloads/kafka.tgz --strip 1
We specify the --strip 1 flag to ensure that the archive's contents are extracted in ~/kafka/ itself and not in another directory (such as ~/kafka/kafka_2.13-3.4.1/) inside of it.
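At this point the base directory should contain Kafka's bin, config, and libs folders, which you can confirm with a quick listing:
# Verify that the archive's contents landed directly in ~/kafka
ls ~/kafka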
Now that we've downloaded and extracted the binaries successfully, we can move on to configuring Kafka so that topics can be deleted.
Step 3 — Configuring the Kafka Server
Kafka's default behavior will not allow us to delete a topic, the category, group, or feed name to which messages can be published. To modify this, let's edit the configuration file.
Kafka's configuration options are specified in server.properties. Open this file with vi or your favorite editor:
vi ~/kafka/config/server.properties
Let's add a setting that will allow us to delete Kafka topics. Press the i key to insert text, and add the following to the bottom of the file:
delete.topic.enable = true
When you are done, press ESC to exit insert mode and type :wq to write the changes to the file and quit. Now that we've configured Kafka, we can create systemd unit files for running it and enabling it on startup.
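As a side note, if you ever want to apply this setting non-interactively (for example, in a provisioning script), the same line can be appended directly from the shell instead of using an editor; skip this if you already added it in vi:
# Append the topic-deletion setting to Kafka's configuration
echo "delete.topic.enable = true" >> ~/kafka/config/server.properties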
Step 4 — Making Systemd Unit Files and Starting the Kafka Server
In this section, we will create systemd unit files for the Kafka service. This will help us perform common service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services.
Kafka uses ZooKeeper to manage its cluster state and configurations. It is employed in many distributed systems as an integral component. If you would like to know more about it, visit the official ZooKeeper docs.
Create the unit file for zookeeper:
sudo vi /etc/systemd/system/zookeeper.service
Put the unit definition listed below into the file:
[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
The [Unit] section specifies that ZooKeeper requires networking and the filesystem to be ready before it can start.
The [Service] section specifies that systemd should use the zookeeper-server-start.sh and zookeeper-server-stop.sh shell files for starting and stopping the service. It also specifies that ZooKeeper should be restarted automatically if it exits abnormally.
Close the file after you've finished editing.
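Although starting the kafka service later will also start ZooKeeper (through the Requires= directive), you can manage and inspect the new unit on its own if you like:
# Start ZooKeeper by itself and check that the unit is active
sudo systemctl start zookeeper
sudo systemctl status zookeeper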
Next, create the systemd service file for kafka:
sudo vi /etc/systemd/system/kafka.service
Put the unit definition listed below into the file:
[Unit]
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
The [Unit] section specifies that this unit file depends on zookeeper.service. This will ensure that zookeeper gets started automatically when the kafka service starts.
The [Service] section specifies that systemd should use the kafka-server-start.sh and kafka-server-stop.sh shell files for starting and stopping the service.
When you're done editing, save the file and close it.
Once the units have been established, run the following command to launch Kafka:
sudo systemctl start kafka
To ensure that the server has started successfully, check the journal logs for the kafka unit:
journalctl -u kafka
You should observe output resembling this:
Output
Jul 17 18:38:59 kafka-centos systemd[1]: Started kafka.service.
You now have a Kafka server listening on port 9092.
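If you would like additional confirmation that the broker is accepting connections, you can inspect the listening socket; the ss utility is part of the iproute package that ships with CentOS 7:
# Show the process listening on Kafka's default port
sudo ss -tlnp | grep 9092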
While the kafka service is now running, if we were to reboot our server, it would not start automatically. To enable kafka on server boot, run:
sudo systemctl enable kafka
Let's check the installation now that we have started and enabled the services.
Step 5 — Testing the Installation
To make sure the Kafka server is behaving correctly, let's publish and consume a "Hello World" message. Publishing messages in Kafka requires:
- A producer, which enables the publication of records and data to topics.
- A consumer, which reads messages and data from topics.
First, create a topic named TutorialTopic by typing:
~/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic TutorialTopic
The output will look like this:
Output
Created topic "TutorialTopic".
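You can also confirm that the topic exists by listing all topics known to the broker:
# List all topics; TutorialTopic should appear in the output
~/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092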
You can create a producer from the command line using the kafka-console-producer.sh script. It expects the Kafka server's hostname, a port, and a topic name as arguments.
Publish the string "Hello, World" to the TutorialTopic topic by typing:
echo "Hello, World" | ~/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic TutorialTopic > /dev/null
Next, you can create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka server's hostname and port, along with a topic name, as arguments.
The following command consumes messages from TutorialTopic. Note the use of the --from-beginning flag, which allows the consumption of messages that were published before the consumer was started:
~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning
If there are no configuration issues, you should see Hello, World in your terminal:
Output
Hello, World
The script will continue to run, waiting for more messages to be published to the topic. Feel free to open a new terminal and start a producer to publish a few more messages. You should be able to see them all in the consumer's output.
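For example, from a second terminal (logged in as the kafka user) you could publish one more message while the consumer is still running:
# Publish an additional message to the topic the consumer is reading
echo "Hello again" | ~/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic TutorialTopic > /dev/null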
When you are done testing, press CTRL+C to stop the consumer script. Now that we have tested the installation, let's move on to installing KafkaT.
Step 6 — Installing KafkaT (Optional)
KafkaT is a tool from Airbnb that makes it easier for you to view details about your Kafka cluster and perform certain administrative tasks from the command line. Because it is a Ruby gem, you will need Ruby to use it. You will also need ruby-devel and build-related packages such as make and gcc to be able to build the other gems it depends on. Install them using yum:
sudo yum install ruby ruby-devel make gcc patch
You can now install KafkaT using the gem command:
sudo gem install kafkat
KafkaT uses .kafkatcfg as the configuration file to determine the installation and log directories of your Kafka server. It should also have an entry pointing KafkaT to your ZooKeeper instance.
Create a new file called .kafkatcfg:
vi ~/.kafkatcfg
Add the following lines to specify the required information about your Kafka server and ZooKeeper instance:
{
  "kafka_path": "~/kafka",
  "log_path": "/tmp/kafka-logs",
  "zk_path": "localhost:2181"
}
When you're done making changes, save the file and close it.
You are now ready to use KafkaT. For a start, here's how you would use it to view details about all Kafka partitions:
kafkat partitions
The output should resemble the following:
Output
Topic Partition Leader Replicas ISRs
TutorialTopic 0 0 [0] [0]
__consumer_offsets 0 0 [0] [0]
...
...
You will see TutorialTopic, as well as __consumer_offsets, an internal topic used by Kafka for storing client-related information. You can safely ignore lines starting with __consumer_offsets.
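KafkaT provides several other subcommands as well. For example, assuming the standard command set documented in the gem's README, you can list the brokers registered in ZooKeeper:
# List all brokers in the cluster
kafkat brokers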
To learn more about KafkaT, refer to its GitHub repository.
Step 7 — Configuring a Multi-Node Cluster (Optional)
If you want to create a multi-broker cluster using more CentOS 7 servers, repeat Steps 1 through 5 on each of the new machines. Additionally, make the following changes in the server.properties file of each:
- Change the value of the broker.id property so that it is unique throughout the cluster. This property uniquely identifies each server in the cluster; its value must be an integer, for example 0, 1, 2, and so on (see the sample configuration below).
- Change the value of the zookeeper.connect property so that all nodes point to the same ZooKeeper instance. This property specifies the ZooKeeper instance's address and follows the <HOSTNAME/IP_ADDRESS>:<PORT> format, for example "203.0.113.0:2181", "203.0.113.1:2181", etc.
If you want to have multiple ZooKeeper instances for your cluster, the value of the zookeeper.connect property on each node should be an identical, comma-separated string listing the IP addresses and port numbers of all the ZooKeeper instances.
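As an illustration, on a second broker the relevant server.properties lines might look like the following (the broker id and the ZooKeeper address below are placeholder values; substitute your own):
# Unique integer id for this broker within the cluster
broker.id=1
# All brokers point at the same ZooKeeper instance
zookeeper.connect=203.0.113.0:2181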
Step 8 — Restricting the Kafka User
Now that all of the installations are done, you can remove the kafka user's admin privileges. Before you do so, log out and log back in as any other non-root sudo user. If you are still running the same shell session that you started this tutorial with, simply type exit.
Remove the kafka user from the wheel group:
sudo gpasswd -d kafka wheel
To further improve your Kafka server's security, lock the kafka user's password using the passwd command. This makes sure that nobody can directly log into the server using this account:
sudo passwd kafka -l
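You can verify that the account is locked by querying its password status; on CentOS 7, passwd -S reports LK for a locked password:
# Show the password status of the kafka account
sudo passwd -S kafka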
At this point, only root or a sudo user can log in as kafka by typing in the following command:
sudo su - kafka
In the future, if you want to unlock it, use passwd with the -u option:
sudo passwd kafka -u
You have now successfully restricted the kafka user's admin privileges.
FAQs on Installing Apache Kafka on CentOS 7
Is Apache Kafka compatible with CentOS 7?
Yes, Apache Kafka is compatible with CentOS 7.
What are the system requirements to install Apache Kafka on CentOS 7?
To install Apache Kafka on CentOS 7, you need a 64-bit operating system, Java 8 or higher version, and a sufficient amount of memory and disk space.
How can I install Java 8 on CentOS 7?
To install Java 8 on CentOS 7, you can use the yum package manager by running sudo yum install java-1.8.0-openjdk.
What are the steps to install Apache Kafka on CentOS 7?
The installation process involves downloading and extracting Apache Kafka, configuring the ZooKeeper connection, and starting the Kafka server.
Do I need to configure any environment variables for Apache Kafka on CentOS 7?
It is not necessary to configure environment variables for Apache Kafka, but you might need to modify the Kafka configuration to suit your specific requirements.
How can I start the ZooKeeper server for Apache Kafka on CentOS 7?
You can start the ZooKeeper server by running the appropriate script included in the Apache Kafka distribution.
Can I run multiple Apache Kafka brokers on CentOS 7?
Yes, you can run multiple Kafka brokers to form a Kafka cluster for achieving higher availability, fault tolerance, and scalability.
Conclusion
Installing Apache Kafka on CentOS 7 enables you to set up a powerful and scalable distributed streaming platform for real-time data processing.
This tutorial walked you through the installation process and also covered frequently asked questions about it.