Apache Kafka is a robust, distributed messaging system that enables real-time data streaming at scale. In this guide, you will learn how to install Kafka on an Ubuntu VPS and explore a practical use case, demonstrating Kafka's effectiveness in handling large-scale data pipelines.


Edited: 12-10-2024

Installing Apache Kafka on an Ubuntu VPS and Using It for Real-Time Data Streaming

Apache Kafka is a powerful, distributed platform for managing real-time data streams at scale. It acts as a messaging system: producers publish data to Kafka and consumers read it, so applications can exchange data without connecting to each other directly. This guide walks you through installing Kafka on an Ubuntu VPS, then shows a practical example of Kafka handling a high-throughput data pipeline.


Update Your Ubuntu System

Begin by ensuring your Ubuntu VPS is up to date:

sudo apt update
sudo apt upgrade

Install Java

Kafka requires Java, so you'll need to install the Java Development Kit (JDK). Run the following command:

sudo apt install openjdk-11-jdk -y

Verify the installation by checking the Java version:

java -version

You should see output similar to the following (the exact version number may differ):

openjdk version "11.0.11"
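If you script the installation, a check like the one below can confirm Java is present before you continue. It is a minimal sketch that assumes `java -version` prints a line containing `version "..."` (as OpenJDK does) and handles both the legacy `1.8.0_xxx` numbering and the modern `11.0.x` form. The function names are hypothetical, and the minimum of 11 simply mirrors the OpenJDK 11 package installed above rather than an official Kafka requirement.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Kafka): check the installed Java major version.

# Reads `java -version` output on stdin and prints the major version number.
java_major_version() {
  local ver
  ver=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*)
      # Legacy scheme, e.g. 1.8.0_292 -> 8
      ver=${ver#1.}
      echo "${ver%%.*}"
      ;;
    *)
      # Modern scheme, e.g. 11.0.11 -> 11, 17-ea -> 17
      echo "${ver%%[.+-]*}"
      ;;
  esac
}

# Returns non-zero if Java is missing or older than the given major version.
require_java() {
  local min="$1" major
  major=$(java -version 2>&1 | java_major_version)
  if [ -z "$major" ] || [ "$major" -lt "$min" ]; then
    echo "Java $min or newer is required (found: ${major:-none})" >&2
    return 1
  fi
}
```

For example, placing `require_java 11 || exit 1` near the top of an install script stops it early when Java is missing.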

Download and Install Apache Kafka

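Download Kafka from the Apache archive. This guide uses Kafka 3.6.0 built for Scala 2.13; to install the latest stable release instead, check the Apache Kafka downloads page and substitute its version number in the commands in this section:

wget https://archive.apache.org/dist/kafka/3.6.0/kafka_2.13-3.6.0.tgz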

Extract the downloaded Kafka archive:

tar -xzf kafka_2.13-3.6.0.tgz

Move the extracted files to a more accessible directory:

sudo mv kafka_2.13-3.6.0 /opt/kafka
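The download, extraction, and move steps can also be wrapped in one function so the version number lives in a single place. This sketch assumes the Apache archive layout `https://archive.apache.org/dist/kafka/<version>/kafka_<scala>-<version>.tgz`, which keeps every release; downloads.apache.org only hosts current releases, so links there stop working once a version is superseded. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: download and unpack a Kafka release into /opt/kafka.

# Prints the release tarball name, e.g. kafka_2.13-3.6.0.tgz
kafka_tarball() {
  local version="$1" scala="$2"
  printf 'kafka_%s-%s.tgz\n' "$scala" "$version"
}

# Prints the archive.apache.org download URL for a release.
kafka_url() {
  local version="$1" scala="$2"
  printf 'https://archive.apache.org/dist/kafka/%s/%s\n' "$version" "$(kafka_tarball "$version" "$scala")"
}

# Downloads, extracts, and moves the release to the target directory (default /opt/kafka).
install_kafka() {
  local version="$1" scala="$2" target="${3:-/opt/kafka}" tarball
  tarball=$(kafka_tarball "$version" "$scala")
  if [ -e "$target" ]; then
    # mv into an existing directory would nest the release inside it instead of replacing it.
    echo "$target already exists; remove it first" >&2
    return 1
  fi
  wget -q "$(kafka_url "$version" "$scala")" -O "$tarball" &&
    tar -xzf "$tarball" &&
    sudo mv "${tarball%.tgz}" "$target"
}
```

Running `install_kafka 3.6.0 2.13` reproduces the three manual steps above.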

Configure Kafka Environment Variables

To make Kafka easier to run, add its binaries to your system's PATH. Edit your .bashrc file:

nano ~/.bashrc

Add the following lines at the end:

# Kafka Environment Variables
export KAFKA_HOME=/opt/kafka
export PATH=$PATH:$KAFKA_HOME/bin

Save and exit the file, then apply the changes:

source ~/.bashrc
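If you script this step instead of editing the file by hand, re-running the script would append the same lines again. A guard like the one below appends the three lines only when no `export KAFKA_HOME=` line is present yet. It is a sketch: the function name is hypothetical, and the rc file and Kafka directory are parameters that default to the values used in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append the Kafka environment variables to a shell rc file once.
append_kafka_env() {
  local rc="${1:-$HOME/.bashrc}" kafka_home="${2:-/opt/kafka}"
  # Skip if KAFKA_HOME is already exported, so re-running does not duplicate lines.
  if grep -q '^export KAFKA_HOME=' "$rc" 2>/dev/null; then
    return 0
  fi
  {
    echo '# Kafka Environment Variables'
    echo "export KAFKA_HOME=$kafka_home"
    # Single quotes keep $PATH and $KAFKA_HOME unexpanded until the rc file is sourced.
    echo 'export PATH=$PATH:$KAFKA_HOME/bin'
  } >> "$rc"
}
```

Calling `append_kafka_env && source ~/.bashrc` then has the same effect as the manual edit.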

Install and Configure Zookeeper

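In the Zookeeper mode used in this guide, Kafka relies on Zookeeper to store its cluster metadata and configuration. Zookeeper ships with Kafka, so there is nothing extra to install. First, create a data directory for Zookeeper and make it writable by your user:

sudo mkdir -p /var/lib/zookeeper
sudo chown "$USER":"$USER" /var/lib/zookeeper

The bundled $KAFKA_HOME/config/zookeeper.properties sets dataDir=/tmp/zookeeper by default, so edit that line to read dataDir=/var/lib/zookeeper; otherwise Zookeeper ignores the new directory and keeps its data in /tmp.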
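To point Zookeeper's dataDir setting at this directory from a script rather than an editor, a small helper can rewrite the line in zookeeper.properties. Kafka and Zookeeper read plain key=value .properties files; the hypothetical function below replaces an existing `key=` line or appends one if the key is missing. It is a sketch that assumes GNU sed (the default on Ubuntu), a file your user can write, and values containing neither `|` nor `&`, which sed would treat specially.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set key=value in a Java .properties file.
set_property() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    # Replace the existing line; '|' is the sed delimiter, so paths containing '/' need no escaping.
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    # Key not present yet: append it.
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

For example: `set_property "$KAFKA_HOME/config/zookeeper.properties" dataDir /var/lib/zookeeper`.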

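Now start Zookeeper. The command runs in the foreground, so leave this terminal open (or add -daemon before the properties file to run it in the background):

zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties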

Zookeeper will now be running on port 2181 by default.

Start Apache Kafka

With Zookeeper running, open a new terminal and start the Kafka broker. This command also stays in the foreground:

kafka-server-start.sh $KAFKA_HOME/config/server.properties

Kafka will now be running on port 9092 by default.
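If you start Zookeeper and Kafka from a script or in the background, it helps to wait until each port accepts connections before running the next command. This sketch relies on bash's built-in /dev/tcp redirection (so it needs bash, not sh) and treats an open port as "ready", which is a reasonable but imperfect signal. The function name and the 30-second default timeout are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until host:port accepts TCP connections, or time out.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}" waited=0
  # Opening /dev/tcp/HOST/PORT in a subshell attempts a TCP connection; the subshell exit closes it.
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for ${host}:${port}" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example, `wait_for_port localhost 2181 && kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties` starts the broker only once Zookeeper is listening, and `wait_for_port localhost 9092 60` waits up to a minute for the broker itself.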

Create Kafka Topics

Kafka organizes its messages into topics. You can create a topic using the following command:

kafka-topics.sh --create --topic test_topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

This creates a new Kafka topic named test_topic.
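Running the same --create command twice fails because the topic already exists. When scripting topic setup, the --if-not-exists flag of kafka-topics.sh makes creation safe to repeat. The sketch below wraps it in a hypothetical create_topic function; the KAFKA_TOPICS and BOOTSTRAP variables are assumptions introduced so the command and broker address can be overridden. On a single broker like this one, the replication factor cannot exceed 1.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around kafka-topics.sh; override KAFKA_TOPICS or BOOTSTRAP if needed.
KAFKA_TOPICS="${KAFKA_TOPICS:-kafka-topics.sh}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

# Creates a topic unless it already exists (partitions and replication default to 1).
create_topic() {
  local topic="$1" partitions="${2:-1}" replication="${3:-1}"
  "$KAFKA_TOPICS" --create --if-not-exists --topic "$topic" \
    --bootstrap-server "$BOOTSTRAP" \
    --partitions "$partitions" --replication-factor "$replication"
}

# Shows partition, leader, and replica details for a topic.
describe_topic() {
  "$KAFKA_TOPICS" --describe --topic "$1" --bootstrap-server "$BOOTSTRAP"
}
```

For example, `create_topic test_topic && describe_topic test_topic` creates the topic if needed and prints its layout.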

Test Kafka Installation

To test Kafka, open two terminal windows: one for producing messages and the other for consuming messages.

1- Terminal 1: Kafka Producer

Run the following command to start producing messages to the test_topic:

kafka-console-producer.sh --topic test_topic --bootstrap-server localhost:9092

Type a message and press Enter to send it to Kafka. Press Ctrl+C to stop the producer.

2- Terminal 2: Kafka Consumer

Run the following command in a separate terminal to consume messages from the test_topic:

kafka-console-consumer.sh --topic test_topic --from-beginning --bootstrap-server localhost:9092

You will see the messages from the producer terminal displayed here.
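The two-terminal test can also be run non-interactively, which is handy after a reboot or in a provisioning script. This sketch pipes a unique message into the console producer, reads the topic with the console consumer's --timeout-ms option so it exits once no new messages arrive, and checks that the message came back. The function and the PRODUCER, CONSUMER, and BOOTSTRAP variables are hypothetical; expect each run to take roughly the 10-second timeout.

```bash
#!/usr/bin/env bash
# Hypothetical end-to-end check: produce one message and confirm it can be consumed.
PRODUCER="${PRODUCER:-kafka-console-producer.sh}"
CONSUMER="${CONSUMER:-kafka-console-consumer.sh}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

smoke_test() {
  local topic="${1:-test_topic}" msg
  msg="smoke-$(date +%s)-$$"   # unique enough to avoid matching earlier messages
  printf '%s\n' "$msg" | "$PRODUCER" --topic "$topic" --bootstrap-server "$BOOTSTRAP" || return 1
  # --timeout-ms makes the consumer exit after 10 s without new messages;
  # grep -qxF succeeds only if the exact message line appears.
  "$CONSUMER" --topic "$topic" --bootstrap-server "$BOOTSTRAP" \
    --from-beginning --timeout-ms 10000 | grep -qxF "$msg"
}
```

For example, `smoke_test test_topic && echo "Kafka is working"`.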

Use Case Example: Real-Time Log Monitoring with Kafka

A common use case for Kafka is real-time log monitoring. Here’s how you can implement it:

Install Logstash

Logstash is a data processing pipeline that can ingest data from various sources, process it, and send it to different destinations, such as Kafka.
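Logstash is distributed through Elastic's own APT repository rather than Ubuntu's, so that repository has to be added before apt can find the logstash package. The sketch below follows the steps in Elastic's installation documentation; the 8.x release line is an assumption (pick the major version you want), and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: add Elastic's APT repository so apt can install Logstash.
ELASTIC_KEYRING=/usr/share/keyrings/elastic-keyring.gpg

# Prints the APT source line for an Elastic major version, e.g. 8.
elastic_source_line() {
  local major="$1"
  printf 'deb [signed-by=%s] https://artifacts.elastic.co/packages/%s.x/apt stable main\n' \
    "$ELASTIC_KEYRING" "$major"
}

# Imports Elastic's signing key, writes the source list, and refreshes the package index.
add_elastic_repo() {
  local major="${1:-8}"   # assumption: the 8.x release line
  wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch |
    sudo gpg --dearmor -o "$ELASTIC_KEYRING" &&
    elastic_source_line "$major" |
    sudo tee "/etc/apt/sources.list.d/elastic-${major}.x.list" > /dev/null &&
    sudo apt update
}
```

For example, `add_elastic_repo 8 && sudo apt install logstash`.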

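Logstash is not in Ubuntu's default repositories. After adding Elastic's APT repository (see Elastic's Logstash installation guide), install it with:

sudo apt install logstash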

Configure Logstash to Send Logs to Kafka

Create a Logstash configuration file that reads log files and sends them to Kafka. For example, create a file named logstash.conf:

nano logstash.conf

Add the following content:

input {
  file {
    path => "/var/log/syslog"
    start_position => "beginning"
  }
}

output {
  kafka {
    bootstrap_servers => "localhost:9092"
    topic_id => "logs_topic"
  }
}

This configuration reads logs from /var/log/syslog and sends them to Kafka under the topic logs_topic.
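Before starting the pipeline, Logstash can check a configuration file for syntax errors with its --config.test_and_exit option. This hypothetical wrapper assumes the Ubuntu package's install location, /usr/share/logstash/bin/logstash (override LOGSTASH_BIN if yours differs), and runs it through sudo by default; the LOGSTASH_BIN and SUDO variables are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: validate a Logstash pipeline file without starting it.
LOGSTASH_BIN="${LOGSTASH_BIN:-/usr/share/logstash/bin/logstash}"
SUDO="${SUDO-sudo}"   # set SUDO= (empty) to run without sudo

check_logstash_config() {
  local conf="$1"
  if [ ! -r "$conf" ]; then
    echo "cannot read $conf" >&2
    return 1
  fi
  # --config.test_and_exit parses the pipeline, reports errors, and exits.
  $SUDO "$LOGSTASH_BIN" -f "$conf" --config.test_and_exit
}
```

For example, `check_logstash_config logstash.conf` before starting Logstash catches typos such as an unclosed brace.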

Start Logstash

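Start Logstash with your configuration. The package installs the logstash executable in /usr/share/logstash/bin, which is not on the PATH by default, so call it by its full path:

sudo /usr/share/logstash/bin/logstash -f logstash.conf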

Logstash will now stream your server logs to Kafka.

Consume Logs from Kafka

You can consume logs from Kafka by running:

kafka-console-consumer.sh --topic logs_topic --bootstrap-server localhost:9092 --from-beginning

This displays the logs streamed by Logstash in real time.
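For monitoring, you usually want only new lines that match a pattern rather than the whole backlog. A minimal sketch, assuming the logs_topic topic from the Logstash example: it omits --from-beginning so only newly arriving messages are shown, and filters them case-insensitively with grep. The function name and the CONSUMER and BOOTSTRAP variables are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical filter: stream new messages from a topic and print only matching lines.
CONSUMER="${CONSUMER:-kafka-console-consumer.sh}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

watch_logs() {
  local pattern="${1:-error}" topic="${2:-logs_topic}"
  # No --from-beginning: only messages produced after the consumer starts are shown.
  "$CONSUMER" --topic "$topic" --bootstrap-server "$BOOTSTRAP" | grep -i -- "$pattern"
}
```

For example, `watch_logs 'failed password'` shows failed SSH logins as they reach Kafka; press Ctrl+C to stop.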

You now have Apache Kafka installed on your Ubuntu VPS, along with a practical use case of real-time log monitoring. Kafka’s ability to handle high-throughput, real-time data streams makes it ideal for applications like log analysis, monitoring, and event-driven architectures.

