Apache Kafka is a robust, distributed messaging system that enables real-time data streaming at scale. In this guide, you will learn how to install Kafka on an Ubuntu VPS and explore a practical use case, demonstrating Kafka's effectiveness in handling large-scale data pipelines.
Edited: 12-10-2024
Installing Apache Kafka on an Ubuntu VPS and Using It for Real-Time Data Streaming
Apache Kafka is a distributed platform for managing real-time data streams at scale. It acts as a messaging layer that moves data between producers (applications that write messages) and consumers (applications that read them). This guide walks through installing Kafka on an Ubuntu VPS and then builds a practical example, real-time log monitoring, to show how Kafka fits into a high-throughput data pipeline.
Update Your Ubuntu System
Begin by ensuring your Ubuntu VPS is up to date:
sudo apt update
sudo apt upgrade
Install Java
Kafka requires Java, so you'll need to install the Java Development Kit (JDK). Run the following command:
sudo apt install openjdk-11-jdk -y
Verify the installation by checking the Java version:
java -version
You should see output similar to:
openjdk version "11.0.11"
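To check the Java version from a script rather than by eye, the version string can be reduced to its major number. The sketch below is one way to do that. The function names java_major and installed_java_version are made up for this example, and the minimum of 11 simply mirrors the JDK installed in this guide.

```shell
# Reduce a Java version string to its major number.
# Handles both the modern scheme ("11.0.11") and the legacy one ("1.8.0_292").
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;          # legacy scheme: drop the leading "1."
  esac
  printf '%s\n' "${v%%[!0-9]*}"  # keep only the leading digits
}

# Print the quoted version from `java -version`, which writes to stderr.
installed_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Only run the check when java is actually installed.
if command -v java >/dev/null 2>&1; then
  major="$(java_major "$(installed_java_version)")"
  if [ "${major:-0}" -ge 11 ]; then
    echo "Java $major found"
  else
    echo "Java 11 or newer expected, found: ${major:-unknown}" >&2
  fi
fi
```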
Download and Install Apache Kafka
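Download Kafka from the official Apache download server. This guide uses version 3.6.0 (built for Scala 2.13); check the Kafka downloads page for the current release and adjust the version in the commands that follow. Older releases are moved from downloads.apache.org to archive.apache.org/dist/kafka/, so if the URL returns a 404 error, fetch the same file from the archive instead:
wget https://downloads.apache.org/kafka/3.6.0/kafka_2.13-3.6.0.tgz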
Extract the downloaded Kafka archive:
tar -xzf kafka_2.13-3.6.0.tgz
Move the extracted files to a more accessible directory:
sudo mv kafka_2.13-3.6.0 /opt/kafka
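Since the version number appears in the URL, the archive name, and the extracted directory, it can help to derive all three from one place. The following is a minimal sketch under that assumption; the variable names KAFKA_VERSION and SCALA_VERSION and the helper functions are illustrative, not part of Kafka.

```shell
# Assumed defaults: the versions used in this guide.
KAFKA_VERSION="${KAFKA_VERSION:-3.6.0}"
SCALA_VERSION="${SCALA_VERSION:-2.13}"

# Name of the extracted directory, e.g. kafka_2.13-3.6.0
kafka_dirname() {
  printf 'kafka_%s-%s\n' "$SCALA_VERSION" "$KAFKA_VERSION"
}

# Download URL for the matching .tgz archive.
kafka_url() {
  printf 'https://downloads.apache.org/kafka/%s/%s.tgz\n' "$KAFKA_VERSION" "$(kafka_dirname)"
}

# Example (not run here):
#   wget "$(kafka_url)"
#   tar -xzf "$(kafka_dirname).tgz"
#   sudo mv "$(kafka_dirname)" /opt/kafka
```

Changing KAFKA_VERSION in one place then keeps the download, extract, and move steps in sync.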
Configure Kafka Environment Variables
To make Kafka easier to run, add its binaries to your system's PATH. Edit your .bashrc file:
nano ~/.bashrc
Add the following lines at the end:
# Kafka Environment Variables
export KAFKA_HOME=/opt/kafka
export PATH=$PATH:$KAFKA_HOME/bin
Save and exit the file, then apply the changes:
source ~/.bashrc
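If you prefer to script this step instead of editing the file by hand, the lines can be appended only when they are missing, so that running the setup twice does not duplicate them. This is one possible approach; append_once is a hypothetical helper name, not a standard command.

```shell
# Append a line to a file only if that exact line is not already present.
#   $1 = file, $2 = line
append_once() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example (not run here):
#   append_once ~/.bashrc 'export KAFKA_HOME=/opt/kafka'
#   append_once ~/.bashrc 'export PATH=$PATH:$KAFKA_HOME/bin'
#   source ~/.bashrc
```

The single quotes in the example keep $PATH and $KAFKA_HOME unexpanded, so they are written to the file literally and resolved each time the shell starts.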
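Install and Configure ZooKeeper
Kafka uses ZooKeeper to store cluster metadata and coordinate brokers. ZooKeeper ships with the Kafka distribution, so you only need to start it. The bundled config/zookeeper.properties keeps its data in /tmp/zookeeper by default. To store the data in a persistent location instead, create a directory that your user can write to:
sudo mkdir -p /var/lib/zookeeper
sudo chown "$USER" /var/lib/zookeeper
Then set dataDir=/var/lib/zookeeper in $KAFKA_HOME/config/zookeeper.properties. Now, start ZooKeeper with the following command:
zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
This command runs in the foreground, so leave the terminal open, or pass -daemon as the first argument to run it in the background. ZooKeeper listens on port 2181 by default.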
Start Apache Kafka
With ZooKeeper running, you can now start the Kafka broker. The command runs in the foreground, so use a second terminal, or pass -daemon as the first argument to run it in the background:
kafka-server-start.sh $KAFKA_HOME/config/server.properties
Kafka listens on port 9092 by default.
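The broker needs a few seconds before it accepts connections, so a script that continues straight to the next step may fail. A rough way to wait is to poll the port. The sketch below relies on bash's /dev/tcp feature (it will not work in plain sh), and the names port_open and wait_for_port are invented for this example.

```shell
# Bash-specific: /dev/tcp/HOST/PORT is a bash feature, not a real file.
# Succeeds if a TCP connection to host $1, port $2 can be opened.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Poll until the port opens or the attempts run out.
#   $1 = host, $2 = port, $3 = maximum number of attempts (one per second)
wait_for_port() {
  tries=0
  until port_open "$1" "$2"; do
    tries=$((tries + 1))
    [ "$tries" -ge "$3" ] && return 1
    sleep 1
  done
}

# Example (not run here):
#   wait_for_port localhost 2181 30 && echo "ZooKeeper is up"
#   wait_for_port localhost 9092 30 && echo "Kafka is up"
```

An open port only shows that the process is listening; it does not prove the broker has finished registering with ZooKeeper.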
Create Kafka Topics
Kafka organizes its messages into topics. You can create a topic using the following command:
kafka-topics.sh --create --topic test_topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
This creates a new Kafka topic named test_topic.
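Kafka only accepts topic names made of ASCII letters, digits, periods, underscores, and hyphens, up to 249 characters, and rejects the names "." and "..". A small pre-check along these lines can catch a bad name before kafka-topics.sh is called; valid_topic_name is a hypothetical helper written for this guide.

```shell
# Return 0 if $1 looks like a legal Kafka topic name, 1 otherwise.
valid_topic_name() {
  name="$1"
  [ -n "$name" ] || return 1                        # must not be empty
  [ "${#name}" -le 249 ] || return 1                # length limit
  [ "$name" != "." ] && [ "$name" != ".." ] || return 1
  case "$name" in
    *[!A-Za-z0-9._-]*) return 1 ;;                  # any other character is illegal
  esac
  return 0
}

# Example (not run here):
#   valid_topic_name test_topic && kafka-topics.sh --create --topic test_topic \
#     --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
```

Kafka also warns when a name contains a period or an underscore, because the two characters can collide in metric names, so it is safest to stick to one of them across your topics.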
Test Kafka Installation
To test Kafka, open two terminal windows: one for producing messages and the other for consuming messages.
Terminal 1: Kafka Producer
Run the following command to start producing messages to test_topic:
kafka-console-producer.sh --topic test_topic --bootstrap-server localhost:9092
Type a message and press Enter to send it to Kafka; each line is sent as one message.
Terminal 2: Kafka Consumer
Run the following command in a separate terminal to consume messages from test_topic:
kafka-console-consumer.sh --topic test_topic --from-beginning --bootstrap-server localhost:9092
The messages you typed in the producer terminal appear here.
Use Case Example: Real-Time Log Monitoring with Kafka
A common use case for Kafka is real-time log monitoring. Here’s how you can implement it:
Install Logstash
Logstash is a data processing pipeline that can ingest data from various sources, process it, and send it to different destinations, such as Kafka.
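Logstash is not part of Ubuntu's default repositories, so the following command assumes you have already added the Elastic APT repository to your system:
sudo apt install logstash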
Configure Logstash to Send Logs to Kafka
Create a Logstash configuration file that reads log files and sends them to Kafka. For example, create a file named logstash.conf:
nano logstash.conf
Add the following content:
input {
  file {
    path => "/var/log/syslog"
    start_position => "beginning"
  }
}

output {
  kafka {
    bootstrap_servers => "localhost:9092"
    topic_id => "logs_topic"
  }
}
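This configuration reads logs from /var/log/syslog and sends them to Kafka under the topic logs_topic. With the broker's default settings, logs_topic is created automatically the first time Logstash writes to it.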
Start Logstash
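Start Logstash with your configuration. If the logstash command is not found, note that the Debian package installs the binary at /usr/share/logstash/bin/logstash:
sudo logstash -f logstash.conf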
Logstash will now stream your server logs to Kafka.
Consume Logs from Kafka
You can consume logs from Kafka by running:
kafka-console-consumer.sh --topic logs_topic --bootstrap-server localhost:9092 --from-beginning
This displays the logs streamed by Logstash in real time.
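For monitoring, you usually want only the interesting lines rather than the whole stream. One simple option is to pipe the consumer through grep. The sketch below is an illustration only: only_errors is a made-up helper, and the keyword list is an assumption you should adapt to your own logs.

```shell
# Pass through only lines that mention a problem (case-insensitive).
# --line-buffered makes grep print each match immediately instead of
# waiting for its output buffer to fill, which matters for live streams.
only_errors() {
  grep --line-buffered -iE 'error|fail|critical'
}

# Example (not run here):
#   kafka-console-consumer.sh --topic logs_topic \
#     --bootstrap-server localhost:9092 | only_errors
```

Leaving out --from-beginning in the example means the consumer shows only new log lines, which is usually what you want for live monitoring.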
You now have Apache Kafka installed on your Ubuntu VPS, along with a practical use case of real-time log monitoring. Kafka’s ability to handle high-throughput, real-time data streams makes it ideal for applications like log analysis, monitoring, and event-driven architectures.