Building a Distributed Logging System with Flume: A Step-by-Step Guide
Overview
In this article, we will walk you through the process of building a distributed logging system using Flume, a popular open-source data collection and aggregation tool. We will cover the installation, configuration, and testing of Flume on a CentOS 7.0 system with Java 1.8.
Prerequisites
- CentOS 7.0
- Java 1.8
- Flume 1.7.0
Step 1: Download and Install Flume
To download Flume, visit the official download page at http://flume.apache.org/download.html. At the time of writing, the latest version is apache-flume-1.7.0-bin.tar.gz. Once downloaded, extract the archive into the /usr/local folder and rename the extracted directory to flume170.
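The extract-and-rename step can be sketched as a small shell helper. This is a minimal sketch, not an official installer: install_flume is a hypothetical function name, and the archive.apache.org URL in the usage comment is an assumption about where the 1.7.0 tarball is mirrored. It also assumes the tarball unpacks to a directory named after the archive, as the Apache binary releases normally do.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Flume): unpack a Flume binary tarball
# under a prefix and rename the extracted directory to flume170.
install_flume() {
    local tarball="$1"               # e.g. apache-flume-1.7.0-bin.tar.gz
    local prefix="${2:-/usr/local}"  # install location used in this guide
    tar -xzf "$tarball" -C "$prefix" || return 1
    # Assumption: the archive unpacks to a directory named after the tarball.
    mv "$prefix/$(basename "$tarball" .tar.gz)" "$prefix/flume170"
}

# Usage (needs write access to /usr/local; the URL is an assumption):
#   wget https://archive.apache.org/dist/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz
#   install_flume apache-flume-1.7.0-bin.tar.gz
```

Wrapping the two commands in a function keeps the target prefix as a parameter, so the same steps can be tried in a scratch directory first.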
Step 2: Configure the Environment
To configure the environment, edit the flume-env.sh file in Flume's conf directory and set the JAVA_HOME variable to /usr/lib/jvm/java8. Then make Flume available globally by adding the FLUME and PATH variables to the profile file.
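As an illustration, the lines involved might look like the following. This is a sketch under assumptions: the article does not show the exact variable definitions, so the name FLUME_HOME (a common convention for the Flume variable) and the choice of /etc/profile as the profile file are both assumed here.

```shell
# --- /usr/local/flume170/conf/flume-env.sh ---
# (if the file does not exist yet, copy it from flume-env.sh.template)
export JAVA_HOME=/usr/lib/jvm/java8

# --- /etc/profile (assumed; a per-user profile works the same way) ---
# FLUME_HOME is an assumed name for the Flume variable mentioned above.
export FLUME_HOME=/usr/local/flume170
export PATH="$PATH:$FLUME_HOME/bin"
```

After editing the profile file, open a new shell or run `source /etc/profile` so the current session picks up the change.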
Step 3: Verify the Installation
To verify that the installation was successful, run the following command:
flume-ng version
This should display the version of Flume installed on your system.
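If the command is not found, the usual cause is that the PATH change has not taken effect in the current shell. A small check along these lines can make that explicit; check_flume is a hypothetical helper name, not a Flume command.

```shell
# Hypothetical helper: print the Flume version if flume-ng is on PATH,
# otherwise explain what to re-check and return a non-zero status.
check_flume() {
    if command -v flume-ng >/dev/null 2>&1; then
        flume-ng version
    else
        echo "flume-ng not found on PATH; re-check the profile changes" >&2
        return 1
    fi
}

# Usage:
#   check_flume
```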
Step 4: Test a Small Example
To test a small example, we will create a spooling directory and write a spool.conf file that tells Flume to monitor the directory and read files from it. The spool.conf file should contain the following configuration:
# Name the components of this agent
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Describe the source
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /usr/local/flume170/logs
a1.sources.r1.fileHeader = true
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
Make sure the spooling directory /usr/local/flume170/logs exists, then start the Flume agent using the following command:
flume-ng agent -c /usr/local/flume170/conf -f /usr/local/flume170/conf/spool.conf -n a1 -Dflume.root.logger=INFO,console
Add a file to the /usr/local/flume170/logs directory:
echo "spool test1" > /usr/local/flume170/logs/spool_text.log
In the console, you should see the following information:
14/08/10 11:37:13 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:13 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:14 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /usr/local/flume170/logs/spool_text.log to /usr/local/flume170/logs/spool_text.log.COMPLETED
14/08/10 11:37:14 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:14 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:14 INFO sink.LoggerSink: Event: {headers: {file = /usr/local/flume170/logs/spool_text.log} body: 73 70 6F 6F 6C 20 74 65 73 74 31 spool test1}
14/08/10 11:37:15 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:15 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:16 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:16 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:17 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
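This shows that the Flume agent read the file from the spooling directory, delivered its contents to the logger sink as an event, and prepared to rename the processed file with a .COMPLETED suffix. The event body is printed as hexadecimal bytes followed by the decoded text, spool test1.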
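Writing directly into the spooling directory with echo works for a one-line test, but the spooling directory source expects files to be complete and unchanged once they appear there, and it expects file names not to be reused. A cautious pattern is to write the file somewhere else first and then move it into place. The sketch below illustrates that idea; spool_drop is a hypothetical helper, and staging the file in the parent of the spooling directory is an assumption made so that the final mv stays on one filesystem.

```shell
# Hypothetical helper: read content from stdin, stage it outside the
# spooling directory, then move the finished file in under a unique name.
spool_drop() {
    local spool_dir="$1" name="$2" staging
    mkdir -p "$spool_dir" || return 1
    # Stage next to the spooling directory so the mv is a same-filesystem rename.
    staging=$(mktemp "$(dirname "$spool_dir")/.staging.XXXXXX") || return 1
    cat > "$staging"
    # Reuse mktemp's random suffix so repeated drops get distinct file names.
    mv "$staging" "$spool_dir/$name.${staging##*.}"
}

# Usage:
#   echo "spool test1" | spool_drop /usr/local/flume170/logs spool_text.log
```

With this approach, Flume only ever sees a fully written file, and dropping the same logical file twice does not collide with an earlier name.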
Conclusion
In this article, we have walked through the process of building a distributed logging system using Flume on a CentOS 7.0 system with Java 1.8. We have covered the installation, configuration, and testing of Flume, and demonstrated how to use a spool.conf file to monitor a directory and read files from it. This is just a small example of what can be achieved with Flume, and we encourage you to explore its capabilities further.