Setting Up Hadoop on macOS for Big Data Learning

This article is a step-by-step guide to installing and configuring Hadoop on macOS for big data learning. It covers installing the JDK and Hadoop, running a job in stand-alone mode, and configuring core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml to run HDFS and YARN on a single machine.

Installation Manual

1. JDK Installation

First, install a JDK on the macOS system. This guide uses JDK 1.7.0_80, installed with the brew package manager. Confirm the installation with java -version:

WZB-MacBook: 50_bigdata wangzhibin $ java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

Set the JAVA_HOME environment variable to the path of the installed JDK, then confirm its value:

WZB-MacBook: 50_bigdata wangzhibin $ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home
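The JDK path does not have to be typed by hand. The following is a minimal sketch, assuming the macOS helper /usr/libexec/java_home is present; the function name resolve_java_home is only illustrative, and the fallback to an already-set JAVA_HOME is an assumption for machines that lack the helper.

```shell
# resolve_java_home: print a JDK home directory (illustrative helper).
# $1 (optional) = version filter passed to java_home -v, e.g. 1.7
resolve_java_home() {
  if [ -x /usr/libexec/java_home ]; then
    # macOS helper; -v restricts the lookup to the given version
    if [ -n "$1" ]; then
      /usr/libexec/java_home -v "$1" 2>/dev/null && return 0
    else
      /usr/libexec/java_home 2>/dev/null && return 0
    fi
  fi
  # Fallback (assumption): reuse whatever JAVA_HOME is already set
  [ -n "$JAVA_HOME" ] && printf '%s\n' "$JAVA_HOME"
}

# Usage:
#   export JAVA_HOME="$(resolve_java_home 1.7)"
```

Placing the export line in ~/.bash_profile keeps the value available in new terminal sessions.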

2. Download Hadoop

Next, we will download Hadoop 2.8.4 using wget.

WZB-MacBook: 50_bigdata wangzhibin $ wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/core/hadoop-2.8.4/hadoop-2.8.4.tar.gz

3. Installation and Configuration of Hadoop

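Extract the downloaded archive in the download directory, then change into the resulting hadoop-2.8.4 directory. The remaining steps are run from there.

WZB-MacBook: 50_bigdata wangzhibin $ tar -zxvf hadoop-2.8.4.tar.gz
WZB-MacBook: 50_bigdata wangzhibin $ cd hadoop-2.8.4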

Point Hadoop at the JDK by editing etc/hadoop/hadoop-env.sh. Replace the default JAVA_HOME line (shown commented out below) with the explicit JDK path:

WZB-MacBook: hadoop-2.8.4 wangzhibin $ vi etc/hadoop/hadoop-env.sh
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home

4. Stand-alone Mode Execution

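In stand-alone mode Hadoop runs as a single Java process against the local file system, which is a quick way to check the installation. Create a directory called input, copy Hadoop's XML configuration files into it as sample data, and run the bundled grep example. The job counts the strings that match the regular expression dfs[a-zA-Z.]+ and writes the result to the output directory, which must not exist before the job starts.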

WZB-MacBook: hadoop-2.8.4 wangzhibin $ mkdir input
WZB-MacBook: hadoop-2.8.4 wangzhibin $ cp etc/hadoop/*.xml input
WZB-MacBook: hadoop-2.8.4 wangzhibin $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar grep input output 'dfs[a-zA-Z.]+'
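To build intuition for what the grep example computes, the sketch below approximates it with standard command-line tools: it extracts every match of the regular expression, counts the occurrences of each distinct match, and sorts by count. The function name count_matches is hypothetical, and the output layout is only an approximation of what the Hadoop job writes.

```shell
# count_matches: rough local equivalent of the Hadoop grep example.
# $1 = directory containing the input files
# $2 = extended regular expression to search for
count_matches() {
  # -o prints only the matched text, -h omits file names, -E enables extended regex
  grep -ohE -- "$2" "$1"/* |
    sort |
    uniq -c |
    sort -rn |
    awk '{ print $1 "\t" $2 }'   # normalize to "count<TAB>match"
}

# Usage, from the Hadoop directory:
#   count_matches input 'dfs[a-zA-Z.]+'
# The result of the Hadoop job itself can be viewed with:
#   cat output/*
```

Comparing the two listings is a simple sanity check that the stand-alone job processed the input files.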

5. Configuring core-site.xml

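Create the directory hdfs/tmp inside the Hadoop directory to serve as Hadoop's temporary directory, then add the following configuration to core-site.xml. The hadoop.tmp.dir value must be the absolute path of that directory, and fs.defaultFS sets the address of the default file system (HDFS on localhost, port 9000):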

WZB-MacBook: hadoop-2.8.4 wangzhibin $ mkdir -p hdfs/tmp
WZB-MacBook: hadoop-2.8.4 wangzhibin $ vi etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
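Because the absolute path differs on every machine, the file can also be generated instead of edited by hand. The following is a minimal sketch under that assumption; write_core_site is a hypothetical helper, and it overwrites any existing core-site.xml, so use it only on a fresh installation.

```shell
# write_core_site: generate core-site.xml for a given Hadoop directory.
# $1 = Hadoop install directory (defaults to the current directory)
write_core_site() {
  # Resolve to an absolute path, since hadoop.tmp.dir must be absolute
  hadoop_dir=$(cd "${1:-.}" && pwd) || return 1
  mkdir -p "$hadoop_dir/hdfs/tmp" "$hadoop_dir/etc/hadoop"
  # Note: this replaces the existing file
  cat > "$hadoop_dir/etc/hadoop/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$hadoop_dir/hdfs/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}

# Usage, from the Hadoop directory:
#   write_core_site .
```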

6. Configuring hdfs-site.xml

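Add the following configuration to hdfs-site.xml. The replication factor is set to 1 because a single-machine setup has only one DataNode, and the NameNode and DataNode storage directories are placed under the hdfs/tmp directory created earlier: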

WZB-MacBook: hadoop-2.8.4 wangzhibin $ vi etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp/dfs/data</value>
  </property>
</configuration>
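A quick way to double-check what a configuration file actually contains is to read a property back from it. The sketch below is a line-based lookup written with awk, not a real XML parser; it assumes each name and value element sits on a single line, and get_prop is a hypothetical helper name.

```shell
# get_prop: print the value of one property from a Hadoop XML config file.
# $1 = config file, $2 = property name
get_prop() {
  awk -v key="$2" '
    # Remember the most recent <name>...</name> content
    /<name>/ { n = $0; gsub(/.*<name>|<\/name>.*/, "", n) }
    # Print the <value> that belongs to the requested name, then stop
    /<value>/ && n == key {
      v = $0
      gsub(/.*<value>|<\/value>.*/, "", v)
      print v
      exit
    }
  ' "$1"
}

# Usage, from the Hadoop directory:
#   get_prop etc/hadoop/hdfs-site.xml dfs.replication
```

Hadoop's own hdfs getconf -confKey command reports the effective value of a key and is the more reliable check once the hdfs command is working.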

7. Start and Stop Hadoop

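Format the NameNode once before the first start, then start HDFS with start-dfs.sh and stop it with stop-dfs.sh. Formatting erases existing HDFS metadata, so do not repeat it on a file system that already holds data.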

WZB-MacBook: hadoop-2.8.4 wangzhibin $ ./bin/hdfs namenode -format
WZB-MacBook: hadoop-2.8.4 wangzhibin $ ./sbin/start-dfs.sh
WZB-MacBook: hadoop-2.8.4 wangzhibin $ ./sbin/stop-dfs.sh
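Since formatting is a one-time step, a small guard can prevent it from being run again by accident. The following sketch relies on the fact that a formatted NameNode directory contains a current/VERSION file; the function name namenode_is_formatted is hypothetical, and the path in the usage comment assumes the dfs.namenode.name.dir value configured above.

```shell
# namenode_is_formatted: succeed if the NameNode directory already holds metadata.
# $1 = the directory configured as dfs.namenode.name.dir
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Usage, from the Hadoop directory (format only when no metadata exists yet):
#   namenode_is_formatted hdfs/tmp/dfs/name || ./bin/hdfs namenode -format
```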

8. Configuration of .bash_profile

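Add the following lines to ~/.bash_profile so that the hadoop command is available from any directory. Note that HADOOP_HOME here points to a directory named hadoop rather than hadoop-2.8.4; set it to wherever your Hadoop installation actually lives (for example, if you renamed or symlinked the extracted directory).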

WZB-MacBook: hadoop-2.8.4 wangzhibin $ vi ~/.bash_profile
# Set hadoop
export HADOOP_HOME=/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
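If you prefer to script this step, the lines can be appended without creating duplicates on repeated runs. This is a minimal sketch; add_line_once is a hypothetical helper, and the HADOOP_HOME path in the usage comment is the one used in this guide.

```shell
# add_line_once: append a line to a file unless that exact line is already present.
# $1 = file, $2 = line to add
add_line_once() {
  touch "$1"
  # -x matches whole lines only, -F treats the text as a fixed string
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Usage (single quotes keep $PATH and $HADOOP_HOME unexpanded in the file):
#   add_line_once ~/.bash_profile 'export HADOOP_HOME=/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop'
#   add_line_once ~/.bash_profile 'export PATH=$PATH:$HADOOP_HOME/bin'
#   source ~/.bash_profile
```

Running source ~/.bash_profile, or opening a new terminal, makes the change take effect in the current session.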

9. Start HDFS

To start HDFS, we will use the following command:

WZB-MacBook: hadoop-2.8.4 wangzhibin $ ./sbin/start-dfs.sh

10. Stop HDFS

To stop HDFS, we will use the following command:

WZB-MacBook: hadoop-2.8.4 wangzhibin $ ./sbin/stop-dfs.sh

11. Verify HDFS

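To verify HDFS, make sure it is running (start it again if you stopped it in the previous step) and that the updated .bash_profile has been loaded. Then create a test directory and list the root of the file system: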

WZB-MacBook: hadoop wangzhibin $ hadoop fs -mkdir /test
WZB-MacBook: hadoop wangzhibin $ hadoop fs -ls /
Found 1 items
drwxr-xr-x - wangzhibin supergroup 0 2019-05-16 11:26 /test

12. Start YARN

To start YARN, we will use the following command:

WZB-MacBook: hadoop wangzhibin $ ./sbin/start-yarn.sh

13. Stop YARN

To stop YARN, we will use the following command:

WZB-MacBook: hadoop wangzhibin $ ./sbin/stop-yarn.sh

14. Configuration of mapred-site.xml and yarn-site.xml

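Add the following configuration to mapred-site.xml and yarn-site.xml so that MapReduce jobs run on YARN. In Hadoop 2.x releases, etc/hadoop usually contains only mapred-site.xml.template; if mapred-site.xml does not exist, copy the template to that name first.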

WZB-MacBook: hadoop-2.8.4 wangzhibin $ vi etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

WZB-MacBook: hadoop-2.8.4 wangzhibin $ vi etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

15. Start and Stop YARN

With the configuration in place, start YARN with the first command below so that the new settings are picked up; use the second command to stop it:

WZB-MacBook: hadoop wangzhibin $ ./sbin/start-yarn.sh
WZB-MacBook: hadoop wangzhibin $ ./sbin/stop-yarn.sh

16. Verify YARN

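To verify YARN, list the running Java processes with jps. ResourceManager and NodeManager should appear, alongside the HDFS daemons (NameNode, DataNode, SecondaryNameNode) if HDFS is also running: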

WZB-MacBook: hadoop wangzhibin $ jps
534 NutstoreGUI
49135 DataNode
49834 ResourceManager
49234 SecondaryNameNode
49973 Jps
67596 4912 NodeManager
49057 NameNode
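Reading the jps listing by eye works, but the check can also be scripted. The sketch below reads jps output on standard input and reports which of the five expected daemons are absent; the function names has_daemon and missing_daemons are hypothetical, and the daemon list is the one shown in the listing above.

```shell
# has_daemon: read jps output on stdin; succeed if the named process is listed.
# $1 = process name, e.g. NameNode
has_daemon() {
  # Anchored match, so "NameNode" does not also match "SecondaryNameNode"
  grep -qE "^[0-9]+ $1\$"
}

# missing_daemons: read jps output on stdin; print each expected daemon that is absent.
missing_daemons() {
  listing=$(cat)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    printf '%s\n' "$listing" | has_daemon "$d" || printf '%s\n' "$d"
  done
}

# Usage:
#   jps | missing_daemons
```

An empty result means all five daemons are running; any name that is printed points to the service whose log files are worth inspecting.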

This concludes the installation and configuration of Hadoop on macOS for big data learning.