Setting Up Hadoop on macOS for Big Data Learning
Ready to Work: Installation Manual
This article is a step-by-step guide to installing and configuring Hadoop on macOS for big data learning. It covers installing the JDK and Hadoop, running a stand-alone example, and configuring core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
Installation Manual
1. JDK Installation
First, install the JDK on macOS. This guide uses JDK 1.7.0_80, installed with the brew package manager. Running java -version confirms which version is active:
WZB-MacBook: 50_bigdata wangzhibin $ java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
The JAVA_HOME environment variable must point to the installed JDK. Check its current value with echo:
WZB-MacBook: 50_bigdata wangzhibin $ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home
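On macOS the JDK path does not have to be typed by hand. As a minimal sketch, the snippet below asks the system helper /usr/libexec/java_home for a 1.7 JDK and falls back to the path shown above when the helper is missing or finds nothing. The names resolve_java_home and DEFAULT_JAVA_HOME are illustrative only and are not part of Hadoop or the JDK.

```shell
#!/usr/bin/env bash
# Fallback path copied from the article; adjust it to your own JDK location.
DEFAULT_JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home"

# resolve_java_home is a hypothetical helper name used for this sketch.
resolve_java_home() {
  local found
  # /usr/libexec/java_home is a macOS tool; -v requests a specific version.
  if [ -x /usr/libexec/java_home ]; then
    if found="$(/usr/libexec/java_home -v 1.7 2>/dev/null)" && [ -n "$found" ]; then
      echo "$found"
      return 0
    fi
  fi
  # Helper unavailable or no matching JDK: use the article's path.
  echo "$DEFAULT_JAVA_HOME"
}

export JAVA_HOME="$(resolve_java_home)"
echo "JAVA_HOME=$JAVA_HOME"
```

Placing the export line in ~/.bash_profile would make the setting persistent across terminal sessions.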
2. Download Hadoop
Next, we will download Hadoop 2.8.4 using wget.
WZB-MacBook: 50_bigdata wangzhibin $ wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/core/hadoop-2.8.4/hadoop-2.8.4.tar.gz
3. Installation and Configuration of Hadoop
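Now unpack the archive and change into the extracted directory. All remaining commands are run from there.
WZB-MacBook: 50_bigdata wangzhibin $ tar -zxvf hadoop-2.8.4.tar.gz
WZB-MacBook: 50_bigdata wangzhibin $ cd hadoop-2.8.4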
Next, set the JDK path explicitly in hadoop-env.sh by replacing the default JAVA_HOME line (shown commented out) with the absolute path:
WZB-MacBook: hadoop-2.8.4 wangzhibin $ vi etc/hadoop/hadoop-env.sh
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home
4. Stand-alone Mode Execution
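To try Hadoop in stand-alone mode, create a directory called input, copy the XML configuration files into it as sample data, and run the bundled grep example. The job counts every match of the regular expression dfs[a-zA-Z.]+ and writes the results to a directory called output, which must not exist before the run.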
WZB-MacBook: hadoop-2.8.4 wangzhibin $ mkdir input
WZB-MacBook: hadoop-2.8.4 wangzhibin $ cp etc/hadoop/*.xml input
WZB-MacBook: hadoop-2.8.4 wangzhibin $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar grep input output 'dfs[a-zA-Z.]+'
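When the job finishes, its results can be printed with cat output/*. To make clear what the example computes, the sketch below reproduces the same counting with standard Unix tools on a small made-up input. The sample file content and the temporary directory are assumptions for illustration, and no Hadoop installation is needed to run it.

```shell
#!/usr/bin/env bash
# Build a tiny sample input (hypothetical content) in a temporary directory.
workdir="$(mktemp -d "${TMPDIR:-/tmp}/grep-demo.XXXXXX")"
mkdir "$workdir/input"
printf '%s\n' \
  '<name>dfs.replication</name>' \
  '<name>dfs.replication</name>' \
  '<name>dfs.namenode.name.dir</name>' > "$workdir/input/sample.xml"

# Extract every match of the regex, then count occurrences per distinct match.
# -o prints only the matched text, -h omits file names, -E enables extended regex.
counts="$(grep -ohE 'dfs[a-zA-Z.]+' "$workdir"/input/*.xml | sort | uniq -c | sort -rn)"
echo "$counts"

rm -rf "$workdir"
```

Each output line holds a count followed by the matched string, most frequent first, which is the same shape as the contents of the job's output directory.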
5. Configuring core-site.xml
We will create a new directory called hdfs/tmp and add the following configuration to the core-site.xml file:
WZB-MacBook: hadoop-2.8.4 wangzhibin $ mkdir -p hdfs/tmp
WZB-MacBook: hadoop-2.8.4 wangzhibin $ vi etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
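The absolute path in hadoop.tmp.dir is specific to one machine. As a hedged sketch, the file can instead be generated from the current directory so the path always matches your own layout. HADOOP_DIR and OUT are assumed variable names introduced for this example. By default the result goes to a scratch file, so nothing in the Hadoop directory is overwritten. Note that Hadoop expects the element names in lowercase.

```shell
#!/usr/bin/env bash
# HADOOP_DIR is an assumption: the unpacked hadoop-2.8.4 directory
# (defaults to the current directory).
HADOOP_DIR="${HADOOP_DIR:-$PWD}"
# OUT is where the file is written; set OUT="$HADOOP_DIR/etc/hadoop/core-site.xml"
# to apply the result for real.
OUT="${OUT:-$(mktemp "${TMPDIR:-/tmp}/core-site.XXXXXX")}"

# The unquoted heredoc delimiter lets the shell expand $HADOOP_DIR.
cat > "$OUT" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$HADOOP_DIR/hdfs/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

echo "wrote $OUT"
```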
6. Configuring hdfs-site.xml
We will add the following configuration to the hdfs-site.xml file:
WZB-MacBook: hadoop-2.8.4 wangzhibin $ vi etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp/dfs/data</value>
</property>
</configuration>
7. Start and Stop Hadoop
Format the NameNode once before the first start (formatting again erases the existing HDFS metadata), then start and stop HDFS with the following commands:
WZB-MacBook: hadoop-2.8.4 wangzhibin $ ./bin/hdfs namenode -format
WZB-MacBook: hadoop-2.8.4 wangzhibin $ ./sbin/start-dfs.sh
WZB-MacBook: hadoop-2.8.4 wangzhibin $ ./sbin/stop-dfs.sh
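Because formatting is a one-time step, a small guard helps avoid repeating it by accident. The sketch below relies on the fact that a formatted NameNode directory contains a current/VERSION file. The helper name namenode_is_formatted and the default NAME_DIR are assumptions for this example. The script only reports what to do and does not run the format itself.

```shell
#!/usr/bin/env bash
# NAME_DIR mirrors dfs.namenode.name.dir; the default below is an assumption
# (relative to the hadoop-2.8.4 directory), so adjust it to your configuration.
NAME_DIR="${NAME_DIR:-hdfs/tmp/dfs/name}"

# A formatted NameNode directory contains current/VERSION.
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

if namenode_is_formatted "$NAME_DIR"; then
  echo "NameNode already formatted; skipping format"
else
  echo "NameNode not formatted yet; run: ./bin/hdfs namenode -format"
fi
```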
8. Configuration of .bash_profile
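Add the following lines to ~/.bash_profile so that the hadoop command is available from any directory. Note that HADOOP_HOME here ends in hadoop rather than hadoop-2.8.4, which presumably refers to a renamed or symlinked copy of the installation directory. Adjust the path to match your own layout.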
WZB-MacBook: hadoop-2.8.4 wangzhibin $ vi ~/.bash_profile
# Set hadoop
export HADOOP_HOME=/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
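Editing the profile by hand works, but re-running a setup script can easily append the same lines twice. As a minimal sketch, the helper below adds a line only when it is not already present. The name append_once and the PROFILE variable are hypothetical. PROFILE defaults to a scratch file so the example is safe to try.

```shell
#!/usr/bin/env bash
# PROFILE defaults to a scratch file so the sketch is safe to try;
# set PROFILE="$HOME/.bash_profile" to apply it for real.
PROFILE="${PROFILE:-$(mktemp "${TMPDIR:-/tmp}/profile.XXXXXX")}"

# append_once is a hypothetical helper: add a line only if the file
# does not already contain exactly that line (-x whole line, -F literal).
append_once() {
  grep -qxF -- "$1" "$PROFILE" 2>/dev/null || printf '%s\n' "$1" >> "$PROFILE"
}

# Single quotes keep $PATH and $HADOOP_HOME unexpanded in the profile.
append_once 'export HADOOP_HOME=/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop'
append_once 'export PATH=$PATH:$HADOOP_HOME/bin'
# Repeating a call leaves the file unchanged.
append_once 'export PATH=$PATH:$HADOOP_HOME/bin'

cat "$PROFILE"
```

After changing the real profile, running source ~/.bash_profile loads the new variables into the current terminal session.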
9. Start HDFS
To start HDFS, we will use the following command:
WZB-MacBook: hadoop-2.8.4 wangzhibin $ ./sbin/start-dfs.sh
10. Stop HDFS
To stop HDFS, we will use the following command:
WZB-MacBook: hadoop-2.8.4 wangzhibin $ ./sbin/stop-dfs.sh
11. Verify HDFS
To verify HDFS, make sure it is running, then create a directory and list the root of the file system:
WZB-MacBook: hadoop wangzhibin $ hadoop fs -mkdir /test
WZB-MacBook: hadoop wangzhibin $ hadoop fs -ls /
Found 1 items
drwxr-xr-x - wangzhibin supergroup 0 2019-05-16 11:26 /test
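Listing a directory shows that the NameNode responds. Writing and reading a file also exercises the DataNode. The sketch below does a small round trip with the standard hadoop fs subcommands -mkdir, -put, and -cat. The helper name hdfs_roundtrip and the file name hello.txt are assumptions for illustration, and the script skips the check when hadoop is not on the PATH.

```shell
#!/usr/bin/env bash
# hdfs_roundtrip is a hypothetical helper; its first argument is the command
# to invoke (default: hadoop), which also makes it easy to stub out.
hdfs_roundtrip() {
  local hadoop_cmd="${1:-hadoop}"
  local tmp rc
  tmp="$(mktemp "${TMPDIR:-/tmp}/hello.XXXXXX")" || return 1
  echo "hello hdfs" > "$tmp"
  # -p: no error if /test exists; -f: overwrite an existing destination file.
  "$hadoop_cmd" fs -mkdir -p /test &&
    "$hadoop_cmd" fs -put -f "$tmp" /test/hello.txt &&
    "$hadoop_cmd" fs -cat /test/hello.txt
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

if command -v hadoop >/dev/null 2>&1; then
  hdfs_roundtrip hadoop
else
  echo "hadoop not found on PATH; skipping the HDFS round trip"
fi
```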
12. Start YARN
To start YARN, we will use the following command:
WZB-MacBook: hadoop wangzhibin $ ./sbin/start-yarn.sh
13. Stop YARN
To stop YARN, we will use the following command:
WZB-MacBook: hadoop wangzhibin $ ./sbin/stop-yarn.sh
14. Configuration of mapred-site.xml and yarn-site.xml
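Add the following configuration to the mapred-site.xml and yarn-site.xml files. Hadoop 2.x ships only etc/hadoop/mapred-site.xml.template, so copy it to etc/hadoop/mapred-site.xml first if that file does not exist yet.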
WZB-MacBook: hadoop-2.8.4 wangzhibin $ vi etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
WZB-MacBook: hadoop-2.8.4 wangzhibin $ vi etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
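In Hadoop 2.x, mapred-site.xml is distributed as mapred-site.xml.template, so the file may need to be created before it can be edited. A hedged sketch of that step follows. CONF_DIR and ensure_mapred_site are names introduced for this example. The helper copies the template only when mapred-site.xml is missing, so an existing file is never overwritten.

```shell
#!/usr/bin/env bash
# CONF_DIR is an assumption: Hadoop's configuration directory, relative to
# the hadoop-2.8.4 directory.
CONF_DIR="${CONF_DIR:-etc/hadoop}"

# ensure_mapred_site is a hypothetical helper name for this sketch.
ensure_mapred_site() {
  local dir="$1"
  if [ -f "$dir/mapred-site.xml" ]; then
    echo "mapred-site.xml already present"
  elif [ -f "$dir/mapred-site.xml.template" ]; then
    cp "$dir/mapred-site.xml.template" "$dir/mapred-site.xml"
    echo "created mapred-site.xml from template"
  else
    echo "no mapred-site.xml or template found in $dir" >&2
    return 1
  fi
}

ensure_mapred_site "$CONF_DIR" || echo "check CONF_DIR and try again"
```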
15. Start and Stop YARN
After changing the configuration, restart YARN so that the new settings take effect. The start and stop commands are:
WZB-MacBook: hadoop wangzhibin $ ./sbin/start-yarn.sh
WZB-MacBook: hadoop wangzhibin $ ./sbin/stop-yarn.sh
16. Verify YARN
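To verify YARN, start HDFS and YARN, then run jps and check that ResourceManager and NodeManager appear alongside the HDFS daemons (NameNode, DataNode, SecondaryNameNode):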
WZB-MacBook: hadoop wangzhibin $ jps
534 NutstoreGUI
49135 DataNode
49834 ResourceManager
49234 SecondaryNameNode
49973 Jps
67596 4912 NodeManager
49057 NameNode
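Scanning the jps listing by eye is easy to get wrong when other Java processes are running. As a minimal sketch, the helper below reads jps-style output and reports which of the five expected daemons are missing. The name check_daemons is hypothetical, and the script skips the check when jps is not on the PATH.

```shell
#!/usr/bin/env bash
# check_daemons is a hypothetical helper: it reads jps-style output
# ("<pid> <name>" per line) on stdin and reports missing Hadoop daemons.
check_daemons() {
  local names missing="" d
  names="$(awk '{print $NF}')"   # keep only the last field (the process name)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -x matches whole lines, so NameNode does not match SecondaryNameNode.
    printf '%s\n' "$names" | grep -qx "$d" || missing="$missing $d"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all Hadoop daemons are running"
}

if command -v jps >/dev/null 2>&1; then
  jps | check_daemons || true
else
  echo "jps not found on PATH; skipping the check"
fi
```

A missing DataNode or NodeManager usually means the corresponding start script has not been run or the daemon failed at startup, in which case the files under the logs directory are the place to look.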
This concludes the installation and configuration of Hadoop on macOS for big data learning.