Hadoop 3 Single-Node Install Guide

      . $i
    fi
  done
  unset i
fi

export HADOOP_HOME=/opt/hadoop
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:/opt/hive/bin:/opt/spark/bin:/opt/presto/bin
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export SPARK_HOME=/opt/spark
export SPARK_CONF_DIR=/opt/spark/conf
export SPARK_MASTER_HOST=localhost
export JAVA_HOME=/usr/lib/jvm/java-8-oracle

$ sudo ln -sf /etc/profile /root/.bashrc
$ source /etc/profile

Downloading Hadoop, Hive, Spark & Presto

I'm going to install all the software under the /opt directory and store HDFS' underlying data there as well. The following will create the folders with a single command.

$ sudo mkdir -p /opt/{hadoop,hdfs/{datanode,namenode},hive,presto/{etc/catalog,data},spark}

The layout of the folders looks like the following.

/opt/
├── hadoop
├── hdfs
│   ├── datanode
│   └── namenode
├── hive
├── presto
│   ├── data
│   └── etc
│       └── catalog
└── spark

The following downloads Hadoop, Hive, Spark & Presto.

$ DIST=http://www-eu.apache.org/dist
$ wget -c -O hadoop.tar.gz $DIST/hadoop/common/hadoop-3.0.3/hadoop-3.0.3.tar.gz
$ wget -c -O hive.tar.gz $DIST/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz
$ wget -c -O spark.tgz $DIST/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
$ wget -c -O presto.tar.gz https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.196/presto-server-0.196.tar.gz
$ wget -c -O presto-cli.jar https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.196/presto-cli-0.196-executable.jar

The binary release of Hadoop 3 is 293 MB compressed. Its decompressed size is 733 MB, around 400 MB of which is small documentation files that take a long time to decompress. For this reason I'll skip extracting those files. Hive ships a large number of unit test files, which are excluded from decompression as well.

$ sudo tar xvf hadoop.tar.gz --directory=/opt/hadoop --exclude=hadoop-3.0.3/share/doc --strip 1
$ sudo tar xvf hive.tar.gz --directory=/opt/hive --exclude=apache-hive-2.3.3-bin/ql/src/test --strip 1
$ sudo tar xzvf spark.tgz --directory=/opt/spark --strip 1
$ sudo tar xvf presto.tar.gz --directory=/opt/presto --strip 1

I'll move Presto's CLI into its binary folder and make sure it's executable.

$ sudo mv presto-cli.jar /opt/presto/bin/presto
$ sudo chmod +x /opt/presto/bin/presto

Configuring Hadoop

This will be a single-machine installation, so the master and slave node lists for Hadoop will just contain localhost.

$ sudo vi /opt/hadoop/etc/hadoop/master

localhost

$ sudo vi /opt/hadoop/etc/hadoop/slaves

localhost

I'll discuss HDFS in greater detail later in this blog post, but essentially it is the file system you'll most commonly use with Hadoop when not working in the Cloud. Below I'll create two configuration files with overrides needed for HDFS.
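A minimal sketch of those two files follows. The property names are the stock Hadoop 3 ones; the localhost:9000 NameNode address and the file:// paths pointing at the /opt/hdfs folders created earlier are assumptions for a single-node layout like this one.

$ sudo vi /opt/hadoop/etc/hadoop/core-site.xml

<configuration>
    <!-- Assumption: the NameNode listens on localhost:9000, a common
         choice for a single-node setup -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

$ sudo vi /opt/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
    <!-- A single machine can only hold one replica of each block -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- Assumption: metadata and block storage live in the /opt/hdfs
         folders created earlier -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///opt/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///opt/hdfs/datanode</value>
    </property>
</configuration>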
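With those overrides in place, a quick sanity check is worthwhile: assuming the /etc/profile exports above have been sourced, each tool should resolve from the PATH and report its version.

$ hadoop version
$ hive --version
$ spark-submit --version
$ presto --version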
