Installing Fully Distributed Hadoop on Ubuntu
Based on Ubuntu Server 20.04.1 LTS
Hadoop 3.1.3
1. Set the Hostnames
Set the hostname on all three nodes, running the matching command on each node:
hostnamectl set-hostname master
hostnamectl set-hostname slave1
hostnamectl set-hostname slave2
2. Disable the Firewall
Disable the firewall on all three nodes.
Common ufw commands
Install ufw on Ubuntu: sudo apt-get install ufw -y
1. Check the firewall status: sudo ufw status
2. Allow a port (8866 as an example): sudo ufw allow 8866
3. Enable the firewall: sudo ufw enable
4. Disable the firewall: sudo ufw disable
5. Reload the firewall: sudo ufw reload
6. Remove an allowed port (8866 as an example): sudo ufw delete allow 8866
7. List listening ports: netstat -ltn (or ss -ltn on newer systems)
2.1 Disable the firewall
ufw disable
2.2 Verify that the firewall is disabled
ufw status
The output should report: Status: inactive
3. Configure IP-to-Hostname Mappings
3.1 Edit the hosts file
Configure this on all three nodes:
vim /etc/hosts
3.2 Add the IP address and hostname of each of the three nodes:
10.211.55.50 master
10.211.55.51 slave1
10.211.55.52 slave2
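The three mappings above can be added idempotently with a small helper. A sketch, demonstrated on a temporary file rather than the real /etc/hosts (the function name add_mappings is ours; the IPs are the ones used in this guide):

```shell
# Sketch: append the cluster host mappings only if not already present.
add_mappings() {
  hosts_file=$1
  while read -r ip name; do
    # skip entries whose hostname is already mapped
    grep -qw "$name" "$hosts_file" || echo "$ip $name" >> "$hosts_file"
  done <<'EOF'
10.211.55.50 master
10.211.55.51 slave1
10.211.55.52 slave2
EOF
}

# demo on a temporary copy instead of the real /etc/hosts
tmp=$(mktemp)
add_mappings "$tmp"
add_mappings "$tmp"   # a second run adds nothing (idempotent)
cat "$tmp"
```

Running it against /etc/hosts directly (as root) gives the same result as editing the file by hand.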
3.3 Test the IP mappings
ping master
ping slave1
ping slave2
4. Configure Passwordless SSH
4.1 On every node, generate a key pair: a public key (id_rsa.pub) and a private key (id_rsa); press Enter at each prompt
ssh-keygen -t rsa
4.2 Copy the public key to the master node
Note: run this on every node, including master:
ssh-copy-id master
4.3 Distribute the authorized_keys file
On master, copy authorized_keys to slave1 and slave2:
scp ~/.ssh/authorized_keys root@slave1:~/.ssh/
scp ~/.ssh/authorized_keys root@slave2:~/.ssh/
4.4 Test passwordless login to the other nodes
ssh master
ssh slave1
ssh slave2
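The three logins can also be checked non-interactively. A sketch (the helper name check_ssh is ours; BatchMode=yes makes ssh fail instead of prompting for a password):

```shell
# Sketch: verify passwordless SSH to every node without being prompted.
check_ssh() {
  for h in master slave1 slave2; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true 2>/dev/null; then
      echo "$h: ok"
    else
      echo "$h: passwordless login FAILED"
    fi
  done
}
# Run on any node after step 4.3: check_ssh
```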
5. Install the JDK
5.1 Extract the JDK archive
tar -zxvf jdk-8u212-linux-x64.tar.gz -C /usr/local/src/
5.2 Move and rename the JDK directory
mv /usr/local/src/jdk1.8.0_212 /usr/local/src/java
5.3 Configure the Java environment variables (append to /etc/profile)
vim /etc/profile
# JAVA_HOME
export JAVA_HOME=/usr/local/src/java
export PATH=$PATH:$JAVA_HOME/bin
export JRE_HOME=/usr/local/src/java/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
source /etc/profile
5.4 Verify that Java is installed
java -version
6. Install Hadoop
6.1 Extract the Hadoop archive
tar -zxvf hadoop-3.1.3.tar.gz -C /usr/local/src/
6.2 Move and rename the Hadoop directory
mv /usr/local/src/hadoop-3.1.3 /usr/local/src/hadoop
6.3 Configure the Hadoop environment variables (append to /etc/profile)
vim /etc/profile
# HADOOP_HOME
export HADOOP_HOME=/usr/local/src/hadoop/
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
source /etc/profile
6.4 Edit yarn-env.sh and hadoop-env.sh
6.4.1 Edit yarn-env.sh and add:
vim /usr/local/src/hadoop/etc/hadoop/yarn-env.sh
export JAVA_HOME=/usr/local/src/java
6.4.2 Edit hadoop-env.sh and add:
vim /usr/local/src/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/src/java
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
export HADOOP_SHELL_EXECNAME=root
6.5 Verify that Hadoop is installed
hadoop version
7. Configure Hadoop
Cluster node planning
Commonly used ports
HDFS NameNode web UI: http://master:9870
YARN web UI: http://master:8088/cluster
JobHistoryServer web UI: http://slave2:19888/jobhistory
Master: NameNode, DataNode, ResourceManager, NodeManager
Slave1: DataNode, NodeManager
Slave2: DataNode, NodeManager, JobHistoryServer
7.1 Edit core-site.xml
vim /usr/local/src/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/src/hadoop/tmp</value>
</property>
</configuration>
7.2 Edit hdfs-site.xml
vim /usr/local/src/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/src/hadoop/tmp/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/src/hadoop/tmp/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave2:50090</value>
</property>
</configuration>
7.3 Edit yarn-site.xml
vim /usr/local/src/hadoop/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>106800</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/usr/local/src/hadoop/logs</value>
</property>
</configuration>
7.4 Edit mapred-site.xml
vim /usr/local/src/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>slave2:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>slave2:19888</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/usr/local/src/hadoop/tmp/mr-history/tmp</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/usr/local/src/hadoop/tmp/mr-history/done</value>
</property>
</configuration>
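On Hadoop 3.x, MapReduce jobs submitted to YARN commonly fail with "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster" unless the MapReduce home directory is passed to the containers. A commonly used addition to mapred-site.xml, written here with this guide's install path (not part of the original configuration above), is:

```xml
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/src/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/src/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/src/hadoop</value>
</property>
```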
7.5 Edit the workers file
Note: Hadoop 3.0 and later use the workers file; versions before 3.0 use a slaves file instead
vim /usr/local/src/hadoop/etc/hadoop/workers
master
slave1
slave2
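A mismatch between workers and /etc/hosts is a common source of startup failures. A sketch that cross-checks the two files, demonstrated on temporary copies so it can run anywhere (the helper name check_workers is ours):

```shell
# Sketch: verify every hostname in the workers file is mapped in the hosts file.
check_workers() {
  workers_file=$1; hosts_file=$2; missing=0
  while read -r name; do
    [ -n "$name" ] || continue   # skip blank lines
    grep -qw "$name" "$hosts_file" || { echo "missing mapping: $name"; missing=1; }
  done < "$workers_file"
  return "$missing"
}

# demo data mirroring this guide
wf=$(mktemp); hf=$(mktemp)
printf 'master\nslave1\nslave2\n' > "$wf"
printf '10.211.55.50 master\n10.211.55.51 slave1\n10.211.55.52 slave2\n' > "$hf"
check_workers "$wf" "$hf" && echo "workers file OK"
```

On the real cluster you would run it as: check_workers /usr/local/src/hadoop/etc/hadoop/workers /etc/hosts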
8. Sync Files to All Nodes
xsync /usr/local/src/hadoop
xsync /usr/local/src/java
xsync /etc/profile
xcall source /etc/profile
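Note that xsync and xcall are not standard commands; they are helper scripts commonly written for Hadoop clusters. A minimal sketch, assuming the node names from this guide and root SSH access:

```shell
# Sketch of the xsync/xcall helpers used above (not standard tools).
# xsync: rsync a file or directory to the other nodes at the same path.
xsync() {
  path=$(readlink -f "$1")
  for h in slave1 slave2; do
    rsync -a "$path" "root@$h:$(dirname "$path")/"
  done
}

# xcall: run the same command on every node, labeling each node's output.
xcall() {
  for h in master slave1 slave2; do
    echo "===== $h ====="
    ssh "$h" "$@"
  done
}
# Usage: xsync /usr/local/src/hadoop ; xcall jps
```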
Create a jps symlink on every node; without it, checking the cluster status with xcall jps fails because jps is not on the default PATH for non-interactive SSH sessions
ln -s /usr/local/src/java/bin/jps /usr/local/bin/jps
9. Format and Start Hadoop
9.1 Format the NameNode (on master, and only once; to reformat later, first delete the hadoop.tmp.dir contents on every node, or the DataNodes will fail to start with a clusterID mismatch)
hdfs namenode -format
9.2 Start the cluster and check the Java processes on every node
start-all.sh && xcall jps
Note: start-all.sh does not start the JobHistoryServer; start it on slave2 with: mapred --daemon start historyserver