Skip to main content

生产实习01

企业数据分析现状

现状分析->原因分析->预测分析

过去->现在的一个分析结果,提取价值,并对未来的预测

数据分析流程

数据收集->数据预处理->数据分析->数据原理->数据预测

hadoop安装与配置

详细见此hadoop安装与配置笔记即可。

一键启动脚本

请根据需求修改:software,hadoop,spark的可执行路径

#!/bin/bash
software="/export/server"
hadoop="$software/hadoop"
spark="$software/spark"
log_file="$software/logs/startlogs/$(date +%Y-%m-%d_%H-%M-%S).log"

if jps | grep -q "NameNode"; then
echo "Hadoop已经在运行"
else
echo "正在启动你的集群,默认软件路径为$software --"
sleep 1s
echo "正在启动hadoop------------------------"
start-dfs.sh 2>&1 | tee -a $log_file
fi

if jps | grep -q "ResourceManager"; then
echo "Yarn已经在运行"
else
echo "正在启动yarn--------------------------"
start-yarn.sh 2>&1 | tee -a $log_file
fi

if jps | grep -q "JobHistoryServer"; then
echo "历史服务器已经在运行"
else
echo "启动hadoop与yarn成功,正在启动历史服务器---"
mapred --daemon start historyserver 2>&1 | tee -a $log_file
fi

if jps | grep -q "Master"; then
echo "Spark已经在运行"
else
echo "正在启动spark-------------------------"
sleep 1s
$spark/sbin/start-history-server.sh 2>&1 | tee -a $log_file
$spark/sbin/start-all.sh 2>&1 | tee -a $log_file
fi
echo "集群全部已经启动成功!"

一键关闭脚本

请根据需求修改:software,hadoop,spark的可执行路径

#!/bin/bash

# Set the log file name
software="/export/server"
hadoop="$software/hadoop"
spark="$software/spark"
log_file="$software/logs/stoplogs/$(date +%Y-%m-%d_%H-%M-%S).log"

# Stop Spark and save the output to the log file and print to the console
if jps | grep -q "Master"; then
$spark/sbin/stop-all.sh 2>&1 | tee -a $log_file
$spark/sbin/stop-history-server.sh 2>&1 | tee -a $log_file
else
echo "Spark已经停止"
fi

# Stop MapReduce history server and save the output to the log file and print to the console
if jps | grep -q "JobHistoryServer"; then
mapred --daemon stop historyserver 2>&1 | tee -a $log_file
else
echo "历史服务器已经停止"
fi

# Stop YARN and save the output to the log file and print to the console
if jps | grep -q "ResourceManager"; then
stop-yarn.sh 2>&1 | tee -a $log_file
else
echo "Yarn已经停止"
fi

# Stop HDFS and save the output to the log file and print to the console
if jps | grep -q "NameNode"; then
stop-dfs.sh 2>&1 | tee -a $log_file
else
echo "Hadoop已经停止"
fi