2.1 Environment Preparation
2.1.1 System Requirements
Hardware requirements: - CPU: 8 cores or more - Memory: 16 GB or more (32 GB recommended) - Storage: 100 GB or more of free space - Network: Gigabit Ethernet (a quick self-check sketch follows the compatibility table below)
Software requirements: - Operating system: CentOS 7+, Ubuntu 18.04+, or RHEL 7+ - Java: JDK 8 (Oracle JDK or OpenJDK) - Python: 2.7 or 3.6+
Dependency version compatibility:

| Kylin version | Hadoop | Hive | HBase | Spark |
|---|---|---|---|---|
| 4.0.x | 2.8+ | 1.2+ | 2.0+ | 2.4+ |
| 3.1.x | 2.7+ | 1.2+ | 1.1+ | 2.3+ |
| 3.0.x | 2.7+ | 1.2+ | 1.1+ | 2.3+ |
| 2.6.x | 2.7+ | 1.2+ | 1.1+ | 2.1+ |
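Before going further, it can help to compare the host against the hardware minimums above. The following is a minimal self-check sketch; the thresholds mirror the requirements in 2.1.1, and the script name and root-filesystem check are illustrative assumptions:
#!/bin/bash
# check_minimums.sh - compare this host against the stated hardware minimums
# (illustrative sketch; adjust the thresholds and mount point for your environment)
MIN_CORES=8
MIN_MEM_GB=16
MIN_DISK_GB=100
cores=$(nproc)
# MemTotal is reported in kB; round to the nearest GB
mem_gb=$(awk '/MemTotal/ {printf "%.0f", $2/1024/1024}' /proc/meminfo)
# Free space on /; change the path if Kylin data will live elsewhere
disk_gb=$(df -BG --output=avail / | tail -1 | tr -dc '0-9')
[ "$cores" -ge "$MIN_CORES" ] && echo "✓ CPU: $cores cores" || echo "✗ CPU: $cores cores (need >= $MIN_CORES)"
[ "$mem_gb" -ge "$MIN_MEM_GB" ] && echo "✓ RAM: ${mem_gb} GB" || echo "✗ RAM: ${mem_gb} GB (need >= $MIN_MEM_GB)"
[ "$disk_gb" -ge "$MIN_DISK_GB" ] && echo "✓ Disk: ${disk_gb} GB free" || echo "✗ Disk: ${disk_gb} GB free (need >= $MIN_DISK_GB)"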
2.1.2 Environment Check Script
#!/bin/bash
# kylin_env_check.sh - Kylin environment check script
echo "=== Apache Kylin Environment Check ==="
# Operating system
echo "1. OS information:"
grep -E "NAME|VERSION" /etc/os-release
echo
# Java version
echo "2. Java version:"
if command -v java &> /dev/null; then
    java -version
    echo "JAVA_HOME: $JAVA_HOME"
else
    echo "Error: Java not found; please install JDK 8"
fi
echo
# Memory
echo "3. System memory:"
free -h
echo
# Disk space
echo "4. Disk space:"
df -h
echo
# Network connectivity (swap in an internal host if outbound access is restricted)
echo "5. Network connectivity:"
if ping -c 3 google.com > /dev/null 2>&1; then
    echo "Network connection OK"
else
    echo "Network connection failed"
fi
echo
# Hadoop ecosystem components
echo "6. Hadoop ecosystem components:"
for cmd in hadoop hive hbase; do
    if command -v $cmd &> /dev/null; then
        echo "✓ $cmd installed"
        $cmd version 2>/dev/null | head -1
    else
        echo "✗ $cmd not installed"
    fi
done
echo "=== Environment check complete ==="
2.2 Installing the Dependencies
2.2.1 Java Environment
Install OpenJDK 8:
# CentOS/RHEL
sudo yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk
# Configure JAVA_HOME
# (on Ubuntu the path is typically /usr/lib/jvm/java-8-openjdk-amd64)
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' >> ~/.bashrc
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
# Verify the installation
java -version
javac -version
2.2.2 Hadoop Cluster Deployment
Download and install Hadoop:
# Download Hadoop 2.10.1
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz
tar -xzf hadoop-2.10.1.tar.gz
sudo mv hadoop-2.10.1 /opt/hadoop
sudo chown -R $USER:$USER /opt/hadoop
# Configure environment variables
echo 'export HADOOP_HOME=/opt/hadoop' >> ~/.bashrc
echo 'export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop' >> ~/.bashrc
echo 'export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH' >> ~/.bashrc
source ~/.bashrc
Hadoop configuration files:
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/tmp</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/hadoop/data/datanode</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value>
</property>
</configuration>
Start Hadoop:
# Format HDFS (first start only)
hdfs namenode -format
# Start HDFS
start-dfs.sh
# Start YARN
start-yarn.sh
# Verify the services
jps
hadoop fs -ls /
2.2.3 Hive Installation and Configuration
Download and install Hive:
# Download Hive 2.3.9
wget https://archive.apache.org/dist/hive/hive-2.3.9/apache-hive-2.3.9-bin.tar.gz
tar -xzf apache-hive-2.3.9-bin.tar.gz
sudo mv apache-hive-2.3.9-bin /opt/hive
sudo chown -R $USER:$USER /opt/hive
# Configure environment variables
echo 'export HIVE_HOME=/opt/hive' >> ~/.bashrc
echo 'export PATH=$HIVE_HOME/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
Hive configuration:
hive-site.xml:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive123</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
</configuration>
Initialize Hive:
# Download the MySQL JDBC driver (the package was renamed to mysql-connector-j as of 8.0.31)
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-j-8.0.33.tar.gz
tar -xzf mysql-connector-j-8.0.33.tar.gz
cp mysql-connector-j-8.0.33/mysql-connector-j-8.0.33.jar $HIVE_HOME/lib/
# Initialize the metastore schema
schematool -dbType mysql -initSchema
# Start the Hive Metastore
nohup hive --service metastore > /tmp/hive-metastore.log 2>&1 &
# Test Hive
hive -e "SHOW DATABASES;"
2.2.4 HBase Installation and Configuration
Download and install HBase:
# Download HBase 2.4.17
wget https://archive.apache.org/dist/hbase/2.4.17/hbase-2.4.17-bin.tar.gz
tar -xzf hbase-2.4.17-bin.tar.gz
sudo mv hbase-2.4.17 /opt/hbase
sudo chown -R $USER:$USER /opt/hbase
# Configure environment variables
echo 'export HBASE_HOME=/opt/hbase' >> ~/.bashrc
echo 'export PATH=$HBASE_HOME/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
HBase configuration:
hbase-site.xml:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/opt/hbase/zookeeper</value>
</property>
</configuration>
Start HBase:
# Start HBase
start-hbase.sh
# Verify HBase
hbase shell
# Inside the HBase shell, run:
# list
# exit
2.3 Installing Kylin
2.3.1 Downloading Kylin
# Download Apache Kylin 4.0.3
wget https://archive.apache.org/dist/kylin/apache-kylin-4.0.3/apache-kylin-4.0.3-bin.tar.gz
tar -xzf apache-kylin-4.0.3-bin.tar.gz
sudo mv apache-kylin-4.0.3-bin /opt/kylin
sudo chown -R $USER:$USER /opt/kylin
# Configure environment variables
echo 'export KYLIN_HOME=/opt/kylin' >> ~/.bashrc
echo 'export PATH=$KYLIN_HOME/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
Note that the 4.0.x line stores metadata in an RDBMS and builds with Spark on Parquet; the HBase-backed metadata and coprocessor steps later in this chapter follow the 3.x deployment model, so pick the release that matches the storage stack installed above.
2.3.2 Kylin Configuration
Main configuration file kylin.properties:
# Kylin server configuration
kylin.server.mode=all
kylin.server.cluster-servers=localhost:7070
# Metadata storage
kylin.metadata.url=kylin_metadata@hbase
# Job engine configuration
kylin.engine.default=2
kylin.engine.spark-conf.spark.master=local[*]
kylin.engine.spark-conf.spark.submit.deployMode=client
kylin.engine.spark-conf.spark.driver.memory=4G
kylin.engine.spark-conf.spark.executor.memory=4G
kylin.engine.spark-conf.spark.executor.cores=4
kylin.engine.spark-conf.spark.executor.instances=1
# Storage engine configuration
kylin.storage.default=2
kylin.storage.hbase.cluster-fs=hdfs://localhost:9000
# Web service configuration
kylin.web.timezone=GMT+8
kylin.web.cross-domain-enabled=true
# Security configuration
kylin.security.profile=testing
kylin.security.ldap.connection-server=
kylin.security.ldap.connection-username=
kylin.security.ldap.connection-password=
# Query configuration
kylin.query.cache-enabled=true
kylin.query.large-query-threshold=1000000
kylin.query.timeout-seconds=300
# Build configuration
kylin.job.scheduler.default=0
kylin.job.max-concurrent-jobs=10
kylin.job.retry=3
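Before the first start, it is worth spot-checking that the settings this chapter depends on actually made it into the file; a simple grep sketch:
# Spot-check the critical entries in kylin.properties
grep -E '^(kylin\.server\.mode|kylin\.metadata\.url|kylin\.storage\.hbase\.cluster-fs)=' \
    $KYLIN_HOME/conf/kylin.properties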
Environment file setenv.sh:
#!/bin/bash
# Java
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
# Hadoop
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Hive
export HIVE_HOME=/opt/hive
export HIVE_CONF=$HIVE_HOME/conf
# HBase
export HBASE_HOME=/opt/hbase
export HBASE_CONF_DIR=$HBASE_HOME/conf
# Spark (only if using a standalone Spark installation)
# export SPARK_HOME=/opt/spark
# export SPARK_CONF_DIR=$SPARK_HOME/conf
# Kylin
export KYLIN_JVM_SETTINGS="-Xms1G -Xmx8G -XX:+UseG1GC"
export KYLIN_DEBUG_SETTINGS="-Dkylin.hdfs.working.dir=/kylin"
# Extra startup options
export KYLIN_EXTRA_START_OPTS="
-Dhdp.version=2.6.5.0-292
-Djava.security.krb5.realm=
-Djava.security.krb5.kdc=
-Djava.security.auth.login.config=
"
2.3.3 Initializing Kylin
Check the environment:
# Run Kylin's built-in environment check
$KYLIN_HOME/bin/check-env.sh
# On success, the output looks similar to:
# Checking environment...
# KYLIN_HOME is set to /opt/kylin
# java version "1.8.0_XXX"
# HADOOP_HOME is set to /opt/hadoop
# HIVE_HOME is set to /opt/hive
# HBASE_HOME is set to /opt/hbase
# Environment check passed.
Create the HDFS working directory:
# Create the Kylin working directory
hadoop fs -mkdir -p /kylin
hadoop fs -chmod 777 /kylin
# Verify it was created
hadoop fs -ls /
Initialize the metadata:
# Deploy the Kylin HBase coprocessor and initialize the metadata tables
kylin.sh org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI default
# Check the Kylin tables in HBase
echo "list" | hbase shell
2.4 Startup and Verification
2.4.1 Starting the Kylin Service
Startup script:
#!/bin/bash
# start_kylin.sh - Kylin startup script
echo "Starting the Apache Kylin service..."
# Check dependent services
echo "Checking dependent services..."
if ! jps | grep -q NameNode; then
    echo "Error: HDFS NameNode is not running"
    exit 1
fi
if ! jps | grep -q HMaster; then
    echo "Error: HBase Master is not running"
    exit 1
fi
# Start Kylin
echo "Starting Kylin..."
kylin.sh start
# Wait for the service to come up
echo "Waiting for the service to start..."
sleep 30
# Check service status
if jps | grep -q KylinLauncher; then
    echo "✓ Kylin started successfully"
    echo "Web UI: http://localhost:7070/kylin"
    echo "Default username/password: ADMIN/KYLIN"
else
    echo "✗ Kylin failed to start"
    echo "Check the log: $KYLIN_HOME/logs/kylin.log"
fi
Run the startup:
# Start Kylin
kylin.sh start
# Check the process
jps | grep Kylin
# Follow the log
tail -f $KYLIN_HOME/logs/kylin.log
2.4.2 Web UI Access
Access details: - URL: http://localhost:7070/kylin - Default username: ADMIN - Default password: KYLIN
First-login checklist (an API-based check follows below): 1. The system information page renders correctly 2. The data source connection is healthy 3. The job engine is healthy 4. The storage engine is healthy
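To confirm the default credentials work without clicking through the UI, you can call the authentication endpoint directly (the Base64 string QURNSU46S1lMSU4= decodes to ADMIN:KYLIN):
# Verify the default credentials against the REST API
curl -s -X POST \
  -H "Authorization: Basic QURNSU46S1lMSU4=" \
  -H "Content-Type: application/json" \
  "http://localhost:7070/kylin/api/user/authentication"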
2.4.3 Command-Line Verification
Basic command tests:
# Check the Kylin version
kylin.sh version
# Collect diagnostic information
kylin.sh org.apache.kylin.tool.DiagnosisInfoCLI
# List projects
curl -X GET \
  -H "Authorization: Basic QURNSU46S1lMSU4=" \
  -H "Content-Type: application/json" \
  "http://localhost:7070/kylin/api/projects"
Health check script:
#!/bin/bash
# kylin_health_check.sh - Kylin health check
echo "=== Kylin Health Check ==="
# Process check
echo "1. Process check:"
if jps | grep -q KylinLauncher; then
    echo "✓ Kylin process is running"
else
    echo "✗ Kylin process is not running"
    exit 1
fi
# Port check
echo "2. Port check:"
if netstat -tlnp | grep -q :7070; then
    echo "✓ Kylin is listening on port 7070"
else
    echo "✗ Kylin is not listening on port 7070"
fi
# Web service check
echo "3. Web service check:"
response=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:7070/kylin)
# An unauthenticated request may be redirected to the login page (302)
if [ "$response" = "200" ] || [ "$response" = "302" ]; then
    echo "✓ Web service responding normally"
else
    echo "✗ Unexpected web service response: $response"
fi
# API service check
echo "4. API service check:"
api_response=$(curl -s -X GET \
    -H "Authorization: Basic QURNSU46S1lMSU4=" \
    -H "Content-Type: application/json" \
    "http://localhost:7070/kylin/api/user/authentication" \
    -w "%{http_code}")
if echo "$api_response" | grep -q "200"; then
    echo "✓ API service OK"
else
    echo "✗ API service error"
fi
echo "=== Health check complete ==="
2.5 Cluster Deployment
2.5.1 Cluster Architecture Design
Recommended cluster layout:
┌─────────────────────────────────────────────────────────────┐
│                 Kylin Cluster Architecture                   │
├─────────────────────────────────────────────────────────────┤
│               Load Balancer (Nginx/HAProxy)                  │
├─────────────────────────────────────────────────────────────┤
│  Kylin Node 1    │   Kylin Node 2    │   Kylin Node 3       │
│  (Query + Job)   │   (Query + Job)   │   (Query Only)       │
├─────────────────────────────────────────────────────────────┤
│               Hadoop Cluster (HDFS + YARN)                   │
├─────────────────────────────────────────────────────────────┤
│                      HBase Cluster                           │
├─────────────────────────────────────────────────────────────┤
│                      Hive Metastore                          │
└─────────────────────────────────────────────────────────────┘
2.5.2 Node Configuration
Node 1 (kylin-node1):
# kylin.properties for node1
kylin.server.mode=all
kylin.server.cluster-servers=kylin-node1:7070,kylin-node2:7070,kylin-node3:7070
kylin.server.cluster-name=kylin-cluster
# Job scheduling node
kylin.job.scheduler.default=0
kylin.job.max-concurrent-jobs=5
# Query node
kylin.query.server.enabled=true
Node 2 (kylin-node2):
# kylin.properties for node2
kylin.server.mode=all
kylin.server.cluster-servers=kylin-node1:7070,kylin-node2:7070,kylin-node3:7070
kylin.server.cluster-name=kylin-cluster
# Job scheduling node
kylin.job.scheduler.default=0
kylin.job.max-concurrent-jobs=5
# Query node
kylin.query.server.enabled=true
Node 3 (kylin-node3):
# kylin.properties for node3
kylin.server.mode=query
kylin.server.cluster-servers=kylin-node1:7070,kylin-node2:7070,kylin-node3:7070
kylin.server.cluster-name=kylin-cluster
# Query-only node
kylin.job.scheduler.default=-1
kylin.query.server.enabled=true
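Keeping three slightly different kylin.properties files in sync by hand is error-prone. A small distribution sketch, assuming passwordless SSH, the example hostnames above, and per-node files named conf/<node>.kylin.properties (all of these are illustrative):
#!/bin/bash
# sync_kylin_conf.sh - push each node's reviewed config and restart it
# (illustrative sketch; the file layout and hostnames are assumptions)
for node in kylin-node1 kylin-node2 kylin-node3; do
    scp "conf/${node}.kylin.properties" "${node}:/opt/kylin/conf/kylin.properties"
    ssh "$node" '/opt/kylin/bin/kylin.sh stop; /opt/kylin/bin/kylin.sh start'
done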
2.5.3 Load Balancer Configuration
Nginx configuration:
# /etc/nginx/conf.d/kylin.conf
upstream kylin_cluster {
    server kylin-node1:7070 weight=3;
    server kylin-node2:7070 weight=3;
    server kylin-node3:7070 weight=4;
    # Active health checks require the third-party nginx_upstream_check_module
    # (bundled with Tengine); remove this line on stock Nginx, which falls back
    # to passive failure detection
    check interval=3000 rise=2 fall=5 timeout=1000;
}
server {
    listen 80;
    server_name kylin.example.com;
    location / {
        proxy_pass http://kylin_cluster;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Timeouts
        proxy_connect_timeout 30s;
        proxy_send_timeout 300s;
        proxy_read_timeout 300s;
    }
    # Health check endpoint
    location /health {
        access_log off;
        return 200 "healthy\n";
        add_header Content-Type text/plain;
    }
}
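After installing the config, validate it, reload Nginx, and confirm a request actually reaches a Kylin node through the balancer; a quick sketch:
# Validate and apply the Nginx configuration
sudo nginx -t && sudo systemctl reload nginx
# Confirm the balancer forwards to a Kylin backend
curl -i -u ADMIN:KYLIN http://kylin.example.com/kylin/api/projects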
2.6 Performance Tuning
2.6.1 JVM Tuning
Kylin JVM parameters:
# JVM settings in setenv.sh
# (-XX:+G1UseAdaptiveIHOP is JDK 9+ only and is omitted here, since this
# deployment targets JDK 8)
export KYLIN_JVM_SETTINGS="
-Xms8G
-Xmx16G
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:G1HeapRegionSize=16m
-XX:+UnlockExperimentalVMOptions
-XX:+UseCGroupMemoryLimitForHeap
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCApplicationStoppedTime
-Xloggc:$KYLIN_HOME/logs/kylin-gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=64M
"
2.6.2 System Tuning
Linux kernel parameters:
# /etc/sysctl.conf
# Network tuning
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.netdev_max_backlog = 5000
# File descriptors
fs.file-max = 2097152
# Virtual memory
vm.swappiness = 1
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
Apply the settings:
sysctl -p
User limits:
# /etc/security/limits.conf
kylin soft nofile 65536
kylin hard nofile 65536
kylin soft nproc 32768
kylin hard nproc 32768
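Limits from limits.conf only apply to new login sessions. A quick verification sketch (assuming Kylin runs as a user named kylin, per the entries above):
# Verify the limits from a fresh session for the kylin user
sudo -i -u kylin bash -c 'ulimit -n; ulimit -u'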
2.7 Troubleshooting
2.7.1 Common Problems
Problem 1: Kylin fails to start
# Check the log
tail -f $KYLIN_HOME/logs/kylin.log
# Common causes:
# 1. Incompatible Java version
# 2. Hadoop/HBase services not running
# 3. Port already in use
# 4. Permission problems
# Diagnosis steps:
# Check the Java version
java -version
# Check service status
jps
# Check the port
netstat -tlnp | grep 7070
# Check permissions
ls -la $KYLIN_HOME
Problem 2: Web UI is unreachable
# Check the firewall
sudo firewall-cmd --list-ports
sudo firewall-cmd --add-port=7070/tcp --permanent
sudo firewall-cmd --reload
# Check network connectivity
telnet localhost 7070
# Check the Kylin configuration
grep -i "server.port" $KYLIN_HOME/conf/kylin.properties
Problem 3: Build jobs fail
# Check YARN resources
yarn application -list
yarn logs -applicationId application_xxx
# Check HDFS space
hadoop fs -df -h
# Check HBase status
echo "status" | hbase shell
2.7.2 Log Analysis
Log file locations:
# Main Kylin log
$KYLIN_HOME/logs/kylin.log
# Security log
$KYLIN_HOME/logs/security.log
# Query log
$KYLIN_HOME/logs/kylin-query.log
# GC log
$KYLIN_HOME/logs/kylin-gc.log
Log analysis script:
#!/bin/bash
# analyze_kylin_logs.sh
LOG_DIR="$KYLIN_HOME/logs"
DATE=$(date +%Y-%m-%d)
echo "=== Kylin Log Analysis ($DATE) ==="
# Error statistics
echo "1. Error statistics:"
echo "  ERROR count: $(grep -ci 'error' $LOG_DIR/kylin.log)"
echo "  WARN count: $(grep -ci 'warn' $LOG_DIR/kylin.log)"
# Recent errors
echo "2. Recent errors:"
grep -i "error" $LOG_DIR/kylin.log | tail -5
# Query statistics
echo "3. Query statistics:"
if [ -f "$LOG_DIR/kylin-query.log" ]; then
    echo "  Queries today: $(grep -c "$DATE" $LOG_DIR/kylin-query.log)"
    echo "  Avg response time: $(grep "$DATE" $LOG_DIR/kylin-query.log | grep -o 'duration:[0-9]*' | cut -d: -f2 | awk '{sum+=$1; count++} END {if(count>0) print sum/count "ms"}')"
fi
# GC statistics
echo "4. GC statistics:"
if [ -f "$LOG_DIR/kylin-gc.log" ]; then
    echo "  GC count: $(grep -c "GC" $LOG_DIR/kylin-gc.log)"
    echo "  Avg GC time: $(grep "GC" $LOG_DIR/kylin-gc.log | grep -o '[0-9]*\.[0-9]*secs' | cut -d's' -f1 | awk '{sum+=$1; count++} END {if(count>0) print sum/count "s"}')"
fi
echo "=== Analysis complete ==="
2.8 Chapter Summary
This chapter walked through installing and deploying Apache Kylin:
Core content: 1. Environment preparation: system requirements, dependency checks, environment configuration 2. Component installation: installing and configuring Hadoop, Hive, and HBase 3. Kylin deployment: single-node and cluster deployment 4. Performance tuning: JVM and kernel parameter optimization 5. Troubleshooting: resolving common problems and analyzing logs
Key takeaways: 1. Make sure all dependency versions are compatible 2. Configure environment variables and paths correctly 3. Size system resources sensibly 4. Put solid monitoring and logging in place
Next chapter: an introduction to Kylin's basic concepts and terminology, including detailed explanations of core ideas such as data models, dimensions, and measures.
2.9 Exercises and Questions
Hands-on exercises
- Build a complete Kylin environment in a virtual machine
- Write an automated deployment script
- Configure a Kylin cluster and test the load balancing
Questions to consider
- How would you choose hardware for a given business workload?
- How do you keep Kylin highly available in production?
- How do you monitor and optimize Kylin's performance?