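Chapter Overview
Helm Hooks are a mechanism for running custom actions at specific points in a release's lifecycle, such as chart installation, upgrade, and deletion. With hooks we can implement deployment logic such as database migrations, configuration initialization, and cleanup tasks. This chapter covers how to use Helm Hooks and the best practices around them.
Learning Objectives
- Understand what Helm Hooks are and what they are for
- Know the different hook types and when each one fires
- Learn to create and configure hook resources
- Understand hook execution order and weight management
- Handle hook errors and retries
- Apply hooks to real-world scenarios
- Debug and troubleshoot hooks
5.1 Hooks Overview
5.1.1 What Are Helm Hooks
Helm Hooks are Kubernetes resources that Helm runs at specific points in a release's lifecycle. They let us execute custom logic before or after operations such as install, upgrade, and delete, for example database migrations, configuration validation, or cleanup tasks.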
graph TB
    A[helm install] --> B[pre-install hooks]
    B --> C[install resources]
    C --> D[post-install hooks]
    E[helm upgrade] --> F[pre-upgrade hooks]
    F --> G[upgrade resources]
    G --> H[post-upgrade hooks]
    I[helm delete] --> J[pre-delete hooks]
    J --> K[delete resources]
    K --> L[post-delete hooks]
    subgraph "Hook types"
        M[pre-install]
        N[post-install]
        O[pre-upgrade]
        P[post-upgrade]
        Q[pre-delete]
        R[post-delete]
        S[pre-rollback]
        T[post-rollback]
        U[test]
    end
5.1.2 Benefits of Hooks
- Lifecycle control: run actions at key points of the deployment process
- Data management: run database migrations and initialization
- Configuration validation: verify the environment and configuration before deploying
- Cleanup: run cleanup actions on deletion
- Test integration: build automated tests into the deployment workflow
5.2 Hook Types and When They Fire
5.2.1 Install Hooks
# pre-install Hook
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ include "myapp.fullname" . }}-pre-install"
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: pre-install
          image: alpine:3.16
          command:
            - /bin/sh
            - -c
            - |
              set -e  # fail the hook as soon as any check fails
              echo "Executing pre-install tasks..."
              # Check that the database is reachable
              nc -z {{ .Values.database.host }} {{ .Values.database.port }}
              echo "Database is accessible"
              # Create required directories
              mkdir -p /shared/logs
              echo "Pre-install completed successfully"
---
# post-install Hook
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ include "myapp.fullname" . }}-post-install"
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "5"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: post-install
          image: curlimages/curl:7.85.0
          command:
            - /bin/sh
            - -c
            - |
              echo "Executing post-install tasks..."
              # Wait for the service to become ready
              until curl -f http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/health; do
                echo "Waiting for service to be ready..."
                sleep 5
              done
              echo "Service is ready"
              # Send a notification
              curl -X POST "{{ .Values.notifications.webhook }}" \
                -H "Content-Type: application/json" \
                -d '{"message": "Application {{ include "myapp.fullname" . }} installed successfully"}'
5.2.2 Upgrade Hooks
# pre-upgrade Hook: database backup
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ include "myapp.fullname" . }}-pre-upgrade-backup"
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: backup
          image: postgres:13
          env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ .Values.database.secretName }}
                  key: password
          command:
            - /bin/bash
            - -c
            - |
              set -e  # a failed pg_dump must fail the hook and block the upgrade
              echo "Creating database backup before upgrade..."
              BACKUP_FILE="/backup/backup-$(date +%Y%m%d-%H%M%S).sql"
              pg_dump -h {{ .Values.database.host }} \
                -U {{ .Values.database.username }} \
                -d {{ .Values.database.name }} \
                > "$BACKUP_FILE"
              echo "Backup created: $BACKUP_FILE"
          volumeMounts:
            - name: backup-storage
              mountPath: /backup
      volumes:
        - name: backup-storage
          persistentVolumeClaim:
            claimName: {{ include "myapp.fullname" . }}-backup-pvc
---
# post-upgrade Hook: database migration
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ include "myapp.fullname" . }}-post-upgrade-migration"
  annotations:
    "helm.sh/hook": post-upgrade
    "helm.sh/hook-weight": "1"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migration
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          env:
            # DB_PASSWORD must be declared before DATABASE_URL:
            # $(VAR) only expands variables defined earlier in this list.
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ .Values.database.secretName }}
                  key: password
            - name: DATABASE_URL
              value: "postgresql://{{ .Values.database.username }}:$(DB_PASSWORD)@{{ .Values.database.host }}:{{ .Values.database.port }}/{{ .Values.database.name }}"
          command:
            - /bin/sh
            - -c
            - |
              set -e  # a failed migration must fail the hook
              echo "Running database migrations..."
              ./migrate up
              echo "Migrations completed successfully"
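The `DATABASE_URL` value above relies on Kubernetes expanding `$(DB_PASSWORD)`. Kubernetes only expands references to variables defined earlier in the same `env` list, so the order of the two entries matters. The following is a simplified Python model of that rule, not Kubernetes code: `expand_env` is a hypothetical helper, the credentials are made up, and the `$$(VAR)` escape syntax is deliberately ignored.

```python
import re

# Matches $(NAME) references inside an env value.
_REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand_env(env):
    """Expand $(VAR) references using only variables defined earlier.

    `env` is an ordered list of (name, value) pairs, mirroring the order
    of the `env:` entries in a container spec. Unresolvable references
    are left untouched, as literal text.
    """
    resolved = {}
    for name, value in env:
        resolved[name] = _REF.sub(
            lambda m: resolved.get(m.group(1), m.group(0)), value
        )
    return resolved


# Reference comes first: the password is not yet known, so it stays literal.
wrong_order = expand_env([
    ("DATABASE_URL", "postgresql://app:$(DB_PASSWORD)@db:5432/app"),
    ("DB_PASSWORD", "s3cret"),
])

# Definition comes first: the reference is expanded.
right_order = expand_env([
    ("DB_PASSWORD", "s3cret"),
    ("DATABASE_URL", "postgresql://app:$(DB_PASSWORD)@db:5432/app"),
])
```

With the reference first, the application would receive the literal text `$(DB_PASSWORD)` in its connection string, which typically surfaces as an authentication failure rather than an obvious configuration error.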
5.2.3 Delete Hooks
# pre-delete Hook: data backup
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ include "myapp.fullname" . }}-pre-delete-backup"
  annotations:
    "helm.sh/hook": pre-delete
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: final-backup
          image: postgres:13
          env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ .Values.database.secretName }}
                  key: password
          command:
            - /bin/bash
            - -c
            - |
              set -e  # a failed pg_dump must fail the hook and block the deletion
              echo "Creating final backup before deletion..."
              BACKUP_FILE="/backup/final-backup-$(date +%Y%m%d-%H%M%S).sql"
              pg_dump -h {{ .Values.database.host }} \
                -U {{ .Values.database.username }} \
                -d {{ .Values.database.name }} \
                > "$BACKUP_FILE"
              echo "Final backup created: $BACKUP_FILE"
          volumeMounts:
            - name: backup-storage
              mountPath: /backup
      volumes:
        - name: backup-storage
          persistentVolumeClaim:
            claimName: {{ include "myapp.fullname" . }}-backup-pvc
---
# post-delete Hook: cleanup
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ include "myapp.fullname" . }}-post-delete-cleanup"
  annotations:
    "helm.sh/hook": post-delete
    "helm.sh/hook-weight": "5"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: cleanup
          image: alpine:3.16
          command:
            - /bin/sh
            - -c
            - |
              echo "Performing cleanup tasks..."
              # Clean up external resources
              # Send a deletion notification
              echo "Cleanup completed"
5.2.4 Rollback Hooks
# pre-rollback Hook
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ include "myapp.fullname" . }}-pre-rollback"
  annotations:
    "helm.sh/hook": pre-rollback
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: pre-rollback
          image: alpine:3.16
          command:
            - /bin/sh
            - -c
            - |
              echo "Preparing for rollback..."
              # Verify rollback preconditions
              # Prepare the environment for rollback
              echo "Pre-rollback tasks completed"
---
# post-rollback Hook
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ include "myapp.fullname" . }}-post-rollback"
  annotations:
    "helm.sh/hook": post-rollback
    "helm.sh/hook-weight": "5"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: post-rollback
          image: curlimages/curl:7.85.0
          command:
            - /bin/sh
            - -c
            - |
              echo "Verifying rollback..."
              # Verify service status
              until curl -f http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/health; do
                echo "Waiting for service after rollback..."
                sleep 5
              done
              echo "Rollback verification completed"
5.2.5 Test Hooks
# test Hook (runs only when `helm test` is invoked)
apiVersion: v1
kind: Pod
metadata:
  name: "{{ include "myapp.fullname" . }}-test"
  annotations:
    "helm.sh/hook": test
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  restartPolicy: Never
  containers:
    - name: test
      image: curlimages/curl:7.85.0
      command:
        - /bin/sh
        - -c
        - |
          set -e  # without this, only the final echo decides the exit code and the test always passes
          echo "Running application tests..."
          # Health check test
          echo "Testing health endpoint..."
          curl -f http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/health
          # API functional test
          echo "Testing API endpoints..."
          curl -f http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/api/users
          # Database connectivity test
          echo "Testing database connectivity..."
          curl -f http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/api/health/db
          echo "All tests passed successfully"
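5.3 Hook Annotations in Detail
5.3.1 Basic Annotations
metadata:
  annotations:
    # Hook type(s) (required); separate multiple types with commas
    "helm.sh/hook": "pre-install,post-install"
    # Hook weight (optional, defaults to 0); must be written as a string
    "helm.sh/hook-weight": "-5"
    # Hook deletion policy (optional)
    "helm.sh/hook-delete-policy": "before-hook-creation,hook-succeeded"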
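5.3.2 Managing Hook Weights
Weights determine the order in which hooks of the same type run. Helm sorts them in ascending order, so the lowest value runs first. Weights may be negative or positive and are written as strings:
# Execution order example
# 1. Weight -10: database backup
apiVersion: batch/v1
kind: Job
metadata:
  name: backup-job
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "-10"
---
# 2. Weight -5: environment preparation
apiVersion: batch/v1
kind: Job
metadata:
  name: prepare-job
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "-5"
---
# 3. Weight 0: default action
apiVersion: batch/v1
kind: Job
metadata:
  name: default-job
  annotations:
    "helm.sh/hook": pre-upgrade
    # No weight annotation: the weight defaults to 0
---
# 4. Weight 5: follow-up processing
apiVersion: batch/v1
kind: Job
metadata:
  name: post-process-job
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "5"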
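5.3.3 Hook Deletion Policies
# Deletion policy options
metadata:
  annotations:
    "helm.sh/hook-delete-policy": "before-hook-creation"  # delete the previous resource before a new hook is launched (the default when no policy is set)
    # "helm.sh/hook-delete-policy": "hook-succeeded"  # delete the resource after the hook succeeds
    # "helm.sh/hook-delete-policy": "hook-failed"  # delete the resource if the hook fails
    # "helm.sh/hook-delete-policy": "before-hook-creation,hook-succeeded"  # combined policies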
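To make the three annotations from this section concrete, here is a minimal Python sketch that parses them and answers the two questions a deletion policy settles: is the old resource removed before the hook is created, and is the resource removed after the hook has run? It is an illustrative model, not Helm's implementation; `parse_hook`, `delete_before_creation`, and `delete_after_run` are hypothetical names.

```python
# Policy that applies when no hook-delete-policy annotation is present.
DEFAULT_DELETE_POLICY = "before-hook-creation"


def parse_hook(annotations):
    """Split the three hook annotations into typed values."""
    events = [e.strip() for e in annotations["helm.sh/hook"].split(",")]
    # Weights are strings in the manifest; a missing weight counts as 0.
    weight = int(annotations.get("helm.sh/hook-weight", "0"))
    raw_policy = annotations.get(
        "helm.sh/hook-delete-policy", DEFAULT_DELETE_POLICY
    )
    policies = {p.strip() for p in raw_policy.split(",")}
    return {"events": events, "weight": weight, "policies": policies}


def delete_before_creation(policies):
    """True if a leftover resource is removed before the hook is created."""
    return "before-hook-creation" in policies


def delete_after_run(policies, succeeded):
    """True if the resource is removed once the hook has finished."""
    wanted = "hook-succeeded" if succeeded else "hook-failed"
    return wanted in policies


hook = parse_hook({
    "helm.sh/hook": "pre-install,post-install",
    "helm.sh/hook-weight": "-5",
    "helm.sh/hook-delete-policy": "before-hook-creation,hook-succeeded",
})
```

With this combined policy a failed hook resource is kept, so its logs stay available for debugging until the next run replaces it.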
5.4 Real-World Scenarios
5.4.1 Database Migration System
# templates/hooks/db-migration.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ include "myapp.fullname" . }}-db-migration"
  annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-weight": "1"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    metadata:
      labels:
        app: {{ include "myapp.name" . }}
        component: db-migration
    spec:
      restartPolicy: Never
      initContainers:
        - name: wait-for-db
          image: postgres:13
          env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ .Values.database.secretName }}
                  key: password
          command:
            - /bin/bash
            - -c
            - |
              echo "Waiting for database to be ready..."
              until pg_isready -h {{ .Values.database.host }} -p {{ .Values.database.port }} -U {{ .Values.database.username }}; do
                echo "Database not ready, waiting..."
                sleep 2
              done
              echo "Database is ready"
      containers:
        - name: migrate
          image: "{{ .Values.migration.image.repository }}:{{ .Values.migration.image.tag }}"
          env:
            # DB_PASSWORD must be declared before DATABASE_URL:
            # $(VAR) only expands variables defined earlier in this list.
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ .Values.database.secretName }}
                  key: password
            - name: DATABASE_URL
              value: "postgresql://{{ .Values.database.username }}:$(DB_PASSWORD)@{{ .Values.database.host }}:{{ .Values.database.port }}/{{ .Values.database.name }}?sslmode={{ .Values.database.sslmode }}"
            - name: MIGRATION_DIR
              value: {{ .Values.migration.directory | quote }}
          command:
            - /bin/sh
            - -c
            - |
              set -e  # a failed migration must fail the hook
              echo "Starting database migration..."
              echo "Current database version:"
              # The version query may fail on a database with no migrations yet; tolerate that
              migrate -path "$MIGRATION_DIR" -database "$DATABASE_URL" version || echo "No version recorded yet"
              echo "Running migrations..."
              migrate -path "$MIGRATION_DIR" -database "$DATABASE_URL" up
              echo "Migration completed. New database version:"
              migrate -path "$MIGRATION_DIR" -database "$DATABASE_URL" version
          volumeMounts:
            - name: migration-scripts
              mountPath: {{ .Values.migration.directory }}
      volumes:
        - name: migration-scripts
          configMap:
            name: {{ include "myapp.fullname" . }}-migration-scripts
5.4.2 Configuration Validation System
# templates/hooks/config-validation.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ include "myapp.fullname" . }}-config-validation"
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: validator
          image: alpine:3.16
          command:
            - /bin/sh
            - -c
            - |
              echo "Validating configuration..."
              # Check required settings
              if [ -z "{{ .Values.app.name }}" ]; then
                echo "ERROR: app.name is required"
                exit 1
              fi
              if [ -z "{{ .Values.database.host }}" ]; then
                echo "ERROR: database.host is required"
                exit 1
              fi
              # Check resource settings
              CPU_LIMIT="{{ .Values.resources.limits.cpu }}"
              MEMORY_LIMIT="{{ .Values.resources.limits.memory }}"
              if [ "$CPU_LIMIT" = "" ] || [ "$MEMORY_LIMIT" = "" ]; then
                echo "ERROR: Resource limits must be specified"
                exit 1
              fi
              # Check network connectivity
              echo "Testing database connectivity..."
              nc -z {{ .Values.database.host }} {{ .Values.database.port }}
              if [ $? -ne 0 ]; then
                echo "ERROR: Cannot connect to database"
                exit 1
              fi
              {{- if .Values.redis.enabled }}
              echo "Testing Redis connectivity..."
              nc -z {{ .Values.redis.host }} {{ .Values.redis.port }}
              if [ $? -ne 0 ]; then
                echo "ERROR: Cannot connect to Redis"
                exit 1
              fi
              {{- end }}
              echo "Configuration validation passed"
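The required-value checks in the hook above can also be prototyped outside the cluster. The sketch below mirrors them in Python against a plain dictionary standing in for the chart's values. The key names follow the template above, while `lookup`, `missing_values`, and the sample values are hypothetical.

```python
def lookup(values, dotted_key):
    """Walk a nested dict using a dotted path such as 'database.host'."""
    node = values
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_values(values, required):
    """Return the required keys that are absent or set to an empty string."""
    return [key for key in required if lookup(values, key) in (None, "")]


REQUIRED = [
    "app.name",
    "database.host",
    "resources.limits.cpu",
    "resources.limits.memory",
]

sample_values = {
    "app": {"name": "webapp"},
    "database": {"host": ""},                      # present but empty
    "resources": {"limits": {"cpu": "500m"}},      # memory limit missing
}

problems = missing_values(sample_values, REQUIRED)
```

A check like this only covers the static part of the validation; the connectivity tests still have to run inside the cluster, which is what the hook is for.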
5.4.3 Data Initialization System
# templates/hooks/data-initialization.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ include "myapp.fullname" . }}-data-init"
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "10"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: data-init
          # The application image is assumed to include curl
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          env:
            # DB_PASSWORD must be declared before DATABASE_URL:
            # $(VAR) only expands variables defined earlier in this list.
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ .Values.database.secretName }}
                  key: password
            - name: DATABASE_URL
              value: "postgresql://{{ .Values.database.username }}:$(DB_PASSWORD)@{{ .Values.database.host }}:{{ .Values.database.port }}/{{ .Values.database.name }}"
            - name: ADMIN_EMAIL
              value: {{ .Values.admin.email | quote }}
            - name: ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ include "myapp.fullname" . }}-admin-secret
                  key: password
          command:
            - /bin/sh
            - -c
            - |
              echo "Initializing application data..."
              # Wait for the application service to become ready
              until curl -f http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/health; do
                echo "Waiting for application to be ready..."
                sleep 5
              done
              # Create the admin user (variables are quoted so spaces cannot split the JSON argument)
              echo "Creating admin user..."
              curl -X POST http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/api/admin/users \
                -H "Content-Type: application/json" \
                -d '{
                  "email": "'"$ADMIN_EMAIL"'",
                  "password": "'"$ADMIN_PASSWORD"'",
                  "role": "admin"
                }'
              # Load default data
              {{- if .Values.dataInit.enabled }}
              echo "Loading initial data..."
              curl -X POST http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/api/admin/data/init
              {{- end }}
              echo "Data initialization completed"
5.4.4 Cleanup and Backup System
# templates/hooks/cleanup.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ include "myapp.fullname" . }}-cleanup"
  annotations:
    "helm.sh/hook": pre-delete
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      # This service account needs RBAC permission to create and watch Jobs
      serviceAccountName: {{ include "myapp.fullname" . }}-cleanup
      containers:
        - name: cleanup
          # NOTE: the S3 step below needs the AWS CLI, which a kubectl-only image
          # is not expected to include; use an image that provides both tools.
          image: bitnami/kubectl:latest
          command:
            - /bin/bash
            - -c
            - |
              echo "Starting cleanup process..."
              # Back up important data
              echo "Creating final backup..."
              kubectl create job {{ include "myapp.fullname" . }}-final-backup \
                --from=cronjob/{{ include "myapp.fullname" . }}-backup \
                --namespace={{ .Release.Namespace }}
              # Wait for the backup to finish
              kubectl wait --for=condition=complete job/{{ include "myapp.fullname" . }}-final-backup \
                --timeout=300s --namespace={{ .Release.Namespace }}
              # Clean up external resources
              {{- if .Values.cleanup.s3.enabled }}
              echo "Cleaning up S3 resources..."
              aws s3 rm s3://{{ .Values.cleanup.s3.bucket }}/{{ include "myapp.fullname" . }}/ --recursive
              {{- end }}
              {{- if .Values.cleanup.dns.enabled }}
              echo "Cleaning up DNS records..."
              # DNS record cleanup logic goes here
              {{- end }}
              echo "Cleanup completed"
          env:
            {{- if .Values.cleanup.s3.enabled }}
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: {{ .Values.cleanup.s3.secretName }}
                  key: access-key-id
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: {{ .Values.cleanup.s3.secretName }}
                  key: secret-access-key
            {{- end }}
5.5 Hook Error Handling and Retries
5.5.1 Retry Mechanism
# templates/hooks/retry-example.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ include "myapp.fullname" . }}-retry-hook"
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "5"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  # Retry settings
  backoffLimit: 3              # maximum number of retries
  activeDeadlineSeconds: 300   # overall deadline for the Job, in seconds
  template:
    spec:
      restartPolicy: Never     # each retry runs in a fresh Pod
      containers:
        - name: retry-task
          image: alpine:3.16
          command:
            - /bin/sh
            - -c
            - |
              # JOB_COMPLETION_INDEX is only set for Indexed Jobs and is not a
              # retry counter, so no attempt number is printed here.
              echo "Attempting task..."
              # Simulate a task that fails about 30% of the time
              if [ "$(shuf -i 1-10 -n 1)" -le 3 ]; then
                echo "Task failed, will retry..."
                exit 1
              fi
              echo "Task completed successfully"
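To get a feel for what `backoffLimit: 3` means in practice, the sketch below models the retry behavior in Python. It assumes the schedule described in the Kubernetes Job documentation, where failed Pods are recreated with an exponential back-off delay (10s, 20s, 40s, ...) capped at six minutes. `backoff_delays` and `run_with_retries` are hypothetical helpers, and the real Job controller is more involved than this.

```python
import time


def backoff_delays(backoff_limit, base=10, cap=360):
    """Delay in seconds before each retry: 10, 20, 40, ... capped at `cap`."""
    return [min(base * 2 ** i, cap) for i in range(backoff_limit)]


def run_with_retries(task, backoff_limit, sleep=time.sleep):
    """Call `task` until it returns True or the retries are used up.

    Returns the number of attempts made. One initial attempt plus
    `backoff_limit` retries are allowed before giving up.
    """
    delays = backoff_delays(backoff_limit)
    for attempt in range(backoff_limit + 1):
        if task():
            return attempt + 1
        if attempt < backoff_limit:
            sleep(delays[attempt])
    raise RuntimeError(f"task failed after {backoff_limit + 1} attempts")


# A task that fails twice, then succeeds; sleeping is stubbed out.
outcomes = iter([False, False, True])
attempts = run_with_retries(lambda: next(outcomes), backoff_limit=3,
                            sleep=lambda seconds: None)
```

Under this model the delays for `backoffLimit: 3` add up to 70 seconds of waiting alone, which is worth comparing against `activeDeadlineSeconds` when a task is slow.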
5.5.2 Timeout Handling
# templates/hooks/timeout-example.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ include "myapp.fullname" . }}-timeout-hook"
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "0"
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  # 10-minute deadline. Helm's own --timeout (default 5m0s) also bounds how long
  # it waits for a hook, so raise it to match or Helm stops waiting first.
  activeDeadlineSeconds: 600
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: long-running-task
          image: alpine:3.16
          command:
            - /bin/sh
            - -c
            - |
              echo "Starting long-running task..."
              # Apply an internal timeout of 5 minutes
              timeout 300 /bin/sh -c '
                while true; do
                  echo "Processing..."
                  sleep 10
                  # Check the completion condition
                  if [ -f /tmp/task-complete ]; then
                    break
                  fi
                done
              ' || {
                echo "Task timed out after 5 minutes"
                exit 1
              }
              echo "Task completed within timeout"
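The polling loop above follows a common pattern: check a condition at a fixed interval and give up once a deadline has passed. Here is the same pattern as a small Python sketch. `wait_until` is a hypothetical helper, and the clock and sleep functions are injectable so the logic can be exercised without actually waiting.

```python
import time


def wait_until(condition, timeout, interval=10.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` every `interval` seconds for up to `timeout` seconds.

    Returns True as soon as the condition holds, or False once the
    deadline has passed without it holding.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Simulated time: "sleeping" just advances a counter.
now = [0.0]


def fake_sleep(seconds):
    now[0] += seconds


# The condition becomes true after 30 simulated seconds.
finished = wait_until(lambda: now[0] >= 30, timeout=300, interval=10,
                      clock=lambda: now[0], sleep=fake_sleep)
elapsed_on_success = now[0]

# The condition never becomes true, so the 300-second deadline is hit.
now[0] = 0.0
timed_out = not wait_until(lambda: False, timeout=300, interval=10,
                           clock=lambda: now[0], sleep=fake_sleep)
elapsed_on_timeout = now[0]
```

A monotonic clock is used for the deadline so that wall-clock adjustments cannot shorten or extend the wait.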
5.5.3 Error Recovery
# templates/hooks/error-recovery.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ include "myapp.fullname" . }}-error-recovery"
  annotations:
    "helm.sh/hook": post-upgrade
    "helm.sh/hook-weight": "10"
    "helm.sh/hook-delete-policy": hook-failed
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: recovery-task
          # NOTE: this script calls curl and kubectl, which the stock alpine image
          # does not provide. Substitute an image that ships both, and run the Pod
          # under a service account allowed to restart the Deployment.
          image: alpine:3.16
          command:
            - /bin/sh
            - -c
            - |
              echo "Starting error recovery process..."
              # Check application status
              if ! curl -f http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/health; then
                echo "Application health check failed, attempting recovery..."
                # Try restarting the application
                kubectl rollout restart deployment/{{ include "myapp.fullname" . }} \
                  --namespace={{ .Release.Namespace }}
                # Wait for the restart to finish
                kubectl rollout status deployment/{{ include "myapp.fullname" . }} \
                  --namespace={{ .Release.Namespace }} --timeout=300s
                # Check health again
                sleep 30
                if ! curl -f http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/health; then
                  echo "Recovery failed, manual intervention required"
                  exit 1
                fi
              fi
              echo "Recovery completed successfully"
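5.6 Debugging and Monitoring Hooks
5.6.1 Inspecting Hook Status
# List the hooks defined for a release ("helm.sh/hook" is an annotation, not a label, so -l cannot select on it)
helm get hooks myapp
# List hook Jobs by label (works only if the chart's templates set this label on hook resources)
kubectl get jobs -l "app.kubernetes.io/managed-by=Helm"
# View hook logs
kubectl logs job/myapp-pre-install
# View hook events
kubectl describe job myapp-pre-install
# List the Pods created by a hook Job (Kubernetes adds the job-name label to them)
kubectl get pods -l "job-name=myapp-pre-install"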
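Because `helm.sh/hook` is an annotation rather than a label, a label selector such as `-l "helm.sh/hook"` will not match hook resources. One workaround is to filter the JSON output of `kubectl get ... -o json` on the client side. The sketch below shows the idea in Python: `hook_resources` is a hypothetical helper and the sample document is hand-written to imitate kubectl's list output.

```python
import json


def hook_resources(kubectl_json):
    """Return (name, hook types) for every item carrying a helm.sh/hook annotation.

    `kubectl_json` is the text produced by `kubectl get <kind> -o json`.
    """
    found = []
    for item in json.loads(kubectl_json).get("items", []):
        metadata = item.get("metadata", {})
        hook = (metadata.get("annotations") or {}).get("helm.sh/hook")
        if hook:
            found.append((metadata.get("name", ""), hook))
    return found


# Hand-written sample in the shape of a kubectl list response.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "myapp-pre-install",
                      "annotations": {"helm.sh/hook": "pre-install"}}},
        {"metadata": {"name": "nightly-report"}},
        {"metadata": {"name": "myapp-db-migration",
                      "annotations": {"helm.sh/hook": "post-install,post-upgrade"}}},
    ]
})

hooks = hook_resources(sample)
```

In a real cluster the input would come from something like `kubectl get jobs -o json`, piped or read into the function.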
5.6.2 Hook Debugging Template
# templates/hooks/debug-hook.yaml
{{- if .Values.debug.hooks.enabled }}
apiVersion: v1
kind: Pod
metadata:
  name: "{{ include "myapp.fullname" . }}-debug-hook"
  annotations:
    "helm.sh/hook": {{ .Values.debug.hooks.type | default "post-install" }}
    "helm.sh/hook-weight": "100"
    "helm.sh/hook-delete-policy": "hook-succeeded"
spec:
  restartPolicy: Never
  containers:
    - name: debug
      # NOTE: the kubectl calls below need an image that includes kubectl
      # (the stock alpine image does not) and a service account with read access.
      image: alpine:3.16
      command:
        - /bin/sh
        - -c
        - |
          echo "=== Hook Debug Information ==="
          echo "Release Name: {{ .Release.Name }}"
          echo "Release Namespace: {{ .Release.Namespace }}"
          echo "Release Revision: {{ .Release.Revision }}"
          echo "Chart Name: {{ .Chart.Name }}"
          echo "Chart Version: {{ .Chart.Version }}"
          echo "\n=== Environment Variables ==="
          env | sort
          echo "\n=== Kubernetes Resources ==="
          kubectl get all -l "app.kubernetes.io/instance={{ .Release.Name }}" \
            --namespace={{ .Release.Namespace }}
          echo "\n=== Hook Resources ==="
          # "helm.sh/hook" is an annotation, so it cannot be used as a label selector
          kubectl get jobs,pods \
            --namespace={{ .Release.Namespace }}
          echo "\n=== Values Debug ==="
          echo "App Name: {{ .Values.app.name }}"
          echo "Image: {{ .Values.image.repository }}:{{ .Values.image.tag }}"
          echo "Replica Count: {{ .Values.replicaCount }}"
          echo "\n=== Debug completed ==="
{{- end }}
5.6.3 Hook Monitoring Configuration
# templates/hooks/monitoring-hook.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ include "myapp.fullname" . }}-monitoring-setup"
  annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-weight": "15"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: monitoring-setup
          # NOTE: this script calls kubectl, which a curl-only image does not
          # provide; substitute an image that includes kubectl.
          image: curlimages/curl:7.85.0
          command:
            - /bin/sh
            - -c
            - |
              echo "Setting up monitoring for hooks..."
              # Create a Prometheus alerting rule. The heredoc delimiter is quoted
              # so the shell does not expand $labels before kubectl sees it.
              cat <<'EOF' | kubectl apply -f -
              apiVersion: monitoring.coreos.com/v1
              kind: PrometheusRule
              metadata:
                name: {{ include "myapp.fullname" . }}-hook-monitoring
                namespace: {{ .Release.Namespace }}
              spec:
                groups:
                  - name: helm-hooks
                    rules:
                      - alert: HelmHookFailed
                        expr: kube_job_status_failed{job_name=~".*-hook.*"} > 0
                        for: 0m
                        labels:
                          severity: warning
                        annotations:
                          summary: "Helm hook job failed"
                          description: "Hook job {{ "{{" }} $labels.job_name {{ "}}" }} has failed"
              EOF
              echo "Monitoring setup completed"
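5.7 Hands-On Exercises
5.7.1 Exercise 1: Database Migration Hook
Build a complete database migration system:
# 1. Create the chart
helm create webapp-with-migration
cd webapp-with-migration
# 2. Create the migration hook
mkdir -p templates/hooks
vim templates/hooks/db-migration.yaml
# 3. Create the ConfigMap holding the migration scripts
vim templates/migration-configmap.yaml
# 4. Configure values.yaml
vim values.yaml
# 5. Test the migration
helm template . --debug
helm install webapp . --dry-run
5.7.2 Exercise 2: Multi-Stage Deployment Hooks
Implement a complete workflow covering validation, backup, deployment, and testing. Keep in mind that weights only order hooks of the same type, and that the deployment itself consists of the chart's regular resources rather than a hook:
# Stage 1: pre-checks (pre-install/pre-upgrade hook, weight -10)
# Stage 2: backup (pre-install/pre-upgrade hook, weight -5)
# Stage 3: deployment (regular chart resources, applied between the pre and post hooks)
# Stage 4: migration (post-install/post-upgrade hook, weight 5)
# Stage 5: tests (post-install/post-upgrade hook, weight 10)
# Stage 6: notification (post-install/post-upgrade hook, weight 15)
5.7.3 Exercise 3: Error Recovery System
Build a hook system that can recover from errors automatically:
# 1. Create the monitoring hook
vim templates/hooks/health-monitor.yaml
# 2. Create the recovery hook
vim templates/hooks/auto-recovery.yaml
# 3. Create the notification hook
vim templates/hooks/notification.yaml
# 4. Test an error scenario
helm install webapp . --set app.simulateError=true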
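The staged plan above can be checked with a small simulation before writing any templates. The Python sketch below orders hooks the way this chapter describes: hooks are grouped by hook type, sorted by ascending weight within each group, and the chart's regular resources are applied between the pre and post groups. It is a model rather than Helm's code. The `Hook` type and `run_order` function are hypothetical, and breaking weight ties by name is an assumption.

```python
from typing import NamedTuple


class Hook(NamedTuple):
    name: str
    event: str        # hook type, e.g. "pre-upgrade" or "post-upgrade"
    weight: int = 0   # a missing weight counts as 0


def run_order(hooks, operation):
    """Return the order of steps for one operation such as 'install' or 'upgrade'."""
    def phase(event):
        selected = [h for h in hooks if h.event == event]
        # Ascending weight; ties are broken by name (an assumption).
        ordered = sorted(selected, key=lambda h: (h.weight, h.name))
        return [h.name for h in ordered]

    return (phase(f"pre-{operation}")
            + ["<chart resources>"]
            + phase(f"post-{operation}"))


stages = [
    Hook("notify", "post-upgrade", 15),
    Hook("backup", "pre-upgrade", -5),
    Hook("tests", "post-upgrade", 10),
    Hook("pre-check", "pre-upgrade", -10),
    Hook("migration", "post-upgrade", 5),
]

plan = run_order(stages, "upgrade")
```

Note that a post hook with a weight lower than a pre hook still runs after it: the hook type decides the phase, and the weight only orders hooks inside that phase.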
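5.8 Troubleshooting
5.8.1 Common Hook Problems
1. **Hook execution failed**
```bash
# Check hook status ("helm.sh/hook" is an annotation, so -l cannot select on it)
helm get hooks myapp
kubectl get jobs
# Find the cause of the failure
kubectl describe job myapp-pre-install
kubectl logs job/myapp-pre-install
# Re-run the hook manually: delete the failed Job, then repeat the Helm operation.
# Note that an upgrade fires pre-upgrade/post-upgrade hooks, not pre-install ones.
kubectl delete job myapp-pre-install
helm upgrade myapp . --force
```
2. **Hook timeout**
```bash
# Inspect the hook configuration
kubectl get job myapp-pre-install -o yaml
# Increase the timeout
# Add to the Job spec:
# activeDeadlineSeconds: 1800  # 30 minutes
# Helm's own --timeout (default 5m0s) also limits how long it waits for a hook:
helm upgrade myapp . --timeout 30m
```
3. **Hook weight problems**
```bash
# Inspect the order in which hooks ran
kubectl get jobs --sort-by=.metadata.creationTimestamp
# Check the weight configuration (dots in the annotation key are escaped for JSONPath)
kubectl get jobs -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.helm\.sh/hook-weight}{"\n"}{end}'
```
5.8.2 Debugging Tips
```bash
# Render the release, hooks included, without installing anything
helm install myapp . --debug --dry-run
# View the rendered output of a single hook template
helm template myapp . --show-only templates/hooks/pre-install.yaml
# Create a hook manually for testing
helm template myapp . --show-only templates/hooks/pre-install.yaml | kubectl apply -f -
# Watch hooks as they run
watch kubectl get jobs,pods
```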
5.8.3 Performance Tuning
# Tuning a hook Job
spec:
  parallelism: 1              # number of Pods run in parallel
  completions: 1              # number of successful completions required
  backoffLimit: 3             # retry limit
  activeDeadlineSeconds: 300  # overall deadline, in seconds
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: hook-task
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
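5.9 Chapter Summary
Core Concepts
- Helm Hooks: Kubernetes resources executed at specific points in a release's lifecycle
- Hook types: pre-install, post-install, pre-upgrade, post-upgrade, pre-delete, post-delete, pre-rollback, post-rollback, test
- Hook weight: controls the execution order of hooks of the same type
- Deletion policy: controls when hook resources are cleaned up
- Error handling: retries, timeouts, and error recovery
Key Technical Points
- Hooks are identified and configured through specific annotations
- The lower the weight, the earlier the hook runs
- Several deletion policies are available for managing the lifecycle of hook resources
- Hooks make it possible to control complex deployment workflows
- Sound error handling and monitoring are required
Best Practices
- Plan weights carefully: make sure hooks run in the right order
- Handle errors: implement appropriate retry and recovery mechanisms
- Clean up resources: pick a deletion policy that prevents leftover resources
- Monitor and alert: track hook execution status and performance
- Test and verify: exercise hook behavior thoroughly across scenarios
Next Chapter
The next chapter, "Testing and Validation", looks at how to use Helm's test features to verify that a chart is correct. It covers unit, integration, and end-to-end testing strategies, and how to build a complete chart testing workflow.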