13.1 概述
Kubernetes生态系统是一个庞大而活跃的社区,围绕着容器编排平台构建了丰富的工具链和解决方案。本章将介绍Kubernetes生态系统中的重要组件和工具,帮助您了解如何利用这些工具提升开发和运维效率。
13.1.1 生态系统架构
┌─────────────────────────────────────────────────────────────────┐
│ Kubernetes生态系统 │
├─────────────────────────────────────────────────────────────────┤
│ 开发工具 │ 部署工具 │ 运维工具 │
│ ├─ Skaffold │ ├─ Helm │ ├─ Prometheus │
│ ├─ Tilt │ ├─ Kustomize │ ├─ Grafana │
│ ├─ Draft │ ├─ ArgoCD │ ├─ Jaeger │
│ └─ DevSpace │ ├─ Flux │ └─ Fluentd │
├─────────────────────────────────────────────────────────────────┤
│ 安全工具 │ 网络工具 │ 存储工具 │
│ ├─ Falco │ ├─ Istio │ ├─ Rook │
│ ├─ OPA Gatekeeper │ ├─ Linkerd │ ├─ Longhorn │
│ ├─ Twistlock │ ├─ Cilium │ ├─ OpenEBS │
│ └─ Aqua Security │ └─ Calico │ └─ Portworx │
├─────────────────────────────────────────────────────────────────┤
│ CI/CD工具 │ 多集群管理 │ 成本管理 │
│ ├─ Tekton │ ├─ Rancher │ ├─ KubeCost │
│ ├─ Jenkins X │ ├─ Cluster API │ ├─ Goldilocks │
│ ├─ GitLab CI │ ├─ Admiral │ └─ Fairwinds │
│ └─ GitHub Actions │ └─ Loft │ │
└─────────────────────────────────────────────────────────────────┘
13.1.2 工具分类
- 开发工具: 简化应用开发和调试流程
- 部署工具: 自动化应用部署和配置管理
- 运维工具: 监控、日志、追踪和故障排查
- 安全工具: 安全扫描、策略执行和合规检查
- 网络工具: 服务网格、网络策略和流量管理
- 存储工具: 持久化存储和数据管理
- CI/CD工具: 持续集成和持续部署
- 多集群管理: 跨集群部署和管理
- 成本管理: 资源使用分析和成本优化
13.2 包管理工具
13.2.1 Helm
Helm是Kubernetes的包管理器,类似于Linux的apt或yum。
基本概念
# Helm基本概念
echo "=== Helm基本概念 ==="
# Chart: Helm包,包含运行应用所需的所有资源定义
# Release: Chart的运行实例
# Repository: Chart仓库
# Values: 配置参数
# 安装Helm
curl -fsSL https://get.helm.sh/helm-v3.12.0-linux-amd64.tar.gz -o helm.tar.gz
tar -zxvf helm.tar.gz
mv linux-amd64/helm /usr/local/bin/helm
# 验证安装
helm version
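下面用一个最小示例串起这四个概念:从仓库(Repository)获取Chart,以自定义Values安装为一个Release(其中bitnami/redis与auth.enabled参数仅作演示假设):
# 添加并更新Chart仓库(Repository)
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
# 以自定义Values安装Chart,生成名为my-redis的Release
helm install my-redis bitnami/redis --set auth.enabled=false
# 查看Release及其生效的Values
helm list
helm get values my-redis
# 卸载Release
helm uninstall my-redis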
Chart开发
#!/bin/bash
# Helm Chart开发脚本
echo "=== 创建Helm Chart ==="
# 创建新的Chart
helm create myapp
echo "Chart结构:"
tree myapp/
# Chart目录结构
# myapp/
# ├── Chart.yaml # Chart元数据
# ├── values.yaml # 默认配置值
# ├── charts/ # 依赖Chart
# ├── templates/ # 模板文件
# │ ├── deployment.yaml
# │ ├── service.yaml
# │ ├── ingress.yaml
# │ ├── _helpers.tpl # 模板助手
# │ └── NOTES.txt # 安装说明
# └── .helmignore # 忽略文件
echo "\n=== 自定义Chart配置 ==="
# 修改Chart.yaml
cat > myapp/Chart.yaml << 'EOF'
apiVersion: v2
name: myapp
description: A Helm chart for my application
type: application
version: 0.1.0
appVersion: "1.0.0"
keywords:
- web
- application
home: https://example.com
sources:
- https://github.com/example/myapp
maintainers:
- name: Developer
email: dev@example.com
dependencies:
- name: redis
version: "17.3.7"
repository: "https://charts.bitnami.com/bitnami"
condition: redis.enabled
EOF
# 修改values.yaml
cat > myapp/values.yaml << 'EOF'
# 默认配置值
replicaCount: 2
image:
repository: nginx
pullPolicy: IfNotPresent
tag: "1.21"
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
serviceAccount:
create: true
annotations: {}
name: ""
podAnnotations: {}
podSecurityContext:
fsGroup: 2000
securityContext:
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
service:
type: ClusterIP
port: 80
ingress:
enabled: false
className: ""
annotations: {}
hosts:
- host: chart-example.local
paths:
- path: /
pathType: Prefix
tls: []
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 100m
memory: 128Mi
autoscaling:
enabled: false
minReplicas: 1
maxReplicas: 100
targetCPUUtilizationPercentage: 80
nodeSelector: {}
tolerations: []
affinity: {}
# Redis配置
redis:
enabled: true
auth:
enabled: false
EOF
# 创建自定义模板
cat > myapp/templates/configmap.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "myapp.fullname" . }}-config
labels:
{{- include "myapp.labels" . | nindent 4 }}
data:
app.properties: |
server.port={{ .Values.service.port }}
redis.enabled={{ .Values.redis.enabled }}
{{- if .Values.redis.enabled }}
redis.host={{ include "myapp.fullname" . }}-redis-master
{{- end }}
EOF
# 更新deployment模板以使用ConfigMap
cat > myapp/templates/deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "myapp.fullname" . }}
labels:
{{- include "myapp.labels" . | nindent 4 }}
spec:
{{- if not .Values.autoscaling.enabled }}
replicas: {{ .Values.replicaCount }}
{{- end }}
selector:
matchLabels:
{{- include "myapp.selectorLabels" . | nindent 6 }}
template:
metadata:
{{- with .Values.podAnnotations }}
annotations:
{{- toYaml . | nindent 8 }}
{{- end }}
labels:
{{- include "myapp.selectorLabels" . | nindent 8 }}
spec:
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
serviceAccountName: {{ include "myapp.serviceAccountName" . }}
securityContext:
{{- toYaml .Values.podSecurityContext | nindent 8 }}
containers:
- name: {{ .Chart.Name }}
securityContext:
{{- toYaml .Values.securityContext | nindent 12 }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- name: http
containerPort: {{ .Values.service.port }}
protocol: TCP
livenessProbe:
httpGet:
path: /
port: http
readinessProbe:
httpGet:
path: /
port: http
resources:
{{- toYaml .Values.resources | nindent 12 }}
volumeMounts:
- name: config
mountPath: /etc/config
readOnly: true
volumes:
- name: config
configMap:
name: {{ include "myapp.fullname" . }}-config
{{- with .Values.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
EOF
echo "\n=== Chart验证和测试 ==="
# 下载Chart.yaml中声明的依赖(redis子Chart)
helm dependency update myapp/
# 验证Chart语法
helm lint myapp/
# 渲染模板(不安装)
helm template myapp myapp/ --debug
# 模拟安装(dry-run)
helm install myapp-test myapp/ --dry-run --debug
# 打包Chart
helm package myapp/
echo "\n=== Chart部署 ==="
# 安装Chart
helm install myapp myapp/
# 查看Release
helm list
# 查看Release状态
helm status myapp
# 升级Release
helm upgrade myapp myapp/ --set replicaCount=3
# 回滚Release
helm rollback myapp 1
# 卸载Release
# helm uninstall myapp
echo "\n=== Chart仓库管理 ==="
# 添加官方仓库
helm repo add stable https://charts.helm.sh/stable
helm repo add bitnami https://charts.bitnami.com/bitnami
# 更新仓库索引
helm repo update
# 搜索Chart
helm search repo nginx
# 查看Chart信息
helm show chart bitnami/nginx
helm show values bitnami/nginx
# 安装第三方Chart
helm install my-nginx bitnami/nginx --set service.type=LoadBalancer
echo "\n=== Chart开发完成 ==="
13.2.2 Kustomize
Kustomize是Kubernetes原生的配置管理工具,通过声明式的方式管理配置。
#!/bin/bash
# Kustomize使用脚本
echo "=== Kustomize配置管理 ==="
# 创建基础配置
mkdir -p kustomize-demo/{base,overlays/{dev,staging,prod}}
# 基础配置
cat > kustomize-demo/base/deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
labels:
app: myapp
spec:
replicas: 1
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: myapp
image: nginx:1.21
ports:
- containerPort: 80
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
EOF
cat > kustomize-demo/base/service.yaml << 'EOF'
apiVersion: v1
kind: Service
metadata:
name: myapp-service
spec:
selector:
app: myapp
ports:
- port: 80
targetPort: 80
type: ClusterIP
EOF
cat > kustomize-demo/base/kustomization.yaml << 'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml
commonLabels:
version: v1.0.0
managed-by: kustomize
commonAnnotations:
description: "Base configuration for myapp"
EOF
# 开发环境配置
cat > kustomize-demo/overlays/dev/kustomization.yaml << 'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: dev
namePrefix: dev-
nameSuffix: -v1
resources:
- ../../base
patchesStrategicMerge:
- deployment-patch.yaml
- service-patch.yaml
replicas:
- name: myapp
count: 1
images:
- name: nginx
newTag: 1.21-alpine
commonLabels:
environment: dev
configMapGenerator:
- name: app-config
literals:
- ENV=development
- DEBUG=true
- LOG_LEVEL=debug
EOF
cat > kustomize-demo/overlays/dev/deployment-patch.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
template:
spec:
containers:
- name: myapp
env:
- name: ENV
valueFrom:
configMapKeyRef:
name: app-config
key: ENV
- name: DEBUG
valueFrom:
configMapKeyRef:
name: app-config
key: DEBUG
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 200m
memory: 256Mi
EOF
cat > kustomize-demo/overlays/dev/service-patch.yaml << 'EOF'
apiVersion: v1
kind: Service
metadata:
name: myapp-service
spec:
type: NodePort
EOF
# 生产环境配置
cat > kustomize-demo/overlays/prod/kustomization.yaml << 'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: prod
namePrefix: prod-
resources:
- ../../base
- ingress.yaml
- hpa.yaml
patchesStrategicMerge:
- deployment-patch.yaml
replicas:
- name: myapp
count: 3
images:
- name: nginx
newTag: 1.21
commonLabels:
environment: prod
secretGenerator:
- name: app-secrets
literals:
- DATABASE_URL=postgresql://prod-db:5432/myapp
- API_KEY=prod-api-key-12345
configMapGenerator:
- name: app-config
literals:
- ENV=production
- DEBUG=false
- LOG_LEVEL=info
EOF
cat > kustomize-demo/overlays/prod/deployment-patch.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
template:
spec:
containers:
- name: myapp
env:
- name: ENV
valueFrom:
configMapKeyRef:
name: app-config
key: ENV
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: app-secrets
key: DATABASE_URL
resources:
requests:
cpu: 200m
memory: 256Mi
limits:
cpu: 1000m
memory: 1Gi
livenessProbe:
httpGet:
path: /health
port: 80
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 80
initialDelaySeconds: 5
periodSeconds: 5
EOF
cat > kustomize-demo/overlays/prod/ingress.yaml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: myapp-ingress
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
tls:
- hosts:
- myapp.example.com
secretName: myapp-tls
rules:
- host: myapp.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: myapp-service
port:
number: 80
EOF
cat > kustomize-demo/overlays/prod/hpa.yaml << 'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: myapp-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
EOF
echo "\n=== Kustomize构建和部署 ==="
# 构建开发环境配置
echo "开发环境配置:"
kustomize build kustomize-demo/overlays/dev
# 构建生产环境配置
echo "\n生产环境配置:"
kustomize build kustomize-demo/overlays/prod
# 应用配置
echo "\n部署到开发环境:"
kustomize build kustomize-demo/overlays/dev | kubectl apply -f -
echo "\n部署到生产环境:"
kustomize build kustomize-demo/overlays/prod | kubectl apply -f -
# 验证部署
kubectl get all -n dev
kubectl get all -n prod
echo "\n=== Kustomize配置管理完成 ==="
13.3 CI/CD工具
13.3.1 Tekton
Tekton是Kubernetes原生的CI/CD框架。
# Tekton Pipeline示例
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
name: build-and-deploy
namespace: tekton-pipelines
spec:
description: |
This pipeline clones a git repo, builds a Docker image with Kaniko and
deploys it to Kubernetes
params:
- name: repo-url
type: string
description: The git repo URL to clone from.
- name: image-reference
type: string
description: The image reference for the image to build
- name: deployment-name
type: string
description: The name of the deployment to update
- name: deployment-namespace
type: string
description: The namespace of the deployment to update
default: default
workspaces:
- name: shared-data
description: |
This workspace contains the cloned repo files, so they can be read by the
next task.
- name: docker-credentials
description: Docker registry credentials
tasks:
- name: fetch-source
taskRef:
name: git-clone
workspaces:
- name: output
workspace: shared-data
params:
- name: url
value: $(params.repo-url)
- name: build-image
runAfter: ["fetch-source"]
taskRef:
name: kaniko
workspaces:
- name: source
workspace: shared-data
- name: dockerconfig
workspace: docker-credentials
params:
- name: IMAGE
value: $(params.image-reference)
- name: deploy
runAfter: ["build-image"]
taskRef:
name: kubernetes-actions
params:
- name: script
value: |
kubectl set image deployment/$(params.deployment-name) \
app=$(params.image-reference) \
-n $(params.deployment-namespace)
kubectl rollout status deployment/$(params.deployment-name) \
-n $(params.deployment-namespace)
---
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
name: kubernetes-actions
namespace: tekton-pipelines
spec:
description: |
This task performs kubernetes actions like kubectl apply, delete, etc.
params:
- name: script
description: The kubectl script to run
type: string
steps:
- name: kubectl
image: bitnami/kubectl:latest
script: |
#!/bin/bash
set -e
$(params.script)
---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
name: build-and-deploy-run
namespace: tekton-pipelines
spec:
pipelineRef:
name: build-and-deploy
workspaces:
- name: shared-data
volumeClaimTemplate:
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
- name: docker-credentials
secret:
secretName: docker-credentials
params:
- name: repo-url
value: https://github.com/example/myapp.git
- name: image-reference
value: docker.io/example/myapp:latest
- name: deployment-name
value: myapp
- name: deployment-namespace
value: default
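上面的Pipeline引用了Tekton Catalog中的git-clone和kaniko Task,运行PipelineRun之前需要先安装Tekton Pipelines和这两个Task。以下是一种安装方式的示例(Task的版本号以Catalog实际发布为准,此处仅作假设):
# 安装Tekton Pipelines
kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml
# 从Tekton Catalog安装依赖的Task到Pipeline所在命名空间
kubectl apply -n tekton-pipelines -f https://raw.githubusercontent.com/tektoncd/catalog/main/task/git-clone/0.9/git-clone.yaml
kubectl apply -n tekton-pipelines -f https://raw.githubusercontent.com/tektoncd/catalog/main/task/kaniko/0.6/kaniko.yaml
# 应用上述Pipeline、Task和PipelineRun后,跟踪运行日志
tkn pipelinerun logs build-and-deploy-run -f -n tekton-pipelines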
13.3.2 ArgoCD
ArgoCD是声明式的GitOps持续部署工具。
#!/bin/bash
# ArgoCD安装和配置脚本
echo "=== ArgoCD安装 ==="
# 创建命名空间
kubectl create namespace argocd
# 安装ArgoCD
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# 等待Pod就绪
kubectl wait --for=condition=available --timeout=300s deployment/argocd-server -n argocd
# 获取初始密码
echo "\n=== 获取ArgoCD初始密码 ==="
ARGO_PASSWORD=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
echo "ArgoCD初始密码: $ARGO_PASSWORD"
# 端口转发访问ArgoCD UI
echo "\n=== 访问ArgoCD UI ==="
echo "运行以下命令进行端口转发:"
echo "kubectl port-forward svc/argocd-server -n argocd 8080:443"
echo "然后访问: https://localhost:8080"
echo "用户名: admin"
echo "密码: $ARGO_PASSWORD"
# 安装ArgoCD CLI
echo "\n=== 安装ArgoCD CLI ==="
curl -sSL -o argocd https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
chmod +x argocd
mv argocd /usr/local/bin/argocd
# 登录ArgoCD
echo "\n=== 登录ArgoCD ==="
argocd login localhost:8080 --username admin --password $ARGO_PASSWORD --insecure
# 创建应用配置
echo "\n=== 创建ArgoCD应用 ==="
cat > argocd-app.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: myapp
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: default
source:
repoURL: https://github.com/example/myapp-config.git
targetRevision: HEAD
path: k8s
destination:
server: https://kubernetes.default.svc
namespace: default
syncPolicy:
automated:
prune: true
selfHeal: true
allowEmpty: false
syncOptions:
- Validate=false
- CreateNamespace=true
- PrunePropagationPolicy=foreground
- PruneLast=true
retry:
limit: 5
backoff:
duration: 5s
factor: 2
maxDuration: 3m
EOF
# 应用配置
kubectl apply -f argocd-app.yaml
# 创建项目配置
cat > argocd-project.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
name: myproject
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
description: My Project
sourceRepos:
- 'https://github.com/example/*'
destinations:
- namespace: 'default'
server: https://kubernetes.default.svc
- namespace: 'staging'
server: https://kubernetes.default.svc
- namespace: 'prod'
server: https://kubernetes.default.svc
clusterResourceWhitelist:
- group: ''
kind: Namespace
- group: 'rbac.authorization.k8s.io'
kind: ClusterRole
- group: 'rbac.authorization.k8s.io'
kind: ClusterRoleBinding
namespaceResourceWhitelist:
- group: ''
kind: ConfigMap
- group: ''
kind: Secret
- group: ''
kind: Service
- group: 'apps'
kind: Deployment
- group: 'apps'
kind: ReplicaSet
- group: 'networking.k8s.io'
kind: Ingress
roles:
- name: developer
description: Developer role
policies:
- p, proj:myproject:developer, applications, get, myproject/*, allow
- p, proj:myproject:developer, applications, sync, myproject/*, allow
groups:
- myorg:developers
- name: admin
description: Admin role
policies:
- p, proj:myproject:admin, applications, *, myproject/*, allow
- p, proj:myproject:admin, repositories, *, *, allow
groups:
- myorg:admins
EOF
kubectl apply -f argocd-project.yaml
echo "\n=== ArgoCD配置完成 ==="
echo "查看应用状态:"
argocd app list
argocd app get myapp
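除了提交Application清单,也可以直接用argocd CLI创建并同步应用(仓库地址沿用上面的示例值):
# 使用CLI创建应用
argocd app create myapp-cli \
  --repo https://github.com/example/myapp-config.git \
  --path k8s \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace default
# 手动触发同步并等待应用健康
argocd app sync myapp-cli
argocd app wait myapp-cli --health --timeout 300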
13.4 监控和可观测性工具
13.4.1 Prometheus生态系统
#!/bin/bash
# Prometheus生态系统部署脚本
echo "=== 部署Prometheus生态系统 ==="
# 添加Prometheus Helm仓库
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# 创建监控命名空间
kubectl create namespace monitoring
# 创建Prometheus配置
cat > prometheus-values.yaml << 'EOF'
# Prometheus配置
prometheus:
prometheusSpec:
retention: 30d
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: standard
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 50Gi
resources:
requests:
memory: 2Gi
cpu: 1000m
limits:
memory: 4Gi
cpu: 2000m
additionalScrapeConfigs:
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
# Grafana配置
grafana:
enabled: true
adminPassword: admin123
persistence:
enabled: true
size: 10Gi
resources:
requests:
memory: 512Mi
cpu: 250m
limits:
memory: 1Gi
cpu: 500m
dashboardProviders:
dashboardproviders.yaml:
apiVersion: 1
providers:
- name: 'default'
orgId: 1
folder: ''
type: file
disableDeletion: false
editable: true
options:
path: /var/lib/grafana/dashboards/default
dashboards:
default:
kubernetes-cluster:
gnetId: 7249
revision: 1
datasource: Prometheus
kubernetes-pods:
gnetId: 6417
revision: 1
datasource: Prometheus
node-exporter:
gnetId: 1860
revision: 27
datasource: Prometheus
# AlertManager配置
alertmanager:
alertmanagerSpec:
storage:
volumeClaimTemplate:
spec:
storageClassName: standard
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
resources:
requests:
memory: 256Mi
cpu: 100m
limits:
memory: 512Mi
cpu: 200m
config:
global:
smtp_smarthost: 'smtp.gmail.com:587'
smtp_from: 'alerts@example.com'
route:
group_by: ['alertname']
group_wait: 10s
group_interval: 10s
repeat_interval: 1h
receiver: 'web.hook'
receivers:
- name: 'web.hook'
email_configs:
- to: 'admin@example.com'
        headers:
          subject: 'Kubernetes Alert: {{ .GroupLabels.alertname }}'
        text: |
{{ range .Alerts }}
Alert: {{ .Annotations.summary }}
Description: {{ .Annotations.description }}
{{ end }}
webhook_configs:
- url: 'http://webhook-service:9093/webhook'
send_resolved: true
# Node Exporter配置
nodeExporter:
enabled: true
resources:
requests:
memory: 128Mi
cpu: 100m
limits:
memory: 256Mi
cpu: 200m
# kube-state-metrics配置
kubeStateMetrics:
enabled: true
resources:
requests:
memory: 256Mi
cpu: 100m
limits:
memory: 512Mi
cpu: 200m
EOF
# 安装Prometheus Stack
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--values prometheus-values.yaml
# 等待部署完成
kubectl wait --for=condition=available --timeout=300s deployment/prometheus-grafana -n monitoring
echo "\n=== 配置服务访问 ==="
# 创建Ingress
cat > monitoring-ingress.yaml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: monitoring-ingress
namespace: monitoring
annotations:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/rewrite-target: /
spec:
rules:
- host: prometheus.local
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: prometheus-kube-prometheus-prometheus
port:
number: 9090
- host: grafana.local
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: prometheus-grafana
port:
number: 80
- host: alertmanager.local
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: prometheus-kube-prometheus-alertmanager
port:
number: 9093
EOF
kubectl apply -f monitoring-ingress.yaml
echo "\n=== 访问信息 ==="
echo "Prometheus: http://prometheus.local"
echo "Grafana: http://grafana.local (admin/admin123)"
echo "AlertManager: http://alertmanager.local"
echo "\n=== 端口转发访问 ==="
echo "Prometheus: kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n monitoring"
echo "Grafana: kubectl port-forward svc/prometheus-grafana 3000:80 -n monitoring"
echo "AlertManager: kubectl port-forward svc/prometheus-kube-prometheus-alertmanager 9093:9093 -n monitoring"
echo "\n=== Prometheus生态系统部署完成 ==="
13.4.2 Jaeger分布式追踪
#!/bin/bash
# Jaeger分布式追踪部署脚本
echo "=== 部署Jaeger分布式追踪 ==="
# 安装Jaeger Operator
kubectl create namespace observability
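# 注意: 较新版本的Jaeger Operator依赖cert-manager提供Webhook证书
# 如果集群中尚未安装cert-manager,可以先安装(版本号仅作示例)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
kubectl wait --for=condition=available --timeout=300s deployment/cert-manager-webhook -n cert-manager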
kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.41.0/jaeger-operator.yaml -n observability
# 等待Operator就绪
kubectl wait --for=condition=available --timeout=300s deployment/jaeger-operator -n observability
# 创建Jaeger实例
cat > jaeger-instance.yaml << 'EOF'
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
name: jaeger
namespace: observability
spec:
strategy: production
storage:
type: elasticsearch
elasticsearch:
nodeCount: 3
redundancyPolicy: SingleRedundancy
resources:
requests:
memory: 2Gi
cpu: 500m
limits:
memory: 4Gi
cpu: 1000m
storage:
storageClassName: standard
size: 50Gi
collector:
replicas: 2
resources:
requests:
memory: 512Mi
cpu: 250m
limits:
memory: 1Gi
cpu: 500m
query:
replicas: 2
resources:
requests:
memory: 256Mi
cpu: 100m
limits:
memory: 512Mi
cpu: 200m
agent:
strategy: DaemonSet
resources:
requests:
memory: 128Mi
cpu: 100m
limits:
memory: 256Mi
cpu: 200m
EOF
kubectl apply -f jaeger-instance.yaml
# 等待Jaeger部署完成
kubectl wait --for=condition=available --timeout=600s deployment/jaeger-query -n observability
# 创建示例应用
cat > jaeger-demo-app.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: jaeger-demo
namespace: observability
labels:
app: jaeger-demo
spec:
replicas: 2
selector:
matchLabels:
app: jaeger-demo
template:
metadata:
labels:
app: jaeger-demo
annotations:
sidecar.jaegertracing.io/inject: "true"
spec:
containers:
- name: demo
image: jaegertracing/example-hotrod:latest
ports:
- containerPort: 8080
env:
- name: JAEGER_AGENT_HOST
value: "jaeger-agent"
- name: JAEGER_AGENT_PORT
value: "6831"
resources:
requests:
memory: 128Mi
cpu: 100m
limits:
memory: 256Mi
cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
name: jaeger-demo-service
namespace: observability
spec:
selector:
app: jaeger-demo
ports:
- port: 8080
targetPort: 8080
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: jaeger-ingress
namespace: observability
annotations:
kubernetes.io/ingress.class: nginx
spec:
rules:
- host: jaeger.local
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: jaeger-query
port:
number: 16686
- host: jaeger-demo.local
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: jaeger-demo-service
port:
number: 8080
EOF
kubectl apply -f jaeger-demo-app.yaml
echo "\n=== 访问信息 ==="
echo "Jaeger UI: http://jaeger.local"
echo "Demo App: http://jaeger-demo.local"
echo "\n=== 端口转发访问 ==="
echo "Jaeger UI: kubectl port-forward svc/jaeger-query 16686:16686 -n observability"
echo "Demo App: kubectl port-forward svc/jaeger-demo-service 8080:8080 -n observability"
echo "\n=== Jaeger分布式追踪部署完成 ==="
13.5 安全工具
13.5.1 Falco运行时安全监控
#!/bin/bash
# Falco运行时安全监控部署脚本
echo "=== 部署Falco运行时安全监控 ==="
# 添加Falco Helm仓库
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
# 创建安全命名空间
kubectl create namespace falco-system
# 创建Falco配置
cat > falco-values.yaml << 'EOF'
# Falco配置
falco:
grpc:
enabled: true
bind_address: "0.0.0.0:5060"
threadiness: 8
grpc_output:
enabled: true
json_output: true
json_include_output_property: true
log_stderr: true
log_syslog: false
log_level: info
priority: debug
rules_file:
- /etc/falco/falco_rules.yaml
- /etc/falco/falco_rules.local.yaml
- /etc/falco/k8s_audit_rules.yaml
- /etc/falco/rules.d
plugins:
- name: k8saudit
library_path: libk8saudit.so
init_config:
maxEventBytes: 1048576
webhookMaxBatchSize: 12582912
open_params: 'http://:9765/k8s-audit'
- name: json
library_path: libjson.so
init_config: ""
# 自定义规则
customRules:
custom_rules.yaml: |-
- rule: Detect crypto miners
desc: Detect cryptocurrency miners
condition: >
spawned_process and
(proc.name in (xmrig, cpuminer, t-rex, gminer, nbminer, claymore) or
proc.cmdline contains "stratum+tcp" or
proc.cmdline contains "mining.pool")
output: >
Cryptocurrency miner detected (user=%user.name command=%proc.cmdline
container=%container.name image=%container.image.repository)
priority: CRITICAL
tags: [cryptocurrency, mining, malware]
- rule: Detect privilege escalation
desc: Detect attempts to escalate privileges
condition: >
spawned_process and
(proc.name in (sudo, su, doas) or
proc.cmdline contains "chmod +s" or
proc.cmdline contains "setuid")
output: >
Privilege escalation attempt (user=%user.name command=%proc.cmdline
container=%container.name image=%container.image.repository)
priority: WARNING
tags: [privilege_escalation, security]
- rule: Detect suspicious network activity
desc: Detect suspicious network connections
condition: >
(inbound_outbound) and
(fd.sport in (4444, 5555, 6666, 7777, 8888, 9999) or
fd.dport in (4444, 5555, 6666, 7777, 8888, 9999))
output: >
Suspicious network activity (connection=%fd.name sport=%fd.sport dport=%fd.dport
container=%container.name image=%container.image.repository)
priority: WARNING
tags: [network, suspicious]
# 资源配置
resources:
requests:
cpu: 100m
memory: 512Mi
limits:
cpu: 1000m
memory: 1024Mi
# 容忍度配置
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/master
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
# 节点选择器
nodeSelector: {}
# 服务配置
services:
- name: grpc
type: ClusterIP
ports:
- port: 5060
targetPort: 5060
protocol: TCP
name: grpc
- name: grpc-metrics
type: ClusterIP
ports:
- port: 8765
targetPort: 8765
protocol: TCP
name: metrics
EOF
# 安装Falco
helm install falco falcosecurity/falco \
--namespace falco-system \
--values falco-values.yaml
# 等待Falco就绪
kubectl wait --for=condition=ready --timeout=300s pod -l app.kubernetes.io/name=falco -n falco-system
# 创建Falco事件处理器
cat > falco-sidekick.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: falco-sidekick
namespace: falco-system
labels:
app: falco-sidekick
spec:
replicas: 1
selector:
matchLabels:
app: falco-sidekick
template:
metadata:
labels:
app: falco-sidekick
spec:
containers:
- name: falco-sidekick
image: falcosecurity/falcosidekick:latest
ports:
- containerPort: 2801
env:
- name: WEBHOOK_URL
value: "http://webhook-service:9093/falco"
- name: SLACK_WEBHOOKURL
value: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
- name: SLACK_CHANNEL
value: "#security-alerts"
- name: ELASTICSEARCH_HOSTPORT
value: "elasticsearch:9200"
- name: ELASTICSEARCH_INDEX
value: "falco"
resources:
requests:
memory: 128Mi
cpu: 100m
limits:
memory: 256Mi
cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
name: falco-sidekick
namespace: falco-system
spec:
selector:
app: falco-sidekick
ports:
- port: 2801
targetPort: 2801
type: ClusterIP
EOF
kubectl apply -f falco-sidekick.yaml
# 创建测试Pod来触发Falco规则
cat > falco-test.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
name: falco-test
namespace: default
spec:
containers:
- name: test
image: ubuntu:20.04
command: ["/bin/bash"]
args: ["-c", "while true; do sleep 30; done"]
securityContext:
privileged: true
restartPolicy: Never
EOF
kubectl apply -f falco-test.yaml
echo "\n=== 测试Falco规则 ==="
echo "执行以下命令来触发Falco规则:"
echo "kubectl exec -it falco-test -- bash"
echo "然后在容器内执行:"
echo " apt update && apt install -y netcat"
echo " nc -l 4444 &"
echo " chmod +s /bin/bash"
echo "\n=== 查看Falco日志 ==="
echo "kubectl logs -f daemonset/falco -n falco-system"
echo "\n=== Falco运行时安全监控部署完成 ==="
13.5.2 OPA Gatekeeper策略引擎
#!/bin/bash
# OPA Gatekeeper策略引擎部署脚本
echo "=== 部署OPA Gatekeeper策略引擎 ==="
# 安装Gatekeeper
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/release-3.14/deploy/gatekeeper.yaml
# 等待Gatekeeper就绪
kubectl wait --for=condition=available --timeout=300s deployment/gatekeeper-controller-manager -n gatekeeper-system
kubectl wait --for=condition=available --timeout=300s deployment/gatekeeper-audit -n gatekeeper-system
echo "\n=== 创建约束模板 ==="
# 创建必须有标签的约束模板
cat > required-labels-template.yaml << 'EOF'
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
name: k8srequiredlabels
spec:
crd:
spec:
names:
kind: K8sRequiredLabels
validation:
type: object
properties:
labels:
type: array
items:
type: string
targets:
- target: admission.k8s.gatekeeper.sh
rego: |
package k8srequiredlabels
violation[{"msg": msg}] {
required := input.parameters.labels
provided := input.review.object.metadata.labels
missing := required[_]
not provided[missing]
msg := sprintf("Missing required label: %v", [missing])
}
EOF
kubectl apply -f required-labels-template.yaml
# 创建资源限制约束模板
cat > resource-limits-template.yaml << 'EOF'
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
name: k8sresourcelimits
spec:
crd:
spec:
names:
kind: K8sResourceLimits
validation:
type: object
properties:
cpu:
type: string
memory:
type: string
targets:
- target: admission.k8s.gatekeeper.sh
rego: |
package k8sresourcelimits
violation[{"msg": msg}] {
container := input.review.object.spec.containers[_]
not container.resources.limits.cpu
msg := "Container must have CPU limits"
}
violation[{"msg": msg}] {
container := input.review.object.spec.containers[_]
not container.resources.limits.memory
msg := "Container must have memory limits"
}
violation[{"msg": msg}] {
container := input.review.object.spec.containers[_]
cpu_limit := container.resources.limits.cpu
          cpu_limit_num := units.parse(cpu_limit)
          max_cpu := units.parse(input.parameters.cpu)
cpu_limit_num > max_cpu
msg := sprintf("CPU limit %v exceeds maximum %v", [cpu_limit, input.parameters.cpu])
}
violation[{"msg": msg}] {
container := input.review.object.spec.containers[_]
memory_limit := container.resources.limits.memory
memory_limit_num := units.parse_bytes(memory_limit)
max_memory := units.parse_bytes(input.parameters.memory)
memory_limit_num > max_memory
msg := sprintf("Memory limit %v exceeds maximum %v", [memory_limit, input.parameters.memory])
}
EOF
kubectl apply -f resource-limits-template.yaml
# 创建禁止特权容器约束模板
cat > no-privileged-template.yaml << 'EOF'
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
name: k8snoprivileged
spec:
crd:
spec:
names:
kind: K8sNoPrivileged
validation:
type: object
targets:
- target: admission.k8s.gatekeeper.sh
rego: |
package k8snoprivileged
violation[{"msg": msg}] {
container := input.review.object.spec.containers[_]
container.securityContext.privileged == true
msg := "Privileged containers are not allowed"
}
violation[{"msg": msg}] {
input.review.object.spec.securityContext.privileged == true
msg := "Privileged pods are not allowed"
}
EOF
kubectl apply -f no-privileged-template.yaml
echo "\n=== 创建约束实例 ==="
# 创建必须有标签的约束
cat > required-labels-constraint.yaml << 'EOF'
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
name: must-have-labels
spec:
match:
kinds:
- apiGroups: ["apps"]
kinds: ["Deployment"]
- apiGroups: [""]
kinds: ["Pod"]
excludedNamespaces: ["kube-system", "gatekeeper-system", "kube-public"]
parameters:
labels: ["app", "version", "environment"]
EOF
kubectl apply -f required-labels-constraint.yaml
# 创建资源限制约束
cat > resource-limits-constraint.yaml << 'EOF'
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sResourceLimits
metadata:
name: resource-limits
spec:
match:
kinds:
- apiGroups: ["apps"]
kinds: ["Deployment"]
- apiGroups: [""]
kinds: ["Pod"]
excludedNamespaces: ["kube-system", "gatekeeper-system", "kube-public"]
parameters:
cpu: "2000m"
memory: "2Gi"
EOF
kubectl apply -f resource-limits-constraint.yaml
# 创建禁止特权容器约束
cat > no-privileged-constraint.yaml << 'EOF'
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sNoPrivileged
metadata:
name: no-privileged
spec:
match:
kinds:
- apiGroups: ["apps"]
kinds: ["Deployment"]
- apiGroups: [""]
kinds: ["Pod"]
excludedNamespaces: ["kube-system", "gatekeeper-system", "kube-public"]
EOF
kubectl apply -f no-privileged-constraint.yaml
echo "\n=== 测试策略 ==="
# 创建违反策略的测试Pod
cat > test-violation.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
name: test-violation
namespace: default
spec:
containers:
- name: test
image: nginx:1.21
securityContext:
privileged: true
EOF
echo "尝试创建违反策略的Pod(应该被拒绝):"
kubectl apply -f test-violation.yaml || echo "Pod被策略拒绝(预期行为)"
# 创建符合策略的测试Pod
cat > test-compliant.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
name: test-compliant
namespace: default
labels:
app: test
version: v1.0.0
environment: dev
spec:
containers:
- name: test
image: nginx:1.21
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 100m
memory: 128Mi
securityContext:
privileged: false
runAsNonRoot: true
runAsUser: 1000
EOF
echo "创建符合策略的Pod:"
kubectl apply -f test-compliant.yaml
echo "\n=== 查看约束状态 ==="
kubectl get constraints
kubectl describe k8srequiredlabels must-have-labels
echo "\n=== OPA Gatekeeper策略引擎部署完成 ==="
13.6 服务网格
13.6.1 Istio服务网格
#!/bin/bash
# Istio服务网格部署脚本
echo "=== 部署Istio服务网格 ==="
# 下载Istio
ISTIO_VERSION=1.19.0
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=$ISTIO_VERSION sh -
cd istio-$ISTIO_VERSION
export PATH=$PWD/bin:$PATH
# 安装Istio
istioctl install --set values.defaultRevision=default -y
# 启用自动注入
kubectl label namespace default istio-injection=enabled
# 安装Istio插件
kubectl apply -f samples/addons/
# 等待组件就绪
kubectl wait --for=condition=available --timeout=300s deployment/istiod -n istio-system
kubectl wait --for=condition=available --timeout=300s deployment/kiali -n istio-system
echo "\n=== 部署示例应用 ==="
# 部署Bookinfo示例应用
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
# 等待应用就绪
kubectl wait --for=condition=available --timeout=300s deployment/productpage-v1
kubectl wait --for=condition=available --timeout=300s deployment/details-v1
kubectl wait --for=condition=available --timeout=300s deployment/ratings-v1
kubectl wait --for=condition=available --timeout=300s deployment/reviews-v1
kubectl wait --for=condition=available --timeout=300s deployment/reviews-v2
kubectl wait --for=condition=available --timeout=300s deployment/reviews-v3
# 创建Gateway
kubectl apply -f samples/bookinfo/networking/bookinfo-gateway.yaml
echo "\n=== 配置流量管理 ==="
# 创建DestinationRule
cat > destination-rules.yaml << 'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: productpage
spec:
host: productpage
trafficPolicy:
tls:
mode: ISTIO_MUTUAL
subsets:
- name: v1
labels:
version: v1
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: reviews
spec:
host: reviews
trafficPolicy:
tls:
mode: ISTIO_MUTUAL
subsets:
- name: v1
labels:
version: v1
- name: v2
labels:
version: v2
- name: v3
labels:
version: v3
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: ratings
spec:
host: ratings
trafficPolicy:
tls:
mode: ISTIO_MUTUAL
subsets:
- name: v1
labels:
version: v1
- name: v2
labels:
version: v2
- name: v2-mysql
labels:
version: v2-mysql
- name: v2-mysql-vm
labels:
version: v2-mysql-vm
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: details
spec:
host: details
trafficPolicy:
tls:
mode: ISTIO_MUTUAL
subsets:
- name: v1
labels:
version: v1
- name: v2
labels:
version: v2
EOF
kubectl apply -f destination-rules.yaml
# 创建VirtualService进行流量分割
cat > virtual-services.yaml << 'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: reviews
spec:
  hosts:
    - reviews
  http:
- match:
- headers:
end-user:
exact: jason
route:
- destination:
host: reviews
subset: v2
- route:
- destination:
host: reviews
subset: v1
weight: 50
- destination:
host: reviews
subset: v3
weight: 50
EOF
kubectl apply -f virtual-services.yaml
echo "\n=== 配置安全策略 ==="
# 创建PeerAuthentication
cat > peer-authentication.yaml << 'EOF'
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: default
namespace: default
spec:
mtls:
mode: STRICT
EOF
kubectl apply -f peer-authentication.yaml
# 创建AuthorizationPolicy
cat > authorization-policy.yaml << 'EOF'
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: productpage-viewer
namespace: default
spec:
selector:
matchLabels:
app: productpage
action: ALLOW
rules:
- from:
- source:
principals: ["cluster.local/ns/default/sa/bookinfo-productpage"]
- to:
- operation:
methods: ["GET"]
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: reviews-viewer
namespace: default
spec:
selector:
matchLabels:
app: reviews
action: ALLOW
rules:
- from:
- source:
principals: ["cluster.local/ns/default/sa/bookinfo-reviews"]
- to:
- operation:
methods: ["GET"]
EOF
kubectl apply -f authorization-policy.yaml
echo "\n=== 访问信息 ==="
echo "获取Ingress Gateway地址:"
kubectl get svc istio-ingressgateway -n istio-system
echo "\n访问应用:"
echo "Bookinfo: http://<GATEWAY_IP>/productpage"
echo "Kiali: kubectl port-forward svc/kiali 20001:20001 -n istio-system"
echo "Grafana: kubectl port-forward svc/grafana 3000:3000 -n istio-system"
echo "Jaeger: kubectl port-forward svc/tracing 16686:80 -n istio-system"
echo "\n=== Istio服务网格部署完成 ==="
13.6.2 Linkerd轻量级服务网格
#!/bin/bash
# Linkerd轻量级服务网格部署脚本
echo "=== 部署Linkerd轻量级服务网格 ==="
# 下载Linkerd CLI
curl -sL https://run.linkerd.io/install | sh
export PATH=$PATH:$HOME/.linkerd2/bin
# 验证集群
linkerd check --pre
# 安装Linkerd控制平面
linkerd install | kubectl apply -f -
# 等待控制平面就绪
linkerd check
# 安装可视化组件
linkerd viz install | kubectl apply -f -
# 等待可视化组件就绪
linkerd check
echo "\n=== 部署示例应用 ==="
# 创建示例应用
cat > linkerd-demo.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: web
labels:
app: web
spec:
replicas: 1
selector:
matchLabels:
app: web
template:
metadata:
labels:
app: web
spec:
containers:
- name: web
image: buoyantio/bb:v0.0.6
args:
- terminus
- "--h1-server-port=8080"
- "--grpc-server-port=9090"
ports:
- containerPort: 8080
- containerPort: 9090
env:
- name: TERMINUS_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
---
apiVersion: v1
kind: Service
metadata:
name: web-svc
spec:
type: ClusterIP
selector:
app: web
ports:
- name: http
port: 8080
targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: authors
labels:
app: authors
spec:
replicas: 1
selector:
matchLabels:
app: authors
template:
metadata:
labels:
app: authors
spec:
containers:
- name: authors
image: buoyantio/bb:v0.0.6
args:
- terminus
- "--h1-server-port=7001"
- "--grpc-server-port=7002"
ports:
- containerPort: 7001
- containerPort: 7002
env:
- name: TERMINUS_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
---
apiVersion: v1
kind: Service
metadata:
name: authors-svc
spec:
type: ClusterIP
selector:
app: authors
ports:
- name: http
port: 7001
targetPort: 7001
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: books
labels:
app: books
spec:
replicas: 1
selector:
matchLabels:
app: books
template:
metadata:
labels:
app: books
spec:
containers:
- name: books
image: buoyantio/bb:v0.0.6
args:
- terminus
- "--h1-server-port=7000"
- "--grpc-server-port=7002"
- "--fire-and-forget"
ports:
- containerPort: 7000
- containerPort: 7002
env:
- name: TERMINUS_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
---
apiVersion: v1
kind: Service
metadata:
name: books-svc
spec:
type: ClusterIP
selector:
app: books
ports:
- name: http
port: 7000
targetPort: 7000
EOF
kubectl apply -f linkerd-demo.yaml
# 注入Linkerd代理
kubectl get deploy -o yaml | linkerd inject - | kubectl apply -f -
# 等待应用就绪
kubectl wait --for=condition=available --timeout=300s deployment/web
kubectl wait --for=condition=available --timeout=300s deployment/authors
kubectl wait --for=condition=available --timeout=300s deployment/books
echo "\n=== 配置流量策略 ==="
# 创建TrafficSplit
cat > traffic-split.yaml << 'EOF'
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
name: authors-split
spec:
service: authors-svc
backends:
- service: authors-svc
weight: 100
EOF
kubectl apply -f traffic-split.yaml
echo "\n=== 访问信息 ==="
echo "Linkerd Dashboard: linkerd viz dashboard"
echo "或者: kubectl port-forward svc/web-svc 8080:8080"
echo "\n查看服务网格状态:"
linkerd viz stat deploy
linkerd viz top deploy
linkerd viz routes deploy
echo "\n=== Linkerd轻量级服务网格部署完成 ==="
13.7 多集群管理
13.7.1 Rancher多集群管理平台
#!/bin/bash
# Rancher多集群管理平台部署脚本
echo "=== 部署Rancher多集群管理平台 ==="
# 创建命名空间
kubectl create namespace cattle-system
# 添加Rancher Helm仓库
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo update
# 安装cert-manager
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v1.13.0/cert-manager.crds.yaml
kubectl create namespace cert-manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--version v1.13.0
# 等待cert-manager就绪
kubectl wait --for=condition=available --timeout=300s deployment/cert-manager -n cert-manager
kubectl wait --for=condition=available --timeout=300s deployment/cert-manager-cainjector -n cert-manager
kubectl wait --for=condition=available --timeout=300s deployment/cert-manager-webhook -n cert-manager
# 安装Rancher
helm install rancher rancher-latest/rancher \
--namespace cattle-system \
--set hostname=rancher.local \
--set bootstrapPassword=admin123456 \
--set ingress.tls.source=rancher \
--set replicas=1
# 等待Rancher就绪
kubectl wait --for=condition=available --timeout=600s deployment/rancher -n cattle-system
echo "\n=== 配置Ingress ==="
# 创建Ingress
cat > rancher-ingress.yaml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: rancher
namespace: cattle-system
annotations:
nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
spec:
tls:
- hosts:
- rancher.local
secretName: tls-rancher-ingress
rules:
- host: rancher.local
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: rancher
port:
number: 80
EOF
kubectl apply -f rancher-ingress.yaml
echo "\n=== 集群管理脚本 ==="
# 创建集群管理脚本
cat > cluster-management.sh << 'EOF'
#!/bin/bash
# Rancher集群管理脚本
# 获取Rancher API Token
get_rancher_token() {
local username="admin"
local password="admin123456"
local rancher_url="https://rancher.local"
# 登录获取token
local login_response=$(curl -s -k -X POST \
"${rancher_url}/v3-public/localProviders/local?action=login" \
-H 'content-type: application/json' \
-d '{"username":"'${username}'","password":"'${password}'"}')
local token=$(echo $login_response | jq -r .token)
echo $token
}
# 列出所有集群
list_clusters() {
local token=$(get_rancher_token)
local rancher_url="https://rancher.local"
curl -s -k -H "Authorization: Bearer ${token}" \
"${rancher_url}/v3/clusters" | jq -r '.data[] | "\(.id): \(.name) (\(.state))"'
}
# 创建集群
create_cluster() {
local cluster_name=$1
local token=$(get_rancher_token)
local rancher_url="https://rancher.local"
curl -s -k -X POST \
-H "Authorization: Bearer ${token}" \
-H "Content-Type: application/json" \
"${rancher_url}/v3/clusters" \
-d '{
"type": "cluster",
"name": "'${cluster_name}'",
"description": "Cluster created via API",
"rancherKubernetesEngineConfig": {
"kubernetesVersion": "v1.28.2-rancher1-1",
"ignoreDockerVersion": false
}
}'
}
# 删除集群
delete_cluster() {
local cluster_id=$1
local token=$(get_rancher_token)
local rancher_url="https://rancher.local"
curl -s -k -X DELETE \
-H "Authorization: Bearer ${token}" \
"${rancher_url}/v3/clusters/${cluster_id}"
}
# 获取集群状态
get_cluster_status() {
local cluster_id=$1
local token=$(get_rancher_token)
local rancher_url="https://rancher.local"
curl -s -k -H "Authorization: Bearer ${token}" \
"${rancher_url}/v3/clusters/${cluster_id}" | jq -r '.state'
}
# 主函数
case "$1" in
list)
echo "=== 集群列表 ==="
list_clusters
;;
create)
if [ -z "$2" ]; then
echo "用法: $0 create <cluster_name>"
exit 1
fi
echo "=== 创建集群: $2 ==="
create_cluster "$2"
;;
delete)
if [ -z "$2" ]; then
echo "用法: $0 delete <cluster_id>"
exit 1
fi
echo "=== 删除集群: $2 ==="
delete_cluster "$2"
;;
status)
if [ -z "$2" ]; then
echo "用法: $0 status <cluster_id>"
exit 1
fi
echo "=== 集群状态: $2 ==="
get_cluster_status "$2"
;;
*)
echo "用法: $0 {list|create|delete|status} [参数]"
echo " list - 列出所有集群"
echo " create <cluster_name> - 创建新集群"
echo " delete <cluster_id> - 删除集群"
echo " status <cluster_id> - 获取集群状态"
exit 1
;;
esac
EOF
chmod +x cluster-management.sh
echo "\n=== 访问信息 ==="
echo "Rancher URL: https://rancher.local"
echo "用户名: admin"
echo "密码: admin123456"
echo "\n请将 rancher.local 添加到 /etc/hosts 文件中"
echo "集群管理: ./cluster-management.sh list"
echo "\n=== Rancher多集群管理平台部署完成 ==="
13.7.2 Admiral多集群服务网格
#!/bin/bash
# Admiral多集群服务网格部署脚本
echo "=== 部署Admiral多集群服务网格 ==="
# 创建命名空间
kubectl create namespace admiral-system
# 添加Admiral Helm仓库
helm repo add admiral https://istio-ecosystem.github.io/admiral
helm repo update
# 安装Admiral
helm install admiral admiral/admiral \
--namespace admiral-system \
--set admiral.image.tag=v1.7.0 \
--set admiral.config.argoRollouts.enabled=true \
--set admiral.config.profile=default
# 等待Admiral就绪
kubectl wait --for=condition=available --timeout=300s deployment/admiral -n admiral-system
echo "\n=== 配置多集群 ==="
# 创建集群配置
cat > cluster-config.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
name: admiral-config
namespace: admiral-system
data:
config.yaml: |
clusters:
cluster1:
endpoint: https://cluster1.example.com
locality: region1/zone1
network: network1
secret: cluster1-secret
cluster2:
endpoint: https://cluster2.example.com
locality: region2/zone1
network: network2
secret: cluster2-secret
syncNamespace: admiral-sync
cacheRefreshDuration: 5m
clusterRegistriesNamespace: admiral-system
dependencyNamespace: admiral-system
globalTrafficPolicy:
policy:
- dns: greeting.global
match:
- sourceCluster: cluster1
- sourceCluster: cluster2
target:
- region: region1
weight: 50
- region: region2
weight: 50
EOF
kubectl apply -f cluster-config.yaml
# 创建服务依赖配置
cat > service-dependency.yaml << 'EOF'
apiVersion: admiral.io/v1alpha1
kind: Dependency
metadata:
name: greeting-dependency
namespace: admiral-system
spec:
source: greeting
destinations:
- greeting
- user-service
- notification-service
EOF
kubectl apply -f service-dependency.yaml
# 创建全局流量策略
cat > global-traffic-policy.yaml << 'EOF'
apiVersion: admiral.io/v1alpha1
kind: GlobalTrafficPolicy
metadata:
name: greeting-gtp
namespace: admiral-system
spec:
policy:
- dns: greeting.global
match:
- sourceCluster: cluster1
- sourceCluster: cluster2
target:
- region: region1
weight: 70
- region: region2
weight: 30
outlierDetection:
consecutiveErrors: 3
interval: 30s
baseEjectionTime: 30s
EOF
kubectl apply -f global-traffic-policy.yaml
echo "\n=== 多集群监控脚本 ==="
# 创建多集群监控脚本
cat > multi-cluster-monitor.sh << 'EOF'
#!/bin/bash
# Admiral多集群监控脚本
# 检查Admiral状态
check_admiral_status() {
echo "=== Admiral组件状态 ==="
kubectl get pods -n admiral-system
kubectl get svc -n admiral-system
echo "\n=== Admiral配置 ==="
kubectl get configmap admiral-config -n admiral-system -o yaml
}
# 检查服务依赖
check_dependencies() {
echo "=== 服务依赖 ==="
kubectl get dependencies -n admiral-system
kubectl describe dependencies -n admiral-system
}
# 检查全局流量策略
check_global_traffic_policies() {
echo "=== 全局流量策略 ==="
kubectl get globaltrafficpolicies -n admiral-system
kubectl describe globaltrafficpolicies -n admiral-system
}
# 检查跨集群服务发现
check_service_discovery() {
echo "=== 跨集群服务发现 ==="
kubectl get serviceentries -A
kubectl get destinationrules -A
kubectl get virtualservices -A
}
# 检查网络连通性
check_network_connectivity() {
echo "=== 网络连通性检查 ==="
# 检查Istio网关
kubectl get gateways -A
# 检查服务端点
kubectl get endpoints -A | grep -E "(greeting|user-service|notification-service)"
# 检查DNS解析
kubectl run test-dns --image=busybox --rm -it --restart=Never -- nslookup greeting.global
}
# 生成监控报告
generate_report() {
local report_file="admiral-report-$(date +%Y%m%d-%H%M%S).txt"
echo "=== Admiral多集群监控报告 ===" > $report_file
echo "生成时间: $(date)" >> $report_file
echo "" >> $report_file
echo "Admiral组件状态:" >> $report_file
kubectl get pods -n admiral-system >> $report_file 2>&1
echo "" >> $report_file
echo "服务依赖:" >> $report_file
kubectl get dependencies -n admiral-system >> $report_file 2>&1
echo "" >> $report_file
echo "全局流量策略:" >> $report_file
kubectl get globaltrafficpolicies -n admiral-system >> $report_file 2>&1
echo "" >> $report_file
echo "跨集群服务:" >> $report_file
kubectl get serviceentries -A >> $report_file 2>&1
echo "" >> $report_file
echo "报告已生成: $report_file"
}
# 主函数
case "$1" in
status)
check_admiral_status
;;
dependencies)
check_dependencies
;;
policies)
check_global_traffic_policies
;;
discovery)
check_service_discovery
;;
network)
check_network_connectivity
;;
report)
generate_report
;;
all)
check_admiral_status
check_dependencies
check_global_traffic_policies
check_service_discovery
check_network_connectivity
;;
*)
echo "用法: $0 {status|dependencies|policies|discovery|network|report|all}"
echo " status - 检查Admiral组件状态"
echo " dependencies - 检查服务依赖"
echo " policies - 检查全局流量策略"
echo " discovery - 检查跨集群服务发现"
echo " network - 检查网络连通性"
echo " report - 生成监控报告"
echo " all - 执行所有检查"
exit 1
;;
esac
EOF
chmod +x multi-cluster-monitor.sh
echo "\n=== 访问信息 ==="
echo "Admiral Dashboard: kubectl port-forward svc/admiral 8080:8080 -n admiral-system"
echo "多集群监控: ./multi-cluster-monitor.sh all"
echo "\n=== Admiral多集群服务网格部署完成 ==="
13.8 成本管理和优化
13.8.1 KubeCost成本分析
#!/bin/bash
# KubeCost成本分析部署脚本
echo "=== 部署KubeCost成本分析 ==="
# 创建命名空间
kubectl create namespace kubecost
# 添加KubeCost Helm仓库
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update
# 安装KubeCost
helm install kubecost kubecost/cost-analyzer \
--namespace kubecost \
--set kubecostToken="your-kubecost-token" \
--set prometheus.server.persistentVolume.size=10Gi \
--set prometheus.alertmanager.persistentVolume.size=2Gi
# 等待KubeCost就绪
kubectl wait --for=condition=available --timeout=300s deployment/kubecost-cost-analyzer -n kubecost
kubectl wait --for=condition=available --timeout=300s deployment/kubecost-prometheus-server -n kubecost
echo "\n=== 配置成本分析 ==="
# 创建成本分析配置
cat > cost-analysis-config.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
name: cost-analysis-config
namespace: kubecost
data:
config.yaml: |
# 云提供商配置
cloudProvider:
name: "aws" # aws, gcp, azure
region: "us-west-2"
# 定价配置
pricing:
cpu: 0.031611 # 每小时每核心价格
memory: 0.004237 # 每小时每GB价格
storage: 0.00014 # 每小时每GB价格
# 折扣配置
discounts:
cpu: 0.30 # CPU折扣30%
memory: 0.30 # 内存折扣30%
storage: 0.10 # 存储折扣10%
# 分配策略
allocation:
idleByNode: false
shareIdle: false
shareNamespaces: ["kube-system", "kubecost"]
EOF
kubectl apply -f cost-analysis-config.yaml
# 创建成本报告脚本
cat > cost-report.sh << 'EOF'
#!/bin/bash
# KubeCost成本报告脚本
KUBECOST_URL="http://localhost:9090"
# 获取成本数据
get_cost_data() {
local window="$1" # 时间窗口: 1d, 7d, 30d
local aggregate="$2" # 聚合维度: namespace, deployment, service
curl -s "${KUBECOST_URL}/model/allocation" \
-d "window=${window}" \
-d "aggregate=${aggregate}" \
-d "accumulate=false" \
-d "shareIdle=false"
}
# 生成命名空间成本报告
generate_namespace_report() {
local window="${1:-7d}"
echo "=== 命名空间成本报告 (${window}) ==="
echo "时间: $(date)"
echo ""
local data=$(get_cost_data "$window" "namespace")
echo "$data" | jq -r '
.data[] |
select(.totalCost > 0) |
"\(.name): $\(.totalCost | tonumber | . * 100 | round / 100) (CPU: $\(.cpuCost | tonumber | . * 100 | round / 100), Memory: $\(.ramCost | tonumber | . * 100 | round / 100), Storage: $\(.pvCost | tonumber | . * 100 | round / 100))"'
}
# 生成应用成本报告
generate_app_report() {
local window="${1:-7d}"
echo "=== 应用成本报告 (${window}) ==="
echo "时间: $(date)"
echo ""
local data=$(get_cost_data "$window" "deployment")
echo "$data" | jq -r '
.data[] |
select(.totalCost > 0) |
"\(.name): $\(.totalCost | tonumber | . * 100 | round / 100) (效率: \(.efficiency | tonumber | . * 100 | round)%)"'
}
# 生成成本优化建议
generate_optimization_suggestions() {
echo "=== 成本优化建议 ==="
echo "时间: $(date)"
echo ""
# 获取资源利用率数据
    local utilization_data=$(curl -s -G "${KUBECOST_URL}/model/allocation" \
-d "window=7d" \
-d "aggregate=deployment" \
-d "accumulate=false")
echo "低效率应用 (CPU利用率 < 50%):"
echo "$utilization_data" | jq -r '
.data[] |
select(.cpuEfficiency < 0.5 and .totalCost > 1) |
"- \(.name): CPU效率 \(.cpuEfficiency | tonumber | . * 100 | round)%, 成本 $\(.totalCost | tonumber | . * 100 | round / 100)"'
echo ""
echo "内存过度分配应用 (内存利用率 < 30%):"
echo "$utilization_data" | jq -r '
.data[] |
select(.ramEfficiency < 0.3 and .totalCost > 1) |
"- \(.name): 内存效率 \(.ramEfficiency | tonumber | . * 100 | round)%, 成本 $\(.totalCost | tonumber | . * 100 | round / 100)"'
echo ""
echo "建议操作:"
echo "1. 调整低效率应用的资源请求和限制"
echo "2. 考虑使用HPA进行自动扩缩容"
echo "3. 评估是否可以合并小型应用"
echo "4. 使用Spot实例降低成本"
}
# 生成完整成本报告
generate_full_report() {
local window="${1:-7d}"
local report_file="cost-report-$(date +%Y%m%d-%H%M%S).txt"
{
generate_namespace_report "$window"
echo ""
generate_app_report "$window"
echo ""
generate_optimization_suggestions
} > "$report_file"
echo "完整成本报告已生成: $report_file"
}
# 启动端口转发
start_port_forward() {
echo "启动KubeCost端口转发..."
kubectl port-forward svc/kubecost-cost-analyzer 9090:9090 -n kubecost &
    pf_pid=$!
    echo "端口转发PID: $pf_pid"
    sleep 5
}
# 停止端口转发
stop_port_forward() {
local pf_pid=$1
if [ ! -z "$pf_pid" ]; then
kill $pf_pid 2>/dev/null
echo "端口转发已停止"
fi
}
# 主函数
case "$1" in
namespace)
start_port_forward
pf_pid=$!
generate_namespace_report "${2:-7d}"
stop_port_forward $pf_pid
;;
app)
start_port_forward
pf_pid=$!
generate_app_report "${2:-7d}"
stop_port_forward $pf_pid
;;
optimize)
start_port_forward
pf_pid=$!
generate_optimization_suggestions
stop_port_forward $pf_pid
;;
report)
start_port_forward
pf_pid=$!
generate_full_report "${2:-7d}"
stop_port_forward $pf_pid
;;
*)
echo "用法: $0 {namespace|app|optimize|report} [时间窗口]"
echo " namespace [window] - 生成命名空间成本报告"
echo " app [window] - 生成应用成本报告"
echo " optimize - 生成成本优化建议"
echo " report [window] - 生成完整成本报告"
echo ""
echo "时间窗口选项: 1d, 7d, 30d (默认: 7d)"
exit 1
;;
esac
EOF
chmod +x cost-report.sh
echo "\n=== 访问信息 ==="
echo "KubeCost Dashboard: kubectl port-forward svc/kubecost-cost-analyzer 9090:9090 -n kubecost"
echo "访问地址: http://localhost:9090"
echo "成本报告: ./cost-report.sh report"
echo "\n=== KubeCost成本分析部署完成 ==="
13.8.2 资源优化脚本
#!/bin/bash
# Kubernetes资源优化脚本
echo "=== Kubernetes资源优化分析 ==="
# 分析未使用的资源
analyze_unused_resources() {
echo "=== 未使用资源分析 ==="
echo "未使用的ConfigMaps:"
kubectl get configmaps --all-namespaces -o json | jq -r '
.items[] |
select(.metadata.name != "kube-root-ca.crt") |
"\(.metadata.namespace)/\(.metadata.name)"' |
while read cm; do
namespace=$(echo $cm | cut -d'/' -f1)
name=$(echo $cm | cut -d'/' -f2)
# 检查是否被Pod使用
used=$(kubectl get pods -n $namespace -o json | jq -r --arg cm "$name" '
.items[] |
select(
(.spec.volumes[]?.configMap.name == $cm) or
(.spec.containers[]?.env[]?.valueFrom.configMapKeyRef.name == $cm) or
(.spec.containers[]?.envFrom[]?.configMapRef.name == $cm)
) | .metadata.name')
if [ -z "$used" ]; then
echo " - $cm (未使用)"
fi
done
echo "\n未使用的Secrets:"
kubectl get secrets --all-namespaces -o json | jq -r '
.items[] |
select(.type != "kubernetes.io/service-account-token") |
select(.metadata.name | startswith("default-token-") | not) |
"\(.metadata.namespace)/\(.metadata.name)"' |
while read secret; do
namespace=$(echo $secret | cut -d'/' -f1)
name=$(echo $secret | cut -d'/' -f2)
# 检查是否被Pod使用
used=$(kubectl get pods -n $namespace -o json | jq -r --arg secret "$name" '
.items[] |
select(
(.spec.volumes[]?.secret.secretName == $secret) or
(.spec.containers[]?.env[]?.valueFrom.secretKeyRef.name == $secret) or
(.spec.containers[]?.envFrom[]?.secretRef.name == $secret) or
(.spec.imagePullSecrets[]?.name == $secret)
) | .metadata.name')
if [ -z "$used" ]; then
echo " - $secret (未使用)"
fi
done
echo "\n未使用的PersistentVolumes:"
kubectl get pv -o json | jq -r '
.items[] |
select(.status.phase == "Available") |
"\(.metadata.name) (\(.spec.capacity.storage))"'
}
# 分析资源请求和限制
analyze_resource_requests() {
echo "=== 资源请求和限制分析 ==="
echo "没有资源请求的Pod:"
kubectl get pods --all-namespaces -o json | jq -r '
    .items[] |
    select(any(.spec.containers[];
        (.resources.requests.cpu // .resources.requests.memory) == null)) |
    "\(.metadata.namespace)/\(.metadata.name)"'
echo "\n没有资源限制的Pod:"
kubectl get pods --all-namespaces -o json | jq -r '
    .items[] |
    select(any(.spec.containers[];
        (.resources.limits.cpu // .resources.limits.memory) == null)) |
    "\(.metadata.namespace)/\(.metadata.name)"'
echo "\n资源请求接近限制的Pod (CPU请求 > 限制的80%):"
# 将CPU统一换算为millicore后再比较, 避免"1"与"500m"直接按数值比较的单位错误
kubectl get pods --all-namespaces -o json | jq -r '
    def millicores: if test("m$") then rtrimstr("m") | tonumber else tonumber * 1000 end;
    .items[] |
    .metadata.namespace as $ns | .metadata.name as $pod |
    .spec.containers[] |
    select(
        (.resources.requests.cpu and .resources.limits.cpu) and
        ((.resources.requests.cpu | millicores) > (.resources.limits.cpu | millicores) * 0.8)
    ) |
    "\($ns)/\($pod)/\(.name): CPU请求 \(.resources.requests.cpu), 限制 \(.resources.limits.cpu)"'
}
# 分析节点资源利用率
analyze_node_utilization() {
echo "=== 节点资源利用率分析 ==="
kubectl top nodes 2>/dev/null || echo "需要安装metrics-server"
echo "\n节点容量和分配:"
# 逐节点解析"Allocated resources"段 (原先的grep -A 5取不到对应的Name行)
kubectl describe nodes | awk '
    /^Name:/ {node=$2}
    /^Allocated resources:/ {alloc=1}
    alloc && $1 == "cpu" {cpu=$2" "$3}
    alloc && $1 == "memory" {print node": CPU "cpu", Memory "$2" "$3; alloc=0}'
}
# 生成优化建议
generate_optimization_recommendations() {
echo "=== 优化建议 ==="
echo "1. 资源清理建议:"
echo " - 删除未使用的ConfigMaps和Secrets"
echo " - 回收未使用的PersistentVolumes"
echo " - 清理已完成的Jobs和失败的Pods"
echo "\n2. 资源配置建议:"
echo " - 为所有Pod设置资源请求和限制"
echo " - 使用VPA (Vertical Pod Autoscaler) 自动调整资源"
echo " - 实施HPA (Horizontal Pod Autoscaler) 进行水平扩缩容"
echo "\n3. 成本优化建议:"
echo " - 使用Spot实例降低计算成本"
echo " - 实施集群自动扩缩容"
echo " - 优化镜像大小减少存储和传输成本"
echo "\n4. 性能优化建议:"
echo " - 使用节点亲和性优化Pod调度"
echo " - 实施资源配额防止资源争用"
echo " - 使用PodDisruptionBudgets确保高可用性"
}
# 生成清理脚本
generate_cleanup_script() {
echo "=== 生成资源清理脚本 ==="
cat > cleanup-resources.sh << 'EOF'
#!/bin/bash
# Kubernetes资源清理脚本
echo "=== 开始资源清理 ==="
# 清理已完成的Jobs
echo "清理已完成的Jobs..."
kubectl get jobs --all-namespaces --field-selector status.successful=1 -o json | \
jq -r '.items[] | "\(.metadata.namespace) \(.metadata.name)"' | \
while read namespace name; do
echo "删除Job: $namespace/$name"
kubectl delete job "$name" -n "$namespace"
done
# 清理失败的Pods
echo "\n清理失败的Pods..."
kubectl get pods --all-namespaces --field-selector status.phase=Failed -o json | \
jq -r '.items[] | "\(.metadata.namespace) \(.metadata.name)"' | \
while read namespace name; do
echo "删除Pod: $namespace/$name"
kubectl delete pod "$name" -n "$namespace"
done
# 清理已完成的Pods
echo "\n清理已完成的Pods..."
kubectl get pods --all-namespaces --field-selector status.phase=Succeeded -o json | \
jq -r '.items[] | "\(.metadata.namespace) \(.metadata.name)"' | \
while read namespace name; do
echo "删除Pod: $namespace/$name"
kubectl delete pod "$name" -n "$namespace"
done
# 清理Evicted Pods
echo "\n清理Evicted Pods..."
kubectl get pods --all-namespaces | grep Evicted | \
awk '{print $1" "$2}' | \
while read namespace name; do
echo "删除Evicted Pod: $namespace/$name"
kubectl delete pod "$name" -n "$namespace"
done
echo "\n=== 资源清理完成 ==="
EOF
chmod +x cleanup-resources.sh
echo "清理脚本已生成: cleanup-resources.sh"
}
# 主函数
case "$1" in
unused)
analyze_unused_resources
;;
requests)
analyze_resource_requests
;;
nodes)
analyze_node_utilization
;;
recommendations)
generate_optimization_recommendations
;;
cleanup)
generate_cleanup_script
;;
all)
analyze_unused_resources
echo ""
analyze_resource_requests
echo ""
analyze_node_utilization
echo ""
generate_optimization_recommendations
echo ""
generate_cleanup_script
;;
*)
echo "用法: $0 {unused|requests|nodes|recommendations|cleanup|all}"
echo " unused - 分析未使用的资源"
echo " requests - 分析资源请求和限制"
echo " nodes - 分析节点资源利用率"
echo " recommendations - 生成优化建议"
echo " cleanup - 生成清理脚本"
echo " all - 执行所有分析"
exit 1
;;
esac
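上面的优化建议中提到了使用VPA自动调整资源请求。下面给出一个最小的VPA示例(假设集群已安装VPA组件; 目标Deployment名称myapp仅为示例), 将updateMode设为Off时只输出建议值, 不会自动修改Pod:
#!/bin/bash
# 创建并应用一个仅输出建议的VPA (示例)
cat > myapp-vpa.yaml << 'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Off"   # 仅给出资源建议, 不自动更新Pod
EOF
kubectl apply -f myapp-vpa.yaml
# 查看推荐的CPU/内存请求值
kubectl describe vpa myapp-vpa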
13.9 生态系统最佳实践
13.9.1 工具选择指南
# 工具选择决策矩阵
apiVersion: v1
kind: ConfigMap
metadata:
name: tool-selection-guide
namespace: kube-system
data:
selection-matrix.yaml: |
# Kubernetes生态系统工具选择指南
# 包管理工具
package_management:
helm:
use_cases: ["复杂应用部署", "模板化配置", "版本管理"]
pros: ["成熟生态", "丰富Chart库", "版本回滚"]
cons: ["学习曲线", "模板复杂性"]
recommendation: "推荐用于生产环境"
kustomize:
use_cases: ["配置管理", "环境差异化", "GitOps"]
pros: ["原生支持", "声明式", "无模板"]
cons: ["功能相对简单", "复杂场景限制"]
recommendation: "推荐用于简单到中等复杂度场景"
# CI/CD工具
cicd:
tekton:
use_cases: ["云原生CI/CD", "Kubernetes集成", "事件驱动"]
pros: ["Kubernetes原生", "可扩展性", "标准化"]
cons: ["相对新颖", "学习成本"]
recommendation: "推荐用于Kubernetes环境"
argocd:
use_cases: ["GitOps", "持续部署", "多集群管理"]
pros: ["GitOps最佳实践", "可视化界面", "多集群支持"]
cons: ["主要专注CD", "Git依赖"]
recommendation: "推荐用于GitOps实践"
jenkins:
use_cases: ["传统CI/CD", "复杂流水线", "插件生态"]
pros: ["成熟稳定", "丰富插件", "灵活配置"]
cons: ["资源消耗", "维护复杂"]
recommendation: "适用于传统环境迁移"
# 监控工具
monitoring:
prometheus:
use_cases: ["指标监控", "告警", "时序数据"]
pros: ["行业标准", "丰富生态", "高性能"]
cons: ["存储限制", "配置复杂"]
recommendation: "监控系统首选"
grafana:
use_cases: ["可视化", "仪表盘", "多数据源"]
pros: ["强大可视化", "多数据源", "丰富模板"]
cons: ["主要用于展示"]
recommendation: "可视化首选"
jaeger:
use_cases: ["分布式追踪", "性能分析", "调用链"]
pros: ["兼容OpenTelemetry/OpenTracing", "详细追踪", "性能分析"]
cons: ["存储开销", "复杂度"]
recommendation: "微服务追踪首选"
# 安全工具
security:
falco:
use_cases: ["运行时安全", "异常检测", "合规监控"]
pros: ["实时监控", "规则灵活", "CNCF项目"]
cons: ["性能影响", "规则复杂"]
recommendation: "运行时安全首选"
opa_gatekeeper:
use_cases: ["策略管理", "准入控制", "合规检查"]
pros: ["策略即代码", "灵活规则", "标准化"]
cons: ["学习曲线", "调试困难"]
recommendation: "策略管理首选"
# 服务网格
service_mesh:
istio:
use_cases: ["复杂微服务", "高级流量管理", "安全策略"]
pros: ["功能丰富", "成熟稳定", "强大生态"]
cons: ["复杂度高", "资源消耗"]
recommendation: "复杂微服务环境首选"
linkerd:
use_cases: ["轻量级服务网格", "简单场景", "性能优先"]
pros: ["轻量级", "易用性", "性能好"]
cons: ["功能相对简单"]
recommendation: "简单微服务环境首选"
# 多集群管理
multi_cluster:
rancher:
use_cases: ["多集群管理", "统一界面", "企业级"]
pros: ["统一管理", "用户友好", "企业功能"]
cons: ["额外复杂性", "供应商锁定"]
recommendation: "企业多集群环境首选"
admiral:
use_cases: ["多集群服务网格", "跨集群通信", "流量管理"]
pros: ["服务网格集成", "跨集群服务发现"]
cons: ["相对新颖", "Istio依赖"]
recommendation: "多集群服务网格场景"
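上面的决策矩阵可以直接存入集群, 便于团队随时查阅。以下是一个简单的应用与读取示例(文件名tool-selection-guide.yaml为假设, 即保存上述ConfigMap清单的文件):
#!/bin/bash
# 应用工具选择矩阵ConfigMap并读取其内容
kubectl apply -f tool-selection-guide.yaml
# data键名中的"."在jsonpath中需要用"\"转义
kubectl get configmap tool-selection-guide -n kube-system \
  -o jsonpath='{.data.selection-matrix\.yaml}'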
13.9.2 集成最佳实践
#!/bin/bash
# Kubernetes生态系统集成最佳实践脚本
echo "=== Kubernetes生态系统集成最佳实践 ==="
# 创建最佳实践检查清单
create_best_practices_checklist() {
cat > best-practices-checklist.md << 'EOF'
# Kubernetes生态系统集成最佳实践检查清单
## 1. 工具选择和规划
### 1.1 需求分析
- [ ] 明确业务需求和技术要求
- [ ] 评估团队技能和学习成本
- [ ] 考虑现有基础设施和工具
- [ ] 制定工具演进路线图
### 1.2 工具评估
- [ ] 对比多个候选工具
- [ ] 进行POC验证
- [ ] 评估社区活跃度和支持
- [ ] 考虑长期维护成本
## 2. 架构设计
### 2.1 整体架构
- [ ] 设计清晰的架构图
- [ ] 定义组件间的接口
- [ ] 考虑扩展性和可维护性
- [ ] 规划数据流和控制流
### 2.2 安全设计
- [ ] 实施最小权限原则
- [ ] 配置网络隔离
- [ ] 启用审计日志
- [ ] 实施密钥管理
## 3. 部署和配置
### 3.1 环境管理
- [ ] 使用Infrastructure as Code
- [ ] 实施环境一致性
- [ ] 配置环境隔离
- [ ] 建立环境升级流程
### 3.2 配置管理
- [ ] 使用ConfigMaps和Secrets
- [ ] 实施配置版本控制
- [ ] 配置环境差异化
- [ ] 建立配置审核流程
## 4. 监控和可观测性
### 4.1 监控策略
- [ ] 定义关键指标
- [ ] 设置合理告警
- [ ] 实施分层监控
- [ ] 建立监控仪表盘
### 4.2 日志管理
- [ ] 统一日志格式
- [ ] 集中日志收集
- [ ] 实施日志分析
- [ ] 配置日志保留策略
### 4.3 追踪和调试
- [ ] 实施分布式追踪
- [ ] 配置性能监控
- [ ] 建立调试工具链
- [ ] 实施错误追踪
## 5. 安全和合规
### 5.1 安全策略
- [ ] 实施Pod安全策略
- [ ] 配置网络策略
- [ ] 启用RBAC
- [ ] 实施镜像安全扫描
### 5.2 合规管理
- [ ] 定义合规要求
- [ ] 实施策略即代码
- [ ] 配置合规检查
- [ ] 建立审计流程
## 6. 运维和维护
### 6.1 自动化运维
- [ ] 实施GitOps
- [ ] 配置自动化部署
- [ ] 建立自动化测试
- [ ] 实施自动化回滚
### 6.2 容量管理
- [ ] 监控资源使用
- [ ] 实施自动扩缩容
- [ ] 配置资源配额
- [ ] 建立容量规划
### 6.3 故障处理
- [ ] 建立故障响应流程
- [ ] 配置自动故障恢复
- [ ] 实施混沌工程
- [ ] 建立事后分析机制
## 7. 团队和流程
### 7.1 团队建设
- [ ] 培训团队技能
- [ ] 建立知识分享
- [ ] 定义角色职责
- [ ] 实施轮岗机制
### 7.2 流程优化
- [ ] 建立开发流程
- [ ] 实施代码审查
- [ ] 配置质量门禁
- [ ] 建立发布流程
## 8. 成本优化
### 8.1 成本监控
- [ ] 实施成本分析
- [ ] 配置成本告警
- [ ] 建立成本报告
- [ ] 实施成本归因
### 8.2 资源优化
- [ ] 优化资源配置
- [ ] 实施资源回收
- [ ] 使用Spot实例
- [ ] 配置集群自动扩缩容
EOF
echo "最佳实践检查清单已创建: best-practices-checklist.md"
}
# 创建集成验证脚本
create_integration_validation() {
cat > validate-integration.sh << 'EOF'
#!/bin/bash
# Kubernetes生态系统集成验证脚本
echo "=== 开始集成验证 ==="
# 验证基础组件
validate_basic_components() {
echo "=== 验证基础组件 ==="
# 检查Kubernetes集群
echo "检查Kubernetes集群状态:"
kubectl cluster-info
kubectl get nodes
# 检查系统Pod
echo "\n检查系统Pod状态:"
kubectl get pods -n kube-system
# 检查存储类
echo "\n检查存储类:"
kubectl get storageclass
}
# 验证监控组件
validate_monitoring() {
echo "=== 验证监控组件 ==="
# 检查Prometheus
if kubectl get namespace monitoring &>/dev/null; then
echo "检查Prometheus:"
kubectl get pods -n monitoring | grep prometheus
echo "\n检查Grafana:"
kubectl get pods -n monitoring | grep grafana
echo "\n检查AlertManager:"
kubectl get pods -n monitoring | grep alertmanager
else
echo "监控命名空间不存在"
fi
}
# 验证日志组件
validate_logging() {
echo "=== 验证日志组件 ==="
# 检查日志收集
if kubectl get namespace logging &>/dev/null; then
echo "检查日志收集组件:"
kubectl get pods -n logging
else
echo "日志命名空间不存在"
fi
}
# 验证安全组件
validate_security() {
echo "=== 验证安全组件 ==="
# 检查RBAC
echo "检查RBAC配置:"
kubectl get clusterroles | head -10
kubectl get clusterrolebindings | head -10
# 检查网络策略
echo "\n检查网络策略:"
kubectl get networkpolicies --all-namespaces
# 检查Pod安全策略
echo "\n检查Pod安全策略:"
kubectl get podsecuritypolicies 2>/dev/null || echo "PSP未启用或已移除 (Kubernetes 1.25+ 请改用 Pod Security Admission)"
}
# 验证CI/CD组件
validate_cicd() {
echo "=== 验证CI/CD组件 ==="
# 检查ArgoCD
if kubectl get namespace argocd &>/dev/null; then
echo "检查ArgoCD:"
kubectl get pods -n argocd
fi
# 检查Tekton
if kubectl get namespace tekton-pipelines &>/dev/null; then
echo "\n检查Tekton:"
kubectl get pods -n tekton-pipelines
fi
}
# 验证服务网格
validate_service_mesh() {
echo "=== 验证服务网格 ==="
# 检查Istio
if kubectl get namespace istio-system &>/dev/null; then
echo "检查Istio:"
kubectl get pods -n istio-system
echo "\n检查Istio配置:"
kubectl get gateways --all-namespaces
kubectl get virtualservices --all-namespaces
fi
# 检查Linkerd
if kubectl get namespace linkerd &>/dev/null; then
echo "\n检查Linkerd:"
kubectl get pods -n linkerd
fi
}
# 生成验证报告
generate_validation_report() {
local report_file="integration-validation-$(date +%Y%m%d-%H%M%S).txt"
{
echo "=== Kubernetes生态系统集成验证报告 ==="
echo "生成时间: $(date)"
echo ""
validate_basic_components
echo ""
validate_monitoring
echo ""
validate_logging
echo ""
validate_security
echo ""
validate_cicd
echo ""
validate_service_mesh
} > "$report_file"
echo "验证报告已生成: $report_file"
}
# 主函数
case "$1" in
basic)
validate_basic_components
;;
monitoring)
validate_monitoring
;;
logging)
validate_logging
;;
security)
validate_security
;;
cicd)
validate_cicd
;;
mesh)
validate_service_mesh
;;
report)
generate_validation_report
;;
all)
validate_basic_components
validate_monitoring
validate_logging
validate_security
validate_cicd
validate_service_mesh
;;
*)
echo "用法: $0 {basic|monitoring|logging|security|cicd|mesh|report|all}"
exit 1
;;
esac
EOF
chmod +x validate-integration.sh
echo "集成验证脚本已创建: validate-integration.sh"
}
# 主函数
case "$1" in
checklist)
create_best_practices_checklist
;;
validation)
create_integration_validation
;;
all)
create_best_practices_checklist
create_integration_validation
;;
*)
echo "用法: $0 {checklist|validation|all}"
echo " checklist - 创建最佳实践检查清单"
echo " validation - 创建集成验证脚本"
echo " all - 创建所有文档和脚本"
exit 1
;;
esac
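作为检查清单中"配置资源配额"一项的落地参考, 下面给出一个最小化的ResourceQuota示例(命名空间team-a与各项数值均为示例, 应结合实际容量规划调整):
#!/bin/bash
# 为示例命名空间配置资源配额 (数值仅作演示)
cat > team-quota.yaml << 'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
EOF
kubectl apply -f team-quota.yaml
kubectl describe resourcequota team-quota -n team-a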
13.10 总结
Kubernetes生态系统是一个庞大而丰富的技术栈,涵盖了从应用开发到生产运维的各个环节。通过本章的学习,我们深入了解了:
13.10.1 核心工具类别
- 包管理工具:Helm和Kustomize为应用部署和配置管理提供了强大支持
- CI/CD工具:Tekton和ArgoCD实现了云原生的持续集成和部署
- 监控可观测性:Prometheus、Grafana、Jaeger构建了完整的监控体系
- 安全工具:Falco和OPA Gatekeeper提供了运行时安全和策略管理
- 服务网格:Istio和Linkerd为微服务提供了流量管理和安全保障
- 多集群管理:Rancher和Admiral支持大规模集群管理和跨集群服务
- 成本管理:KubeCost等工具帮助优化资源使用和成本控制
13.10.2 选择和集成原则
- 需求驱动:根据实际业务需求选择合适的工具
- 渐进式采用:从简单工具开始,逐步引入复杂功能
- 标准化优先:优先选择CNCF托管(毕业/孵化)或遵循社区标准的工具
- 社区活跃:优先选择社区活跃、文档完善的项目
- 集成友好:考虑工具间的集成复杂度和兼容性
13.10.3 最佳实践要点
- 统一管理:使用GitOps实现配置和部署的版本控制(参见下方的Application示例)
- 安全第一:在每个环节都要考虑安全性
- 可观测性:建立完整的监控、日志和追踪体系
- 自动化:尽可能自动化运维操作
- 成本意识:持续监控和优化资源使用
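下面是一个用ArgoCD Application实现GitOps统一管理的最小示例(仓库路径deploy/overlays/prod与应用名均为示例, 需替换为实际值):
#!/bin/bash
# 声明一个由ArgoCD持续同步的应用 (示例)
cat > myapp-application.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/myapp
    targetRevision: main
    path: deploy/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true      # 自动清理Git中已删除的资源
      selfHeal: true   # 漂移时自动回到Git声明的状态
EOF
kubectl apply -f myapp-application.yaml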
13.10.4 发展趋势
- 云原生化:更多工具原生支持Kubernetes
- AI/ML集成:智能化运维和自动优化
- 边缘计算:支持边缘和混合云场景
- 安全增强:零信任和供应链安全
- 可持续发展:绿色计算和碳中和
Kubernetes生态系统将继续快速发展,掌握这些核心工具和最佳实践,将帮助我们构建更加稳定、安全、高效的云原生应用平台。
下一章预告:第14章将学习Kubernetes的未来发展趋势和新兴技术,包括边缘计算、AI/ML工作负载、WebAssembly集成等前沿话题。