13.1 概述

Kubernetes生态系统是一个庞大而活跃的社区,围绕着容器编排平台构建了丰富的工具链和解决方案。本章将介绍Kubernetes生态系统中的重要组件和工具,帮助您了解如何利用这些工具提升开发和运维效率。

13.1.1 生态系统架构

┌─────────────────────────────────────────────────────────────────┐
│                    Kubernetes生态系统                            │
├─────────────────────────────────────────────────────────────────┤
│  开发工具                │  部署工具              │  运维工具      │
│  ├─ Skaffold            │  ├─ Helm               │  ├─ Prometheus │
│  ├─ Tilt                │  ├─ Kustomize          │  ├─ Grafana    │
│  ├─ Draft               │  ├─ ArgoCD             │  ├─ Jaeger     │
│  └─ DevSpace            │  └─ Flux               │  └─ Fluentd   │
├─────────────────────────────────────────────────────────────────┤
│  安全工具                │  网络工具              │  存储工具      │
│  ├─ Falco               │  ├─ Istio              │  ├─ Rook       │
│  ├─ OPA Gatekeeper      │  ├─ Linkerd            │  ├─ Longhorn   │
│  ├─ Twistlock           │  ├─ Cilium             │  ├─ OpenEBS    │
│  └─ Aqua Security       │  └─ Calico             │  └─ Portworx   │
├─────────────────────────────────────────────────────────────────┤
│  CI/CD工具              │  多集群管理            │  成本管理      │
│  ├─ Tekton              │  ├─ Rancher            │  ├─ KubeCost   │
│  ├─ Jenkins X           │  ├─ Cluster API        │  ├─ Goldilocks │
│  ├─ GitLab CI           │  ├─ Admiral            │  └─ Fairwinds  │
│  └─ GitHub Actions      │  └─ Loft               │                │
└─────────────────────────────────────────────────────────────────┘

13.1.2 工具分类

  1. 开发工具: 简化应用开发和调试流程
  2. 部署工具: 自动化应用部署和配置管理
  3. 运维工具: 监控、日志、追踪和故障排查
  4. 安全工具: 安全扫描、策略执行和合规检查
  5. 网络工具: 服务网格、网络策略和流量管理
  6. 存储工具: 持久化存储和数据管理
  7. CI/CD工具: 持续集成和持续部署
  8. 多集群管理: 跨集群部署和管理
  9. 成本管理: 资源使用分析和成本优化

13.2 包管理工具

13.2.1 Helm

Helm是Kubernetes的包管理器,类似于Linux的apt或yum。

基本概念

# Helm基本概念
echo "=== Helm基本概念 ==="

# Chart: Helm包,包含运行应用所需的所有资源定义
# Release: Chart的运行实例
# Repository: Chart仓库
# Values: 配置参数

# 安装Helm(以Linux amd64为例,其他平台可在Helm发布页下载对应安装包)
curl -fsSL https://get.helm.sh/helm-v3.12.0-linux-amd64.tar.gz -o helm.tar.gz
tar -zxvf helm.tar.gz
mv linux-amd64/helm /usr/local/bin/helm

# 验证安装
helm version
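
下面用几条命令把上述概念串起来:从仓库(Repository)获取Chart,以自定义的Values安装生成一个Release。其中的Release名称 my-web、my-web-2 仅为演示用的假设名称:

# 添加仓库并从中安装Chart,生成名为my-web的Release
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install my-web bitnami/nginx --set replicaCount=2

# 同一个Chart可以安装出多个互不影响的Release
helm install my-web-2 bitnami/nginx

# 查看Release及其生效的Values
helm list
helm get values my-web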

Chart开发

#!/bin/bash
# Helm Chart开发脚本

echo "=== 创建Helm Chart ==="

# 创建新的Chart
helm create myapp

echo "Chart结构:"
tree myapp/

# Chart目录结构
# myapp/
# ├── Chart.yaml          # Chart元数据
# ├── values.yaml         # 默认配置值
# ├── charts/             # 依赖Chart
# ├── templates/          # 模板文件
# │   ├── deployment.yaml
# │   ├── service.yaml
# │   ├── ingress.yaml
# │   ├── _helpers.tpl    # 模板助手
# │   └── NOTES.txt       # 安装说明
# └── .helmignore         # 忽略文件

echo "\n=== 自定义Chart配置 ==="

# 修改Chart.yaml
cat > myapp/Chart.yaml << 'EOF'
apiVersion: v2
name: myapp
description: A Helm chart for my application
type: application
version: 0.1.0
appVersion: "1.0.0"
keywords:
  - web
  - application
home: https://example.com
sources:
  - https://github.com/example/myapp
maintainers:
  - name: Developer
    email: dev@example.com
dependencies:
  - name: redis
    version: "17.3.7"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled
EOF

# 修改values.yaml
cat > myapp/values.yaml << 'EOF'
# 默认配置值
replicaCount: 2

image:
  repository: nginx
  pullPolicy: IfNotPresent
  tag: "1.21"

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  annotations: {}
  name: ""

podAnnotations: {}

podSecurityContext:
  fsGroup: 2000

securityContext:
  capabilities:
    drop:
    - ALL
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1000

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: false
  className: ""
  annotations: {}
  hosts:
    - host: chart-example.local
      paths:
        - path: /
          pathType: Prefix
  tls: []

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 100
  targetCPUUtilizationPercentage: 80

nodeSelector: {}

tolerations: []

affinity: {}

# Redis配置
redis:
  enabled: true
  auth:
    enabled: false
EOF

# 创建自定义模板
cat > myapp/templates/configmap.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "myapp.fullname" . }}-config
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
data:
  app.properties: |
    server.port={{ .Values.service.port }}
    redis.enabled={{ .Values.redis.enabled }}
    {{- if .Values.redis.enabled }}
    redis.host={{ include "myapp.fullname" . }}-redis-master
    {{- end }}
EOF

# 更新deployment模板以使用ConfigMap
cat > myapp/templates/deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "myapp.fullname" . }}
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "myapp.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      {{- with .Values.podAnnotations }}
      annotations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      labels:
        {{- include "myapp.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "myapp.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: {{ .Values.service.port }}
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /
              port: http
          readinessProbe:
            httpGet:
              path: /
              port: http
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          volumeMounts:
            - name: config
              mountPath: /etc/config
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: {{ include "myapp.fullname" . }}-config
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
EOF

echo "\n=== Chart验证和测试 ==="

# 验证Chart语法
helm lint myapp/

# 渲染模板(不安装)
helm template myapp myapp/ --debug

# 模拟安装(dry-run)
helm install myapp-test myapp/ --dry-run --debug

# 打包Chart
helm package myapp/

echo "\n=== Chart部署 ==="

# 安装Chart
helm install myapp myapp/

# 查看Release
helm list

# 查看Release状态
helm status myapp

# 升级Release
helm upgrade myapp myapp/ --set replicaCount=3

# 回滚Release
helm rollback myapp 1

# 卸载Release
# helm uninstall myapp

echo "\n=== Chart仓库管理 ==="

# 添加官方仓库
helm repo add stable https://charts.helm.sh/stable
helm repo add bitnami https://charts.bitnami.com/bitnami

# 更新仓库索引
helm repo update

# 搜索Chart
helm search repo nginx

# 查看Chart信息
helm show chart bitnami/nginx
helm show values bitnami/nginx

# 安装第三方Chart
helm install my-nginx bitnami/nginx --set service.type=LoadBalancer

echo "\n=== Chart开发完成 ==="

13.2.2 Kustomize

Kustomize是Kubernetes原生的配置管理工具,通过声明式的方式管理配置。

#!/bin/bash
# Kustomize使用脚本

echo "=== Kustomize配置管理 ==="

# 创建基础配置
mkdir -p kustomize-demo/{base,overlays/{dev,staging,prod}}

# 基础配置
cat > kustomize-demo/base/deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: nginx:1.21
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
EOF

cat > kustomize-demo/base/service.yaml << 'EOF'
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 80
  type: ClusterIP
EOF

cat > kustomize-demo/base/kustomization.yaml << 'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- deployment.yaml
- service.yaml

commonLabels:
  version: v1.0.0
  managed-by: kustomize

commonAnnotations:
  description: "Base configuration for myapp"
EOF

# 开发环境配置
cat > kustomize-demo/overlays/dev/kustomization.yaml << 'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: dev
namePrefix: dev-
nameSuffix: -v1

resources:
- ../../base

patchesStrategicMerge:
- deployment-patch.yaml
- service-patch.yaml

replicas:
- name: myapp
  count: 1

images:
- name: nginx
  newTag: 1.21-alpine

commonLabels:
  environment: dev

configMapGenerator:
- name: app-config
  literals:
  - ENV=development
  - DEBUG=true
  - LOG_LEVEL=debug
EOF

cat > kustomize-demo/overlays/dev/deployment-patch.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
      - name: myapp
        env:
        - name: ENV
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: ENV
        - name: DEBUG
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: DEBUG
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 200m
            memory: 256Mi
EOF

cat > kustomize-demo/overlays/dev/service-patch.yaml << 'EOF'
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  type: NodePort
EOF

# 生产环境配置
cat > kustomize-demo/overlays/prod/kustomization.yaml << 'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: prod
namePrefix: prod-

resources:
- ../../base
- ingress.yaml
- hpa.yaml

patchesStrategicMerge:
- deployment-patch.yaml

replicas:
- name: myapp
  count: 3

images:
- name: nginx
  newTag: 1.21

commonLabels:
  environment: prod

secretGenerator:
- name: app-secrets
  literals:
  - DATABASE_URL=postgresql://prod-db:5432/myapp
  - API_KEY=prod-api-key-12345

configMapGenerator:
- name: app-config
  literals:
  - ENV=production
  - DEBUG=false
  - LOG_LEVEL=info
EOF

cat > kustomize-demo/overlays/prod/deployment-patch.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
      - name: myapp
        env:
        - name: ENV
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: ENV
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: DATABASE_URL
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 1000m
            memory: 1Gi
        livenessProbe:
          httpGet:
            path: /health
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
EOF

cat > kustomize-demo/overlays/prod/ingress.yaml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - myapp.example.com
    secretName: myapp-tls
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-service
            port:
              number: 80
EOF

cat > kustomize-demo/overlays/prod/hpa.yaml << 'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
EOF

echo "\n=== Kustomize构建和部署 ==="

# 构建开发环境配置
echo "开发环境配置:"
kustomize build kustomize-demo/overlays/dev

# 构建生产环境配置
echo "\n生产环境配置:"
kustomize build kustomize-demo/overlays/prod

# 应用配置
echo "\n部署到开发环境:"
kustomize build kustomize-demo/overlays/dev | kubectl apply -f -

echo "\n部署到生产环境:"
kustomize build kustomize-demo/overlays/prod | kubectl apply -f -

# 验证部署
kubectl get all -n dev
kubectl get all -n prod

echo "\n=== Kustomize配置管理完成 ==="

13.3 CI/CD工具

13.3.1 Tekton

Tekton是Kubernetes原生的CI/CD框架。

# Tekton Pipeline示例
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: build-and-deploy
  namespace: tekton-pipelines
spec:
  description: |
    This pipeline clones a git repo, builds a Docker image with Kaniko and
    deploys it to Kubernetes
  params:
  - name: repo-url
    type: string
    description: The git repo URL to clone from.
  - name: image-reference
    type: string
    description: The image reference for the image to build
  - name: deployment-name
    type: string
    description: The name of the deployment to update
  - name: deployment-namespace
    type: string
    description: The namespace of the deployment to update
    default: default
  workspaces:
  - name: shared-data
    description: |
      This workspace contains the cloned repo files, so they can be read by the
      next task.
  - name: docker-credentials
    description: Docker registry credentials
  tasks:
  - name: fetch-source
    taskRef:
      name: git-clone
    workspaces:
    - name: output
      workspace: shared-data
    params:
    - name: url
      value: $(params.repo-url)
  - name: build-image
    runAfter: ["fetch-source"]
    taskRef:
      name: kaniko
    workspaces:
    - name: source
      workspace: shared-data
    - name: dockerconfig
      workspace: docker-credentials
    params:
    - name: IMAGE
      value: $(params.image-reference)
  - name: deploy
    runAfter: ["build-image"]
    taskRef:
      name: kubernetes-actions
    params:
    - name: script
      value: |
        kubectl set image deployment/$(params.deployment-name) \
          app=$(params.image-reference) \
          -n $(params.deployment-namespace)
        kubectl rollout status deployment/$(params.deployment-name) \
          -n $(params.deployment-namespace)

---
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: kubernetes-actions
  namespace: tekton-pipelines
spec:
  description: |
    This task performs kubernetes actions like kubectl apply, delete, etc.
  params:
  - name: script
    description: The kubectl script to run
    type: string
  steps:
  - name: kubectl
    image: bitnami/kubectl:latest
    script: |
      #!/bin/bash
      set -e
      $(params.script)

---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: build-and-deploy-run
  namespace: tekton-pipelines
spec:
  pipelineRef:
    name: build-and-deploy
  workspaces:
  - name: shared-data
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
  - name: docker-credentials
    secret:
      secretName: docker-credentials
  params:
  - name: repo-url
    value: https://github.com/example/myapp.git
  - name: image-reference
    value: docker.io/example/myapp:latest
  - name: deployment-name
    value: myapp
  - name: deployment-namespace
    value: default
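
上述Pipeline引用的git-clone和kaniko两个Task来自Tekton社区目录(Tekton Catalog),运行前需要先安装Tekton Pipelines和这两个Task。下面是一个安装和触发的示意,其中Task的版本路径(0.9、0.6)以Catalog仓库中的实际版本为准,pipelinerun.yaml指保存了上面PipelineRun内容的文件:

# 安装Tekton Pipelines(官方发布的最新版本)
kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml

# 安装Pipeline依赖的社区Task(版本号以Tekton Catalog仓库为准)
kubectl apply -n tekton-pipelines \
  -f https://raw.githubusercontent.com/tektoncd/catalog/main/task/git-clone/0.9/git-clone.yaml
kubectl apply -n tekton-pipelines \
  -f https://raw.githubusercontent.com/tektoncd/catalog/main/task/kaniko/0.6/kaniko.yaml

# 提交PipelineRun并用tkn CLI跟踪执行日志
kubectl apply -f pipelinerun.yaml
tkn pipelinerun logs build-and-deploy-run -f -n tekton-pipelines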

13.3.2 ArgoCD

ArgoCD是声明式的GitOps持续部署工具。

#!/bin/bash
# ArgoCD安装和配置脚本

echo "=== ArgoCD安装 ==="

# 创建命名空间
kubectl create namespace argocd

# 安装ArgoCD
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# 等待Pod就绪
kubectl wait --for=condition=available --timeout=300s deployment/argocd-server -n argocd

# 获取初始密码
echo "\n=== 获取ArgoCD初始密码 ==="
ARGO_PASSWORD=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
echo "ArgoCD初始密码: $ARGO_PASSWORD"

# 端口转发访问ArgoCD UI
echo "\n=== 访问ArgoCD UI ==="
echo "运行以下命令进行端口转发:"
echo "kubectl port-forward svc/argocd-server -n argocd 8080:443"
echo "然后访问: https://localhost:8080"
echo "用户名: admin"
echo "密码: $ARGO_PASSWORD"

# 安装ArgoCD CLI
echo "\n=== 安装ArgoCD CLI ==="
curl -sSL -o argocd-windows-amd64.exe https://github.com/argoproj/argo-cd/releases/latest/download/argocd-windows-amd64.exe
mv argocd-windows-amd64.exe /usr/local/bin/argocd
chmod +x /usr/local/bin/argocd

# 登录ArgoCD
echo "\n=== 登录ArgoCD ==="
argocd login localhost:8080 --username admin --password $ARGO_PASSWORD --insecure

# 创建应用配置
echo "\n=== 创建ArgoCD应用 ==="

cat > argocd-app.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://github.com/example/myapp-config.git
    targetRevision: HEAD
    path: k8s
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
    - Validate=false
    - CreateNamespace=true
    - PrunePropagationPolicy=foreground
    - PruneLast=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
EOF

# 应用配置
kubectl apply -f argocd-app.yaml

# 创建项目配置
cat > argocd-project.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: myproject
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  description: My Project
  sourceRepos:
  - 'https://github.com/example/*'
  destinations:
  - namespace: 'default'
    server: https://kubernetes.default.svc
  - namespace: 'staging'
    server: https://kubernetes.default.svc
  - namespace: 'prod'
    server: https://kubernetes.default.svc
  clusterResourceWhitelist:
  - group: ''
    kind: Namespace
  - group: 'rbac.authorization.k8s.io'
    kind: ClusterRole
  - group: 'rbac.authorization.k8s.io'
    kind: ClusterRoleBinding
  namespaceResourceWhitelist:
  - group: ''
    kind: ConfigMap
  - group: ''
    kind: Secret
  - group: ''
    kind: Service
  - group: 'apps'
    kind: Deployment
  - group: 'apps'
    kind: ReplicaSet
  - group: 'networking.k8s.io'
    kind: Ingress
  roles:
  - name: developer
    description: Developer role
    policies:
    - p, proj:myproject:developer, applications, get, myproject/*, allow
    - p, proj:myproject:developer, applications, sync, myproject/*, allow
    groups:
    - myorg:developers
  - name: admin
    description: Admin role
    policies:
    - p, proj:myproject:admin, applications, *, myproject/*, allow
    - p, proj:myproject:admin, repositories, *, *, allow
    groups:
    - myorg:admins
EOF

kubectl apply -f argocd-project.yaml

echo "\n=== ArgoCD配置完成 ==="
echo "查看应用状态:"
argocd app list
argocd app get myapp
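
应用创建后,日常运维主要通过argocd CLI或Web UI完成。下面是几条常用命令的示意,用于对比Git与集群的差异、手动触发同步以及回滚到历史版本:

# 对比Git中的期望状态与集群中的实际状态
argocd app diff myapp

# 手动触发同步(已开启自动同步时通常不需要)
argocd app sync myapp

# 查看部署历史,并回滚到指定的历史版本(ID取自history输出)
argocd app history myapp
argocd app rollback myapp 1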

13.4 监控和可观测性工具

13.4.1 Prometheus生态系统

#!/bin/bash
# Prometheus生态系统部署脚本

echo "=== 部署Prometheus生态系统 ==="

# 添加Prometheus Helm仓库
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# 创建监控命名空间
kubectl create namespace monitoring

# 创建Prometheus配置
cat > prometheus-values.yaml << 'EOF'
# Prometheus配置
prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    resources:
      requests:
        memory: 2Gi
        cpu: 1000m
      limits:
        memory: 4Gi
        cpu: 2000m
    additionalScrapeConfigs:
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__

# Grafana配置
grafana:
  enabled: true
  adminPassword: admin123
  persistence:
    enabled: true
    size: 10Gi
  resources:
    requests:
      memory: 512Mi
      cpu: 250m
    limits:
      memory: 1Gi
      cpu: 500m
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
      - name: 'default'
        orgId: 1
        folder: ''
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards/default
  dashboards:
    default:
      kubernetes-cluster:
        gnetId: 7249
        revision: 1
        datasource: Prometheus
      kubernetes-pods:
        gnetId: 6417
        revision: 1
        datasource: Prometheus
      node-exporter:
        gnetId: 1860
        revision: 27
        datasource: Prometheus

# AlertManager配置
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
    resources:
      requests:
        memory: 256Mi
        cpu: 100m
      limits:
        memory: 512Mi
        cpu: 200m
  config:
    global:
      smtp_smarthost: 'smtp.gmail.com:587'
      smtp_from: 'alerts@example.com'
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'web.hook'
    receivers:
    - name: 'web.hook'
      email_configs:
      - to: 'admin@example.com'
        subject: 'Kubernetes Alert: {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
      webhook_configs:
      - url: 'http://webhook-service:9093/webhook'
        send_resolved: true

# Node Exporter配置
nodeExporter:
  enabled: true
  resources:
    requests:
      memory: 128Mi
      cpu: 100m
    limits:
      memory: 256Mi
      cpu: 200m

# kube-state-metrics配置
kubeStateMetrics:
  enabled: true
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
    limits:
      memory: 512Mi
      cpu: 200m
EOF

# 安装Prometheus Stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml

# 等待部署完成
kubectl wait --for=condition=available --timeout=300s deployment/prometheus-grafana -n monitoring

echo "\n=== 配置服务访问 ==="

# 创建Ingress
cat > monitoring-ingress.yaml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: monitoring-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: prometheus.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-kube-prometheus-prometheus
            port:
              number: 9090
  - host: grafana.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-grafana
            port:
              number: 80
  - host: alertmanager.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-kube-prometheus-alertmanager
            port:
              number: 9093
EOF

kubectl apply -f monitoring-ingress.yaml

echo "\n=== 访问信息 ==="
echo "Prometheus: http://prometheus.local"
echo "Grafana: http://grafana.local (admin/admin123)"
echo "AlertManager: http://alertmanager.local"

echo "\n=== 端口转发访问 ==="
echo "Prometheus: kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n monitoring"
echo "Grafana: kubectl port-forward svc/prometheus-grafana 3000:80 -n monitoring"
echo "AlertManager: kubectl port-forward svc/prometheus-kube-prometheus-alertmanager 9093:9093 -n monitoring"

echo "\n=== Prometheus生态系统部署完成 ==="

13.4.2 Jaeger分布式追踪

#!/bin/bash
# Jaeger分布式追踪部署脚本

echo "=== 部署Jaeger分布式追踪 ==="

# 安装Jaeger Operator
kubectl create namespace observability
kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.41.0/jaeger-operator.yaml -n observability

# 等待Operator就绪
kubectl wait --for=condition=available --timeout=300s deployment/jaeger-operator -n observability

# 创建Jaeger实例
cat > jaeger-instance.yaml << 'EOF'
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: observability
spec:
  strategy: production
  storage:
    type: elasticsearch
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: SingleRedundancy
      resources:
        requests:
          memory: 2Gi
          cpu: 500m
        limits:
          memory: 4Gi
          cpu: 1000m
      storage:
        storageClassName: standard
        size: 50Gi
  collector:
    replicas: 2
    resources:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 1Gi
        cpu: 500m
  query:
    replicas: 2
    resources:
      requests:
        memory: 256Mi
        cpu: 100m
      limits:
        memory: 512Mi
        cpu: 200m
  agent:
    strategy: DaemonSet
    resources:
      requests:
        memory: 128Mi
        cpu: 100m
      limits:
        memory: 256Mi
        cpu: 200m
EOF

kubectl apply -f jaeger-instance.yaml

# 等待Jaeger部署完成
kubectl wait --for=condition=available --timeout=600s deployment/jaeger-query -n observability

# 创建示例应用
cat > jaeger-demo-app.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger-demo
  namespace: observability
  labels:
    app: jaeger-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: jaeger-demo
  template:
    metadata:
      labels:
        app: jaeger-demo
      annotations:
        sidecar.jaegertracing.io/inject: "true"
    spec:
      containers:
      - name: demo
        image: jaegertracing/example-hotrod:latest
        ports:
        - containerPort: 8080
        env:
        - name: JAEGER_AGENT_HOST
          value: "jaeger-agent"
        - name: JAEGER_AGENT_PORT
          value: "6831"
        resources:
          requests:
            memory: 128Mi
            cpu: 100m
          limits:
            memory: 256Mi
            cpu: 200m

---
apiVersion: v1
kind: Service
metadata:
  name: jaeger-demo-service
  namespace: observability
spec:
  selector:
    app: jaeger-demo
  ports:
  - port: 8080
    targetPort: 8080
  type: ClusterIP

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: jaeger-ingress
  namespace: observability
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: jaeger.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: jaeger-query
            port:
              number: 16686
  - host: jaeger-demo.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: jaeger-demo-service
            port:
              number: 8080
EOF

kubectl apply -f jaeger-demo-app.yaml

echo "\n=== 访问信息 ==="
echo "Jaeger UI: http://jaeger.local"
echo "Demo App: http://jaeger-demo.local"

echo "\n=== 端口转发访问 ==="
echo "Jaeger UI: kubectl port-forward svc/jaeger-query 16686:16686 -n observability"
echo "Demo App: kubectl port-forward svc/jaeger-demo-service 8080:8080 -n observability"

echo "\n=== Jaeger分布式追踪部署完成 ==="

13.5 安全工具

13.5.1 Falco运行时安全监控

#!/bin/bash
# Falco运行时安全监控部署脚本

echo "=== 部署Falco运行时安全监控 ==="

# 添加Falco Helm仓库
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update

# 创建安全命名空间
kubectl create namespace falco-system

# 创建Falco配置
cat > falco-values.yaml << 'EOF'
# Falco配置
falco:
  grpc:
    enabled: true
    bind_address: "0.0.0.0:5060"
    threadiness: 8
  grpc_output:
    enabled: true
  json_output: true
  json_include_output_property: true
  log_stderr: true
  log_syslog: false
  log_level: info
  priority: debug
  rules_file:
    - /etc/falco/falco_rules.yaml
    - /etc/falco/falco_rules.local.yaml
    - /etc/falco/k8s_audit_rules.yaml
    - /etc/falco/rules.d
  plugins:
    - name: k8saudit
      library_path: libk8saudit.so
      init_config:
        maxEventBytes: 1048576
        webhookMaxBatchSize: 12582912
      open_params: 'http://:9765/k8s-audit'
    - name: json
      library_path: libjson.so
      init_config: ""

# 自定义规则
customRules:
  custom_rules.yaml: |-
    - rule: Detect crypto miners
      desc: Detect cryptocurrency miners
      condition: >
        spawned_process and
        (proc.name in (xmrig, cpuminer, t-rex, gminer, nbminer, claymore) or
         proc.cmdline contains "stratum+tcp" or
         proc.cmdline contains "mining.pool")
      output: >
        Cryptocurrency miner detected (user=%user.name command=%proc.cmdline
        container=%container.name image=%container.image.repository)
      priority: CRITICAL
      tags: [cryptocurrency, mining, malware]
    
    - rule: Detect privilege escalation
      desc: Detect attempts to escalate privileges
      condition: >
        spawned_process and
        (proc.name in (sudo, su, doas) or
         proc.cmdline contains "chmod +s" or
         proc.cmdline contains "setuid")
      output: >
        Privilege escalation attempt (user=%user.name command=%proc.cmdline
        container=%container.name image=%container.image.repository)
      priority: WARNING
      tags: [privilege_escalation, security]
    
    - rule: Detect suspicious network activity
      desc: Detect suspicious network connections
      condition: >
        (inbound_outbound) and
        (fd.sport in (4444, 5555, 6666, 7777, 8888, 9999) or
         fd.dport in (4444, 5555, 6666, 7777, 8888, 9999))
      output: >
        Suspicious network activity (connection=%fd.name sport=%fd.sport dport=%fd.dport
        container=%container.name image=%container.image.repository)
      priority: WARNING
      tags: [network, suspicious]

# 资源配置
resources:
  requests:
    cpu: 100m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1024Mi

# 容忍度配置
tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane

# 节点选择器
nodeSelector: {}

# 服务配置
services:
  - name: grpc
    type: ClusterIP
    ports:
      - port: 5060
        targetPort: 5060
        protocol: TCP
        name: grpc
  - name: grpc-metrics
    type: ClusterIP
    ports:
      - port: 8765
        targetPort: 8765
        protocol: TCP
        name: metrics
EOF

# 安装Falco
helm install falco falcosecurity/falco \
  --namespace falco-system \
  --values falco-values.yaml

# 等待Falco就绪
kubectl wait --for=condition=ready --timeout=300s pod -l app.kubernetes.io/name=falco -n falco-system

# 创建Falco事件处理器
cat > falco-sidekick.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: falco-sidekick
  namespace: falco-system
  labels:
    app: falco-sidekick
spec:
  replicas: 1
  selector:
    matchLabels:
      app: falco-sidekick
  template:
    metadata:
      labels:
        app: falco-sidekick
    spec:
      containers:
      - name: falco-sidekick
        image: falcosecurity/falcosidekick:latest
        ports:
        - containerPort: 2801
        env:
        - name: WEBHOOK_URL
          value: "http://webhook-service:9093/falco"
        - name: SLACK_WEBHOOKURL
          value: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
        - name: SLACK_CHANNEL
          value: "#security-alerts"
        - name: ELASTICSEARCH_HOSTPORT
          value: "elasticsearch:9200"
        - name: ELASTICSEARCH_INDEX
          value: "falco"
        resources:
          requests:
            memory: 128Mi
            cpu: 100m
          limits:
            memory: 256Mi
            cpu: 200m

---
apiVersion: v1
kind: Service
metadata:
  name: falco-sidekick
  namespace: falco-system
spec:
  selector:
    app: falco-sidekick
  ports:
  - port: 2801
    targetPort: 2801
  type: ClusterIP
EOF

kubectl apply -f falco-sidekick.yaml

# 创建测试Pod来触发Falco规则
cat > falco-test.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: falco-test
  namespace: default
spec:
  containers:
  - name: test
    image: ubuntu:20.04
    command: ["/bin/bash"]
    args: ["-c", "while true; do sleep 30; done"]
    securityContext:
      privileged: true
  restartPolicy: Never
EOF

kubectl apply -f falco-test.yaml

echo "\n=== 测试Falco规则 ==="
echo "执行以下命令来触发Falco规则:"
echo "kubectl exec -it falco-test -- bash"
echo "然后在容器内执行:"
echo "  apt update && apt install -y netcat"
echo "  nc -l 4444 &"
echo "  chmod +s /bin/bash"

echo "\n=== 查看Falco日志 ==="
echo "kubectl logs -f daemonset/falco -n falco-system"

echo "\n=== Falco运行时安全监控部署完成 ==="

13.5.2 OPA Gatekeeper策略引擎

#!/bin/bash
# OPA Gatekeeper策略引擎部署脚本

echo "=== 部署OPA Gatekeeper策略引擎 ==="

# 安装Gatekeeper
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/release-3.14/deploy/gatekeeper.yaml

# 等待Gatekeeper就绪
kubectl wait --for=condition=available --timeout=300s deployment/gatekeeper-controller-manager -n gatekeeper-system
kubectl wait --for=condition=available --timeout=300s deployment/gatekeeper-audit -n gatekeeper-system

echo "\n=== 创建约束模板 ==="

# 创建必须有标签的约束模板
cat > required-labels-template.yaml << 'EOF'
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        type: object
        properties:
          labels:
            type: array
            items:
              type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        
        violation[{"msg": msg}] {
          required := input.parameters.labels
          provided := input.review.object.metadata.labels
          missing := required[_]
          not provided[missing]
          msg := sprintf("Missing required label: %v", [missing])
        }
EOF

kubectl apply -f required-labels-template.yaml

# 创建资源限制约束模板
cat > resource-limits-template.yaml << 'EOF'
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8sresourcelimits
spec:
  crd:
    spec:
      names:
        kind: K8sResourceLimits
      validation:
        type: object
        properties:
          cpu:
            type: string
          memory:
            type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sresourcelimits
        
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.limits.cpu
          msg := "Container must have CPU limits"
        }
        
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.limits.memory
          msg := "Container must have memory limits"
        }
        
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          cpu_limit := container.resources.limits.cpu
          cpu_limit_num := units.parse_bytes(cpu_limit)
          max_cpu := units.parse_bytes(input.parameters.cpu)
          cpu_limit_num > max_cpu
          msg := sprintf("CPU limit %v exceeds maximum %v", [cpu_limit, input.parameters.cpu])
        }
        
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          memory_limit := container.resources.limits.memory
          memory_limit_num := units.parse_bytes(memory_limit)
          max_memory := units.parse_bytes(input.parameters.memory)
          memory_limit_num > max_memory
          msg := sprintf("Memory limit %v exceeds maximum %v", [memory_limit, input.parameters.memory])
        }
EOF

kubectl apply -f resource-limits-template.yaml

# 创建禁止特权容器约束模板
cat > no-privileged-template.yaml << 'EOF'
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8snoprivileged
spec:
  crd:
    spec:
      names:
        kind: K8sNoPrivileged
      validation:
        type: object
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8snoprivileged
        
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          container.securityContext.privileged == true
          msg := "Privileged containers are not allowed"
        }
        
        violation[{"msg": msg}] {
          input.review.object.spec.securityContext.privileged == true
          msg := "Privileged pods are not allowed"
        }
EOF

kubectl apply -f no-privileged-template.yaml

echo "\n=== 创建约束实例 ==="

# 创建必须有标签的约束
cat > required-labels-constraint.yaml << 'EOF'
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: must-have-labels
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces: ["kube-system", "gatekeeper-system", "kube-public"]
  parameters:
    labels: ["app", "version", "environment"]
EOF

kubectl apply -f required-labels-constraint.yaml

# 创建资源限制约束
cat > resource-limits-constraint.yaml << 'EOF'
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sResourceLimits
metadata:
  name: resource-limits
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces: ["kube-system", "gatekeeper-system", "kube-public"]
  parameters:
    cpu: "2000m"
    memory: "2Gi"
EOF

kubectl apply -f resource-limits-constraint.yaml

# 创建禁止特权容器约束
cat > no-privileged-constraint.yaml << 'EOF'
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sNoPrivileged
metadata:
  name: no-privileged
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces: ["kube-system", "gatekeeper-system", "kube-public"]
EOF

kubectl apply -f no-privileged-constraint.yaml

echo "\n=== 测试策略 ==="

# 创建违反策略的测试Pod
cat > test-violation.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: test-violation
  namespace: default
spec:
  containers:
  - name: test
    image: nginx:1.21
    securityContext:
      privileged: true
EOF

echo "尝试创建违反策略的Pod(应该被拒绝):"
kubectl apply -f test-violation.yaml || echo "Pod被策略拒绝(预期行为)"

# 创建符合策略的测试Pod
cat > test-compliant.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: test-compliant
  namespace: default
  labels:
    app: test
    version: v1.0.0
    environment: dev
spec:
  containers:
  - name: test
    image: nginx:1.21
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 128Mi
    securityContext:
      privileged: false
      runAsNonRoot: true
      runAsUser: 1000
EOF

echo "创建符合策略的Pod:"
kubectl apply -f test-compliant.yaml

echo "\n=== 查看约束状态 ==="
kubectl get constraints
kubectl describe k8srequiredlabels must-have-labels

echo "\n=== OPA Gatekeeper策略引擎部署完成 ==="

13.6 服务网格

13.6.1 Istio服务网格

#!/bin/bash
# Istio服务网格部署脚本

echo "=== 部署Istio服务网格 ==="

# 下载Istio
ISTIO_VERSION=1.19.0
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=$ISTIO_VERSION sh -
cd istio-$ISTIO_VERSION
export PATH=$PWD/bin:$PATH

# 安装Istio
istioctl install --set values.defaultRevision=default -y

# 启用自动注入
kubectl label namespace default istio-injection=enabled

# 安装Istio插件
kubectl apply -f samples/addons/

# 等待组件就绪
kubectl wait --for=condition=available --timeout=300s deployment/istiod -n istio-system
kubectl wait --for=condition=available --timeout=300s deployment/kiali -n istio-system

echo "\n=== 部署示例应用 ==="

# 部署Bookinfo示例应用
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml

# 等待应用就绪
kubectl wait --for=condition=available --timeout=300s deployment/productpage-v1
kubectl wait --for=condition=available --timeout=300s deployment/details-v1
kubectl wait --for=condition=available --timeout=300s deployment/ratings-v1
kubectl wait --for=condition=available --timeout=300s deployment/reviews-v1
kubectl wait --for=condition=available --timeout=300s deployment/reviews-v2
kubectl wait --for=condition=available --timeout=300s deployment/reviews-v3

# 创建Gateway
kubectl apply -f samples/bookinfo/networking/bookinfo-gateway.yaml

echo "\n=== 配置流量管理 ==="

# 创建DestinationRule
cat > destination-rules.yaml << 'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: productpage
spec:
  host: productpage
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
  subsets:
  - name: v1
    labels:
      version: v1

---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3

---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: ratings
spec:
  host: ratings
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v2-mysql
    labels:
      version: v2-mysql
  - name: v2-mysql-vm
    labels:
      version: v2-mysql-vm

---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: details
spec:
  host: details
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
EOF

kubectl apply -f destination-rules.yaml

# 创建VirtualService进行流量分割
cat > virtual-services.yaml << 'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 50
    - destination:
        host: reviews
        subset: v3
      weight: 50
EOF

kubectl apply -f virtual-services.yaml

echo "\n=== 配置安全策略 ==="

# 创建PeerAuthentication
cat > peer-authentication.yaml << 'EOF'
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: default
spec:
  mtls:
    mode: STRICT
EOF

kubectl apply -f peer-authentication.yaml

# 创建AuthorizationPolicy
cat > authorization-policy.yaml << 'EOF'
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: productpage-viewer
  namespace: default
spec:
  selector:
    matchLabels:
      app: productpage
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/bookinfo-productpage"]
  - to:
    - operation:
        methods: ["GET"]

---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: reviews-viewer
  namespace: default
spec:
  selector:
    matchLabels:
      app: reviews
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/bookinfo-reviews"]
  - to:
    - operation:
        methods: ["GET"]
EOF

kubectl apply -f authorization-policy.yaml

echo "\n=== 访问信息 ==="
echo "获取Ingress Gateway地址:"
kubectl get svc istio-ingressgateway -n istio-system

echo "\n访问应用:"
echo "Bookinfo: http://<GATEWAY_IP>/productpage"
echo "Kiali: kubectl port-forward svc/kiali 20001:20001 -n istio-system"
echo "Grafana: kubectl port-forward svc/grafana 3000:3000 -n istio-system"
echo "Jaeger: kubectl port-forward svc/tracing 16686:80 -n istio-system"

echo "\n=== Istio服务网格部署完成 ==="

13.6.2 Linkerd轻量级服务网格

#!/bin/bash
# Linkerd轻量级服务网格部署脚本

echo "=== 部署Linkerd轻量级服务网格 ==="

# 下载Linkerd CLI
curl -sL https://run.linkerd.io/install | sh
export PATH=$PATH:$HOME/.linkerd2/bin

# 验证集群
linkerd check --pre

# 安装Linkerd控制平面
linkerd install | kubectl apply -f -

# 等待控制平面就绪
linkerd check

# 安装可视化组件
linkerd viz install | kubectl apply -f -

# 等待可视化组件就绪
linkerd check

echo "\n=== 部署示例应用 ==="

# 创建示例应用
cat > linkerd-demo.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels:
    app: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: buoyantio/bb:v0.0.6
        args:
        - terminus
        - "--h1-server-port=8080"
        - "--grpc-server-port=9090"
        ports:
        - containerPort: 8080
        - containerPort: 9090
        env:
        - name: TERMINUS_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace

---
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  type: ClusterIP
  selector:
    app: web
  ports:
  - name: http
    port: 8080
    targetPort: 8080

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: authors
  labels:
    app: authors
spec:
  replicas: 1
  selector:
    matchLabels:
      app: authors
  template:
    metadata:
      labels:
        app: authors
    spec:
      containers:
      - name: authors
        image: buoyantio/bb:v0.0.6
        args:
        - terminus
        - "--h1-server-port=7001"
        - "--grpc-server-port=7002"
        ports:
        - containerPort: 7001
        - containerPort: 7002
        env:
        - name: TERMINUS_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace

---
apiVersion: v1
kind: Service
metadata:
  name: authors-svc
spec:
  type: ClusterIP
  selector:
    app: authors
  ports:
  - name: http
    port: 7001
    targetPort: 7001

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: books
  labels:
    app: books
spec:
  replicas: 1
  selector:
    matchLabels:
      app: books
  template:
    metadata:
      labels:
        app: books
    spec:
      containers:
      - name: books
        image: buoyantio/bb:v0.0.6
        args:
        - terminus
        - "--h1-server-port=7000"
        - "--grpc-server-port=7002"
        - "--fire-and-forget"
        ports:
        - containerPort: 7000
        - containerPort: 7002
        env:
        - name: TERMINUS_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace

---
apiVersion: v1
kind: Service
metadata:
  name: books-svc
spec:
  type: ClusterIP
  selector:
    app: books
  ports:
  - name: http
    port: 7000
    targetPort: 7000
EOF

kubectl apply -f linkerd-demo.yaml

# 注入Linkerd代理
kubectl get deploy -o yaml | linkerd inject - | kubectl apply -f -

# 等待应用就绪
kubectl wait --for=condition=available --timeout=300s deployment/web
kubectl wait --for=condition=available --timeout=300s deployment/authors
kubectl wait --for=condition=available --timeout=300s deployment/books

echo "\n=== 配置流量策略 ==="

# 创建TrafficSplit
cat > traffic-split.yaml << 'EOF'
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: authors-split
spec:
  service: authors-svc
  backends:
  - service: authors-svc
    weight: 100
EOF

kubectl apply -f traffic-split.yaml

echo "\n=== 访问信息 ==="
echo "Linkerd Dashboard: linkerd viz dashboard"
echo "或者: kubectl port-forward svc/web-svc 8080:8080"

echo "\n查看服务网格状态:"
linkerd viz stat deploy
linkerd viz top deploy
linkerd viz routes deploy

echo "\n=== Linkerd轻量级服务网格部署完成 ==="

13.7 多集群管理

13.7.1 Rancher多集群管理平台

#!/bin/bash
# Rancher多集群管理平台部署脚本

echo "=== 部署Rancher多集群管理平台 ==="

# 创建命名空间
kubectl create namespace cattle-system

# 添加Rancher Helm仓库
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo update

# 安装cert-manager
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v1.13.0/cert-manager.crds.yaml
kubectl create namespace cert-manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --version v1.13.0

# 等待cert-manager就绪
kubectl wait --for=condition=available --timeout=300s deployment/cert-manager -n cert-manager
kubectl wait --for=condition=available --timeout=300s deployment/cert-manager-cainjector -n cert-manager
kubectl wait --for=condition=available --timeout=300s deployment/cert-manager-webhook -n cert-manager

# 安装Rancher
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.local \
  --set bootstrapPassword=admin123456 \
  --set ingress.tls.source=rancher \
  --set replicas=1

# 等待Rancher就绪
kubectl wait --for=condition=available --timeout=600s deployment/rancher -n cattle-system

echo "\n=== 配置Ingress ==="

# 创建Ingress
cat > rancher-ingress.yaml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rancher
  namespace: cattle-system
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
spec:
  tls:
  - hosts:
    - rancher.local
    secretName: tls-rancher-ingress
  rules:
  - host: rancher.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: rancher
            port:
              number: 80
EOF

kubectl apply -f rancher-ingress.yaml

echo "\n=== 集群管理脚本 ==="

# 创建集群管理脚本
cat > cluster-management.sh << 'EOF'
#!/bin/bash
# Rancher集群管理脚本

# 获取Rancher API Token
get_rancher_token() {
    local username="admin"
    local password="admin123456"
    local rancher_url="https://rancher.local"
    
    # 登录获取token
    local login_response=$(curl -s -k -X POST \
        "${rancher_url}/v3-public/localProviders/local?action=login" \
        -H 'content-type: application/json' \
        -d '{"username":"'${username}'","password":"'${password}'"}')
    
    local token=$(echo $login_response | jq -r .token)
    echo $token
}

# 列出所有集群
list_clusters() {
    local token=$(get_rancher_token)
    local rancher_url="https://rancher.local"
    
    curl -s -k -H "Authorization: Bearer ${token}" \
        "${rancher_url}/v3/clusters" | jq -r '.data[] | "\(.id): \(.name) (\(.state))"'
}

# 创建集群
create_cluster() {
    local cluster_name=$1
    local token=$(get_rancher_token)
    local rancher_url="https://rancher.local"
    
    curl -s -k -X POST \
        -H "Authorization: Bearer ${token}" \
        -H "Content-Type: application/json" \
        "${rancher_url}/v3/clusters" \
        -d '{
            "type": "cluster",
            "name": "'${cluster_name}'",
            "description": "Cluster created via API",
            "rancherKubernetesEngineConfig": {
                "kubernetesVersion": "v1.28.2-rancher1-1",
                "ignoreDockerVersion": false
            }
        }'
}

# 删除集群
delete_cluster() {
    local cluster_id=$1
    local token=$(get_rancher_token)
    local rancher_url="https://rancher.local"
    
    curl -s -k -X DELETE \
        -H "Authorization: Bearer ${token}" \
        "${rancher_url}/v3/clusters/${cluster_id}"
}

# 获取集群状态
get_cluster_status() {
    local cluster_id=$1
    local token=$(get_rancher_token)
    local rancher_url="https://rancher.local"
    
    curl -s -k -H "Authorization: Bearer ${token}" \
        "${rancher_url}/v3/clusters/${cluster_id}" | jq -r '.state'
}

# 主函数
case "$1" in
    list)
        echo "=== 集群列表 ==="
        list_clusters
        ;;
    create)
        if [ -z "$2" ]; then
            echo "用法: $0 create <cluster_name>"
            exit 1
        fi
        echo "=== 创建集群: $2 ==="
        create_cluster "$2"
        ;;
    delete)
        if [ -z "$2" ]; then
            echo "用法: $0 delete <cluster_id>"
            exit 1
        fi
        echo "=== 删除集群: $2 ==="
        delete_cluster "$2"
        ;;
    status)
        if [ -z "$2" ]; then
            echo "用法: $0 status <cluster_id>"
            exit 1
        fi
        echo "=== 集群状态: $2 ==="
        get_cluster_status "$2"
        ;;
    *)
        echo "用法: $0 {list|create|delete|status} [参数]"
        echo "  list                    - 列出所有集群"
        echo "  create <cluster_name>   - 创建新集群"
        echo "  delete <cluster_id>     - 删除集群"
        echo "  status <cluster_id>     - 获取集群状态"
        exit 1
        ;;
esac
EOF

chmod +x cluster-management.sh

echo "\n=== 访问信息 ==="
echo "Rancher URL: https://rancher.local"
echo "用户名: admin"
echo "密码: admin123456"
echo "\n请将 rancher.local 添加到 /etc/hosts 文件中"
echo "集群管理: ./cluster-management.sh list"

echo "\n=== Rancher多集群管理平台部署完成 ==="

13.7.2 Admiral多集群服务网格

#!/bin/bash
# Admiral多集群服务网格部署脚本

echo "=== 部署Admiral多集群服务网格 ==="

# 创建命名空间
kubectl create namespace admiral-system

# 添加Admiral Helm仓库
helm repo add admiral https://istio-ecosystem.github.io/admiral
helm repo update

# 安装Admiral
helm install admiral admiral/admiral \
  --namespace admiral-system \
  --set admiral.image.tag=v1.7.0 \
  --set admiral.config.argoRollouts.enabled=true \
  --set admiral.config.profile=default

# 等待Admiral就绪
kubectl wait --for=condition=available --timeout=300s deployment/admiral -n admiral-system

echo "\n=== 配置多集群 ==="

# 创建集群配置
cat > cluster-config.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: admiral-config
  namespace: admiral-system
data:
  config.yaml: |
    clusters:
      cluster1:
        endpoint: https://cluster1.example.com
        locality: region1/zone1
        network: network1
        secret: cluster1-secret
      cluster2:
        endpoint: https://cluster2.example.com
        locality: region2/zone1
        network: network2
        secret: cluster2-secret
    
    syncNamespace: admiral-sync
    cacheRefreshDuration: 5m
    clusterRegistriesNamespace: admiral-system
    dependencyNamespace: admiral-system
    
    globalTrafficPolicy:
      policy:
      - dns: greeting.global
        match:
        - sourceCluster: cluster1
        - sourceCluster: cluster2
        target:
        - region: region1
          weight: 50
        - region: region2
          weight: 50
EOF

kubectl apply -f cluster-config.yaml

# 创建服务依赖配置
cat > service-dependency.yaml << 'EOF'
apiVersion: admiral.io/v1alpha1
kind: Dependency
metadata:
  name: greeting-dependency
  namespace: admiral-system
spec:
  source: greeting
  destinations:
  - greeting
  - user-service
  - notification-service
EOF

kubectl apply -f service-dependency.yaml

# 创建全局流量策略
cat > global-traffic-policy.yaml << 'EOF'
apiVersion: admiral.io/v1alpha1
kind: GlobalTrafficPolicy
metadata:
  name: greeting-gtp
  namespace: admiral-system
spec:
  policy:
  - dns: greeting.global
    match:
    - sourceCluster: cluster1
    - sourceCluster: cluster2
    target:
    - region: region1
      weight: 70
    - region: region2
      weight: 30
    outlierDetection:
      consecutiveErrors: 3
      interval: 30s
      baseEjectionTime: 30s
EOF

kubectl apply -f global-traffic-policy.yaml

echo "\n=== 多集群监控脚本 ==="

# 创建多集群监控脚本
cat > multi-cluster-monitor.sh << 'EOF'
#!/bin/bash
# Admiral多集群监控脚本

# 检查Admiral状态
check_admiral_status() {
    echo "=== Admiral组件状态 ==="
    kubectl get pods -n admiral-system
    kubectl get svc -n admiral-system
    
    echo "\n=== Admiral配置 ==="
    kubectl get configmap admiral-config -n admiral-system -o yaml
}

# 检查服务依赖
check_dependencies() {
    echo "=== 服务依赖 ==="
    kubectl get dependencies -n admiral-system
    kubectl describe dependencies -n admiral-system
}

# 检查全局流量策略
check_global_traffic_policies() {
    echo "=== 全局流量策略 ==="
    kubectl get globaltrafficpolicies -n admiral-system
    kubectl describe globaltrafficpolicies -n admiral-system
}

# 检查跨集群服务发现
check_service_discovery() {
    echo "=== 跨集群服务发现 ==="
    kubectl get serviceentries -A
    kubectl get destinationrules -A
    kubectl get virtualservices -A
}

# 检查网络连通性
check_network_connectivity() {
    echo "=== 网络连通性检查 ==="
    
    # 检查Istio网关
    kubectl get gateways -A
    
    # 检查服务端点
    kubectl get endpoints -A | grep -E "(greeting|user-service|notification-service)"
    
    # 检查DNS解析
    kubectl run test-dns --image=busybox --rm -it --restart=Never -- nslookup greeting.global
}

# 生成监控报告
generate_report() {
    local report_file="admiral-report-$(date +%Y%m%d-%H%M%S).txt"
    
    echo "=== Admiral多集群监控报告 ===" > $report_file
    echo "生成时间: $(date)" >> $report_file
    echo "" >> $report_file
    
    echo "Admiral组件状态:" >> $report_file
    kubectl get pods -n admiral-system >> $report_file 2>&1
    echo "" >> $report_file
    
    echo "服务依赖:" >> $report_file
    kubectl get dependencies -n admiral-system >> $report_file 2>&1
    echo "" >> $report_file
    
    echo "全局流量策略:" >> $report_file
    kubectl get globaltrafficpolicies -n admiral-system >> $report_file 2>&1
    echo "" >> $report_file
    
    echo "跨集群服务:" >> $report_file
    kubectl get serviceentries -A >> $report_file 2>&1
    echo "" >> $report_file
    
    echo "报告已生成: $report_file"
}

# 主函数
case "$1" in
    status)
        check_admiral_status
        ;;
    dependencies)
        check_dependencies
        ;;
    policies)
        check_global_traffic_policies
        ;;
    discovery)
        check_service_discovery
        ;;
    network)
        check_network_connectivity
        ;;
    report)
        generate_report
        ;;
    all)
        check_admiral_status
        check_dependencies
        check_global_traffic_policies
        check_service_discovery
        check_network_connectivity
        ;;
    *)
        echo "用法: $0 {status|dependencies|policies|discovery|network|report|all}"
        echo "  status       - 检查Admiral组件状态"
        echo "  dependencies - 检查服务依赖"
        echo "  policies     - 检查全局流量策略"
        echo "  discovery    - 检查跨集群服务发现"
        echo "  network      - 检查网络连通性"
        echo "  report       - 生成监控报告"
        echo "  all          - 执行所有检查"
        exit 1
        ;;
esac
EOF

chmod +x multi-cluster-monitor.sh

echo "\n=== 访问信息 ==="
echo "Admiral Dashboard: kubectl port-forward svc/admiral 8080:8080 -n admiral-system"
echo "多集群监控: ./multi-cluster-monitor.sh all"

echo "\n=== Admiral多集群服务网格部署完成 ==="

13.8 成本管理和优化

13.8.1 KubeCost成本分析

#!/bin/bash
# KubeCost成本分析部署脚本

echo "=== 部署KubeCost成本分析 ==="

# 创建命名空间
kubectl create namespace kubecost

# 添加KubeCost Helm仓库
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update

# 安装KubeCost
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --set kubecostToken="your-kubecost-token" \
  --set prometheus.server.persistentVolume.size=10Gi \
  --set prometheus.alertmanager.persistentVolume.size=2Gi

# 等待KubeCost就绪
kubectl wait --for=condition=available --timeout=300s deployment/kubecost-cost-analyzer -n kubecost
kubectl wait --for=condition=available --timeout=300s deployment/kubecost-prometheus-server -n kubecost

echo "\n=== 配置成本分析 ==="

# 创建成本分析配置
cat > cost-analysis-config.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-analysis-config
  namespace: kubecost
data:
  config.yaml: |
    # 云提供商配置
    cloudProvider:
      name: "aws"  # aws, gcp, azure
      region: "us-west-2"
      
    # 定价配置
    pricing:
      cpu: 0.031611  # 每小时每核心价格
      memory: 0.004237  # 每小时每GB价格
      storage: 0.00014  # 每小时每GB价格
      
    # 折扣配置
    discounts:
      cpu: 0.30  # CPU折扣30%
      memory: 0.30  # 内存折扣30%
      storage: 0.10  # 存储折扣10%
      
    # 分配策略
    allocation:
      idleByNode: false
      shareIdle: false
      shareNamespaces: ["kube-system", "kubecost"]
EOF

kubectl apply -f cost-analysis-config.yaml
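
# 粗略估算示例(仅示意, 按每月约730小时计, 未计折扣):
# 一个请求2核CPU与4GiB内存的Pod, 按上面的定价约为
#   (2 x 0.031611 + 4 x 0.004237) x 730 ≈ $58.5 / 月
echo "示例: 2核/4GiB Pod 月成本估算 ≈ \$$(awk 'BEGIN{printf "%.2f", (2*0.031611 + 4*0.004237)*730}')"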

# 创建成本报告脚本
cat > cost-report.sh << 'EOF'
#!/bin/bash
# KubeCost成本报告脚本

KUBECOST_URL="http://localhost:9090"

# 获取成本数据
get_cost_data() {
    local window="$1"  # 时间窗口: 1d, 7d, 30d
    local aggregate="$2"  # 聚合维度: namespace, deployment, service
    
    curl -s "${KUBECOST_URL}/model/allocation" \
        -d "window=${window}" \
        -d "aggregate=${aggregate}" \
        -d "accumulate=false" \
        -d "shareIdle=false"
}

# 生成命名空间成本报告
generate_namespace_report() {
    local window="${1:-7d}"
    
    echo "=== 命名空间成本报告 (${window}) ==="
    echo "时间: $(date)"
    echo ""
    
    local data=$(get_cost_data "$window" "namespace")
    
    echo "$data" | jq -r '
        .data[] | 
        select(.totalCost > 0) | 
        "\(.name): $\(.totalCost | tonumber | . * 100 | round / 100) (CPU: $\(.cpuCost | tonumber | . * 100 | round / 100), Memory: $\(.ramCost | tonumber | . * 100 | round / 100), Storage: $\(.pvCost | tonumber | . * 100 | round / 100))"'
}

# 生成应用成本报告
generate_app_report() {
    local window="${1:-7d}"
    
    echo "=== 应用成本报告 (${window}) ==="
    echo "时间: $(date)"
    echo ""
    
    local data=$(get_cost_data "$window" "deployment")
    
    echo "$data" | jq -r '
        .data[] | 
        select(.totalCost > 0) | 
        "\(.name): $\(.totalCost | tonumber | . * 100 | round / 100) (效率: \(.efficiency | tonumber | . * 100 | round)%)"'
}

# 生成成本优化建议
generate_optimization_suggestions() {
    echo "=== 成本优化建议 ==="
    echo "时间: $(date)"
    echo ""
    
    # 获取资源利用率数据
    local utilization_data=$(curl -s -G "${KUBECOST_URL}/model/allocation" \
        --data-urlencode "window=7d" \
        --data-urlencode "aggregate=deployment" \
        --data-urlencode "accumulate=false")
    
    echo "低效率应用 (CPU利用率 < 50%):"
    echo "$utilization_data" | jq -r '
        .data[] | 
        select(.cpuEfficiency < 0.5 and .totalCost > 1) | 
        "- \(.name): CPU效率 \(.cpuEfficiency | tonumber | . * 100 | round)%, 成本 $\(.totalCost | tonumber | . * 100 | round / 100)"'
    
    echo ""
    echo "内存过度分配应用 (内存利用率 < 30%):"
    echo "$utilization_data" | jq -r '
        .data[] | 
        select(.ramEfficiency < 0.3 and .totalCost > 1) | 
        "- \(.name): 内存效率 \(.ramEfficiency | tonumber | . * 100 | round)%, 成本 $\(.totalCost | tonumber | . * 100 | round / 100)"'
    
    echo ""
    echo "建议操作:"
    echo "1. 调整低效率应用的资源请求和限制"
    echo "2. 考虑使用HPA进行自动扩缩容"
    echo "3. 评估是否可以合并小型应用"
    echo "4. 使用Spot实例降低成本"
}

# 生成完整成本报告
generate_full_report() {
    local window="${1:-7d}"
    local report_file="cost-report-$(date +%Y%m%d-%H%M%S).txt"
    
    {
        generate_namespace_report "$window"
        echo ""
        generate_app_report "$window"
        echo ""
        generate_optimization_suggestions
    } > "$report_file"
    
    echo "完整成本报告已生成: $report_file"
}

# 启动端口转发 (后台进程PID保存在全局变量PF_PID中, 供stop_port_forward使用)
start_port_forward() {
    echo "启动KubeCost端口转发..."
    kubectl port-forward svc/kubecost-cost-analyzer 9090:9090 -n kubecost &
    PF_PID=$!
    echo "端口转发PID: $PF_PID"
    sleep 5
}

# 停止端口转发
stop_port_forward() {
    local pf_pid=$1
    if [ ! -z "$pf_pid" ]; then
        kill $pf_pid 2>/dev/null
        echo "端口转发已停止"
    fi
}

# 主函数
case "$1" in
    namespace)
        start_port_forward
        generate_namespace_report "${2:-7d}"
        stop_port_forward "$PF_PID"
        ;;
    app)
        start_port_forward
        generate_app_report "${2:-7d}"
        stop_port_forward "$PF_PID"
        ;;
    optimize)
        start_port_forward
        generate_optimization_suggestions
        stop_port_forward "$PF_PID"
        ;;
    report)
        start_port_forward
        generate_full_report "${2:-7d}"
        stop_port_forward "$PF_PID"
        ;;
    *)
        echo "用法: $0 {namespace|app|optimize|report} [时间窗口]"
        echo "  namespace [window] - 生成命名空间成本报告"
        echo "  app [window]       - 生成应用成本报告"
        echo "  optimize           - 生成成本优化建议"
        echo "  report [window]    - 生成完整成本报告"
        echo ""
        echo "时间窗口选项: 1d, 7d, 30d (默认: 7d)"
        exit 1
        ;;
esac
EOF

chmod +x cost-report.sh

echo "\n=== 访问信息 ==="
echo "KubeCost Dashboard: kubectl port-forward svc/kubecost-cost-analyzer 9090:9090 -n kubecost"
echo "访问地址: http://localhost:9090"
echo "成本报告: ./cost-report.sh report"

echo "\n=== KubeCost成本分析部署完成 ==="

13.8.2 资源优化脚本

#!/bin/bash
# Kubernetes资源优化脚本

echo "=== Kubernetes资源优化分析 ==="

# 分析未使用的资源
analyze_unused_resources() {
    echo "=== 未使用资源分析 ==="
    
    echo "未使用的ConfigMaps:"
    kubectl get configmaps --all-namespaces -o json | jq -r '
        .items[] | 
        select(.metadata.name != "kube-root-ca.crt") |
        "\(.metadata.namespace)/\(.metadata.name)"' | 
    while read cm; do
        namespace=$(echo $cm | cut -d'/' -f1)
        name=$(echo $cm | cut -d'/' -f2)
        
        # 检查是否被Pod使用
        used=$(kubectl get pods -n $namespace -o json | jq -r --arg cm "$name" '
            .items[] | 
            select(
                (.spec.volumes[]?.configMap.name == $cm) or
                (.spec.containers[]?.env[]?.valueFrom.configMapKeyRef.name == $cm) or
                (.spec.containers[]?.envFrom[]?.configMapRef.name == $cm)
            ) | .metadata.name')
        
        if [ -z "$used" ]; then
            echo "  - $cm (未使用)"
        fi
    done
    
    echo "\n未使用的Secrets:"
    kubectl get secrets --all-namespaces -o json | jq -r '
        .items[] | 
        select(.type != "kubernetes.io/service-account-token") |
        select(.type != "helm.sh/release.v1") |
        select(.metadata.name | startswith("default-token-") | not) |
        "\(.metadata.namespace)/\(.metadata.name)"' | 
    while read secret; do
        namespace=$(echo $secret | cut -d'/' -f1)
        name=$(echo $secret | cut -d'/' -f2)
        
        # 检查是否被Pod使用
        used=$(kubectl get pods -n $namespace -o json | jq -r --arg secret "$name" '
            .items[] | 
            select(
                (.spec.volumes[]?.secret.secretName == $secret) or
                (.spec.containers[]?.env[]?.valueFrom.secretKeyRef.name == $secret) or
                (.spec.containers[]?.envFrom[]?.secretRef.name == $secret) or
                (.spec.imagePullSecrets[]?.name == $secret)
            ) | .metadata.name')
        
        if [ -z "$used" ]; then
            echo "  - $secret (未使用)"
        fi
    done
    
    echo "\n未使用的PersistentVolumes:"
    kubectl get pv -o json | jq -r '
        .items[] | 
        select(.status.phase == "Available") |
        "\(.metadata.name) (\(.spec.capacity.storage))"'
}

# 分析资源请求和限制
analyze_resource_requests() {
    echo "=== 资源请求和限制分析 ==="
    
    echo "没有资源请求的Pod:"
    kubectl get pods --all-namespaces -o json | jq -r '
        .items[] | 
        select(
            .spec.containers[] | 
            (.resources.requests.cpu // .resources.requests.memory) == null
        ) |
        "\(.metadata.namespace)/\(.metadata.name)"'
    
    echo "\n没有资源限制的Pod:"
    kubectl get pods --all-namespaces -o json | jq -r '
        .items[] | 
        select(
            .spec.containers[] | 
            (.resources.limits.cpu // .resources.limits.memory) == null
        ) |
        "\(.metadata.namespace)/\(.metadata.name)"'
    
    echo "\n资源请求过高的Pod (请求 > 限制的80%):"
    kubectl get pods --all-namespaces -o json | jq -r '
        .items[] | 
        .spec.containers[] | 
        select(
            (.resources.requests.cpu and .resources.limits.cpu) and
            ((.resources.requests.cpu | rtrimstr("m") | tonumber) > 
             (.resources.limits.cpu | rtrimstr("m") | tonumber) * 0.8)
        ) |
        "\(.name): CPU请求 \(.resources.requests.cpu), 限制 \(.resources.limits.cpu)"'
}

# 分析节点资源利用率
analyze_node_utilization() {
    echo "=== 节点资源利用率分析 ==="
    
    kubectl top nodes 2>/dev/null || echo "需要安装metrics-server"
    
    echo "\n节点容量和分配:"
    kubectl describe nodes | grep -A 5 "Allocated resources" | 
    grep -E "(Name:|cpu|memory)" | 
    awk '/Name:/ {node=$2} /cpu/ {cpu=$2" "$3} /memory/ {mem=$2" "$3; print node": CPU "cpu", Memory "mem}'
}

# 生成优化建议
generate_optimization_recommendations() {
    echo "=== 优化建议 ==="
    
    echo "1. 资源清理建议:"
    echo "   - 删除未使用的ConfigMaps和Secrets"
    echo "   - 回收未使用的PersistentVolumes"
    echo "   - 清理已完成的Jobs和失败的Pods"
    
    echo "\n2. 资源配置建议:"
    echo "   - 为所有Pod设置资源请求和限制"
    echo "   - 使用VPA (Vertical Pod Autoscaler) 自动调整资源"
    echo "   - 实施HPA (Horizontal Pod Autoscaler) 进行水平扩缩容"
    
    echo "\n3. 成本优化建议:"
    echo "   - 使用Spot实例降低计算成本"
    echo "   - 实施集群自动扩缩容"
    echo "   - 优化镜像大小减少存储和传输成本"
    
    echo "\n4. 性能优化建议:"
    echo "   - 使用节点亲和性优化Pod调度"
    echo "   - 实施资源配额防止资源争用"
    echo "   - 使用PodDisruptionBudgets确保高可用性"
}

# 生成清理脚本
generate_cleanup_script() {
    echo "=== 生成资源清理脚本 ==="
    
    cat > cleanup-resources.sh << 'EOF'
#!/bin/bash
# Kubernetes资源清理脚本

echo "=== 开始资源清理 ==="

# 清理已完成的Jobs
echo "清理已完成的Jobs..."
kubectl get jobs --all-namespaces --field-selector status.successful=1 -o json | \
jq -r '.items[] | "\(.metadata.namespace) \(.metadata.name)"' | \
while read namespace name; do
    echo "删除Job: $namespace/$name"
    kubectl delete job "$name" -n "$namespace"
done

# 清理失败的Pods
echo "\n清理失败的Pods..."
kubectl get pods --all-namespaces --field-selector status.phase=Failed -o json | \
jq -r '.items[] | "\(.metadata.namespace) \(.metadata.name)"' | \
while read namespace name; do
    echo "删除Pod: $namespace/$name"
    kubectl delete pod "$name" -n "$namespace"
done

# 清理已完成的Pods
echo "\n清理已完成的Pods..."
kubectl get pods --all-namespaces --field-selector status.phase=Succeeded -o json | \
jq -r '.items[] | "\(.metadata.namespace) \(.metadata.name)"' | \
while read namespace name; do
    echo "删除Pod: $namespace/$name"
    kubectl delete pod "$name" -n "$namespace"
done

# 清理Evicted Pods
echo "\n清理Evicted Pods..."
kubectl get pods --all-namespaces | grep Evicted | \
awk '{print $1" "$2}' | \
while read namespace name; do
    echo "删除Evicted Pod: $namespace/$name"
    kubectl delete pod "$name" -n "$namespace"
done

echo "\n=== 资源清理完成 ==="
EOF
    
    chmod +x cleanup-resources.sh
    echo "清理脚本已生成: cleanup-resources.sh"
}

# 主函数
case "$1" in
    unused)
        analyze_unused_resources
        ;;
    requests)
        analyze_resource_requests
        ;;
    nodes)
        analyze_node_utilization
        ;;
    recommendations)
        generate_optimization_recommendations
        ;;
    cleanup)
        generate_cleanup_script
        ;;
    all)
        analyze_unused_resources
        echo ""
        analyze_resource_requests
        echo ""
        analyze_node_utilization
        echo ""
        generate_optimization_recommendations
        echo ""
        generate_cleanup_script
        ;;
    *)
        echo "用法: $0 {unused|requests|nodes|recommendations|cleanup|all}"
        echo "  unused          - 分析未使用的资源"
        echo "  requests        - 分析资源请求和限制"
        echo "  nodes           - 分析节点资源利用率"
        echo "  recommendations - 生成优化建议"
        echo "  cleanup         - 生成清理脚本"
        echo "  all             - 执行所有分析"
        exit 1
        ;;
esac
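
上面的优化建议中多次提到使用HPA进行水平扩缩容。下面给出一个简化的HPA配置示意(其中的Deployment名称myapp与利用率阈值均为假设, 需按实际应用调整), HPA依赖metrics-server提供的资源指标:

# HPA示例(示意)
cat > myapp-hpa.yaml << 'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
EOF

kubectl apply -f myapp-hpa.yaml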

13.9 生态系统最佳实践

13.9.1 工具选择指南

# 工具选择决策矩阵
apiVersion: v1
kind: ConfigMap
metadata:
  name: tool-selection-guide
  namespace: kube-system
data:
  selection-matrix.yaml: |
    # Kubernetes生态系统工具选择指南
    
    # 包管理工具
    package_management:
      helm:
        use_cases: ["复杂应用部署", "模板化配置", "版本管理"]
        pros: ["成熟生态", "丰富Chart库", "版本回滚"]
        cons: ["学习曲线", "模板复杂性"]
        recommendation: "推荐用于生产环境"
      
      kustomize:
        use_cases: ["配置管理", "环境差异化", "GitOps"]
        pros: ["原生支持", "声明式", "无模板"]
        cons: ["功能相对简单", "复杂场景限制"]
        recommendation: "推荐用于简单到中等复杂度场景"
    
    # CI/CD工具
    cicd:
      tekton:
        use_cases: ["云原生CI/CD", "Kubernetes集成", "事件驱动"]
        pros: ["Kubernetes原生", "可扩展性", "标准化"]
        cons: ["相对新颖", "学习成本"]
        recommendation: "推荐用于Kubernetes环境"
      
      argocd:
        use_cases: ["GitOps", "持续部署", "多集群管理"]
        pros: ["GitOps最佳实践", "可视化界面", "多集群支持"]
        cons: ["主要专注CD", "Git依赖"]
        recommendation: "推荐用于GitOps实践"
      
      jenkins:
        use_cases: ["传统CI/CD", "复杂流水线", "插件生态"]
        pros: ["成熟稳定", "丰富插件", "灵活配置"]
        cons: ["资源消耗", "维护复杂"]
        recommendation: "适用于传统环境迁移"
    
    # 监控工具
    monitoring:
      prometheus:
        use_cases: ["指标监控", "告警", "时序数据"]
        pros: ["行业标准", "丰富生态", "高性能"]
        cons: ["存储限制", "配置复杂"]
        recommendation: "监控系统首选"
      
      grafana:
        use_cases: ["可视化", "仪表盘", "多数据源"]
        pros: ["强大可视化", "多数据源", "丰富模板"]
        cons: ["主要用于展示"]
        recommendation: "可视化首选"
      
      jaeger:
        use_cases: ["分布式追踪", "性能分析", "调用链"]
        pros: ["OpenTracing标准", "详细追踪", "性能分析"]
        cons: ["存储开销", "复杂度"]
        recommendation: "微服务追踪首选"
    
    # 安全工具
    security:
      falco:
        use_cases: ["运行时安全", "异常检测", "合规监控"]
        pros: ["实时监控", "规则灵活", "CNCF项目"]
        cons: ["性能影响", "规则复杂"]
        recommendation: "运行时安全首选"
      
      opa_gatekeeper:
        use_cases: ["策略管理", "准入控制", "合规检查"]
        pros: ["策略即代码", "灵活规则", "标准化"]
        cons: ["学习曲线", "调试困难"]
        recommendation: "策略管理首选"
    
    # 服务网格
    service_mesh:
      istio:
        use_cases: ["复杂微服务", "高级流量管理", "安全策略"]
        pros: ["功能丰富", "成熟稳定", "强大生态"]
        cons: ["复杂度高", "资源消耗"]
        recommendation: "复杂微服务环境首选"
      
      linkerd:
        use_cases: ["轻量级服务网格", "简单场景", "性能优先"]
        pros: ["轻量级", "易用性", "性能好"]
        cons: ["功能相对简单"]
        recommendation: "简单微服务环境首选"
    
    # 多集群管理
    multi_cluster:
      rancher:
        use_cases: ["多集群管理", "统一界面", "企业级"]
        pros: ["统一管理", "用户友好", "企业功能"]
        cons: ["额外复杂性", "供应商锁定"]
        recommendation: "企业多集群环境首选"
      
      admiral:
        use_cases: ["多集群服务网格", "跨集群通信", "流量管理"]
        pros: ["服务网格集成", "跨集群服务发现"]
        cons: ["相对新颖", "Istio依赖"]
        recommendation: "多集群服务网格场景"

13.9.2 集成最佳实践

#!/bin/bash
# Kubernetes生态系统集成最佳实践脚本

echo "=== Kubernetes生态系统集成最佳实践 ==="

# 创建最佳实践检查清单
create_best_practices_checklist() {
    cat > best-practices-checklist.md << 'EOF'
# Kubernetes生态系统集成最佳实践检查清单

## 1. 工具选择和规划

### 1.1 需求分析
- [ ] 明确业务需求和技术要求
- [ ] 评估团队技能和学习成本
- [ ] 考虑现有基础设施和工具
- [ ] 制定工具演进路线图

### 1.2 工具评估
- [ ] 对比多个候选工具
- [ ] 进行POC验证
- [ ] 评估社区活跃度和支持
- [ ] 考虑长期维护成本

## 2. 架构设计

### 2.1 整体架构
- [ ] 设计清晰的架构图
- [ ] 定义组件间的接口
- [ ] 考虑扩展性和可维护性
- [ ] 规划数据流和控制流

### 2.2 安全设计
- [ ] 实施最小权限原则
- [ ] 配置网络隔离
- [ ] 启用审计日志
- [ ] 实施密钥管理

## 3. 部署和配置

### 3.1 环境管理
- [ ] 使用Infrastructure as Code
- [ ] 实施环境一致性
- [ ] 配置环境隔离
- [ ] 建立环境升级流程

### 3.2 配置管理
- [ ] 使用ConfigMaps和Secrets
- [ ] 实施配置版本控制
- [ ] 配置环境差异化
- [ ] 建立配置审核流程

## 4. 监控和可观测性

### 4.1 监控策略
- [ ] 定义关键指标
- [ ] 设置合理告警
- [ ] 实施分层监控
- [ ] 建立监控仪表盘

### 4.2 日志管理
- [ ] 统一日志格式
- [ ] 集中日志收集
- [ ] 实施日志分析
- [ ] 配置日志保留策略

### 4.3 追踪和调试
- [ ] 实施分布式追踪
- [ ] 配置性能监控
- [ ] 建立调试工具链
- [ ] 实施错误追踪

## 5. 安全和合规

### 5.1 安全策略
- [ ] 实施Pod安全策略
- [ ] 配置网络策略
- [ ] 启用RBAC
- [ ] 实施镜像安全扫描

### 5.2 合规管理
- [ ] 定义合规要求
- [ ] 实施策略即代码
- [ ] 配置合规检查
- [ ] 建立审计流程

## 6. 运维和维护

### 6.1 自动化运维
- [ ] 实施GitOps
- [ ] 配置自动化部署
- [ ] 建立自动化测试
- [ ] 实施自动化回滚

### 6.2 容量管理
- [ ] 监控资源使用
- [ ] 实施自动扩缩容
- [ ] 配置资源配额
- [ ] 建立容量规划

### 6.3 故障处理
- [ ] 建立故障响应流程
- [ ] 配置自动故障恢复
- [ ] 实施混沌工程
- [ ] 建立事后分析机制

## 7. 团队和流程

### 7.1 团队建设
- [ ] 培训团队技能
- [ ] 建立知识分享
- [ ] 定义角色职责
- [ ] 实施轮岗机制

### 7.2 流程优化
- [ ] 建立开发流程
- [ ] 实施代码审查
- [ ] 配置质量门禁
- [ ] 建立发布流程

## 8. 成本优化

### 8.1 成本监控
- [ ] 实施成本分析
- [ ] 配置成本告警
- [ ] 建立成本报告
- [ ] 实施成本归因

### 8.2 资源优化
- [ ] 优化资源配置
- [ ] 实施资源回收
- [ ] 使用Spot实例
- [ ] 配置集群自动扩缩容
EOF

    echo "最佳实践检查清单已创建: best-practices-checklist.md"
}

# 创建集成验证脚本
create_integration_validation() {
    cat > validate-integration.sh << 'EOF'
#!/bin/bash
# Kubernetes生态系统集成验证脚本

echo "=== 开始集成验证 ==="

# 验证基础组件
validate_basic_components() {
    echo "=== 验证基础组件 ==="
    
    # 检查Kubernetes集群
    echo "检查Kubernetes集群状态:"
    kubectl cluster-info
    kubectl get nodes
    
    # 检查系统Pod
    echo "\n检查系统Pod状态:"
    kubectl get pods -n kube-system
    
    # 检查存储类
    echo "\n检查存储类:"
    kubectl get storageclass
}

# 验证监控组件
validate_monitoring() {
    echo "=== 验证监控组件 ==="
    
    # 检查Prometheus
    if kubectl get namespace monitoring &>/dev/null; then
        echo "检查Prometheus:"
        kubectl get pods -n monitoring | grep prometheus
        
        echo "\n检查Grafana:"
        kubectl get pods -n monitoring | grep grafana
        
        echo "\n检查AlertManager:"
        kubectl get pods -n monitoring | grep alertmanager
    else
        echo "监控命名空间不存在"
    fi
}

# 验证日志组件
validate_logging() {
    echo "=== 验证日志组件 ==="
    
    # 检查日志收集
    if kubectl get namespace logging &>/dev/null; then
        echo "检查日志收集组件:"
        kubectl get pods -n logging
    else
        echo "日志命名空间不存在"
    fi
}

# 验证安全组件
validate_security() {
    echo "=== 验证安全组件 ==="
    
    # 检查RBAC
    echo "检查RBAC配置:"
    kubectl get clusterroles | head -10
    kubectl get clusterrolebindings | head -10
    
    # 检查网络策略
    echo "\n检查网络策略:"
    kubectl get networkpolicies --all-namespaces
    
    # 检查Pod安全策略 (PodSecurityPolicy已在Kubernetes 1.25中移除, 新版本使用Pod Security Admission)
    echo "\n检查Pod安全策略/准入:"
    kubectl get podsecuritypolicies 2>/dev/null || echo "PSP未启用或已移除"
    kubectl get namespaces -L pod-security.kubernetes.io/enforce
}

# 验证CI/CD组件
validate_cicd() {
    echo "=== 验证CI/CD组件 ==="
    
    # 检查ArgoCD
    if kubectl get namespace argocd &>/dev/null; then
        echo "检查ArgoCD:"
        kubectl get pods -n argocd
    fi
    
    # 检查Tekton
    if kubectl get namespace tekton-pipelines &>/dev/null; then
        echo "\n检查Tekton:"
        kubectl get pods -n tekton-pipelines
    fi
}

# 验证服务网格
validate_service_mesh() {
    echo "=== 验证服务网格 ==="
    
    # 检查Istio
    if kubectl get namespace istio-system &>/dev/null; then
        echo "检查Istio:"
        kubectl get pods -n istio-system
        
        echo "\n检查Istio配置:"
        kubectl get gateways --all-namespaces
        kubectl get virtualservices --all-namespaces
    fi
    
    # 检查Linkerd
    if kubectl get namespace linkerd &>/dev/null; then
        echo "\n检查Linkerd:"
        kubectl get pods -n linkerd
    fi
}

# 生成验证报告
generate_validation_report() {
    local report_file="integration-validation-$(date +%Y%m%d-%H%M%S).txt"
    
    {
        echo "=== Kubernetes生态系统集成验证报告 ==="
        echo "生成时间: $(date)"
        echo ""
        
        validate_basic_components
        echo ""
        validate_monitoring
        echo ""
        validate_logging
        echo ""
        validate_security
        echo ""
        validate_cicd
        echo ""
        validate_service_mesh
    } > "$report_file"
    
    echo "验证报告已生成: $report_file"
}

# 主函数
case "$1" in
    basic)
        validate_basic_components
        ;;
    monitoring)
        validate_monitoring
        ;;
    logging)
        validate_logging
        ;;
    security)
        validate_security
        ;;
    cicd)
        validate_cicd
        ;;
    mesh)
        validate_service_mesh
        ;;
    report)
        generate_validation_report
        ;;
    all)
        validate_basic_components
        validate_monitoring
        validate_logging
        validate_security
        validate_cicd
        validate_service_mesh
        ;;
    *)
        echo "用法: $0 {basic|monitoring|logging|security|cicd|mesh|report|all}"
        exit 1
        ;;
esac
EOF
    
    chmod +x validate-integration.sh
    echo "集成验证脚本已创建: validate-integration.sh"
}

# 主函数
case "$1" in
    checklist)
        create_best_practices_checklist
        ;;
    validation)
        create_integration_validation
        ;;
    all)
        create_best_practices_checklist
        create_integration_validation
        ;;
    *)
        echo "用法: $0 {checklist|validation|all}"
        echo "  checklist   - 创建最佳实践检查清单"
        echo "  validation  - 创建集成验证脚本"
        echo "  all         - 创建所有文档和脚本"
        exit 1
        ;;
esac

13.10 总结

Kubernetes生态系统是一个庞大而丰富的技术栈,涵盖了从应用开发到生产运维的各个环节。通过本章的学习,我们深入了解了:

13.10.1 核心工具类别

  1. 包管理工具:Helm和Kustomize为应用部署和配置管理提供了强大支持
  2. CI/CD工具:Tekton和ArgoCD实现了云原生的持续集成和部署
  3. 监控可观测性:Prometheus、Grafana、Jaeger构建了完整的监控体系
  4. 安全工具:Falco和OPA Gatekeeper提供了运行时安全和策略管理
  5. 服务网格:Istio和Linkerd为微服务提供了流量管理和安全保障
  6. 多集群管理:Rancher和Admiral支持大规模集群管理和跨集群服务
  7. 成本管理:KubeCost等工具帮助优化资源使用和成本控制

13.10.2 选择和集成原则

  1. 需求驱动:根据实际业务需求选择合适的工具
  2. 渐进式采用:从简单工具开始,逐步引入复杂功能
  3. 标准化优先:选择符合CNCF标准的工具
  4. 社区活跃:优先选择社区活跃、文档完善的项目
  5. 集成友好:考虑工具间的集成复杂度和兼容性

13.10.3 最佳实践要点

  1. 统一管理:使用GitOps实现配置和部署的版本控制(见下方示意)
  2. 安全第一:在每个环节都要考虑安全性
  3. 可观测性:建立完整的监控、日志和追踪体系
  4. 自动化:尽可能自动化运维操作
  5. 成本意识:持续监控和优化资源使用
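
以GitOps为例, 一个最小化的ArgoCD Application示意如下(仓库地址、路径与目标命名空间均为假设), 它让ArgoCD持续将Git仓库中的清单同步到集群:

cat > myapp-application.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/myapp
    targetRevision: main
    path: deploy/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
EOF

kubectl apply -f myapp-application.yaml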

13.10.4 发展趋势

  1. 云原生化:更多工具原生支持Kubernetes
  2. AI/ML集成:智能化运维和自动优化
  3. 边缘计算:支持边缘和混合云场景
  4. 安全增强:零信任和供应链安全
  5. 可持续发展:绿色计算和碳中和

Kubernetes生态系统将继续快速发展,掌握这些核心工具和最佳实践,将帮助我们构建更加稳定、安全、高效的云原生应用平台。


下一章预告:第14章将学习Kubernetes的未来发展趋势和新兴技术,包括边缘计算、AI/ML工作负载、WebAssembly集成等前沿话题。