Overview

Deployment and operations are critical to the successful adoption of a microservices architecture. This chapter takes a deep dive into deployment strategies, CI/CD pipeline design, containerization, service mesh operations, troubleshooting, and other core topics.

Deployment and Operations Challenges

  1. Complexity management: coordinating deployments across many services
  2. Environment consistency: keeping development, testing, and production aligned
  3. Version management: compatibility between service versions
  4. Fault isolation: locating and resolving problems quickly
  5. Performance monitoring: real-time monitoring and optimization
  6. Security: securing both the deployment process and the runtime

Operations Architecture Overview

# Microservices operations architecture
apiVersion: v1
kind: ConfigMap
metadata:
  name: devops-architecture
  namespace: devops
data:
  architecture.yml: |
    # Source control
    source_control:
      git:
        repositories:
          - name: user-service
            url: https://github.com/company/user-service
          - name: order-service
            url: https://github.com/company/order-service
          - name: payment-service
            url: https://github.com/company/payment-service
        branching_strategy: GitFlow
        
    # CI/CD pipeline
    cicd_pipeline:
      tools:
        ci: Jenkins/GitLab CI/GitHub Actions
        cd: ArgoCD/Flux
        registry: Harbor/Docker Hub
        scanning: Trivy/Clair
      stages:
        - source_checkout
        - unit_tests
        - integration_tests
        - security_scan
        - build_image
        - push_registry
        - deploy_staging
        - e2e_tests
        - deploy_production
        
    # Container platform
    containerization:
      runtime: Docker/Containerd
      orchestration: Kubernetes
      service_mesh: Istio/Linkerd
      ingress: Nginx/Traefik
      
    # Deployment strategies
    deployment_strategies:
      - blue_green
      - canary
      - rolling_update
      - recreate
      
    # Monitoring and operations
    operations:
      monitoring:
        metrics: Prometheus
        logging: ELK Stack
        tracing: Jaeger
        alerting: Alertmanager
      backup:
        databases: Velero
        configurations: Git
      disaster_recovery:
        rpo: 1h
        rto: 30min

Containerized Deployment Strategies

Docker Containerization

Multi-Stage Build Dockerfile

# User service Dockerfile
# Build stage
FROM golang:1.21-alpine AS builder

# Set the working directory
WORKDIR /app

# Install build dependencies
RUN apk add --no-cache git ca-certificates tzdata

# Copy go mod files
COPY go.mod go.sum ./
RUN go mod download

# Copy the source code
COPY . .

# Build the application
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o main ./cmd/server

# Runtime stage
FROM alpine:latest

# Install CA certificates
RUN apk --no-cache add ca-certificates

# Create a non-root user
RUN addgroup -g 1001 appgroup && \
    adduser -D -s /bin/sh -u 1001 -G appgroup appuser

# Set the working directory (an app-owned path, not root's home)
WORKDIR /app

# Copy the binary and configs from the build stage
COPY --from=builder /app/main .
COPY --from=builder /app/configs ./configs

# Set file ownership
RUN chown -R appuser:appgroup /app

# Switch to the non-root user
USER appuser

# Expose the service port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:8080/health || exit 1

# Start the application
CMD ["./main"]

Docker Compose Development Environment

# docker-compose.yml
version: '3.8'

services:
  # User service
  user-service:
    build:
      context: ./user-service
      dockerfile: Dockerfile
    ports:
      - "8081:8080"
    environment:
      - DB_HOST=postgres
      - DB_PORT=5432
      - DB_NAME=userdb
      - DB_USER=postgres
      - DB_PASSWORD=password
      - REDIS_HOST=redis
      - REDIS_PORT=6379
    depends_on:
      - postgres
      - redis
    networks:
      - microservices
    volumes:
      - ./user-service/configs:/app/configs
    restart: unless-stopped
    
  # Order service
  order-service:
    build:
      context: ./order-service
      dockerfile: Dockerfile
    ports:
      - "8082:8080"
    environment:
      - DB_HOST=postgres
      - DB_PORT=5432
      - DB_NAME=orderdb
      - DB_USER=postgres
      - DB_PASSWORD=password
      - USER_SERVICE_URL=http://user-service:8080
      - PAYMENT_SERVICE_URL=http://payment-service:8080
    depends_on:
      - postgres
      - user-service
    networks:
      - microservices
    restart: unless-stopped
    
  # Payment service
  payment-service:
    build:
      context: ./payment-service
      dockerfile: Dockerfile
    ports:
      - "8083:8080"
    environment:
      - DB_HOST=postgres
      - DB_PORT=5432
      - DB_NAME=paymentdb
      - DB_USER=postgres
      - DB_PASSWORD=password
    depends_on:
      - postgres
    networks:
      - microservices
    restart: unless-stopped
    
  # PostgreSQL database
  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=password
      - POSTGRES_MULTIPLE_DATABASES=userdb,orderdb,paymentdb
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./scripts/init-databases.sh:/docker-entrypoint-initdb.d/init-databases.sh
    networks:
      - microservices
    restart: unless-stopped
    
  # Redis cache
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    networks:
      - microservices
    restart: unless-stopped
    command: redis-server --appendonly yes
    
  # API gateway
  api-gateway:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
      - ./nginx/conf.d:/etc/nginx/conf.d
    depends_on:
      - user-service
      - order-service
      - payment-service
    networks:
      - microservices
    restart: unless-stopped

volumes:
  postgres_data:
  redis_data:

networks:
  microservices:
    driver: bridge

Kubernetes Deployment Configuration

User Service Deployment

# user-service-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  namespace: microservices
  labels:
    app: user-service
    version: v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
      version: v1
  template:
    metadata:
      labels:
        app: user-service
        version: v1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: user-service
      containers:
      - name: user-service
        image: harbor.company.com/microservices/user-service:v1.0.0
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          name: http
        - containerPort: 9090
          name: grpc
        env:
        - name: DB_HOST
          valueFrom:
            secretKeyRef:
              name: database-secret
              key: host
        - name: DB_PORT
          valueFrom:
            secretKeyRef:
              name: database-secret
              key: port
        - name: DB_NAME
          valueFrom:
            secretKeyRef:
              name: database-secret
              key: database
        - name: DB_USER
          valueFrom:
            secretKeyRef:
              name: database-secret
              key: username
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: database-secret
              key: password
        - name: REDIS_HOST
          valueFrom:
            configMapKeyRef:
              name: redis-config
              key: host
        - name: REDIS_PORT
          valueFrom:
            configMapKeyRef:
              name: redis-config
              key: port
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        volumeMounts:
        - name: config
          mountPath: /app/configs
          readOnly: true
        - name: logs
          mountPath: /app/logs
        securityContext:
          runAsNonRoot: true
          runAsUser: 1001
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
      volumes:
      - name: config
        configMap:
          name: user-service-config
      - name: logs
        emptyDir: {}
      imagePullSecrets:
      - name: harbor-secret
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - user-service
              topologyKey: kubernetes.io/hostname
---
apiVersion: v1
kind: Service
metadata:
  name: user-service
  namespace: microservices
  labels:
    app: user-service
spec:
  selector:
    app: user-service
  ports:
  - name: http
    port: 80
    targetPort: 8080
    protocol: TCP
  - name: grpc
    port: 9090
    targetPort: 9090
    protocol: TCP
  type: ClusterIP
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: user-service
  namespace: microservices
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-service-config
  namespace: microservices
data:
  app.yaml: |
    server:
      port: 8080
      grpc_port: 9090
      read_timeout: 30s
      write_timeout: 30s
      idle_timeout: 60s
    
    logging:
      level: info
      format: json
      output: stdout
    
    metrics:
      enabled: true
      path: /metrics
    
    tracing:
      enabled: true
      jaeger_endpoint: http://jaeger-collector:14268/api/traces
      sample_rate: 0.1
---
apiVersion: v1
kind: Secret
metadata:
  name: database-secret
  namespace: microservices
type: Opaque
data:
  host: cG9zdGdyZXNxbC5kYXRhYmFzZS5zdmMuY2x1c3Rlci5sb2NhbA== # postgresql.database.svc.cluster.local
  port: NTQzMg== # 5432
  database: dXNlcmRi # userdb
  username: cG9zdGdyZXM= # postgres
  password: cGFzc3dvcmQ= # password

HorizontalPodAutoscaler Configuration

# user-service-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: user-service-hpa
  namespace: microservices
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
      - type: Pods
        value: 2
        periodSeconds: 60
      selectPolicy: Max

CI/CD Pipeline Design

Jenkins Pipeline

// Jenkinsfile
pipeline {
    agent any
    
    environment {
        DOCKER_REGISTRY = 'harbor.company.com'
        DOCKER_REPO = 'microservices'
        IMAGE_NAME = 'user-service'
        KUBECONFIG = credentials('kubeconfig')
        HARBOR_CREDENTIALS = credentials('harbor-credentials')
        SONAR_TOKEN = credentials('sonar-token')
    }
    
    stages {
        stage('Checkout') {
            steps {
                checkout scm
                script {
                    env.GIT_COMMIT_SHORT = sh(
                        script: "git rev-parse --short HEAD",
                        returnStdout: true
                    ).trim()
                    env.BUILD_VERSION = "${env.BUILD_NUMBER}-${env.GIT_COMMIT_SHORT}"
                }
            }
        }
        
        stage('Unit Tests') {
            steps {
                sh '''
                    go mod download
                    go test -v -race -coverprofile=coverage.out ./...
                    go tool cover -html=coverage.out -o coverage.html
                '''
            }
            post {
                always {
                    publishHTML([
                        allowMissing: false,
                        alwaysLinkToLastBuild: true,
                        keepAll: true,
                        reportDir: '.',
                        reportFiles: 'coverage.html',
                        reportName: 'Coverage Report'
                    ])
                }
            }
        }
        
        stage('Code Quality') {
            steps {
                sh '''
                    # Static code analysis
                    golangci-lint run --out-format checkstyle > golangci-lint-report.xml || true
                    
                    # SonarQube analysis
                    sonar-scanner \
                        -Dsonar.projectKey=${IMAGE_NAME} \
                        -Dsonar.sources=. \
                        -Dsonar.host.url=http://sonarqube:9000 \
                        -Dsonar.login=${SONAR_TOKEN}
                '''
            }
            post {
                always {
                    recordIssues(
                        enabledForFailure: true,
                        tools: [checkStyle(pattern: 'golangci-lint-report.xml')]
                    )
                }
            }
        }
        
        stage('Security Scan') {
            steps {
                sh '''
                    # Dependency vulnerability scan
                    nancy sleuth
                    
                    # Code security scan
                    gosec -fmt json -out gosec-report.json ./...
                '''
            }
        }
        
        stage('Build Image') {
            steps {
                script {
                    def image = docker.build("${DOCKER_REGISTRY}/${DOCKER_REPO}/${IMAGE_NAME}:${BUILD_VERSION}")
                    
                    // Image security scan
                    sh '''
                        trivy image --format json --output trivy-report.json \
                            ${DOCKER_REGISTRY}/${DOCKER_REPO}/${IMAGE_NAME}:${BUILD_VERSION}
                    '''
                    
                    // Push the image
                    docker.withRegistry("https://${DOCKER_REGISTRY}", 'harbor-credentials') {
                        image.push()
                        image.push('latest')
                    }
                }
            }
        }
        
        stage('Integration Tests') {
            steps {
                sh '''
                    # Start the test environment
                    docker-compose -f docker-compose.test.yml up -d
                    
                    # Wait for services to start
                    sleep 30
                    
                    # Run integration tests
                    go test -v -tags=integration ./tests/integration/...
                '''
            }
            post {
                always {
                    sh 'docker-compose -f docker-compose.test.yml down'
                }
            }
        }
        
        stage('Deploy to Staging') {
            when {
                branch 'develop'
            }
            steps {
                sh '''
                    # Update the Kubernetes deployment
                    sed -i "s|image: .*|image: ${DOCKER_REGISTRY}/${DOCKER_REPO}/${IMAGE_NAME}:${BUILD_VERSION}|" \
                        k8s/staging/deployment.yaml
                    
                    kubectl apply -f k8s/staging/ --namespace=staging
                    
                    # Wait for the rollout to complete
                    kubectl rollout status deployment/${IMAGE_NAME} --namespace=staging --timeout=300s
                '''
            }
        }
        
        stage('E2E Tests') {
            when {
                branch 'develop'
            }
            steps {
                sh '''
                    # Run end-to-end tests
                    npm install
                    npm run test:e2e -- --env staging
                '''
            }
            post {
                always {
                    junit 'test-results.xml'
                }
            }
        }
        
        stage('Deploy to Production') {
            when {
                branch 'main'
            }
            steps {
                script {
                    // Manual approval
                    input message: 'Deploy to production?', ok: 'Deploy',
                          submitterParameter: 'DEPLOYER'
                    
                    sh '''
                        # Blue-green deployment
                        ./scripts/blue-green-deploy.sh \
                            ${DOCKER_REGISTRY}/${DOCKER_REPO}/${IMAGE_NAME}:${BUILD_VERSION}
                    '''
                }
            }
        }
    }
    
    post {
        always {
            // Clean up the workspace
            cleanWs()
        }
        success {
            // Send success notification
            slackSend(
                channel: '#deployments',
                color: 'good',
                message: "✅ ${IMAGE_NAME} ${BUILD_VERSION} deployed successfully by ${env.DEPLOYER ?: 'Jenkins'}"
            )
        }
        failure {
            // Send failure notification
            slackSend(
                channel: '#deployments',
                color: 'danger',
                message: "❌ ${IMAGE_NAME} ${BUILD_VERSION} deployment failed"
            )
        }
    }
}

GitLab CI Configuration

# .gitlab-ci.yml
stages:
  - test
  - build
  - security
  - deploy-staging
  - deploy-production

variables:
  DOCKER_REGISTRY: harbor.company.com
  DOCKER_REPO: microservices
  IMAGE_NAME: user-service
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"

before_script:
  - export BUILD_VERSION="${CI_PIPELINE_ID}-${CI_COMMIT_SHORT_SHA}"

# Unit tests
unit-test:
  stage: test
  image: golang:1.21
  services:
    - postgres:15
    - redis:7
  variables:
    POSTGRES_DB: testdb
    POSTGRES_USER: postgres
    POSTGRES_PASSWORD: password
    DB_HOST: postgres
    REDIS_HOST: redis
  script:
    - go mod download
    - go test -v -race -coverprofile=coverage.out ./...
    - go tool cover -func=coverage.out
  coverage: '/total:.*?(\d+\.\d+)%/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml
    paths:
      - coverage.out
      - coverage.html
    expire_in: 1 week

# Code quality checks
code-quality:
  stage: test
  image: golangci/golangci-lint:latest
  script:
    - golangci-lint run --out-format code-climate > gl-code-quality-report.json
  artifacts:
    reports:
      codequality: gl-code-quality-report.json
    expire_in: 1 week
  allow_failure: true

# Build the Docker image
build-image:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  before_script:
    - echo $HARBOR_PASSWORD | docker login $DOCKER_REGISTRY -u $HARBOR_USERNAME --password-stdin
  script:
    - docker build -t $DOCKER_REGISTRY/$DOCKER_REPO/$IMAGE_NAME:$BUILD_VERSION .
    - docker push $DOCKER_REGISTRY/$DOCKER_REPO/$IMAGE_NAME:$BUILD_VERSION
    - docker tag $DOCKER_REGISTRY/$DOCKER_REPO/$IMAGE_NAME:$BUILD_VERSION $DOCKER_REGISTRY/$DOCKER_REPO/$IMAGE_NAME:latest
    - docker push $DOCKER_REGISTRY/$DOCKER_REPO/$IMAGE_NAME:latest
  only:
    - develop
    - main

# Security scan
security-scan:
  stage: security
  image: aquasec/trivy:latest
  script:
    - trivy image --format template --template "@contrib/gitlab.tpl" --output gl-container-scanning-report.json $DOCKER_REGISTRY/$DOCKER_REPO/$IMAGE_NAME:$BUILD_VERSION
  artifacts:
    reports:
      container_scanning: gl-container-scanning-report.json
    expire_in: 1 week
  dependencies:
    - build-image
  only:
    - develop
    - main

# Deploy to staging
deploy-staging:
  stage: deploy-staging
  image: bitnami/kubectl:latest
  environment:
    name: staging
    url: https://staging.company.com
  script:
    - kubectl config use-context staging
    - sed -i "s|image: .*|image: $DOCKER_REGISTRY/$DOCKER_REPO/$IMAGE_NAME:$BUILD_VERSION|" k8s/staging/deployment.yaml
    - kubectl apply -f k8s/staging/ --namespace=staging
    - kubectl rollout status deployment/$IMAGE_NAME --namespace=staging --timeout=300s
  dependencies:
    - build-image
  only:
    - develop

# Deploy to production
deploy-production:
  stage: deploy-production
  image: bitnami/kubectl:latest
  environment:
    name: production
    url: https://api.company.com
  script:
    - kubectl config use-context production
    - ./scripts/canary-deploy.sh $DOCKER_REGISTRY/$DOCKER_REPO/$IMAGE_NAME:$BUILD_VERSION
  when: manual
  dependencies:
    - build-image
  only:
    - main

ArgoCD GitOps Configuration

# argocd-application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: user-service
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: microservices
  source:
    repoURL: https://github.com/company/microservices-manifests
    targetRevision: HEAD
    path: user-service
    helm:
      valueFiles:
        - values.yaml
        - values-production.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: microservices
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - PruneLast=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
  revisionHistoryLimit: 10
---
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: microservices
  namespace: argocd
spec:
  description: Microservices project
  sourceRepos:
    - 'https://github.com/company/*'
  destinations:
    - namespace: microservices
      server: https://kubernetes.default.svc
    - namespace: staging
      server: https://kubernetes.default.svc
  clusterResourceWhitelist:
    - group: ''
      kind: Namespace
  namespaceResourceWhitelist:
    - group: ''
      kind: ConfigMap
    - group: ''
      kind: Secret
    - group: ''
      kind: Service
    - group: apps
      kind: Deployment
    - group: apps
      kind: ReplicaSet
    - group: autoscaling
      kind: HorizontalPodAutoscaler
  roles:
    - name: admin
      description: Admin access
      policies:
        - p, proj:microservices:admin, applications, *, microservices/*, allow
      groups:
        - company:devops
    - name: developer
      description: Developer access
      policies:
        - p, proj:microservices:developer, applications, get, microservices/*, allow
        - p, proj:microservices:developer, applications, sync, microservices/*, allow
      groups:
        - company:developers

Blue-Green Deployment and Canary Release

Blue-Green Deployment Strategy

Blue-green deployment is a zero-downtime strategy: two identical production environments (blue and green) are maintained so traffic can be switched to the new version instantly and rolled back just as quickly.

Blue-Green Deployment Script

#!/bin/bash
# blue-green-deploy.sh

set -e

# Configuration
NAMESPACE="microservices"
SERVICE_NAME="user-service"
NEW_IMAGE="$1"
TIMEOUT="300s"

# Color definitions
RED='\033[0;31m'
GREEN='\033[0;32m'
BLUE='\033[0;34m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

echo -e "${BLUE}Starting blue-green deployment: ${SERVICE_NAME}${NC}"
echo -e "${BLUE}New image: ${NEW_IMAGE}${NC}"

# Determine the currently active environment
ACTIVE_ENV=$(kubectl get service ${SERVICE_NAME} -n ${NAMESPACE} -o jsonpath='{.spec.selector.environment}' 2>/dev/null || echo "blue")
echo -e "${YELLOW}Currently active environment: ${ACTIVE_ENV}${NC}"

# Determine the target environment
if [ "$ACTIVE_ENV" = "blue" ]; then
    TARGET_ENV="green"
    INACTIVE_ENV="blue"
else
    TARGET_ENV="blue"
    INACTIVE_ENV="green"
fi

echo -e "${YELLOW}Target environment: ${TARGET_ENV}${NC}"

# Deploy to the target environment
echo -e "${BLUE}Deploying to the ${TARGET_ENV} environment...${NC}"
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${SERVICE_NAME}-${TARGET_ENV}
  namespace: ${NAMESPACE}
  labels:
    app: ${SERVICE_NAME}
    environment: ${TARGET_ENV}
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ${SERVICE_NAME}
      environment: ${TARGET_ENV}
  template:
    metadata:
      labels:
        app: ${SERVICE_NAME}
        environment: ${TARGET_ENV}
      annotations:
        deployment.kubernetes.io/revision: "$(date +%s)"
    spec:
      containers:
      - name: ${SERVICE_NAME}
        image: ${NEW_IMAGE}
        ports:
        - containerPort: 8080
        env:
        - name: ENVIRONMENT
          value: ${TARGET_ENV}
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
EOF

# Wait for the rollout to complete
echo -e "${BLUE}Waiting for the ${TARGET_ENV} rollout to complete...${NC}"
kubectl rollout status deployment/${SERVICE_NAME}-${TARGET_ENV} -n ${NAMESPACE} --timeout=${TIMEOUT}

# Health check
echo -e "${BLUE}Running health checks...${NC}"
for i in {1..30}; do
    if kubectl get pods -n ${NAMESPACE} -l app=${SERVICE_NAME},environment=${TARGET_ENV} --field-selector=status.phase=Running | grep -q Running; then
        echo -e "${GREEN}Health check passed${NC}"
        break
    fi
    echo "Waiting for pods to start... ($i/30)"
    sleep 10
done

# Smoke test
echo -e "${BLUE}Running smoke test...${NC}"
TEST_POD=$(kubectl get pods -n ${NAMESPACE} -l app=${SERVICE_NAME},environment=${TARGET_ENV} -o jsonpath='{.items[0].metadata.name}')
TEST_RESULT=$(kubectl exec -n ${NAMESPACE} ${TEST_POD} -- wget -qO- http://localhost:8080/health)

if [[ $TEST_RESULT == *"healthy"* ]]; then
    echo -e "${GREEN}Smoke test passed${NC}"
else
    echo -e "${RED}Smoke test failed, rolling back${NC}"
    kubectl delete deployment ${SERVICE_NAME}-${TARGET_ENV} -n ${NAMESPACE}
    exit 1
fi

# Switch traffic
echo -e "${BLUE}Switching traffic to the ${TARGET_ENV} environment...${NC}"
kubectl patch service ${SERVICE_NAME} -n ${NAMESPACE} -p '{"spec":{"selector":{"environment":"'${TARGET_ENV}'"}}}'

# Verify the switch
echo -e "${BLUE}Verifying traffic switch...${NC}"
sleep 10
for i in {1..5}; do
    RESPONSE=$(curl -s http://${SERVICE_NAME}.${NAMESPACE}.svc.cluster.local/health || echo "failed")
    if [[ $RESPONSE == *"healthy"* ]]; then
        echo -e "${GREEN}Traffic switch verified ($i/5)${NC}"
    else
        echo -e "${RED}Traffic switch verification failed ($i/5)${NC}"
        if [ $i -eq 5 ]; then
            echo -e "${RED}Rolling traffic back to the ${ACTIVE_ENV} environment${NC}"
            kubectl patch service ${SERVICE_NAME} -n ${NAMESPACE} -p '{"spec":{"selector":{"environment":"'${ACTIVE_ENV}'"}}}'
            exit 1
        fi
    fi
    sleep 5
done

# Clean up the old environment
echo -e "${BLUE}Cleaning up the old ${INACTIVE_ENV} environment...${NC}"
kubectl delete deployment ${SERVICE_NAME}-${INACTIVE_ENV} -n ${NAMESPACE} --ignore-not-found=true

echo -e "${GREEN}Blue-green deployment complete!${NC}"
echo -e "${GREEN}Currently active environment: ${TARGET_ENV}${NC}"
echo -e "${GREEN}New image: ${NEW_IMAGE}${NC}"

Canary Release Strategy

A canary release reduces deployment risk by gradually increasing the share of traffic routed to the new version.

Istio Canary Release Configuration

# canary-virtualservice.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: user-service
  namespace: microservices
spec:
  hosts:
  - user-service
  http:
  - match:
    - headers:
        canary:
          exact: "true"
    route:
    - destination:
        host: user-service
        subset: v2
      weight: 100
  - route:
    - destination:
        host: user-service
        subset: v1
      weight: 90
    - destination:
        host: user-service
        subset: v2
      weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: user-service
  namespace: microservices
spec:
  host: user-service
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2

Canary Release Script

#!/bin/bash
# canary-deploy.sh

set -e

# Configuration
NAMESPACE="microservices"
SERVICE_NAME="user-service"
NEW_IMAGE="$1"
CANARY_STEPS=(5 10 25 50 75 100)
STEP_DURATION="300" # 5 minutes per step
ERROR_THRESHOLD="5" # error-rate threshold: 5%

echo "Starting canary release: ${SERVICE_NAME}"
echo "New image: ${NEW_IMAGE}"

# Deploy the canary version
echo "Deploying the canary version..."
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${SERVICE_NAME}-v2
  namespace: ${NAMESPACE}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ${SERVICE_NAME}
      version: v2
  template:
    metadata:
      labels:
        app: ${SERVICE_NAME}
        version: v2
    spec:
      containers:
      - name: ${SERVICE_NAME}
        image: ${NEW_IMAGE}
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
EOF

# Wait for the canary to become ready
kubectl rollout status deployment/${SERVICE_NAME}-v2 -n ${NAMESPACE}

# Gradually shift traffic
for step in "${CANARY_STEPS[@]}"; do
    echo "Setting canary traffic weight: ${step}%"
    
    # Update the VirtualService
    cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ${SERVICE_NAME}
  namespace: ${NAMESPACE}
spec:
  hosts:
  - ${SERVICE_NAME}
  http:
  - route:
    - destination:
        host: ${SERVICE_NAME}
        subset: v1
      weight: $((100 - step))
    - destination:
        host: ${SERVICE_NAME}
        subset: v2
      weight: ${step}
EOF

    echo "Waiting ${STEP_DURATION} seconds to observe metrics..."
    sleep ${STEP_DURATION}
    
    # Check the error rate (promtool query instant needs the server URL;
    # the Istio request metric is named istio_requests_total)
    ERROR_RATE=$(kubectl exec -n istio-system deployment/prometheus -- \
        promtool query instant http://localhost:9090 \
        'rate(istio_requests_total{destination_service_name="'${SERVICE_NAME}'",response_code!~"2.."}[5m]) / rate(istio_requests_total{destination_service_name="'${SERVICE_NAME}'"}[5m]) * 100' \
        | grep -oP '\d+\.\d+' | head -1 || echo "0")
    
    echo "Current error rate: ${ERROR_RATE}%"
    
    if (( $(echo "${ERROR_RATE} > ${ERROR_THRESHOLD}" | bc -l) )); then
        echo "Error rate exceeds threshold, rolling back the canary release"
        kubectl delete deployment ${SERVICE_NAME}-v2 -n ${NAMESPACE}
        # Restore the original traffic routing
        cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ${SERVICE_NAME}
  namespace: ${NAMESPACE}
spec:
  hosts:
  - ${SERVICE_NAME}
  http:
  - route:
    - destination:
        host: ${SERVICE_NAME}
        subset: v1
      weight: 100
EOF
        exit 1
    fi
    
    if [ "${step}" = "100" ]; then
        echo "Canary release succeeded, cleaning up the old version"
        kubectl delete deployment ${SERVICE_NAME}-v1 -n ${NAMESPACE}
        # A Deployment's name and label selector are immutable, so v2 cannot
        # simply be renamed here; keep ${SERVICE_NAME}-v2 serving 100% of the
        # traffic and promote it to the stable version in the manifests on the
        # next release.
        break
    fi
done

echo "Canary release complete"

Flagger Automated Canary Release

# flagger-canary.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: user-service
  namespace: microservices
spec:
  # Target deployment
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  
  # Autoscaler reference
  autoscalerRef:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: user-service
  
  # Service configuration
  service:
    port: 80
    targetPort: 8080
    gateways:
    - public-gateway.istio-system.svc.cluster.local
    hosts:
    - api.company.com
    trafficPolicy:
      tls:
        mode: DISABLE
  
  # Analysis configuration
  analysis:
    # Analysis interval
    interval: 1m
    # Number of failed checks before rollback
    threshold: 5
    # Maximum canary traffic weight
    maxWeight: 50
    # Weight increment per step
    stepWeight: 10
    # Success-rate and latency thresholds
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500
      interval: 30s
    # Webhook tests
    webhooks:
    - name: acceptance-test
      type: pre-rollout
      url: http://flagger-loadtester.test/
      timeout: 30s
      metadata:
        type: bash
        cmd: "curl -sd 'test' http://user-service-canary/api/health | grep healthy"
    - name: load-test
      url: http://flagger-loadtester.test/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://user-service-canary/api/health"
  
  # Skip the analysis phase entirely when true
  skipAnalysis: false

Rolling Update Strategy

# rolling-update-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  namespace: microservices
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      # Maximum number of pods that may be unavailable
      maxUnavailable: 1
      # Maximum number of pods above the desired replica count
      maxSurge: 2
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
      - name: user-service
        image: harbor.company.com/microservices/user-service:v2.0.0
        ports:
        - containerPort: 8080
        # 就绪探针确保Pod准备好接收流量
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
        # 存活探针确保Pod健康运行
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        # 优雅关闭
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - sleep 15
        # 资源限制
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
      # 优雅关闭时间
      terminationGracePeriodSeconds: 30

CI/CD Pipelines

GitLab CI/CD Configuration

# .gitlab-ci.yml
stages:
  - test
  - build
  - security
  - deploy-dev
  - deploy-staging
  - deploy-prod

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"
  REGISTRY: harbor.company.com
  PROJECT_NAME: microservices
  SERVICE_NAME: user-service
  KUBECONFIG_FILE: $KUBECONFIG_CONTENT

# Test stage
unit-test:
  stage: test
  image: golang:1.21-alpine
  before_script:
    - apk add --no-cache git
    - go mod download
  script:
    - go test -v -race -coverprofile=coverage.out ./...
    - go tool cover -html=coverage.out -o coverage.html
    # Convert Go coverage to the Cobertura format that GitLab's
    # coverage_report expects
    - go run github.com/boumenot/gocover-cobertura@latest < coverage.out > coverage.xml
  coverage: '/coverage: \d+\.\d+% of statements/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml
    paths:
      - coverage.html
    expire_in: 1 week
  only:
    - merge_requests
    - main
    - develop

# Code quality checks
code-quality:
  stage: test
  image: golangci/golangci-lint:latest
  script:
    - golangci-lint run --out-format code-climate > gl-code-quality-report.json
  artifacts:
    reports:
      codequality: gl-code-quality-report.json
    expire_in: 1 week
  only:
    - merge_requests
    - main
    - develop

# Build and push the Docker image
build-image:
  stage: build
  image: docker:20.10.16
  services:
    - docker:20.10.16-dind
  before_script:
    - echo $HARBOR_PASSWORD | docker login $REGISTRY -u $HARBOR_USERNAME --password-stdin
  script:
    - |
      if [ "$CI_COMMIT_REF_NAME" = "main" ]; then
        TAG="latest"
      else
        TAG="$CI_COMMIT_SHORT_SHA"
      fi
    - docker build -t $REGISTRY/$PROJECT_NAME/$SERVICE_NAME:$TAG .
    - docker push $REGISTRY/$PROJECT_NAME/$SERVICE_NAME:$TAG
    - echo "IMAGE_TAG=$TAG" > build.env
  artifacts:
    reports:
      dotenv: build.env
  only:
    - main
    - develop
    - /^release\/.*$/
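The branch-to-tag rule in the build script above (main builds are tagged `latest`, everything else uses the short commit SHA) restated as a small Go helper, illustrative only:

```go
package main

import "fmt"

// imageTag mirrors the tagging rule in the build-image job:
// builds from main are tagged "latest"; any other ref uses the
// short commit SHA, so every non-main build is uniquely addressable.
func imageTag(branch, shortSHA string) string {
	if branch == "main" {
		return "latest"
	}
	return shortSHA
}

func main() {
	fmt.Println(imageTag("main", "a1b2c3d"))
	fmt.Println(imageTag("develop", "a1b2c3d"))
}
```

Note that mutable tags like `latest` make rollbacks harder to reason about; many teams tag every build with the SHA and additionally move `latest` as an alias.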

# Security scanning
security-scan:
  stage: security
  image: aquasec/trivy:latest
  script:
    - trivy image --format template --template "@contrib/gitlab.tpl" -o gl-container-scanning-report.json $REGISTRY/$PROJECT_NAME/$SERVICE_NAME:$IMAGE_TAG
  artifacts:
    reports:
      container_scanning: gl-container-scanning-report.json
  dependencies:
    - build-image
  only:
    - main
    - develop
    - /^release\/.*$/

# Deploy to the development environment
deploy-dev:
  stage: deploy-dev
  image: bitnami/kubectl:latest
  environment:
    name: development
    url: https://dev-api.company.com
  before_script:
    - echo "$KUBECONFIG_DEV" | base64 -d > kubeconfig
    - export KUBECONFIG=kubeconfig
  script:
    - |
      cat <<EOF | kubectl apply -f -
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: $SERVICE_NAME
        namespace: microservices-dev
        labels:
          app: $SERVICE_NAME
          environment: development
      spec:
        replicas: 2
        selector:
          matchLabels:
            app: $SERVICE_NAME
        template:
          metadata:
            labels:
              app: $SERVICE_NAME
              environment: development
          spec:
            containers:
            - name: $SERVICE_NAME
              image: $REGISTRY/$PROJECT_NAME/$SERVICE_NAME:$IMAGE_TAG
              ports:
              - containerPort: 8080
              env:
              - name: ENVIRONMENT
                value: development
              - name: DATABASE_URL
                valueFrom:
                  secretKeyRef:
                    name: database-secret
                    key: url
              livenessProbe:
                httpGet:
                  path: /health
                  port: 8080
                initialDelaySeconds: 30
                periodSeconds: 10
              readinessProbe:
                httpGet:
                  path: /ready
                  port: 8080
                initialDelaySeconds: 5
                periodSeconds: 5
              resources:
                requests:
                  memory: "128Mi"
                  cpu: "100m"
                limits:
                  memory: "256Mi"
                  cpu: "200m"
      EOF
    - kubectl rollout status deployment/$SERVICE_NAME -n microservices-dev --timeout=300s
  dependencies:
    - build-image
  only:
    - develop

# Deploy to the staging environment
deploy-staging:
  stage: deploy-staging
  image: bitnami/kubectl:latest
  environment:
    name: staging
    url: https://staging-api.company.com
  before_script:
    - echo "$KUBECONFIG_STAGING" | base64 -d > kubeconfig
    - export KUBECONFIG=kubeconfig
  script:
    - |
      # Deploy with Helm
      helm upgrade --install $SERVICE_NAME ./helm/$SERVICE_NAME \
        --namespace microservices-staging \
        --set image.repository=$REGISTRY/$PROJECT_NAME/$SERVICE_NAME \
        --set image.tag=$IMAGE_TAG \
        --set environment=staging \
        --set replicaCount=3 \
        --set resources.requests.memory=256Mi \
        --set resources.requests.cpu=250m \
        --set resources.limits.memory=512Mi \
        --set resources.limits.cpu=500m \
        --wait --timeout=600s
  dependencies:
    - build-image
  when: manual
  only:
    - /^release\/.*$/

# Deploy to the production environment
deploy-prod:
  stage: deploy-prod
  image: bitnami/kubectl:latest
  environment:
    name: production
    url: https://api.company.com
  before_script:
    - echo "$KUBECONFIG_PROD" | base64 -d > kubeconfig
    - export KUBECONFIG=kubeconfig
  script:
    - |
      # Blue-green deployment strategy
      ./scripts/blue-green-deploy.sh $REGISTRY/$PROJECT_NAME/$SERVICE_NAME:$IMAGE_TAG
  dependencies:
    - build-image
  when: manual
  allow_failure: false
  only:
    - main

Jenkins Pipeline Configuration

// Jenkinsfile
pipeline {
    agent any
    
    environment {
        REGISTRY = 'harbor.company.com'
        PROJECT_NAME = 'microservices'
        SERVICE_NAME = 'user-service'
        DOCKER_CREDENTIALS = credentials('harbor-credentials')
        KUBECONFIG_DEV = credentials('kubeconfig-dev')
        KUBECONFIG_STAGING = credentials('kubeconfig-staging')
        KUBECONFIG_PROD = credentials('kubeconfig-prod')
    }
    
    stages {
        stage('Checkout') {
            steps {
                checkout scm
                script {
                    env.GIT_COMMIT_SHORT = sh(
                        script: 'git rev-parse --short HEAD',
                        returnStdout: true
                    ).trim()
                    
                    if (env.BRANCH_NAME == 'main') {
                        env.IMAGE_TAG = 'latest'
                    } else {
                        env.IMAGE_TAG = env.GIT_COMMIT_SHORT
                    }
                }
            }
        }
        
        stage('Test') {
            parallel {
                stage('Unit Tests') {
                    steps {
                        sh '''
                            go mod download
                            go test -v -race -coverprofile=coverage.out ./...
                            go tool cover -html=coverage.out -o coverage.html
                        '''
                    }
                    post {
                        always {
                            publishHTML([
                                allowMissing: false,
                                alwaysLinkToLastBuild: true,
                                keepAll: true,
                                reportDir: '.',
                                reportFiles: 'coverage.html',
                                reportName: 'Coverage Report'
                            ])
                        }
                    }
                }
                
                stage('Code Quality') {
                    steps {
                        sh 'golangci-lint run --out-format checkstyle > checkstyle-report.xml'
                    }
                    post {
                        always {
                            recordIssues(
                                enabledForFailure: true,
                                tools: [checkStyle(pattern: 'checkstyle-report.xml')]
                            )
                        }
                    }
                }
            }
        }
        
        stage('Build') {
            when {
                anyOf {
                    branch 'main'
                    branch 'develop'
                    branch 'release/*'
                }
            }
            steps {
                script {
                    // withRegistry expects a credentials ID, not the resolved secret
                    docker.withRegistry("https://${REGISTRY}", 'harbor-credentials') {
                        def image = docker.build("${REGISTRY}/${PROJECT_NAME}/${SERVICE_NAME}:${IMAGE_TAG}")
                        image.push()
                        
                        if (env.BRANCH_NAME == 'main') {
                            image.push('latest')
                        }
                    }
                }
            }
        }
        
        stage('Security Scan') {
            when {
                anyOf {
                    branch 'main'
                    branch 'develop'
                    branch 'release/*'
                }
            }
            steps {
                sh '''
                    trivy image --format json -o trivy-report.json ${REGISTRY}/${PROJECT_NAME}/${SERVICE_NAME}:${IMAGE_TAG}
                    trivy image --format table ${REGISTRY}/${PROJECT_NAME}/${SERVICE_NAME}:${IMAGE_TAG}
                '''
            }
            post {
                always {
                    archiveArtifacts artifacts: 'trivy-report.json', fingerprint: true
                }
            }
        }
        
        stage('Deploy to Dev') {
            when {
                branch 'develop'
            }
            steps {
                script {
                    withKubeConfig([credentialsId: 'kubeconfig-dev']) {
                        sh '''
                            kubectl set image deployment/${SERVICE_NAME} \
                                ${SERVICE_NAME}=${REGISTRY}/${PROJECT_NAME}/${SERVICE_NAME}:${IMAGE_TAG} \
                                -n microservices-dev
                            kubectl rollout status deployment/${SERVICE_NAME} -n microservices-dev --timeout=300s
                        '''
                    }
                }
            }
        }
        
        stage('Deploy to Staging') {
            when {
                branch 'release/*'
            }
            steps {
                input message: 'Deploy to Staging?', ok: 'Deploy'
                script {
                    withKubeConfig([credentialsId: 'kubeconfig-staging']) {
                        sh '''
                            helm upgrade --install ${SERVICE_NAME} ./helm/${SERVICE_NAME} \
                                --namespace microservices-staging \
                                --set image.repository=${REGISTRY}/${PROJECT_NAME}/${SERVICE_NAME} \
                                --set image.tag=${IMAGE_TAG} \
                                --set environment=staging \
                                --wait --timeout=600s
                        '''
                    }
                }
            }
        }
        
        stage('Deploy to Production') {
            when {
                branch 'main'
            }
            steps {
                input message: 'Deploy to Production?', ok: 'Deploy'
                script {
                    withKubeConfig([credentialsId: 'kubeconfig-prod']) {
                        sh '''
                            chmod +x ./scripts/blue-green-deploy.sh
                            ./scripts/blue-green-deploy.sh ${REGISTRY}/${PROJECT_NAME}/${SERVICE_NAME}:${IMAGE_TAG}
                        '''
                    }
                }
            }
        }
    }
    
    post {
        always {
            cleanWs()
        }
        success {
            slackSend(
                channel: '#deployments',
                color: 'good',
                message: "✅ ${SERVICE_NAME} deployment successful: ${env.BUILD_URL}"
            )
        }
        failure {
            slackSend(
                channel: '#deployments',
                color: 'danger',
                message: "❌ ${SERVICE_NAME} deployment failed: ${env.BUILD_URL}"
            )
        }
    }
}

GitHub Actions Configuration

# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main, develop ]
  release:
    types: [ published ]

env:
  REGISTRY: harbor.company.com
  PROJECT_NAME: microservices
  SERVICE_NAME: user-service

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Set up Go
      uses: actions/setup-go@v3
      with:
        go-version: 1.21
    
    - name: Cache Go modules
      uses: actions/cache@v3
      with:
        path: ~/go/pkg/mod
        key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
        restore-keys: |
          ${{ runner.os }}-go-
    
    - name: Download dependencies
      run: go mod download
    
    - name: Run tests
      run: |
        go test -v -race -coverprofile=coverage.out ./...
        go tool cover -html=coverage.out -o coverage.html
    
    - name: Upload coverage reports
      uses: codecov/codecov-action@v3
      with:
        file: ./coverage.out
    
    - name: Run golangci-lint
      uses: golangci/golangci-lint-action@v3
      with:
        version: latest

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.event_name != 'pull_request'
    outputs:
      # metadata-action may emit several tags; expose exactly one downstream
      image-tag: ${{ fromJSON(steps.meta.outputs.json).tags[0] }}
    steps:
    - uses: actions/checkout@v3
    
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v2
    
    - name: Login to Harbor
      uses: docker/login-action@v2
      with:
        registry: ${{ env.REGISTRY }}
        username: ${{ secrets.HARBOR_USERNAME }}
        password: ${{ secrets.HARBOR_PASSWORD }}
    
    - name: Extract metadata
      id: meta
      uses: docker/metadata-action@v4
      with:
        images: ${{ env.REGISTRY }}/${{ env.PROJECT_NAME }}/${{ env.SERVICE_NAME }}
        tags: |
          type=ref,event=branch
          type=ref,event=pr
          type=ref,event=tag
          type=sha,prefix={{branch}}-
          type=raw,value=latest,enable={{is_default_branch}}
    
    - name: Build and push
      uses: docker/build-push-action@v4
      with:
        context: .
        push: true
        tags: ${{ steps.meta.outputs.tags }}
        labels: ${{ steps.meta.outputs.labels }}
        cache-from: type=gha
        cache-to: type=gha,mode=max

  security-scan:
    needs: build
    runs-on: ubuntu-latest
    if: github.event_name != 'pull_request'
    steps:
    - name: Run Trivy vulnerability scanner
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: ${{ needs.build.outputs.image-tag }}
        format: 'sarif'
        output: 'trivy-results.sarif'
    
    - name: Upload Trivy scan results
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: 'trivy-results.sarif'

  deploy-dev:
    needs: [build, security-scan]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/develop'
    environment: development
    steps:
    - uses: actions/checkout@v3
    
    - name: Configure kubectl
      uses: azure/k8s-set-context@v1
      with:
        method: kubeconfig
        kubeconfig: ${{ secrets.KUBECONFIG_DEV }}
    
    - name: Deploy to development
      run: |
        kubectl set image deployment/${{ env.SERVICE_NAME }} \
          ${{ env.SERVICE_NAME }}=${{ needs.build.outputs.image-tag }} \
          -n microservices-dev
        kubectl rollout status deployment/${{ env.SERVICE_NAME }} -n microservices-dev --timeout=300s

  deploy-staging:
    needs: [build, security-scan]
    runs-on: ubuntu-latest
    if: github.event_name == 'release'
    environment: staging
    steps:
    - uses: actions/checkout@v3
    
    - name: Configure kubectl
      uses: azure/k8s-set-context@v1
      with:
        method: kubeconfig
        kubeconfig: ${{ secrets.KUBECONFIG_STAGING }}
    
    - name: Install Helm
      uses: azure/setup-helm@v3
      with:
        version: '3.10.0'
    
    - name: Deploy to staging
      run: |
        helm upgrade --install ${{ env.SERVICE_NAME }} ./helm/${{ env.SERVICE_NAME }} \
          --namespace microservices-staging \
          --set image.repository=${{ env.REGISTRY }}/${{ env.PROJECT_NAME }}/${{ env.SERVICE_NAME }} \
          --set image.tag=${{ github.event.release.tag_name }} \
          --set environment=staging \
          --wait --timeout=600s

  deploy-prod:
    needs: [build, security-scan]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
    - uses: actions/checkout@v3
    
    - name: Configure kubectl
      uses: azure/k8s-set-context@v1
      with:
        method: kubeconfig
        kubeconfig: ${{ secrets.KUBECONFIG_PROD }}
    
    - name: Deploy to production
      run: |
        chmod +x ./scripts/blue-green-deploy.sh
        ./scripts/blue-green-deploy.sh ${{ needs.build.outputs.image-tag }}

Operations Automation

Helm Chart Templates

# helm/user-service/Chart.yaml
apiVersion: v2
name: user-service
description: A Helm chart for User Service
type: application
version: 0.1.0
appVersion: "1.0.0"

dependencies:
- name: postgresql
  version: "11.9.13"
  repository: "https://charts.bitnami.com/bitnami"
  condition: postgresql.enabled
- name: redis
  version: "17.3.7"
  repository: "https://charts.bitnami.com/bitnami"
  condition: redis.enabled

# helm/user-service/values.yaml
# Default configuration values
replicaCount: 3

image:
  repository: harbor.company.com/microservices/user-service
  pullPolicy: IfNotPresent
  tag: "latest"

imagePullSecrets:
  - name: harbor-secret

nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  annotations: {}
  name: ""

podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
  prometheus.io/path: "/metrics"

podSecurityContext:
  fsGroup: 2000

securityContext:
  capabilities:
    drop:
    - ALL
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1000

service:
  type: ClusterIP
  port: 80
  targetPort: 8080

ingress:
  enabled: true
  className: "nginx"
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
  hosts:
    - host: api.company.com
      paths:
        - path: /api/users
          pathType: Prefix
  tls:
    - secretName: api-tls
      hosts:
        - api.company.com

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 80

nodeSelector: {}

tolerations: []

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - user-service
        topologyKey: kubernetes.io/hostname

# Environment
environment: production

# Application configuration (rendered into a ConfigMap)
config:
  database:
    host: postgresql
    port: 5432
    name: userdb
    sslmode: require
  redis:
    host: redis-master
    port: 6379
    db: 0
  logging:
    level: info
    format: json
  metrics:
    enabled: true
    port: 8080
    path: /metrics

# Secret values
secrets:
  database:
    username: postgres
    password: ""
  redis:
    password: ""
  jwt:
    secret: ""

# Health checks
healthCheck:
  livenessProbe:
    httpGet:
      path: /health
      port: http
    initialDelaySeconds: 30
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3
  readinessProbe:
    httpGet:
      path: /ready
      port: http
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 3
    successThreshold: 1
    failureThreshold: 3

# Dependent service configuration
postgresql:
  enabled: true
  auth:
    postgresPassword: "postgres123"
    username: "userservice"
    password: "userservice123"
    database: "userdb"
  primary:
    persistence:
      enabled: true
      size: 8Gi

redis:
  enabled: true
  auth:
    enabled: true
    password: "redis123"
  master:
    persistence:
      enabled: true
      size: 8Gi
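The `autoscaling` block in values.yaml follows the standard HPA formula, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the configured bounds. A Go sketch of that calculation (helper name is illustrative):

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the HPA scaling formula with the bounds from
// values.yaml (minReplicas=3, maxReplicas=10, target 80% CPU).
func desiredReplicas(current int, currentUtil, targetUtil float64, minR, maxR int) int {
	d := int(math.Ceil(float64(current) * currentUtil / targetUtil))
	if d < minR {
		return minR
	}
	if d > maxR {
		return maxR
	}
	return d
}

func main() {
	// 3 replicas running at 120% of the 80% target -> ceil(3 * 120/80) = 5
	fmt.Println(desiredReplicas(3, 120, 80, 3, 10))
}
```

Because both CPU and memory targets are set, the HPA computes a desired count per metric and takes the maximum.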

# helm/user-service/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "user-service.fullname" . }}
  labels:
    {{- include "user-service.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "user-service.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        {{- with .Values.podAnnotations }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
      labels:
        {{- include "user-service.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "user-service.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: ENVIRONMENT
              value: {{ .Values.environment }}
            - name: DATABASE_HOST
              value: {{ .Values.config.database.host }}
            - name: DATABASE_PORT
              value: "{{ .Values.config.database.port }}"
            - name: DATABASE_NAME
              value: {{ .Values.config.database.name }}
            - name: DATABASE_USERNAME
              valueFrom:
                secretKeyRef:
                  name: {{ include "user-service.fullname" . }}-secret
                  key: database-username
            - name: DATABASE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ include "user-service.fullname" . }}-secret
                  key: database-password
            - name: REDIS_HOST
              value: {{ .Values.config.redis.host }}
            - name: REDIS_PORT
              value: "{{ .Values.config.redis.port }}"
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ include "user-service.fullname" . }}-secret
                  key: redis-password
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: {{ include "user-service.fullname" . }}-secret
                  key: jwt-secret
          livenessProbe:
            {{- toYaml .Values.healthCheck.livenessProbe | nindent 12 }}
          readinessProbe:
            {{- toYaml .Values.healthCheck.readinessProbe | nindent 12 }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          volumeMounts:
            - name: config
              mountPath: /app/config
              readOnly: true
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: config
          configMap:
            name: {{ include "user-service.fullname" . }}-config
        - name: tmp
          emptyDir: {}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}

Terraform Infrastructure as Code

# terraform/main.tf
terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.23"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.11"
    }
  }
  
  backend "s3" {
    bucket = "company-terraform-state"
    key    = "microservices/terraform.tfstate"
    region = "us-west-2"
  }
}

provider "aws" {
  region = var.aws_region
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}

provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
    
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
    }
  }
}

# VPC configuration
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"
  
  name = "${var.project_name}-vpc"
  cidr = var.vpc_cidr
  
  azs             = var.availability_zones
  private_subnets = var.private_subnets
  public_subnets  = var.public_subnets
  
  enable_nat_gateway = true
  enable_vpn_gateway = false
  enable_dns_hostnames = true
  enable_dns_support = true
  
  tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }
  
  public_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/elb"                    = "1"
  }
  
  private_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/internal-elb"           = "1"
  }
}

# EKS cluster
module "eks" {
  source = "terraform-aws-modules/eks/aws"
  
  cluster_name    = var.cluster_name
  cluster_version = var.kubernetes_version
  
  vpc_id                         = module.vpc.vpc_id
  subnet_ids                     = module.vpc.private_subnets
  cluster_endpoint_public_access = true
  
  # Node group configuration
  eks_managed_node_groups = {
    main = {
      name = "main-node-group"
      
      instance_types = ["t3.medium"]
      
      min_size     = 3
      max_size     = 10
      desired_size = 6
      
      disk_size = 50
      
      labels = {
        Environment = var.environment
        NodeGroup   = "main"
      }
      
      taints = []
      
      tags = {
        Environment = var.environment
      }
    }
  }
  
  # Cluster add-ons
  cluster_addons = {
    coredns = {
      most_recent = true
    }
    kube-proxy = {
      most_recent = true
    }
    vpc-cni = {
      most_recent = true
    }
    aws-ebs-csi-driver = {
      most_recent = true
    }
  }
  
  tags = {
    Environment = var.environment
    Project     = var.project_name
  }
}

# RDS database
resource "aws_db_subnet_group" "main" {
  name       = "${var.project_name}-db-subnet-group"
  subnet_ids = module.vpc.private_subnets
  
  tags = {
    Name = "${var.project_name} DB subnet group"
  }
}

resource "aws_security_group" "rds" {
  name_prefix = "${var.project_name}-rds-"
  vpc_id      = module.vpc.vpc_id
  
  ingress {
    from_port   = 5432
    to_port     = 5432
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
  }
  
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  tags = {
    Name = "${var.project_name}-rds-sg"
  }
}

resource "aws_db_instance" "main" {
  identifier = "${var.project_name}-db"
  
  engine         = "postgres"
  engine_version = "14.9"
  instance_class = "db.t3.micro"
  
  allocated_storage     = 20
  max_allocated_storage = 100
  storage_type          = "gp2"
  storage_encrypted     = true
  
  db_name  = "microservices"
  username = "postgres"
  password = var.db_password
  
  vpc_security_group_ids = [aws_security_group.rds.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name
  
  backup_retention_period = 7
  backup_window          = "03:00-04:00"
  maintenance_window     = "sun:04:00-sun:05:00"
  
  skip_final_snapshot = true
  deletion_protection = false
  
  tags = {
    Name = "${var.project_name}-database"
  }
}

# ElastiCache Redis
resource "aws_elasticache_subnet_group" "main" {
  name       = "${var.project_name}-cache-subnet"
  subnet_ids = module.vpc.private_subnets
}

resource "aws_security_group" "redis" {
  name_prefix = "${var.project_name}-redis-"
  vpc_id      = module.vpc.vpc_id
  
  ingress {
    from_port   = 6379
    to_port     = 6379
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
  }
  
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  tags = {
    Name = "${var.project_name}-redis-sg"
  }
}

resource "aws_elasticache_replication_group" "main" {
  replication_group_id       = "${var.project_name}-redis"
  description                = "Redis cluster for ${var.project_name}"
  
  node_type            = "cache.t3.micro"
  port                 = 6379
  parameter_group_name = "default.redis7"
  
  num_cache_clusters = 2
  
  subnet_group_name  = aws_elasticache_subnet_group.main.name
  security_group_ids = [aws_security_group.redis.id]
  
  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
  auth_token                 = var.redis_auth_token
  
  tags = {
    Name = "${var.project_name}-redis"
  }
}

# terraform/variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-west-2"
}

variable "project_name" {
  description = "Project name"
  type        = string
  default     = "microservices"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "production"
}

variable "cluster_name" {
  description = "EKS cluster name"
  type        = string
  default     = "microservices-cluster"
}

variable "kubernetes_version" {
  description = "Kubernetes version"
  type        = string
  default     = "1.28"
}

variable "vpc_cidr" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "Availability zones"
  type        = list(string)
  default     = ["us-west-2a", "us-west-2b", "us-west-2c"]
}

variable "private_subnets" {
  description = "Private subnet CIDR blocks"
  type        = list(string)
  default     = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}

variable "public_subnets" {
  description = "Public subnet CIDR blocks"
  type        = list(string)
  default     = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
}

variable "db_password" {
  description = "Database password"
  type        = string
  sensitive   = true
}

variable "redis_auth_token" {
  description = "Redis auth token"
  type        = string
  sensitive   = true
}
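A quick sanity check, illustrative only, that the default subnet CIDRs in variables.tf actually fall inside `vpc_cidr`, using the standard library `net` package:

```go
package main

import (
	"fmt"
	"net"
)

// subnetInVPC reports whether a subnet lies inside the VPC CIDR: for
// prefix-aligned CIDRs it suffices that the subnet prefix is no shorter
// than the VPC's and that the subnet's base address is inside the VPC.
func subnetInVPC(vpcCIDR, subnetCIDR string) bool {
	_, vpc, err := net.ParseCIDR(vpcCIDR)
	if err != nil {
		return false
	}
	base, sub, err := net.ParseCIDR(subnetCIDR)
	if err != nil {
		return false
	}
	vpcOnes, _ := vpc.Mask.Size()
	subOnes, _ := sub.Mask.Size()
	return subOnes >= vpcOnes && vpc.Contains(base)
}

func main() {
	fmt.Println(subnetInVPC("10.0.0.0/16", "10.0.1.0/24"))
	fmt.Println(subnetInVPC("10.0.0.0/16", "10.1.0.0/24"))
}
```

The same check can be expressed as a Terraform `validation` block on the variables, but catching it in review tooling before `terraform plan` is often cheaper.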

Ansible Operations Automation

# ansible/playbooks/deploy-microservices.yml
---
- name: Deploy Microservices to Kubernetes
  hosts: localhost
  connection: local
  gather_facts: false
  
  vars:
    namespace: microservices
    services:
      - name: user-service
        image: harbor.company.com/microservices/user-service
        tag: "{{ user_service_tag | default('latest') }}"
        replicas: 3
      - name: order-service
        image: harbor.company.com/microservices/order-service
        tag: "{{ order_service_tag | default('latest') }}"
        replicas: 3
      - name: payment-service
        image: harbor.company.com/microservices/payment-service
        tag: "{{ payment_service_tag | default('latest') }}"
        replicas: 2
  
  tasks:
    - name: Create namespace
      kubernetes.core.k8s:
        name: "{{ namespace }}"
        api_version: v1
        kind: Namespace
        state: present
    
    - name: Deploy services
      kubernetes.core.helm:
        name: "{{ item.name }}"
        chart_ref: "./helm/{{ item.name }}"
        release_namespace: "{{ namespace }}"
        create_namespace: true
        values:
          image:
            repository: "{{ item.image }}"
            tag: "{{ item.tag }}"
          replicaCount: "{{ item.replicas }}"
          environment: "{{ target_environment | default('production') }}"
        wait: true
        timeout: 600s
      loop: "{{ services }}"
    
    - name: Wait for deployments to be ready
      kubernetes.core.k8s_info:
        api_version: apps/v1
        kind: Deployment
        name: "{{ item.name }}"
        namespace: "{{ namespace }}"
        wait: true
        wait_condition:
          type: Available
          status: "True"
        wait_timeout: 600
      loop: "{{ services }}"
    
    - name: Verify service health
      # 注意:svc.cluster.local 域名仅在集群内部可解析;
      # 在集群外执行本剧本时,应改用 Ingress 地址或 kubectl port-forward
      uri:
        url: "http://{{ item.name }}.{{ namespace }}.svc.cluster.local/health"
        method: GET
        status_code: 200
      register: health_check
      until: health_check.status == 200
      retries: 5
      delay: 10
      loop: "{{ services }}"
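上面"Verify service health"任务的重试语义(最多重试 5 次、每次间隔 10 秒,直到返回 200)可以用一段 Python 草图说明。这只是示意代码,probe 是假设的探测函数,实际请求由 uri 模块发起:

```python
import time

def wait_for_health(probe, retries=5, delay=10, sleep=time.sleep):
    """模拟 Ansible uri 任务的 retries/delay 语义:
    反复调用 probe(),直到返回 200 或用尽重试次数。"""
    for attempt in range(retries):
        if probe() == 200:
            return True               # 健康检查通过
        if attempt < retries - 1:
            sleep(delay)              # 对应 delay: 10
    return False                      # 用尽重试仍未就绪,任务失败

# 用法示意:前两次探测返回 503,第三次返回 200
responses = iter([503, 503, 200])
print(wait_for_health(lambda: next(responses), delay=0))  # True
```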
# ansible/playbooks/backup-databases.yml
---
- name: Backup Databases
  hosts: localhost
  connection: local
  gather_facts: true   # timestamp 依赖 ansible_date_time,必须收集 facts
  
  vars:
    backup_bucket: company-database-backups
    timestamp: "{{ ansible_date_time.epoch }}"
    databases:
      - name: userdb
        host: postgres.microservices.svc.cluster.local
        port: 5432
        username: postgres
      - name: orderdb
        host: postgres.microservices.svc.cluster.local
        port: 5432
        username: postgres
  
  tasks:
    - name: Create backup directory
      file:
        path: "/tmp/backups/{{ timestamp }}"
        state: directory
        mode: '0755'
    
    - name: Backup PostgreSQL databases
      # PGPASSWORD 必须传给容器内的 pg_dump 进程;
      # 写在任务的 environment 中只会作用于本地的 kubectl 进程
      shell: |
        kubectl exec -n microservices deployment/postgres -- \
          env PGPASSWORD='{{ postgres_password }}' \
          pg_dump -h localhost -U {{ item.username }} -d {{ item.name }} \
          > /tmp/backups/{{ timestamp }}/{{ item.name }}_{{ timestamp }}.sql
      loop: "{{ databases }}"
      no_log: true   # 避免密码出现在任务日志中
    
    - name: Compress backups
      archive:
        path: "/tmp/backups/{{ timestamp }}"
        dest: "/tmp/backups/backup_{{ timestamp }}.tar.gz"
        format: gz
    
    - name: Upload to S3
      amazon.aws.s3_object:
        bucket: "{{ backup_bucket }}"
        object: "database-backups/{{ ansible_date_time.date }}/backup_{{ timestamp }}.tar.gz"
        src: "/tmp/backups/backup_{{ timestamp }}.tar.gz"
        mode: put
    
    - name: Clean up local backups
      file:
        path: "/tmp/backups"
        state: absent
    
    - name: Send notification
      uri:
        url: "{{ slack_webhook_url }}"
        method: POST
        body_format: json
        body:
          text: "✅ Database backup completed successfully at {{ ansible_date_time.iso8601 }}"
      when: slack_webhook_url is defined
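备份任务中的 shell 重定向有一个隐患:即使容器内的 pg_dump 失败,本地仍会生成一个空的 .sql 文件并被打包上传。上传前可以做一次基本校验,下面是一个最小示意(假设使用 plain 文本格式的 pg_dump 输出,其结尾通常带有完成标记):

```python
import os

DUMP_COMPLETE_MARKER = "PostgreSQL database dump complete"

def backup_looks_valid(path):
    """上传前的基本校验:文件存在、非空,
    且结尾包含 pg_dump(plain 格式)的完成标记。
    仅是示意性检查,不能替代定期的恢复演练。"""
    if not os.path.exists(path):
        return False
    size = os.path.getsize(path)
    if size == 0:
        return False   # kubectl exec 失败时,重定向仍会生成空文件
    with open(path, "rb") as f:
        f.seek(max(0, size - 4096))   # 只读取文件尾部
        tail = f.read().decode("utf-8", errors="replace")
    return DUMP_COMPLETE_MARKER in tail
```

可以在"Compress backups"之前增加一个循环调用此类校验的任务,校验失败则让剧本提前失败,而不是上传无效备份。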

本章总结

本章深入探讨了微服务的部署与运维,涵盖了现代微服务架构中的关键运维实践:

主要内容回顾

  1. 部署与运维挑战

    • 复杂性管理:多服务协调、依赖管理、配置管理
    • 可靠性保证:高可用性、故障恢复、数据一致性
    • 运维效率:自动化部署、监控告警、问题诊断
  2. 容器化部署策略

    • Docker容器化:多阶段构建、镜像优化、安全配置
    • Docker Compose:本地开发环境、服务编排
    • 容器最佳实践:镜像分层、资源限制、健康检查
  3. Kubernetes部署配置

    • 核心资源:Deployment、Service、ConfigMap、Secret
    • 高级特性:HPA自动扩缩容、RBAC权限控制
    • 生产配置:资源限制、亲和性规则、安全上下文
  4. 蓝绿部署和金丝雀发布

    • 蓝绿部署:零停机切换、快速回滚、环境隔离
    • 金丝雀发布:渐进式发布、风险控制、自动化回滚
    • Istio集成:流量管理、A/B测试、Flagger自动化
  5. CI/CD流水线

    • GitLab CI/CD:完整的DevOps流水线、多环境部署
    • Jenkins Pipeline:企业级CI/CD、插件生态
    • GitHub Actions:云原生CI/CD、社区集成
  6. 运维自动化

    • Helm Charts:应用打包、模板化部署、依赖管理
    • Terraform:基础设施即代码、云资源管理
    • Ansible:配置管理、自动化运维、批量操作

最佳实践

  1. 部署策略选择

    • 根据业务特点选择合适的部署策略
    • 结合监控指标进行自动化决策
    • 建立完善的回滚机制
  2. 基础设施管理

    • 使用基础设施即代码管理云资源
    • 实施多环境一致性配置
    • 建立灾难恢复计划
  3. 运维自动化

    • 自动化日常运维任务
    • 建立标准化的运维流程
    • 实施持续改进机制
  4. 安全与合规

    • 实施容器安全扫描
    • 建立访问控制和审计机制
    • 遵循安全最佳实践

技术要点

  • 容器化:Docker多阶段构建、镜像优化、安全配置
  • 编排管理:Kubernetes资源管理、服务发现、配置管理
  • 部署策略:蓝绿部署、金丝雀发布、滚动更新
  • CI/CD:自动化测试、构建、部署、监控
  • 基础设施:IaC、云原生、弹性扩缩容
  • 运维自动化:配置管理、备份恢复、监控告警

通过本章的学习,您应该能够:

  • 设计和实施完整的微服务部署方案
  • 选择和配置适合的部署策略
  • 建立自动化的CI/CD流水线
  • 实施基础设施即代码管理
  • 建立高效的运维自动化体系

下一章我们将探讨微服务的性能优化与调优,学习如何提升微服务系统的性能和效率。