Overview
Deployment and operations are critical to the successful adoption of a microservices architecture. This chapter takes a deep look at deployment strategies, CI/CD pipeline design, containerization, service mesh operations, and troubleshooting.
Deployment and Operations Challenges
- Complexity management: coordinated deployment of many services
- Environment consistency: keeping development, testing, and production aligned
- Version management: compatibility between service versions
- Fault isolation: locating and resolving problems quickly
- Performance monitoring: real-time monitoring and optimization
- Security assurance: securing both the deployment process and the runtime
Operations Architecture Overview
# Microservices operations architecture
apiVersion: v1
kind: ConfigMap
metadata:
name: devops-architecture
namespace: devops
data:
architecture.yml: |
# Source control
source_control:
git:
repositories:
- name: user-service
url: https://github.com/company/user-service
- name: order-service
url: https://github.com/company/order-service
- name: payment-service
url: https://github.com/company/payment-service
branching_strategy: GitFlow
# CI/CD pipeline
cicd_pipeline:
tools:
ci: Jenkins/GitLab CI/GitHub Actions
cd: ArgoCD/Flux
registry: Harbor/Docker Hub
scanning: Trivy/Clair
stages:
- source_checkout
- unit_tests
- integration_tests
- security_scan
- build_image
- push_registry
- deploy_staging
- e2e_tests
- deploy_production
# Containerization platform
containerization:
runtime: Docker/Containerd
orchestration: Kubernetes
service_mesh: Istio/Linkerd
ingress: Nginx/Traefik
# Deployment strategies
deployment_strategies:
- blue_green
- canary
- rolling_update
- recreate
# Monitoring and operations
operations:
monitoring:
metrics: Prometheus
logging: ELK Stack
tracing: Jaeger
alerting: Alertmanager
backup:
databases: Velero
configurations: Git
disaster_recovery:
rpo: 1h
rto: 30min
Containerized Deployment Strategies
Docker Containerization
Multi-Stage Build Dockerfile
# User service Dockerfile
# Build stage
FROM golang:1.21-alpine AS builder
# Set the working directory
WORKDIR /app
# Install dependencies
RUN apk add --no-cache git ca-certificates tzdata
# Copy the go mod files
COPY go.mod go.sum ./
RUN go mod download
# Copy the source code
COPY . .
# Build the application
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o main ./cmd/server
# Runtime stage
FROM alpine:latest
# Install CA certificates
RUN apk --no-cache add ca-certificates
# Create a non-root user
RUN addgroup -g 1001 appgroup && \
adduser -D -s /bin/sh -u 1001 -G appgroup appuser
# Set the working directory (outside /root, since the container runs as a non-root user)
WORKDIR /app
# Copy the binary from the build stage
COPY --from=builder /app/main .
COPY --from=builder /app/configs ./configs
# Set file ownership
RUN chown -R appuser:appgroup /app
# Switch to the non-root user
USER appuser
# Expose the port
EXPOSE 8080
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD wget --no-verbose --tries=1 --spider http://localhost:8080/health || exit 1
# Start the application
CMD ["./main"]
Docker Compose Development Environment
# docker-compose.yml
version: '3.8'
services:
# User service
user-service:
build:
context: ./user-service
dockerfile: Dockerfile
ports:
- "8081:8080"
environment:
- DB_HOST=postgres
- DB_PORT=5432
- DB_NAME=userdb
- DB_USER=postgres
- DB_PASSWORD=password
- REDIS_HOST=redis
- REDIS_PORT=6379
depends_on:
- postgres
- redis
networks:
- microservices
volumes:
- ./user-service/configs:/app/configs
restart: unless-stopped
# Order service
order-service:
build:
context: ./order-service
dockerfile: Dockerfile
ports:
- "8082:8080"
environment:
- DB_HOST=postgres
- DB_PORT=5432
- DB_NAME=orderdb
- DB_USER=postgres
- DB_PASSWORD=password
- USER_SERVICE_URL=http://user-service:8080
- PAYMENT_SERVICE_URL=http://payment-service:8080
depends_on:
- postgres
- user-service
networks:
- microservices
restart: unless-stopped
# Payment service
payment-service:
build:
context: ./payment-service
dockerfile: Dockerfile
ports:
- "8083:8080"
environment:
- DB_HOST=postgres
- DB_PORT=5432
- DB_NAME=paymentdb
- DB_USER=postgres
- DB_PASSWORD=password
depends_on:
- postgres
networks:
- microservices
restart: unless-stopped
# PostgreSQL database
postgres:
image: postgres:15-alpine
environment:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=password
- POSTGRES_MULTIPLE_DATABASES=userdb,orderdb,paymentdb
ports:
- "5432:5432"
volumes:
- postgres_data:/var/lib/postgresql/data
- ./scripts/init-databases.sh:/docker-entrypoint-initdb.d/init-databases.sh
networks:
- microservices
restart: unless-stopped
# Redis cache
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redis_data:/data
networks:
- microservices
restart: unless-stopped
command: redis-server --appendonly yes
# API gateway
api-gateway:
image: nginx:alpine
ports:
- "80:80"
volumes:
- ./nginx/nginx.conf:/etc/nginx/nginx.conf
- ./nginx/conf.d:/etc/nginx/conf.d
depends_on:
- user-service
- order-service
- payment-service
networks:
- microservices
restart: unless-stopped
volumes:
postgres_data:
redis_data:
networks:
microservices:
driver: bridge
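Note that POSTGRES_MULTIPLE_DATABASES is not a feature of the official postgres image; it only works because the compose file mounts scripts/init-databases.sh into docker-entrypoint-initdb.d. A minimal sketch of that script, assuming comma-separated database names:
#!/bin/bash
# scripts/init-databases.sh - create one database per comma-separated name
set -e
for db in $(echo "$POSTGRES_MULTIPLE_DATABASES" | tr ',' ' '); do
  echo "Creating database: $db"
  psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" <<EOSQL
CREATE DATABASE $db;
GRANT ALL PRIVILEGES ON DATABASE $db TO $POSTGRES_USER;
EOSQL
done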
Kubernetes Deployment Configuration
User Service Deployment
# user-service-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: user-service
namespace: microservices
labels:
app: user-service
version: v1
spec:
replicas: 3
selector:
matchLabels:
app: user-service
version: v1
template:
metadata:
labels:
app: user-service
version: v1
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
spec:
serviceAccountName: user-service
containers:
- name: user-service
image: harbor.company.com/microservices/user-service:v1.0.0
imagePullPolicy: Always
ports:
- containerPort: 8080
name: http
- containerPort: 9090
name: grpc
env:
- name: DB_HOST
valueFrom:
secretKeyRef:
name: database-secret
key: host
- name: DB_PORT
valueFrom:
secretKeyRef:
name: database-secret
key: port
- name: DB_NAME
valueFrom:
secretKeyRef:
name: database-secret
key: database
- name: DB_USER
valueFrom:
secretKeyRef:
name: database-secret
key: username
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: database-secret
key: password
- name: REDIS_HOST
valueFrom:
configMapKeyRef:
name: redis-config
key: host
- name: REDIS_PORT
valueFrom:
configMapKeyRef:
name: redis-config
key: port
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
volumeMounts:
- name: config
mountPath: /app/configs
readOnly: true
- name: logs
mountPath: /app/logs
securityContext:
runAsNonRoot: true
runAsUser: 1001
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
volumes:
- name: config
configMap:
name: user-service-config
- name: logs
emptyDir: {}
imagePullSecrets:
- name: harbor-secret
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- user-service
topologyKey: kubernetes.io/hostname
---
apiVersion: v1
kind: Service
metadata:
name: user-service
namespace: microservices
labels:
app: user-service
spec:
selector:
app: user-service
ports:
- name: http
port: 80
targetPort: 8080
protocol: TCP
- name: grpc
port: 9090
targetPort: 9090
protocol: TCP
type: ClusterIP
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: user-service
namespace: microservices
---
apiVersion: v1
kind: ConfigMap
metadata:
name: user-service-config
namespace: microservices
data:
app.yaml: |
server:
port: 8080
grpc_port: 9090
read_timeout: 30s
write_timeout: 30s
idle_timeout: 60s
logging:
level: info
format: json
output: stdout
metrics:
enabled: true
path: /metrics
tracing:
enabled: true
jaeger_endpoint: http://jaeger-collector:14268/api/traces
sample_rate: 0.1
---
apiVersion: v1
kind: Secret
metadata:
name: database-secret
namespace: microservices
type: Opaque
data:
host: cG9zdGdyZXNxbC5kYXRhYmFzZS5zdmMuY2x1c3Rlci5sb2NhbA== # postgresql.database.svc.cluster.local
port: NTQzMg== # 5432
database: dXNlcmRi # userdb
username: cG9zdGdyZXM= # postgres
password: cGFzc3dvcmQ= # password
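A sketch of applying these manifests; in practice the Secret is usually created out-of-band, since committing base64-encoded credentials to Git offers no real protection. kubectl create secret also avoids hand-encoding (values below mirror the example):
kubectl apply -f user-service-deployment.yaml
# Create the Secret from literals instead of hand-encoding base64 values
kubectl create secret generic database-secret -n microservices \
  --from-literal=host=postgresql.database.svc.cluster.local \
  --from-literal=port=5432 \
  --from-literal=database=userdb \
  --from-literal=username=postgres \
  --from-literal=password=password
# Verify the rollout
kubectl rollout status deployment/user-service -n microservices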
HorizontalPodAutoscaler Configuration
# user-service-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: user-service-hpa
namespace: microservices
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: user-service
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "100"
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 50
periodSeconds: 60
- type: Pods
value: 2
periodSeconds: 60
selectPolicy: Max
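The CPU and memory metrics above come from metrics-server, but the http_requests_per_second Pods metric requires a custom metrics API implementation such as prometheus-adapter. A hedged sketch of an adapter rule; the http_requests_total metric name and its namespace/pod labels are assumptions about the service's instrumentation:
# prometheus-adapter rule (excerpt from its config file)
rules:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^http_requests_total$"
      as: "http_requests_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'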
CI/CD Pipeline Design
Jenkins Pipeline
// Jenkinsfile
pipeline {
agent any
environment {
DOCKER_REGISTRY = 'harbor.company.com'
DOCKER_REPO = 'microservices'
IMAGE_NAME = 'user-service'
KUBECONFIG = credentials('kubeconfig')
HARBOR_CREDENTIALS = credentials('harbor-credentials')
SONAR_TOKEN = credentials('sonar-token')
}
stages {
stage('Checkout') {
steps {
checkout scm
script {
env.GIT_COMMIT_SHORT = sh(
script: "git rev-parse --short HEAD",
returnStdout: true
).trim()
env.BUILD_VERSION = "${env.BUILD_NUMBER}-${env.GIT_COMMIT_SHORT}"
}
}
}
stage('Unit Tests') {
steps {
sh '''
go mod download
go test -v -race -coverprofile=coverage.out ./...
go tool cover -html=coverage.out -o coverage.html
'''
}
post {
always {
publishHTML([
allowMissing: false,
alwaysLinkToLastBuild: true,
keepAll: true,
reportDir: '.',
reportFiles: 'coverage.html',
reportName: 'Coverage Report'
])
}
}
}
stage('Code Quality') {
steps {
sh '''
# Static code analysis
golangci-lint run --out-format checkstyle > golangci-lint-report.xml || true
# SonarQube analysis
sonar-scanner \
-Dsonar.projectKey=${IMAGE_NAME} \
-Dsonar.sources=. \
-Dsonar.host.url=http://sonarqube:9000 \
-Dsonar.login=${SONAR_TOKEN}
'''
}
post {
always {
recordIssues(
enabledForFailure: true,
tools: [checkStyle(pattern: 'golangci-lint-report.xml')]
)
}
}
}
stage('Security Scan') {
steps {
sh '''
# Dependency vulnerability scan
nancy sleuth
# Code security scan
gosec -fmt json -out gosec-report.json ./...
'''
}
}
stage('Build Image') {
steps {
script {
def image = docker.build("${DOCKER_REGISTRY}/${DOCKER_REPO}/${IMAGE_NAME}:${BUILD_VERSION}")
// Image security scan
sh '''
trivy image --format json --output trivy-report.json \
${DOCKER_REGISTRY}/${DOCKER_REPO}/${IMAGE_NAME}:${BUILD_VERSION}
'''
// Push the image
docker.withRegistry("https://${DOCKER_REGISTRY}", 'harbor-credentials') {
image.push()
image.push('latest')
}
}
}
}
stage('Integration Tests') {
steps {
sh '''
# Start the test environment
docker-compose -f docker-compose.test.yml up -d
# Wait for services to start
sleep 30
# Run integration tests
go test -v -tags=integration ./tests/integration/...
'''
}
post {
always {
sh 'docker-compose -f docker-compose.test.yml down'
}
}
}
stage('Deploy to Staging') {
when {
branch 'develop'
}
steps {
sh '''
# Update the Kubernetes deployment manifest
sed -i "s|image: .*|image: ${DOCKER_REGISTRY}/${DOCKER_REPO}/${IMAGE_NAME}:${BUILD_VERSION}|" \
k8s/staging/deployment.yaml
kubectl apply -f k8s/staging/ --namespace=staging
# Wait for the rollout to complete
kubectl rollout status deployment/${IMAGE_NAME} --namespace=staging --timeout=300s
'''
}
}
stage('E2E Tests') {
when {
branch 'develop'
}
steps {
sh '''
# Run end-to-end tests
npm install
npm run test:e2e -- --env staging
'''
}
post {
always {
junit 'test-results.xml'
}
}
}
stage('Deploy to Production') {
when {
branch 'main'
}
steps {
script {
// Manual approval
input message: 'Deploy to production?', ok: 'Deploy',
submitterParameter: 'DEPLOYER'
sh '''
# Blue-green deployment
./scripts/blue-green-deploy.sh \
${DOCKER_REGISTRY}/${DOCKER_REPO}/${IMAGE_NAME}:${BUILD_VERSION}
'''
}
}
}
}
post {
always {
// Clean up the workspace
cleanWs()
}
success {
// Send a success notification
slackSend(
channel: '#deployments',
color: 'good',
message: "✅ ${IMAGE_NAME} ${BUILD_VERSION} deployed successfully by ${env.DEPLOYER ?: 'Jenkins'}"
)
}
failure {
// Send a failure notification
slackSend(
channel: '#deployments',
color: 'danger',
message: "❌ ${IMAGE_NAME} ${BUILD_VERSION} deployment failed"
)
}
}
}
GitLab CI Configuration
# .gitlab-ci.yml
stages:
- test
- build
- security
- deploy-staging
- deploy-production
variables:
DOCKER_REGISTRY: harbor.company.com
DOCKER_REPO: microservices
IMAGE_NAME: user-service
DOCKER_DRIVER: overlay2
DOCKER_TLS_CERTDIR: "/certs"
before_script:
- export BUILD_VERSION="${CI_PIPELINE_ID}-${CI_COMMIT_SHORT_SHA}"
# Unit tests
unit-test:
stage: test
image: golang:1.21
services:
- postgres:15
- redis:7
variables:
POSTGRES_DB: testdb
POSTGRES_USER: postgres
POSTGRES_PASSWORD: password
DB_HOST: postgres
REDIS_HOST: redis
script:
- go mod download
- go test -v -race -coverprofile=coverage.out ./...
- go tool cover -func=coverage.out
# Convert the Go profile to Cobertura XML so GitLab can pick up the coverage report below
- go run github.com/boumenot/gocover-cobertura@latest < coverage.out > coverage.xml
coverage: '/total:.*?(\d+\.\d+)%/'
artifacts:
reports:
coverage_report:
coverage_format: cobertura
path: coverage.xml
paths:
- coverage.out
- coverage.xml
expire_in: 1 week
# Code quality checks
code-quality:
stage: test
image: golangci/golangci-lint:latest
script:
- golangci-lint run --out-format code-climate > gl-code-quality-report.json
artifacts:
reports:
codequality: gl-code-quality-report.json
expire_in: 1 week
allow_failure: true
# Build the Docker image
build-image:
stage: build
image: docker:latest
services:
- docker:dind
before_script:
- echo $HARBOR_PASSWORD | docker login $DOCKER_REGISTRY -u $HARBOR_USERNAME --password-stdin
script:
- docker build -t $DOCKER_REGISTRY/$DOCKER_REPO/$IMAGE_NAME:$BUILD_VERSION .
- docker push $DOCKER_REGISTRY/$DOCKER_REPO/$IMAGE_NAME:$BUILD_VERSION
- docker tag $DOCKER_REGISTRY/$DOCKER_REPO/$IMAGE_NAME:$BUILD_VERSION $DOCKER_REGISTRY/$DOCKER_REPO/$IMAGE_NAME:latest
- docker push $DOCKER_REGISTRY/$DOCKER_REPO/$IMAGE_NAME:latest
only:
- develop
- main
# Security scan
security-scan:
stage: security
image: aquasec/trivy:latest
script:
- trivy image --format template --template "@contrib/gitlab.tpl" --output gl-container-scanning-report.json $DOCKER_REGISTRY/$DOCKER_REPO/$IMAGE_NAME:$BUILD_VERSION
artifacts:
reports:
container_scanning: gl-container-scanning-report.json
expire_in: 1 week
dependencies:
- build-image
only:
- develop
- main
# Deploy to staging
deploy-staging:
stage: deploy-staging
image: bitnami/kubectl:latest
environment:
name: staging
url: https://staging.company.com
script:
- kubectl config use-context staging
- sed -i "s|image: .*|image: $DOCKER_REGISTRY/$DOCKER_REPO/$IMAGE_NAME:$BUILD_VERSION|" k8s/staging/deployment.yaml
- kubectl apply -f k8s/staging/ --namespace=staging
- kubectl rollout status deployment/$IMAGE_NAME --namespace=staging --timeout=300s
dependencies:
- build-image
only:
- develop
# Deploy to production
deploy-production:
stage: deploy-production
image: bitnami/kubectl:latest
environment:
name: production
url: https://api.company.com
script:
- kubectl config use-context production
- ./scripts/canary-deploy.sh $DOCKER_REGISTRY/$DOCKER_REPO/$IMAGE_NAME:$BUILD_VERSION
when: manual
dependencies:
- build-image
only:
- main
ArgoCD GitOps Configuration
# argocd-application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: user-service
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: microservices
source:
repoURL: https://github.com/company/microservices-manifests
targetRevision: HEAD
path: user-service
helm:
valueFiles:
- values.yaml
- values-production.yaml
destination:
server: https://kubernetes.default.svc
namespace: microservices
syncPolicy:
automated:
prune: true
selfHeal: true
allowEmpty: false
syncOptions:
- CreateNamespace=true
- PrunePropagationPolicy=foreground
- PruneLast=true
retry:
limit: 5
backoff:
duration: 5s
factor: 2
maxDuration: 3m
revisionHistoryLimit: 10
---
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
name: microservices
namespace: argocd
spec:
description: Microservices project
sourceRepos:
- 'https://github.com/company/*'
destinations:
- namespace: microservices
server: https://kubernetes.default.svc
- namespace: staging
server: https://kubernetes.default.svc
clusterResourceWhitelist:
- group: ''
kind: Namespace
namespaceResourceWhitelist:
- group: ''
kind: ConfigMap
- group: ''
kind: Secret
- group: ''
kind: Service
- group: apps
kind: Deployment
- group: apps
kind: ReplicaSet
- group: autoscaling
kind: HorizontalPodAutoscaler
roles:
- name: admin
description: Admin access
policies:
- p, proj:microservices:admin, applications, *, microservices/*, allow
groups:
- company:devops
- name: developer
description: Developer access
policies:
- p, proj:microservices:developer, applications, get, microservices/*, allow
- p, proj:microservices:developer, applications, sync, microservices/*, allow
groups:
- company:developers
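After the Application and AppProject manifests are applied, the argocd CLI can drive registration and synchronization. A typical sequence; with the automated sync policy above, the explicit sync is redundant but harmless:
argocd app create -f argocd-application.yaml
argocd app sync user-service
argocd app wait user-service --health --timeout 300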
Blue-Green Deployment and Canary Releases
Blue-Green Deployment Strategy
Blue-green deployment is a zero-downtime strategy that maintains two identical production environments (blue and green) so that traffic can be switched, and rolled back, quickly.
Blue-Green Deployment Script
#!/bin/bash
# blue-green-deploy.sh
set -e
# Configuration parameters
NAMESPACE="microservices"
SERVICE_NAME="user-service"
NEW_IMAGE="$1"
TIMEOUT="300s"
# Color definitions
RED='\033[0;31m'
GREEN='\033[0;32m'
BLUE='\033[0;34m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
echo -e "${BLUE}开始蓝绿部署: ${SERVICE_NAME}${NC}"
echo -e "${BLUE}新镜像: ${NEW_IMAGE}${NC}"
# Check the currently active environment
ACTIVE_ENV=$(kubectl get service ${SERVICE_NAME} -n ${NAMESPACE} -o jsonpath='{.spec.selector.environment}' 2>/dev/null || echo "blue")
echo -e "${YELLOW}当前活跃环境: ${ACTIVE_ENV}${NC}"
# Determine the target environment
if [ "$ACTIVE_ENV" = "blue" ]; then
TARGET_ENV="green"
INACTIVE_ENV="blue"
else
TARGET_ENV="blue"
INACTIVE_ENV="green"
fi
echo -e "${YELLOW}目标部署环境: ${TARGET_ENV}${NC}"
# 部署到目标环境
echo -e "${BLUE}部署到 ${TARGET_ENV} 环境...${NC}"
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: ${SERVICE_NAME}-${TARGET_ENV}
namespace: ${NAMESPACE}
labels:
app: ${SERVICE_NAME}
environment: ${TARGET_ENV}
spec:
replicas: 3
selector:
matchLabels:
app: ${SERVICE_NAME}
environment: ${TARGET_ENV}
template:
metadata:
labels:
app: ${SERVICE_NAME}
environment: ${TARGET_ENV}
annotations:
deployment.kubernetes.io/revision: "$(date +%s)"
spec:
containers:
- name: ${SERVICE_NAME}
image: ${NEW_IMAGE}
ports:
- containerPort: 8080
env:
- name: ENVIRONMENT
value: ${TARGET_ENV}
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
EOF
# Wait for the rollout to complete
echo -e "${BLUE}Waiting for the ${TARGET_ENV} rollout to complete...${NC}"
kubectl rollout status deployment/${SERVICE_NAME}-${TARGET_ENV} -n ${NAMESPACE} --timeout=${TIMEOUT}
# Health check
echo -e "${BLUE}Running health checks...${NC}"
for i in {1..30}; do
if kubectl get pods -n ${NAMESPACE} -l app=${SERVICE_NAME},environment=${TARGET_ENV} --field-selector=status.phase=Running | grep -q Running; then
echo -e "${GREEN}健康检查通过${NC}"
break
fi
echo "等待Pod启动... ($i/30)"
sleep 10
done
# Smoke test
echo -e "${BLUE}Running the smoke test...${NC}"
TEST_POD=$(kubectl get pods -n ${NAMESPACE} -l app=${SERVICE_NAME},environment=${TARGET_ENV} -o jsonpath='{.items[0].metadata.name}')
TEST_RESULT=$(kubectl exec -n ${NAMESPACE} ${TEST_POD} -- wget -qO- http://localhost:8080/health)
if [[ $TEST_RESULT == *"healthy"* ]]; then
echo -e "${GREEN}烟雾测试通过${NC}"
else
echo -e "${RED}烟雾测试失败,回滚部署${NC}"
kubectl delete deployment ${SERVICE_NAME}-${TARGET_ENV} -n ${NAMESPACE}
exit 1
fi
# Switch traffic
echo -e "${BLUE}Switching traffic to the ${TARGET_ENV} environment...${NC}"
kubectl patch service ${SERVICE_NAME} -n ${NAMESPACE} -p '{"spec":{"selector":{"environment":"'${TARGET_ENV}'"}}}'
# Verify the switch
echo -e "${BLUE}Verifying the traffic switch...${NC}"
sleep 10
for i in {1..5}; do
RESPONSE=$(curl -s http://${SERVICE_NAME}.${NAMESPACE}.svc.cluster.local/health || echo "failed")
if [[ $RESPONSE == *"healthy"* ]]; then
echo -e "${GREEN}流量切换验证通过 ($i/5)${NC}"
else
echo -e "${RED}流量切换验证失败 ($i/5)${NC}"
if [ $i -eq 5 ]; then
echo -e "${RED}回滚流量到 ${ACTIVE_ENV} 环境${NC}"
kubectl patch service ${SERVICE_NAME} -n ${NAMESPACE} -p '{"spec":{"selector":{"environment":"'${ACTIVE_ENV}'"}}}'
exit 1
fi
fi
sleep 5
done
# Clean up the old environment
echo -e "${BLUE}Cleaning up the old ${INACTIVE_ENV} environment...${NC}"
kubectl delete deployment ${SERVICE_NAME}-${INACTIVE_ENV} -n ${NAMESPACE} --ignore-not-found=true
echo -e "${GREEN}蓝绿部署完成!${NC}"
echo -e "${GREEN}当前活跃环境: ${TARGET_ENV}${NC}"
echo -e "${GREEN}新镜像: ${NEW_IMAGE}${NC}"
Canary Release Strategy
A canary release reduces deployment risk by gradually shifting a growing share of traffic to the new version.
Istio Canary Release Configuration
# canary-virtualservice.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: user-service
namespace: microservices
spec:
hosts:
- user-service
http:
- match:
- headers:
canary:
exact: "true"
route:
- destination:
host: user-service
subset: v2
weight: 100
- route:
- destination:
host: user-service
subset: v1
weight: 90
- destination:
host: user-service
subset: v2
weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: user-service
namespace: microservices
spec:
host: user-service
subsets:
- name: v1
labels:
version: v1
- name: v2
labels:
version: v2
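With this routing in place, in-mesh clients can pin requests to v2 via the header match, while all other traffic follows the 90/10 weight split. A sketch from inside the microservices namespace (the /health path is illustrative):
# Header match takes priority: always routed to the v2 subset
curl -H "canary: true" http://user-service/health
# Everything else: 90% v1, 10% v2
curl http://user-service/health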
Canary Release Script
#!/bin/bash
# canary-deploy.sh
set -e
# Configuration parameters
NAMESPACE="microservices"
SERVICE_NAME="user-service"
NEW_IMAGE="$1"
CANARY_STEPS=(5 10 25 50 75 100)
STEP_DURATION="300" # 5分钟
ERROR_THRESHOLD="5" # 错误率阈值5%
echo "开始金丝雀发布: ${SERVICE_NAME}"
echo "新镜像: ${NEW_IMAGE}"
# 部署金丝雀版本
echo "部署金丝雀版本..."
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: ${SERVICE_NAME}-v2
namespace: ${NAMESPACE}
spec:
replicas: 1
selector:
matchLabels:
app: ${SERVICE_NAME}
version: v2
template:
metadata:
labels:
app: ${SERVICE_NAME}
version: v2
spec:
containers:
- name: ${SERVICE_NAME}
image: ${NEW_IMAGE}
ports:
- containerPort: 8080
livenessProbe:
httpGet:
path: /health
port: 8080
readinessProbe:
httpGet:
path: /ready
port: 8080
EOF
# Wait for the canary version to become ready
kubectl rollout status deployment/${SERVICE_NAME}-v2 -n ${NAMESPACE}
# Gradually increase traffic
for step in "${CANARY_STEPS[@]}"; do
echo "Setting canary traffic share: ${step}%"
# Update the VirtualService
cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: ${SERVICE_NAME}
namespace: ${NAMESPACE}
spec:
hosts:
- ${SERVICE_NAME}
http:
- route:
- destination:
host: ${SERVICE_NAME}
subset: v1
weight: $((100 - step))
- destination:
host: ${SERVICE_NAME}
subset: v2
weight: ${step}
EOF
echo "等待 ${STEP_DURATION} 秒观察指标..."
sleep ${STEP_DURATION}
# Check the error rate; the Istio metric is istio_requests_total, and promtool needs the Prometheus server URL
ERROR_RATE=$(kubectl exec -n istio-system deployment/prometheus -- \
promtool query instant http://localhost:9090 \
'rate(istio_requests_total{destination_service_name="'${SERVICE_NAME}'",response_code!~"2.."}[5m]) / rate(istio_requests_total{destination_service_name="'${SERVICE_NAME}'"}[5m]) * 100' \
| grep -oP '\d+\.\d+' | head -1 || echo "0")
echo "当前错误率: ${ERROR_RATE}%"
if (( $(echo "${ERROR_RATE} > ${ERROR_THRESHOLD}" | bc -l) )); then
echo "错误率超过阈值,回滚金丝雀发布"
kubectl delete deployment ${SERVICE_NAME}-v2 -n ${NAMESPACE}
# Restore the original traffic routing
cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: ${SERVICE_NAME}
namespace: ${NAMESPACE}
spec:
hosts:
- ${SERVICE_NAME}
http:
- route:
- destination:
host: ${SERVICE_NAME}
subset: v1
weight: 100
EOF
exit 1
fi
if [ "${step}" = "100" ]; then
echo "金丝雀发布成功,清理旧版本"
kubectl delete deployment ${SERVICE_NAME}-v1 -n ${NAMESPACE}
# 重命名新版本为主版本
kubectl patch deployment ${SERVICE_NAME}-v2 -n ${NAMESPACE} -p '{"metadata":{"name":"'${SERVICE_NAME}'"},"spec":{"selector":{"matchLabels":{"version":"v1"}},"template":{"metadata":{"labels":{"version":"v1"}}}}}'
break
fi
done
echo "金丝雀发布完成"
Automated Canary Releases with Flagger
# flagger-canary.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: user-service
namespace: microservices
spec:
# Target deployment
targetRef:
apiVersion: apps/v1
kind: Deployment
name: user-service
# Autoscaler reference
autoscalerRef:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
name: user-service
# Service configuration
service:
port: 80
targetPort: 8080
gateways:
- public-gateway.istio-system.svc.cluster.local
hosts:
- api.company.com
trafficPolicy:
tls:
mode: DISABLE
# Analysis configuration
analysis:
# Analysis interval
interval: 1m
# Analysis threshold (failed checks before rollback)
threshold: 5
# Maximum traffic weight
maxWeight: 50
# Traffic weight step size
stepWeight: 10
# Success-rate threshold
metrics:
- name: request-success-rate
thresholdRange:
min: 99
interval: 1m
- name: request-duration
thresholdRange:
max: 500
interval: 30s
# Webhook tests
webhooks:
- name: acceptance-test
type: pre-rollout
url: http://flagger-loadtester.test/
timeout: 30s
metadata:
type: bash
cmd: "curl -sd 'test' http://user-service-canary/api/health | grep healthy"
- name: load-test
url: http://flagger-loadtester.test/
timeout: 5s
metadata:
cmd: "hey -z 1m -q 10 -c 2 http://user-service-canary/api/health"
# Whether to skip analysis
skipAnalysis: false
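Once applied, Flagger creates the user-service-primary and user-service-canary workloads itself and records rollout progress on the Canary resource; one way to observe a release in flight:
# Watch the canary's traffic weight and status per analysis iteration
kubectl -n microservices get canary user-service
kubectl -n microservices describe canary user-service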
Rolling Update Strategy
# rolling-update-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: user-service
namespace: microservices
spec:
replicas: 6
strategy:
type: RollingUpdate
rollingUpdate:
# Maximum number of Pods that may be unavailable during the update
maxUnavailable: 1
# Maximum number of Pods allowed above the desired replica count
maxSurge: 2
selector:
matchLabels:
app: user-service
template:
metadata:
labels:
app: user-service
spec:
containers:
- name: user-service
image: harbor.company.com/microservices/user-service:v2.0.0
ports:
- containerPort: 8080
# The readiness probe ensures a Pod is ready to receive traffic
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
successThreshold: 1
failureThreshold: 3
# The liveness probe ensures a Pod keeps running healthily
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
# Graceful shutdown
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- sleep 15
# Resource limits
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
# Graceful shutdown period
terminationGracePeriodSeconds: 30
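With maxUnavailable: 1 and maxSurge: 2, at most one Pod is taken down while up to two extra Pods run during the transition. A sketch of triggering, watching, and undoing such an update (the v2.0.1 tag is illustrative):
kubectl -n microservices set image deployment/user-service \
  user-service=harbor.company.com/microservices/user-service:v2.0.1
kubectl -n microservices rollout status deployment/user-service
# Roll back to the previous revision if the new one misbehaves
kubectl -n microservices rollout undo deployment/user-service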
Multi-Environment CI/CD Pipelines
The configurations below extend the pipeline design shown earlier across development, staging, and production environments.
GitLab CI/CD Configuration
# .gitlab-ci.yml
stages:
- test
- build
- security
- deploy-dev
- deploy-staging
- deploy-prod
variables:
DOCKER_DRIVER: overlay2
DOCKER_TLS_CERTDIR: "/certs"
REGISTRY: harbor.company.com
PROJECT_NAME: microservices
SERVICE_NAME: user-service
KUBECONFIG_FILE: $KUBECONFIG_CONTENT
# Test stage
unit-test:
stage: test
image: golang:1.21-alpine
before_script:
- apk add --no-cache git
- go mod download
script:
- go test -v -race -coverprofile=coverage.out ./...
- go tool cover -html=coverage.out -o coverage.html
# Convert the Go profile to Cobertura XML so GitLab can pick up the coverage report below
- go run github.com/boumenot/gocover-cobertura@latest < coverage.out > coverage.xml
coverage: '/coverage: \d+\.\d+% of statements/'
artifacts:
reports:
coverage_report:
coverage_format: cobertura
path: coverage.xml
paths:
- coverage.html
expire_in: 1 week
only:
- merge_requests
- main
- develop
# Code quality checks
code-quality:
stage: test
image: golangci/golangci-lint:latest
script:
- golangci-lint run --out-format code-climate > gl-code-quality-report.json
artifacts:
reports:
codequality: gl-code-quality-report.json
expire_in: 1 week
only:
- merge_requests
- main
- develop
# Build the Docker image
build-image:
stage: build
image: docker:20.10.16
services:
- docker:20.10.16-dind
before_script:
- echo $HARBOR_PASSWORD | docker login $REGISTRY -u $HARBOR_USERNAME --password-stdin
script:
- |
if [ "$CI_COMMIT_REF_NAME" = "main" ]; then
TAG="latest"
else
TAG="$CI_COMMIT_SHORT_SHA"
fi
- docker build -t $REGISTRY/$PROJECT_NAME/$SERVICE_NAME:$TAG .
- docker push $REGISTRY/$PROJECT_NAME/$SERVICE_NAME:$TAG
- echo "IMAGE_TAG=$TAG" > build.env
artifacts:
reports:
dotenv: build.env
only:
- main
- develop
- /^release\/.*$/
# Security scan
security-scan:
stage: security
image: aquasec/trivy:latest
script:
- trivy image --format template --template "@contrib/gitlab.tpl" -o gl-container-scanning-report.json $REGISTRY/$PROJECT_NAME/$SERVICE_NAME:$IMAGE_TAG
artifacts:
reports:
container_scanning: gl-container-scanning-report.json
dependencies:
- build-image
only:
- main
- develop
- /^release\/.*$/
# Deploy to development
deploy-dev:
stage: deploy-dev
image: bitnami/kubectl:latest
environment:
name: development
url: https://dev-api.company.com
before_script:
- echo "$KUBECONFIG_DEV" | base64 -d > kubeconfig
- export KUBECONFIG=kubeconfig
script:
- |
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: $SERVICE_NAME
namespace: microservices-dev
labels:
app: $SERVICE_NAME
environment: development
spec:
replicas: 2
selector:
matchLabels:
app: $SERVICE_NAME
template:
metadata:
labels:
app: $SERVICE_NAME
environment: development
spec:
containers:
- name: $SERVICE_NAME
image: $REGISTRY/$PROJECT_NAME/$SERVICE_NAME:$IMAGE_TAG
ports:
- containerPort: 8080
env:
- name: ENVIRONMENT
value: development
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: database-secret
key: url
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
EOF
- kubectl rollout status deployment/$SERVICE_NAME -n microservices-dev --timeout=300s
dependencies:
- build-image
only:
- develop
# Deploy to staging
deploy-staging:
stage: deploy-staging
image: bitnami/kubectl:latest
environment:
name: staging
url: https://staging-api.company.com
before_script:
- echo "$KUBECONFIG_STAGING" | base64 -d > kubeconfig
- export KUBECONFIG=kubeconfig
script:
- |
# Deploy with Helm
helm upgrade --install $SERVICE_NAME ./helm/$SERVICE_NAME \
--namespace microservices-staging \
--set image.repository=$REGISTRY/$PROJECT_NAME/$SERVICE_NAME \
--set image.tag=$IMAGE_TAG \
--set environment=staging \
--set replicaCount=3 \
--set resources.requests.memory=256Mi \
--set resources.requests.cpu=250m \
--set resources.limits.memory=512Mi \
--set resources.limits.cpu=500m \
--wait --timeout=600s
dependencies:
- build-image
when: manual
only:
- /^release\/.*$/
# Deploy to production
deploy-prod:
stage: deploy-prod
image: bitnami/kubectl:latest
environment:
name: production
url: https://api.company.com
before_script:
- echo "$KUBECONFIG_PROD" | base64 -d > kubeconfig
- export KUBECONFIG=kubeconfig
script:
- |
# Use the blue-green deployment strategy
./scripts/blue-green-deploy.sh $REGISTRY/$PROJECT_NAME/$SERVICE_NAME:$IMAGE_TAG
dependencies:
- build-image
when: manual
allow_failure: false
only:
- main
Jenkins Pipeline Configuration
// Jenkinsfile
pipeline {
agent any
environment {
REGISTRY = 'harbor.company.com'
PROJECT_NAME = 'microservices'
SERVICE_NAME = 'user-service'
DOCKER_CREDENTIALS = credentials('harbor-credentials')
KUBECONFIG_DEV = credentials('kubeconfig-dev')
KUBECONFIG_STAGING = credentials('kubeconfig-staging')
KUBECONFIG_PROD = credentials('kubeconfig-prod')
}
stages {
stage('Checkout') {
steps {
checkout scm
script {
env.GIT_COMMIT_SHORT = sh(
script: 'git rev-parse --short HEAD',
returnStdout: true
).trim()
if (env.BRANCH_NAME == 'main') {
env.IMAGE_TAG = 'latest'
} else {
env.IMAGE_TAG = env.GIT_COMMIT_SHORT
}
}
}
}
stage('Test') {
parallel {
stage('Unit Tests') {
steps {
sh '''
go mod download
go test -v -race -coverprofile=coverage.out ./...
go tool cover -html=coverage.out -o coverage.html
'''
}
post {
always {
publishHTML([
allowMissing: false,
alwaysLinkToLastBuild: true,
keepAll: true,
reportDir: '.',
reportFiles: 'coverage.html',
reportName: 'Coverage Report'
])
}
}
}
stage('Code Quality') {
steps {
sh 'golangci-lint run --out-format checkstyle > checkstyle-report.xml'
}
post {
always {
recordIssues(
enabledForFailure: true,
tools: [checkStyle(pattern: 'checkstyle-report.xml')]
)
}
}
}
}
}
stage('Build') {
when {
anyOf {
branch 'main'
branch 'develop'
branch 'release/*'
}
}
steps {
script {
// withRegistry expects a credentials ID, not the resolved secret value
docker.withRegistry("https://${REGISTRY}", 'harbor-credentials') {
def image = docker.build("${REGISTRY}/${PROJECT_NAME}/${SERVICE_NAME}:${IMAGE_TAG}")
image.push()
if (env.BRANCH_NAME == 'main') {
image.push('latest')
}
}
}
}
}
stage('Security Scan') {
when {
anyOf {
branch 'main'
branch 'develop'
branch 'release/*'
}
}
steps {
sh '''
trivy image --format json -o trivy-report.json ${REGISTRY}/${PROJECT_NAME}/${SERVICE_NAME}:${IMAGE_TAG}
trivy image --format table ${REGISTRY}/${PROJECT_NAME}/${SERVICE_NAME}:${IMAGE_TAG}
'''
}
post {
always {
archiveArtifacts artifacts: 'trivy-report.json', fingerprint: true
}
}
}
stage('Deploy to Dev') {
when {
branch 'develop'
}
steps {
script {
withKubeConfig([credentialsId: 'kubeconfig-dev']) {
sh '''
kubectl set image deployment/${SERVICE_NAME} \
${SERVICE_NAME}=${REGISTRY}/${PROJECT_NAME}/${SERVICE_NAME}:${IMAGE_TAG} \
-n microservices-dev
kubectl rollout status deployment/${SERVICE_NAME} -n microservices-dev --timeout=300s
'''
}
}
}
}
stage('Deploy to Staging') {
when {
branch 'release/*'
}
steps {
input message: 'Deploy to Staging?', ok: 'Deploy'
script {
withKubeConfig([credentialsId: 'kubeconfig-staging']) {
sh '''
helm upgrade --install ${SERVICE_NAME} ./helm/${SERVICE_NAME} \
--namespace microservices-staging \
--set image.repository=${REGISTRY}/${PROJECT_NAME}/${SERVICE_NAME} \
--set image.tag=${IMAGE_TAG} \
--set environment=staging \
--wait --timeout=600s
'''
}
}
}
}
stage('Deploy to Production') {
when {
branch 'main'
}
steps {
input message: 'Deploy to Production?', ok: 'Deploy'
script {
withKubeConfig([credentialsId: 'kubeconfig-prod']) {
sh '''
chmod +x ./scripts/blue-green-deploy.sh
./scripts/blue-green-deploy.sh ${REGISTRY}/${PROJECT_NAME}/${SERVICE_NAME}:${IMAGE_TAG}
'''
}
}
}
}
}
post {
always {
cleanWs()
}
success {
slackSend(
channel: '#deployments',
color: 'good',
message: "✅ ${SERVICE_NAME} deployment successful: ${env.BUILD_URL}"
)
}
failure {
slackSend(
channel: '#deployments',
color: 'danger',
message: "❌ ${SERVICE_NAME} deployment failed: ${env.BUILD_URL}"
)
}
}
}
GitHub Actions Configuration
# .github/workflows/ci-cd.yml
name: CI/CD Pipeline
on:
push:
branches: [ main, develop ]
pull_request:
branches: [ main, develop ]
release:
types: [ published ]
env:
REGISTRY: harbor.company.com
PROJECT_NAME: microservices
SERVICE_NAME: user-service
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Go
uses: actions/setup-go@v3
with:
go-version: 1.21
- name: Cache Go modules
uses: actions/cache@v3
with:
path: ~/go/pkg/mod
key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
restore-keys: |
${{ runner.os }}-go-
- name: Download dependencies
run: go mod download
- name: Run tests
run: |
go test -v -race -coverprofile=coverage.out ./...
go tool cover -html=coverage.out -o coverage.html
- name: Upload coverage reports
uses: codecov/codecov-action@v3
with:
file: ./coverage.out
- name: Run golangci-lint
uses: golangci/golangci-lint-action@v3
with:
version: latest
build:
needs: test
runs-on: ubuntu-latest
if: github.event_name != 'pull_request'
outputs:
# metadata-action can emit several tags; expose a single tag for downstream jobs
image-tag: ${{ fromJSON(steps.meta.outputs.json).tags[0] }}
steps:
- uses: actions/checkout@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Login to Harbor
uses: docker/login-action@v2
with:
registry: ${{ env.REGISTRY }}
username: ${{ secrets.HARBOR_USERNAME }}
password: ${{ secrets.HARBOR_PASSWORD }}
- name: Extract metadata
id: meta
uses: docker/metadata-action@v4
with:
images: ${{ env.REGISTRY }}/${{ env.PROJECT_NAME }}/${{ env.SERVICE_NAME }}
tags: |
type=ref,event=branch
type=ref,event=pr
type=sha,prefix={{branch}}-
type=raw,value=latest,enable={{is_default_branch}}
- name: Build and push
uses: docker/build-push-action@v4
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
security-scan:
needs: build
runs-on: ubuntu-latest
if: github.event_name != 'pull_request'
steps:
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
image-ref: ${{ needs.build.outputs.image-tag }}
format: 'sarif'
output: 'trivy-results.sarif'
- name: Upload Trivy scan results
uses: github/codeql-action/upload-sarif@v2
with:
sarif_file: 'trivy-results.sarif'
deploy-dev:
needs: [build, security-scan]
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/develop'
environment: development
steps:
- uses: actions/checkout@v3
- name: Configure kubectl
uses: azure/k8s-set-context@v1
with:
method: kubeconfig
kubeconfig: ${{ secrets.KUBECONFIG_DEV }}
- name: Deploy to development
run: |
kubectl set image deployment/${{ env.SERVICE_NAME }} \
${{ env.SERVICE_NAME }}=${{ needs.build.outputs.image-tag }} \
-n microservices-dev
kubectl rollout status deployment/${{ env.SERVICE_NAME }} -n microservices-dev --timeout=300s
deploy-staging:
needs: [build, security-scan]
runs-on: ubuntu-latest
if: github.event_name == 'release'
environment: staging
steps:
- uses: actions/checkout@v3
- name: Configure kubectl
uses: azure/k8s-set-context@v1
with:
method: kubeconfig
kubeconfig: ${{ secrets.KUBECONFIG_STAGING }}
- name: Install Helm
uses: azure/setup-helm@v3
with:
version: '3.10.0'
- name: Deploy to staging
run: |
helm upgrade --install ${{ env.SERVICE_NAME }} ./helm/${{ env.SERVICE_NAME }} \
--namespace microservices-staging \
--set image.repository=${{ env.REGISTRY }}/${{ env.PROJECT_NAME }}/${{ env.SERVICE_NAME }} \
--set image.tag=${{ github.event.release.tag_name }} \
--set environment=staging \
--wait --timeout=600s
deploy-prod:
needs: [build, security-scan]
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main'
environment: production
steps:
- uses: actions/checkout@v3
- name: Configure kubectl
uses: azure/k8s-set-context@v1
with:
method: kubeconfig
kubeconfig: ${{ secrets.KUBECONFIG_PROD }}
- name: Deploy to production
run: |
chmod +x ./scripts/blue-green-deploy.sh
./scripts/blue-green-deploy.sh ${{ needs.build.outputs.image-tag }}
Operations Automation
Helm Chart Templates
# helm/user-service/Chart.yaml
apiVersion: v2
name: user-service
description: A Helm chart for User Service
type: application
version: 0.1.0
appVersion: "1.0.0"
dependencies:
- name: postgresql
version: "11.9.13"
repository: "https://charts.bitnami.com/bitnami"
condition: postgresql.enabled
- name: redis
version: "17.3.7"
repository: "https://charts.bitnami.com/bitnami"
condition: redis.enabled
# helm/user-service/values.yaml
# Default configuration values
replicaCount: 3
image:
repository: harbor.company.com/microservices/user-service
pullPolicy: IfNotPresent
tag: "latest"
imagePullSecrets:
- name: harbor-secret
nameOverride: ""
fullnameOverride: ""
serviceAccount:
create: true
annotations: {}
name: ""
podAnnotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
podSecurityContext:
fsGroup: 2000
securityContext:
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
service:
type: ClusterIP
port: 80
targetPort: 8080
ingress:
enabled: true
className: "nginx"
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
cert-manager.io/cluster-issuer: "letsencrypt-prod"
hosts:
- host: api.company.com
paths:
- path: /api/users
pathType: Prefix
tls:
- secretName: api-tls
hosts:
- api.company.com
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 250m
memory: 256Mi
autoscaling:
enabled: true
minReplicas: 3
maxReplicas: 10
targetCPUUtilizationPercentage: 80
targetMemoryUtilizationPercentage: 80
nodeSelector: {}
tolerations: []
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: In
values:
- user-service
topologyKey: kubernetes.io/hostname
# Environment configuration
environment: production
# Config map values
config:
database:
host: postgresql
port: 5432
name: userdb
sslmode: require
redis:
host: redis-master
port: 6379
db: 0
logging:
level: info
format: json
metrics:
enabled: true
port: 8080
path: /metrics
# Secret values
secrets:
database:
username: postgres
password: ""
redis:
password: ""
jwt:
secret: ""
# Health checks
healthCheck:
livenessProbe:
httpGet:
path: /health
port: http
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: http
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
successThreshold: 1
failureThreshold: 3
# Dependent service configuration
postgresql:
enabled: true
auth:
postgresPassword: "postgres123"
username: "userservice"
password: "userservice123"
database: "userdb"
primary:
persistence:
enabled: true
size: 8Gi
redis:
enabled: true
auth:
enabled: true
password: "redis123"
master:
persistence:
enabled: true
size: 8Gi
# helm/user-service/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "user-service.fullname" . }}
labels:
{{- include "user-service.labels" . | nindent 4 }}
spec:
{{- if not .Values.autoscaling.enabled }}
replicas: {{ .Values.replicaCount }}
{{- end }}
selector:
matchLabels:
{{- include "user-service.selectorLabels" . | nindent 6 }}
template:
metadata:
annotations:
checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
{{- with .Values.podAnnotations }}
{{- toYaml . | nindent 8 }}
{{- end }}
labels:
{{- include "user-service.selectorLabels" . | nindent 8 }}
spec:
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
serviceAccountName: {{ include "user-service.serviceAccountName" . }}
securityContext:
{{- toYaml .Values.podSecurityContext | nindent 8 }}
containers:
- name: {{ .Chart.Name }}
securityContext:
{{- toYaml .Values.securityContext | nindent 12 }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- name: http
containerPort: 8080
protocol: TCP
env:
- name: ENVIRONMENT
value: {{ .Values.environment }}
- name: DATABASE_HOST
value: {{ .Values.config.database.host }}
- name: DATABASE_PORT
value: "{{ .Values.config.database.port }}"
- name: DATABASE_NAME
value: {{ .Values.config.database.name }}
- name: DATABASE_USERNAME
valueFrom:
secretKeyRef:
name: {{ include "user-service.fullname" . }}-secret
key: database-username
- name: DATABASE_PASSWORD
valueFrom:
secretKeyRef:
name: {{ include "user-service.fullname" . }}-secret
key: database-password
- name: REDIS_HOST
value: {{ .Values.config.redis.host }}
- name: REDIS_PORT
value: "{{ .Values.config.redis.port }}"
- name: REDIS_PASSWORD
valueFrom:
secretKeyRef:
name: {{ include "user-service.fullname" . }}-secret
key: redis-password
- name: JWT_SECRET
valueFrom:
secretKeyRef:
name: {{ include "user-service.fullname" . }}-secret
key: jwt-secret
livenessProbe:
{{- toYaml .Values.healthCheck.livenessProbe | nindent 12 }}
readinessProbe:
{{- toYaml .Values.healthCheck.readinessProbe | nindent 12 }}
resources:
{{- toYaml .Values.resources | nindent 12 }}
volumeMounts:
- name: config
mountPath: /app/config
readOnly: true
- name: tmp
mountPath: /tmp
volumes:
- name: config
configMap:
name: {{ include "user-service.fullname" . }}-config
- name: tmp
emptyDir: {}
{{- with .Values.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
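The include calls above (user-service.fullname, user-service.labels, and so on) resolve to named templates that conventionally live in templates/_helpers.tpl. A minimal sketch consistent with what helm create generates:
{{/* helm/user-service/templates/_helpers.tpl (abridged) */}}
{{- define "user-service.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- define "user-service.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name (include "user-service.name" .) | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- define "user-service.selectorLabels" -}}
app.kubernetes.io/name: {{ include "user-service.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
{{- define "user-service.labels" -}}
{{ include "user-service.selectorLabels" . }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
{{- define "user-service.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "user-service.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}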
Terraform Infrastructure as Code
# terraform/main.tf
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.23"
}
helm = {
source = "hashicorp/helm"
version = "~> 2.11"
}
}
backend "s3" {
bucket = "company-terraform-state"
key = "microservices/terraform.tfstate"
region = "us-west-2"
}
}
provider "aws" {
region = var.aws_region
}
provider "kubernetes" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
}
}
provider "helm" {
kubernetes {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
}
}
}
# VPC configuration
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
name = "${var.project_name}-vpc"
cidr = var.vpc_cidr
azs = var.availability_zones
private_subnets = var.private_subnets
public_subnets = var.public_subnets
enable_nat_gateway = true
enable_vpn_gateway = false
enable_dns_hostnames = true
enable_dns_support = true
tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
}
public_subnet_tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
"kubernetes.io/role/elb" = "1"
}
private_subnet_tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
"kubernetes.io/role/internal-elb" = "1"
}
}
# EKS cluster
module "eks" {
source = "terraform-aws-modules/eks/aws"
cluster_name = var.cluster_name
cluster_version = var.kubernetes_version
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
cluster_endpoint_public_access = true
# Node group configuration
eks_managed_node_groups = {
main = {
name = "main-node-group"
instance_types = ["t3.medium"]
min_size = 3
max_size = 10
desired_size = 6
disk_size = 50
labels = {
Environment = var.environment
NodeGroup = "main"
}
taints = []
tags = {
Environment = var.environment
}
}
}
# Cluster add-ons
cluster_addons = {
coredns = {
most_recent = true
}
kube-proxy = {
most_recent = true
}
vpc-cni = {
most_recent = true
}
aws-ebs-csi-driver = {
most_recent = true
}
}
tags = {
Environment = var.environment
Project = var.project_name
}
}
# RDS database
resource "aws_db_subnet_group" "main" {
name = "${var.project_name}-db-subnet-group"
subnet_ids = module.vpc.private_subnets
tags = {
Name = "${var.project_name} DB subnet group"
}
}
resource "aws_security_group" "rds" {
name_prefix = "${var.project_name}-rds-"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 5432
to_port = 5432
protocol = "tcp"
cidr_blocks = [var.vpc_cidr]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-rds-sg"
}
}
resource "aws_db_instance" "main" {
identifier = "${var.project_name}-db"
engine = "postgres"
engine_version = "14.9"
instance_class = "db.t3.micro"
allocated_storage = 20
max_allocated_storage = 100
storage_type = "gp2"
storage_encrypted = true
db_name = "microservices"
username = "postgres"
password = var.db_password
vpc_security_group_ids = [aws_security_group.rds.id]
db_subnet_group_name = aws_db_subnet_group.main.name
backup_retention_period = 7
backup_window = "03:00-04:00"
maintenance_window = "sun:04:00-sun:05:00"
skip_final_snapshot = true
deletion_protection = false
tags = {
Name = "${var.project_name}-database"
}
}
# ElastiCache Redis
resource "aws_elasticache_subnet_group" "main" {
name = "${var.project_name}-cache-subnet"
subnet_ids = module.vpc.private_subnets
}
resource "aws_security_group" "redis" {
name_prefix = "${var.project_name}-redis-"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 6379
to_port = 6379
protocol = "tcp"
cidr_blocks = [var.vpc_cidr]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-redis-sg"
}
}
resource "aws_elasticache_replication_group" "main" {
replication_group_id = "${var.project_name}-redis"
description = "Redis cluster for ${var.project_name}"
node_type = "cache.t3.micro"
port = 6379
parameter_group_name = "default.redis7"
num_cache_clusters = 2
subnet_group_name = aws_elasticache_subnet_group.main.name
security_group_ids = [aws_security_group.redis.id]
at_rest_encryption_enabled = true
transit_encryption_enabled = true
auth_token = var.redis_auth_token
tags = {
Name = "${var.project_name}-redis"
}
}
# terraform/variables.tf
variable "aws_region" {
description = "AWS region"
type = string
default = "us-west-2"
}
variable "project_name" {
description = "Project name"
type = string
default = "microservices"
}
variable "environment" {
description = "Environment name"
type = string
default = "production"
}
variable "cluster_name" {
description = "EKS cluster name"
type = string
default = "microservices-cluster"
}
variable "kubernetes_version" {
description = "Kubernetes version"
type = string
default = "1.28"
}
variable "vpc_cidr" {
description = "VPC CIDR block"
type = string
default = "10.0.0.0/16"
}
variable "availability_zones" {
description = "Availability zones"
type = list(string)
default = ["us-west-2a", "us-west-2b", "us-west-2c"]
}
variable "private_subnets" {
description = "Private subnet CIDR blocks"
type = list(string)
default = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}
variable "public_subnets" {
description = "Public subnet CIDR blocks"
type = list(string)
default = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
}
variable "db_password" {
description = "Database password"
type = string
sensitive = true
}
variable "redis_auth_token" {
description = "Redis auth token"
type = string
sensitive = true
}
Ansible Operations Automation
# ansible/playbooks/deploy-microservices.yml
---
- name: Deploy Microservices to Kubernetes
hosts: localhost
connection: local
gather_facts: false
vars:
namespace: microservices
services:
- name: user-service
image: harbor.company.com/microservices/user-service
tag: "{{ user_service_tag | default('latest') }}"
replicas: 3
- name: order-service
image: harbor.company.com/microservices/order-service
tag: "{{ order_service_tag | default('latest') }}"
replicas: 3
- name: payment-service
image: harbor.company.com/microservices/payment-service
tag: "{{ payment_service_tag | default('latest') }}"
replicas: 2
tasks:
- name: Create namespace
kubernetes.core.k8s:
name: "{{ namespace }}"
api_version: v1
kind: Namespace
state: present
- name: Deploy services
kubernetes.core.helm:
name: "{{ item.name }}"
chart_ref: "./helm/{{ item.name }}"
release_namespace: "{{ namespace }}"
create_namespace: true
values:
image:
repository: "{{ item.image }}"
tag: "{{ item.tag }}"
replicaCount: "{{ item.replicas }}"
environment: "{{ target_environment | default('production') }}"
wait: true
timeout: 600s
loop: "{{ services }}"
- name: Wait for deployments to be ready
kubernetes.core.k8s_info:
api_version: apps/v1
kind: Deployment
name: "{{ item.name }}"
namespace: "{{ namespace }}"
wait: true
wait_condition:
type: Available
status: "True"
wait_timeout: 600
loop: "{{ services }}"
- name: Verify service health
  uri:
    url: "http://{{ item.name }}.{{ namespace }}.svc.cluster.local/health"
    method: GET
    status_code: 200
  register: health_check
  # retries only take effect together with an until condition
  until: health_check.status == 200
  retries: 5
  delay: 10
  loop: "{{ services }}"
# ansible/playbooks/backup-databases.yml
---
- name: Backup Databases
hosts: localhost
connection: local
gather_facts: true # required for the ansible_date_time facts used below
vars:
backup_bucket: company-database-backups
timestamp: "{{ ansible_date_time.epoch }}"
databases:
- name: userdb
host: postgres.microservices.svc.cluster.local
port: 5432
username: postgres
- name: orderdb
host: postgres.microservices.svc.cluster.local
port: 5432
username: postgres
tasks:
- name: Create backup directory
file:
path: "/tmp/backups/{{ timestamp }}"
state: directory
mode: '0755'
- name: Backup PostgreSQL databases
  # PGPASSWORD must be visible to pg_dump inside the Pod; a controller-side
  # environment: block would not reach the process spawned by kubectl exec,
  # so it is injected on the exec command line instead.
  shell: |
    kubectl exec -n microservices deployment/postgres -- \
    env PGPASSWORD='{{ postgres_password }}' \
    pg_dump -h localhost -U {{ item.username }} -d {{ item.name }} \
    > /tmp/backups/{{ timestamp }}/{{ item.name }}_{{ timestamp }}.sql
  loop: "{{ databases }}"
- name: Compress backups
archive:
path: "/tmp/backups/{{ timestamp }}"
dest: "/tmp/backups/backup_{{ timestamp }}.tar.gz"
format: gz
- name: Upload to S3
amazon.aws.s3_object:
bucket: "{{ backup_bucket }}"
object: "database-backups/{{ ansible_date_time.date }}/backup_{{ timestamp }}.tar.gz"
src: "/tmp/backups/backup_{{ timestamp }}.tar.gz"
mode: put
- name: Clean up local backups
file:
path: "/tmp/backups"
state: absent
- name: Send notification
uri:
url: "{{ slack_webhook_url }}"
method: POST
body_format: json
body:
text: "✅ Database backup completed successfully at {{ ansible_date_time.iso8601 }}"
when: slack_webhook_url is defined
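A sketch of invoking the deployment playbook; the tag values are illustrative, and the variable names match the playbook's defaults:
ansible-playbook ansible/playbooks/deploy-microservices.yml \
  -e user_service_tag=v1.2.0 \
  -e order_service_tag=v1.1.3 \
  -e payment_service_tag=v1.0.7 \
  -e target_environment=staging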
Chapter Summary
This chapter explored microservice deployment and operations in depth, covering the key operational practices of modern microservices architectures:
Key Topics Reviewed
Deployment and Operations Challenges
- Complexity management: multi-service coordination, dependency management, configuration management
- Reliability guarantees: high availability, failure recovery, data consistency
- Operational efficiency: automated deployment, monitoring and alerting, problem diagnosis
Containerized Deployment Strategies
- Docker containerization: multi-stage builds, image optimization, secure configuration
- Docker Compose: local development environments, service orchestration
- Container best practices: image layering, resource limits, health checks
Kubernetes Deployment Configuration
- Core resources: Deployment, Service, ConfigMap, Secret
- Advanced features: HPA autoscaling, RBAC access control
- Production configuration: resource limits, affinity rules, security contexts
Blue-Green Deployment and Canary Releases
- Blue-green deployment: zero-downtime switching, fast rollback, environment isolation
- Canary releases: progressive rollout, risk control, automated rollback
- Istio integration: traffic management, A/B testing, Flagger automation
CI/CD Pipelines
- GitLab CI/CD: a complete DevOps pipeline with multi-environment deployment
- Jenkins Pipeline: enterprise-grade CI/CD with a rich plugin ecosystem
- GitHub Actions: cloud-native CI/CD with community integrations
Operations Automation
- Helm charts: application packaging, templated deployment, dependency management
- Terraform: infrastructure as code, cloud resource management
- Ansible: configuration management, automated operations, batch tasks
Best Practices
Choosing a Deployment Strategy
- Choose a deployment strategy that matches the characteristics of the business
- Combine monitoring metrics with automated decision making
- Establish a solid rollback mechanism
Infrastructure Management
- Manage cloud resources with infrastructure as code
- Enforce consistent configuration across environments
- Establish a disaster recovery plan
Operations Automation
- Automate routine operational tasks
- Establish standardized operational processes
- Practice continuous improvement
Security and Compliance
- Run container security scanning
- Establish access control and audit mechanisms
- Follow security best practices
Key Techniques
- Containerization: Docker multi-stage builds, image optimization, secure configuration
- Orchestration: Kubernetes resource management, service discovery, configuration management
- Deployment strategies: blue-green deployment, canary releases, rolling updates
- CI/CD: automated testing, builds, deployment, and monitoring
- Infrastructure: IaC, cloud native, elastic scaling
- Operations automation: configuration management, backup and recovery, monitoring and alerting
After working through this chapter, you should be able to: design and implement a complete microservice deployment scheme; choose and configure a suitable deployment strategy; build an automated CI/CD pipeline; manage infrastructure as code; and establish an efficient operations automation system.
In the next chapter we turn to microservice performance optimization and tuning, and learn how to improve the performance and efficiency of a microservices system.