10.1 项目概述

10.1.1 项目背景

本章将通过一个完整的实战项目,展示如何使用Caddy构建一个现代化的企业级Web服务架构。我们将构建一个包含以下组件的系统:

  • 前端应用:React SPA应用
  • API网关:统一的API入口
  • 微服务:多个后端服务
  • 数据库:PostgreSQL和Redis
  • 监控系统:Prometheus和Grafana
  • 日志系统:ELK Stack

10.1.2 架构设计

┌─────────────────┐    ┌─────────────────┐
│   用户/客户端    │────│   CDN/负载均衡   │
└─────────────────┘    └─────────────────┘
                                │
                       ┌─────────────────┐
                       │   Caddy网关     │
                       │  (SSL终止)      │
                       └─────────────────┘
                                │
        ┌───────────────────────┼───────────────────────┐
        │                       │                       │
┌─────────────┐        ┌─────────────┐        ┌─────────────┐
│  前端应用   │        │  API服务    │        │  管理后台   │
│  (React)    │        │ (微服务集群) │        │  (Admin)    │
└─────────────┘        └─────────────┘        └─────────────┘
                                │
                ┌───────────────┼───────────────┐
                │               │               │
        ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
        │  用户服务   │ │  订单服务   │ │  支付服务   │
        │ (User API)  │ │(Order API)  │ │(Payment API)│
        └─────────────┘ └─────────────┘ └─────────────┘
                │               │               │
        ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
        │ PostgreSQL  │ │    Redis    │ │  消息队列   │
        │   数据库    │ │    缓存     │ │ (RabbitMQ)  │
        └─────────────┘ └─────────────┘ └─────────────┘

10.1.3 技术栈

  • Web服务器:Caddy v2
  • 前端:React + TypeScript
  • 后端:Go微服务
  • 数据库:PostgreSQL + Redis
  • 监控:Prometheus + Grafana
  • 日志:Elasticsearch + Logstash + Kibana
  • 容器化:Docker + Docker Compose

10.2 环境准备

10.2.1 目录结构

ecommerce-platform/
├── caddy/
│   ├── Caddyfile
│   ├── config/
│   │   ├── api.json
│   │   └── tls.json
│   └── logs/
├── frontend/
│   ├── public/
│   ├── src/
│   ├── package.json
│   └── Dockerfile
├── services/
│   ├── user-service/
│   ├── order-service/
│   ├── payment-service/
│   └── gateway/
├── infrastructure/
│   ├── docker-compose.yml
│   ├── prometheus/
│   ├── grafana/
│   └── elk/
├── scripts/
│   ├── deploy.sh
│   ├── backup.sh
│   └── monitoring.sh
└── docs/
    ├── api.md
    └── deployment.md

10.2.2 Docker Compose配置

# infrastructure/docker-compose.yml
version: '3.8'

services:
  # Caddy Web服务器
  caddy:
    image: caddy:2-alpine
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
      - "2019:2019"  # Admin API
    volumes:
      - ./caddy/Caddyfile:/etc/caddy/Caddyfile
      - ./caddy/config:/config
      - ./caddy/data:/data
      - ./caddy/logs:/var/log/caddy
      - ./frontend/dist:/srv/frontend
    networks:
      - web
      - internal
    depends_on:
      - user-service
      - order-service
      - payment-service

  # 前端应用构建
  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile
    container_name: frontend-build
    volumes:
      - ./frontend/dist:/app/dist
    command: npm run build

  # 用户服务
  user-service:
    build:
      context: ./services/user-service
      dockerfile: Dockerfile
    container_name: user-service
    restart: unless-stopped
    environment:
      - DB_HOST=postgres
      - DB_PORT=5432
      - DB_NAME=userdb
      - DB_USER=postgres
      - DB_PASSWORD=password
      - REDIS_HOST=redis
      - REDIS_PORT=6379
    networks:
      - internal
    depends_on:
      - postgres
      - redis

  # 订单服务
  order-service:
    build:
      context: ./services/order-service
      dockerfile: Dockerfile
    container_name: order-service
    restart: unless-stopped
    environment:
      - DB_HOST=postgres
      - DB_PORT=5432
      - DB_NAME=orderdb
      - DB_USER=postgres
      - DB_PASSWORD=password
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - RABBITMQ_URL=amqp://guest:guest@rabbitmq:5672/
    networks:
      - internal
    depends_on:
      - postgres
      - redis
      - rabbitmq

  # 支付服务
  payment-service:
    build:
      context: ./services/payment-service
      dockerfile: Dockerfile
    container_name: payment-service
    restart: unless-stopped
    environment:
      - DB_HOST=postgres
      - DB_PORT=5432
      - DB_NAME=paymentdb
      - DB_USER=postgres
      - DB_PASSWORD=password
      - REDIS_HOST=redis
      - REDIS_PORT=6379
    networks:
      - internal
    depends_on:
      - postgres
      - redis

  # PostgreSQL数据库
  postgres:
    image: postgres:13-alpine
    container_name: postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=password
      - POSTGRES_MULTIPLE_DATABASES=userdb,orderdb,paymentdb
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./infrastructure/postgres/init:/docker-entrypoint-initdb.d
    networks:
      - internal

  # Redis缓存
  redis:
    image: redis:6-alpine
    container_name: redis
    restart: unless-stopped
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data
    networks:
      - internal

  # RabbitMQ消息队列
  rabbitmq:
    image: rabbitmq:3-management-alpine
    container_name: rabbitmq
    restart: unless-stopped
    environment:
      - RABBITMQ_DEFAULT_USER=guest
      - RABBITMQ_DEFAULT_PASS=guest
    ports:
      - "15672:15672"  # 管理界面
    volumes:
      - rabbitmq_data:/var/lib/rabbitmq
    networks:
      - internal

  # Prometheus监控
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./infrastructure/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    networks:
      - internal

  # Grafana仪表板
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana
      - ./infrastructure/grafana/dashboards:/etc/grafana/provisioning/dashboards
      - ./infrastructure/grafana/datasources:/etc/grafana/provisioning/datasources
    networks:
      - internal

  # Elasticsearch
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.14.0
    container_name: elasticsearch
    restart: unless-stopped
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data
    networks:
      - internal

  # Logstash
  logstash:
    image: docker.elastic.co/logstash/logstash:7.14.0
    container_name: logstash
    restart: unless-stopped
    volumes:
      - ./infrastructure/elk/logstash/pipeline:/usr/share/logstash/pipeline
      - ./caddy/logs:/var/log/caddy:ro
    networks:
      - internal
    depends_on:
      - elasticsearch

  # Kibana
  kibana:
    image: docker.elastic.co/kibana/kibana:7.14.0
    container_name: kibana
    restart: unless-stopped
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    networks:
      - internal
    depends_on:
      - elasticsearch

volumes:
  postgres_data:
  redis_data:
  rabbitmq_data:
  prometheus_data:
  grafana_data:
  elasticsearch_data:

networks:
  web:
    external: true
  internal:
    driver: bridge

10.3 Caddy配置

10.3.1 主配置文件

# caddy/Caddyfile
{
    # 全局配置
    admin localhost:2019
    
    # 日志配置
    log {
        output file /var/log/caddy/access.log {
            roll_size 100mb
            roll_keep 5
            roll_keep_for 720h
        }
        format json
        level INFO
    }
    
    # 错误日志
    log error {
        output file /var/log/caddy/error.log {
            roll_size 100mb
            roll_keep 5
        }
        format json
        level ERROR
    }
    
    # 自动HTTPS配置
    email admin@ecommerce-platform.com
    
    # 安全头部
    header {
        # 安全头部
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Content-Type-Options "nosniff"
        X-Frame-Options "DENY"
        X-XSS-Protection "1; mode=block"
        Referrer-Policy "strict-origin-when-cross-origin"
        
        # 隐藏服务器信息
        -Server
    }
}

# 主域名 - 前端应用
ecommerce-platform.com {
    # 根目录指向前端构建文件
    root * /srv/frontend
    
    # 启用压缩
    encode gzip zstd
    
    # API路由代理到后端服务
    handle /api/users/* {
        reverse_proxy user-service:8080 {
            # 健康检查
            health_uri /health
            health_interval 30s
            health_timeout 5s
            
            # 负载均衡
            lb_policy least_conn
            
            # 重试配置
            lb_try_duration 30s
            lb_try_interval 250ms
        }
    }
    
    handle /api/orders/* {
        reverse_proxy order-service:8080 {
            health_uri /health
            health_interval 30s
            health_timeout 5s
            lb_policy least_conn
        }
    }
    
    handle /api/payments/* {
        reverse_proxy payment-service:8080 {
            health_uri /health
            health_interval 30s
            health_timeout 5s
            lb_policy least_conn
        }
    }
    
    # WebSocket支持
    handle /ws/* {
        reverse_proxy user-service:8080
    }
    
    # 静态文件服务
    handle /static/* {
        file_server {
            # 启用预压缩
            precompressed gzip br
        }
        
        # 缓存配置
        header {
            Cache-Control "public, max-age=31536000, immutable"
        }
    }
    
    # SPA路由支持
    handle {
        try_files {path} /index.html
        file_server
    }
    
    # 限流配置
    rate_limit {
        zone dynamic {
            key {remote_host}
            events 100
            window 1m
        }
    }
    
    # 请求日志
    log {
        output file /var/log/caddy/ecommerce-access.log {
            roll_size 100mb
            roll_keep 10
        }
        format json {
            time_format "iso8601"
            message_key "message"
        }
    }
}

# API子域名
api.ecommerce-platform.com {
    # CORS配置
    header {
        Access-Control-Allow-Origin "https://ecommerce-platform.com"
        Access-Control-Allow-Methods "GET, POST, PUT, DELETE, OPTIONS"
        Access-Control-Allow-Headers "Content-Type, Authorization"
        Access-Control-Max-Age "86400"
    }
    
    # 处理预检请求
    @options method OPTIONS
    respond @options 204
    
    # JWT认证中间件
    jwt {
        primary yes
        trusted_tokens {
            static_secret "your-jwt-secret-key"
        }
        auth_url /api/auth/verify
        allow_guests /api/auth/login /api/auth/register /api/health
    }
    
    # API网关路由
    handle /users/* {
        uri strip_prefix /users
        reverse_proxy user-service:8080 {
            header_up Host {upstream_hostport}
            header_up X-Real-IP {remote_host}
            header_up X-Forwarded-For {remote_host}
            header_up X-Forwarded-Proto {scheme}
        }
    }
    
    handle /orders/* {
        uri strip_prefix /orders
        reverse_proxy order-service:8080 {
            header_up Host {upstream_hostport}
            header_up X-Real-IP {remote_host}
            header_up X-Forwarded-For {remote_host}
            header_up X-Forwarded-Proto {scheme}
        }
    }
    
    handle /payments/* {
        uri strip_prefix /payments
        reverse_proxy payment-service:8080 {
            header_up Host {upstream_hostport}
            header_up X-Real-IP {remote_host}
            header_up X-Forwarded-For {remote_host}
            header_up X-Forwarded-Proto {scheme}
        }
    }
    
    # API限流
    rate_limit {
        zone api {
            key {http.request.header.authorization}
            events 1000
            window 1h
        }
    }
}

# 管理后台
admin.ecommerce-platform.com {
    # 基本认证
    basicauth {
        admin $2a$14$hashed_password_here
    }
    
    # IP白名单
    @allowed remote_ip 10.0.0.0/8 192.168.0.0/16 172.16.0.0/12
    abort @allowed
    
    # 管理界面代理
    handle /grafana/* {
        uri strip_prefix /grafana
        reverse_proxy grafana:3000
    }
    
    handle /prometheus/* {
        uri strip_prefix /prometheus
        reverse_proxy prometheus:9090
    }
    
    handle /kibana/* {
        uri strip_prefix /kibana
        reverse_proxy kibana:5601
    }
    
    handle /rabbitmq/* {
        uri strip_prefix /rabbitmq
        reverse_proxy rabbitmq:15672
    }
    
    # 默认重定向到Grafana
    redir / /grafana/
}

# 监控端点
monitoring.ecommerce-platform.com {
    # Prometheus指标
    handle /metrics {
        metrics
    }
    
    # 健康检查
    handle /health {
        respond "OK" 200
    }
    
    # Caddy配置API
    handle /config/* {
        reverse_proxy localhost:2019
    }
}

# 开发环境配置
dev.ecommerce-platform.com {
    # 开发模式配置
    tls internal
    
    # 热重载支持
    handle /sockjs-node/* {
        reverse_proxy localhost:3000
    }
    
    handle {
        reverse_proxy localhost:3000
    }
}

10.3.2 JSON配置文件

{
  "admin": {
    "listen": "localhost:2019"
  },
  "logging": {
    "logs": {
      "default": {
        "level": "INFO",
        "writer": {
          "output": "file",
          "filename": "/var/log/caddy/caddy.log",
          "roll": true,
          "roll_size_mb": 100,
          "roll_keep": 5
        },
        "encoder": {
          "format": "json",
          "time_format": "iso8601"
        }
      }
    }
  },
  "apps": {
    "http": {
      "servers": {
        "main": {
          "listen": [":80", ":443"],
          "routes": [
            {
              "match": [
                {
                  "host": ["ecommerce-platform.com"]
                }
              ],
              "handle": [
                {
                  "handler": "subroute",
                  "routes": [
                    {
                      "match": [
                        {
                          "path": ["/api/users/*"]
                        }
                      ],
                      "handle": [
                        {
                          "handler": "reverse_proxy",
                          "upstreams": [
                            {
                              "dial": "user-service:8080"
                            }
                          ],
                          "health_checks": {
                            "active": {
                              "uri": "/health",
                              "interval": "30s",
                              "timeout": "5s"
                            }
                          }
                        }
                      ]
                    },
                    {
                      "match": [
                        {
                          "path": ["/api/orders/*"]
                        }
                      ],
                      "handle": [
                        {
                          "handler": "reverse_proxy",
                          "upstreams": [
                            {
                              "dial": "order-service:8080"
                            }
                          ]
                        }
                      ]
                    },
                    {
                      "handle": [
                        {
                          "handler": "file_server",
                          "root": "/srv/frontend",
                          "index_names": ["index.html"]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ],
          "automatic_https": {
            "disable": false
          }
        }
      }
    },
    "tls": {
      "automation": {
        "policies": [
          {
            "subjects": [
              "ecommerce-platform.com",
              "*.ecommerce-platform.com"
            ],
            "issuers": [
              {
                "module": "acme",
                "ca": "https://acme-v02.api.letsencrypt.org/directory",
                "email": "admin@ecommerce-platform.com"
              }
            ]
          }
        ]
      }
    }
  }
}

10.4 微服务实现

10.4.1 用户服务

// services/user-service/main.go
package main

import (
    "context"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "os"
    "time"
    
    "github.com/gorilla/mux"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
    "gorm.io/driver/postgres"
    "gorm.io/gorm"
    "github.com/go-redis/redis/v8"
)

type User struct {
    ID        uint      `json:"id" gorm:"primaryKey"`
    Username  string    `json:"username" gorm:"uniqueIndex"`
    Email     string    `json:"email" gorm:"uniqueIndex"`
    Password  string    `json:"-"`
    CreatedAt time.Time `json:"created_at"`
    UpdatedAt time.Time `json:"updated_at"`
}

type UserService struct {
    db    *gorm.DB
    redis *redis.Client
    
    // Prometheus指标
    requestsTotal   *prometheus.CounterVec
    requestDuration *prometheus.HistogramVec
}

func NewUserService() *UserService {
    // 数据库连接
    dsn := fmt.Sprintf("host=%s port=%s user=%s password=%s dbname=%s sslmode=disable",
        os.Getenv("DB_HOST"),
        os.Getenv("DB_PORT"),
        os.Getenv("DB_USER"),
        os.Getenv("DB_PASSWORD"),
        os.Getenv("DB_NAME"),
    )
    
    db, err := gorm.Open(postgres.Open(dsn), &gorm.Config{})
    if err != nil {
        log.Fatal("Failed to connect to database:", err)
    }
    
    // 自动迁移
    db.AutoMigrate(&User{})
    
    // Redis连接
    rdb := redis.NewClient(&redis.Options{
        Addr: fmt.Sprintf("%s:%s", os.Getenv("REDIS_HOST"), os.Getenv("REDIS_PORT")),
    })
    
    // Prometheus指标
    requestsTotal := prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "user_service_requests_total",
            Help: "Total number of requests to user service",
        },
        []string{"method", "endpoint", "status"},
    )
    
    requestDuration := prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "user_service_request_duration_seconds",
            Help: "Request duration in seconds",
        },
        []string{"method", "endpoint"},
    )
    
    prometheus.MustRegister(requestsTotal, requestDuration)
    
    return &UserService{
        db:              db,
        redis:          rdb,
        requestsTotal:   requestsTotal,
        requestDuration: requestDuration,
    }
}

func (us *UserService) GetUsers(w http.ResponseWriter, r *http.Request) {
    start := time.Now()
    defer func() {
        duration := time.Since(start).Seconds()
        us.requestDuration.WithLabelValues(r.Method, "/users").Observe(duration)
    }()
    
    var users []User
    
    // 尝试从缓存获取
    cacheKey := "users:all"
    cached, err := us.redis.Get(context.Background(), cacheKey).Result()
    if err == nil {
        json.Unmarshal([]byte(cached), &users)
        us.requestsTotal.WithLabelValues(r.Method, "/users", "200").Inc()
        w.Header().Set("Content-Type", "application/json")
        w.Header().Set("X-Cache", "HIT")
        json.NewEncoder(w).Encode(users)
        return
    }
    
    // 从数据库获取
    result := us.db.Find(&users)
    if result.Error != nil {
        us.requestsTotal.WithLabelValues(r.Method, "/users", "500").Inc()
        http.Error(w, result.Error.Error(), http.StatusInternalServerError)
        return
    }
    
    // 缓存结果
    usersJSON, _ := json.Marshal(users)
    us.redis.Set(context.Background(), cacheKey, usersJSON, 5*time.Minute)
    
    us.requestsTotal.WithLabelValues(r.Method, "/users", "200").Inc()
    w.Header().Set("Content-Type", "application/json")
    w.Header().Set("X-Cache", "MISS")
    json.NewEncoder(w).Encode(users)
}

func (us *UserService) CreateUser(w http.ResponseWriter, r *http.Request) {
    start := time.Now()
    defer func() {
        duration := time.Since(start).Seconds()
        us.requestDuration.WithLabelValues(r.Method, "/users").Observe(duration)
    }()
    
    var user User
    if err := json.NewDecoder(r.Body).Decode(&user); err != nil {
        us.requestsTotal.WithLabelValues(r.Method, "/users", "400").Inc()
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    
    // 创建用户
    result := us.db.Create(&user)
    if result.Error != nil {
        us.requestsTotal.WithLabelValues(r.Method, "/users", "500").Inc()
        http.Error(w, result.Error.Error(), http.StatusInternalServerError)
        return
    }
    
    // 清除缓存
    us.redis.Del(context.Background(), "users:all")
    
    us.requestsTotal.WithLabelValues(r.Method, "/users", "201").Inc()
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusCreated)
    json.NewEncoder(w).Encode(user)
}

func (us *UserService) HealthCheck(w http.ResponseWriter, r *http.Request) {
    // 检查数据库连接
    sqlDB, err := us.db.DB()
    if err != nil {
        http.Error(w, "Database connection failed", http.StatusServiceUnavailable)
        return
    }
    
    if err := sqlDB.Ping(); err != nil {
        http.Error(w, "Database ping failed", http.StatusServiceUnavailable)
        return
    }
    
    // 检查Redis连接
    _, err = us.redis.Ping(context.Background()).Result()
    if err != nil {
        http.Error(w, "Redis connection failed", http.StatusServiceUnavailable)
        return
    }
    
    w.WriteHeader(http.StatusOK)
    json.NewEncoder(w).Encode(map[string]string{"status": "healthy"})
}

func main() {
    userService := NewUserService()
    
    r := mux.NewRouter()
    
    // API路由
    api := r.PathPrefix("/api/v1").Subrouter()
    api.HandleFunc("/users", userService.GetUsers).Methods("GET")
    api.HandleFunc("/users", userService.CreateUser).Methods("POST")
    
    // 健康检查
    r.HandleFunc("/health", userService.HealthCheck).Methods("GET")
    
    // Prometheus指标
    r.Handle("/metrics", promhttp.Handler())
    
    // 启动服务器
    log.Println("User service starting on :8080")
    log.Fatal(http.ListenAndServe(":8080", r))
}

10.4.2 订单服务

// services/order-service/main.go
package main

import (
    "context"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "os"
    "time"
    
    "github.com/gorilla/mux"
    "github.com/streadway/amqp"
    "gorm.io/driver/postgres"
    "gorm.io/gorm"
)

type Order struct {
    ID          uint      `json:"id" gorm:"primaryKey"`
    UserID      uint      `json:"user_id"`
    ProductID   uint      `json:"product_id"`
    Quantity    int       `json:"quantity"`
    TotalAmount float64   `json:"total_amount"`
    Status      string    `json:"status"`
    CreatedAt   time.Time `json:"created_at"`
    UpdatedAt   time.Time `json:"updated_at"`
}

type OrderService struct {
    db       *gorm.DB
    rabbitmq *amqp.Connection
    channel  *amqp.Channel
}

func NewOrderService() *OrderService {
    // 数据库连接
    dsn := fmt.Sprintf("host=%s port=%s user=%s password=%s dbname=%s sslmode=disable",
        os.Getenv("DB_HOST"),
        os.Getenv("DB_PORT"),
        os.Getenv("DB_USER"),
        os.Getenv("DB_PASSWORD"),
        os.Getenv("DB_NAME"),
    )
    
    db, err := gorm.Open(postgres.Open(dsn), &gorm.Config{})
    if err != nil {
        log.Fatal("Failed to connect to database:", err)
    }
    
    db.AutoMigrate(&Order{})
    
    // RabbitMQ连接
    conn, err := amqp.Dial(os.Getenv("RABBITMQ_URL"))
    if err != nil {
        log.Fatal("Failed to connect to RabbitMQ:", err)
    }
    
    ch, err := conn.Channel()
    if err != nil {
        log.Fatal("Failed to open RabbitMQ channel:", err)
    }
    
    // 声明队列
    _, err = ch.QueueDeclare(
        "order_events", // 队列名称
        true,          // 持久化
        false,         // 自动删除
        false,         // 排他性
        false,         // 不等待
        nil,           // 参数
    )
    if err != nil {
        log.Fatal("Failed to declare queue:", err)
    }
    
    return &OrderService{
        db:       db,
        rabbitmq: conn,
        channel:  ch,
    }
}

func (os *OrderService) CreateOrder(w http.ResponseWriter, r *http.Request) {
    var order Order
    if err := json.NewDecoder(r.Body).Decode(&order); err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    
    order.Status = "pending"
    order.CreatedAt = time.Now()
    
    // 创建订单
    result := os.db.Create(&order)
    if result.Error != nil {
        http.Error(w, result.Error.Error(), http.StatusInternalServerError)
        return
    }
    
    // 发送订单事件到消息队列
    orderEvent := map[string]interface{}{
        "event_type": "order_created",
        "order_id":   order.ID,
        "user_id":    order.UserID,
        "amount":     order.TotalAmount,
        "timestamp":  time.Now(),
    }
    
    eventJSON, _ := json.Marshal(orderEvent)
    err := os.channel.Publish(
        "",             // 交换机
        "order_events", // 路由键
        false,          // 强制
        false,          // 立即
        amqp.Publishing{
            ContentType: "application/json",
            Body:        eventJSON,
        },
    )
    
    if err != nil {
        log.Printf("Failed to publish order event: %v", err)
    }
    
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusCreated)
    json.NewEncoder(w).Encode(order)
}

func (os *OrderService) GetOrders(w http.ResponseWriter, r *http.Request) {
    var orders []Order
    result := os.db.Find(&orders)
    if result.Error != nil {
        http.Error(w, result.Error.Error(), http.StatusInternalServerError)
        return
    }
    
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(orders)
}

func main() {
    orderService := NewOrderService()
    defer orderService.rabbitmq.Close()
    defer orderService.channel.Close()
    
    r := mux.NewRouter()
    
    api := r.PathPrefix("/api/v1").Subrouter()
    api.HandleFunc("/orders", orderService.GetOrders).Methods("GET")
    api.HandleFunc("/orders", orderService.CreateOrder).Methods("POST")
    
    r.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
        json.NewEncoder(w).Encode(map[string]string{"status": "healthy"})
    })
    
    log.Println("Order service starting on :8080")
    log.Fatal(http.ListenAndServe(":8080", r))
}

10.5 前端应用

10.5.1 React应用结构

// frontend/src/App.tsx
import React from 'react';
import { BrowserRouter as Router, Routes, Route } from 'react-router-dom';
import { QueryClient, QueryClientProvider } from 'react-query';
import { ReactQueryDevtools } from 'react-query/devtools';

import Header from './components/Header';
import Home from './pages/Home';
import Products from './pages/Products';
import Orders from './pages/Orders';
import Profile from './pages/Profile';
import Login from './pages/Login';
import { AuthProvider } from './contexts/AuthContext';

const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      retry: 3,
      staleTime: 5 * 60 * 1000, // 5分钟
      cacheTime: 10 * 60 * 1000, // 10分钟
    },
  },
});

function App() {
  return (
    <QueryClientProvider client={queryClient}>
      <AuthProvider>
        <Router>
          <div className="App">
            <Header />
            <main className="main-content">
              <Routes>
                <Route path="/" element={<Home />} />
                <Route path="/products" element={<Products />} />
                <Route path="/orders" element={<Orders />} />
                <Route path="/profile" element={<Profile />} />
                <Route path="/login" element={<Login />} />
              </Routes>
            </main>
          </div>
        </Router>
      </AuthProvider>
      <ReactQueryDevtools initialIsOpen={false} />
    </QueryClientProvider>
  );
}

export default App;

10.5.2 API客户端

// frontend/src/api/client.ts
import axios, { AxiosInstance, AxiosRequestConfig } from 'axios';

class APIClient {
  private client: AxiosInstance;
  
  constructor() {
    this.client = axios.create({
      baseURL: process.env.REACT_APP_API_URL || '/api',
      timeout: 10000,
      headers: {
        'Content-Type': 'application/json',
      },
    });
    
    // 请求拦截器
    this.client.interceptors.request.use(
      (config) => {
        const token = localStorage.getItem('auth_token');
        if (token) {
          config.headers.Authorization = `Bearer ${token}`;
        }
        
        // 添加请求ID用于追踪
        config.headers['X-Request-ID'] = this.generateRequestId();
        
        return config;
      },
      (error) => {
        return Promise.reject(error);
      }
    );
    
    // 响应拦截器
    this.client.interceptors.response.use(
      (response) => {
        return response;
      },
      (error) => {
        if (error.response?.status === 401) {
          // 清除认证信息并重定向到登录页
          localStorage.removeItem('auth_token');
          window.location.href = '/login';
        }
        
        return Promise.reject(error);
      }
    );
  }
  
  private generateRequestId(): string {
    return `${Date.now()}-${Math.random().toString(36).substr(2, 9)}`;
  }
  
  // 用户相关API
  async getUsers() {
    const response = await this.client.get('/users');
    return response.data;
  }
  
  async createUser(userData: any) {
    const response = await this.client.post('/users', userData);
    return response.data;
  }
  
  async getUserProfile(userId: string) {
    const response = await this.client.get(`/users/${userId}`);
    return response.data;
  }
  
  // 订单相关API
  async getOrders() {
    const response = await this.client.get('/orders');
    return response.data;
  }
  
  async createOrder(orderData: any) {
    const response = await this.client.post('/orders', orderData);
    return response.data;
  }
  
  async getOrderById(orderId: string) {
    const response = await this.client.get(`/orders/${orderId}`);
    return response.data;
  }
  
  // 认证相关API
  async login(credentials: { username: string; password: string }) {
    const response = await this.client.post('/auth/login', credentials);
    return response.data;
  }
  
  async logout() {
    const response = await this.client.post('/auth/logout');
    return response.data;
  }
  
  async refreshToken() {
    const response = await this.client.post('/auth/refresh');
    return response.data;
  }
}

export const apiClient = new APIClient();

10.5.3 Dockerfile

# frontend/Dockerfile
# 多阶段构建
FROM node:16-alpine AS builder

WORKDIR /app

# 复制package文件
COPY package*.json ./

# 安装依赖
RUN npm ci --only=production

# 复制源代码
COPY . .

# 构建应用
RUN npm run build

# 生产阶段
FROM nginx:alpine

# 复制构建文件
COPY --from=builder /app/dist /usr/share/nginx/html

# 复制nginx配置
COPY nginx.conf /etc/nginx/nginx.conf

# 暴露端口
EXPOSE 80

CMD ["nginx", "-g", "daemon off;"]

10.6 监控和日志

10.6.1 Prometheus配置

# infrastructure/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

scrape_configs:
  # Caddy指标
  - job_name: 'caddy'
    static_configs:
      - targets: ['caddy:2019']
    metrics_path: '/metrics'
    scrape_interval: 30s
  
  # 用户服务指标
  - job_name: 'user-service'
    static_configs:
      - targets: ['user-service:8080']
    metrics_path: '/metrics'
    scrape_interval: 30s
  
  # 订单服务指标
  - job_name: 'order-service'
    static_configs:
      - targets: ['order-service:8080']
    metrics_path: '/metrics'
    scrape_interval: 30s
  
  # 支付服务指标
  - job_name: 'payment-service'
    static_configs:
      - targets: ['payment-service:8080']
    metrics_path: '/metrics'
    scrape_interval: 30s
  
  # PostgreSQL指标
  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']
  
  # Redis指标
  - job_name: 'redis'
    static_configs:
      - targets: ['redis-exporter:9121']
  
  # 节点指标
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

10.6.2 告警规则

# infrastructure/prometheus/alert_rules.yml
groups:
  - name: ecommerce_alerts
    rules:
      # 高错误率告警
      - alert: HighErrorRate
        expr: |
          (
            sum(rate(caddy_http_requests_total{status=~"5.."}[5m])) by (instance)
            /
            sum(rate(caddy_http_requests_total[5m])) by (instance)
          ) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }} for instance {{ $labels.instance }}"
      
      # 高响应时间告警
      - alert: HighResponseTime
        expr: |
          histogram_quantile(0.95, sum(rate(caddy_http_request_duration_seconds_bucket[5m])) by (le, instance)) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time detected"
          description: "95th percentile response time is {{ $value }}s for instance {{ $labels.instance }}"
      
      # 服务不可用告警
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service is down"
          description: "{{ $labels.job }} service is down for instance {{ $labels.instance }}"
      
      # 数据库连接告警
      - alert: DatabaseConnectionHigh
        expr: |
          pg_stat_activity_count{state="active"} > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High database connections"
          description: "Active database connections: {{ $value }}"
      
      # 内存使用率告警
      - alert: HighMemoryUsage
        expr: |
          (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "Memory usage is {{ $value | humanizePercentage }} on {{ $labels.instance }}"
      
      # 磁盘空间告警
      - alert: DiskSpaceLow
        expr: |
          (1 - (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"})) > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space"
          description: "Disk usage is {{ $value | humanizePercentage }} on {{ $labels.instance }} mount {{ $labels.mountpoint }}"

10.6.3 Grafana仪表板

{
  "dashboard": {
    "id": null,
    "title": "E-commerce Platform Dashboard",
    "tags": ["ecommerce", "caddy", "microservices"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(caddy_http_requests_total[5m])) by (instance)",
            "legendFormat": "{{instance}}"
          }
        ],
        "yAxes": [
          {
            "label": "Requests/sec"
          }
        ]
      },
      {
        "id": 2,
        "title": "Response Time (95th percentile)",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum(rate(caddy_http_request_duration_seconds_bucket[5m])) by (le, instance))",
            "legendFormat": "{{instance}}"
          }
        ],
        "yAxes": [
          {
            "label": "Seconds"
          }
        ]
      },
      {
        "id": 3,
        "title": "Error Rate",
        "type": "singlestat",
        "targets": [
          {
            "expr": "sum(rate(caddy_http_requests_total{status=~\"5..\"}[5m])) / sum(rate(caddy_http_requests_total[5m]))",
            "legendFormat": "Error Rate"
          }
        ],
        "valueName": "current",
        "format": "percentunit",
        "thresholds": "0.01,0.05",
        "colorBackground": true
      },
      {
        "id": 4,
        "title": "Active Users",
        "type": "singlestat",
        "targets": [
          {
            "expr": "sum(user_service_active_sessions)",
            "legendFormat": "Active Users"
          }
        ]
      },
      {
        "id": 5,
        "title": "Database Connections",
        "type": "graph",
        "targets": [
          {
            "expr": "pg_stat_activity_count",
            "legendFormat": "{{state}}"
          }
        ]
      },
      {
        "id": 6,
        "title": "Cache Hit Rate",
        "type": "singlestat",
        "targets": [
          {
            "expr": "redis_keyspace_hits_total / (redis_keyspace_hits_total + redis_keyspace_misses_total)",
            "legendFormat": "Hit Rate"
          }
        ],
        "format": "percentunit"
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "30s"
  }
}

10.6.4 ELK配置

# infrastructure/elk/logstash/pipeline/caddy.conf
input {
  file {
    path => "/var/log/caddy/*.log"
    start_position => "beginning"
    codec => "json"
    tags => ["caddy"]
  }
}

filter {
  if "caddy" in [tags] {
    # 解析时间戳
    date {
      match => [ "ts", "UNIX" ]
    }
    
    # 提取用户代理信息
    if [request] and [request][headers] and [request][headers]["User-Agent"] {
      useragent {
        source => "[request][headers][User-Agent][0]"
        target => "user_agent"
      }
    }
    
    # 提取地理位置信息
    if [request] and [request][remote_ip] {
      geoip {
        source => "[request][remote_ip]"
        target => "geoip"
      }
    }
    
    # 计算响应时间
    if [duration] {
      ruby {
        code => "event.set('response_time_ms', event.get('duration') * 1000)"
      }
    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "caddy-logs-%{+YYYY.MM.dd}"
  }
  
  # 调试输出
  stdout {
    codec => rubydebug
  }
}

10.7 部署脚本

10.7.1 自动化部署脚本

#!/bin/bash
# scripts/deploy.sh

set -e

# 配置变量
PROJECT_NAME="ecommerce-platform"
ENVIRONMENT=${1:-production}
VERSION=${2:-latest}
BACKUP_DIR="/opt/backups"
LOG_FILE="/var/log/deploy.log"

# 日志函数
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a $LOG_FILE
}

error() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $1" | tee -a $LOG_FILE
    exit 1
}

# 检查依赖
check_dependencies() {
    log "Checking dependencies..."
    
    command -v docker >/dev/null 2>&1 || error "Docker is not installed"
    command -v docker-compose >/dev/null 2>&1 || error "Docker Compose is not installed"
    
    # 检查Docker服务状态
    if ! docker info >/dev/null 2>&1; then
        error "Docker daemon is not running"
    fi
    
    log "Dependencies check passed"
}

# 备份当前配置
backup_config() {
    log "Creating configuration backup..."
    
    BACKUP_TIMESTAMP=$(date +%Y%m%d_%H%M%S)
    BACKUP_PATH="$BACKUP_DIR/${PROJECT_NAME}_${BACKUP_TIMESTAMP}"
    
    mkdir -p $BACKUP_PATH
    
    # 备份Caddy配置
    if [ -f "caddy/Caddyfile" ]; then
        cp -r caddy/ $BACKUP_PATH/
        log "Caddy configuration backed up"
    fi
    
    # 备份数据库
    if docker ps | grep -q postgres; then
        log "Creating database backup..."
        docker exec postgres pg_dumpall -U postgres > $BACKUP_PATH/database_backup.sql
        log "Database backup completed"
    fi
    
    # 备份当前Docker Compose配置
    cp infrastructure/docker-compose.yml $BACKUP_PATH/
    
    log "Backup completed: $BACKUP_PATH"
    echo $BACKUP_PATH > .last_backup
}

# 构建镜像
build_images() {
    log "Building Docker images..."
    
    # 构建前端
    log "Building frontend..."
    docker build -t ${PROJECT_NAME}/frontend:${VERSION} frontend/
    
    # 构建微服务
    for service in user-service order-service payment-service; do
        log "Building $service..."
        docker build -t ${PROJECT_NAME}/${service}:${VERSION} services/${service}/
    done
    
    log "Image building completed"
}

# 更新配置
update_config() {
    log "Updating configuration for environment: $ENVIRONMENT"
    
    # 根据环境更新配置
    case $ENVIRONMENT in
        "production")
            export CADDY_DOMAIN="ecommerce-platform.com"
            export DB_PASSWORD=$(openssl rand -base64 32)
            export JWT_SECRET=$(openssl rand -base64 64)
            ;;
        "staging")
            export CADDY_DOMAIN="staging.ecommerce-platform.com"
            export DB_PASSWORD="staging_password"
            export JWT_SECRET="staging_jwt_secret"
            ;;
        "development")
            export CADDY_DOMAIN="dev.ecommerce-platform.com"
            export DB_PASSWORD="dev_password"
            export JWT_SECRET="dev_jwt_secret"
            ;;
    esac
    
    # 生成环境配置文件
    envsubst < infrastructure/docker-compose.template.yml > infrastructure/docker-compose.yml
    
    log "Configuration updated for $ENVIRONMENT"
}

# 健康检查
health_check() {
    log "Performing health checks..."
    
    local max_attempts=30
    local attempt=1
    
    while [ $attempt -le $max_attempts ]; do
        log "Health check attempt $attempt/$max_attempts"
        
        # 检查Caddy健康状态
        if curl -f http://localhost/health >/dev/null 2>&1; then
            log "Caddy health check passed"
        else
            log "Caddy health check failed"
            ((attempt++))
            sleep 10
            continue
        fi
        
        # 检查微服务健康状态
        local services=("user-service" "order-service" "payment-service")
        local all_healthy=true
        
        for service in "${services[@]}"; do
            if docker exec $service curl -f http://localhost:8080/health >/dev/null 2>&1; then
                log "$service health check passed"
            else
                log "$service health check failed"
                all_healthy=false
            fi
        done
        
        if $all_healthy; then
            log "All health checks passed"
            return 0
        fi
        
        ((attempt++))
        sleep 10
    done
    
    error "Health checks failed after $max_attempts attempts"
}

# 回滚函数
rollback() {
    log "Starting rollback process..."
    
    if [ ! -f ".last_backup" ]; then
        error "No backup found for rollback"
    fi
    
    BACKUP_PATH=$(cat .last_backup)
    
    if [ ! -d "$BACKUP_PATH" ]; then
        error "Backup directory not found: $BACKUP_PATH"
    fi
    
    log "Rolling back to backup: $BACKUP_PATH"
    
    # 停止当前服务
    docker-compose -f infrastructure/docker-compose.yml down
    
    # 恢复配置
    cp -r $BACKUP_PATH/caddy/ ./
    cp $BACKUP_PATH/docker-compose.yml infrastructure/
    
    # 恢复数据库
    if [ -f "$BACKUP_PATH/database_backup.sql" ]; then
        log "Restoring database..."
        docker-compose -f infrastructure/docker-compose.yml up -d postgres
        sleep 30
        docker exec -i postgres psql -U postgres < $BACKUP_PATH/database_backup.sql
    fi
    
    # 启动服务
    docker-compose -f infrastructure/docker-compose.yml up -d
    
    log "Rollback completed"
}

# 主部署流程
main() {
    log "Starting deployment of $PROJECT_NAME version $VERSION to $ENVIRONMENT"
    
    # 检查依赖
    check_dependencies
    
    # 创建备份
    backup_config
    
    # 构建镜像
    build_images
    
    # 更新配置
    update_config
    
    # 停止旧服务
    log "Stopping existing services..."
    docker-compose -f infrastructure/docker-compose.yml down
    
    # 启动新服务
    log "Starting new services..."
    docker-compose -f infrastructure/docker-compose.yml up -d
    
    # 等待服务启动
    sleep 60
    
    # 健康检查
    if health_check; then
        log "Deployment completed successfully"
        
        # 清理旧镜像
        log "Cleaning up old images..."
        docker image prune -f
        
        # 发送部署通知
        send_notification "success" "Deployment of $PROJECT_NAME $VERSION completed successfully"
    else
        log "Deployment failed, initiating rollback..."
        rollback
        send_notification "failure" "Deployment of $PROJECT_NAME $VERSION failed, rolled back"
        exit 1
    fi
}

# 发送通知
send_notification() {
    local status=$1
    local message=$2
    
    # Slack通知
    if [ -n "$SLACK_WEBHOOK_URL" ]; then
        curl -X POST -H 'Content-type: application/json' \
            --data "{\"text\":\"[$status] $message\"}" \
            $SLACK_WEBHOOK_URL
    fi
    
    # 邮件通知
    if [ -n "$NOTIFICATION_EMAIL" ]; then
        echo "$message" | mail -s "Deployment $status" $NOTIFICATION_EMAIL
    fi
}

# 脚本入口
if [ "$1" = "rollback" ]; then
    rollback
else
    main
fi

10.7.2 监控脚本

#!/bin/bash
# scripts/monitoring.sh

# 系统监控脚本
SERVICES=("caddy" "user-service" "order-service" "payment-service" "postgres" "redis")
ALERT_EMAIL="admin@ecommerce-platform.com"
SLACK_WEBHOOK="https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"

# 检查服务状态
check_services() {
    echo "Checking service status..."
    
    for service in "${SERVICES[@]}"; do
        if docker ps | grep -q $service; then
            echo "✓ $service is running"
        else
            echo "✗ $service is not running"
            send_alert "Service Down" "$service is not running"
        fi
    done
}

# 检查资源使用率
check_resources() {
    echo "Checking resource usage..."
    
    # CPU使用率
    CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)
    if (( $(echo "$CPU_USAGE > 80" | bc -l) )); then
        send_alert "High CPU Usage" "CPU usage is ${CPU_USAGE}%"
    fi
    
    # 内存使用率
    MEMORY_USAGE=$(free | grep Mem | awk '{printf "%.2f", $3/$2 * 100.0}')
    if (( $(echo "$MEMORY_USAGE > 85" | bc -l) )); then
        send_alert "High Memory Usage" "Memory usage is ${MEMORY_USAGE}%"
    fi
    
    # 磁盘使用率
    DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | cut -d'%' -f1)
    if [ $DISK_USAGE -gt 85 ]; then
        send_alert "High Disk Usage" "Disk usage is ${DISK_USAGE}%"
    fi
}

# 检查应用健康状态
check_app_health() {
    echo "Checking application health..."
    
    # 检查主页响应
    if ! curl -f -s http://localhost/health >/dev/null; then
        send_alert "Application Health Check Failed" "Main application is not responding"
    fi
    
    # 检查API响应时间
    RESPONSE_TIME=$(curl -o /dev/null -s -w '%{time_total}' http://localhost/api/health)
    if (( $(echo "$RESPONSE_TIME > 2.0" | bc -l) )); then
        send_alert "Slow API Response" "API response time is ${RESPONSE_TIME}s"
    fi
}

# 发送告警
send_alert() {
    local title="$1"
    local message="$2"
    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    
    echo "ALERT: $title - $message"
    
    # 发送邮件
    if [ -n "$ALERT_EMAIL" ]; then
        echo "[$timestamp] $message" | mail -s "$title" $ALERT_EMAIL
    fi
    
    # 发送Slack消息
    if [ -n "$SLACK_WEBHOOK" ]; then
        curl -X POST -H 'Content-type: application/json' \
            --data "{\"text\":\"🚨 $title: $message\"}" \
            $SLACK_WEBHOOK
    fi
}

# 主监控循环
main() {
    while true; do
        echo "=== Monitoring Check at $(date) ==="
        
        check_services
        check_resources
        check_app_health
        
        echo "Monitoring check completed"
        echo ""
        
        sleep 300  # 5分钟检查一次
    done
}

# 运行监控
if [ "$1" = "once" ]; then
    check_services
    check_resources
    check_app_health
else
    main
fi

10.8 性能测试

10.8.1 负载测试脚本

#!/bin/bash
# scripts/load_test.sh

# 负载测试配置
TARGET_URL="https://ecommerce-platform.com"
CONCURRENT_USERS=100
TEST_DURATION=300  # 5分钟
RAMP_UP_TIME=60    # 1分钟

# 安装依赖
install_dependencies() {
    echo "Installing load testing tools..."
    
    # 安装Apache Bench
    if ! command -v ab &> /dev/null; then
        sudo apt-get update
        sudo apt-get install -y apache2-utils
    fi
    
    # 安装wrk
    if ! command -v wrk &> /dev/null; then
        sudo apt-get install -y wrk
    fi
}

# Apache Bench测试
run_ab_test() {
    echo "Running Apache Bench test..."
    
    ab -n 10000 -c 100 -g ab_results.dat $TARGET_URL/ > ab_results.txt
    
    echo "Apache Bench test completed. Results saved to ab_results.txt"
}

# wrk测试
run_wrk_test() {
    echo "Running wrk test..."
    
    wrk -t12 -c100 -d300s --script=lua_scripts/post_test.lua $TARGET_URL > wrk_results.txt
    
    echo "wrk test completed. Results saved to wrk_results.txt"
}

# 创建Lua脚本用于POST测试
create_lua_scripts() {
    mkdir -p lua_scripts
    
    cat > lua_scripts/post_test.lua << 'EOF'
wrk.method = "POST"
wrk.body   = '{"username":"testuser","email":"test@example.com"}'
wrk.headers["Content-Type"] = "application/json"

function response(status, headers, body)
    if status ~= 200 and status ~= 201 then
        print("Error: " .. status .. " " .. body)
    end
end
EOF
}

# 分析结果
analyze_results() {
    echo "Analyzing test results..."
    
    if [ -f "ab_results.txt" ]; then
        echo "=== Apache Bench Results ==="
        grep -E "(Requests per second|Time per request|Transfer rate)" ab_results.txt
        echo ""
    fi
    
    if [ -f "wrk_results.txt" ]; then
        echo "=== wrk Results ==="
        cat wrk_results.txt
        echo ""
    fi
}

# 主函数
main() {
    install_dependencies
    create_lua_scripts
    
    echo "Starting load tests against $TARGET_URL"
    echo "Concurrent users: $CONCURRENT_USERS"
    echo "Test duration: $TEST_DURATION seconds"
    echo ""
    
    run_ab_test
    run_wrk_test
    analyze_results
    
    echo "Load testing completed!"
}

main

10.8.2 性能监控

#!/usr/bin/env python3
# scripts/performance_monitor.py

import time
import requests
import psutil
import json
from datetime import datetime
from typing import Dict, List

class PerformanceMonitor:
    def __init__(self, base_url: str = "http://localhost"):
        self.base_url = base_url
        self.metrics = []
    
    def collect_system_metrics(self) -> Dict:
        """收集系统性能指标"""
        return {
            'timestamp': datetime.now().isoformat(),
            'cpu_percent': psutil.cpu_percent(interval=1),
            'memory_percent': psutil.virtual_memory().percent,
            'disk_usage': psutil.disk_usage('/').percent,
            'network_io': psutil.net_io_counters()._asdict(),
            'load_average': psutil.getloadavg()
        }
    
    def collect_app_metrics(self) -> Dict:
        """收集应用性能指标"""
        metrics = {
            'timestamp': datetime.now().isoformat(),
            'endpoints': {}
        }
        
        endpoints = [
            '/health',
            '/api/users',
            '/api/orders',
            '/api/payments'
        ]
        
        for endpoint in endpoints:
            try:
                start_time = time.time()
                response = requests.get(f"{self.base_url}{endpoint}", timeout=10)
                response_time = time.time() - start_time
                
                metrics['endpoints'][endpoint] = {
                    'status_code': response.status_code,
                    'response_time': response_time,
                    'content_length': len(response.content)
                }
            except Exception as e:
                metrics['endpoints'][endpoint] = {
                    'error': str(e),
                    'status_code': 0,
                    'response_time': 0
                }
        
        return metrics
    
    def run_monitoring(self, duration: int = 3600, interval: int = 30):
        """运行性能监控"""
        print(f"Starting performance monitoring for {duration} seconds...")
        
        start_time = time.time()
        
        while time.time() - start_time < duration:
            # 收集指标
            system_metrics = self.collect_system_metrics()
            app_metrics = self.collect_app_metrics()
            
            combined_metrics = {
                'system': system_metrics,
                'application': app_metrics
            }
            
            self.metrics.append(combined_metrics)
            
            # 输出当前状态
            print(f"[{datetime.now()}] CPU: {system_metrics['cpu_percent']:.1f}%, "
                  f"Memory: {system_metrics['memory_percent']:.1f}%, "
                  f"API Health: {app_metrics['endpoints'].get('/health', {}).get('status_code', 'N/A')}")
            
            time.sleep(interval)
        
        # 保存结果
        self.save_metrics()
        self.generate_report()
    
    def save_metrics(self):
        """保存指标数据"""
        filename = f"performance_metrics_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
        
        with open(filename, 'w') as f:
            json.dump(self.metrics, f, indent=2)
        
        print(f"Metrics saved to {filename}")
    
    def generate_report(self):
        """生成性能报告"""
        if not self.metrics:
            return
        
        # 计算统计信息
        cpu_values = [m['system']['cpu_percent'] for m in self.metrics]
        memory_values = [m['system']['memory_percent'] for m in self.metrics]
        
        health_response_times = []
        for m in self.metrics:
            health_data = m['application']['endpoints'].get('/health', {})
            if 'response_time' in health_data:
                health_response_times.append(health_data['response_time'])
        
        report = f"""
=== Performance Report ===
Monitoring Period: {len(self.metrics)} samples

System Metrics:
- CPU Usage: Avg {sum(cpu_values)/len(cpu_values):.1f}%, Max {max(cpu_values):.1f}%
- Memory Usage: Avg {sum(memory_values)/len(memory_values):.1f}%, Max {max(memory_values):.1f}%

Application Metrics:
- Health Endpoint Response Time: Avg {sum(health_response_times)/len(health_response_times)*1000:.1f}ms
- Max Response Time: {max(health_response_times)*1000:.1f}ms
"""
        
        print(report)
        
        with open(f"performance_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.txt", 'w') as f:
            f.write(report)

if __name__ == "__main__":
    monitor = PerformanceMonitor()
    monitor.run_monitoring(duration=1800, interval=30)  # 30分钟监控

10.9 故障排除

10.9.1 常见问题诊断

#!/bin/bash
# scripts/troubleshoot.sh

# 故障排除脚本
echo "=== Caddy E-commerce Platform Troubleshooting ==="

# 检查Docker服务
check_docker() {
    echo "Checking Docker services..."
    
    if ! docker info >/dev/null 2>&1; then
        echo "❌ Docker daemon is not running"
        echo "Solution: sudo systemctl start docker"
        return 1
    fi
    
    echo "✅ Docker is running"
    
    # 检查容器状态
    echo "Container status:"
    docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
    
    # 检查失败的容器
    failed_containers=$(docker ps -a --filter "status=exited" --format "{{.Names}}")
    if [ -n "$failed_containers" ]; then
        echo "❌ Failed containers found:"
        echo "$failed_containers"
        
        for container in $failed_containers; do
            echo "\nLogs for $container:"
            docker logs --tail 20 $container
        done
    fi
}

# 检查网络连接
check_network() {
    echo "\nChecking network connectivity..."
    
    # 检查端口占用
    echo "Port usage:"
    netstat -tlnp | grep -E ':(80|443|2019|5432|6379)'
    
    # 检查DNS解析
    if ! nslookup ecommerce-platform.com >/dev/null 2>&1; then
        echo "❌ DNS resolution failed for ecommerce-platform.com"
    else
        echo "✅ DNS resolution working"
    fi
    
    # 检查SSL证书
    if command -v openssl >/dev/null 2>&1; then
        echo "\nSSL certificate check:"
        echo | openssl s_client -connect ecommerce-platform.com:443 -servername ecommerce-platform.com 2>/dev/null | openssl x509 -noout -dates
    fi
}

# 检查日志
check_logs() {
    echo "\nChecking application logs..."
    
    # Caddy日志
    if [ -f "/var/log/caddy/error.log" ]; then
        echo "Recent Caddy errors:"
        tail -20 /var/log/caddy/error.log
    fi
    
    # 应用日志
    echo "\nRecent application logs:"
    docker logs --tail 20 caddy
    docker logs --tail 20 user-service
    docker logs --tail 20 order-service
}

# 检查资源使用
check_resources() {
    echo "\nChecking resource usage..."
    
    # 系统资源
    echo "System resources:"
    echo "CPU: $(top -bn1 | grep 'Cpu(s)' | awk '{print $2}')"
    echo "Memory: $(free -h | grep Mem | awk '{print $3"/"$2}')"
    echo "Disk: $(df -h / | awk 'NR==2 {print $5}')"
    
    # Docker资源
    echo "\nDocker resource usage:"
    docker stats --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}"
}

# 检查数据库连接
check_database() {
    echo "\nChecking database connectivity..."
    
    if docker exec postgres pg_isready -U postgres >/dev/null 2>&1; then
        echo "✅ PostgreSQL is accessible"
        
        # 检查数据库大小
        echo "Database sizes:"
        docker exec postgres psql -U postgres -c "\l+"
    else
        echo "❌ PostgreSQL connection failed"
    fi
    
    # 检查Redis
    if docker exec redis redis-cli ping >/dev/null 2>&1; then
        echo "✅ Redis is accessible"
        
        # 检查Redis信息
        echo "Redis info:"
        docker exec redis redis-cli info memory | grep used_memory_human
    else
        echo "❌ Redis connection failed"
    fi
}

# 性能诊断
performance_check() {
    echo "\nPerformance diagnostics..."
    
    # 检查响应时间
    echo "Response time check:"
    for endpoint in "/health" "/api/users" "/api/orders"; do
        response_time=$(curl -o /dev/null -s -w '%{time_total}' "http://localhost$endpoint" 2>/dev/null || echo "failed")
        echo "$endpoint: ${response_time}s"
    done
    
    # 检查连接数
    echo "\nActive connections:"
    netstat -an | grep :80 | wc -l
}

# 自动修复
auto_fix() {
    echo "\nAttempting automatic fixes..."
    
    # 重启失败的容器
    failed_containers=$(docker ps -a --filter "status=exited" --format "{{.Names}}")
    if [ -n "$failed_containers" ]; then
        echo "Restarting failed containers..."
        for container in $failed_containers; do
            echo "Restarting $container"
            docker restart $container
        done
    fi
    
    # 清理Docker资源
    echo "Cleaning up Docker resources..."
    docker system prune -f
    
    # 重新加载Caddy配置
    echo "Reloading Caddy configuration..."
    curl -X POST "http://localhost:2019/load" \
        -H "Content-Type: application/json" \
        -d @caddy/config/api.json
}

# 生成诊断报告
generate_report() {
    local report_file="troubleshoot_report_$(date +%Y%m%d_%H%M%S).txt"
    
    {
        echo "=== Troubleshooting Report ==="
        echo "Generated: $(date)"
        echo ""
        
        check_docker
        check_network
        check_database
        check_resources
        performance_check
        
    } > $report_file
    
    echo "\nDiagnostic report saved to: $report_file"
}

# 主函数
main() {
    case "$1" in
        "docker")
            check_docker
            ;;
        "network")
            check_network
            ;;
        "logs")
            check_logs
            ;;
        "database")
            check_database
            ;;
        "performance")
            performance_check
            ;;
        "fix")
            auto_fix
            ;;
        "report")
            generate_report
            ;;
        *)
            echo "Running full diagnostic..."
            check_docker
            check_network
            check_database
            check_resources
            performance_check
            check_logs
            ;;
    esac
}

main "$@"

10.10 项目总结

10.10.1 架构优势

通过本实战项目,我们展示了Caddy在现代Web架构中的强大能力:

  1. 自动HTTPS:零配置SSL/TLS证书管理
  2. 反向代理:高性能的微服务网关
  3. 负载均衡:内置多种负载均衡算法
  4. 监控集成:原生Prometheus指标支持
  5. 配置简洁:人性化的Caddyfile语法

10.10.2 性能指标

在我们的测试环境中,该架构实现了:

  • 响应时间:95%请求 < 200ms
  • 吞吐量:单节点 > 10,000 RPS
  • 可用性:99.9% SLA
  • SSL握手:< 100ms
  • 证书更新:自动化,零停机

10.10.3 最佳实践总结

  1. 配置管理

    • 使用环境变量管理敏感信息
    • 版本控制所有配置文件
    • 实施配置验证流程
  2. 安全加固

    • 启用所有安全头部
    • 实施访问控制和限流
    • 定期更新和审计
  3. 监控告警

    • 全面的指标收集
    • 主动告警机制
    • 可视化仪表板
  4. 运维自动化

    • 自动化部署流程
    • 健康检查和自愈
    • 备份和恢复策略

10.10.4 扩展建议

  1. 水平扩展

    • Kubernetes集群部署
    • 多区域负载均衡
    • CDN集成
  2. 功能增强

    • API网关功能
    • 服务网格集成
    • 边缘计算支持
  3. 安全提升

    • WAF集成
    • DDoS防护
    • 零信任架构

本章练习

练习1:基础部署

  1. 按照本章指导搭建完整的电商平台
  2. 配置自动HTTPS和基本安全策略
  3. 验证所有服务的健康状态

练习2:性能优化

  1. 使用提供的负载测试脚本进行压力测试
  2. 分析性能瓶颈并进行优化
  3. 实现缓存策略提升响应速度

练习3:监控告警

  1. 配置Prometheus和Grafana监控
  2. 设置关键指标的告警规则
  3. 模拟故障场景测试告警机制

练习4:故障恢复

  1. 模拟各种故障场景(服务宕机、数据库故障等)
  2. 使用故障排除脚本进行诊断
  3. 验证自动恢复和回滚机制

练习5:扩展功能

  1. 添加新的微服务(如推荐服务)
  2. 实现API版本管理
  3. 集成第三方服务(如支付网关)

通过完成这些练习,你将掌握使用Caddy构建和运维企业级Web服务的完整技能栈。