1. Deployment Architecture Overview

1.1 Deployment Modes

Kong supports several deployment modes, each suited to different scenarios and requirements:

1.1.1 Traditional Mode (Database Mode)

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Kong 1    │    │   Kong 2    │    │   Kong 3    │
│  (Gateway)  │    │  (Gateway)  │    │  (Gateway)  │
└─────────────┘    └─────────────┘    └─────────────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │
                    ┌─────────────┐
                    │ PostgreSQL  │
                    │  Database   │
                    └─────────────┘

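Characteristics:
- Configuration is stored in the database
- Supports dynamic configuration updates
- Suited to large-scale deployments
- Requires a highly available database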

1.1.2 DB-less Mode

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Kong 1    │    │   Kong 2    │    │   Kong 3    │
│ (DB-less)   │    │ (DB-less)   │    │ (DB-less)   │
│ config.yml  │    │ config.yml  │    │ config.yml  │
└─────────────┘    └─────────────┘    └─────────────┘

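Characteristics:
- Configuration is managed through YAML files
- No database dependency
- Configuration is immutable (it is replaced as a whole rather than edited in place)
- Suited to cloud-native deployments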

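1.1.3 Hybrid Mode

┌─────────────┐           ┌─────────────┐
│ Control     │  config   │   Data      │
│ Plane       │──────────▶│   Plane     │
│ (Admin API) │           │ (Gateway)   │
└─────────────┘           └─────────────┘
       │
┌─────────────┐
│ PostgreSQL  │
│  Database   │
└─────────────┘

Characteristics:
- The control plane and data plane are separated
- Only the control plane connects to the database; data planes receive their configuration from it
- Improves security
- Supports multi-region deployments
- Suited to enterprise scenarios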
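
The three modes above differ mainly in a handful of kong.conf properties. The sketch below prints the properties that typically select each mode. The function name, the host name cp.example.internal, and the file paths are placeholders made up for illustration; only the property names (database, declarative_config, role, cluster_control_plane, cluster_cert, cluster_cert_key) are Kong settings.

```shell
#!/bin/sh
# Print the kong.conf properties that typically select a deployment mode.
# Host names and file paths are illustrative placeholders.
kong_mode_settings() {
  case "$1" in
    traditional)
      echo "database = postgres"
      ;;
    dbless)
      echo "database = off"
      echo "declarative_config = /etc/kong/kong.yml"
      ;;
    control_plane)
      echo "role = control_plane"
      echo "database = postgres"
      echo "cluster_cert = /etc/kong/cluster.crt"
      echo "cluster_cert_key = /etc/kong/cluster.key"
      ;;
    data_plane)
      echo "role = data_plane"
      echo "database = off"
      echo "cluster_control_plane = cp.example.internal:8005"
      echo "cluster_cert = /etc/kong/cluster.crt"
      echo "cluster_cert_key = /etc/kong/cluster.key"
      ;;
    *)
      echo "unknown mode: $1" >&2
      return 1
      ;;
  esac
}

kong_mode_settings data_plane
```

In hybrid mode both planes share the same cluster certificate pair, which is why the last two lines repeat for control_plane and data_plane.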

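1.2 Architecture Components

1.2.1 Core Components

  • Kong Gateway: the core component that handles API requests
  • Kong Admin API: the REST API for managing Kong's configuration
  • Kong Manager: web-based management UI (Enterprise edition)
  • Database: stores configuration data (PostgreSQL/Cassandra)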

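1.2.2 Optional Components

  • Kong Dev Portal: developer portal (Enterprise edition)
  • Kong Vitals: monitoring and analytics (Enterprise edition)
  • Kong Immunity: security analytics (Enterprise edition)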

2. Environment Preparation

2.1 System Requirements

2.1.1 Hardware Requirements

# Minimum configuration
minimum:
  cpu: 1 core
  memory: 1GB RAM
  disk: 10GB
  network: 100Mbps

# Recommended configuration
recommended:
  cpu: 4 cores
  memory: 8GB RAM
  disk: 100GB SSD
  network: 1Gbps

# Production environment
production:
  cpu: 8+ cores
  memory: 16GB+ RAM
  disk: 500GB+ SSD
  network: 10Gbps

2.1.2 Supported Operating Systems

# Supported operating systems
- Ubuntu 18.04, 20.04, 22.04
- CentOS 7, 8
- RHEL 7, 8, 9
- Amazon Linux 2
- Debian 9, 10, 11
- Alpine Linux

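2.1.3 Software Dependencies

# Required dependencies
- OpenResty 1.19.9+
- LuaJIT 2.1+
- OpenSSL 1.1.1+

# Database (choose one)
- PostgreSQL 9.5+
- Cassandra 3.11+

# Optional dependencies
- Redis (used by plugins such as rate limiting)
- Prometheus (monitoring)
- Grafana (visualization)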

2.2 Network Planning

2.2.1 Port Plan

ports:
  proxy:
    http: 8000
    https: 8443
  admin:
    http: 8001
    https: 8444
  manager:
    http: 8002
    https: 8445
  portal:
    http: 8003
    https: 8446
  portal_api:
    http: 8004
    https: 8447

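2.2.2 Firewall Configuration

# Open the proxy ports
sudo ufw allow 8000/tcp  # Proxy HTTP
sudo ufw allow 8443/tcp  # Proxy HTTPS

# Admin API ports (internal network only)
sudo ufw allow from 10.0.0.0/8 to any port 8001 proto tcp  # Admin API HTTP
sudo ufw allow from 10.0.0.0/8 to any port 8444 proto tcp  # Admin API HTTPS

# Database ports (internal network only)
sudo ufw allow from 10.0.0.0/8 to any port 5432  # PostgreSQL
sudo ufw allow from 10.0.0.0/8 to any port 9042  # Cassandra

# Enable the firewall
sudo ufw enable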

3. Single-Node Deployment

3.1 Docker Deployment

3.1.1 Basic Docker Deployment

# Create the network
docker network create kong-net

# Start PostgreSQL
docker run -d --name kong-database \
  --network=kong-net \
  -p 5432:5432 \
  -e "POSTGRES_USER=kong" \
  -e "POSTGRES_PASSWORD=kong" \
  -e "POSTGRES_DB=kong" \
  postgres:13

# Wait for the database to start
sleep 30

# Initialize the database
docker run --rm \
  --network=kong-net \
  -e "KONG_DATABASE=postgres" \
  -e "KONG_PG_HOST=kong-database" \
  -e "KONG_PG_USER=kong" \
  -e "KONG_PG_PASSWORD=kong" \
  -e "KONG_PG_DATABASE=kong" \
  kong:latest kong migrations bootstrap

# Start Kong
docker run -d --name kong \
  --network=kong-net \
  -e "KONG_DATABASE=postgres" \
  -e "KONG_PG_HOST=kong-database" \
  -e "KONG_PG_USER=kong" \
  -e "KONG_PG_PASSWORD=kong" \
  -e "KONG_PG_DATABASE=kong" \
  -e "KONG_PROXY_ACCESS_LOG=/dev/stdout" \
  -e "KONG_ADMIN_ACCESS_LOG=/dev/stdout" \
  -e "KONG_PROXY_ERROR_LOG=/dev/stderr" \
  -e "KONG_ADMIN_ERROR_LOG=/dev/stderr" \
  -e "KONG_ADMIN_LISTEN=0.0.0.0:8001, 0.0.0.0:8444 ssl" \
  -p 8000:8000 \
  -p 8443:8443 \
  -p 8001:8001 \
  -p 8444:8444 \
  kong:latest

# Verify the deployment
curl -i http://localhost:8001/
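
The fixed `sleep 30` in the sequence above either waits too long or not long enough, depending on the host. One alternative is to poll until the database answers. The helper below is a minimal sketch; the function name wait_for and its argument order are made up here, and the commented example assumes the kong-database container from the step above is running.

```shell
#!/bin/sh
# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-run COMMAND once per second until it succeeds or the timeout expires.
wait_for() {
  timeout_s=$1
  shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      echo "timed out after ${timeout_s}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Example (needs Docker and the kong-database container):
# wait_for 60 docker exec kong-database pg_isready -U kong
```

Because the helper returns non-zero on timeout, a script running under `set -e` stops before it tries to bootstrap migrations against a database that never came up.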

3.1.2 Docker Compose Deployment

# docker-compose.yml
version: '3.8'

services:
  kong-database:
    image: postgres:13
    container_name: kong-database
    environment:
      POSTGRES_USER: kong
      POSTGRES_PASSWORD: kong
      POSTGRES_DB: kong
    volumes:
      - kong_data:/var/lib/postgresql/data
    networks:
      - kong-net
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U kong"]
      interval: 30s
      timeout: 10s
      retries: 3

  kong-migration:
    image: kong:latest
    container_name: kong-migration
    command: kong migrations bootstrap
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-database
      KONG_PG_USER: kong
      KONG_PG_PASSWORD: kong
      KONG_PG_DATABASE: kong
    depends_on:
      kong-database:
        condition: service_healthy
    networks:
      - kong-net
    restart: "no"

  kong:
    image: kong:latest
    container_name: kong
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-database
      KONG_PG_USER: kong
      KONG_PG_PASSWORD: kong
      KONG_PG_DATABASE: kong
      KONG_PROXY_ACCESS_LOG: /dev/stdout
      KONG_ADMIN_ACCESS_LOG: /dev/stdout
      KONG_PROXY_ERROR_LOG: /dev/stderr
      KONG_ADMIN_ERROR_LOG: /dev/stderr
      KONG_ADMIN_LISTEN: "0.0.0.0:8001, 0.0.0.0:8444 ssl"
      KONG_ADMIN_GUI_URL: http://localhost:8002
    ports:
      - "8000:8000"
      - "8443:8443"
      - "8001:8001"
      - "8444:8444"
    depends_on:
      kong-migration:
        condition: service_completed_successfully
    networks:
      - kong-net
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "kong", "health"]
      interval: 30s
      timeout: 10s
      retries: 3

  # Konga (optional): a third-party admin GUI, not Kong's own Kong Manager
  kong-manager:
    image: pantsel/konga:latest
    container_name: kong-manager
    environment:
      NODE_ENV: production
      KONGA_HOOK_TIMEOUT: 120000
    ports:
      - "1337:1337"
    depends_on:
      - kong
    networks:
      - kong-net
    restart: unless-stopped

volumes:
  kong_data:

networks:
  kong-net:
    driver: bridge

# Start the services
docker-compose up -d

# Check service status
docker-compose ps

# View logs
docker-compose logs kong

# Stop the services
docker-compose down

3.2 Native Installation

3.2.1 Installing on Ubuntu/Debian

# Add the Kong repository
curl -fsSL https://download.konghq.com/gateway-2.x-ubuntu-$(lsb_release -cs)/gpg | sudo apt-key add -
echo "deb https://download.konghq.com/gateway-2.x-ubuntu-$(lsb_release -cs)/ default all" | sudo tee /etc/apt/sources.list.d/kong.list

# Update the package list
sudo apt update

# Install Kong
sudo apt install kong

# Install PostgreSQL
sudo apt install postgresql postgresql-contrib

# Configure PostgreSQL
sudo -u postgres createuser kong
sudo -u postgres createdb kong --owner kong
sudo -u postgres psql -c "ALTER USER kong PASSWORD 'kong';"

# Configure Kong
sudo cp /etc/kong/kong.conf.default /etc/kong/kong.conf
sudo vim /etc/kong/kong.conf

# Initialize the database
sudo kong migrations bootstrap -c /etc/kong/kong.conf

# Start Kong directly
sudo kong start -c /etc/kong/kong.conf

# Or run it under systemd so it starts at boot (use one method, not both;
# assumes the package installed a kong.service unit)
sudo systemctl enable kong
sudo systemctl start kong

3.2.2 Installing on CentOS/RHEL

# Download the Kong package
sudo yum install -y wget
wget https://download.konghq.com/gateway-2.x-centos-8/Packages/k/kong-2.8.1.el8.amd64.rpm

# Install Kong
sudo yum install -y kong-2.8.1.el8.amd64.rpm

# Install PostgreSQL
sudo yum install -y postgresql-server postgresql-contrib

# Initialize PostgreSQL (on CentOS/RHEL 7 the form is: postgresql-setup initdb)
sudo postgresql-setup --initdb
sudo systemctl enable postgresql
sudo systemctl start postgresql

# Configure PostgreSQL
sudo -u postgres createuser kong
sudo -u postgres createdb kong --owner kong
sudo -u postgres psql -c "ALTER USER kong PASSWORD 'kong';"

# Edit pg_hba.conf
sudo vim /var/lib/pgsql/data/pg_hba.conf
# Kong connects over TCP to 127.0.0.1, so add a host rule above the default host entries:
# host    kong    kong    127.0.0.1/32    md5

# Restart PostgreSQL
sudo systemctl restart postgresql

# Configure Kong
sudo cp /etc/kong/kong.conf.default /etc/kong/kong.conf

# Edit the configuration file (see 3.2.3), then bootstrap and start Kong as in 3.2.1
sudo vim /etc/kong/kong.conf

3.2.3 Kong Configuration File

# /etc/kong/kong.conf

# Database settings
database = postgres
pg_host = 127.0.0.1
pg_port = 5432
pg_user = kong
pg_password = kong
pg_database = kong

# Proxy settings
proxy_listen = 0.0.0.0:8000, 0.0.0.0:8443 ssl

# Admin API settings
admin_listen = 127.0.0.1:8001, 127.0.0.1:8444 ssl

# Logging
proxy_access_log = /var/log/kong/access.log
proxy_error_log = /var/log/kong/error.log
admin_access_log = /var/log/kong/admin_access.log
admin_error_log = /var/log/kong/admin_error.log

# Performance settings (injected Nginx directives)
nginx_main_worker_processes = auto
nginx_events_worker_connections = 1024

# Plugins
plugins = bundled

# SSL settings
ssl_cert = /etc/kong/ssl/kong.crt
ssl_cert_key = /etc/kong/ssl/kong.key

# Other settings
log_level = notice
mem_cache_size = 128m
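
The file above binds the Admin API to 127.0.0.1, which keeps it off the network. A deployment script can check that before starting Kong. The function below is a rough sketch: its name is made up, and it only recognizes the literal address 127.0.0.1 or the value off, so treat it as a starting point rather than a complete validator.

```shell
#!/bin/sh
# admin_listen_is_local FILE
# Succeeds when every admin_listen address in FILE is on 127.0.0.1 (or the API is off).
admin_listen_is_local() {
  conf=$1
  # Use the last uncommented admin_listen line and keep everything after the "=".
  value=$(grep -E '^[[:space:]]*admin_listen[[:space:]]*=' "$conf" | tail -n 1 | cut -d= -f2-)
  # Nothing set in this file: there is nothing to check here.
  [ -n "$value" ] || return 0
  # One address per line; the first word is host:port, the rest are flags such as "ssl".
  echo "$value" | tr ',' '\n' | while read -r addr rest; do
    case "$addr" in
      127.0.0.1:*|off) ;;
      *) exit 1 ;;
    esac
  done
}

# Example:
# admin_listen_is_local /etc/kong/kong.conf || echo "Admin API is exposed" >&2
```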

4. Cluster Deployment

4.1 High-Availability Architecture

4.1.1 Multi-Node Cluster

                    ┌─────────────┐
                    │ Load        │
                    │ Balancer    │
                    └─────────────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
 ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
 │   Kong 1    │    │   Kong 2    │    │   Kong 3    │
 │  (Node 1)   │    │  (Node 2)   │    │  (Node 3)   │
 └─────────────┘    └─────────────┘    └─────────────┘
        │                  │                  │
        └──────────────────┼──────────────────┘
                           │
                    ┌─────────────┐
                    │ PostgreSQL  │
                    │  Cluster    │
                    └─────────────┘

4.1.2 Database High Availability

# PostgreSQL primary/replica topology (an illustrative inventory, not a file Kong reads)
postgresql_cluster:
  master:
    host: pg-master.internal
    port: 5432
    user: kong
    password: kong_password
    database: kong
  
  slaves:
    - host: pg-slave1.internal
      port: 5432
    - host: pg-slave2.internal
      port: 5432
  
  connection_pool:
    max_connections: 100
    idle_timeout: 300
    max_lifetime: 3600

4.2 Kubernetes Deployment

4.2.1 Namespace and ConfigMap

# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: kong

---
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kong-config
  namespace: kong
data:
  kong.conf: |
    database = postgres
    pg_host = postgres-service
    pg_port = 5432
    pg_user = kong
    pg_password = kong
    pg_database = kong
    
    proxy_listen = 0.0.0.0:8000, 0.0.0.0:8443 ssl
    admin_listen = 0.0.0.0:8001, 0.0.0.0:8444 ssl
    
    log_level = notice
    proxy_access_log = /dev/stdout
    proxy_error_log = /dev/stderr
    admin_access_log = /dev/stdout
    admin_error_log = /dev/stderr

4.2.2 PostgreSQL Deployment

# postgres-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
  namespace: kong
type: Opaque
data:
  username: a29uZw==  # kong
  password: a29uZw==  # kong

---
# postgres-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: kong
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: fast-ssd

---
# postgres-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  namespace: kong
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:13
        env:
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: username
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        - name: POSTGRES_DB
          value: kong
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: postgres-storage
          mountPath: /var/lib/postgresql/data
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        livenessProbe:
          exec:
            command:
            - pg_isready
            - -U
            - kong
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - pg_isready
            - -U
            - kong
          initialDelaySeconds: 5
          periodSeconds: 5
      volumes:
      - name: postgres-storage
        persistentVolumeClaim:
          claimName: postgres-pvc

---
# postgres-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
  namespace: kong
spec:
  selector:
    app: postgres
  ports:
  - port: 5432
    targetPort: 5432
  type: ClusterIP
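
The `data` values in postgres-secret.yaml above are base64-encoded, not encrypted. The short sketch below shows how such a value can be produced; the function name is made up for illustration.

```shell
#!/bin/sh
# Encode a literal for a Secret's data field. printf is used instead of echo
# so that no trailing newline ends up inside the stored value.
encode_secret_value() {
  printf '%s' "$1" | base64
}

encode_secret_value kong   # prints a29uZw==
```

With `echo kong | base64` the result would be a29uZwo=, which decodes to the password followed by a newline and then fails authentication in ways that are hard to spot.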

4.2.3 Kong Deployment

# kong-migration-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: kong-migration
  namespace: kong
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: kong-migration
        image: kong:latest
        command: ["kong", "migrations", "bootstrap"]
        env:
        - name: KONG_DATABASE
          value: postgres
        - name: KONG_PG_HOST
          value: postgres-service
        - name: KONG_PG_USER
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: username
        - name: KONG_PG_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        - name: KONG_PG_DATABASE
          value: kong

---
# kong-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kong
  namespace: kong
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kong
  template:
    metadata:
      labels:
        app: kong
    spec:
      containers:
      - name: kong
        image: kong:latest
        env:
        - name: KONG_DATABASE
          value: postgres
        - name: KONG_PG_HOST
          value: postgres-service
        - name: KONG_PG_USER
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: username
        - name: KONG_PG_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        - name: KONG_PG_DATABASE
          value: kong
        - name: KONG_PROXY_ACCESS_LOG
          value: /dev/stdout
        - name: KONG_ADMIN_ACCESS_LOG
          value: /dev/stdout
        - name: KONG_PROXY_ERROR_LOG
          value: /dev/stderr
        - name: KONG_ADMIN_ERROR_LOG
          value: /dev/stderr
        - name: KONG_ADMIN_LISTEN
          value: 0.0.0.0:8001
        ports:
        - containerPort: 8000
          name: proxy
        - containerPort: 8443
          name: proxy-ssl
        - containerPort: 8001
          name: admin
        - containerPort: 8444
          name: admin-ssl
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /status
            port: 8001
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /status/ready
            port: 8001
          initialDelaySeconds: 5
          periodSeconds: 5
        volumeMounts:
        - name: kong-config
          mountPath: /etc/kong
      volumes:
      - name: kong-config
        configMap:
          name: kong-config

---
# kong-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: kong-proxy
  namespace: kong
spec:
  type: LoadBalancer
  ports:
  - name: proxy
    port: 80
    targetPort: 8000
    protocol: TCP
  - name: proxy-ssl
    port: 443
    targetPort: 8443
    protocol: TCP
  selector:
    app: kong

---
apiVersion: v1
kind: Service
metadata:
  name: kong-admin
  namespace: kong
spec:
  type: ClusterIP
  ports:
  - name: admin
    port: 8001
    targetPort: 8001
    protocol: TCP
  selector:
    app: kong

4.2.4 HPA Autoscaling

# kong-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kong-hpa
  namespace: kong
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kong
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60

4.3 Helm Deployment

4.3.1 Installing the Helm Chart

# Add the Kong Helm repository
helm repo add kong https://charts.konghq.com
helm repo update

# Create the values file
cat > kong-values.yaml << EOF
image:
  repository: kong
  tag: "latest"

env:
  database: postgres
  pg_host: postgres-service
  pg_user: kong
  pg_password: kong
  pg_database: kong

proxy:
  enabled: true
  type: LoadBalancer
  http:
    enabled: true
    servicePort: 80
    containerPort: 8000
  tls:
    enabled: true
    servicePort: 443
    containerPort: 8443

admin:
  enabled: true
  type: ClusterIP
  http:
    enabled: true
    servicePort: 8001
    containerPort: 8001

replicaCount: 3

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

postgresql:
  enabled: true
  auth:
    username: kong
    password: kong
    database: kong
  primary:
    persistence:
      enabled: true
      size: 20Gi
EOF

# Install Kong
helm install kong kong/kong -f kong-values.yaml -n kong --create-namespace

# Check the deployment status
helm status kong -n kong
kubectl get pods -n kong

# Upgrade Kong
helm upgrade kong kong/kong -f kong-values.yaml -n kong

# Uninstall Kong
helm uninstall kong -n kong

5. Configuration Management

5.1 Configuration File Management

5.1.1 Per-Environment Configuration

# Development environment
# /etc/kong/kong-dev.conf
database = postgres
pg_host = dev-postgres.internal
pg_user = kong_dev
pg_password = dev_password
pg_database = kong_dev
log_level = debug
proxy_listen = 0.0.0.0:8000
admin_listen = 0.0.0.0:8001

# Test environment
# /etc/kong/kong-test.conf
database = postgres
pg_host = test-postgres.internal
pg_user = kong_test
pg_password = test_password
pg_database = kong_test
log_level = info
proxy_listen = 0.0.0.0:8000
admin_listen = 127.0.0.1:8001

# Production environment
# /etc/kong/kong-prod.conf
database = postgres
pg_host = prod-postgres.internal
pg_user = kong_prod
pg_password = prod_password
pg_database = kong_prod
log_level = warn
proxy_listen = 0.0.0.0:8000, 0.0.0.0:8443 ssl
admin_listen = 127.0.0.1:8001, 127.0.0.1:8444 ssl

5.1.2 Environment Variable Configuration

# Environment variable file
# /etc/kong/kong.env

# Database settings
export KONG_DATABASE=postgres
export KONG_PG_HOST=${DB_HOST:-localhost}
export KONG_PG_PORT=${DB_PORT:-5432}
export KONG_PG_USER=${DB_USER:-kong}
export KONG_PG_PASSWORD=${DB_PASSWORD}
export KONG_PG_DATABASE=${DB_NAME:-kong}

# Proxy and Admin API listeners
export KONG_PROXY_LISTEN="0.0.0.0:8000, 0.0.0.0:8443 ssl"
export KONG_ADMIN_LISTEN="127.0.0.1:8001, 127.0.0.1:8444 ssl"

# Logging
export KONG_LOG_LEVEL=${LOG_LEVEL:-notice}
export KONG_PROXY_ACCESS_LOG=/var/log/kong/access.log
export KONG_PROXY_ERROR_LOG=/var/log/kong/error.log

# Performance settings (injected Nginx directives)
export KONG_NGINX_MAIN_WORKER_PROCESSES=${WORKER_PROCESSES:-auto}
export KONG_NGINX_EVENTS_WORKER_CONNECTIONS=${WORKER_CONNECTIONS:-1024}

# Plugins
export KONG_PLUGINS=${KONG_PLUGINS:-bundled}

# SSL settings
export KONG_SSL_CERT=${SSL_CERT_PATH}
export KONG_SSL_CERT_KEY=${SSL_KEY_PATH}
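
Every variable in the file above follows one rule: take the kong.conf property name, upper-case it, and prefix it with KONG_. The helper below is a small sketch of that mapping; the function name is made up for illustration.

```shell
#!/bin/sh
# Map a kong.conf property name to the environment variable Kong reads for it.
kong_env_name() {
  printf 'KONG_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
}

kong_env_name pg_host        # prints KONG_PG_HOST
kong_env_name admin_listen   # prints KONG_ADMIN_LISTEN
```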

5.2 Declarative Configuration

5.2.1 YAML Configuration File

# kong.yaml
_format_version: "2.1"
_transform: true

services:
- name: user-service
  url: http://user-service.internal:8080
  tags:
    - production
    - user
  routes:
  - name: user-api
    paths:
    - /api/users
    methods:
    - GET
    - POST
    - PUT
    - DELETE
    strip_path: false
    preserve_host: false
    tags:
    - api
    - user

- name: order-service
  url: http://order-service.internal:8080
  tags:
    - production
    - order
  routes:
  - name: order-api
    paths:
    - /api/orders
    methods:
    - GET
    - POST
    - PUT
    - DELETE
    strip_path: false
    tags:
    - api
    - order

# Kong balances across an upstream only when a Service's host equals the upstream's name
upstreams:
- name: user-service-upstream
  algorithm: round-robin
  hash_on: none
  hash_fallback: none
  healthchecks:
    active:
      healthy:
        interval: 5
        successes: 3
      unhealthy:
        interval: 5
        tcp_failures: 3
        http_failures: 3
    passive:
      healthy:
        successes: 3
      unhealthy:
        tcp_failures: 3
        http_failures: 3
  targets:
  - target: user-service-1.internal:8080
    weight: 100
  - target: user-service-2.internal:8080
    weight: 100
  - target: user-service-3.internal:8080
    weight: 100

consumers:
- username: api-client
  custom_id: client-001
  tags:
  - external
  - api-client

- username: mobile-app
  custom_id: mobile-001
  tags:
  - mobile
  - app

plugins:
- name: rate-limiting
  config:
    minute: 1000
    hour: 10000
    policy: local
  tags:
  - rate-limiting
  - global

- name: prometheus
  config:
    per_consumer: true
    status_code_metrics: true
    latency_metrics: true
    bandwidth_metrics: true
  tags:
  - monitoring
  - prometheus

- name: cors
  config:
    origins:
    - "*"
    methods:
    - GET
    - POST
    - PUT
    - DELETE
    - OPTIONS
    headers:
    - Accept
    - Accept-Version
    - Content-Length
    - Content-MD5
    - Content-Type
    - Date
    - X-Auth-Token
    exposed_headers:
    - X-Auth-Token
    credentials: true
    max_age: 3600
  tags:
  - cors
  - security

- name: key-auth
  service: user-service
  config:
    key_names:
    - apikey
    - x-api-key
    key_in_body: false
    hide_credentials: true
  tags:
  - authentication
  - api-key

keyauth_credentials:
- consumer: api-client
  key: client-api-key-12345
  tags:
  - api-client
  - production

- consumer: mobile-app
  key: mobile-api-key-67890
  tags:
  - mobile-app
  - production

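5.2.2 Validating and Applying Configuration

# Validate the configuration file
kong config parse kong.yaml

# Apply the configuration (DB-less mode): point Kong at the file through its settings
KONG_DATABASE=off KONG_DECLARATIVE_CONFIG=kong.yaml kong start -c kong.conf

# Reload the configuration
KONG_DATABASE=off KONG_DECLARATIVE_CONFIG=kong.yaml kong reload -c kong.conf

# Apply the configuration through the Admin API (DB-less mode only)
curl -X POST http://localhost:8001/config \
  -F config=@kong.yaml

# Inspect the current configuration
curl -s http://localhost:8001/config | jq .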

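5.3 Configuration Version Control

5.3.1 Git Version Control

# Initialize the configuration repository
mkdir kong-config
cd kong-config
git init

# Create the directory layout
mkdir -p environments/{dev,test,prod}
mkdir -p services
mkdir -p plugins
mkdir -p consumers

# Environment-specific configuration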
cat > environments/dev/kong.yaml << EOF
_format_version: "2.1"
_transform: true

services:
- name: user-service
  url: http://user-service-dev.internal:8080
  tags: ["dev", "user"]
EOF

cat > environments/prod/kong.yaml << EOF
_format_version: "2.1"
_transform: true

services:
- name: user-service
  url: http://user-service-prod.internal:8080
  tags: ["prod", "user"]
EOF

# Commit the configuration
git add .
git commit -m "Initial Kong configuration"

# Create a branch
git checkout -b feature/new-service
# Edit the configuration...
git add .
git commit -m "Add new service configuration"

# Merge into the main branch
git checkout main
git merge feature/new-service

5.3.2 Configuration Deployment Script

#!/bin/bash
# deploy-config.sh

set -e

ENVIRONMENT=${1:-dev}
CONFIG_FILE="environments/${ENVIRONMENT}/kong.yaml"
KONG_ADMIN_URL=${KONG_ADMIN_URL:-http://localhost:8001}

if [ ! -f "$CONFIG_FILE" ]; then
    echo "Error: Configuration file $CONFIG_FILE not found"
    exit 1
fi

echo "Validating configuration for $ENVIRONMENT..."
kong config parse "$CONFIG_FILE"

echo "Backing up current configuration..."
# /config is served only in DB-less mode; -f makes curl fail on HTTP errors so `set -e` stops the script
curl -fsS "$KONG_ADMIN_URL/config" > "backup-$(date +%Y%m%d-%H%M%S).json"

echo "Applying configuration to $ENVIRONMENT..."
curl -fsS -X POST "$KONG_ADMIN_URL/config" \
    -F "config=@$CONFIG_FILE"

echo "Verifying deployment..."
sleep 5
curl -s "$KONG_ADMIN_URL/status" | jq .

echo "Configuration deployed successfully to $ENVIRONMENT"

6. Monitoring and Logging

6.1 Monitoring

6.1.1 Prometheus Monitoring

# prometheus-config.yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "kong-alerts.yml"

scrape_configs:
  - job_name: 'kong'
    static_configs:
      - targets: ['kong:8001']
    metrics_path: '/metrics'
    scrape_interval: 5s
    scrape_timeout: 5s

  - job_name: 'kong-cluster'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - kong
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: kong
      - source_labels: [__meta_kubernetes_pod_ip]
        target_label: __address__
        replacement: ${1}:8001

6.1.2 Grafana Dashboard

{
  "dashboard": {
    "title": "Kong Gateway Monitoring",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(kong_http_requests_total[5m])",
            "legendFormat": "{{service}}"
          }
        ]
      },
      {
        "title": "Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum by (le) (rate(kong_request_latency_ms_bucket[5m])))",
            "legendFormat": "95th percentile"
          },
          {
            "expr": "histogram_quantile(0.50, sum by (le) (rate(kong_request_latency_ms_bucket[5m])))",
            "legendFormat": "50th percentile"
          }
        ]
      },
      {
        "title": "Error Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(kong_http_requests_total{code=~\"5..\"}[5m]))",
            "legendFormat": "5xx errors"
          },
          {
            "expr": "sum(rate(kong_http_requests_total{code=~\"4..\"}[5m]))",
            "legendFormat": "4xx errors"
          }
        ]
      }
    ]
  }
}

6.2 Log Management

6.2.1 Log Configuration

# Kong logging settings
# /etc/kong/kong.conf

# Access logs (the trailing name selects the log format defined below)
proxy_access_log = /var/log/kong/access.log kong_custom
admin_access_log = /var/log/kong/admin_access.log

# Error logs
proxy_error_log = /var/log/kong/error.log
admin_error_log = /var/log/kong/admin_error.log

# Log level
log_level = notice

# Custom log format: the injected log_format directive takes a format name first
nginx_http_log_format = kong_custom '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_time $upstream_response_time'

6.2.2 Logrotate Configuration

# /etc/logrotate.d/kong
/var/log/kong/*.log {
    daily
    missingok
    rotate 30
    compress
    delaycompress
    notifempty
    create 644 kong kong
    postrotate
        /usr/local/bin/kong reload > /dev/null 2>&1 || true
    endscript
}

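6.2.3 ELK Integration

# filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/kong/access.log
  fields:
    service: kong
    log_type: access
  fields_under_root: true

- type: log
  enabled: true
  paths:
    - /var/log/kong/error.log
  fields:
    service: kong
    log_type: error
  fields_under_root: true
  # Error-log entries start with a date such as 2024/01/31; join continuation lines to them
  multiline.pattern: '^\d{4}/\d{2}/\d{2}'
  multiline.negate: true
  multiline.match: after

# A custom index name requires the template name and pattern to be set as well
setup.template.name: "kong-logs"
setup.template.pattern: "kong-logs-*"

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "kong-logs-%{+yyyy.MM.dd}"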

processors:
- add_host_metadata:
    when.not.contains.tags: forwarded
- add_docker_metadata: ~
- add_kubernetes_metadata: ~

6.3 Alerting

6.3.1 Prometheus Alerting Rules

# kong-alerts.yml
groups:
- name: kong.rules
  rules:
  - alert: KongHighErrorRate
    # Metric names follow Kong 3.x; Kong 2.x exposed kong_http_status instead
    expr: |
      (
        sum(rate(kong_http_requests_total{code=~"5.."}[5m])) /
        sum(rate(kong_http_requests_total[5m]))
      ) * 100 > 5
    for: 2m
    labels:
      severity: critical
      service: kong
    annotations:
      summary: "Kong high error rate detected"
      description: "Kong error rate is {{ $value }}% for the last 5 minutes"

  - alert: KongHighLatency
    expr: |
      histogram_quantile(0.95, sum by (le) (rate(kong_request_latency_ms_bucket[5m]))) > 1000
    for: 5m
    labels:
      severity: warning
      service: kong
    annotations:
      summary: "Kong high latency detected"
      description: "Kong 95th percentile latency is {{ $value }}ms"

  - alert: KongServiceDown
    expr: up{job="kong"} == 0
    for: 1m
    labels:
      severity: critical
      service: kong
    annotations:
      summary: "Kong service is down"
      description: "Kong service has been down for more than 1 minute"

  - alert: KongDatabaseConnectionFailed
    expr: kong_datastore_reachable == 0
    for: 30s
    labels:
      severity: critical
      service: kong
    annotations:
      summary: "Kong database connection failed"
      description: "Kong cannot connect to the database"

  - alert: KongMemoryUsageHigh
    # Utilization of each Lua shared dictionary: used bytes divided by capacity
    expr: |
      (
        kong_memory_lua_shared_dict_bytes /
        kong_memory_lua_shared_dict_total_bytes
      ) * 100 > 80
    for: 5m
    labels:
      severity: warning
      service: kong
    annotations:
      summary: "Kong shared dictionary memory usage is high"
      description: "Kong memory usage is {{ $value }}%"

6.3.2 AlertManager Configuration

# alertmanager.yml
global:
  smtp_smarthost: 'smtp.company.com:587'
  smtp_from: 'alerts@company.com'

route:
  group_by: ['alertname', 'service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
  routes:
  - match:
      severity: critical
    receiver: 'critical-alerts'
  - match:
      severity: warning
    receiver: 'warning-alerts'

receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://webhook-service:8080/alerts'

- name: 'critical-alerts'
  email_configs:
  - to: 'ops-team@company.com'
    headers:
      Subject: 'CRITICAL: Kong Alert - {{ .GroupLabels.alertname }}'
    text: |
      {{ range .Alerts }}
      Alert: {{ .Annotations.summary }}
      Description: {{ .Annotations.description }}
      {{ end }}
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
    channel: '#ops-alerts'
    title: 'CRITICAL Kong Alert'
    text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'

- name: 'warning-alerts'
  email_configs:
  - to: 'dev-team@company.com'
    headers:
      Subject: 'WARNING: Kong Alert - {{ .GroupLabels.alertname }}'
    text: |
      {{ range .Alerts }}
      Alert: {{ .Annotations.summary }}
      Description: {{ .Annotations.description }}
      {{ end }}

7. Backup and Restore

7.1 Database Backup

7.1.1 PostgreSQL Backup

#!/bin/bash
# backup-postgres.sh

set -e

# Configuration variables
DB_HOST=${DB_HOST:-localhost}
DB_PORT=${DB_PORT:-5432}
DB_USER=${DB_USER:-kong}
DB_NAME=${DB_NAME:-kong}
BACKUP_DIR=${BACKUP_DIR:-/backup/kong}
RETENTION_DAYS=${RETENTION_DAYS:-30}

# Create the backup directory
mkdir -p "$BACKUP_DIR"

# Build the backup file name
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
BACKUP_FILE="$BACKUP_DIR/kong_backup_$TIMESTAMP.sql"

echo "Starting Kong database backup..."

# Run the backup; testing pg_dump directly keeps the failure branch reachable under `set -e`
if PGPASSWORD="$DB_PASSWORD" pg_dump \
    -h "$DB_HOST" \
    -p "$DB_PORT" \
    -U "$DB_USER" \
    -d "$DB_NAME" \
    --verbose \
    --no-password \
    --format=custom \
    --file="$BACKUP_FILE"; then
    echo "Backup completed successfully: $BACKUP_FILE"

    # Compress the backup file
    gzip "$BACKUP_FILE"
    echo "Backup compressed: $BACKUP_FILE.gz"

    # Remove old backups
    find "$BACKUP_DIR" -name "kong_backup_*.sql.gz" -mtime +$RETENTION_DAYS -delete
    echo "Old backups cleaned up (older than $RETENTION_DAYS days)"
else
    echo "Backup failed!"
    exit 1
fi
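
The cleanup step in the script above deletes files as soon as they match, so it can help to preview the matches first. The sketch below runs the same `find` test with `-print` instead of `-delete`; the function name is made up for illustration.

```shell
#!/bin/sh
# list_expired_backups DIR DAYS
# Print the backup files that the retention cleanup would delete, without removing them.
list_expired_backups() {
  find "$1" -name "kong_backup_*.sql.gz" -mtime +"$2" -print
}

# Example:
# list_expired_backups /backup/kong 30
```

Note that `-mtime +30` matches files whose age, counted in whole days, is greater than 30, so a file modified exactly 30 days ago is kept.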

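7.1.2 Automated Backups

# Add to the crontab
# crontab -e

# Run a backup every day at 02:00
0 2 * * * /opt/kong/scripts/backup-postgres.sh >> /var/log/kong/backup.log 2>&1

# Run a full backup every Sunday at 03:00 (backup-postgres.sh as written ignores the `full` argument)
0 3 * * 0 /opt/kong/scripts/backup-postgres.sh full >> /var/log/kong/backup.log 2>&1

7.1.3 Cloud Storage Backup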

#!/bin/bash
# backup-to-s3.sh

set -e

# AWS S3 settings
S3_BUCKET=${S3_BUCKET:-kong-backups}
S3_PREFIX=${S3_PREFIX:-database}
AWS_REGION=${AWS_REGION:-us-west-2}

# Local backup directory
BACKUP_DIR=${BACKUP_DIR:-/backup/kong}

# Upload the newest backup to S3
LATEST_BACKUP=$(ls -t "$BACKUP_DIR"/kong_backup_*.sql.gz | head -1)

if [ -f "$LATEST_BACKUP" ]; then
    echo "Uploading backup to S3: $LATEST_BACKUP"
    
    # Test the upload directly so the failure branch is reachable under `set -e`
    if aws s3 cp "$LATEST_BACKUP" \
        "s3://$S3_BUCKET/$S3_PREFIX/$(basename "$LATEST_BACKUP")" \
        --region "$AWS_REGION" \
        --storage-class STANDARD_IA; then
        echo "Backup uploaded successfully to S3"
    else
        echo "Failed to upload backup to S3"
        exit 1
    fi
else
    echo "No backup file found"
    exit 1
fi

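# Remove backups older than 30 days from S3.
# `--output text` prints all matching keys on one tab-separated line (or the
# literal "None" when nothing matches), so split on tabs before reading.
aws s3api list-objects-v2 \
    --bucket "$S3_BUCKET" \
    --prefix "$S3_PREFIX/" \
    --query "Contents[?LastModified<='$(date -d '30 days ago' --iso-8601)'].Key" \
    --output text | tr '\t' '\n' | \
while read -r key; do
    if [ -n "$key" ] && [ "$key" != "None" ]; then
        echo "Deleting old backup: $key"
        aws s3 rm "s3://$S3_BUCKET/$key"
    fi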
done
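
The retention rule above (delete objects whose `LastModified` is older than 30 days) can also be expressed as a small pure function, which makes it easy to test without touching S3. This is a sketch under assumptions: `expired_keys` is a hypothetical helper, and its input is assumed to be the JSON text printed by `aws s3api list-objects-v2 --output json`, with timezone-aware timestamps.

```python
import json
from datetime import datetime, timedelta, timezone


def expired_keys(listing_json, retention_days, now=None):
    """Return the keys whose LastModified is older than the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    contents = json.loads(listing_json or "{}").get("Contents") or []
    expired = []
    for obj in contents:
        # Normalise a trailing "Z" so older Python versions can parse it.
        stamp = obj["LastModified"].replace("Z", "+00:00")
        if datetime.fromisoformat(stamp) < cutoff:
            expired.append(obj["Key"])
    return expired
```

Keeping the selection separate from the deletion allows a dry run: print the returned keys first, and only then pass them to `aws s3 rm`.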

7.2 Configuration Backup

7.2.1 Declarative Configuration Backup

#!/bin/bash
# backup-config.sh

set -e

KONG_ADMIN_URL=${KONG_ADMIN_URL:-http://localhost:8001}
BACKUP_DIR=${BACKUP_DIR:-/backup/kong/config}
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")

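# Create the backup directory
mkdir -p "$BACKUP_DIR"

echo "Backing up Kong configuration..."

# Back up the full declarative configuration. GET /config is typically served
# only in DB-less mode, so a failure here is reported without aborting the script.
curl -sf "$KONG_ADMIN_URL/config" > "$BACKUP_DIR/kong_config_$TIMESTAMP.json" \
    || echo "Warning: $KONG_ADMIN_URL/config unavailable (expected outside DB-less mode)"

# Back up individual resources. -f makes curl fail on HTTP errors instead of
# saving the error body. List endpoints are paginated, so each file holds only
# the first page; follow the "next" field on larger deployments.
curl -sf "$KONG_ADMIN_URL/services" > "$BACKUP_DIR/services_$TIMESTAMP.json"
curl -sf "$KONG_ADMIN_URL/routes" > "$BACKUP_DIR/routes_$TIMESTAMP.json"
curl -sf "$KONG_ADMIN_URL/consumers" > "$BACKUP_DIR/consumers_$TIMESTAMP.json"
curl -sf "$KONG_ADMIN_URL/plugins" > "$BACKUP_DIR/plugins_$TIMESTAMP.json"
curl -sf "$KONG_ADMIN_URL/upstreams" > "$BACKUP_DIR/upstreams_$TIMESTAMP.json"
curl -sf "$KONG_ADMIN_URL/certificates" > "$BACKUP_DIR/certificates_$TIMESTAMP.json"

# Compress the backup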
tar -czf "$BACKUP_DIR/kong_config_backup_$TIMESTAMP.tar.gz" -C "$BACKUP_DIR" \
    kong_config_$TIMESTAMP.json \
    services_$TIMESTAMP.json \
    routes_$TIMESTAMP.json \
    consumers_$TIMESTAMP.json \
    plugins_$TIMESTAMP.json \
    upstreams_$TIMESTAMP.json \
    certificates_$TIMESTAMP.json

# Remove the temporary files
rm "$BACKUP_DIR"/*_$TIMESTAMP.json

echo "Configuration backup completed: kong_config_backup_$TIMESTAMP.tar.gz"
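
A single `curl` per resource captures only one page of results, because Kong list responses carry a `data` array plus a `next` path for the following page. The sketch below shows one way to collect every page; `fetch_all` and `make_fetcher` are hypothetical helpers written for this example, and the Admin URL passed in is whatever the deployment uses.

```python
import json
import urllib.request


def fetch_all(fetch_page, path):
    """Collect every entity from a paginated list endpoint.

    `fetch_page` is a caller-supplied function that takes a path such as
    "/services" and returns the decoded JSON body as a dict.
    """
    items = []
    while path:
        body = fetch_page(path)
        items.extend(body.get("data", []))
        # "next" holds the path of the following page, or null on the last page.
        path = body.get("next")
    return items


def make_fetcher(admin_url):
    """Build a fetch_page function that reads from a live Admin API."""
    def fetch_page(path):
        with urllib.request.urlopen(admin_url.rstrip("/") + path) as resp:
            return json.load(resp)
    return fetch_page
```

For example, `fetch_all(make_fetcher("http://localhost:8001"), "/services")` would return all services rather than the first page only. Passing the fetcher in as a parameter keeps the pagination logic testable without a running gateway.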

7.3 Restore Procedures

7.3.1 Database Restore

#!/bin/bash
# restore-postgres.sh

set -e

BACKUP_FILE=$1
DB_HOST=${DB_HOST:-localhost}
DB_PORT=${DB_PORT:-5432}
DB_USER=${DB_USER:-kong}
DB_NAME=${DB_NAME:-kong}

if [ -z "$BACKUP_FILE" ]; then
    echo "Usage: $0 <backup_file>"
    exit 1
fi

if [ ! -f "$BACKUP_FILE" ]; then
    echo "Backup file not found: $BACKUP_FILE"
    exit 1
fi

echo "WARNING: This will restore the Kong database from backup."
echo "All current data will be lost!"
read -p "Are you sure you want to continue? (yes/no): " confirm

if [ "$confirm" != "yes" ]; then
    echo "Restore cancelled."
    exit 0
fi

echo "Stopping Kong..."
sudo systemctl stop kong

echo "Dropping existing database..."
PGPASSWORD="$DB_PASSWORD" dropdb \
    -h "$DB_HOST" \
    -p "$DB_PORT" \
    -U "$DB_USER" \
    "$DB_NAME"

echo "Creating new database..."
PGPASSWORD="$DB_PASSWORD" createdb \
    -h "$DB_HOST" \
    -p "$DB_PORT" \
    -U "$DB_USER" \
    "$DB_NAME"

echo "Restoring database from backup..."
if [[ "$BACKUP_FILE" == *.gz ]]; then
    gunzip -c "$BACKUP_FILE" | PGPASSWORD="$DB_PASSWORD" pg_restore \
        -h "$DB_HOST" \
        -p "$DB_PORT" \
        -U "$DB_USER" \
        -d "$DB_NAME" \
        --verbose
else
    PGPASSWORD="$DB_PASSWORD" pg_restore \
        -h "$DB_HOST" \
        -p "$DB_PORT" \
        -U "$DB_USER" \
        -d "$DB_NAME" \
        --verbose \
        "$BACKUP_FILE"
fi

echo "Starting Kong..."
sudo systemctl start kong

echo "Verifying Kong status..."
sleep 5
curl -s http://localhost:8001/status | jq .

echo "Database restore completed successfully!"

7.3.2 Configuration Restore

#!/bin/bash
# restore-config.sh

set -e

BACKUP_FILE=$1
KONG_ADMIN_URL=${KONG_ADMIN_URL:-http://localhost:8001}

if [ -z "$BACKUP_FILE" ]; then
    echo "Usage: $0 <config_backup_file>"
    exit 1
fi

if [ ! -f "$BACKUP_FILE" ]; then
    echo "Backup file not found: $BACKUP_FILE"
    exit 1
fi

echo "WARNING: This will restore Kong configuration from backup."
echo "All current configuration will be replaced!"
read -p "Are you sure you want to continue? (yes/no): " confirm

if [ "$confirm" != "yes" ]; then
    echo "Restore cancelled."
    exit 0
fi

# Extract the backup archive
TEMP_DIR=$(mktemp -d)
tar -xzf "$BACKUP_FILE" -C "$TEMP_DIR"

echo "Restoring Kong configuration..."

# Restore the full configuration (DB-less mode).
# A quoted glob is never expanded by [ -f ... ], so resolve the file name first.
CONFIG_FILE=$(ls "$TEMP_DIR"/kong_config_*.json 2>/dev/null | head -1)
if [ -n "$CONFIG_FILE" ]; then
    curl -X POST "$KONG_ADMIN_URL/config" \
        -F "config=@$CONFIG_FILE"
fi

# Remove the temporary directory
rm -rf "$TEMP_DIR"

echo "Configuration restore completed successfully!"
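
Before restoring, it can be worth confirming that the archive actually contains a full-configuration file, since the restore step silently does nothing when that file is missing. A minimal sketch, assuming the archive layout produced by `backup-config.sh`; `find_config_member` is a hypothetical helper name.

```python
import fnmatch
import tarfile


def find_config_member(archive_path, pattern="kong_config_*.json"):
    """Return the first archive member matching the pattern, or None."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for name in archive.getnames():
            # Compare only the file name so nested paths also match.
            base = name.rsplit("/", 1)[-1]
            if fnmatch.fnmatch(base, pattern):
                return name
    return None
```

A wrapper script could exit with an error when this returns `None`, instead of reporting a successful restore.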

7.3.3 Disaster Recovery Plan

#!/bin/bash
# disaster-recovery.sh

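set -e

# Connection settings (defaults mirror the other scripts in this guide)
KONG_ADMIN_URL=${KONG_ADMIN_URL:-http://localhost:8001}
DB_HOST=${DB_HOST:-localhost}
DB_USER=${DB_USER:-kong}
DB_NAME=${DB_NAME:-kong}

echo "Kong Disaster Recovery Plan"
echo "==========================="

# 1. Check system status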
echo "1. Checking system status..."
if systemctl is-active --quiet kong; then
    echo "   Kong is running"
else
    echo "   Kong is not running"
fi

if systemctl is-active --quiet postgresql; then
    echo "   PostgreSQL is running"
else
    echo "   PostgreSQL is not running - CRITICAL!"
fi

# 2. Check database connectivity
echo "2. Checking database connectivity..."
if PGPASSWORD="$DB_PASSWORD" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "SELECT 1;" > /dev/null 2>&1; then
    echo "   Database connection: OK"
else
    echo "   Database connection: FAILED - CRITICAL!"
fi

# 3. Check the Kong Admin API
echo "3. Checking Kong Admin API..."
if curl -s "$KONG_ADMIN_URL/status" > /dev/null; then
    echo "   Admin API: OK"
else
    echo "   Admin API: FAILED"
fi

# 4. Check the Kong proxy
echo "4. Checking Kong Proxy..."
if curl -s "http://localhost:8000" > /dev/null; then
    echo "   Proxy: OK"
else
    echo "   Proxy: FAILED"
fi

# 5. Automatic recovery
echo "5. Starting automatic recovery..."

# Restart services
echo "   Restarting PostgreSQL..."
sudo systemctl restart postgresql
sleep 10

echo "   Restarting Kong..."
sudo systemctl restart kong
sleep 15

# 6. Verify the recovery
echo "6. Verifying recovery..."
if curl -s "$KONG_ADMIN_URL/status" | jq -r '.database.reachable' | grep -q "true"; then
    echo "   Recovery successful!"
else
    echo "   Recovery failed - manual intervention required"
    exit 1
fi
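
The recovery script waits a fixed number of seconds after each restart. Polling with a growing delay usually finishes sooner when the service is healthy and waits longer when it is not. The following is a sketch of that idea only; `wait_until` and its parameters are hypothetical, and `check` stands for any callable that returns `True` once the service responds (for instance a request to the Admin API `/status` endpoint).

```python
import time


def wait_until(check, attempts=10, delay=1.0, backoff=2.0, max_delay=30.0,
               sleep=time.sleep):
    """Call check() until it succeeds.

    Returns the attempt number that succeeded, or 0 if every attempt failed.
    The delay between attempts grows by `backoff` up to `max_delay`.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)
            delay = min(delay * backoff, max_delay)
    return 0
```

The `sleep` parameter exists so the function can be exercised in tests without real waiting.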

8. Security Operations

8.1 Security Configuration

8.1.1 Network Security

#!/bin/bash
# setup-firewall.sh
# Firewall configuration

# Flush existing rules
sudo iptables -F
sudo iptables -X
sudo iptables -t nat -F
sudo iptables -t nat -X

# Allow loopback traffic
sudo iptables -A INPUT -i lo -j ACCEPT
sudo iptables -A OUTPUT -o lo -j ACCEPT

# Allow established connections
sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# Allow SSH (adjust to the actual port)
sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT

# Allow the Kong proxy ports (public)
sudo iptables -A INPUT -p tcp --dport 8000 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 8443 -j ACCEPT

# Allow the Kong Admin API (internal networks only)
sudo iptables -A INPUT -p tcp -s 10.0.0.0/8 --dport 8001 -j ACCEPT
sudo iptables -A INPUT -p tcp -s 172.16.0.0/12 --dport 8001 -j ACCEPT
sudo iptables -A INPUT -p tcp -s 192.168.0.0/16 --dport 8001 -j ACCEPT

# Allow the database port (internal networks only)
sudo iptables -A INPUT -p tcp -s 10.0.0.0/8 --dport 5432 -j ACCEPT
sudo iptables -A INPUT -p tcp -s 172.16.0.0/12 --dport 5432 -j ACCEPT
sudo iptables -A INPUT -p tcp -s 192.168.0.0/16 --dport 5432 -j ACCEPT

# Set the default policies last, so an active SSH session is not cut off
# before the ACCEPT rules above are in place
sudo iptables -P INPUT DROP
sudo iptables -P FORWARD DROP
sudo iptables -P OUTPUT ACCEPT

# Save the rules. The redirect itself must run as root, hence `sudo tee`
# rather than `sudo iptables-save > file`.
sudo iptables-save | sudo tee /etc/iptables/rules.v4 > /dev/null

echo "Firewall rules configured successfully"
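
The Admin API and database rules above admit only the three RFC 1918 private ranges. A quick way to check which source addresses such an allowlist covers is sketched below; `admin_access_allowed` is a hypothetical helper, and the network list simply mirrors the rules in the script.

```python
import ipaddress

# The same private ranges the firewall rules allow for ports 8001 and 5432.
ADMIN_ALLOWED = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def admin_access_allowed(source_ip):
    """Return True when source_ip falls inside one of the allowed networks."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ADMIN_ALLOWED)
```

Note that `172.16.0.0/12` ends at `172.31.255.255`, so an address such as `172.32.0.1` is outside the allowlist.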

8.1.2 SSL/TLS Configuration

#!/bin/bash
# generate-ssl-certs.sh
# Generate SSL certificates

SSL_DIR="/etc/kong/ssl"
DOMAIN="api.company.com"

# Create the SSL directory
sudo mkdir -p "$SSL_DIR"
cd "$SSL_DIR" || exit 1

# Generate the private key
sudo openssl genrsa -out kong.key 2048

# Generate the certificate signing request
sudo openssl req -new -key kong.key -out kong.csr -subj "/C=US/ST=CA/L=San Francisco/O=Company/CN=$DOMAIN"

# Generate a self-signed certificate (production should use a CA-signed certificate)
sudo openssl x509 -req -days 365 -in kong.csr -signkey kong.key -out kong.crt

# Set ownership and permissions
sudo chown kong:kong kong.key kong.crt
sudo chmod 600 kong.key
sudo chmod 644 kong.crt

# Verify the certificate
openssl x509 -in kong.crt -text -noout

echo "SSL certificates generated successfully"

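8.1.3 Access Control

# Kong RBAC configuration (Enterprise edition)
# Illustrative layout; adapt the keys to the configuration format your deployment actually uses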
rbac:
  enabled: true
  admin_gui_auth: basic-auth
  admin_gui_auth_conf: |
    {
      "hide_credentials": true
    }
  session_conf: |
    {
      "secret": "your-session-secret",
      "cookie_secure": true,
      "cookie_httponly": true,
      "cookie_samesite": "Strict"
    }

# Define roles
roles:
  - name: admin
    permissions:
      - "*"
  - name: developer
    permissions:
      - "services:read"
      - "routes:read"
      - "plugins:read"
  - name: operator
    permissions:
      - "services:*"
      - "routes:*"
      - "upstreams:*"
      - "targets:*"
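
The role definitions above use a `resource:action` notation with `*` as a wildcard. To make the intended semantics concrete, here is a small sketch of how such grants could be evaluated. It illustrates the notation only and is not how Kong Enterprise evaluates RBAC internally; `ROLES` and `is_allowed` are hypothetical names.

```python
# Role grants copied from the configuration above.
ROLES = {
    "admin": ["*"],
    "developer": ["services:read", "routes:read", "plugins:read"],
    "operator": ["services:*", "routes:*", "upstreams:*", "targets:*"],
}


def is_allowed(role, resource, action):
    """Return True when any grant of the role covers resource:action."""
    for grant in ROLES.get(role, []):
        if grant == "*":
            return True
        granted_resource, _, granted_action = grant.partition(":")
        if granted_resource == resource and granted_action in ("*", action):
            return True
    return False
```

Under this reading, `developer` can read services but not modify them, and `operator` has no access to plugins at all.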

8.2 Updates and Patches

8.2.1 Update Procedure

#!/bin/bash
# update-kong.sh

set -e

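# `kong version` output differs between releases, so match the first
# dotted version number rather than relying on a fixed prefix.
CURRENT_VERSION=$(kong version | grep -oE '[0-9]+(\.[0-9]+)+' | head -1)
# The package file name needs a concrete version, so "latest" cannot be a default.
TARGET_VERSION=${1:?Usage: $0 <target_version>}

echo "Current Kong version: $CURRENT_VERSION"
echo "Target version: $TARGET_VERSION"

# 1. Back up the current configuration and data
echo "1. Creating backup..."
/opt/kong/scripts/backup-postgres.sh
/opt/kong/scripts/backup-config.sh

# 2. Download the new version
echo "2. Downloading Kong $TARGET_VERSION..."
wget "https://download.konghq.com/gateway-2.x-ubuntu-$(lsb_release -cs)/pool/all/k/kong/kong_${TARGET_VERSION}_amd64.deb"

# 3. Stop the Kong service
echo "3. Stopping Kong..."
sudo systemctl stop kong

# 4. Install the new version
echo "4. Installing Kong $TARGET_VERSION..."
sudo dpkg -i "kong_${TARGET_VERSION}_amd64.deb"

# 5. Run database migrations
echo "5. Running database migrations..."
sudo kong migrations up -c /etc/kong/kong.conf
# Once the upgraded nodes are verified, `kong migrations finish` completes any pending migrations.

# 6. Start Kong
echo "6. Starting Kong..."
sudo systemctl start kong

# 7. Verify the update
echo "7. Verifying update..."
sleep 10
NEW_VERSION=$(kong version | grep -oE '[0-9]+(\.[0-9]+)+' | head -1)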
echo "New Kong version: $NEW_VERSION"

if curl -s http://localhost:8001/status | jq -r '.database.reachable' | grep -q "true"; then
    echo "Update completed successfully!"
else
    echo "Update failed - rolling back..."
    # Rollback logic goes here
    exit 1
fi

# 8. Clean up
rm "kong_${TARGET_VERSION}_amd64.deb"
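
Whether an update is a patch, minor, or major step affects how much care the migration needs, so it can help to classify the jump before starting. The sketch below is an illustration only: `parse_version` and `upgrade_kind` are hypothetical helpers, and they assume three-part version numbers such as `2.8.1`.

```python
import re


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple from a string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if not match:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def upgrade_kind(current, target):
    """Classify the step from current to target."""
    cur, new = parse_version(current), parse_version(target)
    if new < cur:
        return "downgrade"
    if new == cur:
        return "none"
    if new[0] != cur[0]:
        return "major"
    if new[1] != cur[1]:
        return "minor"
    return "patch"
```

An update script could refuse to continue on `"downgrade"` and require explicit confirmation on `"major"`.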

8.2.2 Rollback Procedure

#!/bin/bash
# rollback-kong.sh

set -e

BACKUP_VERSION=$1

if [ -z "$BACKUP_VERSION" ]; then
    echo "Usage: $0 <backup_version>"
    echo "Available backups:"
    ls /backup/kong/kong_backup_*.sql.gz
    exit 1
fi

echo "WARNING: This will rollback Kong to a previous version."
echo "All changes since the backup will be lost!"
read -p "Are you sure you want to continue? (yes/no): " confirm

if [ "$confirm" != "yes" ]; then
    echo "Rollback cancelled."
    exit 0
fi

echo "Starting rollback process..."

# 1. Stop Kong
echo "1. Stopping Kong..."
sudo systemctl stop kong

# 2. Restore the database
echo "2. Restoring database..."
/opt/kong/scripts/restore-postgres.sh "/backup/kong/$BACKUP_VERSION"

# 3. Downgrade the Kong version (if needed)
echo "3. Downgrading Kong version..."
# Install the previous package here, as appropriate for the environment

# 4. Start Kong
echo "4. Starting Kong..."
sudo systemctl start kong

# 5. Verify the rollback
echo "5. Verifying rollback..."
sleep 10
if curl -s http://localhost:8001/status | jq -r '.database.reachable' | grep -q "true"; then
    echo "Rollback completed successfully!"
else
    echo "Rollback failed!"
    exit 1
fi

8.3 Troubleshooting

8.3.1 Diagnosing Common Issues

#!/bin/bash
# diagnose-kong.sh

echo "Kong Diagnostic Tool"
echo "==================="

# 1. Check Kong processes
echo "1. Checking Kong processes..."
ps aux | grep kong | grep -v grep

# 2. Check port usage
echo "2. Checking port usage..."
netstat -tlnp | grep -E ':(8000|8001|8443|8444)'

# 3. Validate the Kong configuration file
echo "3. Checking Kong configuration..."
kong check /etc/kong/kong.conf

# 4. Check the database connection
echo "4. Checking database connection..."
if PGPASSWORD="$DB_PASSWORD" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "SELECT version();" > /dev/null 2>&1; then
    echo "   Database connection: OK"
else
    echo "   Database connection: FAILED"
fi

# 5. Check Kong status
echo "5. Checking Kong status..."
curl -s http://localhost:8001/status | jq .

# 6. Show recent error log entries
echo "6. Recent error logs..."
tail -20 /var/log/kong/error.log

# 7. Check system resources
echo "7. System resources..."
echo "   Memory usage:"
free -h
echo "   Disk usage:"
df -h
echo "   CPU load:"
uptime

# 8. Check network connectivity
echo "8. Network connectivity..."
echo "   Testing proxy port:"
# `grep .` fails on empty output, so the fallback message actually triggers
curl -sI http://localhost:8000 | head -1 | grep . || echo "   Proxy port not responding"
echo "   Testing admin port:"
curl -sI http://localhost:8001 | head -1 | grep . || echo "   Admin port not responding"

8.3.2 Troubleshooting Performance Issues

#!/bin/bash
# performance-check.sh

echo "Kong Performance Check"
echo "====================="

# 1. Count Kong worker processes (pgrep does not count itself, unlike ps | grep)
echo "1. Kong worker processes:"
pgrep -fc 'nginx: worker process'

# 2. Check memory usage (the [k] bracket keeps grep from matching itself)
echo "2. Memory usage by Kong processes:"
ps aux | grep '[k]ong' | awk '{sum+=$6} END {print "Total RSS: " sum/1024 " MB"}'

# 3. Check the number of active connections
echo "3. Active connections:"
netstat -an | grep :8000 | grep ESTABLISHED | wc -l

# 4. Check database connections (GROUP BY is required alongside count(*))
echo "4. Database connections:"
PGPASSWORD="$DB_PASSWORD" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "
    SELECT count(*) as active_connections,
           max_conn,
           max_conn - count(*) as available_connections
    FROM pg_stat_activity
    CROSS JOIN (SELECT setting::int as max_conn FROM pg_settings WHERE name = 'max_connections') mc
    GROUP BY max_conn;
"

# 5. Measure response time
echo "5. Response time test:"
for i in {1..5}; do
    curl -w "Response time: %{time_total}s\n" -o /dev/null -s http://localhost:8000
done

# 6. Check the recent error rate
echo "6. Recent error rate:"
ERRORS=$(tail -1000 /var/log/kong/access.log | grep -c ' 5[0-9][0-9] ')
TOTAL=$(tail -1000 /var/log/kong/access.log | wc -l)
if [ "$TOTAL" -gt 0 ]; then
    ERROR_RATE=$(echo "scale=2; $ERRORS * 100 / $TOTAL" | bc)
    echo "   Error rate: $ERROR_RATE% ($ERRORS/$TOTAL)"
else
    echo "   No recent requests found"
fi
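
Matching ` 5xx ` anywhere in a log line can also hit the response-size field (for example a 200 response of exactly 500 bytes). The sketch below reads the status from the field that follows the quoted request line instead. It assumes an access log in the common NGINX combined layout; `error_rate` is a hypothetical helper.

```python
import re

# In the combined log format the status code follows the closing quote
# of the request line: ... "GET /path HTTP/1.1" 502 123 ...
STATUS_RE = re.compile(r'" (\d{3}) ')


def error_rate(lines):
    """Return (rate, errors, total) for 5xx responses in the given log lines."""
    total = errors = 0
    for line in lines:
        match = STATUS_RE.search(line)
        if not match:
            continue  # skip lines that do not look like access-log entries
        total += 1
        if match.group(1).startswith("5"):
            errors += 1
    rate = errors / total if total else 0.0
    return rate, errors, total
```

Feeding it the last 1000 lines of the access log gives the same kind of figure as the shell check, without counting byte sizes as errors.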

9. Operations Best Practices

9.1 Deployment Strategies

9.1.1 Blue-Green Deployment

#!/bin/bash
# blue-green-deployment.sh

set -e

CURRENT_ENV=${1:-blue}
TARGET_ENV=${2:-green}
LB_CONFIG="/etc/nginx/conf.d/kong-lb.conf"

echo "Starting blue-green deployment..."
echo "Current: $CURRENT_ENV, Target: $TARGET_ENV"

# 1. Deploy to the target environment
echo "1. Deploying to $TARGET_ENV environment..."
docker-compose -f "docker-compose-$TARGET_ENV.yml" up -d

# 2. Wait for the service to start
echo "2. Waiting for $TARGET_ENV to be ready..."
for i in {1..30}; do
    if curl -s "http://kong-$TARGET_ENV:8001/status" > /dev/null; then
        echo "   $TARGET_ENV is ready"
        break
    fi
    sleep 10
done

# 3. Run health checks
echo "3. Running health checks on $TARGET_ENV..."
if ! curl -s http://kong-$TARGET_ENV:8001/status | jq -r '.database.reachable' | grep -q "true"; then
    echo "   Health check failed for $TARGET_ENV"
    exit 1
fi

# 4. Switch the load balancer
echo "4. Switching load balancer to $TARGET_ENV..."
sed -i "s/kong-$CURRENT_ENV/kong-$TARGET_ENV/g" "$LB_CONFIG"
nginx -s reload

# 5. Verify the switch
echo "5. Verifying switch..."
sleep 5
if curl -s http://localhost/status > /dev/null; then
    echo "   Switch successful"
else
    echo "   Switch failed, rolling back..."
    sed -i "s/kong-$TARGET_ENV/kong-$CURRENT_ENV/g" "$LB_CONFIG"
    nginx -s reload
    exit 1
fi

# 6. Stop the old environment
echo "6. Stopping $CURRENT_ENV environment..."
docker-compose -f docker-compose-$CURRENT_ENV.yml down

echo "Blue-green deployment completed successfully!"

9.1.2 Canary Deployment

#!/bin/bash
# canary-deployment.sh

set -e

CANARY_PERCENTAGE=${1:-10}
NEW_VERSION=$2

if [ -z "$NEW_VERSION" ]; then
    echo "Usage: $0 <canary_percentage> <new_version>"
    exit 1
fi

echo "Starting canary deployment..."
echo "Canary percentage: $CANARY_PERCENTAGE%"
echo "New version: $NEW_VERSION"

# 1. Deploy the canary version
echo "1. Deploying canary version..."
docker run -d --name "kong-canary-$NEW_VERSION" \
    --network kong-net \
    -e "KONG_DATABASE=postgres" \
    -e "KONG_PG_HOST=kong-database" \
    "kong:$NEW_VERSION"

# 2. Wait for the service to start
echo "2. Waiting for canary to be ready..."
sleep 30

# 3. Configure the traffic split
echo "3. Configuring traffic split..."
curl -X POST http://localhost:8001/upstreams \
    -d "name=kong-cluster" \
    -d "algorithm=round-robin"

# Add the production target. The split is expressed through target weights,
# so one target per backend is enough.
curl -X POST http://localhost:8001/upstreams/kong-cluster/targets \
    -d "target=kong-prod:8000" \
    -d "weight=$((100 - CANARY_PERCENTAGE))"

# Add the canary target
curl -X POST http://localhost:8001/upstreams/kong-cluster/targets \
    -d "target=kong-canary-$NEW_VERSION:8000" \
    -d "weight=$CANARY_PERCENTAGE"

# 4. Monitor canary metrics
echo "4. Monitoring canary metrics..."
for i in {1..60}; do
    # -G with --data-urlencode keeps the PromQL braces and brackets intact.
    # Note: this query returns a per-second 5xx rate, not a ratio of all requests.
    ERROR_RATE=$(curl -sG http://prometheus:9090/api/v1/query \
        --data-urlencode 'query=rate(kong_http_status{code=~"5..",instance="kong-canary-'"$NEW_VERSION"':8001"}[5m])' \
        | jq -r '.data.result[0].value[1] // 0')
    
    if (( $(echo "$ERROR_RATE > 0.05" | bc -l) )); then
        echo "   High error rate detected: $ERROR_RATE"
        echo "   Rolling back canary deployment..."
        docker stop kong-canary-$NEW_VERSION
        docker rm kong-canary-$NEW_VERSION
        exit 1
    fi
    
    sleep 60
done

echo "5. Canary deployment successful, promoting to production..."
# Logic for switching fully to the new version can be added here
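
With weighted load balancing, the share of traffic a target receives is its weight divided by the sum of all weights. The sketch below makes that arithmetic explicit for the canary split; `canary_weights` and `expected_split` are hypothetical helpers, and the `stable` and `canary` labels stand in for the two targets.

```python
def canary_weights(canary_percentage, total_weight=100):
    """Split total_weight between the stable and canary targets."""
    if not 0 <= canary_percentage <= 100:
        raise ValueError("canary_percentage must be between 0 and 100")
    canary = round(total_weight * canary_percentage / 100)
    return {"stable": total_weight - canary, "canary": canary}


def expected_split(weights):
    """Return the fraction of requests each target should receive."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}
```

For a 10% canary this yields weights of 90 and 10. Promotion can then proceed in steps (for example 10, 25, 50, 100) by updating the two weights rather than adding more targets.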

9.2 Monitoring and Alerting

9.2.1 Monitoring Metrics

# Key metrics to monitor
metrics:
  availability:
    - kong_up
    - kong_datastore_reachable
    - kong_nginx_http_current_connections
  
  performance:
    - kong_latency_bucket
    - kong_bandwidth_bytes
    - kong_http_requests_total
  
  errors:
    - kong_http_status{code=~"4.."}
    - kong_http_status{code=~"5.."}
    - kong_nginx_http_total_requests
  
  resources:
    - kong_memory_workers_lua_vms_bytes
    - kong_nginx_connections_active
    - kong_nginx_connections_reading
    - kong_nginx_connections_writing
    - kong_nginx_connections_waiting

9.2.2 Alert Thresholds

# Alert threshold configuration
alerts:
  critical:
    - metric: kong_up
      threshold: 0
      duration: 1m
      description: "Kong service is down"
    
    - metric: kong_datastore_reachable
      threshold: 0
      duration: 30s
      description: "Kong cannot connect to database"
    
    - metric: rate(kong_http_status{code=~"5.."}[5m])
      threshold: 0.05
      duration: 2m
      description: "High 5xx error rate"
  
  warning:
    - metric: histogram_quantile(0.95, rate(kong_latency_bucket[5m]))
      threshold: 1000
      duration: 5m
      description: "High response latency"
    
    - metric: kong_memory_workers_lua_vms_bytes
      threshold: 1073741824  # 1GB
      duration: 10m
      description: "High memory usage"
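
Each alert above pairs a threshold with a duration, meaning the condition must hold continuously for that long before the alert fires. The following sketch shows that evaluation on a list of timestamped samples. It is an illustration of the rule only, not the behavior of any specific alerting system; `firing` and its `breach` parameter are hypothetical.

```python
def firing(samples, threshold, duration_s, breach=lambda value, limit: value > limit):
    """Return True if the breach condition holds continuously for duration_s.

    `samples` is a list of (timestamp_seconds, value) pairs sorted by time.
    """
    breach_start = None
    for timestamp, value in samples:
        if breach(value, threshold):
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start >= duration_s:
                return True
        else:
            # Any healthy sample resets the timer.
            breach_start = None
    return False
```

For the 5xx rule, the default comparison (`value > 0.05` for 120 seconds) applies. For availability-style rules whose threshold is 0, a comparison such as `lambda value, limit: value <= limit` expresses "at or below the threshold".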

9.3 Capacity Planning

9.3.1 Performance Benchmarking

#!/bin/bash
# performance-benchmark.sh

set -e

TEST_URL="http://localhost:8000/api/test"
CONCURRENCY_LEVELS=(1 10 50 100 200 500)
DURATION=60

echo "Kong Performance Benchmark"
echo "========================="

# Set up the test environment
echo "Setting up test environment..."
curl -X POST http://localhost:8001/services \
    -d "name=test-service" \
    -d "url=http://httpbin.org"

curl -X POST http://localhost:8001/services/test-service/routes \
    -d "paths[]=/api/test"

# Run the benchmarks
for concurrency in "${CONCURRENCY_LEVELS[@]}"; do
    echo "Testing with $concurrency concurrent connections..."
    
    # ab's -t implies -n 50000, so -n is given after -t to keep the request cap
    ab -t "$DURATION" -n $((concurrency * 100)) -c "$concurrency" "$TEST_URL" > "benchmark_c${concurrency}.txt"
    
    # Extract key metrics
    RPS=$(grep "Requests per second" "benchmark_c${concurrency}.txt" | awk '{print $4}')
    LATENCY_MEAN=$(grep "Time per request" "benchmark_c${concurrency}.txt" | head -1 | awk '{print $4}')
    LATENCY_95=$(grep "95%" "benchmark_c${concurrency}.txt" | awk '{print $2}')
    
    echo "   RPS: $RPS"
    echo "   Mean Latency: ${LATENCY_MEAN}ms"
    echo "   95th Percentile: ${LATENCY_95}ms"
    echo ""
done

# Generate the report
echo "Generating performance report..."
cat > performance_report.md << EOF
# Kong Performance Benchmark Report

## Test Configuration
- Duration: ${DURATION}s per test
- Target: $TEST_URL
- Date: $(date)

## Results

| Concurrency | RPS | Mean Latency (ms) | 95th Percentile (ms) |
|-------------|-----|-------------------|----------------------|
EOF

for concurrency in "${CONCURRENCY_LEVELS[@]}"; do
    RPS=$(grep "Requests per second" "benchmark_c${concurrency}.txt" | awk '{print $4}')
    LATENCY_MEAN=$(grep "Time per request" "benchmark_c${concurrency}.txt" | head -1 | awk '{print $4}')
    LATENCY_95=$(grep "95%" "benchmark_c${concurrency}.txt" | awk '{print $2}')
    
    echo "| $concurrency | $RPS | $LATENCY_MEAN | $LATENCY_95 |" >> performance_report.md
done

echo "Performance benchmark completed. Report saved to performance_report.md"

9.3.2 Capacity Planning Calculations

#!/usr/bin/env python3
# capacity-planning.py

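import json
import math
import sys
from datetime import datetime

def calculate_capacity(current_rps, target_rps, current_instances, cpu_threshold=70, memory_threshold=80):
    """
    Calculate the number of Kong instances required.
    """
    if current_rps <= 0:
        raise ValueError("current_rps must be positive")

    # RPS-based estimate; round up so the result never under-provisions
    # (the inner round() guards against floating-point noise before ceil)
    rps_ratio = target_rps / current_rps
    required_instances_rps = math.ceil(round(current_instances * rps_ratio * 1.2, 6))  # 20% headroom
    
    # Resource-utilization-based estimate
    cpu_factor = 100 / cpu_threshold
    memory_factor = 100 / memory_threshold
    
    required_instances_cpu = math.ceil(round(current_instances * cpu_factor, 6))
    required_instances_memory = math.ceil(round(current_instances * memory_factor, 6))
    
    # Take the largest estimate
    recommended_instances = max(required_instances_rps, required_instances_cpu, required_instances_memory)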
    
    return {
        'current_instances': current_instances,
        'target_rps': target_rps,
        'recommended_instances': recommended_instances,
        'scaling_factor': recommended_instances / current_instances,
        'calculations': {
            'rps_based': required_instances_rps,
            'cpu_based': required_instances_cpu,
            'memory_based': required_instances_memory
        }
    }

def estimate_costs(instances, instance_cost_per_hour=0.1, hours_per_month=730):
    """
    Estimate the monthly cost.
    """
    monthly_cost = instances * instance_cost_per_hour * hours_per_month
    return {
        'instances': instances,
        'cost_per_hour': instances * instance_cost_per_hour,
        'monthly_cost': monthly_cost
    }

def main():
    if len(sys.argv) != 4:
        print("Usage: python3 capacity-planning.py <current_rps> <target_rps> <current_instances>")
        sys.exit(1)
    
    current_rps = float(sys.argv[1])
    target_rps = float(sys.argv[2])
    current_instances = int(sys.argv[3])
    
    # Calculate capacity requirements
    capacity = calculate_capacity(current_rps, target_rps, current_instances)
    
    # Estimate costs
    current_cost = estimate_costs(current_instances)
    recommended_cost = estimate_costs(capacity['recommended_instances'])
    
    # Build the report
    report = {
        'timestamp': datetime.now().isoformat(),
        'capacity_planning': capacity,
        'cost_estimation': {
            'current': current_cost,
            'recommended': recommended_cost,
            'cost_increase': recommended_cost['monthly_cost'] - current_cost['monthly_cost']
        },
        'recommendations': [
            f"Scale from {current_instances} to {capacity['recommended_instances']} instances",
            f"Expected cost increase: ${recommended_cost['monthly_cost'] - current_cost['monthly_cost']:.2f}/month",
            f"Scaling factor: {capacity['scaling_factor']:.2f}x"
        ]
    }
    
    print(json.dumps(report, indent=2))

if __name__ == '__main__':
    main()

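10. Summary

Deploying and operating Kong is a complex but important undertaking that spans several areas:

10.1 Key Points

  1. Architecture selection: Choose the deployment mode (DB mode, DB-less mode, or hybrid mode) that fits the business requirements
  2. High availability: Ensure service availability through multi-node clusters, a highly available database, and load balancing
  3. Security: Apply network security, access control, and SSL/TLS encryption
  4. Monitoring and alerting: Build a comprehensive monitoring system so that problems are detected and handled promptly
  5. Backup and recovery: Back up configuration and data regularly, and maintain a disaster recovery plan
  6. Performance optimization: Sustain system performance through capacity planning and performance tuning

10.2 Best Practices

  1. Environment separation: Keep development, testing, and production environments strictly separate
  2. Configuration management: Manage configuration under version control, treating configuration as code
  3. Automated deployment: Use CI/CD pipelines to automate deployments
  4. Progressive delivery: Reduce release risk with strategies such as blue-green and canary deployments
  5. Continuous monitoring: Monitor across business metrics, technical metrics, and user-experience metrics

10.3 Operational Recommendations

  1. Documentation: Maintain detailed operations documentation and runbooks
  2. Training: Hold regular operations training to build team skills
  3. Drills: Run failure drills regularly to validate incident response
  4. Optimization: Keep refining configuration and processes to improve operational efficiency
  5. Innovation: Follow new technologies and best practices, and keep improving the operations practice

Following these principles and practices makes it possible to build a stable, secure, and efficient operations practice for the Kong API gateway.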