Learning Objectives

By the end of this chapter, you will be able to:

- Install Elasticsearch in several ways
- Configure and tune a cluster
- Set up security and authentication
- Configure monitoring and logging
- Apply production deployment best practices

1. System Requirements

1.1 Hardware Requirements

Component   Minimum     Recommended   Production
CPU         2 cores     4+ cores      8+ cores
Memory      4GB         8GB+          32GB+
Storage     10GB        100GB+ SSD    500GB+ SSD
Network     100Mbps     1Gbps         10Gbps

1.2 Software Requirements

# Java version requirements
Elasticsearch 8.x: Java 17+
Elasticsearch 7.x: Java 8/11
Elasticsearch 6.x: Java 8

# Supported operating systems
- Linux (recommended)
- macOS
- Windows
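The version pairings above can be turned into a quick pre-flight check before an upgrade. A minimal sketch in Python; the mapping encodes only the requirements listed here, not Elastic's full support matrix:

```python
# Minimal ES/Java compatibility check based on the pairings above.
def java_ok(es_major: int, java_major: int) -> bool:
    """Return True if the Java major version satisfies the ES major version."""
    required = {
        8: lambda j: j >= 17,       # ES 8.x needs Java 17+
        7: lambda j: j in (8, 11),  # ES 7.x runs on Java 8 or 11
        6: lambda j: j == 8,        # ES 6.x targets Java 8
    }
    check = required.get(es_major)
    return check(java_major) if check else False

print(java_ok(8, 17))  # True
print(java_ok(7, 17))  # False
```

Running such a check in provisioning scripts catches mismatched JDKs before Elasticsearch refuses to start.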

1.3 System Configuration

Linux system tuning

# 1. Raise the file descriptor limit
echo "elasticsearch soft nofile 65536" >> /etc/security/limits.conf
echo "elasticsearch hard nofile 65536" >> /etc/security/limits.conf

# 2. Raise the memory map limit
echo "vm.max_map_count=262144" >> /etc/sysctl.conf
sysctl -p

# 3. Minimize swapping (swappiness=1 discourages swap; to disable it entirely use swapoff -a)
echo "vm.swappiness=1" >> /etc/sysctl.conf

# 4. Raise the process limit
echo "elasticsearch soft nproc 4096" >> /etc/security/limits.conf
echo "elasticsearch hard nproc 4096" >> /etc/security/limits.conf
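To confirm that kernel settings like the ones above actually took effect, you can parse `sysctl`-style output (or a sysctl.conf file) and compare it against the required values. A small sketch; the sample input below is illustrative:

```python
def parse_sysctl(text: str) -> dict:
    """Parse 'key = value' or 'key=value' lines, skipping comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

# Sample output as produced by `sysctl vm.max_map_count vm.swappiness`
sample = """
vm.max_map_count = 262144
vm.swappiness = 1
"""
conf = parse_sysctl(sample)
assert int(conf["vm.max_map_count"]) >= 262144
print("kernel settings OK")
```

In a real bootstrap script you would feed it `subprocess.run(["sysctl", "-a"], ...)` output instead of a literal string.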

2. Installation Methods

2.1 Download the Archive

# Download Elasticsearch 8.x
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.11.0-linux-x86_64.tar.gz

# Extract
tar -xzf elasticsearch-8.11.0-linux-x86_64.tar.gz
cd elasticsearch-8.11.0

2.2 Package Manager Installation

Ubuntu/Debian

# Add the Elastic repository (apt-key is deprecated; install the key into a keyring)
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

# Install
sudo apt update
sudo apt install elasticsearch

# Enable and start the service
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

CentOS/RHEL

# Add the repository
cat > /etc/yum.repos.d/elasticsearch.repo << EOF
[elasticsearch]
name=Elasticsearch repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md
EOF

# Install (the repo has enabled=0, hence --enablerepo)
sudo yum install --enablerepo=elasticsearch elasticsearch

# Enable and start the service
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

2.3 Docker Installation

Single-node deployment

# Pull the image
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.11.0

# Run the container (security disabled here for local testing only)
docker run -d \
  --name elasticsearch \
  -p 9200:9200 \
  -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -e "ES_JAVA_OPTS=-Xms1g -Xmx1g" \
  docker.elastic.co/elasticsearch/elasticsearch:8.11.0

Docker Compose deployment

# docker-compose.yml
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    container_name: elasticsearch
    environment:
      - cluster.name=docker-cluster
      - node.name=es01
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
      - xpack.security.enabled=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata:/usr/share/elasticsearch/data
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - elastic

  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.0
    container_name: kibana
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    networks:
      - elastic
    depends_on:
      - elasticsearch

volumes:
  esdata:
    driver: local

networks:
  elastic:
    driver: bridge

2.4 Kubernetes Deployment

Using the Helm chart

# Add the Elastic Helm repository
helm repo add elastic https://helm.elastic.co
helm repo update

# Create a namespace
kubectl create namespace elastic-system

# Install Elasticsearch
helm install elasticsearch elastic/elasticsearch \
  --namespace elastic-system \
  --set replicas=3 \
  --set minimumMasterNodes=2 \
  --set resources.requests.cpu=1000m \
  --set resources.requests.memory=2Gi \
  --set resources.limits.cpu=2000m \
  --set resources.limits.memory=4Gi

Custom YAML deployment

# elasticsearch-cluster.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: elastic-system
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
        ports:
        - containerPort: 9200
          name: rest
        - containerPort: 9300
          name: inter-node
        env:
        - name: cluster.name
          value: k8s-logs
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: discovery.seed_hosts
          value: "elasticsearch-0.elasticsearch,elasticsearch-1.elasticsearch,elasticsearch-2.elasticsearch"
        - name: cluster.initial_master_nodes
          value: "elasticsearch-0,elasticsearch-1,elasticsearch-2"
        - name: ES_JAVA_OPTS
          value: "-Xms2g -Xmx2g"
        resources:
          limits:
            cpu: 2000m
            memory: 4Gi
          requests:
            cpu: 1000m
            memory: 2Gi
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 100Gi
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: elastic-system
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
  - port: 9200
    name: rest
  - port: 9300
    name: inter-node

3. Configuration Files in Detail

3.1 Main Configuration File: elasticsearch.yml

# ======================== Elasticsearch Configuration =========================

# ---------------------------------- Cluster -----------------------------------
# Cluster name
cluster.name: my-application

# ------------------------------------ Node ------------------------------------
# Node name
node.name: node-1

# Node roles
node.roles: [ master, data, ingest, ml, remote_cluster_client ]

# ----------------------------------- Paths ------------------------------------
# Data directory
path.data: /var/lib/elasticsearch

# Log directory
path.logs: /var/log/elasticsearch

# ----------------------------------- Memory -----------------------------------
# Lock memory to prevent swapping
bootstrap.memory_lock: true

# ---------------------------------- Network -----------------------------------
# Bind address
network.host: 0.0.0.0

# HTTP port
http.port: 9200

# Transport port
transport.port: 9300

# --------------------------------- Discovery ----------------------------------
# Seed hosts
discovery.seed_hosts: ["host1", "host2"]

# Initial master nodes (used only when bootstrapping a brand-new cluster)
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

# ---------------------------------- Various -----------------------------------
# Allow cross-origin requests (restrict allow-origin in production)
http.cors.enabled: true
http.cors.allow-origin: "*"

# Maximum HTTP request body size
http.max_content_length: 100mb

# ---------------------------------- Security ----------------------------------
# Enable security features
xpack.security.enabled: true

# Enable HTTPS
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: certs/http.p12

# Enable inter-node encryption
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/transport.p12
xpack.security.transport.ssl.truststore.path: certs/transport.p12
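Most of the settings above are flat `key: value` pairs, so a configuration sanity check does not even need a YAML library. A minimal sketch (a real deployment tool should use a proper YAML parser, since elasticsearch.yml also allows nested syntax):

```python
def parse_flat_config(text: str) -> dict:
    """Parse flat 'key: value' lines, skipping comments and blank lines."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config

snippet = """
# minimal excerpt of elasticsearch.yml
cluster.name: my-application
node.name: node-1
http.port: 9200
"""
cfg = parse_flat_config(snippet)
assert cfg["cluster.name"] == "my-application"
assert cfg["http.port"] == "9200"
print("config looks sane")
```

A check like this is handy in CI to ensure every node's file agrees on `cluster.name` before rolling a restart.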

3.2 JVM Configuration: jvm.options

# JVM heap size (always set Xms and Xmx to the same value)
-Xms2g
-Xmx2g

# Garbage collector
-XX:+UseG1GC
-XX:G1HeapRegionSize=32m
-XX:G1MixedGCCountTarget=16
-XX:G1MixedGCLiveThresholdPercent=90

# GC logging (JDK 9+ unified logging; rotation is configured inside -Xlog,
# the old -XX:+UseGCLogFileRotation flags were removed in JDK 9)
-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m

# Heap dump on OutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/lib/elasticsearch

# JVM fatal error log
-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log

# Run headless
-Djava.awt.headless=true

# Ensure UTF-8 encoding
-Dfile.encoding=UTF-8

# Pre-touch heap pages at startup to avoid page-fault pauses later
-XX:+AlwaysPreTouch

# Metaspace sizing (optional; the defaults are usually fine)
-XX:MetaspaceSize=256m
-XX:MaxMetaspaceSize=256m

3.3 Logging Configuration: log4j2.properties

# Root log level
rootLogger.level = info
rootLogger.appenderRef.console.ref = console
rootLogger.appenderRef.rolling.ref = rolling

# Console appender
appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] [%node_name]%marker %m%n

# Rolling file appender
appender.rolling.type = RollingFile
appender.rolling.name = rolling
appender.rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}.log
appender.rolling.layout.type = PatternLayout
appender.rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] [%node_name]%marker %m%n
appender.rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}-%d{yyyy-MM-dd}-%i.log.gz
appender.rolling.policies.type = Policies
appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.rolling.policies.time.interval = 1
appender.rolling.policies.time.modulate = true
appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
appender.rolling.policies.size.size = 128MB
appender.rolling.strategy.type = DefaultRolloverStrategy
appender.rolling.strategy.fileIndex = nomax
appender.rolling.strategy.action.type = Delete
appender.rolling.strategy.action.basepath = ${sys:es.logs.base_path}
appender.rolling.strategy.action.condition.type = IfFileName
appender.rolling.strategy.action.condition.glob = ${sys:es.logs.cluster_name}-*
appender.rolling.strategy.action.condition.nested.type = IfAccumulatedFileSize
appender.rolling.strategy.action.condition.nested.exceeds = 2GB

# Search slow log
logger.index_search_slowlog_rolling.name = index.search.slowlog
logger.index_search_slowlog_rolling.level = trace
logger.index_search_slowlog_rolling.appenderRef.index_search_slowlog_rolling.ref = index_search_slowlog_rolling
logger.index_search_slowlog_rolling.additivity = false

appender.index_search_slowlog_rolling.type = RollingFile
appender.index_search_slowlog_rolling.name = index_search_slowlog_rolling
appender.index_search_slowlog_rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}_index_search_slowlog.log
appender.index_search_slowlog_rolling.layout.type = PatternLayout
appender.index_search_slowlog_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c] [%node_name]%marker %m%n

4. Cluster Configuration

4.1 Three-Node Cluster Configuration

Node 1 configuration

# elasticsearch.yml for node-1
cluster.name: production-cluster
node.name: node-1
node.roles: [ master, data, ingest ]

network.host: 192.168.1.10
http.port: 9200
transport.port: 9300

discovery.seed_hosts: ["192.168.1.10:9300", "192.168.1.11:9300", "192.168.1.12:9300"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

bootstrap.memory_lock: true

# Cluster-level settings
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%
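The three watermarks above drive different allocation behaviors: below `low` everything is normal, above `low` the node receives no new shards, above `high` shards are moved off the node, and above `flood_stage` affected indices are forced read-only. A small sketch that classifies a node's disk usage against these thresholds:

```python
def watermark_state(used_percent: float,
                    low: float = 85.0, high: float = 90.0,
                    flood: float = 95.0) -> str:
    """Classify disk usage against the allocation watermarks configured above."""
    if used_percent >= flood:
        return "flood_stage"   # affected indices forced read-only
    if used_percent >= high:
        return "high"          # shards relocated away from the node
    if used_percent >= low:
        return "low"           # no new shards allocated to the node
    return "ok"

print(watermark_state(87.5))  # low
print(watermark_state(96.0))  # flood_stage
```

Feeding it the per-node usage from `_nodes/stats` gives an early warning before Elasticsearch itself starts relocating shards.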

Node 2 configuration

# elasticsearch.yml for node-2
cluster.name: production-cluster
node.name: node-2
node.roles: [ master, data, ingest ]

network.host: 192.168.1.11
http.port: 9200
transport.port: 9300

discovery.seed_hosts: ["192.168.1.10:9300", "192.168.1.11:9300", "192.168.1.12:9300"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

bootstrap.memory_lock: true

Node 3 configuration

# elasticsearch.yml for node-3
cluster.name: production-cluster
node.name: node-3
node.roles: [ master, data, ingest ]

network.host: 192.168.1.12
http.port: 9200
transport.port: 9300

discovery.seed_hosts: ["192.168.1.10:9300", "192.168.1.11:9300", "192.168.1.12:9300"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

bootstrap.memory_lock: true

4.2 Dedicated-Role Node Configuration

Dedicated master node

# master-node.yml
cluster.name: production-cluster
node.name: master-1
node.roles: [ master ]

network.host: 192.168.1.10
http.port: 9200
transport.port: 9300

# Master nodes store no data, so modest resources suffice
bootstrap.memory_lock: true

Dedicated data node

# data-node.yml
cluster.name: production-cluster
node.name: data-1
node.roles: [ data, ingest ]

network.host: 192.168.1.20
http.port: 9200
transport.port: 9300

# Data nodes need more storage and memory
path.data: ["/data1/elasticsearch", "/data2/elasticsearch"]
bootstrap.memory_lock: true

Coordinating node

# coordinating-node.yml
cluster.name: production-cluster
node.name: coordinating-1
node.roles: [ ]

network.host: 192.168.1.30
http.port: 9200
transport.port: 9300

# Coordinating nodes mainly route requests and merge results
bootstrap.memory_lock: true

5. Security Configuration

5.1 Enabling X-Pack Security

# Generate a CA certificate
./bin/elasticsearch-certutil ca

# Generate node certificates
./bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12

# Generate HTTP certificates
./bin/elasticsearch-certutil http

5.2 Configuring TLS/SSL

# elasticsearch.yml
xpack.security.enabled: true

# Transport-layer encryption
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.client_authentication: required
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12

# HTTP-layer encryption
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: certs/elastic-certificates.p12

5.3 User and Role Management

# Set built-in user passwords
# (elasticsearch-setup-passwords is deprecated in 8.x; prefer bin/elasticsearch-reset-password)
./bin/elasticsearch-setup-passwords interactive

# Create a custom user
POST /_security/user/john
{
  "password" : "password123",
  "roles" : [ "admin", "other_role1" ],
  "full_name" : "John Doe",
  "email" : "john.doe@example.com",
  "metadata" : {
    "intelligence" : 7
  }
}

# Create a custom role
POST /_security/role/my_admin_role
{
  "cluster": ["all"],
  "indices": [
    {
      "names": [ "index1", "index2" ],
      "privileges": ["all"]
    }
  ]
}

5.4 API Key Authentication

# Create an API key
POST /_security/api_key
{
  "name": "my-api-key",
  "expiration": "1d",
  "role_descriptors": {
    "role-a": {
      "cluster": ["all"],
      "index": [
        {
          "names": ["index-a*"],
          "privileges": ["read"]
        }
      ]
    }
  }
}

# Use the API key (the header value is base64 of "id:api_key" from the create response)
curl -H "Authorization: ApiKey <base64-encoded-api-key>" \
     http://localhost:9200/_cluster/health
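The `<base64-encoded-api-key>` placeholder is built from the `id` and `api_key` fields returned by the create call: concatenate them with a colon, then base64-encode. A small helper (the id/key values below are made-up samples):

```python
import base64

def api_key_header(key_id: str, api_key: str) -> str:
    """Build the Authorization header from the id/api_key pair returned by
    POST /_security/api_key (value = base64 of 'id:api_key')."""
    token = base64.b64encode(f"{key_id}:{api_key}".encode("utf-8")).decode("ascii")
    return f"ApiKey {token}"

# Sample (fabricated) credentials, as they would appear in the create response
print(api_key_header("VuaCfGcBCdbkQm-e5aOx", "ui2lp2axTNmsyakw9tvNnw"))
```

The resulting string goes straight into the `Authorization` header of any HTTP client, exactly as in the curl example above.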

6. Monitoring Configuration

6.1 Enabling Monitoring

# elasticsearch.yml
# Note: xpack.monitoring.enabled was removed in 8.x; collection is controlled below
xpack.monitoring.collection.enabled: true
xpack.monitoring.collection.interval: 10s

6.2 Monitoring with Metricbeat

# metricbeat.yml
metricbeat.modules:
- module: elasticsearch
  metricsets:
    - node
    - node_stats
    - cluster_stats
    - index
    - index_recovery
    - index_summary
    - shard
  period: 10s
  hosts: ["http://localhost:9200"]
  username: "elastic"
  password: "changeme"

output.elasticsearch:
  hosts: ["localhost:9200"]
  username: "elastic"
  password: "changeme"

setup.kibana:
  host: "localhost:5601"

6.3 Custom Monitoring Script

#!/usr/bin/env python3
# elasticsearch_monitor.py

import requests
import json
import time
import logging
from datetime import datetime

class ElasticsearchMonitor:
    def __init__(self, host="localhost", port=9200, username=None, password=None):
        self.base_url = f"http://{host}:{port}"
        self.auth = (username, password) if username and password else None
        self.logger = self._setup_logger()
    
    def _setup_logger(self):
        logging.basicConfig(
            level=logging.INFO,
            format='%(asctime)s - %(levelname)s - %(message)s',
            handlers=[
                logging.FileHandler('elasticsearch_monitor.log'),
                logging.StreamHandler()
            ]
        )
        return logging.getLogger(__name__)
    
    def get_cluster_health(self):
        """Fetch cluster health."""
        try:
            response = requests.get(
                f"{self.base_url}/_cluster/health",
                auth=self.auth,
                timeout=10
            )
            return response.json()
        except Exception as e:
            self.logger.error(f"Failed to get cluster health: {e}")
            return None
    
    def get_cluster_stats(self):
        """Fetch cluster statistics."""
        try:
            response = requests.get(
                f"{self.base_url}/_cluster/stats",
                auth=self.auth,
                timeout=10
            )
            return response.json()
        except Exception as e:
            self.logger.error(f"Failed to get cluster stats: {e}")
            return None
    
    def get_node_stats(self):
        """Fetch node statistics."""
        try:
            response = requests.get(
                f"{self.base_url}/_nodes/stats",
                auth=self.auth,
                timeout=10
            )
            return response.json()
        except Exception as e:
            self.logger.error(f"Failed to get node stats: {e}")
            return None
    
    def check_disk_usage(self, threshold=80):
        """Check disk usage per node."""
        node_stats = self.get_node_stats()
        if not node_stats:
            return []
        
        alerts = []
        for node_id, node_data in node_stats['nodes'].items():
            node_name = node_data['name']
            fs_data = node_data.get('fs', {}).get('total', {})
            
            if fs_data:
                total_bytes = fs_data.get('total_in_bytes', 0)
                available_bytes = fs_data.get('available_in_bytes', 0)
                
                if total_bytes > 0:
                    used_percent = ((total_bytes - available_bytes) / total_bytes) * 100
                    
                    if used_percent > threshold:
                        alert = f"Node {node_name} disk usage: {used_percent:.1f}%"
                        alerts.append(alert)
                        self.logger.warning(alert)
        
        return alerts
    
    def check_memory_usage(self, threshold=85):
        """Check JVM heap usage per node."""
        node_stats = self.get_node_stats()
        if not node_stats:
            return []
        
        alerts = []
        for node_id, node_data in node_stats['nodes'].items():
            node_name = node_data['name']
            jvm_data = node_data.get('jvm', {}).get('mem', {})
            
            heap_used_percent = jvm_data.get('heap_used_percent', 0)
            
            if heap_used_percent > threshold:
                alert = f"Node {node_name} heap usage: {heap_used_percent}%"
                alerts.append(alert)
                self.logger.warning(alert)
        
        return alerts
    
    def monitor_loop(self, interval=60):
        """Main monitoring loop."""
        self.logger.info("Starting Elasticsearch monitoring...")
        
        while True:
            try:
                # Check cluster health
                health = self.get_cluster_health()
                if health:
                    status = health.get('status', 'unknown')
                    self.logger.info(f"Cluster status: {status}")
                    
                    if status in ['yellow', 'red']:
                        self.logger.warning(f"Cluster status is {status}!")
                
                # Check disk usage
                disk_alerts = self.check_disk_usage()
                
                # Check memory usage
                memory_alerts = self.check_memory_usage()
                
                # Wait for the next check
                time.sleep(interval)
                
            except KeyboardInterrupt:
                self.logger.info("Monitoring stopped by user")
                break
            except Exception as e:
                self.logger.error(f"Monitoring error: {e}")
                time.sleep(interval)

if __name__ == "__main__":
    monitor = ElasticsearchMonitor(
        host="localhost",
        port=9200,
        username="elastic",
        password="changeme"
    )
    monitor.monitor_loop()

7. Performance Tuning

7.1 JVM Tuning

# jvm.options
# Set the heap to 50% of physical RAM, but no more than ~32GB
# (staying below 32GB keeps compressed object pointers enabled)
-Xms16g
-Xmx16g

# Use the G1 garbage collector
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:G1HeapRegionSize=32m

# GC tuning (the G1 percent flags below are experimental and need unlocking)
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=30
-XX:G1MaxNewSizePercent=40
# UseLargePages requires huge pages to be configured at the OS level
-XX:+UseLargePages
-XX:+AlwaysPreTouch

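The 50%-of-RAM, under-32GB sizing rule stated above is easy to automate. A minimal sketch; the 31GB cap is a conservative margin below the compressed-oops threshold, which varies slightly by JVM:

```python
def recommended_heap_gb(total_ram_gb: float) -> int:
    """Heap = half of physical RAM, capped below the ~32GB compressed-oops limit."""
    COMPRESSED_OOPS_CAP_GB = 31  # stay safely under 32GB
    return int(min(total_ram_gb // 2, COMPRESSED_OOPS_CAP_GB))

for ram in (8, 64, 128):
    h = recommended_heap_gb(ram)
    print(f"{ram} GB RAM -> -Xms{h}g -Xmx{h}g")
```

For a 64GB machine this yields `-Xms31g -Xmx31g`: half the RAM would be 32GB, but crossing the threshold would disable compressed pointers and waste heap.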
7.2 Indexing Performance Tuning

# Note: index.refresh_interval is an index-level setting and cannot be put
# in elasticsearch.yml; set it per index via the settings API:
# PUT /my_index/_settings {"index": {"refresh_interval": "30s"}}

# elasticsearch.yml
# Increase the indexing buffer
indices.memory.index_buffer_size: 20%

# Allow larger bulk requests
http.max_content_length: 500mb

# Thread pool configuration
thread_pool:
  write:
    size: 8
    queue_size: 1000
  search:
    size: 13
    queue_size: 1000

7.3 System-Level Tuning

# Filesystem tuning
# Use SSD storage and disable access-time updates
mount -o noatime,nodiratime /dev/sdb1 /data

# Network tuning
echo 'net.core.rmem_default = 262144' >> /etc/sysctl.conf
echo 'net.core.rmem_max = 16777216' >> /etc/sysctl.conf
echo 'net.core.wmem_default = 262144' >> /etc/sysctl.conf
echo 'net.core.wmem_max = 16777216' >> /etc/sysctl.conf

# Apply the settings
sysctl -p

8. Backup and Restore

8.1 Snapshot Configuration

# Register a snapshot repository
# (for "fs" repositories, the location must be under a path listed in path.repo)
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/my_backup",
    "compress": true
  }
}

# Create a snapshot
PUT /_snapshot/my_backup/snapshot_1
{
  "indices": "index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "admin",
    "taken_because": "backup before upgrade"
  }
}

8.2 Automated Backup Script

#!/bin/bash
# elasticsearch_backup.sh

ES_HOST="localhost:9200"
BACKUP_REPO="my_backup"
DATE=$(date +%Y%m%d_%H%M%S)
SNAPSHOT_NAME="snapshot_${DATE}"

# Create the snapshot
curl -X PUT "${ES_HOST}/_snapshot/${BACKUP_REPO}/${SNAPSHOT_NAME}" \
     -H 'Content-Type: application/json' \
     -d '{
       "ignore_unavailable": true,
       "include_global_state": false
     }'

# Poll the snapshot status
while true; do
    STATUS=$(curl -s "${ES_HOST}/_snapshot/${BACKUP_REPO}/${SNAPSHOT_NAME}" | jq -r '.snapshots[0].state')
    if [ "$STATUS" = "SUCCESS" ]; then
        echo "Snapshot ${SNAPSHOT_NAME} completed successfully"
        break
    elif [ "$STATUS" = "FAILED" ]; then
        echo "Snapshot ${SNAPSHOT_NAME} failed"
        exit 1
    else
        echo "Snapshot in progress: $STATUS"
        sleep 30
    fi
done

# Clean up old snapshots (keep the last 7 days)
OLD_SNAPSHOTS=$(curl -s "${ES_HOST}/_snapshot/${BACKUP_REPO}/_all" | \
                jq -r '.snapshots[] | select(.start_time_in_millis < ('$(date -d "7 days ago" +%s)'000)) | .snapshot')

for snapshot in $OLD_SNAPSHOTS; do
    echo "Deleting old snapshot: $snapshot"
    curl -X DELETE "${ES_HOST}/_snapshot/${BACKUP_REPO}/${snapshot}"
done
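The retention step of the script above (selecting snapshots older than 7 days by `start_time_in_millis`) can also be expressed in Python, which is easier to unit-test than the jq pipeline. A sketch against the response shape of `GET _snapshot/<repo>/_all`; the snapshot names below are samples:

```python
from datetime import datetime, timedelta, timezone

def snapshots_to_delete(snapshots, retention_days=7, now=None):
    """Return names of snapshots whose start time is older than the retention
    window. `snapshots` mirrors the _snapshot API: dicts with 'snapshot' and
    'start_time_in_millis' fields."""
    now = now or datetime.now(timezone.utc)
    cutoff_ms = (now - timedelta(days=retention_days)).timestamp() * 1000
    return [s["snapshot"] for s in snapshots
            if s["start_time_in_millis"] < cutoff_ms]

now = datetime(2024, 1, 10, tzinfo=timezone.utc)
sample = [
    {"snapshot": "snapshot_old",
     "start_time_in_millis": datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp() * 1000},
    {"snapshot": "snapshot_new",
     "start_time_in_millis": datetime(2024, 1, 9, tzinfo=timezone.utc).timestamp() * 1000},
]
print(snapshots_to_delete(sample, retention_days=7, now=now))  # ['snapshot_old']
```

Each returned name would then be passed to `DELETE _snapshot/<repo>/<name>`, just as the shell loop does.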

9. Troubleshooting

9.1 Diagnosing Common Problems

# Check cluster health
GET /_cluster/health?pretty

# Check node information
GET /_nodes?pretty

# Check shard allocation
GET /_cat/shards?v

# Explain unassigned shards
GET /_cluster/allocation/explain

# Check running tasks
GET /_tasks?detailed=true&actions=*search*
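When triaging, it helps to reduce the `_cluster/health` response to a single line: red means at least one primary shard is unassigned, yellow means replicas are not fully assigned. A small sketch over the health response shape (the sample dict is illustrative):

```python
def summarize_health(health: dict) -> str:
    """Condense a _cluster/health response into one triage line."""
    status = health.get("status", "unknown")
    unassigned = health.get("unassigned_shards", 0)
    nodes = health.get("number_of_nodes", 0)
    line = f"status={status} nodes={nodes} unassigned_shards={unassigned}"
    if status == "red":
        line += " -> at least one primary shard is unassigned"
    elif status == "yellow":
        line += " -> replicas are not fully assigned"
    return line

sample = {"status": "yellow", "number_of_nodes": 1, "unassigned_shards": 5}
print(summarize_health(sample))
```

A single-node cluster with default one-replica indices is the classic cause of a permanent yellow status, since replicas cannot be placed on the same node as their primary.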

9.2 Performance Analysis

# Slow-query analysis with the profile API
GET /my_index/_search
{
  "profile": true,
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  }
}

# Hot threads analysis
GET /_nodes/hot_threads

# Statistics
GET /_stats
GET /_nodes/stats

Chapter Summary

This chapter covered Elasticsearch installation, deployment, and environment configuration:

  1. Installation methods: package managers, Docker, Kubernetes, and archive installs
  2. Configuration management: the main configuration files in detail
  3. Cluster deployment: single-node and multi-node cluster setups
  4. Security: TLS/SSL, user authentication, and role management
  5. Operations: monitoring configuration and troubleshooting techniques

The next chapter covers index management and mapping configuration, the core mechanisms behind how Elasticsearch stores and retrieves data.

Exercises

  1. Configure a three-node Elasticsearch cluster
  2. Enable X-Pack Security and create a custom user and role
  3. Write a monitoring script that checks cluster health
  4. Configure an automated snapshot backup policy