## 5.1 Notification Channel Overview

### Supported Notification Channels

Alertmanager supports a wide range of notification channels, covering the needs of very different scenarios:

```mermaid
flowchart LR
    A[Alertmanager] --> B[Email]
    A --> C[Slack]
    A --> D[Webhook]
    A --> E[PagerDuty]
    A --> F[OpsGenie]
    A --> G[VictorOps]
    A --> H[WeChat]
    A --> I[Telegram]
    A --> J[Discord]
    A --> K[Microsoft Teams]
    A --> L[Pushover]
    A --> M[SNS]
    A --> N[WebEx]
```

### Notification Channel Comparison

| Channel | Immediacy | Reliability | Interactivity | Mobile Support | Enterprise Integration | Cost |
|---------|-----------|-------------|---------------|----------------|------------------------|------|
| Email | Medium | High | Low | Medium | High | Low |
| Slack | High | High | High | High | High | Medium |
| Webhook | High | Medium | High | - | High | Low |
| PagerDuty | High | High | High | High | High | High |
| OpsGenie | High | High | High | High | High | High |
| WeChat | High | High | Medium | High | Medium | Low |
| Telegram | High | Medium | Medium | High | Low | Low |
| SMS | High | High | Low | High | Medium | Medium |

## 5.2 Email Notification Configuration

### Basic Email Configuration

```yaml
# Global SMTP configuration
global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'app-password'
  smtp_require_tls: true
  smtp_hello: 'alertmanager.example.com'

# Email receiver configuration
receivers:
  - name: 'email-team'
    email_configs:
      - to: 'team@example.com'
        # The subject is set via headers; email_configs have no
        # top-level subject field
        headers:
          Subject: '[{{ .Status | toUpper }}] {{ .GroupLabels.alertname }}'
        # Plain-text body goes in the text field
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Labels:
          {{ range .Labels.SortedPairs }}{{ .Name }}={{ .Value }}
          {{ end }}
          {{ end }}
```
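App passwords and other SMTP credentials should not live in a version-controlled configuration file. A minimal sketch, assuming a recent Alertmanager release that supports `smtp_auth_password_file` (the file path is illustrative):

```yaml
global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  # Read the secret from disk instead of inlining it;
  # the path below is an assumption for illustration
  smtp_auth_password_file: '/etc/alertmanager/secrets/smtp-password'
  smtp_require_tls: true
```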

### Advanced Email Configuration

```yaml
receivers:
  - name: 'advanced-email'
    email_configs:
      # Basic settings
      - to: 'alerts@example.com'
        from: 'alertmanager@company.com'  # Overrides the global setting
        # Mail headers (Subject is set here; email_configs have no
        # top-level subject field)
        headers:
          Subject: '[{{ .Status | toUpper }}] {{ .GroupLabels.alertname }} ({{ len .Alerts }} alerts)'
          'X-Priority': '1'  # High priority
          'X-Mailer': 'Alertmanager'
          'X-Alert-Count': '{{ len .Alerts }}'
          'X-Alert-Status': '{{ .Status }}'
          'X-Alert-Severity': '{{ .GroupLabels.severity }}'
        # Plain-text body
        text: |
          {{ if eq .Status "firing" }}
          🚨 ALERT FIRING
          {{ else }}
          ✅ ALERT RESOLVED
          {{ end }}

          Alert Group: {{ .GroupLabels.alertname }}
          Cluster: {{ .GroupLabels.cluster | default "Unknown" }}
          Environment: {{ .GroupLabels.environment | default "Unknown" }}
          Severity: {{ .GroupLabels.severity | default "Unknown" }}

          Total Alerts: {{ len .Alerts }}
          Firing: {{ len .Alerts.Firing }}
          Resolved: {{ len .Alerts.Resolved }}

          {{ range .Alerts }}
          =====================================
          Alert: {{ .Labels.alertname }}
          Instance: {{ .Labels.instance }}
          Severity: {{ .Labels.severity }}
          Status: {{ .Status }}

          Summary: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ if .Annotations.runbook_url }}Runbook: {{ .Annotations.runbook_url }}{{ end }}

          Started: {{ .StartsAt.Format "2006-01-02 15:04:05 MST" }}
          {{ if .EndsAt }}Ended: {{ .EndsAt.Format "2006-01-02 15:04:05 MST" }}{{ end }}

          Labels:
          {{ range .Labels.SortedPairs }}  {{ .Name }}: {{ .Value }}
          {{ end }}
          Annotations:
          {{ range .Annotations.SortedPairs }}  {{ .Name }}: {{ .Value }}
          {{ end }}
          {{ end }}

          View in Alertmanager: {{ .ExternalURL }}
        # HTML body
        html: |
          <!DOCTYPE html>
          <html>
          <head>
            <meta charset="UTF-8">
            <title>Alert Notification</title>
            <style>
              body { font-family: Arial, sans-serif; margin: 20px; }
              .header { background-color: {{ if eq .Status "firing" }}#d32f2f{{ else }}#388e3c{{ end }}; color: white; padding: 15px; border-radius: 5px; }
              .alert-group { margin: 20px 0; }
              .alert-item { border: 1px solid #ddd; margin: 10px 0; padding: 15px; border-radius: 5px; }
              .alert-critical { border-left: 5px solid #d32f2f; }
              .alert-warning { border-left: 5px solid #f57c00; }
              .alert-info { border-left: 5px solid #1976d2; }
              .labels { background-color: #f5f5f5; padding: 10px; margin: 10px 0; border-radius: 3px; }
              .timestamp { color: #666; font-size: 0.9em; }
              table { border-collapse: collapse; width: 100%; }
              th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
              th { background-color: #f2f2f2; }
            </style>
          </head>
          <body>
            <div class="header">
              <h2>{{ if eq .Status "firing" }}🚨 Alert Firing{{ else }}✅ Alert Resolved{{ end }}</h2>
              <p>{{ .GroupLabels.alertname }} - {{ len .Alerts }} alert(s)</p>
            </div>
            <div class="alert-group">
              <h3>Alert Summary</h3>
              <table>
                <tr><th>Property</th><th>Value</th></tr>
                <tr><td>Alert Group</td><td>{{ .GroupLabels.alertname }}</td></tr>
                <tr><td>Cluster</td><td>{{ .GroupLabels.cluster | default "Unknown" }}</td></tr>
                <tr><td>Environment</td><td>{{ .GroupLabels.environment | default "Unknown" }}</td></tr>
                <tr><td>Total Alerts</td><td>{{ len .Alerts }}</td></tr>
                <tr><td>Firing</td><td>{{ len .Alerts.Firing }}</td></tr>
                <tr><td>Resolved</td><td>{{ len .Alerts.Resolved }}</td></tr>
              </table>
            </div>
            {{ range .Alerts }}
            <div class="alert-item alert-{{ .Labels.severity }}">
              <h4>{{ .Labels.alertname }}</h4>
              <p><strong>Instance:</strong> {{ .Labels.instance }}</p>
              <p><strong>Severity:</strong> {{ .Labels.severity }}</p>
              <p><strong>Status:</strong> {{ .Status }}</p>
              {{ if .Annotations.summary }}
              <p><strong>Summary:</strong> {{ .Annotations.summary }}</p>
              {{ end }}
              {{ if .Annotations.description }}
              <p><strong>Description:</strong> {{ .Annotations.description }}</p>
              {{ end }}
              {{ if .Annotations.runbook_url }}
              <p><strong>Runbook:</strong> <a href="{{ .Annotations.runbook_url }}">{{ .Annotations.runbook_url }}</a></p>
              {{ end }}
              <div class="labels">
                <strong>Labels:</strong><br>
                {{ range .Labels.SortedPairs }}
                <span style="background-color: #e3f2fd; padding: 2px 6px; margin: 2px; border-radius: 3px; font-size: 0.9em;">
                  {{ .Name }}={{ .Value }}
                </span>
                {{ end }}
              </div>
              <p class="timestamp">
                <strong>Started:</strong> {{ .StartsAt.Format "2006-01-02 15:04:05 MST" }}<br>
                {{ if .EndsAt }}<strong>Ended:</strong> {{ .EndsAt.Format "2006-01-02 15:04:05 MST" }}{{ end }}
              </p>
            </div>
            {{ end }}
            <div style="margin-top: 30px; padding: 15px; background-color: #f5f5f5; border-radius: 5px;">
              <p><strong>Actions:</strong></p>
              <ul>
                <li><a href="{{ .ExternalURL }}">View in Alertmanager</a></li>
                <li><a href="{{ .ExternalURL }}/#/silences/new">Create Silence</a></li>
                <li><a href="http://prometheus.example.com/alerts">View in Prometheus</a></li>
                <li><a href="http://grafana.example.com/dashboard">View Dashboard</a></li>
              </ul>
            </div>
          </body>
          </html>
        # Per-receiver SMTP overrides
        smarthost: 'smtp.company.com:587'
        auth_username: 'alerts@company.com'
        auth_password: 'company-password'
        auth_secret: 'smtp-secret'  # CRAM-MD5 secret
        auth_identity: 'alerts@company.com'
        require_tls: true
        # TLS settings
        tls_config:
          ca_file: '/etc/ssl/certs/ca.pem'
          cert_file: '/etc/ssl/certs/client.pem'
          key_file: '/etc/ssl/private/client.key'
          server_name: 'smtp.company.com'
          insecure_skip_verify: false
```

### Multi-Recipient Email Configuration

```yaml
receivers:
  - name: 'multi-email'
    email_configs:
      # Primary recipient
      - to: 'primary@example.com'
        headers:
          Subject: '[PRIMARY] {{ .GroupLabels.alertname }}'
        text: 'Primary alert notification...'
      # Secondary recipient
      - to: 'secondary@example.com'
        headers:
          Subject: '[SECONDARY] {{ .GroupLabels.alertname }}'
        text: 'Secondary alert notification...'
      # Management notification (critical alerts only; gate this
      # via route conditions, as in the sketch below)
      - to: 'management@example.com'
        headers:
          Subject: '[MANAGEMENT] Critical Alert'
        text: 'Management notification for critical alerts...'
```
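As noted in the last comment, the management recipient should only fire for critical alerts. A minimal sketch of the corresponding route (receiver names are illustrative):

```yaml
route:
  receiver: 'team-email'            # default for everything else
  routes:
    # Only critical alerts reach the management recipient
    - match:
        severity: critical
      receiver: 'management-email'

receivers:
  - name: 'team-email'
    email_configs:
      - to: 'team@example.com'
  - name: 'management-email'
    email_configs:
      - to: 'management@example.com'
        headers:
          Subject: '[MANAGEMENT] Critical Alert: {{ .GroupLabels.alertname }}'
```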

## 5.3 Slack Notification Configuration

### Basic Slack Configuration

```yaml
# Global Slack configuration
global:
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'

# Slack receiver configuration
receivers:
  - name: 'slack-team'
    slack_configs:
      - channel: '#alerts'
        username: 'Alertmanager'
        icon_emoji: ':warning:'
        title: '{{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Instance: {{ .Labels.instance }}
          Severity: {{ .Labels.severity }}
          {{ end }}
```
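A Slack webhook URL is itself a credential. A minimal sketch, assuming a recent Alertmanager release that supports `slack_api_url_file` (the path is illustrative):

```yaml
global:
  # Load the webhook URL from a file kept out of version control;
  # the path is an assumption for illustration
  slack_api_url_file: '/etc/alertmanager/secrets/slack-webhook-url'
```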

### Advanced Slack Configuration

```yaml
receivers:
  - name: 'advanced-slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#production-alerts'
        username: 'AlertBot'
        icon_url: 'https://example.com/alertmanager-icon.png'
        # Dynamic title
        title: |
          {{ if eq .Status "firing" }}🚨{{ else }}✅{{ end }} {{ .GroupLabels.alertname }} ({{ len .Alerts }} alerts)
        title_link: 'http://alertmanager.example.com'
        # Rich-text message
        text: |
          {{ if eq .Status "firing" }}
          *🚨 ALERT FIRING*
          {{ else }}
          *✅ ALERT RESOLVED*
          {{ end }}

          *Environment:* {{ .GroupLabels.environment | default "Unknown" }}
          *Cluster:* {{ .GroupLabels.cluster | default "Unknown" }}
          *Severity:* {{ .GroupLabels.severity | default "Unknown" }}

          {{ range .Alerts }}
          ---
          *Alert:* {{ .Annotations.summary }}
          *Instance:* {{ .Labels.instance }}
          *Service:* {{ .Labels.service | default "Unknown" }}
          {{ if .Annotations.description }}*Description:* {{ .Annotations.description }}{{ end }}
          {{ if .Annotations.runbook_url }}*Runbook:* <{{ .Annotations.runbook_url }}|View Runbook>{{ end }}
          *Started:* {{ .StartsAt.Format "2006-01-02 15:04:05" }}
          {{ if .EndsAt }}*Ended:* {{ .EndsAt.Format "2006-01-02 15:04:05" }}{{ end }}
          {{ end }}
        # Dynamic color
        color: |
          {{ if eq .Status "firing" }}
          {{ if eq .GroupLabels.severity "critical" }}danger
          {{ else if eq .GroupLabels.severity "warning" }}warning
          {{ else }}good{{ end }}
          {{ else }}good{{ end }}
        # Structured fields
        fields:
          - title: 'Alert Count'
            value: '{{ len .Alerts }}'
            short: true
          - title: 'Firing'
            value: '{{ len .Alerts.Firing }}'
            short: true
          - title: 'Resolved'
            value: '{{ len .Alerts.Resolved }}'
            short: true
          - title: 'Environment'
            value: '{{ .GroupLabels.environment | default "Unknown" }}'
            short: true
          - title: 'Cluster'
            value: '{{ .GroupLabels.cluster | default "Unknown" }}'
            short: true
          - title: 'Severity'
            value: '{{ .GroupLabels.severity | default "Unknown" }}'
            short: true
        # Action buttons
        actions:
          - type: 'button'
            text: '🔍 View Details'
            url: '{{ .ExternalURL }}'
            style: 'primary'
          - type: 'button'
            text: '🔇 Create Silence'
            url: '{{ .ExternalURL }}/#/silences/new'
            style: 'default'
          - type: 'button'
            text: '📊 View Dashboard'
            url: 'http://grafana.example.com/dashboard/alerts'
            style: 'default'
          - type: 'button'
            text: '📖 Runbook'
            url: 'http://runbooks.example.com/{{ .GroupLabels.alertname }}'
            style: 'default'
        # Image attachments
        image_url: 'http://grafana.example.com/render/dashboard-solo/db/alerts?panelId=1&width=800&height=400'
        thumb_url: 'http://grafana.example.com/render/dashboard-solo/db/alerts?panelId=1&width=200&height=100'
        # Footer
        footer: 'Alertmanager'
        footer_icon: 'https://example.com/alertmanager-small-icon.png'
        # Also notify when alerts resolve
        send_resolved: true
        # HTTP settings
        http_config:
          proxy_url: 'http://proxy.example.com:8080'
          tls_config:
            insecure_skip_verify: true
```

### Slack Workflow Integration

```yaml
receivers:
  - name: 'slack-workflow'
    slack_configs:
      # Use a Slack Workflow webhook
      - api_url: 'https://hooks.slack.com/workflows/YOUR/WORKFLOW/WEBHOOK'
        channel: '#incidents'
        # Workflow-specific JSON payload
        text: |
          {
            "alert_name": "{{ .GroupLabels.alertname }}",
            "status": "{{ .Status }}",
            "severity": "{{ .GroupLabels.severity }}",
            "environment": "{{ .GroupLabels.environment }}",
            "cluster": "{{ .GroupLabels.cluster }}",
            "alert_count": {{ len .Alerts }},
            "firing_count": {{ len .Alerts.Firing }},
            "resolved_count": {{ len .Alerts.Resolved }},
            "alerts": [
              {{ range $i, $alert := .Alerts }}
              {{ if $i }},{{ end }}
              {
                "summary": "{{ $alert.Annotations.summary }}",
                "instance": "{{ $alert.Labels.instance }}",
                "service": "{{ $alert.Labels.service }}",
                "started_at": "{{ $alert.StartsAt.Format "2006-01-02T15:04:05Z" }}"
                {{ if $alert.EndsAt }},"ended_at": "{{ $alert.EndsAt.Format "2006-01-02T15:04:05Z" }}"{{ end }}
              }
              {{ end }}
            ]
          }
```

## 5.4 Webhook Notification Configuration

### Basic Webhook Configuration

```yaml
receivers:
  - name: 'webhook-receiver'
    webhook_configs:
      - url: 'http://webhook.example.com/alerts'
        send_resolved: true
        # Cap the number of alerts per payload (0 = unlimited)
        max_alerts: 10
        # HTTP settings
        http_config:
          # basic_auth and bearer_token are mutually exclusive; use one
          basic_auth:
            username: 'webhook-user'
            password: 'webhook-pass'
          # bearer_token: 'webhook-token'
          # Custom headers (recent Alertmanager releases; Content-Type
          # application/json is set automatically)
          http_headers:
            X-Custom-Header:
              values: ['alertmanager']
            X-API-Version:
              values: ['v1']
```

### Custom Webhook Handler

```python
# webhook_handler.py
from flask import Flask, request, jsonify
import logging

app = Flask(__name__)

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@app.route('/alerts', methods=['POST'])
def handle_alerts():
    try:
        data = request.get_json()

        # Parse the Alertmanager payload
        status = data.get('status')
        group_labels = data.get('groupLabels', {})
        common_labels = data.get('commonLabels', {})
        common_annotations = data.get('commonAnnotations', {})
        alerts = data.get('alerts', [])

        logger.info(f"Received {len(alerts)} alerts with status: {status}")

        # Process each alert
        for alert in alerts:
            process_alert(alert, status, group_labels)

        # Fan out to other systems
        if status == 'firing':
            send_to_ticketing_system(data)
            send_to_monitoring_dashboard(data)

        return jsonify({'status': 'success', 'processed': len(alerts)})
    except Exception as e:
        logger.error(f"Error processing alerts: {str(e)}")
        return jsonify({'status': 'error', 'message': str(e)}), 500

def process_alert(alert, status, group_labels):
    """Process a single alert."""
    labels = alert.get('labels', {})
    annotations = alert.get('annotations', {})

    alert_info = {
        'alertname': labels.get('alertname'),
        'instance': labels.get('instance'),
        'severity': labels.get('severity'),
        'status': status,
        'summary': annotations.get('summary'),
        'description': annotations.get('description'),
        'starts_at': alert.get('startsAt'),
        'ends_at': alert.get('endsAt'),
        'group_labels': group_labels,
    }

    # Dispatch by severity
    if labels.get('severity') == 'critical':
        handle_critical_alert(alert_info)
    elif labels.get('severity') == 'warning':
        handle_warning_alert(alert_info)
    else:
        handle_info_alert(alert_info)

def handle_critical_alert(alert_info):
    """Handle a critical alert."""
    logger.critical(f"Critical alert: {alert_info['alertname']}")
    send_to_pagerduty(alert_info)       # page the on-call engineer
    send_sms_notification(alert_info)   # send an SMS
    create_incident_ticket(alert_info)  # open an incident ticket

def handle_warning_alert(alert_info):
    """Handle a warning alert."""
    logger.warning(f"Warning alert: {alert_info['alertname']}")
    send_to_chat(alert_info)      # notify the team chat room
    update_dashboard(alert_info)  # update the monitoring dashboard

def handle_info_alert(alert_info):
    """Handle an informational alert."""
    logger.info(f"Info alert: {alert_info['alertname']}")
    log_to_system(alert_info)  # record in the logging system

# The integrations below are placeholders; wire them up to your own systems.
def send_to_ticketing_system(data): pass
def send_to_monitoring_dashboard(data): pass
def send_to_pagerduty(alert_info): pass
def send_sms_notification(alert_info): pass
def create_incident_ticket(alert_info): pass
def send_to_chat(alert_info): pass
def update_dashboard(alert_info): pass
def log_to_system(alert_info): pass

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```
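To wire Alertmanager to this handler, point a webhook receiver at the Flask app's `/alerts` endpoint. A minimal sketch (the host name is illustrative):

```yaml
receivers:
  - name: 'custom-webhook-handler'
    webhook_configs:
      - url: 'http://webhook-handler.internal:8080/alerts'  # the Flask app above
        send_resolved: true
        max_alerts: 0  # 0 = send every alert in the group in one payload
```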

### Webhook Payload Format

```json
{
  "receiver": "webhook-receiver",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighCPUUsage",
        "instance": "server1:9100",
        "job": "node-exporter",
        "severity": "warning"
      },
      "annotations": {
        "description": "CPU usage is above 80%",
        "summary": "High CPU usage detected"
      },
      "startsAt": "2024-01-15T10:30:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://prometheus:9090/graph?g0.expr=...",
      "fingerprint": "1234567890abcdef"
    }
  ],
  "groupLabels": {
    "alertname": "HighCPUUsage"
  },
  "commonLabels": {
    "alertname": "HighCPUUsage",
    "severity": "warning"
  },
  "commonAnnotations": {
    "description": "CPU usage is above 80%",
    "summary": "High CPU usage detected"
  },
  "externalURL": "http://alertmanager:9093",
  "version": "4",
  "groupKey": "{}:{alertname=\"HighCPUUsage\"}",
  "truncatedAlerts": 0
}
```

## 5.5 PagerDuty Integration

### Basic PagerDuty Configuration

```yaml
# Global PagerDuty configuration
global:
  pagerduty_url: 'https://events.pagerduty.com/v2/enqueue'

# PagerDuty receiver configuration
receivers:
  - name: 'pagerduty-team'
    pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'
        description: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
        # Incident details
        details:
          cluster: '{{ .GroupLabels.cluster }}'
          environment: '{{ .GroupLabels.environment }}'
          alert_count: '{{ len .Alerts }}'
          firing_count: '{{ len .Alerts.Firing }}'
          resolved_count: '{{ len .Alerts.Resolved }}'
        # Severity mapping
        severity: |
          {{ if eq .GroupLabels.severity "critical" }}critical
          {{ else if eq .GroupLabels.severity "warning" }}warning
          {{ else }}info{{ end }}
        # Client information
        client: 'Alertmanager'
        client_url: 'http://alertmanager.example.com'
        # Custom links
        links:
          - href: 'http://prometheus.example.com/alerts'
            text: 'Prometheus Alerts'
          - href: 'http://grafana.example.com/dashboard'
            text: 'Grafana Dashboard'
          - href: 'http://runbooks.example.com/{{ .GroupLabels.alertname }}'
            text: 'Runbook'
        # Images
        images:
          - src: 'http://grafana.example.com/render/dashboard-solo/db/alerts?panelId=1'
            alt: 'Alert Chart'
            href: 'http://grafana.example.com/dashboard/alerts'
```

### Advanced PagerDuty Configuration

```yaml
receivers:
  - name: 'pagerduty-advanced'
    pagerduty_configs:
      # Critical-alert configuration
      - routing_key: 'critical-team-key'
        description: |
          [CRITICAL] {{ .GroupLabels.alertname }} ({{ len .Alerts }} alerts in {{ .GroupLabels.cluster }})
        # Dynamic severity
        severity: |
          {{ if eq .GroupLabels.severity "critical" }}critical
          {{ else if eq .GroupLabels.severity "warning" }}warning
          {{ else }}info{{ end }}
        # Incident details (pagerduty_configs expose a single details map)
        details:
          alert_group: '{{ .GroupLabels.alertname }}'
          cluster: '{{ .GroupLabels.cluster }}'
          environment: '{{ .GroupLabels.environment }}'
          team: '{{ .GroupLabels.team }}'
          total_alerts: '{{ len .Alerts }}'
          firing_alerts: '{{ len .Alerts.Firing }}'
          resolved_alerts: '{{ len .Alerts.Resolved }}'
          first_alert_time: '{{ (index .Alerts 0).StartsAt.Format "2006-01-02 15:04:05" }}'
          runbook: '{{ .CommonAnnotations.runbook_url }}'
          dashboard: 'http://grafana.example.com/d/alerts/alerts'
          prometheus: 'http://prometheus.example.com/alerts'
          alert_details: |
            {{ range .Alerts }}
            - {{ .Labels.alertname }} on {{ .Labels.instance }} ({{ .Labels.severity }})
              Summary: {{ .Annotations.summary }}
              Started: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
            {{ end }}
        # Component classification
        component: '{{ .GroupLabels.service | default .GroupLabels.job }}'
        group: '{{ .GroupLabels.team | default "infrastructure" }}'
        class: '{{ .GroupLabels.alertname }}'
        # HTTP settings
        http_config:
          proxy_url: 'http://proxy.example.com:8080'
          tls_config:
            insecure_skip_verify: false
```

## 5.6 Enterprise Communication Tool Integration

### Microsoft Teams Configuration

```yaml
receivers:
  - name: 'teams-notifications'
    webhook_configs:
      - url: 'https://outlook.office.com/webhook/YOUR-TEAMS-WEBHOOK-URL'
        send_resolved: true
        http_config:
          proxy_url: 'http://proxy.company.com:8080'
        # Note: Alertmanager always posts its fixed JSON payload to webhooks.
        # To deliver the MessageCard format defined in the template below,
        # put a bridge service (for example prometheus-msteams) between
        # Alertmanager and the Teams webhook.
```

### Teams Message Template

```go
{{/* teams.tmpl */}}
{{ define "teams.message" }}
{
  "@type": "MessageCard",
  "@context": "http://schema.org/extensions",
  "themeColor": "{{ if eq .Status "firing" }}{{ if eq .GroupLabels.severity "critical" }}FF0000{{ else }}FFA500{{ end }}{{ else }}00FF00{{ end }}",
  "summary": "{{ .GroupLabels.alertname }}",
  "sections": [
    {
      "activityTitle": "{{ if eq .Status "firing" }}🚨 Alert Firing{{ else }}✅ Alert Resolved{{ end }}",
      "activitySubtitle": "{{ .GroupLabels.alertname }} ({{ len .Alerts }} alerts)",
      "activityImage": "https://example.com/alertmanager-icon.png",
      "facts": [
        { "name": "Environment", "value": "{{ .GroupLabels.environment | default "Unknown" }}" },
        { "name": "Cluster", "value": "{{ .GroupLabels.cluster | default "Unknown" }}" },
        { "name": "Severity", "value": "{{ .GroupLabels.severity | default "Unknown" }}" },
        { "name": "Total Alerts", "value": "{{ len .Alerts }}" },
        { "name": "Firing", "value": "{{ len .Alerts.Firing }}" },
        { "name": "Resolved", "value": "{{ len .Alerts.Resolved }}" }
      ],
      "markdown": true
    }
    {{ range .Alerts }}
    ,{
      "activityTitle": "{{ .Labels.alertname }}",
      "activitySubtitle": "{{ .Labels.instance }}",
      "facts": [
        { "name": "Summary", "value": "{{ .Annotations.summary }}" },
        { "name": "Description", "value": "{{ .Annotations.description }}" },
        { "name": "Started", "value": "{{ .StartsAt.Format "2006-01-02 15:04:05" }}" }
        {{ if .EndsAt }}
        ,{ "name": "Ended", "value": "{{ .EndsAt.Format "2006-01-02 15:04:05" }}" }
        {{ end }}
      ]
    }
    {{ end }}
  ],
  "potentialAction": [
    {
      "@type": "OpenUri",
      "name": "View in Alertmanager",
      "targets": [ { "os": "default", "uri": "{{ .ExternalURL }}" } ]
    },
    {
      "@type": "OpenUri",
      "name": "View Dashboard",
      "targets": [ { "os": "default", "uri": "http://grafana.example.com/dashboard/alerts" } ]
    },
    {
      "@type": "OpenUri",
      "name": "Create Silence",
      "targets": [ { "os": "default", "uri": "{{ .ExternalURL }}/#/silences/new" } ]
    }
  ]
}
{{ end }}
```
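Alternatively, newer Alertmanager releases (v0.26+, to the best of my knowledge) ship a native Microsoft Teams integration, which removes the need for a bridge service. A minimal sketch:

```yaml
receivers:
  - name: 'teams-native'
    msteams_configs:
      # Native integration; no bridge service required
      - webhook_url: 'https://outlook.office.com/webhook/YOUR-TEAMS-WEBHOOK-URL'
        send_resolved: true
```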

### Discord Configuration

```yaml
receivers:
  - name: 'discord-notifications'
    webhook_configs:
      - url: 'https://discord.com/api/webhooks/YOUR/DISCORD/WEBHOOK'
        send_resolved: true
        # As with Teams, Alertmanager's webhook payload is fixed, so the
        # Discord message format below must be rendered by a bridge service.
```

### Discord Message Template

```go
{{/* discord.tmpl */}}
{{ define "discord.message" }}
{
  "username": "Alertmanager",
  "avatar_url": "https://example.com/alertmanager-avatar.png",
  "embeds": [
    {
      "title": "{{ if eq .Status "firing" }}🚨 Alert Firing{{ else }}✅ Alert Resolved{{ end }}",
      "description": "{{ .GroupLabels.alertname }} ({{ len .Alerts }} alerts)",
      "color": {{ if eq .Status "firing" }}{{ if eq .GroupLabels.severity "critical" }}16711680{{ else }}16753920{{ end }}{{ else }}65280{{ end }},
      "timestamp": "{{ (index .Alerts 0).StartsAt.Format "2006-01-02T15:04:05Z" }}",
      "fields": [
        {
          "name": "Environment",
          "value": "{{ .GroupLabels.environment | default "Unknown" }}",
          "inline": true
        },
        {
          "name": "Cluster",
          "value": "{{ .GroupLabels.cluster | default "Unknown" }}",
          "inline": true
        },
        {
          "name": "Severity",
          "value": "{{ .GroupLabels.severity | default "Unknown" }}",
          "inline": true
        }
        {{ range .Alerts }}
        ,{
          "name": "{{ .Labels.alertname }}",
          "value": "**Instance:** {{ .Labels.instance }}\n**Summary:** {{ .Annotations.summary }}\n**Started:** {{ .StartsAt.Format "2006-01-02 15:04:05" }}",
          "inline": false
        }
        {{ end }}
      ],
      "footer": {
        "text": "Alertmanager",
        "icon_url": "https://example.com/alertmanager-small-icon.png"
      }
    }
  ]
}
{{ end }}
```
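Discord likewise has a native integration in recent Alertmanager releases (v0.25+, to the best of my knowledge), which avoids the bridge entirely. A minimal sketch:

```yaml
receivers:
  - name: 'discord-native'
    discord_configs:
      # Native integration; Alertmanager formats the embed itself
      - webhook_url: 'https://discord.com/api/webhooks/YOUR/DISCORD/WEBHOOK'
        send_resolved: true
```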

## 5.7 Mobile Notifications

### Telegram Configuration

```yaml
receivers:
  - name: 'telegram-notifications'
    webhook_configs:
      - url: 'https://api.telegram.org/botYOUR_BOT_TOKEN/sendMessage'
        send_resolved: true
        # The Bot API expects the JSON document shown in the template below;
        # since the webhook payload is fixed, a bridge service must render it.
```

### Telegram Message Template

```go
{{/* telegram.tmpl */}}
{{ define "telegram.message" }}
{
  "chat_id": "YOUR_CHAT_ID",
  "parse_mode": "Markdown",
  "text": "{{ if eq .Status "firing" }}🚨 *ALERT FIRING*{{ else }}✅ *ALERT RESOLVED*{{ end }}\n\n*Alert Group:* {{ .GroupLabels.alertname }}\n*Environment:* {{ .GroupLabels.environment | default "Unknown" }}\n*Cluster:* {{ .GroupLabels.cluster | default "Unknown" }}\n*Severity:* {{ .GroupLabels.severity | default "Unknown" }}\n*Total Alerts:* {{ len .Alerts }}\n\n{{ range .Alerts }}---\n*Alert:* {{ .Annotations.summary }}\n*Instance:* {{ .Labels.instance }}\n*Started:* {{ .StartsAt.Format "2006-01-02 15:04:05" }}\n{{ if .Annotations.description }}*Description:* {{ .Annotations.description }}\n{{ end }}{{ end }}\n[View in Alertmanager]({{ .ExternalURL }})",
  "reply_markup": {
    "inline_keyboard": [
      [
        { "text": "🔍 View Details", "url": "{{ .ExternalURL }}" },
        { "text": "📊 Dashboard", "url": "http://grafana.example.com/dashboard/alerts" }
      ],
      [
        { "text": "🔇 Create Silence", "url": "{{ .ExternalURL }}/#/silences/new" },
        { "text": "📖 Runbook", "url": "http://runbooks.example.com/{{ .GroupLabels.alertname }}" }
      ]
    ]
  }
}
{{ end }}
```
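Recent Alertmanager releases (v0.24+, to the best of my knowledge) also provide a native Telegram integration that talks to the Bot API directly. A minimal sketch (token and chat ID are placeholders):

```yaml
receivers:
  - name: 'telegram-native'
    telegram_configs:
      - bot_token: 'YOUR_BOT_TOKEN'
        chat_id: -1001234567890  # numeric group/channel ID
        parse_mode: 'Markdown'
        send_resolved: true
```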

### Pushover Configuration

```yaml
receivers:
  - name: 'pushover-notifications'
    webhook_configs:
      - url: 'https://api.pushover.net/1/messages.json'
        send_resolved: true
        # Pushover expects application/x-www-form-urlencoded form data,
        # so a translating proxy is required in front of this webhook.
```
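Alertmanager has long shipped a native Pushover integration, which is usually simpler than going through a webhook. A minimal sketch (keys are placeholders):

```yaml
receivers:
  - name: 'pushover-native'
    pushover_configs:
      - user_key: 'YOUR_USER_KEY'
        token: 'YOUR_APP_TOKEN'
        title: '{{ .GroupLabels.alertname }}'
        # Emergency priority (2) for critical alerts, normal (0) otherwise
        priority: '{{ if eq .GroupLabels.severity "critical" }}2{{ else }}0{{ end }}'
        retry: 30s   # re-notify interval for emergency priority
        expire: 1h   # stop retrying after this long
```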

## 5.8 Notification Channel Best Practices

### Multi-Channel Notification Strategy

```yaml
# Tiered notification strategy
route:
  receiver: 'default'
  routes:
    # Critical alerts: notify on multiple channels
    - match:
        severity: critical
      receiver: 'critical-multi-channel'
      continue: true
    # Warning alerts: standard notification
    - match:
        severity: warning
      receiver: 'warning-standard'
    # Info alerts: low-priority notification
    - match:
        severity: info
      receiver: 'info-low-priority'

receivers:
  # Multiple channels for critical alerts
  - name: 'critical-multi-channel'
    email_configs:
      - to: 'oncall@example.com'
        headers:
          Subject: '[CRITICAL] {{ .GroupLabels.alertname }}'
    slack_configs:
      - channel: '#critical-alerts'
        title: '🚨 CRITICAL: {{ .GroupLabels.alertname }}'
    pagerduty_configs:
      - routing_key: 'critical-pagerduty-key'
        severity: 'critical'
    webhook_configs:
      # Deliver SMS notifications through a gateway
      - url: 'http://sms-gateway.example.com/send'

  # Standard notification for warning alerts
  - name: 'warning-standard'
    email_configs:
      - to: 'team@example.com'
        headers:
          Subject: '[WARNING] {{ .GroupLabels.alertname }}'
    slack_configs:
      - channel: '#alerts'
        title: '⚠️ WARNING: {{ .GroupLabels.alertname }}'

  # Low priority for info alerts
  - name: 'info-low-priority'
    email_configs:
      - to: 'logs@example.com'
        headers:
          Subject: '[INFO] {{ .GroupLabels.alertname }}'
```

### Notification Deduplication and Rate Limiting

```yaml
# Rate limiting via route configuration
route:
  receiver: 'default'
  # Global rate-limiting settings
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  routes:
    # Throttle high-frequency alerts
    - match:
        frequency: high
      receiver: 'high-frequency-alerts'
      group_wait: 2m        # Longer initial wait
      group_interval: 15m   # Lower send frequency
      repeat_interval: 6h   # Fewer repeat notifications
    # Fast response for low-frequency alerts
    - match:
        frequency: low
      receiver: 'low-frequency-alerts'
      group_wait: 10s       # Fast response
      group_interval: 2m    # Normal frequency
      repeat_interval: 1h   # Normal repeats
```

### Notification Content Optimization

```yaml
# Optimized notification templates
templates:
  - '/etc/alertmanager/templates/optimized.tmpl'

receivers:
  - name: 'optimized-notifications'
    email_configs:
      - to: 'team@example.com'
        headers:
          Subject: '{{ template "email.subject.optimized" . }}'
        text: '{{ template "email.body.optimized" . }}'
    slack_configs:
      - channel: '#alerts'
        title: '{{ template "slack.title.optimized" . }}'
        text: '{{ template "slack.text.optimized" . }}'
        color: '{{ template "slack.color.optimized" . }}'
```

### Notification Testing and Validation

```bash
#!/bin/bash
# test-notifications.sh

ALERTMANAGER_URL="http://localhost:9093"

echo "=== Notification channel test ==="

# Test alert payload
test_alert='[
  {
    "labels": {
      "alertname": "TestNotification",
      "severity": "warning",
      "instance": "test-server:9100",
      "team": "test",
      "environment": "test"
    },
    "annotations": {
      "summary": "This is a test notification",
      "description": "Testing notification channels configuration"
    },
    "startsAt": "'$(date -u +%Y-%m-%dT%H:%M:%S.%3NZ)'"
  }
]'

echo "Sending test alert..."
# Note: Alertmanager v0.27+ removed the v1 API; use /api/v2/alerts there.
if curl -XPOST "$ALERTMANAGER_URL/api/v1/alerts" \
  -H "Content-Type: application/json" \
  -d "$test_alert"; then
  echo "✅ Test alert sent"
else
  echo "❌ Failed to send test alert"
  exit 1
fi

echo -e "\nWaiting for notifications to go out..."
sleep 30

echo -e "\nChecking alert status..."
curl -s "$ALERTMANAGER_URL/api/v1/alerts" | \
  jq '.data[] | select(.labels.alertname=="TestNotification")'

echo -e "\n=== Test complete ==="
echo "Verify that each notification channel received the test message"
```
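Since the script labels its alert with `team: test`, a dedicated route can steer such alerts to a sandbox receiver so that tests never page on-call. A minimal sketch (the channel name is illustrative):

```yaml
route:
  routes:
    # Keep test alerts away from on-call paging
    - match:
        team: test
      receiver: 'test-sandbox'

receivers:
  - name: 'test-sandbox'
    slack_configs:
      - channel: '#alerts-testing'
        title: '[TEST] {{ .GroupLabels.alertname }}'
```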

## Chapter Summary

This chapter walked through the main notification channel configurations for Alertmanager:

### Core Notification Channels

1. **Email**: a traditional, reliable channel with HTML support
2. **Slack**: the go-to choice for modern team collaboration
3. **Webhook**: a flexible option for custom integrations
4. **PagerDuty**: professional incident management and on-call scheduling
5. **Enterprise communication tools**: integrations such as Teams and Discord
6. **Mobile notifications**: mobile-friendly options such as Telegram and Pushover

### Configuration Essentials

1. **Templating**: use the template system to standardize message formats
2. **Multi-channel**: choose channels according to alert severity
3. **Rate limiting**: prevent notification storms from overwhelming the system
4. **Content optimization**: include useful context and actionable links

### Best Practices

1. **Tiered notifications**: apply different strategies per severity level
2. **Testing and validation**: regularly verify that each channel still works
3. **Continuous improvement**: keep tuning notification content and frequency
4. **Security**: protect the credentials used by notification channels

### Next Steps

The next chapter covers alert inhibition and silencing, including:

- Designing and configuring inhibition rules
- Silence management and automation
- Alert noise-reduction strategies
- Maintenance window management

Next chapter: Alert Inhibition and Silencing