5.1 通知渠道概述
支持的通知渠道
Alertmanager 支持多种通知渠道,可以满足不同场景的需求:
```mermaid
flowchart LR
A[Alertmanager] --> B[Email]
A --> C[Slack]
A --> D[Webhook]
A --> E[PagerDuty]
A --> F[OpsGenie]
A --> G[VictorOps]
A --> H[WeChat]
A --> I[Telegram]
A --> J[Discord]
A --> K[Microsoft Teams]
A --> L[Pushover]
A --> M[SNS]
A --> N[WebEx]
```

通知渠道特性对比

| 通知渠道 | 实时性 | 可靠性 | 交互性 | 移动支持 | 企业集成 | 成本 |
|-----------|--------|--------|--------|----------|----------|------|
| Email | 中 | 高 | 低 | 中 | 高 | 低 |
| Slack | 高 | 高 | 高 | 高 | 高 | 中 |
| Webhook | 高 | 中 | 高 | - | 高 | 低 |
| PagerDuty | 高 | 高 | 高 | 高 | 高 | 高 |
| OpsGenie | 高 | 高 | 高 | 高 | 高 | 高 |
| WeChat | 高 | 高 | 中 | 高 | 中 | 低 |
| Telegram | 高 | 中 | 中 | 高 | 低 | 低 |
| SMS | 高 | 高 | 低 | 高 | 中 | 中 |
5.2 邮件通知配置
基础邮件配置
```yaml
# 全局 SMTP 配置
global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'app-password'
  smtp_require_tls: true
  smtp_hello: 'alertmanager.example.com'

# 邮件接收器配置
receivers:
  - name: 'email-team'
    email_configs:
      - to: 'team@example.com'
        subject: '[{{ .Status | toUpper }}] {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Labels: {{ range .Labels.SortedPairs }}{{ .Name }}={{ .Value }} {{ end }}
          {{ end }}
```
高级邮件配置
```yaml
receivers:
- name: 'advanced-email'
email_configs:
# 基本配置
- to: 'alerts@example.com'
from: 'alertmanager@company.com' # 覆盖全局配置
subject: |
[{{ .Status | toUpper }}] {{ .GroupLabels.alertname }}
({{ len .Alerts }} alerts)
# 纯文本邮件正文
body: |
{{ if eq .Status "firing" }}
🚨 ALERT FIRING
{{ else }}
✅ ALERT RESOLVED
{{ end }}
Alert Group: {{ .GroupLabels.alertname }}
Cluster: {{ .GroupLabels.cluster | default "Unknown" }}
Environment: {{ .GroupLabels.environment | default "Unknown" }}
Severity: {{ .GroupLabels.severity | default "Unknown" }}
Total Alerts: {{ len .Alerts }}
Firing: {{ len .Alerts.Firing }}
Resolved: {{ len .Alerts.Resolved }}
{{ range .Alerts }}
=====================================
Alert: {{ .Labels.alertname }}
Instance: {{ .Labels.instance }}
Severity: {{ .Labels.severity }}
Status: {{ .Status }}
Summary: {{ .Annotations.summary }}
Description: {{ .Annotations.description }}
{{ if .Annotations.runbook_url }}Runbook: {{ .Annotations.runbook_url }}{{ end }}
Started: {{ .StartsAt.Format "2006-01-02 15:04:05 MST" }}
{{ if not .EndsAt.IsZero }}Ended: {{ .EndsAt.Format "2006-01-02 15:04:05 MST" }}{{ end }}
Labels:
{{ range .Labels.SortedPairs }} {{ .Name }}: {{ .Value }}
{{ end }}
Annotations:
{{ range .Annotations.SortedPairs }} {{ .Name }}: {{ .Value }}
{{ end }}
{{ end }}
View in Alertmanager: {{ .ExternalURL }}
# HTML 邮件正文
html: |
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Alert Notification</title>
<style>
body { font-family: Arial, sans-serif; margin: 20px; }
.header { background-color: {{ if eq .Status "firing" }}#d32f2f{{ else }}#388e3c{{ end }}; color: white; padding: 15px; border-radius: 5px; }
.alert-group { margin: 20px 0; }
.alert-item { border: 1px solid #ddd; margin: 10px 0; padding: 15px; border-radius: 5px; }
.alert-critical { border-left: 5px solid #d32f2f; }
.alert-warning { border-left: 5px solid #f57c00; }
.alert-info { border-left: 5px solid #1976d2; }
.labels { background-color: #f5f5f5; padding: 10px; margin: 10px 0; border-radius: 3px; }
.timestamp { color: #666; font-size: 0.9em; }
table { border-collapse: collapse; width: 100%; }
th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
th { background-color: #f2f2f2; }
</style>
</head>
<body>
<div class="header">
<h2>{{ if eq .Status "firing" }}🚨 Alert Firing{{ else }}✅ Alert Resolved{{ end }}</h2>
<p>{{ .GroupLabels.alertname }} - {{ len .Alerts }} alert(s)</p>
</div>
<div class="alert-group">
<h3>Alert Summary</h3>
<table>
<tr><th>Property</th><th>Value</th></tr>
<tr><td>Alert Group</td><td>{{ .GroupLabels.alertname }}</td></tr>
<tr><td>Cluster</td><td>{{ .GroupLabels.cluster | default "Unknown" }}</td></tr>
<tr><td>Environment</td><td>{{ .GroupLabels.environment | default "Unknown" }}</td></tr>
<tr><td>Total Alerts</td><td>{{ len .Alerts }}</td></tr>
<tr><td>Firing</td><td>{{ len .Alerts.Firing }}</td></tr>
<tr><td>Resolved</td><td>{{ len .Alerts.Resolved }}</td></tr>
</table>
</div>
{{ range .Alerts }}
<div class="alert-item alert-{{ .Labels.severity }}">
<h4>{{ .Labels.alertname }}</h4>
<p><strong>Instance:</strong> {{ .Labels.instance }}</p>
<p><strong>Severity:</strong> {{ .Labels.severity }}</p>
<p><strong>Status:</strong> {{ .Status }}</p>
{{ if .Annotations.summary }}
<p><strong>Summary:</strong> {{ .Annotations.summary }}</p>
{{ end }}
{{ if .Annotations.description }}
<p><strong>Description:</strong> {{ .Annotations.description }}</p>
{{ end }}
{{ if .Annotations.runbook_url }}
<p><strong>Runbook:</strong> <a href="{{ .Annotations.runbook_url }}">{{ .Annotations.runbook_url }}</a></p>
{{ end }}
<div class="labels">
<strong>Labels:</strong><br>
{{ range .Labels.SortedPairs }}
<span style="background-color: #e3f2fd; padding: 2px 6px; margin: 2px; border-radius: 3px; font-size: 0.9em;">
{{ .Name }}={{ .Value }}
</span>
{{ end }}
</div>
<p class="timestamp">
<strong>Started:</strong> {{ .StartsAt.Format "2006-01-02 15:04:05 MST" }}<br>
{{ if not .EndsAt.IsZero }}<strong>Ended:</strong> {{ .EndsAt.Format "2006-01-02 15:04:05 MST" }}{{ end }}
</p>
</div>
{{ end }}
<div style="margin-top: 30px; padding: 15px; background-color: #f5f5f5; border-radius: 5px;">
<p><strong>Actions:</strong></p>
<ul>
<li><a href="{{ .ExternalURL }}">View in Alertmanager</a></li>
<li><a href="{{ .ExternalURL }}/#/silences/new">Create Silence</a></li>
<li><a href="http://prometheus.example.com/alerts">View in Prometheus</a></li>
<li><a href="http://grafana.example.com/dashboard">View Dashboard</a></li>
</ul>
</div>
</body>
</html>
# 邮件头部
headers:
'X-Priority': '1' # 高优先级
'X-Mailer': 'Alertmanager'
'X-Alert-Count': '{{ len .Alerts }}'
'X-Alert-Status': '{{ .Status }}'
'X-Alert-Severity': '{{ .GroupLabels.severity }}'
# SMTP 配置覆盖
smarthost: 'smtp.company.com:587'
auth_username: 'alerts@company.com'
auth_password: 'company-password'
auth_secret: 'smtp-secret' # CRAM-MD5 认证使用的 secret(并非从文件读取密码)
auth_identity: 'alerts@company.com'
require_tls: true
# TLS 配置
tls_config:
ca_file: '/etc/ssl/certs/ca.pem'
cert_file: '/etc/ssl/certs/client.pem'
key_file: '/etc/ssl/private/client.key'
server_name: 'smtp.company.com'
insecure_skip_verify: false
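```

上面的高级示例把 SMTP 密码明文写在配置文件中。作为安全加固的一个示意(假设所用 Alertmanager 版本支持 `*_file` 形式的凭据字段,`smtp_auth_password_file` 大约从 v0.25 起提供,具体以版本文档为准),可以改为从文件读取密码:

```yaml
# 示意:通过文件提供 SMTP 密码,避免在配置中出现明文
global:
  smtp_smarthost: 'smtp.company.com:587'
  smtp_from: 'alerts@company.com'
  smtp_auth_username: 'alerts@company.com'
  # 文件内容只包含密码本身,注意收紧文件权限
  smtp_auth_password_file: '/etc/alertmanager/secrets/smtp_password'
```

这样配置文件本身可以纳入版本管理,密码则通过单独挂载的 secret 文件分发,也便于定期轮换。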
多收件人邮件配置
```yaml
receivers:
- name: 'multi-email'
email_configs:
# 主要收件人
- to: 'primary@example.com'
subject: '[PRIMARY] {{ .GroupLabels.alertname }}'
body: 'Primary alert notification...'
# 抄送收件人
- to: 'secondary@example.com'
subject: '[SECONDARY] {{ .GroupLabels.alertname }}'
body: 'Secondary alert notification...'
# 管理层通知(仅严重告警)
- to: 'management@example.com'
subject: '[MANAGEMENT] Critical Alert'
body: 'Management notification for critical alerts...'
# 是否通知管理层等,可以在路由中按 severity 等条件控制(见下方示例)
```
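下面是一个与上例配合的路由片段示意(接收器名称与匹配标签均为假设),只有 severity=critical 的告警才会额外通知管理层,团队邮箱始终收到通知:

```yaml
# 示意:用路由条件控制不同收件人
route:
  receiver: 'team-email'                 # 默认接收器
  routes:
    - match:
        severity: critical
      receiver: 'management-email'       # 严重告警额外通知管理层
      continue: true                     # 继续匹配后续路由,不影响团队通知
    - receiver: 'team-email'             # 兜底路由:所有告警都发给团队

receivers:
  - name: 'team-email'
    email_configs:
      - to: 'team@example.com'
  - name: 'management-email'
    email_configs:
      - to: 'management@example.com'
        subject: '[MANAGEMENT] {{ .GroupLabels.alertname }}'
```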
5.3 Slack 通知配置
基础 Slack 配置
```yaml
# 全局 Slack 配置
global:
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'

# Slack 接收器配置
receivers:
  - name: 'slack-team'
    slack_configs:
      - channel: '#alerts'
        username: 'Alertmanager'
        icon_emoji: ':warning:'
        title: '{{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Instance: {{ .Labels.instance }}
          Severity: {{ .Labels.severity }}
          {{ end }}
```
高级 Slack 配置
```yaml
receivers:
- name: 'advanced-slack'
slack_configs:
- api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
channel: '#production-alerts'
username: 'AlertBot'
icon_url: 'https://example.com/alertmanager-icon.png'
# 动态标题
title: |
{{ if eq .Status "firing" }}🚨{{ else }}✅{{ end }}
{{ .GroupLabels.alertname }}
({{ len .Alerts }} alerts)
title_link: 'http://alertmanager.example.com'
# 富文本消息
text: |
{{ if eq .Status "firing" }}
*🚨 ALERT FIRING*
{{ else }}
*✅ ALERT RESOLVED*
{{ end }}
*Environment:* {{ .GroupLabels.environment | default "Unknown" }}
*Cluster:* {{ .GroupLabels.cluster | default "Unknown" }}
*Severity:* {{ .GroupLabels.severity | default "Unknown" }}
{{ range .Alerts }}
---
*Alert:* {{ .Annotations.summary }}
*Instance:* {{ .Labels.instance }}
*Service:* {{ .Labels.service | default "Unknown" }}
{{ if .Annotations.description }}*Description:* {{ .Annotations.description }}{{ end }}
{{ if .Annotations.runbook_url }}*Runbook:* <{{ .Annotations.runbook_url }}|View Runbook>{{ end }}
*Started:* {{ .StartsAt.Format "2006-01-02 15:04:05" }}
{{ if not .EndsAt.IsZero }}*Ended:* {{ .EndsAt.Format "2006-01-02 15:04:05" }}{{ end }}
{{ end }}
# 动态颜色
color: '{{ if eq .Status "firing" }}{{ if eq .GroupLabels.severity "critical" }}danger{{ else if eq .GroupLabels.severity "warning" }}warning{{ else }}good{{ end }}{{ else }}good{{ end }}'
# 结构化字段
fields:
- title: 'Alert Count'
value: '{{ len .Alerts }}'
short: true
- title: 'Firing'
value: '{{ len .Alerts.Firing }}'
short: true
- title: 'Resolved'
value: '{{ len .Alerts.Resolved }}'
short: true
- title: 'Environment'
value: '{{ .GroupLabels.environment | default "Unknown" }}'
short: true
- title: 'Cluster'
value: '{{ .GroupLabels.cluster | default "Unknown" }}'
short: true
- title: 'Severity'
value: '{{ .GroupLabels.severity | default "Unknown" }}'
short: true
# 操作按钮
actions:
- type: 'button'
text: '🔍 View Details'
url: '{{ .ExternalURL }}'
style: 'primary'
- type: 'button'
text: '🔇 Create Silence'
url: '{{ .ExternalURL }}/#/silences/new'
style: 'default'
- type: 'button'
text: '📊 View Dashboard'
url: 'http://grafana.example.com/dashboard/alerts'
style: 'default'
- type: 'button'
text: '📖 Runbook'
url: 'http://runbooks.example.com/{{ .GroupLabels.alertname }}'
style: 'default'
# 图片附件
image_url: 'http://grafana.example.com/render/dashboard-solo/db/alerts?panelId=1&width=800&height=400'
thumb_url: 'http://grafana.example.com/render/dashboard-solo/db/alerts?panelId=1&width=200&height=100'
# 脚注
footer: 'Alertmanager'
footer_icon: 'https://example.com/alertmanager-small-icon.png'
# 发送已解决的告警
send_resolved: true
# HTTP 配置
http_config:
proxy_url: 'http://proxy.example.com:8080'
tls_config:
insecure_skip_verify: true
```

Slack 工作流集成
```yaml
receivers:
- name: 'slack-workflow'
slack_configs:
# 使用 Slack Workflow Webhook
- api_url: 'https://hooks.slack.com/workflows/YOUR/WORKFLOW/WEBHOOK'
channel: '#incidents'
# 工作流专用格式
text: |
{
"alert_name": "{{ .GroupLabels.alertname }}",
"status": "{{ .Status }}",
"severity": "{{ .GroupLabels.severity }}",
"environment": "{{ .GroupLabels.environment }}",
"cluster": "{{ .GroupLabels.cluster }}",
"alert_count": {{ len .Alerts }},
"firing_count": {{ len .Alerts.Firing }},
"resolved_count": {{ len .Alerts.Resolved }},
"alerts": [
{{ range $i, $alert := .Alerts }}
{{ if $i }},{{ end }}
{
"summary": "{{ $alert.Annotations.summary }}",
"instance": "{{ $alert.Labels.instance }}",
"service": "{{ $alert.Labels.service }}",
"started_at": "{{ $alert.StartsAt.Format "2006-01-02T15:04:05Z" }}"
{{ if not $alert.EndsAt.IsZero }},"ended_at": "{{ $alert.EndsAt.Format "2006-01-02T15:04:05Z" }}"{{ end }}
}
{{ end }}
]
}
```

5.4 Webhook 通知配置
基础 Webhook 配置
```yaml
receivers:
- name: 'webhook-receiver'
webhook_configs:
- url: 'http://webhook.example.com/alerts'
send_resolved: true
# HTTP 配置
http_config:
basic_auth:
username: 'webhook-user'
password: 'webhook-pass'
# bearer_token: 'webhook-token'  # 注意:bearer_token 与 basic_auth 互斥,只能配置其一
# 自定义头部
http_headers:
'Content-Type': 'application/json'
'X-Custom-Header': 'alertmanager'
'X-API-Version': 'v1'
# 最大告警数量
max_alerts: 10
```

自定义 Webhook 处理器
```python
# webhook_handler.py
from flask import Flask, request, jsonify
import json
import logging
from datetime import datetime

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@app.route('/alerts', methods=['POST'])
def handle_alerts():
    try:
        data = request.get_json()

        # 解析告警数据
        status = data.get('status')
        group_labels = data.get('groupLabels', {})
        common_labels = data.get('commonLabels', {})
        common_annotations = data.get('commonAnnotations', {})
        alerts = data.get('alerts', [])

        logger.info(f"Received {len(alerts)} alerts with status: {status}")

        # 处理告警
        for alert in alerts:
            process_alert(alert, status, group_labels)

        # 发送到其他系统
        if status == 'firing':
            send_to_ticketing_system(data)
            send_to_monitoring_dashboard(data)

        return jsonify({'status': 'success', 'processed': len(alerts)})
    except Exception as e:
        logger.error(f"Error processing alerts: {str(e)}")
        return jsonify({'status': 'error', 'message': str(e)}), 500


def process_alert(alert, status, group_labels):
    """处理单个告警"""
    labels = alert.get('labels', {})
    annotations = alert.get('annotations', {})

    alert_info = {
        'alertname': labels.get('alertname'),
        'instance': labels.get('instance'),
        'severity': labels.get('severity'),
        'status': status,
        'summary': annotations.get('summary'),
        'description': annotations.get('description'),
        'starts_at': alert.get('startsAt'),
        'ends_at': alert.get('endsAt'),
        'group_labels': group_labels
    }

    # 根据严重程度处理
    if labels.get('severity') == 'critical':
        handle_critical_alert(alert_info)
    elif labels.get('severity') == 'warning':
        handle_warning_alert(alert_info)
    else:
        handle_info_alert(alert_info)


def handle_critical_alert(alert_info):
    """处理严重告警"""
    logger.critical(f"Critical alert: {alert_info['alertname']}")
    send_to_pagerduty(alert_info)        # 发送到 PagerDuty
    send_sms_notification(alert_info)    # 发送短信
    create_incident_ticket(alert_info)   # 创建事件工单


def handle_warning_alert(alert_info):
    """处理警告告警"""
    logger.warning(f"Warning alert: {alert_info['alertname']}")
    send_to_chat(alert_info)      # 发送到团队聊天室
    update_dashboard(alert_info)  # 更新监控仪表板


def handle_info_alert(alert_info):
    """处理信息告警"""
    logger.info(f"Info alert: {alert_info['alertname']}")
    log_to_system(alert_info)     # 记录到日志系统


# 以下集成函数均为占位实现,请按实际系统补全
def send_to_pagerduty(alert_info): pass
def send_sms_notification(alert_info): pass
def create_incident_ticket(alert_info): pass
def send_to_chat(alert_info): pass
def update_dashboard(alert_info): pass
def log_to_system(alert_info): pass


def send_to_ticketing_system(data):
    """发送到工单系统"""
    pass


def send_to_monitoring_dashboard(data):
    """发送到监控仪表板"""
    pass


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```
Webhook 数据格式
```json
{
"receiver": "webhook-receiver",
"status": "firing",
"alerts": [
{
"status": "firing",
"labels": {
"alertname": "HighCPUUsage",
"instance": "server1:9100",
"job": "node-exporter",
"severity": "warning"
},
"annotations": {
"description": "CPU usage is above 80%",
"summary": "High CPU usage detected"
},
"startsAt": "2024-01-15T10:30:00.000Z",
"endsAt": "0001-01-01T00:00:00Z",
"generatorURL": "http://prometheus:9090/graph?g0.expr=...",
"fingerprint": "1234567890abcdef"
}
],
"groupLabels": {
"alertname": "HighCPUUsage"
},
"commonLabels": {
"alertname": "HighCPUUsage",
"severity": "warning"
},
"commonAnnotations": {
"description": "CPU usage is above 80%",
"summary": "High CPU usage detected"
},
"externalURL": "http://alertmanager:9093",
"version": "4",
"groupKey": "{}:{alertname=\"HighCPUUsage\"}",
"truncatedAlerts": 0
}
```

5.5 PagerDuty 集成
基础 PagerDuty 配置
```yaml
# 全局 PagerDuty 配置
global:
  pagerduty_url: 'https://events.pagerduty.com/v2/enqueue'

# PagerDuty 接收器配置
receivers:
  - name: 'pagerduty-team'
    pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'
        description: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
        # 事件详情
        details:
          cluster: '{{ .GroupLabels.cluster }}'
          environment: '{{ .GroupLabels.environment }}'
          alert_count: '{{ len .Alerts }}'
          firing_count: '{{ len .Alerts.Firing }}'
          resolved_count: '{{ len .Alerts.Resolved }}'
        # 严重程度映射(单行写法,避免带入多余换行)
        severity: '{{ if eq .GroupLabels.severity "critical" }}critical{{ else if eq .GroupLabels.severity "warning" }}warning{{ else }}info{{ end }}'
        # 客户端信息
        client: 'Alertmanager'
        client_url: 'http://alertmanager.example.com'
        # 自定义链接
        links:
          - href: 'http://prometheus.example.com/alerts'
            text: 'Prometheus Alerts'
          - href: 'http://grafana.example.com/dashboard'
            text: 'Grafana Dashboard'
          - href: 'http://runbooks.example.com/{{ .GroupLabels.alertname }}'
            text: 'Runbook'
        # 图片
        images:
          - src: 'http://grafana.example.com/render/dashboard-solo/db/alerts?panelId=1'
            alt: 'Alert Chart'
            href: 'http://grafana.example.com/dashboard/alerts'
```
高级 PagerDuty 配置
```yaml
receivers:
- name: 'pagerduty-advanced'
pagerduty_configs:
# 严重告警配置
- routing_key: 'critical-team-key'
description: |
[CRITICAL] {{ .GroupLabels.alertname }}
({{ len .Alerts }} alerts in {{ .GroupLabels.cluster }})
# 动态严重程度(单行写法,避免块标量带入换行导致 PagerDuty 校验失败)
severity: '{{ if eq .GroupLabels.severity "critical" }}critical{{ else if eq .GroupLabels.severity "warning" }}warning{{ else }}info{{ end }}'
# 详细信息(details 即 PagerDuty 事件中的 custom_details,自定义字段统一放在这里)
details:
  alert_group: '{{ .GroupLabels.alertname }}'
  cluster: '{{ .GroupLabels.cluster }}'
  environment: '{{ .GroupLabels.environment }}'
  team: '{{ .GroupLabels.team }}'
  total_alerts: '{{ len .Alerts }}'
  firing_alerts: '{{ len .Alerts.Firing }}'
  resolved_alerts: '{{ len .Alerts.Resolved }}'
  first_alert_time: '{{ (index .Alerts 0).StartsAt.Format "2006-01-02 15:04:05" }}'
  runbook: '{{ .CommonAnnotations.runbook_url }}'
  dashboard: 'http://grafana.example.com/d/alerts/alerts'
  prometheus: 'http://prometheus.example.com/alerts'
  alert_details: |
    {{ range .Alerts }}
    - {{ .Labels.alertname }} on {{ .Labels.instance }} ({{ .Labels.severity }})
      Summary: {{ .Annotations.summary }}
      Started: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
    {{ end }}
# 组件信息
component: '{{ .GroupLabels.service | default .GroupLabels.job }}'
group: '{{ .GroupLabels.team | default "infrastructure" }}'
class: '{{ .GroupLabels.alertname }}'
# HTTP 配置
http_config:
proxy_url: 'http://proxy.example.com:8080'
tls_config:
insecure_skip_verify: false
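```

在实际使用中,通常只把严重告警升级到 PagerDuty。下面是一个示意的路由片段(接收器名称沿用上例,时间参数为假设值):

```yaml
# 示意:仅 critical 告警进入 PagerDuty,并缩短重复提醒间隔
route:
  receiver: 'default'
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty-advanced'
      group_wait: 10s        # 严重告警尽快触发
      repeat_interval: 30m   # 未确认时更频繁地重复提醒
```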
5.6 企业通信工具集成
Microsoft Teams 配置
```yaml
receivers:
- name: 'teams-notifications'
webhook_configs:
- url: 'https://outlook.office.com/webhook/YOUR-TEAMS-WEBHOOK-URL'
send_resolved: true
# Teams 消息格式
http_config:
proxy_url: 'http://proxy.company.com:8080'
http_headers:
'Content-Type': 'application/json'
# 注意:webhook_configs 发送的是 Alertmanager 固定格式的 JSON,无法直接改写成 Teams 的 MessageCard
# 通常需要 prometheus-msteams 之类的转换服务配合下面的模板使用,或改用内置的 msteams_configs(见本节末尾示例)
```
Teams 消息模板
```go
{{/* teams.tmpl */}}
{{ define "teams.message" }}
{
"@type": "MessageCard",
"@context": "http://schema.org/extensions",
"themeColor": "{{ if eq .Status \"firing\" }}{{ if eq .GroupLabels.severity \"critical\" }}FF0000{{ else }}FFA500{{ end }}{{ else }}00FF00{{ end }}",
"summary": "{{ .GroupLabels.alertname }}",
"sections": [
{
"activityTitle": "{{ if eq .Status \"firing\" }}🚨 Alert Firing{{ else }}✅ Alert Resolved{{ end }}",
"activitySubtitle": "{{ .GroupLabels.alertname }} ({{ len .Alerts }} alerts)",
"activityImage": "https://example.com/alertmanager-icon.png",
"facts": [
{
"name": "Environment",
"value": "{{ .GroupLabels.environment | default \"Unknown\" }}"
},
{
"name": "Cluster",
"value": "{{ .GroupLabels.cluster | default \"Unknown\" }}"
},
{
"name": "Severity",
"value": "{{ .GroupLabels.severity | default \"Unknown\" }}"
},
{
"name": "Total Alerts",
"value": "{{ len .Alerts }}"
},
{
"name": "Firing",
"value": "{{ len .Alerts.Firing }}"
},
{
"name": "Resolved",
"value": "{{ len .Alerts.Resolved }}"
}
],
"markdown": true
}
{{ range .Alerts }}
,{
"activityTitle": "{{ .Labels.alertname }}",
"activitySubtitle": "{{ .Labels.instance }}",
"facts": [
{
"name": "Summary",
"value": "{{ .Annotations.summary }}"
},
{
"name": "Description",
"value": "{{ .Annotations.description }}"
},
{
"name": "Started",
"value": "{{ .StartsAt.Format \"2006-01-02 15:04:05\" }}"
}
{{ if not .EndsAt.IsZero }}
,{
"name": "Ended",
"value": "{{ .EndsAt.Format \"2006-01-02 15:04:05\" }}"
}
{{ end }}
]
}
{{ end }}
],
"potentialAction": [
{
"@type": "OpenUri",
"name": "View in Alertmanager",
"targets": [
{
"os": "default",
"uri": "{{ .ExternalURL }}"
}
]
},
{
"@type": "OpenUri",
"name": "View Dashboard",
"targets": [
{
"os": "default",
"uri": "http://grafana.example.com/dashboard/alerts"
}
]
},
{
"@type": "OpenUri",
"name": "Create Silence",
"targets": [
{
"os": "default",
"uri": "{{ .ExternalURL }}/#/silences/new"
}
]
}
]
}
{{ end }}
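```

上述 webhook + 模板的方式需要额外的转换服务才能真正把 MessageCard 发给 Teams。较新版本的 Alertmanager(约 v0.26 起)内置了 Teams 接收器,下面是一个示意配置(字段名以所用版本的官方文档为准):

```yaml
# 示意:使用内置的 msteams_configs 直接发送到 Teams
receivers:
  - name: 'teams-native'
    msteams_configs:
      - webhook_url: 'https://outlook.office.com/webhook/YOUR-TEAMS-WEBHOOK-URL'
        title: '{{ .GroupLabels.alertname }} ({{ len .Alerts }} alerts)'
        text: >-
          {{ range .Alerts }}{{ .Annotations.summary }} ({{ .Labels.instance }}) {{ end }}
        send_resolved: true
```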
Discord 配置
```yaml
receivers:
- name: 'discord-notifications'
webhook_configs:
- url: 'https://discord.com/api/webhooks/YOUR/DISCORD/WEBHOOK'
send_resolved: true
http_headers:
'Content-Type': 'application/json'
# 注意:Discord 无法直接解析 Alertmanager 的 webhook JSON,需要转换服务配合下面的模板,
# 或改用内置的 discord_configs(见下文示例)
```
Discord 消息模板
```go
{{/* discord.tmpl */}}
{{ define "discord.message" }}
{
"username": "Alertmanager",
"avatar_url": "https://example.com/alertmanager-avatar.png",
"embeds": [
{
"title": "{{ if eq .Status \"firing\" }}🚨 Alert Firing{{ else }}✅ Alert Resolved{{ end }}",
"description": "{{ .GroupLabels.alertname }} ({{ len .Alerts }} alerts)",
"color": {{ if eq .Status "firing" }}{{ if eq .GroupLabels.severity "critical" }}16711680{{ else }}16753920{{ end }}{{ else }}65280{{ end }},
"timestamp": "{{ (index .Alerts 0).StartsAt.Format \"2006-01-02T15:04:05Z\" }}",
"fields": [
{
"name": "Environment",
"value": "{{ .GroupLabels.environment | default \"Unknown\" }}",
"inline": true
},
{
"name": "Cluster",
"value": "{{ .GroupLabels.cluster | default \"Unknown\" }}",
"inline": true
},
{
"name": "Severity",
"value": "{{ .GroupLabels.severity | default \"Unknown\" }}",
"inline": true
}
{{ range .Alerts }}
,{
"name": "{{ .Labels.alertname }}",
"value": "**Instance:** {{ .Labels.instance }}\n**Summary:** {{ .Annotations.summary }}\n**Started:** {{ .StartsAt.Format \"2006-01-02 15:04:05\" }}",
"inline": false
}
{{ end }}
],
"footer": {
"text": "Alertmanager",
"icon_url": "https://example.com/alertmanager-small-icon.png"
}
}
]
}
{{ end }}
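```

与 Teams 类似,较新版本的 Alertmanager(约 v0.25 起)也内置了 Discord 接收器,可以不经转换服务直接发送。以下为示意配置(字段名以所用版本文档为准):

```yaml
# 示意:使用内置的 discord_configs
receivers:
  - name: 'discord-native'
    discord_configs:
      - webhook_url: 'https://discord.com/api/webhooks/YOUR/DISCORD/WEBHOOK'
        title: '{{ .GroupLabels.alertname }}'
        message: '{{ range .Alerts }}{{ .Annotations.summary }} ({{ .Labels.instance }}) {{ end }}'
        send_resolved: true
```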
5.7 移动端通知
Telegram 配置
```yaml
receivers:
- name: 'telegram-notifications'
webhook_configs:
- url: 'https://api.telegram.org/botYOUR_BOT_TOKEN/sendMessage'
send_resolved: true
http_headers:
'Content-Type': 'application/json'
# 注意:webhook_configs 的请求体是固定的 Alertmanager JSON,Telegram Bot API 无法直接识别,
# 需要转换服务配合下面的模板,或改用内置的 telegram_configs(见下文示例)
```
Telegram 消息模板
```go
{{/* telegram.tmpl */}}
{{ define "telegram.message" }}
{
"chat_id": "YOUR_CHAT_ID",
"parse_mode": "Markdown",
"text": "{{ if eq .Status \"firing\" }}🚨 *ALERT FIRING*{{ else }}✅ *ALERT RESOLVED*{{ end }}\n\n*Alert Group:* {{ .GroupLabels.alertname }}\n*Environment:* {{ .GroupLabels.environment | default \"Unknown\" }}\n*Cluster:* {{ .GroupLabels.cluster | default \"Unknown\" }}\n*Severity:* {{ .GroupLabels.severity | default \"Unknown\" }}\n*Total Alerts:* {{ len .Alerts }}\n\n{{ range .Alerts }}---\n*Alert:* {{ .Annotations.summary }}\n*Instance:* {{ .Labels.instance }}\n*Started:* {{ .StartsAt.Format \"2006-01-02 15:04:05\" }}\n{{ if .Annotations.description }}*Description:* {{ .Annotations.description }}\n{{ end }}{{ end }}\n[View in Alertmanager]({{ .ExternalURL }})",
"reply_markup": {
"inline_keyboard": [
[
{
"text": "🔍 View Details",
"url": "{{ .ExternalURL }}"
},
{
"text": "📊 Dashboard",
"url": "http://grafana.example.com/dashboard/alerts"
}
],
[
{
"text": "🔇 Create Silence",
"url": "{{ .ExternalURL }}/#/silences/new"
},
{
"text": "📖 Runbook",
"url": "http://runbooks.example.com/{{ .GroupLabels.alertname }}"
}
]
]
}
}
{{ end }}
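```

除了 webhook + 模板,较新版本的 Alertmanager(约 v0.24 起)内置了 Telegram 接收器,无需自行拼装 Bot API 请求。以下为示意配置(chat_id 等取值为假设,字段以所用版本文档为准):

```yaml
# 示意:使用内置的 telegram_configs
receivers:
  - name: 'telegram-native'
    telegram_configs:
      - bot_token: 'YOUR_BOT_TOKEN'
        chat_id: -1001234567890          # 目标会话 ID(数值)
        parse_mode: 'Markdown'
        message: |
          {{ if eq .Status "firing" }}🚨 *ALERT FIRING*{{ else }}✅ *ALERT RESOLVED*{{ end }}
          {{ range .Alerts }}{{ .Annotations.summary }} ({{ .Labels.instance }})
          {{ end }}
        send_resolved: true
```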
Pushover 配置
```yaml
receivers:
- name: 'pushover-notifications'
webhook_configs:
- url: 'https://api.pushover.net/1/messages.json'
send_resolved: true
http_headers:
'Content-Type': 'application/x-www-form-urlencoded'
# 注意:Pushover API 需要表单字段,Alertmanager 固定的 webhook JSON 无法直接使用,
# 建议改用内置的 pushover_configs(见下方示例)
```
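Alertmanager 原生提供 pushover_configs,比 webhook 方式更直接。以下为示意配置(密钥为占位值,字段以所用版本文档为准):

```yaml
# 示意:使用内置的 pushover_configs
receivers:
  - name: 'pushover-native'
    pushover_configs:
      - user_key: 'YOUR_USER_KEY'
        token: 'YOUR_APP_TOKEN'
        title: '{{ .GroupLabels.alertname }}'
        message: '{{ range .Alerts }}{{ .Annotations.summary }} {{ end }}'
        priority: '{{ if eq .GroupLabels.severity "critical" }}2{{ else }}0{{ end }}'
        retry: 1m       # priority=2 时的重试间隔
        expire: 1h      # priority=2 时的告警过期时间
        send_resolved: true
```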
5.8 通知渠道最佳实践
多渠道通知策略
```yaml
# 分层通知策略
route:
  receiver: 'default'
  routes:
    # 严重告警:多渠道通知
    - match:
        severity: critical
      receiver: 'critical-multi-channel'
      continue: true
    # 警告告警:标准通知
    - match:
        severity: warning
      receiver: 'warning-standard'
    # 信息告警:低优先级通知
    - match:
        severity: info
      receiver: 'info-low-priority'

receivers:
  # 严重告警多渠道
  - name: 'critical-multi-channel'
    email_configs:
      - to: 'oncall@example.com'
        subject: '[CRITICAL] {{ .GroupLabels.alertname }}'
    slack_configs:
      - channel: '#critical-alerts'
        title: '🚨 CRITICAL: {{ .GroupLabels.alertname }}'
    pagerduty_configs:
      - routing_key: 'critical-pagerduty-key'
        severity: 'critical'
    webhook_configs:
      - url: 'http://sms-gateway.example.com/send'  # 发送短信通知

  # 警告告警标准通知
  - name: 'warning-standard'
    email_configs:
      - to: 'team@example.com'
        subject: '[WARNING] {{ .GroupLabels.alertname }}'
    slack_configs:
      - channel: '#alerts'
        title: '⚠️ WARNING: {{ .GroupLabels.alertname }}'

  # 信息告警低优先级
  - name: 'info-low-priority'
    email_configs:
      - to: 'logs@example.com'
        subject: '[INFO] {{ .GroupLabels.alertname }}'
```
通知去重和限流
```yaml
# 通过路由配置实现限流
route:
  receiver: 'default'
  # 全局限流配置
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  routes:
    # 高频告警限流
    - match:
        frequency: high
      receiver: 'high-frequency-alerts'
      group_wait: 2m        # 延长等待时间
      group_interval: 15m   # 降低发送频率
      repeat_interval: 6h   # 减少重复通知
    # 低频告警快速响应
    - match:
        frequency: low
      receiver: 'low-frequency-alerts'
      group_wait: 10s       # 快速响应
      group_interval: 2m    # 正常频率
      repeat_interval: 1h   # 正常重复
```
通知内容优化
```yaml
# 优化的通知模板
# (对应的 {{ define "email.subject.optimized" }} 等模板需要在 optimized.tmpl 中定义)
templates:
  - '/etc/alertmanager/templates/optimized.tmpl'

receivers:
  - name: 'optimized-notifications'
    email_configs:
      - to: 'team@example.com'
        subject: '{{ template "email.subject.optimized" . }}'
        body: '{{ template "email.body.optimized" . }}'
    slack_configs:
      - channel: '#alerts'
        title: '{{ template "slack.title.optimized" . }}'
        text: '{{ template "slack.text.optimized" . }}'
        color: '{{ template "slack.color.optimized" . }}'
```
通知测试和验证
```bash
#!/bin/bash
# test-notifications.sh

ALERTMANAGER_URL="http://localhost:9093"

echo "=== 通知渠道测试 ==="

# 测试告警数据
test_alert='[
  {
    "labels": {
      "alertname": "TestNotification",
      "severity": "warning",
      "instance": "test-server:9100",
      "team": "test",
      "environment": "test"
    },
    "annotations": {
      "summary": "This is a test notification",
      "description": "Testing notification channels configuration"
    },
    "startsAt": "'$(date -u +%Y-%m-%dT%H:%M:%S.%3NZ)'"
  }
]'

echo "发送测试告警..."
# 注:较新版本的 Alertmanager 已移除 v1 API,这里使用 v2
if curl -XPOST "$ALERTMANAGER_URL/api/v2/alerts" \
    -H "Content-Type: application/json" \
    -d "$test_alert"; then
  echo "✅ 测试告警发送成功"
else
  echo "❌ 测试告警发送失败"
  exit 1
fi

echo -e "\n等待通知发送..."
sleep 30

echo -e "\n检查告警状态..."
curl -s "$ALERTMANAGER_URL/api/v2/alerts" | jq '.[] | select(.labels.alertname=="TestNotification")'

echo -e "\n=== 测试完成 ==="
echo "请检查各通知渠道是否收到测试消息"
```
本章小结
本章详细介绍了 Alertmanager 的各种通知渠道配置:
核心通知渠道
- 邮件通知:传统可靠的通知方式,支持 HTML 格式
- Slack 集成:现代团队协作的首选通知方式
- Webhook:灵活的自定义集成方案
- PagerDuty:专业的事件管理和值班系统
- 企业通信工具:Teams、Discord 等企业级集成
- 移动端通知:Telegram、Pushover 等移动友好方案
配置要点
- 模板化:使用模板系统统一消息格式
- 多渠道:根据严重程度选择合适的通知渠道
- 限流控制:避免通知风暴影响系统性能
- 内容优化:提供有用的上下文信息和操作链接
最佳实践
- 分层通知:不同严重程度使用不同通知策略
- 测试验证:定期测试通知渠道的可用性
- 监控优化:持续优化通知内容和频率
- 安全考虑:保护通知渠道的认证信息
下一步学习
在下一章中,我们将学习告警抑制与静默功能,包括:
- 抑制规则的设计和配置
- 静默管理和自动化
- 告警降噪策略
- 维护窗口管理

下一章: 告警抑制与静默