5. Setting up Prometheus Alertmanager alerting on Kubernetes (Part 3)


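The commented-out block at the bottom of the Prometheus configuration is the original alerting configuration. Change it to:
alerting:
  # alerting configuration
  alertmanagers:
  # changed: use a static configuration
  - static_configs:
    # changed: targets, i.e. the Alertmanager address and port
    - targets:
      - alertmanager:80
Specify the alert rule files:
# added: location of the alert rule files
rule_files:
- /etc/config/rules/*.rules
Mount the alert rule files in the StatefulSet manifest: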
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    k8s-app: prometheus
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v2.2.1
spec:
  serviceName: "prometheus"
  replicas: 1
  podManagementPolicy: "Parallel"
  updateStrategy:
    type: "RollingUpdate"
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: prometheus
      initContainers:
      - name: "init-chown-data"
        image: "busybox:latest"
        imagePullPolicy: "IfNotPresent"
        command: ["chown", "-R", "65534:65534", "/data"]
        volumeMounts:
        - name: prometheus-data
          mountPath: /data
          subPath: ""
      containers:
      - name: prometheus-server-configmap-reload
        image: "jimmidyson/configmap-reload:v0.1"
        imagePullPolicy: "IfNotPresent"
        args:
        - --volume-dir=/etc/config
        - --webhook-url=http://localhost:9090/-/reload
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config
          readOnly: true
        resources:
          limits:
            cpu: 10m
            memory: 10Mi
          requests:
            cpu: 10m
            memory: 10Mi
      - name: prometheus-server
        image: "prom/prometheus:v2.24.0"
        imagePullPolicy: "IfNotPresent"
        args:
        - --config.file=/etc/config/prometheus.yml
        - --storage.tsdb.path=/data
        - --web.console.libraries=/etc/prometheus/console_libraries
        - --web.console.templates=/etc/prometheus/consoles
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          initialDelaySeconds: 60
          timeoutSeconds: 30
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 60
          timeoutSeconds: 30
        # based on 10 running nodes with 30 pods each
        resources:
          limits:
            cpu: 1000m
            memory: 1000Mi
          requests:
            cpu: 200m
            memory: 1000Mi
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config
        - name: prometheus-data
          mountPath: /data
          subPath: ""
        # mount the alert rule files
        - name: prometheus-rules
          mountPath: /etc/config/rules
          subPath: ""
      terminationGracePeriodSeconds: 300
      volumes:
      - name: config-volume
        configMap:
          name: prometheus-config
      - name: prometheus-data
        nfs:
          server: xxxxxxxxxxxxx.cn-hangzhou.nas.aliyuncs.com
          path: /uat-pod-log/prometheus
      # mount the alert rules ConfigMap
      - name: prometheus-rules
        configMap:
          name: prometheus-rules
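The changed parts are the extra volume mount in the prometheus-server container:
        # mount the alert rule files
        - name: prometheus-rules
          mountPath: /etc/config/rules
          subPath: ""
and the matching volume:
      # mount the alert rules ConfigMap
      - name: prometheus-rules
        configMap:
          name: prometheus-rules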
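The StatefulSet refers to a ConfigMap named prometheus-rules, but its contents are not shown here. As a rough sketch only, it could look something like the following. The file key general.rules, the group name, and the InstanceDown alert are hypothetical placeholders, not the rules used in this setup. The key has to end in .rules so that it matches the /etc/config/rules/*.rules pattern in rule_files.

```yaml
# Hypothetical example: only the ConfigMap name and namespace come from the manifest above.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: kube-system
data:
  # Each key becomes a file under /etc/config/rules/
  general.rules: |
    groups:
    - name: general
      rules:
      # Fires when a scrape target has been unreachable for one minute
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```

Putting each group of rules under its own key keeps the files small, and every key that ends in .rules is picked up by the same glob.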
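After re-applying the updated configuration and the StatefulSet, open the web UI.
The alert rules now show up, and I happen to have one alert firing, which is a good chance to check that notifications are actually delivered.
The alert appears both in WeCom (企业微信) and in the web UI. Note that the displayed times are in UTC, so add 8 hours to get local time.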
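Silences
You can silence a particular alert so that it sends no notifications during a given time window.
A typical use case: say you have a large-scale change planned for tonight. You can silence alerts for that window. It does not matter much either way, though. You already know something is going to happen, so once things are back to normal, just check that the alerts have all resolved.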
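I ran into a few problems during the setup.
At first I used prom/prometheus:v2.30.0 for Prometheus.
Alertmanager was prom/alertmanager:v0.14.0.
Prometheus could not reach Alertmanager and got a 404 error.
The logs showed that the request path contained v2.
Calling that path from outside with curl did return 404; when I changed v2 to v1 in the curl request, it worked.
I then upgraded Alertmanager, but that caused other problems at startup. In the end I downgraded Prometheus to prom/prometheus:v2.24.0 and everything worked.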
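The 404 is consistent with Prometheus posting alerts to the Alertmanager v2 API, which an Alertmanager as old as v0.14.0 does not serve. Prometheus 2.x has an api_version field in its alertmanagers configuration, so pinning it to v1 may be an alternative to downgrading. The fragment below is a sketch of that idea, assuming the Prometheus release in use still accepts v1. It was not tried in this setup.

```yaml
alerting:
  alertmanagers:
  # Assumption: force the v1 API so that an old Alertmanager can accept the alerts
  - api_version: v1
    static_configs:
    - targets:
      - alertmanager:80
```

If this works, the Prometheus image could stay on the newer version while Alertmanager is upgraded separately.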