5. Setting up Prometheus Alertmanager alerting on Kubernetes (2)


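Compared with the original file, the data directory mount is changed to Alibaba Cloud NAS, and a mount for the template file is added.
Exposing the port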
-.yaml

apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Alertmanager"
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 9093
  selector:
    k8s-app: alertmanager
  type: "ClusterIP"
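You could instead change the Service type to NodePort here and then open the web page at a node IP plus the node port (Alertmanager itself listens on 9093).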
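As a rough sketch of the NodePort alternative mentioned above, the Service could look like the following. The `nodePort` value 30093 is an assumption chosen for illustration; it has to fall inside the cluster's NodePort range (30000-32767 by default), and the rest mirrors the ClusterIP Service shown earlier.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: kube-system
spec:
  type: NodePort
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 9093   # port Alertmanager listens on inside the pod
      nodePort: 30093    # assumed value; pick any free port in the NodePort range
  selector:
    k8s-app: alertmanager
```

With a Service like this, the web page would be reachable at a node IP on port 30093 rather than on 9093 directly.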
In my setup I use the proxy configuration that follows.
Configuring the proxy
-.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: alertmanager-ingress
  namespace: kube-system
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/affinity-mode: "persistent"
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
spec:
  rules:
  - host: uatalert.xxx.com
    http:
      paths:
      - backend:
          serviceName: alertmanager
          servicePort: 9093
        path: /
        pathType: ImplementationSpecific
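I did not write the SLB IP address into this rule, because it is assigned automatically.
After adding the hostname to your hosts file, opening it in a browser shows the Alertmanager page.
At this point the alerting component is installed. What remains is to configure the alert rules and connect Prometheus to Alertmanager.
Configuring alerts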
-rules.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: kube-system
data:
  # General rules
  general.rules: |
    groups:
    - name: general.rules
      rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: error
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
  # Node resource monitoring
  node.rules: |
    groups:
    - name: node.rules
      rules:
      - alert: NodeFilesystemUsage
        expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} : {{ $labels.mountpoint }} partition usage is too high"
          description: "{{ $labels.instance }}: {{ $labels.mountpoint }} partition usage is above 80% (current value: {{ $value }})"
      - alert: NodeMemoryUsage
        expr: 100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100 > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} memory usage is too high"
          description: "{{ $labels.instance }} memory usage is above 80% (current value: {{ $value }})"
      - alert: NodeCPUUsage
        expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 60
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} CPU usage is too high"
          description: "{{ $labels.instance }} CPU usage is above 60% (current value: {{ $value }})"
  # Pod monitoring
  pod.rules: |
    groups:
    - name: pod.rules
      rules:
      - alert: PodRestartNumber
        expr: floor(delta(kube_pod_container_status_restarts_total{namespace="uat"}[3m]) != 0)
        for: 1m
        labels:
          severity: warning
        annotations:
          #summary: "pod {{ $labels.pod }} restarted"
          description: "Pod: {{ $labels.pod }} restarted"
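Check whether these alert rules match your environment; most of them can be used as they are. The Pod rule fires when a restart occurs, and its expression is restricted to a specific namespace, so change that to your own namespace or remove the selector.
Mounting the alert configuration and connecting to Alertmanager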
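If you decide to drop the namespace restriction, the Pod rule might look roughly like this. It is only a sketch: it assumes the metric carries a `namespace` label (as kube-state-metrics normally provides), which is used in the description so alerts from different namespaces can be told apart.

```yaml
- alert: PodRestartNumber
  # no namespace selector, so restarts in every namespace are matched
  expr: floor(delta(kube_pod_container_status_restarts_total[3m]) != 0)
  for: 1m
  labels:
    severity: warning
  annotations:
    # assumes the metric has a "namespace" label
    description: "Pod: {{ $labels.namespace }}/{{ $labels.pod }} restarted"
```

This entry would replace the existing one under `rules:` in the `pod.rules` group.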
-.yaml
# Prometheus configuration format https://prometheus.io/docs/prometheus/latest/configuration/configuration/
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  prometheus.yml: |
    # Scrape targets
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets:
        - localhost:9090
    - job_name: kubernetes-nodes
      static_configs:
      - targets:
        # Scrape the nodes themselves
        - 172.16.60.9:9100
        - 172.16.60.10:9100
    - job_name: kubernetes-apiservers
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    - job_name: kubernetes-nodes-kubelet
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    - job_name: kubernetes-nodes-cadvisor
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __metrics_path__
        replacement: /metrics/cadvisor
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    - job_name: kubernetes-service-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name
    - job_name: kubernetes-services
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module:
        - http_2xx
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_probe
      - source_labels:
        - __address__
        target_label: __param_target
      - replacement: blackbox
        target_label: __address__
      - source_labels:
        - __param_target
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
    alerting:
      # Alertmanager configuration
      alertmanagers:
      # Changed: use a static binding
      - static_configs:
        # Changed: targets, specifying the address and port
        - targets:
          - alertmanager:80
    # Added: the alert rule files to load
    rule_files:
    - /etc/config/rules/*.rules
    #alerting:
    #  alertmanagers:
    #  - kubernetes_sd_configs:
    #      - role: pod
    #    tls_config:
    #      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    #    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    #    relabel_configs:
    #    - source_labels: [__meta_kubernetes_namespace]
    #      regex: kube-system
    #      action: keep
    #    - source_labels: [__meta_kubernetes_pod_label_k8s_app]
    #      regex: alertmanager
    #      action: keep
    #    - source_labels: [__meta_kubernetes_pod_container_port_number]
    #      regex:
    #      action: drop
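For `rule_files` to find anything, the `prometheus-rules` ConfigMap has to be mounted at `/etc/config/rules` inside the Prometheus pod. The fragment below is a minimal sketch of what that could look like in the pod spec; the container name and volume name are assumptions, and only the mount path and ConfigMap name come from the manifests above.

```yaml
# Fragment of the Prometheus pod spec, not a complete manifest
containers:
  - name: prometheus-server          # assumed container name
    volumeMounts:
      - name: prometheus-rules       # assumed volume name
        mountPath: /etc/config/rules # directory matched by rule_files
volumes:
  - name: prometheus-rules
    configMap:
      name: prometheus-rules         # the ConfigMap holding *.rules keys
```

Each key of the ConfigMap (`general.rules`, `node.rules`, `pod.rules`) appears as a file in that directory, so all three match the `*.rules` pattern.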