Adding prometheus-operator alert rules in Rancher

2021-01-04

First, list the PrometheusRule custom resources. You must specify the namespace that prometheus-operator runs in.

kubectl get PrometheusRule -n cattle-prometheus

Export the relevant resource to a YAML file:

kubectl get PrometheusRule c-55xkf -n cattle-prometheus -o yaml > rules.yml

Use the exported file as a template: edit the rules in it, then apply it to the cluster again.

kubectl apply -f rules.yml
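
The edit step can also be scripted. The following is a minimal sketch, not part of the original workflow. It assumes the export follows kubectl's usual YAML layout, with metadata fields at two-space indentation. It renames the object and drops fields that the API server fills in (resourceVersion, uid, creationTimestamp, generation, selfLink), since a newly created object should not carry them. The helper name prepare_rule_template, the output file my-rules.yml, and the sample values in the stand-in file are made up for illustration.

```shell
#!/bin/sh
# Sketch: turn an exported PrometheusRule into a template for a new object.
# prepare_rule_template is a hypothetical helper, not a kubectl feature.
prepare_rule_template() {
  src=$1; dst=$2; new_name=$3
  # Metadata fields sit at two-space indentation in kubectl's YAML output.
  # Group names start with "  - name:" and are therefore left alone.
  sed -e '/^  resourceVersion:/d' \
      -e '/^  uid:/d' \
      -e '/^  creationTimestamp:/d' \
      -e '/^  generation:/d' \
      -e '/^  selfLink:/d' \
      -e "s/^  name: .*/  name: ${new_name}/" \
      "$src" > "$dst"
}

# Stand-in for an exported file (values invented) so the sketch runs without a cluster.
workdir=$(mktemp -d)
cat > "$workdir/rules.yml" <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: "2020-12-22T07:00:00Z"
  generation: 3
  name: c-55xkf
  namespace: cattle-prometheus
  resourceVersion: "1234567"
  uid: 00000000-0000-0000-0000-000000000000
spec:
  groups:
  - name: c-55xkf:event-alert
    rules:
    - alert: container restart
      expr: delta(kube_pod_container_status_restarts_total[20m])>0
EOF

prepare_rule_template "$workdir/rules.yml" "$workdir/my-rules.yml" mycrd
cat "$workdir/my-rules.yml"
```

With a real export, the call would be prepare_rule_template rules.yml my-rules.yml mycrd, followed by kubectl apply -f my-rules.yml once the rules have been edited by hand.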

Example

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  annotations:
  labels:
    cattle.io/creator: norman
    source: rancher-alert
  name: mycrd
  namespace: cattle-prometheus
spec:
  groups:
  - name: c-55xkf:event-alert
    rules:
    - alert: container restart
      annotations:
        current_value: 'The container {{ $labels.container }} in pod {{ $labels.pod }} has restarted at least {{ humanize $value }} times in the last 20 minutes on instance {{ $labels.instance }}.'
      expr: delta(kube_pod_container_status_restarts_total[20m])>0
      for: 10s
      labels:
        alert_name: container restart
        alert_type: metric
        cluster_name: 'test-cluster (ID: c-55xkf)'
        comparison: greater than
        duration: 10s
        expression: delta(kube_pod_container_status_restarts_total[20m])>0
        group_id: c-55xkf:event-alert
        rule_id: c-55xkf:event-alert_car-kqqpn
        severity: critical
        threshold_value: "0"
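
When editing a file like this by hand, it is easy to leave a group with no alert entries. The sketch below is a rough check under stated assumptions: it scans lines with awk rather than parsing YAML, treats every "- name:" line as the start of a group, and counts only "- alert:" entries (recording rules are ignored). The helper name groups_without_alerts and the second group in the stand-in file are invented for illustration.

```shell
#!/bin/sh
# Sketch: list groups in a PrometheusRule file that contain no "- alert:" entry.
# groups_without_alerts is a hypothetical helper; it scans lines, it does not parse YAML.
groups_without_alerts() {
  awk '
    /^[ \t]*- name:/ {
      # A new group starts: report the previous one if it had no alerts.
      if (group != "" && alerts == 0) print group
      group = $0
      sub(/^[ \t]*- name:[ \t]*/, "", group)
      alerts = 0
      next
    }
    /^[ \t]*- alert:/ { alerts++ }
    END { if (group != "" && alerts == 0) print group }
  ' "$1"
}

# Stand-in file with one populated group and one empty group (names invented).
workdir=$(mktemp -d)
cat > "$workdir/rules.yml" <<'EOF'
spec:
  groups:
  - name: c-55xkf:event-alert
    rules:
    - alert: container restart
      expr: delta(kube_pod_container_status_restarts_total[20m])>0
  - name: c-55xkf:empty-group
    rules: []
EOF

groups_without_alerts "$workdir/rules.yml"
```

Any group name it prints is worth a second look before running kubectl apply.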

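Notes:

  1. Change the name field. Prometheus will then generate a corresponding new configuration file in its rules directory.
  2. Before each update, fetch the current resourceVersion from the resource's YAML.
  3. If you create a new alert group, keep at least one alert rule under it. Otherwise, alert rules created with that group_id will not take effect.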
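
For note 2, one way to see whether a local file is still based on the latest version of the object is to compare its resourceVersion with the one the cluster currently reports. This is only a sketch: the three function names are hypothetical, the sed pattern assumes the two-space metadata indentation of kubectl's YAML output, and nothing here runs against the cluster until check_resource_version is called.

```shell
#!/bin/sh
# Sketch: compare the resourceVersion in a local file with the live object.
# All three helpers are hypothetical wrappers, not kubectl subcommands.

# Print the resourceVersion the API server currently holds for a PrometheusRule.
current_resource_version() {
  kubectl get PrometheusRule "$1" -n "$2" \
    -o jsonpath='{.metadata.resourceVersion}'
}

# Print the resourceVersion recorded in an exported YAML file (quotes stripped).
file_resource_version() {
  sed -n 's/^  resourceVersion: *"\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$1" | head -n 1
}

# Return 0 if the file matches the cluster, 1 if it is stale, 2 if kubectl failed.
check_resource_version() {
  file=$1; name=$2; ns=$3
  in_file=$(file_resource_version "$file")
  live=$(current_resource_version "$name" "$ns") || return 2
  if [ "$in_file" = "$live" ]; then
    echo "up to date: $live"
  else
    echo "stale: file has '$in_file', cluster has '$live'"
    return 1
  fi
}
```

A call such as check_resource_version rules.yml c-55xkf cattle-prometheus would then report whether the export needs to be refreshed before it is edited and applied.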

Title: Adding prometheus-operator alert rules in Rancher
Author: fish2018
URL: http://www.devopser.org/articles/2020/12/22/1608622948424.html