Adding Custom Monitoring with the Prometheus Operator
Adding custom monitoring
Steps:
- Step 1: Create a ServiceMonitor object, which tells Prometheus what to scrape. The ServiceMonitors that ship with kube-prometheus follow the same pattern, as shown below.
- Step 2: Associate the ServiceMonitor with a Service object that exposes the metrics endpoint.
- Step 3: Make sure the Service can actually reach the metrics data.
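To see that pattern, you can list the ServiceMonitors that already exist. The commands below assume the kube-prometheus stack is installed in the monitoring namespace, and node-exporter is just one example name that may differ in your installation.
kubectl get servicemonitors -n monitoring
kubectl get servicemonitor node-exporter -n monitoring -o yaml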
Example 1: custom monitoring for etcd
etcd certificate configuration
For security, an etcd cluster normally has HTTPS certificate authentication enabled, so for Prometheus to reach the etcd cluster's metrics it must present the corresponding certificates for verification.
Store the certificates used by etcd in the Kubernetes cluster as a Secret object:
kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key --from-file=/etc/kubernetes/pki/etcd/ca.crt
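The certificate paths above are the kubeadm defaults; adjust them if your etcd certificates live elsewhere. A quick check that the Secret was created with all three keys:
kubectl -n monitoring describe secret etcd-certs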
Reference the etcd-certs Secret created above in the Prometheus resource object.
Edit prometheus-prometheus.yaml and add the secrets field; the complete file looks like this:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorSelector: {}
  replicas: 2
  secrets:
  - etcd-certs
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0
kubectl apply -f prometheus-prometheus.yaml
You can exec into the container to confirm the certificates have been mounted:
kubectl exec -it prometheus-k8s-0 -n monitoring -- /bin/sh
ls /etc/prometheus/secrets/etcd-certs/
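You can also verify that the etcd metrics endpoint itself answers with these certificates. This is a sketch run from a control-plane node, assuming etcd listens on port 2379 on the node IPs used in the Endpoints object further below; add -k if the certificate SANs do not cover the node IP (the same reason insecureSkipVerify is used in the ServiceMonitor).
curl --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
  https://172.16.0.4:2379/metrics | head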
Create the ServiceMonitor
vi prometheus-serviceMonitorEtcd.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    k8s-app: etcd-k8s
spec:
  jobLabel: k8s-app
  endpoints:
  - port: port
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
      certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
      keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
      insecureSkipVerify: true
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
    - kube-system
kubectl apply -f prometheus-serviceMonitorEtcd.yaml
Above, we created a ServiceMonitor named etcd-k8s in the monitoring namespace. It matches Services in the kube-system namespace that carry the label k8s-app=etcd, and jobLabel names the Service label whose value is used as the job name. What differs from the earlier examples is the endpoints section, which now carries the certificates needed to access etcd. Many scrape parameters can be configured under endpoints, such as relabelings and proxyUrl; see the sketch below. tlsConfig sets up TLS for the scraped endpoints, and because the serverName may not match what was signed into the etcd certificates, insecureSkipVerify: true is added.
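For reference, here is a minimal sketch of what such extra scrape parameters can look like under endpoints; it is not needed for etcd, and the node label and the etcd_debugging_.* pattern are only illustrative:
spec:
  endpoints:
  - port: port
    interval: 30s
    # relabelings rewrite target labels before the scrape
    relabelings:
    - sourceLabels: [__meta_kubernetes_endpoint_node_name]
      targetLabel: node
    # metricRelabelings drop or rewrite series after the scrape
    metricRelabelings:
    - sourceLabels: [__name__]
      regex: etcd_debugging_.*
      action: drop
    # proxyUrl: http://... would route the scrape through an HTTP proxy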
Create the Service
The ServiceMonitor is created, but there is no matching Service object for it yet. Because the etcd members here are not selected through Pod labels, the Service is headless and its Endpoints are filled in manually with the etcd node IPs:
vi prometheus-etcdService.yaml
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: port
    port: 2379
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 172.16.0.4
    nodeName: etc-k8s01
  - ip: 172.16.0.5
    nodeName: etc-k8s02
  - ip: 172.16.0.6
    nodeName: etc-k8s03
  ports:
  - name: port
    port: 2379
    protocol: TCP
kubectl apply -f prometheus-etcdService.yaml
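After applying, confirm that the Endpoints object points at the etcd members, and optionally check the new target in the Prometheus UI; the port-forward assumes the kube-prometheus Service name prometheus-k8s:
kubectl get ep etcd-k8s -n kube-system
kubectl -n monitoring port-forward svc/prometheus-k8s 9090
# then open http://localhost:9090/targets and look for the etcd-k8s job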
Example 2: custom monitoring for nginx
Create the Nginx Deployment and Service
vi nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-demo
  labels:
    app: nginx-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-demo
  template:
    metadata:
      labels:
        app: nginx-demo
    spec:
      containers:
      - name: nginx-demo
        image: billy98/nginx-prometheus-metrics:latest
        ports:
        - name: http-metrics
          containerPort: 9527
        - name: web
          containerPort: 80
        - name: test
          containerPort: 1314
        imagePullPolicy: IfNotPresent
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx-demo
  name: nginx-demo
  namespace: default
spec:
  ports:
  - name: http-metrics
    port: 9527
    protocol: TCP
    targetPort: 9527
  - name: web
    port: 80
    protocol: TCP
    targetPort: 80
  - name: test
    port: 1314
    protocol: TCP
    targetPort: 1314
  selector:
    app: nginx-demo
  type: ClusterIP
kubectl apply -f nginx.yaml
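A quick check that the Deployment is running, and to find the Pod IP used in the curl tests further below:
kubectl get pods -l app=nginx-demo -o wide
kubectl get svc nginx-demo --show-labels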
Create the ServiceMonitor
vi nginx-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: nginx-demo
  name: nginx-demo
  namespace: monitoring
spec:
  endpoints:
  - interval: 15s
    port: http-metrics
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: nginx-demo
kubectl apply -f nginx-servicemonitor.yaml
Check the endpoints:
[root@k8s03 ~]# kubectl get ep
NAME ENDPOINTS AGE
kubernetes 172.16.0.4:6443,172.16.0.5:6443,172.16.0.6:6443 2d
nginx-demo 192.168.236.136:9527,192.168.236.136:80,192.168.236.136:1314 31m
[root@k8s03 ~]# curl 192.168.236.136
hello world
[root@k8s03 ~]# curl 192.168.236.136:9527/metrics
# HELP nginx_http_connections Number of HTTP connections
# TYPE nginx_http_connections gauge
nginx_http_connections{state="active"} 3
nginx_http_connections{state="reading"} 0
nginx_http_connections{state="waiting"} 2
nginx_http_connections{state="writing"} 1
# HELP nginx_http_request_bytes_sent Number of HTTP request bytes sent
# TYPE nginx_http_request_bytes_sent counter
nginx_http_request_bytes_sent{host="192.168.236.136"} 874567
nginx_http_request_bytes_sent{host="testservers"} 320
# HELP nginx_http_request_time HTTP request time
# TYPE nginx_http_request_time histogram
nginx_http_request_time_bucket{host="192.168.236.136",le="00.005"} 99
nginx_http_request_time_bucket{host="192.168.236.136",le="00.010"} 99
nginx_http_request_time_bucket{host="192.168.236.136",le="00.020"} 99
... ...
The newly added nginx-demo target now shows up on the Prometheus Targets page, and its metrics can be queried and graphed directly on the Graph page.
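A few example queries over the metrics shown above that can be pasted into the Graph page (the job label is omitted for simplicity):
nginx_http_connections{state="active"}
rate(nginx_http_request_bytes_sent[5m])
histogram_quantile(0.95, sum(rate(nginx_http_request_time_bucket[5m])) by (host, le))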
Adding custom alerting rules to Alertmanager
On the Prometheus dashboard's Config page you can see the configuration related to Alertmanager:
alerting:
  alert_relabel_configs:
  - separator: ;
    regex: prometheus_replica
    replacement: $1
    action: labeldrop
  alertmanagers:
  - kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
        - monitoring
    scheme: http
    path_prefix: /
    timeout: 10s
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: alertmanager-main
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: web
      replacement: $1
      action: keep
rule_files:
- /etc/prometheus/rules/prometheus-k8s-rulefiles-0/*.yaml
... ...
From the alertmanagers configuration above we can see that the Alertmanager instances are discovered through Kubernetes service discovery with role endpoints, keeping only targets whose Service name is alertmanager-main and whose port name is web.
Let's take a look at the alertmanager-main Service:
[root@k8s03 manifests]# kubectl describe svc alertmanager-main -n monitoring
Name: alertmanager-main
Namespace: monitoring
Labels: alertmanager=main
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"alertmanager":"main"},"name":"alertmanager-main","namespace":"...
Selector: alertmanager=main,app=alertmanager
Type: NodePort
IP: 10.100.176.121
Port: web 9093/TCP
TargetPort: web/TCP
NodePort: web 31568/TCP
Endpoints: 192.168.236.134:9093,192.168.73.69:9093,192.168.73.71:9093
Session Affinity: ClientIP
External Traffic Policy: Cluster
Events: <none>
As you can see, the Service name is indeed alertmanager-main and its port is named web, matching the relabel rules above, so Prometheus and Alertmanager are wired together correctly.
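You can also ask Prometheus which Alertmanager instances it has actually discovered through its HTTP API, for example via the port-forward used earlier:
curl -s http://localhost:9090/api/v1/alertmanagers
# should list the three alertmanager endpoints on port 9093 shown above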
The corresponding alerting rule files are all the YAML files under /etc/prometheus/rules/prometheus-k8s-rulefiles-0/. Whenever we create a PrometheusRule resource object, a corresponding YAML file is automatically generated in that directory.
The Prometheus resource object has a very important field, ruleSelector, a filter used to match rules; here it selects PrometheusRule objects carrying the labels prometheus=k8s and role=alert-rules.
So to define a custom alerting rule, all we need to do is create a PrometheusRule object with the prometheus=k8s and role=alert-rules labels:
vi prometheus-etcdRules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: etcd-rules
  namespace: monitoring
spec:
  groups:
  - name: etcd
    rules:
    - alert: EtcdClusterUnavailable
      annotations:
        summary: etcd cluster small
        description: If one more etcd peer goes down the cluster will be unavailable
      expr: |
        count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1)
      for: 3m
      labels:
        severity: critical
kubectl apply -f prometheus-etcdRules.yaml
kubectl exec -it prometheus-k8s-0 -n monitoring -- /bin/sh
ls /etc/prometheus/rules/prometheus-k8s-rulefiles-0/
cat /etc/prometheus/rules/prometheus-k8s-rulefiles-0/monitoring-etcd-rules.yaml
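You can also list the PrometheusRule objects directly; shortly after the rule file is reloaded, the new EtcdClusterUnavailable alert should appear on the Alerts page of the Prometheus UI.
kubectl get prometheusrules -n monitoring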
Title: Adding Custom Monitoring with the Prometheus Operator
Author: fish2018
URL: https://www.devopser.org/articles/2019/08/21/1566379625905.html