Home | 簡體中文 | 繁體中文 | 雜文 | 知乎專欄 | Github | OSChina 博客 | 雲社區 | 雲棲社區 | Facebook | Linkedin | 視頻教程 | 打賞(Donations) | About
知乎專欄多維度架構 微信號 netkiller-ebook | QQ群:128659835 請註明“讀者”

1.3. prometheus 配置

1.3.1. rules 規則配置

prometheus.yml 配置檔案

			
rule_files:
  - "rules/node.yml"	# 載入單個配置檔案
  - "rules/*.rules"		# 通過通配符載入檔案
			
			

prometheus 支持兩種 rules

  • recording rules
  • alerting rules

1.3.1.1. recording rules

			
groups:
- name: cpu-node
  rules:
  - record: job_instance_mode:node_cpu_seconds:avg_rate5m
    expr: avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))			
			
				

1.3.1.2. alerting rules

			
groups:
- name: example
  rules:

  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

  # Alert for any instance that has a median request latency >1s.
  - alert: APIHighRequestLatency
    expr: api_http_request_latencies_second{quantile="0.5"} > 1
    for: 10m
    annotations:
      summary: "High request latency on {{ $labels.instance }}"
      description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"