Prometheus + Spring Boot Application Monitoring

Posted on 2021-03-06 by WalkonNet

1. What Is Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit with an active ecosystem. In short, it is an open-source monitoring solution.

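Key features of Prometheus:

A multi-dimensional data model, with time series data identified by metric name and key/value pairs
PromQL, a flexible query language
No reliance on distributed storage; single server nodes are autonomous
Time series collection happens via a pull model over HTTP
Pushing time series is supported via an intermediary gateway
Targets are discovered via service discovery or static configuration
Multiple modes of graphing and dashboarding support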

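Why pull instead of push?

Because pulling has the following advantages:

You can run monitoring on your laptop while developing changes
You can more easily tell whether a target is down
You can manually go to a target and inspect its health with a web browser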

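Each target exposes an HTTP endpoint, and the Prometheus server actively pulls data from it over HTTP. Because the server initiates the pull, it can just as well run locally (on our own machine), as long as it can reach the target's endpoint. A failed pull works like a heartbeat check and reveals that the target is down. And since the server decides whom to pull from, switching scrape targets is trivial.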
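To make the pull model concrete, here is a minimal sketch using only the JDK. It starts a toy target that exposes /metrics and then performs one scrape from the "server" side, recording 1 when the target answers HTTP 200 and 0 otherwise, similar in spirit to the up series Prometheus records for every target. The class PullSketch, its methods and the metric demo_requests_total are hypothetical names for illustration; this is not how Prometheus itself is implemented.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of Prometheus.
class PullSketch {

    // Starts a toy target on a free local port that serves one metric at /metrics.
    static HttpServer startDemoTarget() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/metrics", exchange -> {
                byte[] body = "demo_requests_total 42\n".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One scrape initiated by the monitoring side: 1 if the target answered HTTP 200, else 0.
    static int scrape(String metricsUrl) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(metricsUrl).openConnection();
            conn.setConnectTimeout(1000);
            conn.setReadTimeout(1000);
            int status = conn.getResponseCode();
            conn.disconnect();
            return status == 200 ? 1 : 0;
        } catch (IOException e) {
            // Connection refused, timeout, etc.: the target counts as down.
            return 0;
        }
    }
}
```

Stopping the toy server and scraping the same URL again yields 0, which is exactly the "easier to tell whether a target is down" advantage listed above.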

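Compare this with SkyWalking. SkyWalking has a client side and a server side: an agent (probe) is installed in the target service, collects that service's metrics and reports them to the OAP server. This is somewhat intrusive to the target, though acceptable. Prometheus needs no agent inside the target, which only has to expose an HTTP endpoint (directly or through an exporter); where pushing is really needed, the Pushgateway provides it.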

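One term to settle up front: metrics. In Chinese it is rendered either as 度量 (measure) or as 指標 (indicator); I prefer 指標, and throughout this article "metric" refers to it.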

2. Basic Concepts

2.1. Data Model

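Prometheus fundamentally stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions. Besides stored time series, Prometheus may generate temporary derived time series as the result of queries.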

Metric names and labels

Every time series is uniquely identified by its metric name and optional key-value pairs called labels.

Samples form the actual time series data. Each sample consists of:

a float64 value
a millisecond-precision timestamp

Given a metric name and a set of labels, time series are frequently identified using this notation:

<metric name>{<label name>=<label value>, ...}

For example, a time series with the metric name api_http_requests_total and the labels method="POST" and handler="/messages" could be written like this:

api_http_requests_total{method="POST", handler="/messages"}
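The notation works as a key: the same metric name plus the same label set always names the same series. Below is a minimal sketch that renders this notation; SeriesId is a hypothetical class name, and sorting labels by name is a choice made here so that input order does not matter (Prometheus likewise treats a series' labels as a set, independent of order).

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical helper class for illustration only.
class SeriesId {

    // Renders <metric name>{<label name>="<label value>", ...}, sorting labels by name so that
    // the same name and label set always produce the same identifier, whatever the input order.
    static String of(String metricName, Map<String, String> labels) {
        if (labels.isEmpty()) {
            return metricName;  // a series without labels is identified by its name alone
        }
        StringJoiner joiner = new StringJoiner(", ", "{", "}");
        for (Map.Entry<String, String> e : new TreeMap<>(labels).entrySet()) {
            // Escape backslash, double quote and line feed, as the text exposition format does.
            String value = e.getValue()
                    .replace("\\", "\\\\")
                    .replace("\"", "\\\"")
                    .replace("\n", "\\n");
            joiner.add(e.getKey() + "=\"" + value + "\"");
        }
        return metricName + joiner;
    }
}
```

With method=POST and handler=/messages, SeriesId.of("api_http_requests_total", labels) yields api_http_requests_total{handler="/messages", method="POST"} no matter which order the map iterates in.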

2.2. Metric Types

Counter

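A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.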

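Do not use a counter to expose a value that can decrease. For example, do not use a counter for the number of currently running processes; use a gauge instead.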
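The counter rules above fit in a few lines. In this sketch (CounterSketch is a hypothetical name, not a Prometheus or Micrometer API) only non-negative increments are accepted; the only way back to zero is a fresh instance, which models a process restart. PromQL functions such as rate() interpret such a drop as a counter reset rather than as a decrease.

```java
import java.util.concurrent.atomic.DoubleAdder;

// Hypothetical counter for illustration only; real client libraries provide their own Counter types.
class CounterSketch {
    private final DoubleAdder value = new DoubleAdder(); // starts at 0, like a counter after a restart

    void inc() {
        inc(1.0);
    }

    void inc(double amount) {
        if (amount < 0) {
            // A counter may only go up; a value that can decrease belongs in a gauge.
            throw new IllegalArgumentException("counter cannot decrease by " + amount);
        }
        value.add(amount);
    }

    double get() {
        return value.sum();
    }
}
```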

Gauge

A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.

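Gauges are typically used for measured values like temperatures or current memory usage, but also for "counts" that can go up and down, like the number of concurrent requests.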
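A gauge for concurrent requests has to go up when a request starts and back down when it ends, including when the request fails. A minimal sketch, assuming a hypothetical InFlightGauge class (not a library API):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical gauge for illustration only.
class InFlightGauge {
    private final AtomicLong current = new AtomicLong();

    // Runs the work while counting it as in flight; the finally block lowers the gauge
    // again even if the work throws.
    <T> T track(Supplier<T> work) {
        current.incrementAndGet();
        try {
            return work.get();
        } finally {
            current.decrementAndGet();
        }
    }

    long get() {
        return current.get();
    }
}
```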

Histogram

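A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.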

A histogram with a base metric name of <basename> exposes multiple time series during a scrape:

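cumulative counters for the observation buckets, exposed as <basename>_bucket{le="<upper inclusive bound>"}
the total sum of all observed values, exposed as <basename>_sum
the count of events that have been observed, exposed as <basename>_count (identical to <basename>_bucket{le="+Inf"})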
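To show how these three kinds of series relate, here is a sketch of a histogram (HistogramSketch is a hypothetical class, and the bucket bounds in the example are made up). It keeps a count per bucket and renders cumulative values when exposed, so each bucket also includes everything below it, and the le="+Inf" bucket always equals _count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical histogram for illustration only; upper bounds must be given in ascending order.
class HistogramSketch {
    private final String name;
    private final double[] upperBounds;  // e.g. {0.25, 0.5, 1.0}; the +Inf bucket is implicit
    private final long[] perBucket;      // non-cumulative counts; the last slot is +Inf
    private double sum;
    private long count;

    HistogramSketch(String name, double... upperBounds) {
        this.name = name;
        this.upperBounds = upperBounds.clone();
        this.perBucket = new long[upperBounds.length + 1];
    }

    synchronized void observe(double v) {
        int i = 0;
        while (i < upperBounds.length && v > upperBounds[i]) {
            i++;  // bounds are inclusive: v goes to the first bucket with v <= le
        }
        perBucket[i]++;
        sum += v;
        count++;
    }

    // Exposed series: cumulative buckets, then _sum and _count.
    synchronized List<String> expose() {
        List<String> lines = new ArrayList<>();
        long cumulative = 0;
        for (int i = 0; i <= upperBounds.length; i++) {
            cumulative += perBucket[i];
            String le = i < upperBounds.length ? String.valueOf(upperBounds[i]) : "+Inf";
            lines.add(name + "_bucket{le=\"" + le + "\"} " + cumulative);
        }
        lines.add(name + "_sum " + sum);
        lines.add(name + "_count " + count);
        return lines;
    }
}
```

For example, with bounds 0.25, 0.5 and 1.0 and observations 0.125, 0.5, 0.5 and 2.0, the exposed bucket values are 1, 3, 3 and 4 (for +Inf), _sum is 3.125 and _count is 4. The two 0.5 observations land in le="0.5" because upper bounds are inclusive.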

Summary

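Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.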

A summary with a base metric name of <basename> exposes multiple time series during a scrape:

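streaming φ-quantiles (0 ≤ φ ≤ 1) of observed events, exposed as <basename>{quantile="<φ>"}
the total sum of all observed values, exposed as <basename>_sum
the count of events that have been observed, exposed as <basename>_count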
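The key difference from a histogram is that a summary computes its quantiles on the client, over a sliding window. The sketch below (SlidingSummary is a hypothetical class) keeps every observation that is still inside the window and computes exact nearest-rank quantiles. That is a deliberate simplification: real client libraries use streaming approximations instead of storing every value. _sum and _count keep accumulating across all observations, like counters.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified summary for illustration only.
class SlidingSummary {
    private static final class Obs {
        final long atMillis;
        final double value;

        Obs(long atMillis, double value) {
            this.atMillis = atMillis;
            this.value = value;
        }
    }

    private final long windowMillis;
    private final Deque<Obs> window = new ArrayDeque<>();  // oldest observation first
    private double sum;   // over all observations, not only the window
    private long count;

    SlidingSummary(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    synchronized void observe(double value, long nowMillis) {
        window.addLast(new Obs(nowMillis, value));
        sum += value;
        count++;
    }

    // Nearest-rank phi-quantile of the values still in the window; NaN if the window is empty.
    synchronized double quantile(double phi, long nowMillis) {
        while (!window.isEmpty() && window.peekFirst().atMillis <= nowMillis - windowMillis) {
            window.removeFirst();  // drop observations that have slid out of the window
        }
        if (window.isEmpty()) {
            return Double.NaN;
        }
        double[] sorted = window.stream().mapToDouble(o -> o.value).sorted().toArray();
        int rank = (int) Math.ceil(phi * sorted.length);  // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    synchronized double sum() {
        return sum;
    }

    synchronized long count() {
        return count;
    }
}
```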

2.3. Jobs and Instances

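In Prometheus terms, an endpoint you can scrape is called an instance, usually corresponding to a single process. A collection of instances with the same purpose is called a job.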

For example, a job with four instances:

job: api-server
instance 1: 1.2.3.4:5670
instance 2: 1.2.3.4:5671
instance 3: 5.6.7.8:5670
instance 4: 5.6.7.8:5671

When Prometheus scrapes a target, it automatically attaches some labels to the scraped time series to identify the scraped target:

job: the configured job name that the target belongs to
instance: the <host>:<port> part of the target's URL that was scraped
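A sketch of how these two labels come out for the api-server example above. TargetLabels is a hypothetical class, and it assumes the target URL carries an explicit port, as every target in this article does.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for illustration only.
class TargetLabels {

    // The job and instance labels attached to every series scraped from targetUrl.
    static Map<String, String> forTarget(String jobName, String targetUrl) {
        URI uri = URI.create(targetUrl);
        if (uri.getHost() == null || uri.getPort() == -1) {
            // Assumption of this sketch: the URL names an explicit host and port.
            throw new IllegalArgumentException("expected an explicit host:port in " + targetUrl);
        }
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("job", jobName);
        labels.put("instance", uri.getHost() + ":" + uri.getPort());  // the <host>:<port> part
        return labels;
    }
}
```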

3. Installation and Configuration

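Prometheus collects metrics from targets by scraping their metrics HTTP endpoints. Since Prometheus exposes data about itself in the same way, it can also scrape and monitor its own health.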

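The prometheus.yml shipped with the release already scrapes Prometheus itself, so you can start it without changing the configuration and it will collect its own health data: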

# Start Prometheus.
# By default, Prometheus stores its database in ./data (flag --storage.tsdb.path)

./prometheus --config.file=prometheus.yml

Then open localhost:9090 in a browser.

Visit localhost:9090/metrics to see all the metrics Prometheus exposes about itself.

For example, enter the following expression and click "Execute":

prometheus_target_interval_length_seconds

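This should return a number of different time series (along with the latest value recorded for each), all with the metric name prometheus_target_interval_length_seconds but with different labels.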

The expression browser shows these metrics graphically; the same values are also available as plain text at localhost:9090/metrics.

If we are only interested in the 99th percentile latencies, we can use this query:

prometheus_target_interval_length_seconds{quantile="0.99"}

To count the number of returned time series, the query would be:

count(prometheus_target_interval_length_seconds)
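The same expressions can also be sent to Prometheus's HTTP API: an instant query is a GET request to /api/v1/query with the URL-encoded expression in the query parameter, and the response is JSON. A small sketch that builds such a URL; PromQueryUrl is a hypothetical helper, and the base URL assumes the local setup above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class for illustration only.
class PromQueryUrl {

    // GET URL for an instant query; the PromQL expression is URL-encoded
    // (parentheses, braces, quotes and '=' become %XX escapes).
    static String instantQuery(String prometheusBaseUrl, String promql) {
        try {
            return prometheusBaseUrl + "/api/v1/query?query=" + URLEncoder.encode(promql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Opening the resulting URL in a browser, for example for count(prometheus_target_interval_length_seconds), returns the same answer as the console, as JSON.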

Next, let's use the Node Exporter to add a few more targets:

tar -xzvf node_exporter-*.*.tar.gz
cd node_exporter-*.*

# Start 3 example targets in separate terminals:
./node_exporter --web.listen-address 127.0.0.1:8080
./node_exporter --web.listen-address 127.0.0.1:8081
./node_exporter --web.listen-address 127.0.0.1:8082

Next, configure Prometheus to scrape these three new targets.

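First, define a job named 'node' that scrapes all three endpoints. Imagine that the first two endpoints are production targets, while the third is a non-production (canary) instance. To tell them apart, we give them different labels: group="production" on the first target group and group="canary" on the second.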

scrape_configs:
  - job_name: 'node'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'production'

      - targets: ['localhost:8082']
        labels:
          group: 'canary'
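Every series scraped from these targets carries the group's static labels plus job and instance, so queries can select by group: up{group="canary"} returns only the localhost:8082 target. A sketch of both steps with a hypothetical class StaticConfigLabels; matches implements only equality matchers and, as in PromQL, treats a missing label as the empty string.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class for illustration only.
class StaticConfigLabels {

    // Label set of a series scraped from one static target: the group's labels plus job and instance.
    static Map<String, String> labelsFor(String job, String target, Map<String, String> groupLabels) {
        Map<String, String> labels = new TreeMap<>(groupLabels);  // e.g. group=production
        labels.put("job", job);
        labels.put("instance", target);  // targets are already written as <host>:<port>
        return labels;
    }

    // Equality matchers only, e.g. {group="canary"}; a missing label counts as the empty string.
    static boolean matches(Map<String, String> seriesLabels, Map<String, String> selector) {
        for (Map.Entry<String, String> m : selector.entrySet()) {
            if (!m.getValue().equals(seriesLabels.getOrDefault(m.getKey(), ""))) {
                return false;
            }
        }
        return true;
    }
}
```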

3.1. Configuration

To view all available command-line flags, run:

./prometheus -h

The configuration file is written in YAML and is specified with the --config.file flag.

The main structure of the configuration file is as follows:

global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ]

  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]

  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]

  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    [ <labelname>: <labelvalue> ... ]

  # File to which PromQL queries are logged.
  # Reloading the configuration will reopen the file.
  [ query_log_file: <string> ]

# Rule files specifies a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
  [ - <filepath_glob> ... ]

# A list of scrape configurations.
scrape_configs:
  [ - <scrape_config> ... ]

# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

# Settings related to the remote write feature.
remote_write:
  [ - <remote_write> ... ]

# Settings related to the remote read feature.
remote_read:
  [ - <remote_read> ... ]
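The <duration> values in this file are strings such as 5s, 1m or 1h30m. The sketch below (PromDuration is a hypothetical class) converts them to milliseconds and checks one rule Prometheus enforces when loading a configuration: an explicitly set scrape_timeout must not be greater than the corresponding scrape_interval. It is simplified: it accepts the units ms, s, m, h, d, w and y (with d = 24h, w = 7d, y = 365d) but does not enforce the descending unit order that Prometheus requires.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only; Prometheus parses durations itself.
class PromDuration {
    // "ms" comes before "m" in the alternation so that "500ms" matches as milliseconds.
    private static final Pattern PART = Pattern.compile("(\\d+)(ms|y|w|d|h|m|s)");

    static long toMillis(String text) {
        if (text.equals("0")) {
            return 0L;
        }
        if (text.isEmpty()) {
            throw new IllegalArgumentException("empty duration");
        }
        Matcher m = PART.matcher(text);
        long total = 0L;
        int pos = 0;
        while (pos < text.length()) {
            // Each part must start exactly where the previous one ended.
            if (!m.find(pos) || m.start() != pos) {
                throw new IllegalArgumentException("not a valid duration: " + text);
            }
            total += Long.parseLong(m.group(1)) * unitMillis(m.group(2));
            pos = m.end();
        }
        return total;
    }

    private static long unitMillis(String unit) {
        switch (unit) {
            case "ms": return 1L;
            case "s":  return 1_000L;
            case "m":  return 60_000L;
            case "h":  return 3_600_000L;
            case "d":  return 86_400_000L;
            case "w":  return 7L * 86_400_000L;
            case "y":  return 365L * 86_400_000L;
            default:   throw new IllegalArgumentException("unknown unit: " + unit);
        }
    }

    // Prometheus rejects a config whose explicitly set scrape_timeout exceeds its scrape_interval.
    static boolean timeoutFitsInterval(String scrapeInterval, String scrapeTimeout) {
        return toMillis(scrapeTimeout) <= toMillis(scrapeInterval);
    }
}
```

With the defaults listed above, timeoutFitsInterval("1m", "10s") is true.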

4. Scraping a Spring Boot Application

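Prometheus expects to scrape or poll individual application instances for metrics. Spring Boot provides an actuator endpoint at /actuator/prometheus that presents a Prometheus scrape in the appropriate format.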

To expose metrics in a format that the Prometheus server can scrape, add the micrometer-registry-prometheus dependency:

<dependency>
 <groupId>io.micrometer</groupId>
 <artifactId>micrometer-registry-prometheus</artifactId>
 <version>1.6.4</version>
</dependency>

Here is an example prometheus.yml:

scrape_configs:
  - job_name: 'spring'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['HOST:PORT']

Next, create a project named prometheus-example.

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
 <modelVersion>4.0.0</modelVersion>
 <parent>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-parent</artifactId>
  <version>2.4.3</version>
  <relativePath/> <!-- lookup parent from repository -->
 </parent>
 <groupId>com.cjs.example</groupId>
 <artifactId>prometheus-example</artifactId>
 <version>0.0.1-SNAPSHOT</version>
 <name>prometheus-example</name>
 <description>Demo project for Spring Boot</description>
 <properties>
  <java.version>1.8</java.version>
 </properties>
 <dependencies>
  <dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-actuator</artifactId>
  </dependency>
  <dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-web</artifactId>
  </dependency>

  <dependency>
   <groupId>io.micrometer</groupId>
   <artifactId>micrometer-registry-prometheus</artifactId>
   <scope>runtime</scope>
  </dependency>
 </dependencies>

 <build>
  <plugins>
   <plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
   </plugin>
  </plugins>
 </build>

</project>

application.yml

spring:
  application:
    name: prometheus-example
management:
  endpoints:
    web:
      exposure:
        include: "*"
  metrics:
    tags:
      application: ${spring.application.name}

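Don't forget this line: management.metrics.tags.application=${spring.application.name}. It adds an application tag carrying the application's name to every metric, which lets dashboards tell applications apart.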

Spring Boot Actuator provides many endpoints out of the box; for details, see

https://docs.spring.io/spring-boot/docs/2.4.3/reference/html/production-ready-features.html

Start the project and open the /actuator/prometheus endpoint in a browser.
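The endpoint returns plain text in the Prometheus exposition format: # HELP and # TYPE comment lines, then one sample per line. A sketch that parses a single sample line; ParsedSample is a hypothetical class. It assumes label values contain no commas, braces or escaped quotes, and it tolerates the trailing comma inside the braces that some client library versions emit. The sample lines used to exercise it are illustrative, not copied from a real response.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for illustration only; assumes label values contain no commas, braces or escaped quotes.
final class ParsedSample {
    final String name;
    final Map<String, String> labels;
    final double value;

    private ParsedSample(String name, Map<String, String> labels, double value) {
        this.name = name;
        this.labels = Collections.unmodifiableMap(labels);
        this.value = value;
    }

    // Returns null for blank lines and for comment lines such as "# HELP ..." and "# TYPE ...".
    static ParsedSample parse(String line) {
        String s = line.trim();
        if (s.isEmpty() || s.startsWith("#")) {
            return null;
        }
        Map<String, String> labels = new LinkedHashMap<>();
        int brace = s.indexOf('{');
        int space = s.indexOf(' ');
        String name;
        String rest;
        if (brace >= 0 && (space < 0 || brace < space)) {
            name = s.substring(0, brace);
            int close = s.indexOf('}', brace);
            for (String pair : s.substring(brace + 1, close).split(",")) {
                if (pair.trim().isEmpty()) {
                    continue;  // tolerates a trailing comma such as {application="x",}
                }
                int eq = pair.indexOf('=');
                String quoted = pair.substring(eq + 1).trim();
                labels.put(pair.substring(0, eq).trim(), quoted.substring(1, quoted.length() - 1));
            }
            rest = s.substring(close + 1).trim();
        } else {
            name = s.substring(0, space);
            rest = s.substring(space + 1).trim();
        }
        // rest is "<value>" or "<value> <timestamp>"; only the value is kept.
        return new ParsedSample(name, labels, parseValue(rest.split("\\s+")[0]));
    }

    private static double parseValue(String token) {
        switch (token) {
            case "+Inf": return Double.POSITIVE_INFINITY;
            case "-Inf": return Double.NEGATIVE_INFINITY;
            default:     return Double.parseDouble(token);  // also accepts "NaN"
        }
    }
}
```

Parsing jvm_threads_live_threads{application="prometheus-example",} 23.0, for instance, yields the name jvm_threads_live_threads, the label application=prometheus-example added by the common tag, and the value 23.0.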

Configure Prometheus to scrape the application:

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'springboot-prometheus'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['192.168.100.93:8080']

Restart Prometheus:

./prometheus --config.file=prometheus.yml

4.1. Grafana

https://grafana.com/docs/

https://grafana.com/tutorials/

Download and extract:

wget https://dl.grafana.com/oss/release/grafana-7.4.3.linux-amd64.tar.gz
tar -zxvf grafana-7.4.3.linux-amd64.tar.gz

Start it:

./bin/grafana-server web

Open http://localhost:3000 in a browser.

The default account is admin/admin.

After the first login, we changed the password to admin1234.

First, configure a data source; you will need to select it when adding a dashboard later.
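The data source can also be created through Grafana's HTTP API (POST /api/datasources) instead of the UI. A minimal sketch, assuming Grafana at http://localhost:3000 with basic authentication for the admin user and Prometheus at http://localhost:9090; the class GrafanaDataSource and the data source name Prometheus are hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class for illustration only.
class GrafanaDataSource {

    // HTTP basic authentication header for a Grafana user.
    static String basicAuthHeader(String user, String password) {
        byte[] credentials = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(credentials);
    }

    // JSON body for a Prometheus data source; assumes name and URL need no JSON escaping.
    // "access":"proxy" means the Grafana server, not the browser, queries Prometheus.
    static String requestBody(String name, String prometheusUrl) {
        return "{\"name\":\"" + name + "\",\"type\":\"prometheus\",\"url\":\"" + prometheusUrl
                + "\",\"access\":\"proxy\"}";
    }

    // Sends POST /api/datasources and returns the HTTP status code.
    static int create(String grafanaBaseUrl, String user, String password,
                      String name, String prometheusUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(grafanaBaseUrl + "/api/datasources").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
        try (OutputStream out = conn.getOutputStream()) {
            out.write(requestBody(name, prometheusUrl).getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

With the setup above, the call would be GrafanaDataSource.create("http://localhost:3000", "admin", "admin1234", "Prometheus", "http://localhost:9090").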

Grafana offers many official dashboard templates that we can use directly.

First, find the template you want.

For example, here we picked an arbitrary template.

You can either download the template's JSON file and import it, or load it directly by entering the template ID; here we enter the template ID.

The result is immediate: a polished dashboard appears right away.

Let's add one more dashboard (ID: 12856).
