Prometheus: Thanos Deployment and Practice

Posted by zhangshun on September 27, 2020

1. Install prometheus-operator

1.1 This article covers integrating Thanos on top of prometheus-operator, so prometheus-operator must be installed first.

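git clone https://github.com/prometheus-operator/kube-prometheus.git
# Install the CRDs
kubectl apply -f kube-prometheus/manifests/setup/.
# Wait until the CRDs are registered before applying resources that use them
kubectl wait --for condition=Established --all CustomResourceDefinition
# Install the remaining components
kubectl apply -f kube-prometheus/manifests/.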

For detailed installation steps, see "Prometheus Operator Manual Installation".

2. Deploy Thanos

2.1 Thanos consists of the following components:

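  • Thanos Query: implements the Prometheus API. It queries every Store API endpoint over gRPC and returns the merged result to the client.
  • Thanos Sidecar: runs alongside Prometheus and exposes its data through the Store API so that Thanos Query can read it. It also uploads the blocks Prometheus has collected to object storage every two hours for long-term retention.
  • Thanos Store Gateway: exposes the data held in object storage to Thanos Query through the Store API.
  • Thanos Ruler: evaluates rules and alerts against the monitoring data and can compute new series. Those series can be served to Thanos Query, uploaded to object storage for long-term retention, or both.
  • Thanos Compact: compacts and downsamples the data in object storage, which speeds up queries over long time ranges.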

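The commonly used components are Thanos Query, Thanos Sidecar, and Thanos Store Gateway.

Thanos Ruler is optional. In principle, prefer the rule support built into Prometheus (recording rules plus alerting) wherever possible, because rule evaluation needs the most recent data, and Prometheus already holds that data locally.

For a detailed look at the Thanos architecture, see "Thanos Architecture Explained".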

2.2 Installation

2.2.1 Modify the Prometheus custom resource to integrate it with Thanos. This adds the sidecar, which uploads data to object storage every two hours.

prometheus-prometheus.yaml

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  image: quay.io/prometheus/prometheus:v2.20.0
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  probeNamespaceSelector: {}
  probeSelector: {}
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
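  # Requires an existing Secret named additional-configs that contains the key
  # prometheus-additional.yaml; omit this block if you have no extra scrape configs
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml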
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.20.0
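  thanos:  # Add the Thanos sidecar configuration
    baseImage: quay.io/thanos/thanos
    version: v0.8.1  # the Query and Store manifests in this article use v0.15.0; consider aligning the versions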
    objectStorageConfig:
      key: thanos.yaml
      name: thanos-objstore-config
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
          - ReadWriteMany
        storageClassName: "cfs-storageclass"
        resources:
          requests:
            storage: 10Gi

Run kubectl explain prometheus.spec.thanos to see the available Thanos configuration fields.

2.2.2 Create the Thanos object storage config. This example uses Tencent Cloud Object Storage (COS).

thanos-config.yaml

type: COS
config:
  bucket: "tapm-thanos"
  region: "****************"
  app_id: "****************"
  secret_key: "****************"
  secret_id: "****************"

Create a Secret from this file in the monitoring namespace with the following command:

kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=thanos-config.yaml
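
The config file holds credentials, so it may be preferable to generate it at deploy time instead of keeping it in a repository. The following is a minimal sketch under that assumption: render_objstore_config and the COS_* environment variable names are hypothetical, introduced here only for illustration, and are not part of Thanos or kubectl. The keys it prints are the same ones used in thanos-config.yaml.

```shell
# render_objstore_config is a hypothetical helper, not part of Thanos or kubectl.
# It prints a COS object storage config with the same keys as thanos-config.yaml.
# Arguments: bucket region app_id secret_id secret_key
render_objstore_config() {
  cat <<EOF
type: COS
config:
  bucket: "$1"
  region: "$2"
  app_id: "$3"
  secret_key: "$5"
  secret_id: "$4"
EOF
}

# Example call. The COS_* variable names are assumptions; the fallbacks after
# ":-" are placeholders, not working credentials.
render_objstore_config "tapm-thanos" \
  "${COS_REGION:-REGION}" \
  "${COS_APP_ID:-APP_ID}" \
  "${COS_SECRET_ID:-SECRET_ID}" \
  "${COS_SECRET_KEY:-SECRET_KEY}"
```

Redirecting the output to thanos-config.yaml produces the file that the kubectl create secret command reads. Values are inserted verbatim, so a value containing a double quote would need escaping first.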

2.2.3 Create thanos-query

thanos-query-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query
    app.kubernetes.io/version: v0.15.0
  name: thanos-query
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 9090
    targetPort: http
  selector:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query

thanos-query-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query
    app.kubernetes.io/version: v0.15.0
  name: thanos-query
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: query-layer
      app.kubernetes.io/instance: thanos-query
      app.kubernetes.io/name: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/component: query-layer
        app.kubernetes.io/instance: thanos-query
        app.kubernetes.io/name: thanos-query
        app.kubernetes.io/version: v0.15.0
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                  - thanos-query
              namespaces:
              - monitoring
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - query
        - --log.level=info
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:9090
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        - --store=dnssrv+_grpc._tcp.thanos-store.monitoring.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.prometheus-operated.monitoring.svc.cluster.local
        image: quay.io/thanos/thanos:v0.15.0
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 9090
            scheme: HTTP
          periodSeconds: 30
        name: thanos-query
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 9090
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 9090
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
      terminationGracePeriodSeconds: 120
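
The two --store flags in the Deployment use the dnssrv+ prefix, which makes Thanos Query discover its Store API endpoints through DNS SRV records instead of a fixed address. Kubernetes publishes an SRV record for each named Service port in the form _<port-name>._<protocol>.<service>.<namespace>.svc.<cluster-domain>, so each target Service needs a port named grpc. The sketch below only shows how those names are composed; srv_store_target is a hypothetical helper, and cluster.local is assumed as the cluster domain.

```shell
# srv_store_target is a hypothetical helper for illustration only.
# It prints the value for a --store flag that uses DNS SRV discovery.
# Arguments: service namespace [port-name] [cluster-domain]
srv_store_target() {
  printf 'dnssrv+_%s._tcp.%s.%s.svc.%s\n' \
    "${3:-grpc}" "$1" "$2" "${4:-cluster.local}"
}

# The two targets used by the thanos-query Deployment:
srv_store_target thanos-store monitoring
# prints dnssrv+_grpc._tcp.thanos-store.monitoring.svc.cluster.local
srv_store_target prometheus-operated monitoring
# prints dnssrv+_grpc._tcp.prometheus-operated.monitoring.svc.cluster.local
```

If a target Service has no port named grpc, the SRV lookup returns nothing and Thanos Query will not see that store, so the port name is worth checking first when a store is missing from the query UI.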

2.2.4 Create the Thanos Store Gateway

thanos-store-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store
    app.kubernetes.io/version: v0.15.0
  name: thanos-store
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: 10901
  - name: http
    port: 10902
    targetPort: 10902
  selector:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store

thanos-store-statefulSet.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store
    app.kubernetes.io/version: v0.15.0
  name: thanos-store
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: object-store-gateway
      app.kubernetes.io/instance: thanos-store
      app.kubernetes.io/name: thanos-store
  serviceName: thanos-store
  template:
    metadata:
      labels:
        app.kubernetes.io/component: object-store-gateway
        app.kubernetes.io/instance: thanos-store
        app.kubernetes.io/name: thanos-store
        app.kubernetes.io/version: v0.15.0
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                  - thanos-store
                - key: app.kubernetes.io/instance
                  operator: In
                  values:
                  - thanos-store
              namespaces:
              - monitoring
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - store
        - --log.level=info
        - --data-dir=/var/thanos/store
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --objstore.config=$(OBJSTORE_CONFIG)
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objstore-config
        image: quay.io/thanos/thanos:v0.15.0
        livenessProbe:
          failureThreshold: 8
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-store
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/store
          name: data
          readOnly: false
      terminationGracePeriodSeconds: 120
      volumes: []
  volumeClaimTemplates:
  - metadata:
      labels:
        app.kubernetes.io/component: object-store-gateway
        app.kubernetes.io/instance: thanos-store
        app.kubernetes.io/name: thanos-store
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi

2.2.5 Change the Grafana data source so that its address points to thanos-query

grafana-dashboardDatasources.yaml

apiVersion: v1
data:
  datasources.yaml: ew0KICAgICJhcGlWZXJzaW9uIjogMSwNCiAgICAiZGF0YXNvdXJjZXMiOiBbDQogICAgICAgIHsNCiAgICAgICAgICAgICJhY2Nlc3MiOiAicHJveHkiLA0KICAgICAgICAgICAgImVkaXRhYmxlIjogZmFsc2UsDQogICAgICAgICAgICAibmFtZSI6ICJwcm9tZXRoZXVzIiwNCiAgICAgICAgICAgICJvcmdJZCI6IDEsDQogICAgICAgICAgICAidHlwZSI6ICJwcm9tZXRoZXVzIiwNCiAgICAgICAgICAgICJ1cmwiOiAiaHR0cDovL3RoYW5vcy1xdWVyeS5tb25pdG9yaW5nLnN2Yzo5MDkwIiwNCiAgICAgICAgICAgICJ2ZXJzaW9uIjogMQ0KICAgICAgICB9DQogICAgXQ0KfQ==
kind: Secret
metadata:
  name: grafana-datasources
  namespace: monitoring
type: Opaque
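
The datasources.yaml value in this Secret is base64-encoded JSON. Decoded, it defines a single Prometheus data source whose url is http://thanos-query.monitoring.svc:9090. The sketch below shows one way to produce and check such a value; the variable names are illustrative, and the string it prints may differ from the one in the manifest because the original JSON appears to use different line endings.

```shell
# Plain-text data source definition; the url points at the thanos-query Service.
datasources_json='{
    "apiVersion": 1,
    "datasources": [
        {
            "access": "proxy",
            "editable": false,
            "name": "prometheus",
            "orgId": 1,
            "type": "prometheus",
            "url": "http://thanos-query.monitoring.svc:9090",
            "version": 1
        }
    ]
}'

# Encode to a single line; tr strips the line wraps that some base64 builds add.
encoded="$(printf '%s' "$datasources_json" | base64 | tr -d '\n')"
echo "datasources.yaml: $encoded"

# Decode again to confirm that the payload round-trips unchanged.
decoded="$(printf '%s' "$encoded" | base64 -d)"
printf '%s\n' "$decoded" | grep '"url"'
```

Kubernetes Secrets also accept a stringData field, which takes the plain text directly and avoids the manual encoding step.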