k8s的监控方案

主要介绍k8s生态中的监控方案

方案选型

如果已存在完善的监控系统的话，推荐使用k8s原生的heapster，比较轻量，容易集成。

我选择的是prometheus, 它是比较完善的云平台级监控方案，继k8s之后同样已被列入云计算基金会项目, 除了具备heapster的能力之外，还支持监控广泛的应用(mysql, JMX, HAProxy等)和灵活的告警的能力，并具备多IDC federation的能力，兼容多种开源监控系统（StatsD, Ganglia, collectd, nagios等）。

本文主要参考

prometheus和heapster开发者之间的对话
CoreOS的blogMonitoring Kubernetes with Prometheus

下面分别介绍下两种方案

heapster

heapster的介绍:

通过向kubelet拉取stats的方式，可提供15分钟内的缓存供k8s的dashboard用，也支持第三方存储，如influxdb等，还具备REST API(经我实验，这个API还不完善缺少diskIO API).
heapster的监控范围

可监控的内容包括集群内的Container, Pod, Node 和 Namespace的性能或配置信息，目前container级别还不支持网络和硬盘信息，具体性能项如下

Metric Name	Description
cpu/limit	CPU hard limit in millicores.
cpu/node_capacity	Cpu capacity of a node.
cpu/node_allocatable	Cpu allocatable of a node.
cpu/node_reservation	Share of cpu that is reserved on the node allocatable.
cpu/node_utilization	CPU utilization as a share of node allocatable.
cpu/request	CPU request (the guaranteed amount of resources) in millicores.
cpu/usage	Cumulative CPU usage on all cores.
cpu/usage_rate	CPU usage on all cores in millicores.
filesystem/usage	Total number of bytes consumed on a filesystem.
filesystem/limit	The total size of filesystem in bytes.
filesystem/available	The number of available bytes remaining in a the filesystem
memory/limit	Memory hard limit in bytes.
memory/major_page_faults	Number of major page faults.
memory/major_page_faults_rate	Number of major page faults per second.
memory/node_capacity	Memory capacity of a node.
memory/node_allocatable	Memory allocatable of a node.
memory/node_reservation	Share of memory that is reserved on the node allocatable.
memory/node_utilization	Memory utilization as a share of memory allocatable.
memory/page_faults	Number of page faults.
memory/page_faults_rate	Number of page faults per second.
memory/request	Memory request (the guaranteed amount of resources) in bytes.
memory/usage	Total memory usage.
memory/working_set	Total working set usage. Working set is the memory being used and not easily dropped by the kernel.
network/rx	Cumulative number of bytes received over the network.
network/rx_errors	Cumulative number of errors while receiving over the network.
network/rx_errors_rate	Number of errors while receiving over the network per second.
network/rx_rate	Number of bytes received over the network per second.
network/tx	Cumulative number of bytes sent over the network
network/tx_errors	Cumulative number of errors while sending over the network
network/tx_errors_rate	Number of errors while sending over the network
network/tx_rate	Number of bytes sent over the network per second.
uptime	Number of milliseconds since the container was started.

Prometheus

Prometheus集成了数据采集，存储，异常告警多项功能，是一款一体化的完整方案。它针对大规模的集群环境设计了拉取式的数据采集方式、多维度数据存储格式以及服务发现等创新功能。

功能特点：

* 多维数据模型（有metric名称和键值对确定的时间序列）
* 灵活的查询语言
* 不依赖分布式存储
* 通过pull方式采集时间序列，通过http协议传输
* 支持通过中介网关的push时间序列的方式
* 监控数据通过服务或者静态配置来发现
* 支持多维度可视化分析和dashboard等

组件介绍：

这个生态里包含的组件，大多是可选的： * 核心prometheus server提供收集和存储时间序列数据 * 大量的client libraries来支持应用业务代码的探针 * 适用于短时任务的push gateway * 基于Rails/SQL语句的可视化分析 * 特殊用途的exporter（包括HAProxy、StatsD、Ganglia等） * 用于报警的alertmanager * 支持命令行查询的工具 * 其他工具大多数的组件都是用Go语言来完成的，使得它们方便构建和部署。

架构图：

架构图

Promethues直接或通过短期Jobs的中介网关拉取收集指标。它在本地存储所有抓取的数据样本，并对数据进行规则匹配检测，这样可以基于现有数据创建新的时间系列指标或生成警报。 PromDash或其他API使用者对收集的数据进行可视化。

引入Prometheus对k8s的影响

下图是Redhat研发人员的回答

promethue-affect-k8s

并不会对k8s产生太大的影响，其主要是通过api-server来发现需要监控的目标，然后会周期性的通过各个Node上kubelet来拉取数据。更详细的讨论见这里

部署Prometheus

下文是基于k8s-monitor项目来说的

Prometheus is an open-source monitoring solution that includes the gathering of metrics, their storage in an internal time series database as well as querying and alerting based on that data.

It offers a lot of integrations incl. Docker, Kubernetes, etc.

Prometheus can also visualize your data. However, in this recipe we include another open-source tool, Grafana, for the visualization part, as it offers a more powerful and flexible way to generate visuals and dashboards.

If you just want to get Prometheus and Grafana up and running you can deploy the whole recipe with a single command instead of going through all steps detailed out below:

kubectl create --filename manifests/

Deploying Prometheus

First, we need to create the configuration for our Prometheus. For this we use a Config Map, which we later mount into our Prometheus pod to configure it. This way we can change the configuration without having to redeploy Prometheus itself.

kubectl create --filename manifests/prometheus-core-configmap.yaml

Then, we create a service to be able to access Prometheus.

kubectl create --filename manifests/prometheus-core-service.yaml

Finally, we can deploy Prometheus itself.

kubectl create --filename manifests/prometheus-core-deployment.yaml

Further, we need the Prometheus Node Exporter deployed to each node. For this we use a Daemon Set and a fronting service for Prometheus to be able to access the node exporters.

kubectl create --filename manifests/prometheus-node-exporter-service.yaml
kubectl create --filename manifests/prometheus-node-exporter-daemonset.yaml

Wait a bit for all the pods to come up. Then Prometheus should be ready and running. We can check the Prometheus targets at https://mycluster.k8s.gigantic.io/api/v1/proxy/namespaces/default/services/prometheus/targets

Prometheus Targets

Deploying Alertmanager

we need to create the configuration for our Alertmanager. For this we use a Config Map, which we later mount into our Alertmanager pod to configure it. This way we can change the configuration without having to redeploy Alertmanager itself.

kubectl create --filename manifests/prometheus-alert-configmap.yaml

Then, we create a service to be able to access Alertmanager.

kubectl create --filename manifests/prometheus-alert-service.yaml

Finally, we can deploy Alertmanager itself.

kubectl create --filename manifests/prometheus-alert-deployment.yaml

Wait a bit for all the pods to come up. Then Alertmanager should be ready and running. We can check the Alertmanager targets at https://mycluster.k8s.gigantic.io/api/v1/proxy/namespaces/default/services/alertmanager/

Alertmanager

Deploying Grafana

Now that we have Prometheus up and running we can deploy Grafana to have a nicer frontend for our metrics.

Again, we create a service to be able to access Grafana and a deployment to manage the pods.

kubectl create --filename manifests/grafana-services.yaml
kubectl create --filename manifests/grafana-deployment.yaml

Wait a bit for Grafana to come up. Then you can access Grafana at https://mycluster.k8s.gigantic.io/api/v1/proxy/namespaces/default/services/grafana/

Setting Up Grafana

TLDR: If you don't want to go through all the manual steps below you can let the following job use the API to configure Grafana to a similar state.

kubectl create --filename manifests/grafana-import-dashboards-job.yaml

Once we're in Grafana we need to first configure Prometheus as a data source.

Grafana UI / Data Sources / Add data source
- Name: prometheus
- Type: Prometheus
- Url: http://prometheus:9090
- Add

Grafana Datasource

Then go to the Dashboards tab and import the Prometheus Stats dashboard, which shows the status of Prometheus itself.

Grafana Datasource Dashboard

You can check it out to see how your Prometheus is doing.

Grafana Datasource Dashboard

Last, but not least we can import a sample Kubernetes cluster monitoring dashboard, to get a first overview over our cluster metrics.

Grafana UI / Dashboards / Import
- Grafana.net Dashboard: https://grafana.net/dashboards/597
- Load
- Prometheus: prometheus
- Save & Open

Grafana Import Dashboard

Voilá. You have a nice first dashboard with metrics of your Kubernetes cluster.

Grafana Import Dashboard

Next Steps

Next, you should get into the Grafana and Prometheus documentations to get to know the tools and either build your own dashboards or extend the samples from above.

You can also check out grafana.net for some more example dashboards and plugins.

More Alertmanager documentations in here