Google announced the general availability of Kubernetes control plane metrics in Google Kubernetes Engine (GKE). These metrics are directly integrated with Google Cloud Monitoring to provide a one-stop solution for troubleshooting GKE-related issues. Integration with third-party observability tools is also possible through the Cloud Monitoring API.
Although GKE fully manages the Kubernetes control plane, the newly exposed metrics can be useful for troubleshooting. For example, a combination of metrics can help in understanding API server health. This includes using metrics such as apiserver_request_duration_seconds to track request response latency, alongside API server load and the number of requests returning errors.
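Because apiserver_request_duration_seconds is exposed as a Prometheus histogram, latency is usually read as a quantile estimated from cumulative bucket counts. The following is a minimal sketch of that estimation, assuming linear interpolation inside a bucket (the general approach taken by PromQL's histogram_quantile). The function name and the bucket values are hypothetical and chosen only for illustration.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, with the last entry being the +Inf bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded

    rank = q * total  # position of the requested quantile
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile lies in the +Inf bucket: report the highest
                # finite bound, since nothing more precise is known.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical bucket data: latency upper bounds in seconds.
example_buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
p95 = estimate_quantile(0.95, example_buckets)
```

In practice the same estimate would normally come from a PromQL query rather than hand-written code; the sketch only shows what the quantile represents.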
The new metrics can also help resolve scheduling problems. The following metrics can all be used to help determine why pods aren't moving from pending to scheduled:
scheduler_pending_pods
scheduler_schedule_attempts_total
scheduler_preemption_attempts_total
scheduler_preemption_victims
scheduler_scheduling_attempt_duration_seconds
An increase in the number of pending pods may indicate a scheduling issue, possibly caused by an underlying resource issue.
The new metrics are all displayed in the Kubernetes Engine section of the Cloud Console, on the Observability tab under Control Plane.
With this integration, it is possible to create alerting policies in Cloud Alerting on these newly available metrics. Continuing with the scheduling issues described above, an alert could be created on both a preemption metric, such as scheduler_preemption_attempts_total, and scheduler_pending_pods. An upward trend in the first metric alone could indicate that higher-priority pods are displacing other pods from the schedule. However, both metrics increasing could mean that there are not enough resources available for the pods.
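The interpretation of the two alert signals can be sketched as a small decision function. This is an illustrative sketch only, assuming the first signal is a preemption counter such as scheduler_preemption_attempts_total and that each argument is the change in a metric over the alert window. The function name, parameters, and returned labels are hypothetical and not part of any GKE or Cloud Monitoring API.

```python
def diagnose_scheduling(preemption_delta, pending_delta):
    """Interpret the change in two scheduler metrics over an alert window.

    preemption_delta: change in a preemption counter (assumed to be
        something like scheduler_preemption_attempts_total).
    pending_delta: change in scheduler_pending_pods.
    """
    preemption_up = preemption_delta > 0
    pending_up = pending_delta > 0

    if preemption_up and pending_up:
        # Both rising: pods are being displaced and still not placed.
        return "possible resource shortage"
    if preemption_up:
        # Only preemptions rising: higher-priority pods displace others.
        return "higher-priority pods preempting others"
    if pending_up:
        # Only the pending count rising: some scheduling issue exists.
        return "pending pods increasing"
    return "no scheduling signal"


print(diagnose_scheduling(preemption_delta=12, pending_delta=0))
print(diagnose_scheduling(preemption_delta=12, pending_delta=30))
```

A real alerting policy would express these conditions as thresholds in Cloud Monitoring; the function only makes the reasoning explicit.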
When enabled, metrics are collected using Google Cloud Managed Service for Prometheus. Metrics are sent to Cloud Monitoring in the same GCP project as the Kubernetes cluster. These metrics can then be queried using PromQL through the Cloud Monitoring API and Metrics Explorer. Additionally, any third-party observability tool can ingest the metrics using the Cloud Monitoring API.
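As a rough illustration of querying these metrics with PromQL over HTTP, the sketch below builds a request against the Prometheus-compatible query endpoint of Cloud Monitoring. The endpoint path, the project ID, and the helper names are assumptions for illustration and should be checked against the current Cloud Monitoring documentation. Obtaining the access token is left out.

```python
import json
import urllib.parse
import urllib.request

# Assumed Prometheus-compatible endpoint; verify against current docs.
QUERY_ENDPOINT = (
    "https://monitoring.googleapis.com/v1/projects/{project_id}"
    "/location/global/prometheus/api/v1/query"
)


def build_query_url(project_id, promql):
    """Return the full query URL for a PromQL expression."""
    base = QUERY_ENDPOINT.format(project_id=project_id)
    return base + "?" + urllib.parse.urlencode({"query": promql})


def run_query(project_id, promql, access_token):
    """Send the query and return the decoded JSON response.

    access_token is an OAuth 2.0 bearer token supplied by the caller.
    """
    request = urllib.request.Request(
        build_query_url(project_id, promql),
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# "my-project" is a placeholder project ID.
url = build_query_url("my-project", "sum(scheduler_pending_pods)")
print(url)
```

Only build_query_url runs here; run_query needs valid credentials and network access, so it is defined but not called.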
GKE clusters running control plane version 1.23.6 or later can access metrics from the Kubernetes API Server, Scheduler, and Controller Manager. Note that these metrics are not available for GKE Autopilot clusters. The following command can be used to enable metrics collection from the API Server, Scheduler, and Controller Manager:
gcloud container clusters update [CLUSTER_ID] --zone=[ZONE] --project=[PROJECT_ID] --monitoring=SYSTEM,API_SERVER,SCHEDULER,CONTROLLER_MANAGER
Metrics can also be configured through Terraform using the monitoring_config block.
Kubernetes Control Plane metrics are billed at the standard price for metrics ingested by the Google Cloud Managed Service for Prometheus. For more release details, please refer to the blog post.