Plainly Explained: Using Robusta & ChatGPT to Improve Alerting and Troubleshooting in Kubernetes

This is a fairly explicit, step-by-step write-up based on my first presentation at Kubernetes 757, a local Meetup group that Ryan Renn and I organize.

First things first: this is targeted at audiences who want to know when something in their Kubernetes cluster is "broken". Broken can mean many things, but let's consider pod statuses like "Error" and "CrashLoopBackOff". This also targets people who might be new to Kubernetes, have some understanding at a conceptual level, and can deploy workloads to clusters in some way. No judgement for how you deploy. The objective here is faster, easier alerting, using commoditized tools to reduce mean-time-to-resolution.


  1. You must have a Slack organization for which you have elevated/admin permissions

  2. You must have python3, pip, Helm and google-cloud-sdk installed on your workstation

  3. You must have an OpenAI account

If you do not have the above, go get them and come back. I do not advocate running any of this for the first time in a Live or Production environment. It will probably work, but I recommend testing this out in a Sandbox environment first.
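Before moving on, a quick pre-flight check can save some backtracking. This is a minimal sketch, not part of the original instructions; the tool list mirrors the prerequisites above, so adjust it for your setup:

```shell
# Pre-flight check (sketch): confirm each required CLI is on PATH.
# The tool list mirrors the prerequisites above; adjust as needed.
missing=""
for tool in python3 pip3 helm gcloud; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "Missing tools:$missing"
else
  echo "All prerequisites found"
fi
```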

  1. Create a Kubernetes cluster. Doesn't matter how, really

  2. Generate an OpenAI API key and document it for later

  3. Install robusta-cli on your workstation

pip3 install -U robusta-cli --no-cache

4. Generate a config with Robusta. Most of the defaults are fine.

robusta gen-config

5. Configure the Slack Integration

6. Choose a channel to send alerts to

7. Don't configure MS Teams, because it's MS Teams :)

8. Configure the Robusta UI Sink

9. Add your email

10. Add an organization name

11. Configure Prometheus

12. Read and accept the EULA

13. Answer the prompt for sending Exception Reports

14. Connect to your cluster (the command below assumes a GKE cluster in GCP)

gcloud container clusters get-credentials my-special-cluster --zone us-central1-c --project bright-lighthouse-348293

15. Use Helm to add the Robusta chart repository

helm repo add robusta https://robusta-charts.storage.googleapis.com && helm repo update

16. Use an editor to modify the generated_values.yaml file


AFTER - added chat_gpt_token under globalConfig at the top and a playbook under customPlaybooks at the bottom

globalConfig:
  signing_key: signing_key_value
  account_id: account_id_value
  chat_gpt_token: chat_gpt_token_value

customPlaybooks:
# Add the 'Ask ChatGPT' button to all Prometheus alerts
- triggers:
  - on_prometheus_alert: {}
  actions:
  - chat_gpt_enricher: {}

17. Install Robusta on your cluster with the generated_values.yaml

helm install robusta robusta/robusta -f ./generated_values.yaml --set clusterName=my-special-cluster

18. Make sure the Robusta pods are on the cluster

kubectl get pods -A | grep robusta

19. Check the Robusta logs if you want to

robusta logs

20. Deploy a crashing pod to test

kubectl apply -f <path-to-crashing-pod-manifest>
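The original manifest isn't shown here, so a minimal sketch of one follows. Any pod whose container exits non-zero will do; the name and namespace below match the trigger command in step 22, but everything else is illustrative:

```yaml
# Hypothetical crashing-pod manifest: the container exits immediately,
# driving the pod into CrashLoopBackOff so Robusta has something to alert on.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  namespace: default
spec:
  containers:
  - name: crasher
    image: busybox
    command: ["sh", "-c", "echo 'about to crash'; exit 1"]
```

Save it to a file, point `kubectl apply -f` at it, and the default restartPolicy of Always will keep the container restarting and failing.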

21. Verify the pod you deployed is crashing

kubectl get pods -A

22. Trigger the alert if you're impatient

robusta playbooks trigger prometheus_alert alert_name=KubePodCrashLooping namespace=default pod_name=example-pod

23. Receive an alert in Slack that a pod is crashing and click the "Ask ChatGPT" button in Slack to get troubleshooting help.

Bonus: If your team is at a different level of experience, fork the repo and modify it as you require.

By the time I finished writing this, I realized it's much more involved than I first thought, but the instructions are still fairly explicit. I think it's a neat tool, especially for people who don't know how or where to get started with Kubernetes troubleshooting. This solution is not perfect, and ChatGPT will not solve all your problems, but it can make life easier by decreasing the time between receiving an alert and starting to troubleshoot. If you like what you see, go check out the Robusta project.
