Plainly Explained: Using Robusta & ChatGPT to Improve Alerting and Troubleshooting in Kubernetes
This walkthrough is broken down into fairly explicit instructions, based on my first presentation at the local Kubernetes 757 Meetup group that Ryan Renn and I organize.
First things first, this is targeted at audiences who want to know when something in their Kubernetes cluster is "broken". Broken can mean many things, but let's consider pod statuses like "Error" and "CrashLoopBackOff". This also targets people who might be new to Kubernetes, have some understanding at a conceptual level, and can deploy workloads to clusters in some way. No judgment about how you deploy. The objective here is easier, faster alerting, using commoditized tools to help us reduce mean-time-to-resolution.
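To make "broken" concrete, here's a quick filter that surfaces unhealthy pods. The output below is simulated for illustration (the pod names and statuses are made up); against a real cluster you would pipe `kubectl get pods -A --no-headers` into the grep instead of the here-doc:

```shell
# Simulated `kubectl get pods -A` output; filter out healthy statuses
# to surface the "broken" pods we want to be alerted about
cat <<'EOF' | grep -Ev 'Running|Completed'
default     web-7d4b9c      1/1   Running            0   5m
default     crasher-1a2b    0/1   CrashLoopBackOff   7   12m
default     job-xyz         0/1   Error              0   3m
EOF
```

Robusta automates exactly this kind of watching so you don't have to eyeball pod lists yourself.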
PREREQUISITES:
You must have a Slack workspace in which you have elevated/admin permissions
You must have python3, pip, Helm and google-cloud-sdk installed on your workstation
You must have an OpenAI account
If you do not have the above, go get them and come back. I do not advocate running any of this for the first time in a Live or Production environment. It will probably work, but I recommend testing this out in a Sandbox environment first.
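A quick sanity check that the prerequisite CLIs from the list above are on your PATH (the tool names match the prerequisites; nothing here is installed or modified):

```shell
# Check for each required CLI; prints MISSING for anything
# you still need to install before continuing
for tool in python3 pip3 helm gcloud; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "MISSING: $tool"
  fi
done
```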
1. Create a Kubernetes cluster. Doesn't matter how, really
2. Generate an OpenAI API key and document it for later
3. Install robusta-cli on your workstation
pip3 install -U robusta-cli --no-cache
4. Generate a config with Robusta. Most of the defaults are fine.
robusta gen-config
5. Configure the Slack integration
6. Choose a channel to send alerts to
7. Don't configure MS Teams, because it's MS Teams :)
8. Configure the Robusta UI sink
9. Add your email
10. Add an organization name
11. Configure Prometheus
12. Read and accept the EULA
13. Answer the prompt about sending Exception Reports
14. Connect to your cluster (the example command below assumes a GKE cluster in GCP)
gcloud container clusters get-credentials my-special-cluster --zone us-central1-c --project bright-lighthouse-348293
15. Use Helm to add Robusta
helm repo add robusta https://robusta-charts.storage.googleapis.com && helm repo update
16. Use an editor to modify the generated_values.yaml file. Add chat_gpt_token under globalConfig at the top, and the ChatGPT playbook entries at the bottom, so it looks like this:
globalConfig:
  signing_key: signing_key_value
  account_id: account_id_value
  chat_gpt_token: chat_gpt_token_value
playbookRepos:
  chatgpt_robusta_actions:
    url: "https://github.com/robusta-dev/kubernetes-chatgpt-bot.git"
customPlaybooks:
# Add the 'Ask ChatGPT' button to all Prometheus alerts
- triggers:
  - on_prometheus_alert: {}
  actions:
  - chat_gpt_enricher: {}
17. Install Robusta on your cluster with the generated_values.yaml file
helm install robusta robusta/robusta -f ./generated_values.yaml --set clusterName=my-special-cluster
18. Make sure the Robusta pods are on the cluster
kubectl get pods -A | grep robusta
19. Check the Robusta logs if you want to
robusta logs
20. Deploy a crashing pod to test
kubectl apply -f https://gist.githubusercontent.com/robusta-lab/283609047306dc1f05cf59806ade30b6/raw
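If you'd rather not apply a remote gist, a minimal crashing pod you can write yourself looks roughly like this (the pod name and image are illustrative, not the gist's exact contents):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: crasher
    image: busybox
    # exit non-zero immediately so the pod enters CrashLoopBackOff
    command: ["sh", "-c", "exit 1"]
```

Save it as crash.yaml and run kubectl apply -f crash.yaml instead.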
21. Verify the pod you deployed is crashing
kubectl get pods -A
22. Trigger the alert if you're impatient
robusta playbooks trigger prometheus_alert alert_name=KubePodCrashLooping namespace=default pod_name=example-pod
23. Receive an alert in Slack that a pod is crashing and click the "Ask ChatGPT" button in Slack to get troubleshooting help.
Bonus: If your team is at a different level of experience, fork the repo and modify chat_gpt.py as you require.
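For instance, one low-effort customization is tailoring the prompt to your team's experience level. A minimal sketch of the idea, assuming you wire it into your fork yourself (build_prompt and team_level are hypothetical names for illustration, not the repo's actual API):

```python
# Hypothetical sketch of a prompt builder a forked chat_gpt.py might use;
# the function name and parameters are illustrative, not the repo's API
def build_prompt(alert_name: str, team_level: str = "beginner") -> str:
    """Compose a troubleshooting prompt tuned to the team's experience."""
    return (
        f"You are helping a {team_level} Kubernetes team. "
        f"Explain the Prometheus alert '{alert_name}' in plain language "
        "and list step-by-step kubectl commands to troubleshoot it."
    )

print(build_prompt("KubePodCrashLooping", team_level="intermediate"))
```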
By the time I finished writing this, I realized it's much more involved than I thought, but the instructions are still fairly explicit. I think it's a neat tool, especially for people who don't know how or where to get started with Kubernetes. This solution is not perfect, and ChatGPT will not solve all your problems, but it can make life easier by decreasing the time between the alert and troubleshooting. If you like what you see, go check out https://robusta.dev.
