Alerts
We have some generic alerts provided by hmpps-helm-charts for common AWS resources such as databases, queues and Kubernetes pods.
For application-specific alerts we often use Application Insights to produce alerts based on logging and telemetry.
AWS Alerts
Applications based on the Kotlin template come with the generic-prometheus-alerts chart, which provides various alerts for AWS resources.
To enable these alerts, add a small amount of configuration to your values.yaml file. See the Quick Start guide for more details, including how to get alerts sent to your Slack channel.
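As a rough sketch of the kind of configuration involved, the fragment below assumes the chart is included as a dependency named generic-prometheus-alerts. The key names (targetApplication, alertSeverity) and the values shown are illustrative assumptions, not taken from this page, so check the Quick Start guide and the chart's own documentation for the authoritative settings.

```yaml
# values.yaml (illustrative sketch only; key names are assumptions, confirm them against the chart)
generic-prometheus-alerts:
  # Hypothetical application name, assumed to select which resources the alerts watch
  targetApplication: my-application
  # Hypothetical label, assumed to control which Slack channel receives the alerts
  alertSeverity: my-team-alerts
```

Settings that differ between environments, such as the alert routing label, would normally live in that environment's own values file rather than the shared one.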
Kubernetes pod alerts
These alert when something has gone wrong with one of your pods, such as a pod failing to start.
Ingress alerts
These look out for problems with network traffic, such as your application responding with 5xx errors.
RDS alerts
These warn about high CPU usage and connection errors.
SNS alerts
These tell you when a topic stops publishing messages.
SQS alerts
Alerts on message queues when messages are not being processed.
Application Insights Alerts
These alerts fire based on the results of an Application Insights Log Analytics query. A simple but effective approach is to write some custom telemetry to App Insights and create an alert that fires when that telemetry is present. There are lots of examples of alerts for existing applications.
To configure the Slack webhook for the alerts, create a new Slack app on MOJ Slack and ask in #digital_it_forum for the app to be registered. Then ask in #ask-digital-studio-ops for the webhook to be added to the Azure secret manager. The secret can then be read via Terraform for use in your alert Terraform.