Configure alerting rules and notifications
You are a monitoring engineer. The user wants to configure alerting rules and notifications to trigger actions when metrics cross thresholds.
What to check first
- Review your monitoring stack (Prometheus, Grafana, AlertManager, or your cloud provider's native solution)
- Run `curl http://localhost:9090/api/v1/alerts` to see current active alerts in Prometheus
- Verify notification channels are configured (email, Slack, PagerDuty, webhooks)
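If `jq` is available, the alert list is easier to scan when filtered down to name and state. A quick sketch, assuming Prometheus is listening on its default localhost:9090:

```sh
# List active alerts by name and state (firing/pending); assumes jq is installed
curl -s http://localhost:9090/api/v1/alerts \
  | jq '.data.alerts[] | {name: .labels.alertname, state: .state}'
```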
Steps
- Define alert conditions using PromQL expressions: write queries that return non-zero values when the alert should fire (e.g., `up{job="api"} == 0` for down services)
- Create alert rules in `prometheus.yml` or a separate rules file such as `alert_rules.yml`, specifying `alert`, `expr`, `for`, and `annotations`
- Set the `for` duration to prevent flapping: `5m` means the condition must persist for 5 minutes before the alert fires
- Add `labels` to categorize alerts by severity (`critical`, `warning`) and team ownership
- Configure `annotations` with `summary` and `description` templates, using `{{ $labels.instance }}` to pass context
- Point Prometheus at AlertManager via the `alerting` block (Prometheus, not AlertManager, loads the rules file), and set AlertManager's `global.resolve_timeout` (default `5m`)
- Define notification routes in AlertManager's `alertmanager.yml` with `receiver`, `group_by`, and `group_wait` settings (a routing sketch follows the Code section)
- Test rule syntax with `promtool check rules /path/to/rules.yml` before deploying; a unit-test sketch follows this list
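Beyond syntax checking, `promtool test rules` can assert that a rule actually fires on synthetic data before it ever reaches production. A minimal sketch for the ServiceDown rule shown in the Code section; the file name, instance label, and series values here are illustrative:

```yaml
# alert_rules_test.yml  (run with: promtool test rules alert_rules_test.yml)
rule_files:
  - alert_rules.yml
evaluation_interval: 15s
tests:
  - interval: 1m
    input_series:
      # Synthetic data: the api target reports down for three samples
      - series: 'up{job="api", instance="api-1:9090"}'
        values: '0 0 0'
    alert_rule_test:
      - eval_time: 2m   # condition has held past the 1m `for` duration
        alertname: ServiceDown
        exp_alerts:
          - exp_labels:
              severity: critical
              team: platform
              job: api
              instance: api-1:9090
            exp_annotations:
              summary: "api-1:9090 is down"
              description: "Service api on api-1:9090 has been unreachable for 1 minute"
```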
Code
```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - 'alert_rules.yml'

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093'
```

```yaml
# alert_rules.yml
groups:
  - name: application_alerts
    interval: 15s
    rules:
      - alert: HighErrorRate
        # 5xx responses as a fraction of all requests, so humanizePercentage
        # in the description renders a true percentage
        expr: |
          sum by (job, instance) (rate(http_requests_total{status=~"5.."}[5m]))
            / sum by (job, instance) (rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
          team: backend
        annotations:
          summary: "High error rate on {{ $labels.instance }}"
          description: "Error rate is {{ $value | humanizePercentage }} for {{ $labels.job }}"
      - alert: ServiceDown
        expr: up{job="api"} == 0
        for: 1m
        labels:
          severity: critical
          team: platform
        annotations:
          summary: "{{ $labels.instance }} is down"
          description: "Service {{ $labels.job }} on {{ $labels.instance }} has been unreachable for 1 minute"
      - alert: HighMemoryUsage
        # Source truncates here; the completion below follows the common
        # node_exporter memory pattern, and the 0.9 threshold, for duration,
        # and labels are illustrative, not values from the source
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) > 0.9
        for: 10m
        labels:
          severity: warning
          team: platform
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value | humanizePercentage }} on {{ $labels.instance }}"
```
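The steps reference notification routing, but the example stops before AlertManager's side of the configuration. A minimal `alertmanager.yml` sketch; the receiver names, Slack webhook URL, channel, PagerDuty key, and grouping values are placeholders, not values from the source:

```yaml
# alertmanager.yml (illustrative sketch; receiver details are placeholders)
global:
  resolve_timeout: 5m

route:
  receiver: default-slack            # fallback for anything unmatched below
  group_by: ['alertname', 'job']     # batch related alerts into one notification
  group_wait: 30s                    # initial delay before the first notification
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - match:
        severity: critical           # page on critical, Slack for the rest
      receiver: pagerduty-critical

receivers:
  - name: default-slack
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/ME'   # placeholder
        channel: '#alerts'
  - name: pagerduty-critical
    pagerduty_configs:
      - service_key: 'REPLACE_WITH_INTEGRATION_KEY'              # placeholder
```

Validate routing changes with `amtool check-config alertmanager.yml` before reloading.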
Common Pitfalls
- Treating this skill as a one-shot solution; thresholds and `for` durations usually need iteration against real traffic
- Skipping the verification steps; you don't know a rule works until you watch it fire (and resolve) as expected
- Applying this skill without understanding the underlying problem; read the Prometheus alerting docs first
When NOT to Use This Skill
- When a simpler manual approach would take less than 10 minutes
- On critical production systems without testing in staging first
- When you don't have permission or authorization to make these changes
How to Verify It Worked
- Run the verification steps documented above, then confirm the rules appear on Prometheus's /alerts page
- Compare the output against your expected baseline
- Check Prometheus and AlertManager logs for warnings or errors; silent failures are the worst kind
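A command-line pass over the running stack confirms each hop; a sketch assuming default local ports and that `amtool` (shipped with AlertManager) is on the PATH:

```sh
# 1. Rule groups loaded by Prometheus
curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[].name'

# 2. Alerts currently held by AlertManager
amtool alert query --alertmanager.url=http://localhost:9093

# 3. Push a synthetic alert through the routing tree (hypothetical name/label)
amtool alert add TestAlert severity=warning --alertmanager.url=http://localhost:9093
```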
Production Considerations
- Test in staging before deploying to production
- Have a rollback plan — every change should be reversible
- Monitor the affected systems for at least 24 hours after the change
Related Monitoring & Logging Skills
Other Claude Code skills in the same category:
- Structured Logging: implement structured logging (Winston, Pino)
- Error Tracking: set up error tracking (Sentry)
- APM Setup: set up Application Performance Monitoring
- Log Rotation: configure log rotation and management
- Health Dashboard: create a health monitoring dashboard
- Distributed Tracing: set up distributed tracing
- Metrics Collector: implement custom metrics collection
- Uptime Monitor: set up uptime monitoring