nthlayer generate-alerts

Generate Prometheus alert rules from a service specification.

Synopsis

nthlayer generate-alerts <service-file> [options]

Description

The generate-alerts command creates Prometheus alerting rules based on your service's technology stack and tier. It draws on more than 400 alert templates from the awesome-prometheus-alerts community repository.

Options

| Option | Description |
| --- | --- |
| --output, -o PATH | Output file path (default: generated/alerts/{service}.yaml) |
| --env, --environment ENV | Environment name (dev, staging, prod) |
| --auto-env | Auto-detect environment from CI/CD context |
| --dry-run | Preview alerts without writing file |
| --runbook-url URL | Base URL for runbook links |
| --notification-channel CHANNEL | Notification channel (slack, etc.) |

Examples

Basic Generation

nthlayer generate-alerts services/payment-api.yaml

This generates alert rules based on the service's technology stack:

# generated/alerts/payment-api.yaml
groups:
  - name: payment-api-alerts
    rules:
      - alert: PaymentAPIHighErrorRate
        expr: |
          sum(rate(http_requests_total{service="payment-api",status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total{service="payment-api"}[5m])) > 0.01
        for: 5m
        labels:
          severity: critical
          service: payment-api
          tier: "1"
        annotations:
          summary: "High error rate on payment-api"
          description: "Error rate is {{ $value | humanizePercentage }} (threshold: 1%)"
          runbook_url: "https://runbooks.example.com/payment-api/high-error-rate"

      - alert: PaymentAPIHighLatency
        expr: |
          histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{service="payment-api"}[5m])) by (le)) > 0.5
        for: 5m
        labels:
          severity: warning
          service: payment-api
        annotations:
          summary: "High p99 latency on payment-api"
          description: "p99 latency is {{ $value | humanizeDuration }}"
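To make the first rule's expression concrete, the following Python sketch reproduces its arithmetic outside Prometheus: it divides the 5xx request rate by the total request rate and compares the result with the 0.01 threshold. The function name and the sample rates are illustrative assumptions, not part of nthlayer.

```python
def error_ratio_exceeds(errors_per_sec: float, total_per_sec: float,
                        threshold: float = 0.01) -> bool:
    """Mirror `sum(rate(5xx)) / sum(rate(all)) > threshold` for one evaluation."""
    if total_per_sec == 0:
        # In PromQL, 0/0 is NaN and the comparison filters it out: no alert.
        return False
    return errors_per_sec / total_per_sec > threshold


# 3 errors/s out of 200 requests/s is 1.5%, above the 1% threshold.
print(error_ratio_exceeds(3.0, 200.0))
# 1 error/s out of 200 requests/s is 0.5%, below the threshold.
print(error_ratio_exceeds(1.0, 200.0))
```

In Prometheus the condition must also hold for the whole `for: 5m` window before the alert fires; the sketch checks a single evaluation only.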

With Runbook URLs

nthlayer generate-alerts services/api.yaml \
  --runbook-url https://wiki.example.com/runbooks

Preview Mode

nthlayer generate-alerts services/api.yaml --dry-run

Technology-Specific Alerts

The command reads the technologies section of your service.yaml and generates the alerts that match each entry:

| Technology | Alert Examples |
| --- | --- |
| PostgreSQL | Connection pool exhaustion, replication lag, slow queries |
| Redis | Memory usage, connection count, evictions |
| Kafka | Consumer lag, offline partitions, under-replicated partitions |
| Kubernetes | Pod restarts, OOMKilled, resource limits |
| HTTP/API | Error rate, latency percentiles, availability |

Example Service with Technologies

service:
  name: payment-api
  tier: tier-1
  team: payments-team

technologies:
  - name: postgresql
    role: primary-database
  - name: redis
    role: cache
  - name: kafka
    role: event-bus

This generates alerts for the service itself plus technology-specific alerts for PostgreSQL, Redis, and Kafka.
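The selection step can be pictured as a lookup from technology name to alert categories. The Python sketch below is a hypothetical illustration, not nthlayer's implementation: the dictionary holds the PostgreSQL, Redis, and Kafka rows of the Technology-Specific Alerts table, and skipping unknown technologies is an assumption.

```python
# Alert categories per technology, taken from the table in this section.
# The lookup is a hypothetical illustration, not nthlayer's implementation.
ALERT_EXAMPLES = {
    "postgresql": ["connection pool exhaustion", "replication lag", "slow queries"],
    "redis": ["memory usage", "connection count", "evictions"],
    "kafka": ["consumer lag", "offline partitions", "under-replicated partitions"],
}


def alert_examples_for(technologies):
    """Return {technology name: alert categories} for known technologies."""
    selected = {}
    for tech in technologies:
        name = tech["name"].lower()
        if name in ALERT_EXAMPLES:  # assumption: unknown names are skipped
            selected[name] = ALERT_EXAMPLES[name]
    return selected


# Same entries as the technologies list in the example service file.
technologies = [
    {"name": "postgresql", "role": "primary-database"},
    {"name": "redis", "role": "cache"},
    {"name": "kafka", "role": "event-bus"},
]
for name, alerts in alert_examples_for(technologies).items():
    print(f"{name}: {', '.join(alerts)}")
```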

Tier-Based Severity

The thresholds at which alerts fire depend on the service tier, with stricter thresholds for more critical tiers:

| Tier | Error rate (critical) | Latency (warning) |
| --- | --- | --- |
| Tier 1 (Critical) | > 0.1% | > 200ms |
| Tier 2 (Standard) | > 1% | > 500ms |
| Tier 3 (Low) | > 5% | > 1s |
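As a rough sketch of how the tier table could feed into a rule, the Python below stores the thresholds as ratios and seconds and renders an error-rate expression for a given service and tier. The `TIER_THRESHOLDS` mapping and `error_rate_expr` function are hypothetical names; the `tier-N` keys follow the `tier: tier-1` value in the example service file, and the PromQL shape follows the HTTP error-rate rule shown under Basic Generation.

```python
# Thresholds from the tier table: error rate as a ratio, latency in seconds.
TIER_THRESHOLDS = {
    "tier-1": {"error_rate": 0.001, "latency_seconds": 0.2},
    "tier-2": {"error_rate": 0.01, "latency_seconds": 0.5},
    "tier-3": {"error_rate": 0.05, "latency_seconds": 1.0},
}


def error_rate_expr(service: str, tier: str) -> str:
    """Render a 5xx error-ratio expression with the tier's threshold."""
    threshold = TIER_THRESHOLDS[tier]["error_rate"]
    selector = f'service="{service}"'
    return (
        f'sum(rate(http_requests_total{{{selector},status=~"5.."}}[5m]))'
        f' / sum(rate(http_requests_total{{{selector}}}[5m])) > {threshold}'
    )


print(error_rate_expr("payment-api", "tier-1"))
```

Keeping the thresholds in one table-like structure means a tier change only touches the service file, not the individual rules.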

CI/CD Integration

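jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      # Assumes the repository is checked out and nthlayer is installed on the runner.
      - name: Generate Alerts
        run: |
          nthlayer generate-alerts services/api.yaml \
            --runbook-url "${{ vars.RUNBOOK_BASE_URL }}" \
            --env prod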
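Pipelines that drive the CLI from a script rather than inline shell can assemble the argument list programmatically. The helper below is a hypothetical sketch that uses only flags from the Options table; `build_command` is not part of nthlayer.

```python
import shlex


def build_command(service_file, *, output=None, env=None,
                  dry_run=False, runbook_url=None):
    """Assemble an nthlayer generate-alerts argument list from optional settings."""
    cmd = ["nthlayer", "generate-alerts", service_file]
    if output:
        cmd += ["--output", output]
    if env:
        cmd += ["--env", env]
    if dry_run:
        cmd.append("--dry-run")
    if runbook_url:
        cmd += ["--runbook-url", runbook_url]
    return cmd


cmd = build_command(
    "services/api.yaml",
    env="prod",
    runbook_url="https://wiki.example.com/runbooks",
)
# Print a shell-quoted form for logging.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` runs the command without invoking a shell, which avoids quoting problems when the runbook URL comes from a CI variable.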

Output Structure

generated/
└── alerts/
    ├── payment-api.yaml
    ├── user-service.yaml
    └── order-service.yaml
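The layout follows from the documented default for --output, generated/alerts/{service}.yaml. A small Python sketch of that mapping, with `default_output_path` as a hypothetical helper name:

```python
from pathlib import PurePosixPath


def default_output_path(service_name: str) -> str:
    """Apply the documented default pattern generated/alerts/{service}.yaml."""
    return str(PurePosixPath("generated") / "alerts" / f"{service_name}.yaml")


for service in ("payment-api", "user-service", "order-service"):
    print(default_output_path(service))
```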

See Also