nthlayer validate-slo¶
Validate that SLO metrics exist in Prometheus before deployment.
This command checks that all metrics referenced in SLO definitions actually exist and are queryable, preventing broken SLO dashboards and alerts.
Usage¶
Options¶
| Option | Description |
|---|---|
--env ENVIRONMENT | Environment name (dev, staging, prod) |
--format {table,json} | Output format (default: table) |
--prometheus-url URL | Prometheus server URL |
--demo | Show demo output with sample data |
Environment Variables¶
| Variable | Description |
|---|---|
PROMETHEUS_URL | Default Prometheus URL |
PROMETHEUS_USERNAME | Basic auth username |
PROMETHEUS_PASSWORD | Basic auth password |
Exit Codes¶
| Code | Meaning |
|---|---|
0 | All SLO metrics exist and are valid |
1 | Some metrics missing (deployment may fail) |
2 | Critical SLO metrics missing (block deployment) |
Examples¶
Basic Validation¶
Output:
╭──────────────────────────────────────────────────────────────────────────────╮
│ SLO Validation: checkout-service │
╰──────────────────────────────────────────────────────────────────────────────╯
Target: http://prometheus:9090
Environment: production
SLO: availability (99.95% target)
Indicator: success_ratio
✓ total_query: sum(rate(http_requests_total{service="checkout"}[5m]))
└─ Metric: http_requests_total ✓ exists (1.2M samples)
✓ good_query: sum(rate(http_requests_total{service="checkout",status!~"5.."}[5m]))
└─ Metric: http_requests_total ✓ exists
SLO: latency (p99 < 200ms)
Indicator: latency
✓ query: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{...}[5m]))
└─ Metric: http_request_duration_seconds_bucket ✓ exists
Summary:
SLOs validated: 2
Metrics checked: 3
All metrics exist ✓
✓ SLO validation passed - safe to deploy
Missing Metrics¶
When metrics are missing:
SLO: availability (99.95% target)
Indicator: success_ratio
✗ total_query: sum(rate(api_requests_total{service="checkout"}[5m]))
└─ Metric: api_requests_total ✗ NOT FOUND
Suggestions:
• Did you mean: http_requests_total?
• Check metric name in application instrumentation
• Ensure service label matches: service="checkout"
Summary:
SLOs validated: 2
Metrics checked: 3
Missing: 1 (critical)
✗ SLO validation failed - missing critical metrics
JSON Output¶
{
"service": "checkout-service",
"prometheus_url": "http://prometheus:9090",
"valid": true,
"slos": [
{
"name": "availability",
"target": 99.95,
"valid": true,
"metrics": [
{
"name": "http_requests_total",
"exists": true,
"sample_count": 1200000
}
]
}
],
"summary": {
"slos_validated": 2,
"metrics_checked": 3,
"metrics_missing": 0
}
}
Demo Mode¶
What Gets Validated¶
SLO Indicator Metrics¶
All metrics referenced in SLO indicator queries:
resources:
- kind: SLO
name: availability
spec:
objective: 99.95
indicators:
- type: availability
success_ratio:
total_query: sum(rate(http_requests_total{service="checkout"}[5m]))
good_query: sum(rate(http_requests_total{service="checkout",status!~"5.."}[5m]))
Recording Rule Dependencies¶
If SLOs use recording rules, validates those exist too:
# If SLO references a recording rule
total_query: checkout:requests:rate5m
# Validates that checkout:requests:rate5m exists
Label Consistency¶
Checks that label selectors match existing label values:
# Validates that service="checkout" exists in the metric
total_query: sum(rate(http_requests_total{service="checkout"}[5m]))
Validation vs Verification¶
| Command | Purpose | When to Use |
|---|---|---|
validate-slo | Check SLO metric existence | Before deploying SLO configs |
verify | Full contract verification | Before promoting to production |
Use validate-slo as a quick check during development. Use verify in CI/CD pipelines for comprehensive validation.
CI/CD Integration¶
Pre-Deployment Check¶
# Validate before deploying Prometheus rules
nthlayer validate-slo service.yaml --prometheus-url $PROMETHEUS_URL
if [ $? -ne 0 ]; then
echo "SLO metrics missing - check instrumentation"
exit 1
fi
# Safe to deploy
kubectl apply -f generated/recording-rules.yaml
GitHub Actions¶
- name: Validate SLO Metrics
run: |
nthlayer validate-slo service.yaml \
--prometheus-url ${{ secrets.PROMETHEUS_URL }} \
--format json > slo-validation.json
if [ $(jq '.valid' slo-validation.json) != "true" ]; then
echo "::error::SLO metrics validation failed"
exit 1
fi
Troubleshooting¶
Metric Not Found¶
- Check metric name - Typos in metric names are common
- Check labels - Ensure label selectors match your instrumentation
- Check time range - New metrics may not have data yet
- Check scrape config - Ensure Prometheus is scraping the target
Label Mismatch¶
Recording Rule Missing¶
# Check if recording rule exists
curl "$PROMETHEUS_URL/api/v1/rules" | jq '.data.groups[].rules[] | select(.name == "checkout:requests:rate5m")'
See Also¶
- nthlayer verify - Full contract verification
- nthlayer apply - Generate SLO configs
- SLO Concepts - Understanding SLOs