Live Demo¶
20 Hours of SRE Work in 5 Minutes¶
Define your service once. NthLayer generates SLOs, alerts, dashboards, recording rules, and runbooks—with technology-specific best practices built in.
- 99.6% Time Savings
- Tech-Aware (PostgreSQL, Redis, Kafka)
- One Command for Everything
- Real-Time Error Budget Tracking
Live Grafana Dashboards¶
See auto-generated dashboards for 6 production services. Each dashboard includes SLO metrics, service health, and technology-specific panels.
- payment-api: PostgreSQL + Redis
- checkout-service: MySQL + Redis
- notification-worker: Worker + Redis
- analytics-stream: Stream + MongoDB
- identity-service: PostgreSQL + Redis
- search-api: Elasticsearch + Redis
Dashboard Structure
Each dashboard is organized into: SLO Metrics → Service Health → Dependencies
Generated Alerts¶
118 production-ready Prometheus alerts across all services, sourced from awesome-prometheus-alerts.
- payment-api · 15 PostgreSQL alerts: PostgresqlDown, PostgresqlRestarted, SlowQueries...
- checkout-service · 26 MySQL + Redis alerts: MysqlDown, RedisMemoryHigh, ReplicationLag...
- notification-worker · 12 Redis alerts: RedisDown, RedisMemoryHigh, TooManyConnections...
- analytics-stream · 19 MongoDB + Redis alerts: MongodbDown, CursorsTimeouts, ReplicasetLag...
- identity-service · 27 PostgreSQL + Redis alerts: PostgresqlDown, DeadLocks, RedisRejected...
- search-api · 19 Elasticsearch alerts: ClusterRed, JvmHeapHigh, DiskSpaceLow...
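The per-service counts in the list above add up to the stated total of 118. A quick check, with the counts copied from that list (the dictionary name is only for illustration):

```python
# Alert counts per service, copied from the list above.
ALERT_COUNTS = {
    "payment-api": 15,
    "checkout-service": 26,
    "notification-worker": 12,
    "analytics-stream": 19,
    "identity-service": 27,
    "search-api": 19,
}

total_alerts = sum(ALERT_COUNTS.values())
print(f"{total_alerts} alerts across {len(ALERT_COUNTS)} services")
```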
SLO Portfolio¶
Track org-wide reliability with tier-based health scoring:
================================================================================
NthLayer SLO Portfolio
================================================================================
Organization Health: 78% (14/18 services meeting SLOs)
By Tier:
Critical: ████████░░ 83% (5/6 services)
Standard: ███████░░░ 75% (6/8 services)
Low: ███████░░░ 75% (3/4 services)
--------------------------------------------------------------------------------
Services Needing Attention:
--------------------------------------------------------------------------------
payment-api (Tier 1)
availability: 156% budget burned - EXHAUSTED
Remaining: -12.5 hours
search-api (Tier 2)
latency-p99: 95% budget burned - WARNING
Remaining: 1.2 hours
--------------------------------------------------------------------------------
Total: 18 services, 16 with SLOs, 45 SLOs
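The tier lines in the sample output can be reproduced with a few lines of Python. This is a sketch, not NthLayer's actual rendering code: `health_line` is a hypothetical helper, and the choice to truncate the bar (rather than round it) is inferred from the 75% rows showing seven filled cells.

```python
def health_line(label: str, meeting: int, total: int, width: int = 10) -> str:
    """Render one tier line: a bar, a rounded percentage, and the raw counts."""
    ratio = meeting / total
    filled = int(ratio * width)  # truncate, so 7.5 cells draws 7 blocks
    bar = "█" * filled + "░" * (width - filled)
    return f"{label}: {bar} {round(ratio * 100)}% ({meeting}/{total} services)"

# (services meeting SLOs, total services) per tier, from the sample output
tiers = {"Critical": (5, 6), "Standard": (6, 8), "Low": (3, 4)}
for label, (meeting, total) in tiers.items():
    print(health_line(label, meeting, total))

# The organization score is the same ratio taken over all tiers combined.
meeting_all = sum(m for m, _ in tiers.values())
total_all = sum(t for _, t in tiers.values())
print(f"Organization Health: {round(meeting_all / total_all * 100)}% "
      f"({meeting_all}/{total_all} services meeting SLOs)")
```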
Cross-Vendor Aggregation
Why this matters: PagerDuty can't give you this view—they want you locked into their ecosystem. NthLayer aggregates SLOs across any backend (Prometheus, Datadog, etc.) in a single, vendor-neutral portfolio.
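One way to picture vendor-neutral aggregation is a single record type that every backend's results are normalized into. The sketch below is illustrative only: `SloResult`, `needs_attention`, the 90% warning threshold, the backend assigned to each service, and the checkout-service figure are all assumptions, not NthLayer's schema or data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SloResult:
    """Vendor-neutral SLO record (hypothetical shape, not NthLayer's schema)."""
    service: str
    slo: str
    backend: str          # e.g. "prometheus" or "datadog"
    budget_burned: float  # fraction of the error budget consumed; 1.0 = all of it

def needs_attention(results, warn_at=0.9):
    """Return results at or above the warning threshold, worst first."""
    flagged = [r for r in results if r.budget_burned >= warn_at]
    return sorted(flagged, key=lambda r: r.budget_burned, reverse=True)

# Backends and the checkout-service value are made up for the example.
results = [
    SloResult("search-api", "latency-p99", "datadog", 0.95),
    SloResult("payment-api", "availability", "prometheus", 1.56),
    SloResult("checkout-service", "availability", "prometheus", 0.40),
]
for r in needs_attention(results):
    print(f"{r.service} {r.slo}: {r.budget_burned:.0%} budget burned ({r.backend})")
```

Because the portfolio logic only touches `SloResult`, where each number came from stops mattering once it has been normalized.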
PagerDuty Integration¶
Complete incident response setup with tier-based escalation policies.
- Team Management: Auto-creates teams, with the manager role assigned to the API key owner
- On-Call Schedules: Primary, secondary, and manager schedules with weekly rotation
- Tier-Based Timing: Critical 5→15→30 min | High 15→30→60 min | Low 60 min only
- Service Linking: Services linked to escalation policies with urgency settings
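The tier-based timings can be modeled as a small lookup table. This sketch assumes each number is minutes elapsed since the incident triggered (the page does not say whether the values are cumulative or per-step delays); `ESCALATION_MINUTES` and `steps_reached` are hypothetical names, not part of NthLayer or the PagerDuty API.

```python
# Minutes after trigger at which each escalation step fires (assumed cumulative).
ESCALATION_MINUTES = {
    "critical": [5, 15, 30],
    "high": [15, 30, 60],
    "low": [60],
}

def steps_reached(tier: str, minutes_elapsed: float) -> int:
    """Count how many escalation steps have fired for an unacknowledged incident."""
    return sum(1 for t in ESCALATION_MINUTES[tier] if minutes_elapsed >= t)

for tier in ESCALATION_MINUTES:
    print(tier, steps_reached(tier, 20))
```

Twenty minutes into an unacknowledged incident, a critical service has already escalated twice, while a low-tier service has not escalated at all.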
Support Models¶
| Model | Description |
|---|---|
| self | Team handles all alerts 24/7 |
| shared | Team (day) + SRE (off-hours) |
| sre | SRE handles all alerts |
| business_hours | Team (9-5) + low-priority queue |
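As a rough illustration of how the four models route an alert, the sketch below picks a responder from the model name and the time of day. It rests on assumptions the table does not spell out: "day" and "9-5" are both treated as 09:00 to 17:00, weekends are ignored, and `responder` is a hypothetical helper rather than an NthLayer function.

```python
from datetime import time

BUSINESS_START, BUSINESS_END = time(9, 0), time(17, 0)  # assumed "9-5" window

def responder(model: str, now: time) -> str:
    """Return who receives an alert under the given support model."""
    in_hours = BUSINESS_START <= now < BUSINESS_END
    if model == "self":
        return "team"
    if model == "sre":
        return "sre"
    if model == "shared":
        return "team" if in_hours else "sre"
    if model == "business_hours":
        # Assumption: alerts outside the window wait in the low-priority queue.
        return "team" if in_hours else "low-priority queue"
    raise ValueError(f"unknown support model: {model}")

for model in ("self", "shared", "sre", "business_hours"):
    print(model, "->", responder(model, time(22, 30)))
```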
What Gets Generated¶
From a single service.yaml, NthLayer generates:
| Output | Description | Example |
|---|---|---|
| Dashboard | 22 panels: health, SLOs, latency, errors, dependencies | View JSON |
| SLOs | 3 SLOs with 30-day error budgets and burn rate calculations | View YAML |
| Alerts | 15 PostgreSQL alerts with service labels and severity routing | View YAML |
| Recording Rules | 21 pre-aggregated metrics for 10x faster dashboard queries | View YAML |
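For readers new to the terms in the SLOs row, here is a minimal sketch of the standard error budget and burn rate arithmetic over a 30-day window. The function names and the 99.9% target are illustrative assumptions; the generated rules themselves may be structured differently.

```python
def error_budget_minutes(target: float, window_days: int = 30) -> float:
    """Allowed 'bad' minutes in the window for an SLO target such as 0.999."""
    return (1.0 - target) * window_days * 24 * 60

def budget_burned(bad_minutes: float, target: float, window_days: int = 30) -> float:
    """Fraction of the error budget consumed; above 1.0 means exhausted."""
    return bad_minutes / error_budget_minutes(target, window_days)

def burn_rate(error_ratio: float, target: float) -> float:
    """Observed error ratio over the allowed ratio; 1.0 spends the budget exactly over the window."""
    return error_ratio / (1.0 - target)

budget = error_budget_minutes(0.999)  # about 43.2 minutes over 30 days
print(f"budget: {budget:.1f} min")
print(f"burned: {budget_burned(21.6, 0.999):.0%}")
print(f"burn rate: {burn_rate(0.005, 0.999):.1f}x")
```

A burned fraction above 100%, like the payment-api entry in the portfolio sample, means the service has already used more than its allowance and the remaining budget goes negative.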
Try It Yourself¶
# Install NthLayer
pipx install nthlayer
# Interactive setup (configures Prometheus, Grafana, PagerDuty)
nthlayer setup
# Generate configs for your service
nthlayer apply payment-api.yaml
# View org-wide SLO health
nthlayer portfolio
- Get Started: Install NthLayer and generate your first reliability stack
- Full Documentation: Comprehensive guides for all features