Adoption Path

NthLayer can be adopted incrementally; you don't need to enable every feature on day one. This guide walks through a three-phase approach (generate, validate, protect) that lets teams build confidence before enabling enforcement.

Overview

| Phase | What You Do | Risk Level | Time to Value |
|---|---|---|---|
| 1. Generate | Run locally, review output | None | 1 day |
| 2. Validate | Add to CI, warnings only | Low | 1 week |
| 3. Protect | Enable gates, block deploys | Medium | 2-4 weeks |

Phase 1: Generate Only

Goal: See what NthLayer produces without any CI/CD integration.

Duration: 1-3 days

Steps

Install NthLayer
```
pip install nthlayer
```

Create a service spec

```
nthlayer init
# Or manually create service.yaml
```

Generate artifacts locally

```
nthlayer apply services/checkout-api.yaml --output-dir ./generated
```

Review the output

```
generated/
└── checkout-api/
    ├── dashboard.json      # Grafana dashboard
    ├── alerts.yaml         # Prometheus alert rules
    ├── recording_rules.yaml
    └── slo.yaml            # SLO definitions
```

Compare to your existing setup
Are the generated alerts better than what you have?
Does the dashboard cover what you need?
Are SLO targets reasonable for this service?
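
One way to make the first comparison question concrete is to diff alert names between your current rules and the generated ones. The sketch below is a minimal illustration, not part of NthLayer: it assumes both files are Prometheus rule files whose rules are written as `- alert: <Name>`, and the function names and the `existing/alerts.yaml` path in the usage line are hypothetical.

```shell
# List alert names defined in a Prometheus rule file, one per line, sorted.
# Assumes each alerting rule has a line of the form "- alert: <Name>".
alert_names() {
  sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*alert:[[:space:]]*//p' "$1" | sort -u
}

# Print the alerts that appear in only one of two rule files.
# Usage: compare_alerts <existing rules> <generated rules>
compare_alerts() {
  old=$(mktemp) && new=$(mktemp) || return 1
  alert_names "$1" > "$old"
  alert_names "$2" > "$new"
  echo "Only in $1:"
  comm -23 "$old" "$new" | sed 's/^/  /'
  echo "Only in $2:"
  comm -13 "$old" "$new" | sed 's/^/  /'
  rm -f "$old" "$new"
}
```

For example, `compare_alerts existing/alerts.yaml generated/checkout-api/alerts.yaml` lists alerts you would lose and alerts you would gain. It compares names only, so two alerts with the same name but different expressions still need a manual review.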

Success Criteria

[ ] Generated artifacts look correct
[ ] You understand what each file does
[ ] You've identified any customizations needed

What You Learn

How tier affects defaults
What NthLayer generates vs what you need to customize
Whether your service.yaml needs adjustments

Phase 2: Validate in CI

Goal: Run NthLayer in CI to catch issues early, but don't block deploys yet.

Duration: 1-2 weeks

Steps

Add NthLayer to your CI pipeline

```
# .github/workflows/ci.yml
- name: Generate and validate reliability config
  run: |
    pip install nthlayer
    nthlayer apply services/${{ matrix.service }}.yaml --lint
```

Enable verification in warning mode

```
- name: Verify metrics exist (warnings only)
  run: |
    nthlayer verify services/${{ matrix.service }}.yaml --no-fail
  env:
    PROMETHEUS_URL: ${{ secrets.PROMETHEUS_URL }}
```

Commit generated artifacts

```
- name: Check for uncommitted changes
  run: |
    git diff --exit-code generated/
```
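
The pipeline steps above run once per matrix entry. For a quick local pass before pushing, the same lint step can be looped over every spec. This is a minimal sketch that assumes specs live in `services/` as in the examples; `lint_all_specs` is an illustrative name, and only the `nthlayer apply <spec> --lint` invocation comes from this guide.

```shell
# Lint every service spec in a directory, continuing past failures so a
# single run reports all broken specs. Returns non-zero if any spec failed.
lint_all_specs() {
  dir=${1:-services}
  failed=0
  for spec in "$dir"/*.yaml; do
    [ -e "$spec" ] || continue   # the glob matched nothing
    echo "Linting $spec"
    nthlayer apply "$spec" --lint || failed=1
  done
  return "$failed"
}
```

Calling `lint_all_specs` with no argument checks `services/`; pass another directory to check specs stored elsewhere.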

What to Watch For

Lint failures: Invalid PromQL in generated alerts
Verification warnings: Missing metrics in Prometheus
Drift: Generated files that weren't committed
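
Note that `git diff --exit-code` only reports changes to tracked files, so a newly generated file that was never added to the repository would not count as drift. A stricter check can use `git status --porcelain`, which also lists untracked paths. The sketch below assumes artifacts live under `generated/` as in Phase 1; `check_drift` is an illustrative name.

```shell
# Fail when anything under the given directory is modified, deleted, or untracked.
# `git status --porcelain` prints one line per such path and nothing when clean.
check_drift() {
  dir=${1:-generated}
  changes=$(git status --porcelain -- "$dir") || return 2
  if [ -n "$changes" ]; then
    echo "Generated artifacts are out of date or uncommitted:" >&2
    echo "$changes" >&2
    return 1
  fi
  echo "No drift under $dir/"
}
```

Run it in CI after regenerating artifacts. Files matched by `.gitignore` are not reported, so the output directory must not be ignored.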

Success Criteria

[ ] CI runs NthLayer on every PR
[ ] Team reviews NthLayer output in PRs
[ ] No unexpected lint failures
[ ] Verification warnings are understood (not necessarily fixed)

What You Learn

Which services are missing required metrics
Whether your Prometheus setup works with NthLayer
Team comfort level with the generated artifacts

Phase 3: Protect in CD

Goal: Enable deployment gates that block risky deploys.

Duration: 2-4 weeks (gradual rollout)

Steps

Start with non-critical services

Pick 2-3 standard- or low-tier services first:

```
# service.yaml
tier: standard  # Start here, not critical
```

Enable check-deploy in warning mode

```
# CD pipeline
- name: Check deployment gate
  run: |
    nthlayer check-deploy services/${{ matrix.service }}.yaml || echo "Gate warning (not blocking)"
  env:
    PROMETHEUS_URL: ${{ secrets.PROMETHEUS_URL }}
```

Monitor for false positives

Track:
How often would deploys have been blocked?
Were those blocks justified?
Any false positives?

Graduate to blocking mode

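```
- name: Check deployment gate
  run: |
    nthlayer check-deploy services/${{ matrix.service }}.yaml
  env:
    PROMETHEUS_URL: ${{ secrets.PROMETHEUS_URL }}
  # Without the `|| echo` fallback, a non-zero exit (such as exit code 2) now fails the pipeline
```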

Expand to critical services

Only after confidence is built:

```
tier: critical  # Now enable for high-stakes services
```
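
While gates run in warning mode, the questions under "Monitor for false positives" are easier to answer with a record of each gate result. The sketch below is one hypothetical way to keep that record: `record_gate`, `summarize_gates`, and the log format are illustrative and not part of NthLayer, and counting exit code 2 as "would have blocked" follows the comment in the blocking-mode step.

```shell
# Run a gate command without ever failing the step, and append
# "<service> <exit code>" to a log file for later review.
# Usage: record_gate <log file> <service> <command...>
record_gate() {
  log=$1
  service=$2
  shift 2
  code=0
  "$@" || code=$?
  echo "$service $code" >> "$log"
  return 0
}

# Summarize the log: total runs and would-be blocks (exit code 2) per service.
summarize_gates() {
  awk '{ runs[$1]++; if ($2 == 2) blocked[$1]++ }
       END { for (s in runs) printf "%s runs=%d would_block=%d\n", s, runs[s], blocked[s] + 0 }' "$1" | sort
}
```

A CD step could call `record_gate gate.log checkout-api nthlayer check-deploy services/checkout-api.yaml`, then persist `gate.log` as a build artifact. Whether each would-be block was justified remains a human judgment.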

Rollout Schedule

| Week | Services | Mode |
|---|---|---|
| 1 | 2-3 low tier | Warning only |
| 2 | All low tier | Blocking |
| 3 | Standard tier | Warning only |
| 4 | Standard tier | Blocking |
| 5+ | Critical tier | Warning, then blocking |
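
The schedule moves each tier from warning to blocking at a different time. One way to avoid maintaining two step definitions is a small wrapper with a mode switch. This is a minimal sketch that assumes a hypothetical `GATE_MODE` variable set by the pipeline per service or tier; neither `run_gate` nor `GATE_MODE` is an NthLayer feature.

```shell
# Run a gate command in "warn" or "block" mode (default: warn).
# Usage: GATE_MODE=block run_gate nthlayer check-deploy services/checkout-api.yaml
run_gate() {
  code=0
  "$@" || code=$?
  if [ "$code" -eq 0 ]; then
    return 0
  fi
  case "${GATE_MODE:-warn}" in
    block)
      echo "Gate failed with exit code $code; blocking deploy" >&2
      return "$code"
      ;;
    warn)
      echo "Gate warning (not blocking): exit code $code" >&2
      return 0
      ;;
    *)
      echo "Unknown GATE_MODE: $GATE_MODE" >&2
      return 64
      ;;
  esac
}
```

Graduating a tier then means changing one variable rather than editing the step. Unlike a bare `|| echo`, the warning path prints the exit code, which helps separate gate decisions from tooling failures.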

Success Criteria

[ ] Gates correctly block deploys with exhausted error budgets
[ ] No false positives blocking valid deploys
[ ] Team trusts the gate decisions
[ ] Escalation path exists for gate overrides

What You Learn

Whether your SLO targets are realistic
How often services are actually at risk
Team response to automated enforcement

Common Adoption Patterns

Pattern A: Platform Team Drives

Platform team adopts NthLayer
Creates org-wide service templates
Onboards service teams one by one
Mandates adoption for new services

Best for: Organizations with strong platform teams

Pattern B: Service Team Experiments

One service team tries NthLayer
Shares results with other teams
Organic adoption spreads
Platform team eventually standardizes

Best for: Bottom-up engineering cultures

Pattern C: Incident-Driven

Major incident reveals monitoring gaps
NthLayer adopted for affected services
Expanded based on incident learnings
Eventually becomes standard

Best for: Organizations learning from failures

Rollback Plan

If adoption isn't working:

Phase 3 → Phase 2

Remove check-deploy from CD
Keep verify --no-fail in CI
Investigate why gates were problematic

Phase 2 → Phase 1

Remove NthLayer from CI
Continue using generated artifacts manually
Investigate lint/verify issues

Phase 1 → Nothing

Stop using NthLayer
Keep existing monitoring setup
Document what didn't work for future reference

Timeline Summary

| Milestone | Typical Duration |
|---|---|
| First service.yaml created | Day 1 |
| First generated artifacts reviewed | Day 1-3 |
| NthLayer running in CI | Week 1 |
| First service with blocking gate | Week 3-4 |
| All services with blocking gates | Month 2-3 |
| Full org standardization | Month 3-6 |

The key is incremental confidence: each phase proves value before the next adds risk.