Protect

Block risky deploys before they cause incidents.

NthLayer's protection layer enforces reliability policies in production. When a service's error budget is exhausted, deployments are blocked automatically.

The Protection Stack

Feature	Command	What It Does
Deployment Gates	`nthlayer check-deploy`	Block deploys when budget exhausted
Drift Detection	`nthlayer drift`	Detect reliability degradation trends
SLO Portfolio	`nthlayer portfolio`	Org-wide reliability visibility
Error Budgets	`nthlayer slo collect`	Real-time budget consumption

Why Deployment Gates?

Traditional monitoring tells you only after a deployment has caused problems:

Deploy → Incident → Page → Investigate → Rollback → Postmortem
                      ↑
               "The deploy broke it"

Deployment gates tell you before the deploy goes out:

Check Budget → Block/Allow → Deploy
      ↑
  "Budget exhausted - don't deploy"

Quick Start

1. Check Before Deploy

nthlayer check-deploy services/payment-api.yaml \
  --prometheus-url http://prometheus:9090

Output:

╭──────────────────────────────────────────────────────────────╮
│  Deployment Gate: payment-api                                │
╰──────────────────────────────────────────────────────────────╯

  Tier:          critical
  Window:        30d

  SLO Status:
    availability   99.87%  (target: 99.95%)   42% remaining   ⚠ WARNING
    latency_p99    187ms   (target: 200ms)    78% remaining   ✓ OK

  Decision: ⚠ PROCEED WITH CAUTION
  Exit code: 1

2. View Portfolio Health

nthlayer portfolio --path services/

Output:

╭──────────────────────────────────────────────────────────────╮
│  NthLayer SLO Portfolio                                      │
╰──────────────────────────────────────────────────────────────╯

  Organization Health: 78% (14/18 services meeting SLOs)

  By Tier:
    Critical:  ████████░░  83% (5/6 services)
    Standard:  ███████░░░  75% (6/8 services)
    Low:       ███████░░░  75% (3/4 services)

  ⚠ Services Needing Attention:
    payment-api    availability  156% burned  EXHAUSTED
    search-api     latency       95% burned   WARNING

Exit Codes

Deployment gates use exit codes for CI/CD integration:

Code	Decision	Pipeline Action
`0`	Approved	Deploy proceeds
`1`	Warning	Deploy with caution
`2`	Blocked	Fail pipeline
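
Most CI runners treat any non-zero exit code as a failure, so a warning (exit code 1) fails the job unless the pipeline handles it explicitly. The sketch below maps the exit codes from the table above to a pipeline action. The function name `gate_action` and the action labels are illustrative assumptions, not part of NthLayer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a check-deploy exit code into a pipeline action.
gate_action() {
  case "$1" in
    0) echo "deploy" ;;              # Approved
    1) echo "deploy-with-caution" ;; # Warning
    2) echo "fail" ;;                # Blocked
    *) echo "error" ;;               # Anything else: treat as a tooling failure
  esac
}

# Example with a literal exit code; in a pipeline, pass the "$?" captured
# immediately after running the gate.
gate_action 1
```

In a pipeline, one option is to run the gate with errexit disabled (`set +e`), capture `$?`, and exit non-zero only when the action is `fail` or `error`.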

Tier-Based Thresholds

Default thresholds vary by service tier:

Tier	Warning	Blocking
Critical	<20% remaining	<10% remaining
Standard	<20% remaining	None (advisory)
Low	<30% remaining	None (advisory)

Critical services block deploys once less than 10% of the error budget remains. Lower tiers only warn.
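
To make the table concrete, here is a minimal sketch of the default decision logic as the table describes it. It is a re-implementation for illustration only: the function name `default_gate_code` and the integer-percent input are assumptions, and NthLayer's internal logic may differ.

```shell
#!/usr/bin/env bash
# Hypothetical re-implementation of the default thresholds table.
# Usage: default_gate_code <tier> <budget-remaining-percent, integer>
# Prints 0 (approved), 1 (warning), or 2 (blocked).
default_gate_code() {
  local tier="$1" remaining="$2" warn block
  case "$tier" in
    critical) warn=20; block=10 ;;
    standard) warn=20; block="" ;;   # empty: no blocking threshold (advisory only)
    low)      warn=30; block="" ;;
    *) echo "unknown tier: $tier" >&2; return 64 ;;
  esac
  if [ -n "$block" ] && [ "$remaining" -lt "$block" ]; then
    echo 2
  elif [ "$remaining" -lt "$warn" ]; then
    echo 1
  else
    echo 0
  fi
}

default_gate_code critical 8    # prints 2
default_gate_code standard 8    # prints 1
default_gate_code low 45        # prints 0
```

The same 8% remaining budget blocks a critical service but only warns for a standard one, which is the behavior the table describes.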

Custom Policies

Override the defaults with a `DeploymentGate` resource:

resources:
  - kind: DeploymentGate
    name: strict-gate
    spec:
      thresholds:
        warning: 30
        blocking: 15

      conditions:
        # Stricter during business hours
        - name: business-hours
          when: "hour >= 9 AND hour <= 17 AND weekday"
          blocking: 20

        # Complete freeze when the budget is nearly exhausted
        - name: low-budget
          when: "budget_remaining < 5"
          blocking: 100

CI/CD Integration

GitHub Actions

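jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Check Deployment Gate
        # Any non-zero exit fails this step, so a warning (1) stops the
        # pipeline just like a block (2).
        run: |
          nthlayer check-deploy services/api.yaml \
            --prometheus-url ${{ secrets.PROMETHEUS_URL }}

      - name: Deploy
        if: success()  # Runs only when the gate exited 0
        run: kubectl apply -f generated/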

ArgoCD PreSync Hook

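apiVersion: batch/v1
kind: Job
metadata:
  name: deployment-gate
  annotations:
    argocd.argoproj.io/hook: PreSync
spec:
  backoffLimit: 0  # Do not retry a failed gate check
  template:
    spec:
      restartPolicy: Never  # Jobs require Never or OnFailure
      containers:
        - name: check
          image: ghcr.io/nthlayer/nthlayer:latest
          env:
            # Must be defined here for $(PROMETHEUS_URL) to expand below
            - name: PROMETHEUS_URL
              value: http://prometheus:9090
          command:
            - nthlayer
            - check-deploy
            - /config/service.yaml  # Mount the service spec at /config (volume not shown)
            - --prometheus-url
            - $(PROMETHEUS_URL)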

Drift Detection

Deployment gates check the current budget state. Drift detection looks at trends over time:

┌─────────────────────────────────────────────────────────────────────┐
│  Deployment Gate: "Is budget OK right now?"                         │
│  Drift Detection: "Is budget trending toward exhaustion?"           │
└─────────────────────────────────────────────────────────────────────┘

Analyze Drift

nthlayer drift services/payment-api.yaml \
  --prometheus-url http://prometheus:9090

Output:

╭──────────────────────────────────────────────────────────────╮
│  Drift Analysis: payment-api                                 │
╰──────────────────────────────────────────────────────────────╯

  Current Budget:     72.34%
  Trend:              -0.52%/week
  Pattern:            Gradual Decline
  Days to Exhaustion: 138 days

  ⚠ Severity: WARN
  Recommendation: Investigate recent changes.
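
For intuition about what a "days to exhaustion" figure represents, the sketch below does a naive linear extrapolation from the remaining budget and a weekly trend. This is an assumption for illustration: the helper `days_to_exhaustion` is hypothetical, and NthLayer's own projection may use a different model (for example, weighting recent data), so its reported figure need not match this arithmetic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: naive linear projection of budget exhaustion.
# Usage: days_to_exhaustion <budget-remaining-percent> <trend-percent-per-week>
days_to_exhaustion() {
  awk -v budget="$1" -v trend="$2" 'BEGIN {
    if (trend >= 0) { print "never"; exit }   # flat or improving: no projected exhaustion
    printf "%.0f\n", budget / -trend * 7       # weeks until zero, converted to days
  }'
}

days_to_exhaustion 60 -1.5   # prints 280
days_to_exhaustion 60 0.2    # prints never
```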

Drift Patterns

Pattern	What It Means	Action
Gradual Decline	Slow erosion over time	Investigate technical debt
Step Change Down	Sudden drop (incident/bad deploy)	Immediate investigation
Volatile	High variance, no trend	Check for intermittent issues
Stable	No significant trend	Continue monitoring

Integrate with Deployment Gates

Add drift analysis to your deployment gate check:

nthlayer check-deploy services/api.yaml \
  --prometheus-url http://prometheus:9090 \
  --include-drift

Org-Wide Drift View

nthlayer portfolio --drift --prometheus-url http://prometheus:9090

The Google SRE Connection

NthLayer automates the Error Budget Policy from the Google SRE Book:

"If our SLO says we can have 0.1% downtime per month, and we've already used 0.08%, we should be very careful about deploying new features."

SRE Concept	Manual Process	NthLayer Automation
Error Budget Policy	Spreadsheet tracking	`nthlayer check-deploy`
Release Freeze	Calendar reminders	Automatic blocking
Budget Visibility	Monthly reports	`nthlayer portfolio`
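
The arithmetic behind the quoted example can be sketched as follows: with 0.1% downtime allowed and 0.08% already used, 20% of the budget remains. The helper name `budget_remaining_pct` is hypothetical and only illustrates the calculation; it is not an NthLayer command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of the error budget still unspent.
# Usage: budget_remaining_pct <allowed-downtime-percent> <used-downtime-percent>
budget_remaining_pct() {
  awk -v allowed="$1" -v used="$2" 'BEGIN {
    printf "%.0f\n", (allowed - used) / allowed * 100
  }'
}

budget_remaining_pct 0.1 0.08   # prints 20
```

A result of 20 sits right at the boundary of the default warning threshold (<20% remaining) for critical and standard tiers. A negative result means the budget is overspent.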

Next Steps

Deployment Gates - Full command reference
Drift Detection - Trend analysis command
SLO Portfolio - Organization-wide view
Error Budgets - Understanding SLOs
CI/CD Integration - Pipeline examples