Deployment Gates

Deployment gates block or warn about deployments when a service's error budget is running low or exhausted. They are the core of NthLayer's Shift Left approach: catching reliability issues before they reach production.

Why Deployment Gates?

Traditional monitoring tells you about a bad deployment after it has caused problems. Deployment gates check before the deploy happens, enabling you to:

Prevent incidents by blocking deploys when reliability is already degraded
Make informed decisions with real-time error budget visibility
Enforce SLO discipline across your organization

How It Works

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   CI/CD     │────▶│  NthLayer   │────▶│ Prometheus  │
│  Pipeline   │     │check-deploy │     │   Query     │
└─────────────┘     └─────────────┘     └─────────────┘
                           │
                           ▼
                    ┌─────────────┐
                    │   Result    │
                    │ 0=Approved  │
                    │ 1=Warning   │
                    │ 2=Blocked   │
                    └─────────────┘

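Pipeline triggers: Your CI/CD pipeline calls nthlayer check-deploy
Query SLOs: NthLayer queries Prometheus for current SLI values
Calculate budget: NthLayer compares each SLI against its SLO objective to work out how much error budget remains
Return decision: NthLayer exits with 0 (approved), 1 (warning), or 2 (blocked), and the exit code determines whether the deploy proceeds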
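The Calculate budget step can be sketched with the standard SLO error-budget formula: the budget is the share of requests the objective allows to fail, and what remains is the part of that share not yet spent. This is a minimal sketch assuming NthLayer uses this conventional formula; `budget_remaining_pct` is a hypothetical name, not part of NthLayer's API.

```python
def budget_remaining_pct(sli: float, objective_pct: float) -> float:
    """Return the percentage of error budget left.

    sli: measured success ratio over the SLO window (e.g. 0.9998).
    objective_pct: SLO objective as a percentage (e.g. 99.95).
    Hypothetical helper; assumes the standard error-budget formula.
    """
    allowed_error = 1.0 - objective_pct / 100.0  # total budget as a ratio
    if allowed_error <= 0:
        raise ValueError("objective must be below 100%")
    observed_error = 1.0 - sli                   # errors actually spent
    consumed = observed_error / allowed_error    # fraction of budget used
    return max(0.0, (1.0 - consumed) * 100.0)    # an overspent budget reads as 0%


# A 99.95% SLO with a measured SLI of 99.98% has spent 40% of its budget.
print(round(budget_remaining_pct(0.9998, 99.95), 1))  # 60.0
```

The result is on the same "percent remaining" scale as the thresholds below, which is what lets a single number drive the gate decision.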

Default Thresholds

Default thresholds depend on the service tier and are expressed as the percentage of error budget remaining:

Tier	Warning	Blocking
Critical	<20% remaining	<10% remaining
Standard	<20% remaining	None (advisory)
Low	<30% remaining	None (advisory)

Critical services block deploys when less than 10% of the error budget remains. Standard and low tiers never block by default; they only warn.
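The tier table reduces to a small decision function. This sketch assumes the comparisons are strict (`<`), as the table's wording suggests, and maps the outcome onto the 0/1/2 exit codes from the How It Works diagram. `gate_decision` and `DEFAULT_THRESHOLDS` are illustrative names, not NthLayer internals.

```python
from typing import Optional

APPROVED, WARNING, BLOCKED = 0, 1, 2  # exit codes from the diagram

# Defaults from the table: (warn below %, block below %); None = never block.
DEFAULT_THRESHOLDS: dict[str, tuple[float, Optional[float]]] = {
    "critical": (20.0, 10.0),
    "standard": (20.0, None),
    "low": (30.0, None),
}


def gate_decision(tier: str, remaining_pct: float) -> int:
    """Return the exit code for a tier given the percent of budget remaining."""
    warning, blocking = DEFAULT_THRESHOLDS[tier]
    if blocking is not None and remaining_pct < blocking:
        return BLOCKED
    if remaining_pct < warning:
        return WARNING
    return APPROVED


print(gate_decision("critical", 8.0))  # 2 (blocked)
print(gate_decision("standard", 8.0))  # 1 (warning only)
```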

Custom Policies

Override the defaults by adding a DeploymentGate resource to your service.yaml:

resources:
  - kind: DeploymentGate
    name: custom-gate
    spec:
      thresholds:
        warning: 30   # warn when less than 30% of the budget remains
        blocking: 5   # block when less than 5% remains
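How a custom gate combines with the tier defaults isn't spelled out here. One plausible reading, sketched below under that assumption, is a per-key overlay: any threshold set in spec.thresholds replaces the tier default, and anything omitted falls back to it. `effective_thresholds` is a hypothetical helper.

```python
from typing import Optional

# Tier defaults: (warning %, blocking %); None means advisory only.
DEFAULTS: dict[str, tuple[float, Optional[float]]] = {
    "critical": (20.0, 10.0),
    "standard": (20.0, None),
    "low": (30.0, None),
}


def effective_thresholds(tier: str, custom: dict) -> tuple[float, Optional[float]]:
    """Overlay a DeploymentGate's spec.thresholds on the tier defaults.

    Keys present in `custom` win, including an explicit None (YAML null),
    which turns blocking off. Hypothetical merge rule, for illustration.
    """
    warning, blocking = DEFAULTS[tier]
    return (custom.get("warning", warning), custom.get("blocking", blocking))


print(effective_thresholds("critical", {"warning": 30, "blocking": 5}))  # (30, 5)
print(effective_thresholds("standard", {"blocking": 5}))                 # (20.0, 5)
```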

Conditional Thresholds

Apply a different blocking threshold when a condition matches, for example during business hours or when the budget is nearly gone:

resources:
  - kind: DeploymentGate
    name: smart-gate
    spec:
      thresholds:
        warning: 20
        blocking: 10

      conditions:
        # Stricter during business hours
        - name: business-hours
          when: "hour >= 9 AND hour <= 17 AND weekday"
          blocking: 15

        # Complete freeze during incidents
        - name: incident-freeze
          when: "budget_remaining < 5"
          blocking: 100
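To make the condition semantics concrete, here is a sketch that evaluates the two example conditions. It does not parse the `when` strings; each expression is hand-translated into a Python predicate. The precedence rule (the strictest matching `blocking` value wins) and the names `Context`, `Condition`, and `evaluate` are assumptions for illustration, not documented NthLayer behavior.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Context:
    hour: int                # local hour of day, 0-23
    weekday: bool            # True Monday through Friday
    budget_remaining: float  # percent of error budget left


@dataclass
class Condition:
    name: str
    when: Callable[[Context], bool]  # stands in for the `when` expression
    blocking: float


# The two conditions from the YAML above, hand-translated into predicates.
CONDITIONS = [
    Condition("business-hours", lambda c: 9 <= c.hour <= 17 and c.weekday, 15.0),
    Condition("incident-freeze", lambda c: c.budget_remaining < 5, 100.0),
]


def evaluate(ctx: Context, warning: float = 20.0, blocking: float = 10.0) -> int:
    """Return 0 (approved), 1 (warning) or 2 (blocked).

    Assumption: when several conditions match, the highest
    (strictest) blocking threshold wins.
    """
    for cond in CONDITIONS:
        if cond.when(ctx):
            blocking = max(blocking, cond.blocking)
    if ctx.budget_remaining < blocking:
        return 2
    if ctx.budget_remaining < warning:
        return 1
    return 0


print(evaluate(Context(hour=10, weekday=True, budget_remaining=12.0)))  # 2
print(evaluate(Context(hour=20, weekday=True, budget_remaining=12.0)))  # 1
```

Under these assumptions, a weekday deploy at 10:00 with 12% budget left is blocked (12 < 15), while the same deploy at 20:00 only warns.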

Team Exceptions

Allow specific teams to bypass gates:

spec:
  exceptions:
    - team: sre-oncall
      allow: always
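An exception can be read as a final override on the gate's result. This sketch assumes the deploying team's name is already known to the pipeline (how NthLayer identifies it isn't covered here) and that `allow: always` turns any result into an approval. `is_exempt` and `final_exit_code` are hypothetical helpers.

```python
def is_exempt(team: str, exceptions: list[dict]) -> bool:
    """True if `team` has an `allow: always` entry in spec.exceptions."""
    return any(
        exc.get("team") == team and exc.get("allow") == "always" for exc in exceptions
    )


def final_exit_code(gate_code: int, team: str, exceptions: list[dict]) -> int:
    """Assumption: an exempt team is approved (0) whatever the gate decided."""
    return 0 if is_exempt(team, exceptions) else gate_code


# Mirrors the spec.exceptions block above.
exceptions = [{"team": "sre-oncall", "allow": "always"}]
print(final_exit_code(2, "sre-oncall", exceptions))  # 0
print(final_exit_code(2, "payments", exceptions))    # 2 ("payments" is a made-up team)
```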

Integration Points

CI/CD Pipelines

NthLayer integrates with:

GitHub Actions - Reusable action
ArgoCD - PreSync hook
GitLab CI - Reusable template
Tekton - Reusable task

See examples/cicd/ for templates.

Manual Check

Test locally before integrating:

nthlayer check-deploy services/api.yaml \
  --prometheus-url http://prometheus:9090

echo "Exit code: $?"
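Most CI systems fail a step on any non-zero exit code, so a warning (exit code 1) would stop the pipeline unless the step handles it. The Python sketch below wraps the same command and fails only on a block or an unexpected code. It uses only the `--prometheus-url` flag shown above; `run_gate`, `describe`, and `pipeline_exit_code` are hypothetical names.

```python
import subprocess

# Exit codes documented in the How It Works diagram.
OUTCOMES = {0: "approved", 1: "warning", 2: "blocked"}


def describe(code: int) -> str:
    """Human-readable label for a check-deploy exit code."""
    return OUTCOMES.get(code, f"unexpected exit code {code}")


def pipeline_exit_code(gate_code: int) -> int:
    """Map the gate result to a CI step result.

    Approvals and warnings pass (0); a block or any unexpected code
    (for example, a crash) fails the step (1).
    """
    return 0 if gate_code in (0, 1) else 1


def run_gate(service_file: str, prometheus_url: str) -> int:
    """Run `nthlayer check-deploy` and return its raw exit code."""
    result = subprocess.run(
        ["nthlayer", "check-deploy", service_file, "--prometheus-url", prometheus_url]
    )
    return result.returncode
```

A pipeline step could then end with `sys.exit(pipeline_exit_code(run_gate("services/api.yaml", "http://prometheus:9090")))`. Letting warnings through is a policy choice; remove 1 from the passing set if warnings should fail the step too.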

Blast Radius Awareness

Gates take downstream dependencies into account. Declare them with a Dependencies resource:

resources:
  - kind: Dependencies
    name: downstream
    spec:
      services:
        - name: checkout-service
          criticality: high
        - name: analytics
          criticality: low

High-criticality downstream services increase the gate's caution level.
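The amount by which a high-criticality dependency tightens the gate isn't specified here. Purely to illustrate the idea, the sketch below raises both thresholds by a fixed, made-up number of percentage points when any downstream service is high-criticality. The bump values and the `adjusted_thresholds` function are hypothetical, not NthLayer's actual rule.

```python
from typing import Optional

# Hypothetical tightening, in percentage points, per downstream criticality.
CRITICALITY_BUMP = {"high": 10.0, "low": 0.0}


def adjusted_thresholds(
    warning: float, blocking: Optional[float], downstream: list[dict]
) -> tuple[float, Optional[float]]:
    """Raise thresholds by the largest bump among downstream services.

    Illustrative only: advisory gates (blocking=None) stay advisory,
    and thresholds are capped at 100%.
    """
    bump = max(
        (CRITICALITY_BUMP.get(svc.get("criticality"), 0.0) for svc in downstream),
        default=0.0,
    )
    new_warning = min(100.0, warning + bump)
    new_blocking = None if blocking is None else min(100.0, blocking + bump)
    return new_warning, new_blocking


# Mirrors the Dependencies resource above.
downstream = [
    {"name": "checkout-service", "criticality": "high"},
    {"name": "analytics", "criticality": "low"},
]
print(adjusted_thresholds(20.0, 10.0, downstream))  # (30.0, 20.0)
```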

Best Practices

1. Start with Advisory Mode

Begin with warnings only, then enable blocking:

spec:
  thresholds:
    warning: 20
    blocking: null  # Advisory only

2. Use Environment-Specific Thresholds

Be lenient in dev, strict in prod:

# dev environment
spec:
  thresholds:
    warning: 50
    blocking: null

# prod environment
spec:
  thresholds:
    warning: 20
    blocking: 10

3. Define SLO Queries

Gates can only calculate error budget for SLOs that have a working indicator query:

- kind: SLO
  name: availability
  spec:
    objective: 99.95
    window: 30d
    indicator:
      query: |
        sum(rate(http_requests_total{status!~"5.."}[5m]))
        /
        sum(rate(http_requests_total[5m]))
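As a worked example, the availability SLO above (objective 99.95 over 30d) leaves 0.05% of requests as error budget. The arithmetic below turns that into concrete numbers. The 10 million request volume is a made-up figure, and the outage-minutes figure assumes traffic is spread evenly across the window.

```python
# Error budget implied by the example SLO: objective 99.95% over a 30-day window.
objective_pct = 99.95
window_days = 30

budget_ratio = 1 - objective_pct / 100   # 0.05% of requests may fail
window_minutes = window_days * 24 * 60   # 43,200 minutes

# For a request-based SLI like the query above, the budget is a request count.
requests_in_window = 10_000_000          # hypothetical traffic volume
allowed_failures = requests_in_window * budget_ratio

# The same ratio as minutes of total outage, assuming uniform traffic.
allowed_outage_minutes = window_minutes * budget_ratio

print(f"{allowed_failures:.0f} failed requests")       # 5000 failed requests
print(f"{allowed_outage_minutes:.1f} outage minutes")  # 21.6 outage minutes
```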

4. Test Before Production

Use demo mode to see how the gate behaves:

nthlayer check-deploy services/api.yaml --demo