Skip to content

Service YAML Schema

Complete reference for service specification files.

Full Schema

# Required fields
name: string              # Service identifier (lowercase, hyphens)
tier: integer | string    # 1/2/3 or critical/standard/low
type: string              # api, worker, stream

# Optional fields
team: string              # Team name
description: string       # Human-readable description

# Technology dependencies
dependencies:
  - string                # postgresql, redis, kafka, etc.

# Custom resource definitions
resources:
  - kind: string          # SLO, Alert, PagerDuty, Dependencies
    name: string          # Resource identifier
    spec: object          # Resource-specific configuration

# Environment-specific overrides
environments:
  production: object      # Override for production
  staging: object         # Override for staging

Field Reference

name

Required | string

Service identifier. Must be:

  • Lowercase letters, numbers, hyphens only
  • Cannot start or end with hyphen
  • Unique within your organization
name: payment-api        # Valid
name: Payment-API        # Invalid (uppercase)
name: -payment-api       # Invalid (starts with hyphen)

tier

Required | integer or string

Service priority level:

Value Meaning Default SLO
1 / critical Business-critical 99.95%
2 / standard Important 99.9%
3 / low Best-effort 99.5%
tier: 1
# or
tier: critical

type

Required | string

Service type determines default metrics and SLOs:

Type Description
api HTTP/REST service
worker Background job processor
stream Event/message processor

team

Optional | string

Team responsible for the service. Used for:

  • PagerDuty team creation
  • Dashboard grouping
  • Alert routing
team: payments

dependencies

Optional | list[string]

Technology dependencies. Each adds dashboard panels and alerts.

dependencies:
  - postgresql
  - redis
  - kafka

Supported values: See Technologies

resources

Optional | list[Resource]

Custom resource definitions.

SLO Resource

resources:
  - kind: SLO
    name: availability      # Unique name
    spec:
      objective: 99.95      # Target percentage
      window: 30d           # Time window
      threshold_ms: 200     # For latency SLOs
      indicator:
        type: availability  # availability, latency, throughput
        percentile: 99      # For latency SLOs
        query: |            # PromQL query
          sum(rate(...))

Alert Resource

resources:
  - kind: Alert
    name: high-error-rate
    spec:
      expr: |               # PromQL expression
        sum(rate(http_requests_total{status=~"5.."}[5m])) /
        sum(rate(http_requests_total[5m])) > 0.01
      for: 5m               # Duration before firing
      severity: critical    # critical, warning, info
      labels:
        team: payments
      annotations:
        summary: "High error rate"
        description: "{{ $value | humanizePercentage }}"

PagerDuty Resource

resources:
  - kind: PagerDuty
    name: alerting
    spec:
      urgency: high         # high, low
      auto_create: true     # Create team/service automatically
      service_id: PXXXXXX   # Use existing service
      escalation_policy: PXXXXXX

Dependencies Resource

resources:
  - kind: Dependencies
    name: upstream
    spec:
      services:
        - name: user-service
          criticality: high
        - name: inventory-service
          criticality: medium

environments

Optional | object

Environment-specific overrides:

environments:
  production:
    tier: critical
    resources:
      - kind: SLO
        name: availability
        spec:
          objective: 99.99

  staging:
    tier: standard
    resources:
      - kind: SLO
        name: availability
        spec:
          objective: 99.5

Complete Example

name: payment-api
team: payments
tier: critical
type: api
description: Handles payment processing

dependencies:
  - postgresql
  - redis

resources:
  - kind: SLO
    name: availability
    spec:
      objective: 99.95
      window: 30d
      indicator:
        type: availability
        query: |
          sum(rate(http_requests_total{service="payment-api",status!~"5.."}[5m])) /
          sum(rate(http_requests_total{service="payment-api"}[5m]))

  - kind: SLO
    name: latency-p99
    spec:
      objective: 99.0
      window: 30d
      threshold_ms: 200
      indicator:
        type: latency
        percentile: 99
        query: |
          histogram_quantile(0.99,
            sum by (le) (rate(http_request_duration_seconds_bucket{service="payment-api"}[5m]))
          )

  - kind: Alert
    name: high-error-rate
    spec:
      expr: |
        sum(rate(http_requests_total{service="payment-api",status=~"5.."}[5m])) /
        sum(rate(http_requests_total{service="payment-api"}[5m])) > 0.01
      for: 5m
      severity: critical

  - kind: PagerDuty
    name: alerting
    spec:
      urgency: high
      auto_create: true

environments:
  production:
    resources:
      - kind: SLO
        name: availability
        spec:
          objective: 99.99

Validation

nthlayer validate payment-api.yaml