Service Specifications¶
A service spec is a YAML file that describes your service and its reliability requirements.
Structure¶
# Basic metadata
name: payment-api
team: payments
tier: critical # 1, 2, 3 or critical, standard, low
type: api # api, worker, stream
# Dependencies (optional)
dependencies:
- postgresql
- redis
# Custom resources (optional)
resources:
- kind: SLO
name: availability
spec:
objective: 99.95
...
Required Fields¶
| Field | Description |
|---|---|
name | Service identifier (lowercase, hyphens allowed) |
tier | Priority level: 1/critical, 2/standard, 3/low |
type | Service type: api, worker, stream |
Optional Fields¶
| Field | Description |
|---|---|
team | Team responsible for the service |
description | Human-readable description |
dependencies | List of technology dependencies |
resources | Custom SLO, alert, or PagerDuty configs |
Service Types¶
API (type: api)¶
HTTP/REST services with request/response patterns.
Default Metrics:
http_requests_total- Request count by statushttp_request_duration_seconds- Latency histogram
Default SLOs:
- Availability: % of non-5xx responses
- Latency: p99 response time
Worker (type: worker)¶
Background job processors.
Default Metrics:
job_processed_total- Jobs completedjob_duration_seconds- Processing timejob_failed_total- Failed jobs
Default SLOs:
- Throughput: Jobs processed per minute
- Success rate: % of successful jobs
Stream (type: stream)¶
Event/message processors.
Default Metrics:
messages_processed_total- Messages handledconsumer_lag- Messages behindprocessing_duration_seconds- Processing time
Default SLOs:
- Throughput: Messages per second
- Lag: Consumer lag in messages
Dependencies¶
List technologies your service depends on:
Each dependency adds:
- Dashboard panels with technology-specific metrics
- Alerts for common failure modes
- Recording rules for aggregation
See Technologies for all 18 supported technologies.
Custom Resources¶
Override defaults or add custom configurations:
Custom SLO¶
resources:
- kind: SLO
name: availability
spec:
objective: 99.99 # Override default
window: 7d # Custom window
indicator:
type: availability
query: |
sum(rate(http_requests_total{service="payment-api",status!~"5.."}[5m])) /
sum(rate(http_requests_total{service="payment-api"}[5m]))
Custom Alert¶
resources:
- kind: Alert
name: high-latency
spec:
expr: |
histogram_quantile(0.99,
sum by (le) (rate(http_request_duration_seconds_bucket{service="payment-api"}[5m]))
) > 0.5
for: 5m
severity: warning
annotations:
summary: "High latency on payment-api"
PagerDuty¶
Environment Overrides¶
Different settings per environment:
# payment-api.yaml
name: payment-api
tier: critical
type: api
environments:
production:
resources:
- kind: SLO
name: availability
spec:
objective: 99.99
staging:
resources:
- kind: SLO
name: availability
spec:
objective: 99.9
Validation¶
Validate your spec before applying:
✓ payment-api.yaml is valid
- 2 SLOs defined
- 2 dependencies (postgresql, redis)
- PagerDuty integration enabled
Examples¶
See the examples/services directory for complete examples.