Health Checks

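Stategraph provides two health check endpoints, a liveness probe and a readiness probe, for deployment environments such as ECS/ALB, Kubernetes, and other container orchestrators. A legacy endpoint is also kept for backwards compatibility.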

Endpoints

Liveness Probe

GET /health/live

Returns 200 OK as long as the nginx process is running. This endpoint is served directly by nginx and does not depend on the backend application or database.

Use this for:
- ALB target group health checks
- ECS container health checks
- Kubernetes liveness probes

Readiness Probe

GET /health/ready

Returns 200 OK only when the backend application is running and ready to serve requests. nginx proxies this endpoint to the backend, so it returns a non-200 status while the backend has not started yet, for example during database migrations.

Use this for:
- Kubernetes readiness probes
- Load balancer routing decisions (only send traffic to ready instances)
- Monitoring systems
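Deployment scripts and smoke tests often need to wait for readiness before continuing. The sketch below polls /health/ready until it returns 200 or a deadline passes. It is a minimal illustration, assuming the container is reachable at http://localhost:8080 as in the examples on this page; the helper names http_status and wait_until_ready are hypothetical and not part of Stategraph.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status of a GET to url, or None if no response arrives (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (e.g. 502 while migrations run) are raised as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: nothing answered at all.
        return None


def wait_until_ready(
    fetch: Callable[[], Optional[int]],
    timeout: float = 120.0,  # hypothetical default, similar to the 120s startPeriod used below
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll fetch() until it returns 200 (True) or timeout seconds elapse (False)."""
    deadline = clock() + timeout
    while True:
        if fetch() == 200:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_until_ready(lambda: http_status("http://localhost:8080/health/ready"))` returns True once migrations have finished and the backend answers. The clock and sleep functions are injectable so the loop can be exercised without real waiting.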

Legacy Endpoint

GET /api/v1/health

This endpoint is served by the backend application and behaves like /health/ready: it returns 200 OK only when the backend is running. It is kept for backwards compatibility; new deployments should use /health/live and /health/ready.

Startup Behavior

During container startup, Stategraph runs database migrations before starting the backend HTTP server. The timeline looks roughly like this (times are approximate seconds after container start; the migration step can take longer):

t=0   Container starts
t=1   nginx starts listening on port 8080
t=1   /health/live returns 200
t=2   Database migrations begin
t=5+  Migrations complete, backend starts
t=5+  /health/ready returns 200

During the migration window:
- /health/live returns 200 (nginx is up)
- /health/ready returns 502 (backend not yet listening)

After migrations complete:
- /health/live returns 200
- /health/ready returns 200
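Taken together, the two status codes tell you which startup phase a container is in. A minimal sketch of that mapping, assuming None stands for "no HTTP response at all"; the function name and phase labels are hypothetical:

```python
from typing import Optional


def startup_phase(live: Optional[int], ready: Optional[int]) -> str:
    """Map /health/live and /health/ready status codes to a startup phase.

    None means the request got no HTTP response (e.g. connection refused).
    Function name and phase labels are hypothetical.
    """
    if live != 200:
        return "down"      # nginx is not answering, so the container is not live
    if ready == 200:
        return "ready"     # migrations finished and the backend is serving
    return "starting"      # nginx is up but the backend is not listening yet (e.g. 502)
```

The mapping is stateless: a 502 from /health/ready on a container that was previously ready (for example, after a backend crash) is also reported as "starting".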

ALB / ECS Configuration

Recommended Settings

Setting	Value	Reason
Health check path	`/health/live`	Available immediately, survives migration window
Health check interval	30s	Standard interval
Healthy threshold	2	Two consecutive successes
Unhealthy threshold	5	Tolerant during startup
Health check grace period (ECS service)	120s	Allow time for migrations on first deploy
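The threshold settings translate into approximate detection times: a target needs healthy_threshold consecutive passing checks to become healthy and unhealthy_threshold consecutive failures to become unhealthy, with one check per interval. The sketch below does that arithmetic. It is a rough estimate that ignores the per-check timeout and scheduling jitter, and the HealthCheck class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthCheck:
    """Hypothetical holder for ALB/ECS-style threshold settings."""
    interval_s: int
    healthy_threshold: int
    unhealthy_threshold: int

    def seconds_to_healthy(self) -> int:
        # One check per interval; healthy_threshold consecutive passes are needed.
        return self.healthy_threshold * self.interval_s

    def seconds_to_unhealthy(self) -> int:
        # unhealthy_threshold consecutive failures are needed.
        return self.unhealthy_threshold * self.interval_s


recommended = HealthCheck(interval_s=30, healthy_threshold=2, unhealthy_threshold=5)
print(recommended.seconds_to_healthy(), recommended.seconds_to_unhealthy())  # 60 150
```

With the recommended values, a target becomes healthy after roughly 60s of passing checks and is marked unhealthy after roughly 150s of failures. Raising unhealthy_threshold to 10 stretches the failure window to roughly 300s.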

ECS Task Definition

{
  "healthCheck": {
    "command": ["CMD-SHELL", "curl -f http://localhost:8080/health/live || exit 1"],
    "interval": 30,
    "timeout": 5,
    "retries": 5,
    "startPeriod": 120
  }
}

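The startPeriod of 120 seconds gives migrations time to complete. ECS still runs the check during this window, but failures do not count toward retries until the period ends or a check first succeeds. The command relies on curl being available inside the container image.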
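To make the startPeriod behavior concrete, here is a simplified model of how ECS evaluates container health checks: failures inside the start period are ignored until the first success, after which every failure counts, and the container is marked UNHEALTHY after `retries` consecutive counted failures. This is an approximation based on the ECS documentation's description, not ECS's actual implementation; the function name and input format are hypothetical.

```python
from typing import Iterable, Tuple


def ecs_container_health(
    results: Iterable[Tuple[float, bool]],
    retries: int,
    start_period: float,
) -> str:
    """Simplified model of ECS container health evaluation (hypothetical helper).

    results: (seconds since container start, check passed) pairs, in time order.
    Returns "UNKNOWN", "HEALTHY", or "UNHEALTHY".
    """
    status = "UNKNOWN"
    counted_failures = 0
    seen_success = False
    for t, passed in results:
        if passed:
            # Any success marks the container healthy and resets the failure count.
            status, counted_failures, seen_success = "HEALTHY", 0, True
        elif t < start_period and not seen_success:
            # Inside startPeriod, failures are ignored until the first success.
            pass
        else:
            counted_failures += 1
            if counted_failures >= retries:
                status = "UNHEALTHY"
    return status
```

For example, if the check fails every 30 seconds from t=0 and never recovers, only the failures at t=120 and later count, so with retries=5 the container becomes UNHEALTHY at the t=240 check.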

ALB Target Group

resource "aws_lb_target_group" "stategraph" {
  # ...

  health_check {
    path                = "/health/live"
    interval            = 30
    timeout             = 5
    healthy_threshold   = 2
    unhealthy_threshold = 5
  }
}

Verifying Readiness Before Routing Traffic

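If you want the ALB to route traffic only to targets whose backend is fully ready, use /health/ready as the ALB health check path instead. This check fails for the entire migration window, so raise the unhealthy threshold and set a longer health check grace period on the ECS service (180s) to account for migration time: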

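health_check {
  path                = "/health/ready"
  interval            = 30
  timeout             = 5
  healthy_threshold   = 2
  unhealthy_threshold = 10  # tolerate a longer migration window
}

resource "aws_ecs_service" "stategraph" {
  # ...
  health_check_grace_period_seconds = 180
}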

Kubernetes Configuration

livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 10