Health Check API

The Health Check API provides operational endpoints for monitoring service availability and health status. These endpoints are designed for use with container orchestration platforms (Kubernetes, Docker Swarm) and monitoring systems.

Endpoints Summary

Method	Path	Description
`GET`	`/health`	Basic health check with version info
`GET`	`/health/live`	Liveness probe (server responsive)
`GET`	`/health/ready`	Readiness probe (database connectivity)
`GET`	`/metrics`	Prometheus metrics endpoint

GET /health

Basic health check that confirms the service process is running and reports the application version.

Request

curl http://localhost:8080/health

Response

Status: 200 OK

{
  "status": "ok",
  "version": "0.1.0"
}

Response Fields

Field	Type	Description
`status`	string	Always “ok” while the service is running
`version`	string	Current application version from Cargo.toml
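
As a rough illustration of consuming this response, the Python sketch below validates the payload and extracts the version. The function names `parse_health` and `fetch_version` are hypothetical helpers, not part of the service; the base URL is taken from the curl example above.

```python
import json
from urllib.request import urlopen


def parse_health(body: bytes) -> str:
    """Return the reported version if the payload says status "ok".

    Hypothetical helper: raises ValueError for any other status value.
    """
    payload = json.loads(body)
    if payload.get("status") != "ok":
        raise ValueError(f"unexpected health status: {payload.get('status')!r}")
    return payload["version"]


def fetch_version(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> str:
    """Call GET /health on a running instance and return its version string."""
    with urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return parse_health(resp.read())
```

Separating parsing from the network call keeps the validation logic easy to test without a running server.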

GET /health/live

Liveness probe endpoint that performs a minimal check to confirm the server process is responsive. This endpoint is intended for container orchestration systems to detect if the application needs to be restarted.

Request

curl http://localhost:8080/health/live

Response

Status: 200 OK

No response body; only the HTTP status code is returned.

Usage

Configure Kubernetes liveness probe:

livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30
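
Outside Kubernetes, the same check can be scripted. The sketch below is a minimal Python probe, assuming the default address from the curl example; `is_live` is a hypothetical name. Because the endpoint returns no body, only the status code is inspected.

```python
from urllib.request import urlopen


def is_live(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True only if GET /health/live answers 200 within the timeout."""
    try:
        with urlopen(f"{base_url}/health/live", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and non-2xx responses:
        # urllib's URLError and HTTPError are both OSError subclasses.
        return False
```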

GET /health/ready

Readiness probe endpoint that verifies the service is ready to accept traffic by checking database connectivity. This endpoint executes a SELECT 1 query against the database to confirm the connection is active.

Request

curl http://localhost:8080/health/ready

Response - Healthy

Status: 200 OK

{
  "status": "ready",
  "version": "0.1.0",
  "database": "connected"
}

Response - Unhealthy

Status: 503 Service Unavailable

{
  "status": "unhealthy",
  "version": "0.1.0",
  "database": "disconnected"
}

Response Fields

Field	Type	Description
`status`	string	“ready” when healthy, “unhealthy” when database is disconnected
`version`	string	Current application version from Cargo.toml
`database`	string	“connected” or “disconnected” based on database query result
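
The mapping from the database check to the two responses above can be sketched as follows. This is a Python illustration using the standard `sqlite3` module, not the service's actual implementation; the `readiness` function and the hard-coded `VERSION` constant are assumptions made for the example.

```python
import sqlite3

VERSION = "0.1.0"  # assumption: the real service reads this from Cargo.toml


def readiness(conn: sqlite3.Connection) -> tuple[int, dict]:
    """Run SELECT 1 and return (HTTP status code, JSON-serializable body)."""
    try:
        conn.execute("SELECT 1").fetchone()
        connected = True
    except sqlite3.Error:
        # Any database error (including a closed connection) counts as disconnected.
        connected = False

    body = {
        "status": "ready" if connected else "unhealthy",
        "version": VERSION,
        "database": "connected" if connected else "disconnected",
    }
    return (200 if connected else 503), body
```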

Usage

Configure Kubernetes readiness probe:

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
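
Deployment scripts that run outside an orchestrator sometimes need to wait for readiness themselves. The sketch below shows one way to poll, assuming a caller-supplied `probe` callable that returns the HTTP status code of `GET /health/ready`; `wait_until_ready` and its parameters are hypothetical, with the default delay mirroring the `periodSeconds` value above.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], int],
    attempts: int = 30,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() up to `attempts` times; return True on the first 200."""
    for attempt in range(attempts):
        if probe() == 200:
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Passing `sleep` as a parameter lets tests substitute a no-op instead of waiting in real time.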

GET /metrics

Exposes operational metrics in the Prometheus text exposition format.

Request

curl http://localhost:8080/metrics

Response

Status: 200 OK

Content-Type: text/plain; version=0.0.4

Returns metrics in Prometheus text exposition format:

# HELP sso_http_request_duration_seconds HTTP request duration in seconds by method, route pattern, and status class
# TYPE sso_http_request_duration_seconds histogram
sso_http_request_duration_seconds_bucket{method="GET",route="/api/user",status="2xx",le="0.005"} 142
sso_http_request_duration_seconds_bucket{method="GET",route="/api/user",status="2xx",le="0.01"} 287
...

# HELP sso_db_pool_connections_total Current number of connections in the database pool
# TYPE sso_db_pool_connections_total gauge
sso_db_pool_connections_total{backend="sqlite"} 5

# HELP sso_job_queue_depth Number of pending jobs in the system job queue
# TYPE sso_job_queue_depth gauge
sso_job_queue_depth 0

Available Metrics

Metric	Type	Labels	Description
`sso_http_request_duration_seconds`	Histogram	`method`, `route`, `status`	HTTP request latency
`sso_db_pool_connections_total`	Gauge	`backend`	Current pool connections
`sso_db_pool_connections_idle`	Gauge	`backend`	Idle pool connections
`sso_db_pool_connections_max`	Gauge	`backend`	Max configured connections
`sso_job_queue_depth`	Gauge	-	Pending background jobs
`sso_job_processing_duration_seconds`	Histogram	`job_type`	Job execution latency
`sso_webhook_delivery_latency_seconds`	Histogram	-	Webhook delivery time
`sso_active_users_total`	Gauge	-	Total active users
`sso_total_organizations`	Gauge	-	Total organizations
`sso_mfa_enabled_users_total`	Gauge	-	Users with MFA enabled
`sso_mfa_adoption_percentage`	Gauge	-	MFA adoption rate
`sso_login_failures_total`	Counter	`reason`	Failed login attempts
`sso_auth_tokens_issued_total`	Counter	-	Tokens issued
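
For ad hoc inspection without a Prometheus server, the text output can be parsed directly. The following Python sketch handles the common `name{labels} value` sample lines shown above; it is a simplified parser written for illustration (it does not unescape label values or interpret timestamps), and `parse_metrics` is a hypothetical name.

```python
import re

# name, optional {labels}, value, optional integer timestamp
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list:
    """Return a list of (metric_name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue  # skip anything that is not a sample line, e.g. "..."
        name, raw_labels, raw_value = match.groups()
        try:
            value = float(raw_value)  # float() also accepts "+Inf" and "NaN"
        except ValueError:
            continue
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, value))
    return samples
```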

Prometheus Configuration

Add to your prometheus.yml:

scrape_configs:
  - job_name: 'sso'
    static_configs:
      - targets: ['sso-server:8080']
    scrape_interval: 15s
    metrics_path: /metrics

Example PromQL Queries

# P95 HTTP request latency across all methods, routes, and status classes
histogram_quantile(0.95, sum by (le) (rate(sso_http_request_duration_seconds_bucket[5m])))

# Request rate by route
sum by (route) (rate(sso_http_request_duration_seconds_count[5m]))

# Connection pool utilization
sso_db_pool_connections_total / sso_db_pool_connections_max

# Background job backpressure
sso_job_queue_depth > 100
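
To make the latency query less opaque, the Python sketch below approximates how `histogram_quantile` estimates a quantile from cumulative bucket counts: it finds the bucket containing the target rank and interpolates linearly inside it. This is a simplified illustration rather than Prometheus's code; it works on raw cumulative counts instead of per-second rates, restricts `q` to (0, 1], and the bucket values in the usage note are partly hypothetical.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    The list must include the +Inf bucket, whose count is the total.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only supports 0 < q <= 1")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan
```

For example, with the two bucket counts from the sample output (142 at `le="0.005"`, 287 at `le="0.01"`) and a hypothetical total of 300 in the `+Inf` bucket, `histogram_quantile(0.5, [(0.005, 142), (0.01, 287), (math.inf, 300)])` lands in the second bucket and interpolates between 0.005 and 0.01.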

Implementation Details

Authentication: None required. All health and metrics endpoints are publicly accessible.
Rate Limiting: Health endpoints are not rate-limited.
Database Check: The /health/ready endpoint executes SELECT 1 to verify database connectivity. This is a lightweight query that works across SQLite, PostgreSQL, and MySQL.
Response Time: All endpoints are designed to respond quickly (under 100 ms under normal conditions).
Metrics Update: Gauge metrics (users, organizations, pool stats) are refreshed every 30 seconds by a background task.
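
The periodic gauge refresh described above can be pictured with the following Python sketch. It is an illustration of the pattern only, not the service's implementation; `PeriodicRefresher` and the `refresh` callback are hypothetical names, and the 30-second default comes from the Metrics Update note.

```python
import threading
from typing import Callable


class PeriodicRefresher:
    """Call `refresh` immediately, then every `interval` seconds until stopped."""

    def __init__(self, refresh: Callable[[], None], interval: float = 30.0):
        self._refresh = refresh
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        """Signal the loop to exit and wait for the thread to finish."""
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        while True:
            self._refresh()
            # Event.wait returns True as soon as stop() is called,
            # so shutdown does not have to wait out a full interval.
            if self._stop.wait(self._interval):
                break
```

One consequence of this design is that gauge values read from /metrics can be stale by up to one interval.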