Metrics and Performance Monitoring
The essential playbook for implementing metrics and performance monitoring in your SaaS.
Intro
Set up a small but production-ready monitoring baseline for SaaS performance.
Track:
- request latency
- throughput
- error rate
- CPU
- memory
- disk
- database timings
- worker/queue health
- critical business endpoints
The goal is to detect slow requests, resource bottlenecks, and regressions before users report them. Do not build a full observability stack too early.
Quick Fix / Quick Setup
Use this as a minimal baseline on a Linux VPS.
# 1) Install Node Exporter for host metrics
NODE_EXPORTER_VERSION=1.8.2  # check the node_exporter releases page for the current version
wget https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz
tar -xvf node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz
sudo cp node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64/node_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false node_exporter || true
cat <<'EOF' | sudo tee /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=default.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter

# 2) Install cAdvisor for container metrics if using Docker
docker run -d \
--name=cadvisor \
--restart=unless-stopped \
-p 8080:8080 \
-v /:/rootfs:ro \
-v /var/run:/var/run:ro \
-v /sys:/sys:ro \
-v /var/lib/docker/:/var/lib/docker:ro \
gcr.io/cadvisor/cadvisor:latest

# 3) Add a simple health + timing middleware in FastAPI
# app/main.py
import time
from fastapi import FastAPI, Request
import logging
app = FastAPI()
logger = logging.getLogger("perf")
@app.middleware("http")
async def timing_middleware(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    duration_ms = round((time.perf_counter() - start) * 1000, 2)
    logger.info("request_completed", extra={
        "path": request.url.path,
        "method": request.method,
        "status_code": response.status_code,
        "duration_ms": duration_ms,
    })
    response.headers["X-Response-Time-ms"] = str(duration_ms)
    return response
@app.get("/health")
def health():
    return {"ok": True}

# 4) Verify metrics endpoints
curl http://127.0.0.1:9100/metrics | head
curl http://127.0.0.1:8080/metrics | head
curl -i http://127.0.0.1:8000/health

# 5) Alert on these first
- p95 latency > 1000ms for 5m
- error rate > 2% for 5m
- CPU > 85% for 10m
- memory > 85% for 10m
- disk usage > 80%
- DB connections near max

Start with host metrics, request latency, error rate, and one health endpoint. Add database and background job metrics next. Do not begin with dozens of dashboards you will not use.
What’s happening
Performance monitoring should answer four questions:
- is the app up
- is it fast enough
- is it failing
- what resource is limiting it
For a small SaaS, the useful baseline is:
- RED metrics: request Rate, Errors, Duration
- system metrics: CPU, memory, disk, network
- database timings
- worker/queue health
- dependency latency
Common production slowdowns usually come from:
- unindexed queries
- blocked workers
- insufficient CPU or RAM
- oversized responses
- external API latency
- queue backlog
- lock contention
- bad deploys
Server graphs alone are not enough. Infrastructure metrics can show pressure, but they do not tell you which route, query, job, or dependency is slow.
Monitoring is operational only when it includes alerts and response paths.
Step-by-step implementation
1) Define practical service targets
Pick a small set of SLO-like thresholds.
Example:
- API p95 latency: under 1000ms
- 5xx rate: under 2%
- login endpoint p95: under 700ms
- checkout endpoint p95: under 1500ms
- queue delay: under 60s
- uptime probe: passing every minute
Do not define dozens of targets.
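To check whether current traffic actually meets targets like these, compute percentiles from a sample of request durations. A minimal nearest-rank sketch (the `durations_ms` values are hypothetical sample data; the 1000ms threshold mirrors the API target above):

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# Hypothetical request durations in milliseconds
durations_ms = [120, 95, 400, 220, 180, 1300, 150, 90, 310, 600]
p95 = percentile(durations_ms, 95)
print(f"p95={p95}ms, within 1000ms target: {p95 < 1000}")
```

Nearest-rank over a recent window is enough for validating targets by hand; metrics systems typically approximate the same value from histogram buckets.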
2) Instrument request timing in the app
You need:
- method
- normalized route
- status code
- duration
- request ID if possible
FastAPI timing middleware example:
import time
import uuid
import logging
from fastapi import FastAPI, Request
app = FastAPI()
logger = logging.getLogger("app")
def normalize_path(path: str) -> str:
    # Replace dynamic IDs where possible to avoid high-cardinality labels
    parts = path.strip("/").split("/")
    normalized = []
    for p in parts:
        if p.isdigit():
            normalized.append(":id")
        elif len(p) > 20 and "-" in p:
            normalized.append(":token")
        else:
            normalized.append(p)
    return "/" + "/".join(normalized) if normalized != [""] else "/"
@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    request_id = request.headers.get("x-request-id", str(uuid.uuid4()))
    start = time.perf_counter()
    response = await call_next(request)
    duration_ms = round((time.perf_counter() - start) * 1000, 2)
    route = normalize_path(request.url.path)
    logger.info("request_completed", extra={
        "request_id": request_id,
        "method": request.method,
        "route": route,
        "status_code": response.status_code,
        "duration_ms": duration_ms,
    })
    response.headers["X-Request-ID"] = request_id
    response.headers["X-Response-Time-ms"] = str(duration_ms)
    return response

If you already have structured logging, send these fields to logs and build latency/error graphs from them. Pair this with Logging Setup (Application + Server).
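Once these fields land in structured logs, per-route p95 and error rate can be derived from them directly. A sketch under the assumption that each request_completed record is available as a dict with the same field names (the sample records are hypothetical):

```python
import math
from collections import defaultdict

# Hypothetical "request_completed" records, matching the middleware's fields
records = [
    {"route": "/users/:id", "status_code": 200, "duration_ms": 120.0},
    {"route": "/users/:id", "status_code": 500, "duration_ms": 900.0},
    {"route": "/health", "status_code": 200, "duration_ms": 3.0},
]

durations = defaultdict(list)
errors = defaultdict(int)
for r in records:
    durations[r["route"]].append(r["duration_ms"])
    if r["status_code"] >= 500:
        errors[r["route"]] += 1

for route, ds in durations.items():
    ds.sort()
    p95 = ds[max(math.ceil(0.95 * len(ds)) - 1, 0)]  # nearest-rank p95
    error_rate = errors[route] / len(ds)
    print(f"{route}: p95={p95:.0f}ms error_rate={error_rate:.1%}")
```

Grouping by the normalized route keeps the number of series small even when raw paths contain IDs.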
3) Add host metrics
Use Node Exporter on Linux.
Verify:
curl -s http://127.0.0.1:9100/metrics | head -50

Track at minimum:
- CPU utilization
- load average
- memory usage
- swap usage
- disk usage
- disk IO wait
- filesystem inode usage
- network throughput
- open file descriptors
If disk or memory pressure appears, continue with High CPU / Memory Usage.
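Node Exporter serves metrics in the Prometheus text exposition format, which is simple enough to sanity-check by hand. A sketch that parses unlabeled samples and derives memory usage (the `sample` text is abbreviated; the two metric names are ones node_exporter exposes on Linux):

```python
def parse_prom(text):
    """Parse Prometheus text exposition into {metric_name: value} (unlabeled samples only)."""
    metrics = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        name, _, value = line.partition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            pass  # skip labeled or malformed samples in this simple sketch
    return metrics

sample = """# HELP node_memory_MemTotal_bytes Memory information.
node_memory_MemTotal_bytes 2.0e9
node_memory_MemAvailable_bytes 5.0e8
"""
m = parse_prom(sample)
used_pct = 100 * (1 - m["node_memory_MemAvailable_bytes"] / m["node_memory_MemTotal_bytes"])
print(f"memory used: {used_pct:.0f}%")  # → memory used: 75%
```

In practice Prometheus does this scraping and math for you; the point is that the raw endpoint is inspectable when a graph looks wrong.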
4) Add container metrics if using Docker
Use cAdvisor for container-level CPU, memory, filesystem, and network visibility.
Run:
docker run -d \
--name=cadvisor \
--restart=unless-stopped \
-p 8080:8080 \
-v /:/rootfs:ro \
-v /var/run:/var/run:ro \
-v /sys:/sys:ro \
-v /var/lib/docker/:/var/lib/docker:ro \
gcr.io/cadvisor/cadvisor:latest

Verify:
curl -s http://127.0.0.1:8080/metrics | head -50
docker stats

5) Add health and readiness endpoints
Use a simple uptime endpoint and, if needed, a dependency-aware readiness endpoint.
FastAPI example:
from fastapi import FastAPI
import os
import psycopg
app = FastAPI()
@app.get("/health")
def health():
    return {"ok": True}

@app.get("/ready")
def ready():
    db_url = os.environ["DATABASE_URL"]
    with psycopg.connect(db_url) as conn:
        with conn.cursor() as cur:
            cur.execute("select 1;")
            cur.fetchone()
    return {"ready": True}

Use /health for lightweight uptime checks. Use /ready when the service should only receive traffic if dependencies are working.
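A readiness probe should fail fast rather than hang when a dependency is down. One stdlib-only pre-check is a short TCP reachability test that a /ready handler could run before the SQL probe (the host and port here are assumptions; adjust for your database):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False

# e.g. gate /ready on the database port before running `select 1`
print(tcp_reachable("127.0.0.1", 5432))
```

This keeps a dead dependency from tying up request workers while the probe waits on a full client connection.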
6) Monitor database performance
For PostgreSQL, enable pg_stat_statements and inspect slow queries.
Enable the extension (pg_stat_statements must also be listed in shared_preload_libraries, which requires a PostgreSQL restart):
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Check active connections and long-running queries:
psql "$DATABASE_URL" -c "select pid, usename, state, wait_event_type, wait_event, query_start, left(query,120) from pg_stat_activity order by query_start asc;"

Check expensive queries:
psql "$DATABASE_URL" -c "select query, calls, total_exec_time, mean_exec_time from pg_stat_statements order by total_exec_time desc limit 10;"

Monitor:
- query duration
- slow query count
- connection pool usage
- active connections
- lock waits
- deadlocks
- cache hit ratio
- replication lag if applicable
If request latency spikes together with DB query time, inspect indexes and transaction duration first.
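Once pg_stat_statements output is exported somewhere scriptable, flagging regressions is a matter of thresholding mean execution time. A sketch over rows in the shape returned by the query above (the rows and the 100ms threshold are hypothetical; times in milliseconds):

```python
# Hypothetical rows: (query, calls, total_exec_time, mean_exec_time)
rows = [
    ("select * from orders where user_id = $1", 120000, 960000.0, 8.0),
    ("select * from events where payload ->> 'k' = $1", 300, 450000.0, 1500.0),
]

SLOW_MEAN_MS = 100  # assumed threshold; tune to your latency targets
flagged = [(query, mean) for query, calls, total, mean in rows if mean > SLOW_MEAN_MS]
for query, mean_ms in flagged:
    print(f"slow query ({mean_ms:.0f}ms avg): {query[:60]}")
```

Note that a fast query with huge call counts can dominate total_exec_time; sorting by both total and mean catches different problems.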
7) Monitor workers and queues
If you use Celery, RQ, Sidekiq-equivalent services, or custom workers, measure:
- queue depth
- job age
- processing time
- retries
- failures
- worker count
- broker connection health
Celery quick checks:
celery -A app inspect active
celery -A app inspect reserved
redis-cli info

A healthy API can still feel broken if jobs are backed up.
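Queue health mostly reduces to two numbers: backlog size and age of the oldest pending job. A sketch assuming you can read pending jobs with their enqueue timestamps (the snapshot is hypothetical; the 60s limit mirrors the queue-delay target from step 1):

```python
import time

now = time.time()
# Hypothetical queue snapshot: (job_id, enqueued_at epoch seconds)
pending = [("job-1", now - 5), ("job-2", now - 130), ("job-3", now - 240)]

MAX_AGE_S = 60
backlog = len(pending)
oldest_age_s = max(now - ts for _, ts in pending) if pending else 0.0
breach = oldest_age_s > MAX_AGE_S
print(f"backlog={backlog} oldest={oldest_age_s:.0f}s breach={breach}")
```

Oldest-job age is usually the better alert signal: a large backlog that drains quickly is fine, while a small backlog that never drains is not.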
8) Monitor reverse proxy and API edges
For Nginx, track:
- request count
- upstream response time
- active connections
- 4xx count
- 5xx count
- timeout count
Validation commands:
nginx -t
journalctl -u nginx -n 200 --no-pager
ss -tulpn
ss -s

For endpoint-level monitoring and abuse control, also use API Monitoring and Rate Limits.
9) Build only a few dashboards
Recommended dashboard set:
- Overview
  - uptime
  - request rate
  - 5xx rate
  - p95 latency
  - CPU
  - memory
  - disk
  - queue depth
- API performance
  - p50/p95/p99 by endpoint group
  - 4xx/5xx by endpoint group
  - in-flight requests
  - response size
- Database
  - active connections
  - slow queries
  - locks
  - top queries
  - replication lag
- Workers
  - pending jobs
  - processing time
  - retries
  - failures
Suggested visual:
- dashboard wireframe with one top-level overview and drill-down panels for API, DB, and workers
10) Add alerts with actionability
Start with these alerts only:
- API p95 latency > 1000ms for 5m
- API 5xx rate > 2% for 5m
- CPU > 85% for 10m
- memory > 85% for 10m
- disk usage > 80%
- queue depth above normal baseline for 10m
- DB connections near max
- health check failure

Rules:
- alert on sustained problems, not one-off spikes
- page only for user-impacting or revenue-impacting failures
- include remediation notes
- include dashboard link
- include runbook link
Use Debugging Production Issues as the incident workflow and tie metric alerts to fix pages.
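The "sustained problems, not one-off spikes" rule can be implemented by firing only when a condition holds across consecutive evaluation windows. A minimal sketch (the window size and p95 samples are hypothetical):

```python
from collections import deque

class SustainedAlert:
    """Fire only when a condition holds for `window` consecutive checks,
    so one-off spikes do not page anyone."""
    def __init__(self, window: int):
        self.history = deque(maxlen=window)

    def check(self, breached: bool) -> bool:
        self.history.append(breached)
        return len(self.history) == self.history.maxlen and all(self.history)

# e.g. p95 latency sampled once a minute against a 1000ms threshold, 5m window
alert = SustainedAlert(window=5)
samples = [1200, 1100, 1300, 1050, 1500]  # hypothetical p95 values in ms
for p95 in samples:
    fired = alert.check(p95 > 1000)
print(fired)  # → True
```

A single healthy sample resets the streak, which is exactly the behavior "for 5m" clauses in alerting rules express.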
11) Validate the monitoring
Test each signal.
Examples:
# basic health
curl -i http://127.0.0.1:8000/health
# quick synthetic load
ab -n 200 -c 20 http://127.0.0.1:8000/health
# system pressure
top
htop
free -m
vmstat 1 5
iostat -xz 1 5

Check whether:
- latency increases are visible
- error spikes are visible
- dashboards update
- alerts trigger
- runbooks are linked
- notifications reach the right channel
12) Review weekly
Keep the monitoring set small.
Review:
- noisy alerts
- broken dashboards
- useless metrics
- missing metrics discovered from incidents
- threshold changes after product growth
If a metric does not change an operational decision, remove it.
Common causes
Most monitoring gaps or false confidence come from these issues:
- no request timing instrumentation in the app
- monitoring only uptime, not latency or error rate
- slow database queries or missing indexes
- CPU saturation from too few workers, loops, or heavy serialization
- memory pressure from leaks, large caches, or oversized payloads
- disk pressure from logs, uploads, backups, or missing rotation
- external API latency affecting auth, payments, email, storage, or webhooks
- noisy alerts with poor thresholds
- dynamic URL labels creating high-cardinality metrics
- no correlation between app logs, proxy logs, and request IDs
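The last gap is cheap to close when both sides log a shared request ID: join app and proxy records on it to see where time is spent. A sketch over hypothetical records (the field names are assumptions matching the middleware from step 2 and a proxy log with upstream timing):

```python
# Hypothetical app and proxy log records sharing an X-Request-ID value
app_logs = [
    {"request_id": "abc123", "route": "/checkout", "duration_ms": 950},
    {"request_id": "def456", "route": "/health", "duration_ms": 2},
]
proxy_logs = [
    {"request_id": "abc123", "upstream_time_ms": 980, "status": 200},
]

proxy_by_id = {r["request_id"]: r for r in proxy_logs}
for rec in app_logs:
    proxy = proxy_by_id.get(rec["request_id"])
    if proxy is None:
        continue  # request never reached the proxy log, or sampling dropped it
    # Time spent outside the app handler: proxy queueing, TLS, buffering
    overhead_ms = proxy["upstream_time_ms"] - rec["duration_ms"]
    print(f"{rec['route']}: app={rec['duration_ms']}ms proxy_overhead={overhead_ms}ms")
```

A large gap between proxy upstream time and app-measured duration points at the edge (connection queueing, slow clients) rather than the application.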
Debugging tips
Use these checks during incidents.
Basic endpoint and exporter checks
curl -i http://127.0.0.1:8000/health
curl -s http://127.0.0.1:9100/metrics | head -50
curl -s http://127.0.0.1:8080/metrics | head -50

System resource checks
top
htop
free -m
vmstat 1 5
iostat -xz 1 5
df -h
du -sh /var/log/* | sort -h
ps aux --sort=-%mem | head
ps aux --sort=-%cpu | head

Network and service checks
ss -tulpn
ss -s
journalctl -u nginx -n 200 --no-pager
journalctl -u gunicorn -n 200 --no-pager
nginx -t

Docker checks
docker stats
docker logs --tail=200 <container_name>

Redis and worker checks
redis-cli info
celery -A app inspect active
celery -A app inspect reserved

PostgreSQL checks
psql "$DATABASE_URL" -c "select now();"
psql "$DATABASE_URL" -c "select pid, usename, state, wait_event_type, wait_event, query_start, left(query,120) from pg_stat_activity order by query_start asc;"
psql "$DATABASE_URL" -c "select query, calls, total_exec_time, mean_exec_time from pg_stat_statements order by total_exec_time desc limit 10;"

Quick synthetic load
ab -n 200 -c 20 http://127.0.0.1:8000/health

Practical interpretation:
- low CPU + high latency often means IO wait, DB locks, or external dependency slowdown
- rising memory over time suggests leaks, unbounded caches, or oversized jobs
- bad p95 with normal p50 points to outliers, not constant slowness
- error spikes right after deploys usually indicate regressions or config mismatch
- queue depth growth with healthy API latency points to worker-side issues
Pair metrics with exception visibility using Error Tracking with Sentry.
Checklist
- ✓ /health endpoint exists and is monitored externally
- ✓ /ready exists if dependency-aware routing is needed
- ✓ request duration is measured for all API requests
- ✓ path labels are normalized to avoid high-cardinality metrics
- ✓ 5xx error rate is graphed and alerted
- ✓ CPU, memory, disk, and load average are graphed
- ✓ database slow queries are enabled and reviewed
- ✓ queue depth and failed jobs are monitored
- ✓ reverse proxy timings are available
- ✓ dashboards exist for app, DB, worker, and proxy
- ✓ alerts use actionable thresholds
- ✓ alerts link to runbooks or fix pages
- ✓ metrics retention matches budget and troubleshooting needs
- ✓ monitoring is tested after deployment
- ✓ critical flows like login, signup, checkout, and webhooks are monitored
- ✓ production checks are included in SaaS Production Checklist
Related guides
- Error Tracking with Sentry
- API Monitoring and Rate Limits
- Debugging Production Issues
- High CPU / Memory Usage
- SaaS Production Checklist
FAQ
What should I monitor first for a small SaaS?
Start with:
- request latency
- 5xx error rate
- request throughput
- CPU
- memory
- disk usage
- database slow queries
- queue backlog
These cover most incidents without overcomplicating setup.
Do I need Prometheus and Grafana immediately?
No. You can start with hosted monitoring, cloud metrics, or structured logs plus a few exporters. Add Prometheus and Grafana when you need more control or self-hosted metrics.
What latency metric matters most?
p95 is the most practical main signal for most small SaaS apps. p50 hides outliers. p99 is often too noisy early on. Track p95 per critical endpoint group.
How often should alerts fire?
Use sustained windows like 5 to 10 minutes for most alerts. Immediate alerts are appropriate for hard-down conditions or critical flows such as login or checkout failure.
Should I monitor every route separately?
No. Group routes by endpoint type or normalize dynamic paths. Monitoring every unique path creates high-cardinality metrics and expensive dashboards.
How do metrics differ from error tracking?
Metrics show aggregate behavior over time, such as latency, throughput, and error rate. Error tracking captures individual exceptions, stack traces, release impact, and affected users. In production, you usually want both.
Final takeaway
You do not need enterprise observability on day one.
You do need a small, reliable baseline:
- latency
- errors
- throughput
- host resources
- database timings
- queue health
- critical endpoint visibility
Measure what helps you detect regressions and act quickly. If an alert does not change what you do next, remove it or rewrite it.