Metrics and Performance Monitoring
The essential playbook for implementing metrics and performance monitoring in your SaaS.
Intro
Set up a small but production-ready monitoring baseline for SaaS performance.
Track:
- request latency
- throughput
- error rate
- CPU
- memory
- disk
- database timings
- worker/queue health
- critical business endpoints
The goal is to detect slow requests, resource bottlenecks, and regressions before users report them. Do not build a full observability stack too early.
Quick Fix / Quick Setup
Use this as a minimal baseline on a Linux VPS.
# 1) Install Node Exporter for host metrics
NODE_EXPORTER_VERSION=1.8.2  # check the node_exporter releases page for the current version
wget https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz
tar -xvf node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz
sudo cp node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64/node_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false node_exporter || true
cat <<'EOF' | sudo tee /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=default.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter

# 2) Install cAdvisor for container metrics if using Docker
docker run -d \
--name=cadvisor \
--restart=unless-stopped \
-p 8080:8080 \
-v /:/rootfs:ro \
-v /var/run:/var/run:ro \
-v /sys:/sys:ro \
-v /var/lib/docker/:/var/lib/docker:ro \
gcr.io/cadvisor/cadvisor:latest

# 3) Add a simple health + timing middleware in FastAPI
# app/main.py
import time
from fastapi import FastAPI, Request
import logging
app = FastAPI()
logger = logging.getLogger("perf")
@app.middleware("http")
async def timing_middleware(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    duration_ms = round((time.perf_counter() - start) * 1000, 2)
    logger.info("request_completed", extra={
        "path": request.url.path,
        "method": request.method,
        "status_code": response.status_code,
        "duration_ms": duration_ms,
    })
    response.headers["X-Response-Time-ms"] = str(duration_ms)
    return response
@app.get("/health")
def health():
    return {"ok": True}

# 4) Verify metrics endpoints
curl http://127.0.0.1:9100/metrics | head
curl http://127.0.0.1:8080/metrics | head
curl -i http://127.0.0.1:8000/health

# 5) Alert on these first
- p95 latency > 1000ms for 5m
- error rate > 2% for 5m
- CPU > 85% for 10m
- memory > 85% for 10m
- disk usage > 80%
- DB connections near max

Start with host metrics, request latency, error rate, and one health endpoint. Add database and background job metrics next. Do not begin with dozens of dashboards you will not use.
What’s happening
Performance monitoring should answer four questions:
- is the app up
- is it fast enough
- is it failing
- what resource is limiting it
For a small SaaS, the useful baseline is:
- RED metrics: request Rate, Errors, Duration
- system metrics: CPU, memory, disk, network
- database timings
- worker/queue health
- dependency latency
Common production slowdowns usually come from:
- unindexed queries
- blocked workers
- insufficient CPU or RAM
- oversized responses
- external API latency
- queue backlog
- lock contention
- bad deploys
Server graphs alone are not enough. Infrastructure metrics can show pressure, but they do not tell you which route, query, job, or dependency is slow.
Monitoring is operational only when it includes alerts and response paths.
Step-by-step implementation
1) Define practical service targets
Pick a small set of SLO-like thresholds.
Example:
- API p95 latency: under 1000ms
- 5xx rate: under 2%
- login endpoint p95: under 700ms
- checkout endpoint p95: under 1500ms
- queue delay: under 60s
- uptime probe: passing every minute
Do not define dozens of targets.
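To check whether current traffic actually meets targets like these, compute percentiles from a sample of request durations. A minimal nearest-rank sketch (the `durations_ms` values are hypothetical sample data; the 1000ms threshold mirrors the API target above):

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# Hypothetical request durations in milliseconds
durations_ms = [120, 95, 400, 220, 180, 1300, 150, 90, 310, 600]
p95 = percentile(durations_ms, 95)
print(f"p95={p95}ms, within 1000ms target: {p95 < 1000}")
```

Nearest-rank over a recent window is enough for validating targets by hand; metrics systems typically approximate the same value from histogram buckets.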
2) Instrument request timing in the app
You need:
- method
- normalized route
- status code
- duration
- request ID if possible
FastAPI timing middleware example:
import time
import uuid
import logging
from fastapi import FastAPI, Request
app = FastAPI()
logger = logging.getLogger("app")
def normalize_path(path: str) -> str:
    # Replace dynamic IDs where possible to avoid high-cardinality labels
    parts = path.strip("/").split("/")
    normalized = []
    for p in parts:
        if p.isdigit():
            normalized.append(":id")
        elif len(p) > 20 and "-" in p:
            normalized.append(":token")
        else:
            normalized.append(p)
    return "/" + "/".join(normalized) if normalized != [""] else "/"
@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    request_id = request.headers.get("x-request-id", str(uuid.uuid4()))
    start = time.perf_counter()
    response = await call_next(request)
    duration_ms = round((time.perf_counter() - start) * 1000, 2)
    route = normalize_path(request.url.path)
    logger.info("request_completed", extra={
        "request_id": request_id,
        "method": request.method,
        "route": route,
        "status_code": response.status_code,
        "duration_ms": duration_ms,
    })
    response.headers["X-Request-ID"] = request_id
    response.headers["X-Response-Time-ms"] = str(duration_ms)
    return response

If you already have structured logging, send these fields to logs and build latency/error graphs from them. Pair this with Logging Setup (Application + Server).
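Once these fields land in structured logs, per-route p95 and error rate can be derived from them directly. A sketch under the assumption that each request_completed record is available as a dict with the same field names (the sample records are hypothetical):

```python
import math
from collections import defaultdict

# Hypothetical "request_completed" records, matching the middleware's fields
records = [
    {"route": "/users/:id", "status_code": 200, "duration_ms": 120.0},
    {"route": "/users/:id", "status_code": 500, "duration_ms": 900.0},
    {"route": "/health", "status_code": 200, "duration_ms": 3.0},
]

durations = defaultdict(list)
errors = defaultdict(int)
for r in records:
    durations[r["route"]].append(r["duration_ms"])
    if r["status_code"] >= 500:
        errors[r["route"]] += 1

for route, ds in durations.items():
    ds.sort()
    p95 = ds[max(math.ceil(0.95 * len(ds)) - 1, 0)]  # nearest-rank p95
    error_rate = errors[route] / len(ds)
    print(f"{route}: p95={p95:.0f}ms error_rate={error_rate:.1%}")
```

Grouping by the normalized route keeps the number of series small even when raw paths contain IDs.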
3) Add host metrics
Use Node Exporter on Linux.
Verify:
curl -s http://127.0.0.1:9100/metrics | head -50

Track at minimum:
- CPU utilization
- load average
- memory usage
- swap usage
- disk usage
- disk IO wait
- filesystem inode usage
- network throughput
- open file descriptors
If disk or memory pressure appears, continue with High CPU / Memory Usage.
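Node Exporter serves metrics in the Prometheus text exposition format, which is simple enough to sanity-check by hand. A sketch that parses unlabeled samples and derives memory usage (the `sample` text is abbreviated; the two metric names are ones node_exporter exposes on Linux):

```python
def parse_prom(text):
    """Parse Prometheus text exposition into {metric_name: value} (unlabeled samples only)."""
    metrics = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        name, _, value = line.partition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            pass  # skip labeled or malformed samples in this simple sketch
    return metrics

sample = """# HELP node_memory_MemTotal_bytes Memory information.
node_memory_MemTotal_bytes 2.0e9
node_memory_MemAvailable_bytes 5.0e8
"""
m = parse_prom(sample)
used_pct = 100 * (1 - m["node_memory_MemAvailable_bytes"] / m["node_memory_MemTotal_bytes"])
print(f"memory used: {used_pct:.0f}%")  # → memory used: 75%
```

In practice Prometheus does this scraping and math for you; the point is that the raw endpoint is inspectable when a graph looks wrong.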
4) Add container metrics if using Docker
Use cAdvisor for container-level CPU, memory, filesystem, and network visibility.
Run:
docker run -d \
--name=cadvisor \
--restart=unless-stopped \
-p 8080:8080 \
-v /:/rootfs:ro \
-v /var/run:/var/run:ro \
-v /sys:/sys:ro \
-v /var/lib/docker/:/var/lib/docker:ro \
gcr.io/cadvisor/cadvisor:latest

Verify:
curl -s http://127.0.0.1:8080/metrics | head -50
docker stats

5) Add health and readiness endpoints
Use a simple uptime endpoint and, if needed, a dependency-aware readiness endpoint.
FastAPI example:
from fastapi import FastAPI
import os
import psycopg
app = FastAPI()
@app.get("/health")
def health():
    return {"ok": True}

@app.get("/ready")
def ready():
    db_url = os.environ["DATABASE_URL"]
    with psycopg.connect(db_url) as conn:
        with conn.cursor() as cur:
            cur.execute("select 1;")
            cur.fetchone()
    return {"ready": True}

Use /health for lightweight uptime checks. Use /ready when the service should only receive traffic if dependencies are working.
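A readiness probe should fail fast rather than hang when a dependency is down. One stdlib-only pre-check is a short TCP reachability test that a /ready handler could run before the SQL probe (the host and port here are assumptions; adjust for your database):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False

# e.g. gate /ready on the database port before running `select 1`
print(tcp_reachable("127.0.0.1", 5432))
```

This keeps a dead dependency from tying up request workers while the probe waits on a full client connection.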
6) Monitor database performance
For PostgreSQL, enable pg_stat_statements and inspect slow queries.
Enable the extension (pg_stat_statements must also be listed in shared_preload_libraries, which requires a PostgreSQL restart):
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Check active connections and long-running queries:
psql "$DATABASE_URL" -c "select pid, usename, state, wait_event_type, wait_event, query_start, left(query,120) from pg_stat_activity order by query_start asc;"

Check expensive queries:
psql "$DATABASE_URL" -c "select query, calls, total_exec_time, mean_exec_time from pg_stat_statements order by total_exec_time desc limit 10;"

Monitor:
- query duration
- slow query count
- connection pool usage
- active connections
- lock waits
- deadlocks
- cache hit ratio
- replication lag if applicable
If request latency spikes together with DB query time, inspect indexes and transaction duration first.
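Once pg_stat_statements output is exported somewhere scriptable, flagging regressions is a matter of thresholding mean execution time. A sketch over rows in the shape returned by the query above (the rows and the 100ms threshold are hypothetical; times in milliseconds):

```python
# Hypothetical rows: (query, calls, total_exec_time, mean_exec_time)
rows = [
    ("select * from orders where user_id = $1", 120000, 960000.0, 8.0),
    ("select * from events where payload ->> 'k' = $1", 300, 450000.0, 1500.0),
]

SLOW_MEAN_MS = 100  # assumed threshold; tune to your latency targets
flagged = [(query, mean) for query, calls, total, mean in rows if mean > SLOW_MEAN_MS]
for query, mean_ms in flagged:
    print(f"slow query ({mean_ms:.0f}ms avg): {query[:60]}")
```

Note that a fast query with huge call counts can dominate total_exec_time; sorting by both total and mean catches different problems.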
7) Monitor workers and queues
If you use Celery, RQ, Sidekiq-equivalent services, or custom workers, measure:
- queue depth
- job age
- processing time
- retries
- failures
- worker count
- broker connection health
Celery quick checks:
celery -A app inspect active
celery -A app inspect reserved
redis-cli info

A healthy API can still feel broken if jobs are backed up.
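Queue health mostly reduces to two numbers: backlog size and age of the oldest pending job. A sketch assuming you can read pending jobs with their enqueue timestamps (the snapshot is hypothetical; the 60s limit mirrors the queue-delay target from step 1):

```python
import time

now = time.time()
# Hypothetical queue snapshot: (job_id, enqueued_at epoch seconds)
pending = [("job-1", now - 5), ("job-2", now - 130), ("job-3", now - 240)]

MAX_AGE_S = 60
backlog = len(pending)
oldest_age_s = max(now - ts for _, ts in pending) if pending else 0.0
breach = oldest_age_s > MAX_AGE_S
print(f"backlog={backlog} oldest={oldest_age_s:.0f}s breach={breach}")
```

Oldest-job age is usually the better alert signal: a large backlog that drains quickly is fine, while a small backlog that never drains is not.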
8) Monitor reverse proxy and API edges
For Nginx, track:
- request count
- upstream response time
- active connections
- 4xx count
- 5xx count
- timeout count
Validation commands:
nginx -t
journalctl -u nginx -n 200 --no-pager
ss -tulpn
ss -s

For endpoint-level monitoring and abuse control, also use API Monitoring and Rate Limits.
9) Build only a few dashboards
Recommended dashboard set:
- Overview
  - uptime
  - request rate
  - 5xx rate
  - p95 latency
  - CPU
  - memory
  - disk
  - queue depth
- API performance
  - p50/p95/p99 by endpoint group
  - 4xx/5xx by endpoint group
  - in-flight requests
  - response size
- Database
  - active connections
  - slow queries
  - locks
  - top queries
  - replication lag
- Workers
  - pending jobs
  - processing time
  - retries
  - failures
Suggested visual:
- dashboard wireframe with one top-level overview and drill-down panels for API, DB, and workers
10) Add alerts with actionability
Start with these alerts only:
- API p95 latency > 1000ms for 5m
- API 5xx rate > 2% for 5m
- CPU > 85% for 10m
- memory > 85% for 10m
- disk usage > 80%
- queue depth above normal baseline for 10m
- DB connections near max
- health check failure

Rules:
- alert on sustained problems, not one-off spikes
- page only for user-impacting or revenue-impacting failures
- include remediation notes
- include dashboard link
- include runbook link
Use Debugging Production Issues as the incident workflow and tie metric alerts to fix pages.
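The "sustained problems, not one-off spikes" rule can be implemented by firing only when a condition holds across consecutive evaluation windows. A minimal sketch (the window size and p95 samples are hypothetical):

```python
from collections import deque

class SustainedAlert:
    """Fire only when a condition holds for `window` consecutive checks,
    so one-off spikes do not page anyone."""
    def __init__(self, window: int):
        self.history = deque(maxlen=window)

    def check(self, breached: bool) -> bool:
        self.history.append(breached)
        return len(self.history) == self.history.maxlen and all(self.history)

# e.g. p95 latency sampled once a minute against a 1000ms threshold, 5m window
alert = SustainedAlert(window=5)
samples = [1200, 1100, 1300, 1050, 1500]  # hypothetical p95 values in ms
for p95 in samples:
    fired = alert.check(p95 > 1000)
print(fired)  # → True
```

A single healthy sample resets the streak, which is exactly the behavior "for 5m" clauses in alerting rules express.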
11) Validate the monitoring
Test each signal.
Examples:
# basic health
curl -i http://127.0.0.1:8000/health
# quick synthetic load
ab -n 200 -c 20 http://127.0.0.1:8000/health
# system pressure
top
htop
free -m
vmstat 1 5
iostat -xz 1 5

Check whether:
- latency increases are visible
- error spikes are visible
- dashboards update
- alerts trigger
- runbooks are linked
- notifications reach the right channel
12) Review weekly
Keep the monitoring set small.
Review:
- noisy alerts
- broken dashboards
- useless metrics
- missing metrics discovered from incidents
- threshold changes after product growth
If a metric does not change an operational decision, remove it.
Common causes
Most monitoring gaps or false confidence come from these issues:
- no request timing instrumentation in the app
- monitoring only uptime, not latency or error rate
- slow database queries or missing indexes
- CPU saturation from too few workers, loops, or heavy serialization
- memory pressure from leaks, large caches, or oversized payloads
- disk pressure from logs, uploads, backups, or missing rotation
- external API latency affecting auth, payments, email, storage, or webhooks
- noisy alerts with poor thresholds
- dynamic URL labels creating high-cardinality metrics
- no correlation between app logs, proxy logs, and request IDs
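The last gap is cheap to close when both sides log a shared request ID: join app and proxy records on it to see where time is spent. A sketch over hypothetical records (the field names are assumptions matching the middleware from step 2 and a proxy log with upstream timing):

```python
# Hypothetical app and proxy log records sharing an X-Request-ID value
app_logs = [
    {"request_id": "abc123", "route": "/checkout", "duration_ms": 950},
    {"request_id": "def456", "route": "/health", "duration_ms": 2},
]
proxy_logs = [
    {"request_id": "abc123", "upstream_time_ms": 980, "status": 200},
]

proxy_by_id = {r["request_id"]: r for r in proxy_logs}
for rec in app_logs:
    proxy = proxy_by_id.get(rec["request_id"])
    if proxy is None:
        continue  # request never reached the proxy log, or sampling dropped it
    # Time spent outside the app handler: proxy queueing, TLS, buffering
    overhead_ms = proxy["upstream_time_ms"] - rec["duration_ms"]
    print(f"{rec['route']}: app={rec['duration_ms']}ms proxy_overhead={overhead_ms}ms")
```

A large gap between proxy upstream time and app-measured duration points at the edge (connection queueing, slow clients) rather than the application.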
Debugging tips
Use these checks during incidents.
Basic endpoint and exporter checks
curl -i http://127.0.0.1:8000/health
curl -s http://127.0.0.1:9100/metrics | head -50
curl -s http://127.0.0.1:8080/metrics | head -50

System resource checks
top
htop
free -m
vmstat 1 5
iostat -xz 1 5
df -h
du -sh /var/log/* | sort -h
ps aux --sort=-%mem | head
ps aux --sort=-%cpu | head

Network and service checks
ss -tulpn
ss -s
journalctl -u nginx -n 200 --no-pager
journalctl -u gunicorn -n 200 --no-pager
nginx -t

Docker checks
docker stats
docker logs --tail=200 <container_name>

Redis and worker checks
redis-cli info
celery -A app inspect active
celery -A app inspect reserved

PostgreSQL checks
psql "$DATABASE_URL" -c "select now();"
psql "$DATABASE_URL" -c "select pid, usename, state, wait_event_type, wait_event, query_start, left(query,120) from pg_stat_activity order by query_start asc;"
psql "$DATABASE_URL" -c "select query, calls, total_exec_time, mean_exec_time from pg_stat_statements order by total_exec_time desc limit 10;"

Quick synthetic load
ab -n 200 -c 20 http://127.0.0.1:8000/health

Practical interpretation:
- low CPU + high latency often means IO wait, DB locks, or external dependency slowdown
- rising memory over time suggests leaks, unbounded caches, or oversized jobs
- bad p95 with normal p50 points to outliers, not constant slowness
- error spikes right after deploys usually indicate regressions or config mismatch
- queue depth growth with healthy API latency points to worker-side issues
Pair metrics with exception visibility using Error Tracking with Sentry.
Checklist
- ✓ /health endpoint exists and is monitored externally
- ✓ /ready exists if dependency-aware routing is needed
- ✓ request duration is measured for all API requests
- ✓ path labels are normalized to avoid high-cardinality metrics
- ✓ 5xx error rate is graphed and alerted
- ✓ CPU, memory, disk, and load average are graphed
- ✓ database slow queries are enabled and reviewed
- ✓ queue depth and failed jobs are monitored
- ✓ reverse proxy timings are available
- ✓ dashboards exist for app, DB, worker, and proxy
- ✓ alerts use actionable thresholds
- ✓ alerts link to runbooks or fix pages
- ✓ metrics retention matches budget and troubleshooting needs
- ✓ monitoring is tested after deployment
- ✓ critical flows like login, signup, checkout, and webhooks are monitored
- ✓ production checks are included in SaaS Production Checklist
Related guides
- Error Tracking with Sentry
- API Monitoring and Rate Limits
- Debugging Production Issues
- High CPU / Memory Usage
- SaaS Production Checklist
FAQ
What should I monitor first for a small SaaS?
Start with:
- request latency
- 5xx error rate
- request throughput
- CPU
- memory
- disk usage
- database slow queries
- queue backlog
These cover most incidents without overcomplicating setup.
Do I need Prometheus and Grafana immediately?
No. You can start with hosted monitoring, cloud metrics, or structured logs plus a few exporters. Add Prometheus and Grafana when you need more control or self-hosted metrics.
What latency metric matters most?
p95 is the most practical main signal for most small SaaS apps. p50 hides outliers. p99 is often too noisy early on. Track p95 per critical endpoint group.
How often should alerts fire?
Use sustained windows like 5 to 10 minutes for most alerts. Immediate alerts are appropriate for hard-down conditions or critical flows such as login or checkout failure.
Should I monitor every route separately?
No. Group routes by endpoint type or normalize dynamic paths. Monitoring every unique path creates high-cardinality metrics and expensive dashboards.
How do metrics differ from error tracking?
Metrics show aggregate behavior over time, such as latency, throughput, and error rate. Error tracking captures individual exceptions, stack traces, release impact, and affected users. In production, you usually want both.
Final takeaway
You do not need enterprise observability on day one.
You do need a small, reliable baseline:
- latency
- errors
- throughput
- host resources
- database timings
- queue health
- critical endpoint visibility
Measure what helps you detect regressions and act quickly. If an alert does not change what you do next, remove it or rewrite it.