Metrics and Performance Monitoring

The essential playbook for implementing metrics and performance monitoring in your SaaS.

Intro

Set up a small but production-ready monitoring baseline for SaaS performance.

Track:

  • request latency
  • throughput
  • error rate
  • CPU
  • memory
  • disk
  • database timings
  • worker/queue health
  • critical business endpoints

The goal is to detect slow requests, resource bottlenecks, and regressions before users report them. Do not build a full observability stack too early.

Quick Fix / Quick Setup

Use this as a minimal baseline on a Linux VPS.

bash
# 1) Install Node Exporter for host metrics
# Pin an explicit version: wget cannot expand wildcards in URLs,
# so check the releases page and substitute the current version.
NODE_EXPORTER_VERSION=1.8.2
wget "https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz"
tar -xvf "node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz"
sudo cp "node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64/node_exporter" /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false node_exporter || true

cat <<'EOF' | sudo tee /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
bash
# 2) Install cAdvisor for container metrics if using Docker
docker run -d \
  --name=cadvisor \
  --restart=unless-stopped \
  -p 8080:8080 \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  gcr.io/cadvisor/cadvisor:latest
python
# 3) Add a simple health + timing middleware in FastAPI
# app/main.py

import time
from fastapi import FastAPI, Request
import logging

app = FastAPI()
logger = logging.getLogger("perf")

@app.middleware("http")
async def timing_middleware(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    duration_ms = round((time.perf_counter() - start) * 1000, 2)
    logger.info("request_completed", extra={
        "path": request.url.path,
        "method": request.method,
        "status_code": response.status_code,
        "duration_ms": duration_ms,
    })
    response.headers["X-Response-Time-ms"] = str(duration_ms)
    return response

@app.get("/health")
def health():
    return {"ok": True}
bash
# 4) Verify metrics endpoints
curl http://127.0.0.1:9100/metrics | head
curl http://127.0.0.1:8080/metrics | head
curl -i http://127.0.0.1:8000/health
text
# 5) Alert on these first
- p95 latency > 1000ms for 5m
- error rate > 2% for 5m
- CPU > 85% for 10m
- memory > 85% for 10m
- disk usage > 80%
- DB connections near max

Start with host metrics, request latency, error rate, and one health endpoint. Add database and background job metrics next. Do not begin with dozens of dashboards you will not use.

What’s happening

Performance monitoring should answer four questions:

  • is the app up
  • is it fast enough
  • is it failing
  • what resource is limiting it

For a small SaaS, the useful baseline is:

  • RED metrics: request Rate, Errors, Duration
  • system metrics: CPU, memory, disk, network
  • database timings
  • worker/queue health
  • dependency latency
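
The RED trio can be derived from structured request logs before any metrics stack exists. A minimal sketch, assuming each log record carries the status_code and duration_ms fields that this guide's timing middleware emits:

```python
from statistics import quantiles

def red_summary(records: list[dict], window_s: float) -> dict:
    """Compute Rate, Errors, Duration from structured request log records.

    Each record is assumed to carry "status_code" and "duration_ms",
    matching the fields the timing middleware in this guide emits.
    """
    if not records:
        return {"rate_rps": 0.0, "error_rate": 0.0, "p95_ms": 0.0}
    durations = sorted(r["duration_ms"] for r in records)
    errors = sum(1 for r in records if r["status_code"] >= 500)
    # quantiles(n=20) returns 19 cut points; the last one approximates p95
    p95 = quantiles(durations, n=20)[-1] if len(durations) > 1 else durations[0]
    return {
        "rate_rps": len(records) / window_s,
        "error_rate": errors / len(records),
        "p95_ms": round(p95, 2),
    }

logs = [{"status_code": 200, "duration_ms": 40.0}] * 98 + [
    {"status_code": 500, "duration_ms": 900.0},
    {"status_code": 502, "duration_ms": 1200.0},
]
summary = red_summary(logs, window_s=60.0)
```

This is enough to graph latency and error rate from a log pipeline while deferring a dedicated metrics store.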

Common production slowdowns usually come from:

  • unindexed queries
  • blocked workers
  • insufficient CPU or RAM
  • oversized responses
  • external API latency
  • queue backlog
  • lock contention
  • bad deploys

Server graphs alone are not enough. Infrastructure metrics can show pressure, but they do not tell you which route, query, job, or dependency is slow.

Monitoring is operational only when it includes alerts and response paths.

Suggested visual:

  • worker topology diagram: Web Service → Queue → Worker → External Service

Step-by-step implementation

1) Define practical service targets

Pick a small set of SLO-like thresholds.

Example:

  • API p95 latency: under 1000ms
  • 5xx rate: under 2%
  • login endpoint p95: under 700ms
  • checkout endpoint p95: under 1500ms
  • queue delay: under 60s
  • uptime probe: passing every minute

Do not define dozens of targets.
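
Targets are easiest to keep honest when they live in code rather than a wiki. A minimal sketch; the target names and limits below are hypothetical and simply mirror the examples above:

```python
# Hypothetical target names and limits, mirroring the examples above.
TARGETS = {
    "api_p95_ms": 1000,
    "error_rate_pct": 2.0,
    "login_p95_ms": 700,
    "checkout_p95_ms": 1500,
    "queue_delay_s": 60,
}

def violated(measured: dict) -> list[str]:
    """Return the names of any targets the measured values exceed."""
    return [name for name, limit in TARGETS.items()
            if measured.get(name, 0) > limit]

current = {"api_p95_ms": 1240.0, "error_rate_pct": 0.4, "queue_delay_s": 12}
breaches = violated(current)
```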

2) Instrument request timing in the app

You need:

  • method
  • normalized route
  • status code
  • duration
  • request ID if possible

FastAPI timing middleware example:

python
import time
import uuid
import logging
from fastapi import FastAPI, Request

app = FastAPI()
logger = logging.getLogger("app")

def normalize_path(path: str) -> str:
    # Replace dynamic IDs where possible to avoid high-cardinality labels
    parts = path.strip("/").split("/")
    normalized = []
    for p in parts:
        if p.isdigit():
            normalized.append(":id")
        elif len(p) > 20 and "-" in p:
            normalized.append(":token")
        else:
            normalized.append(p)
    if normalized == [""]:
        return "/"
    return "/" + "/".join(normalized)

@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    request_id = request.headers.get("x-request-id", str(uuid.uuid4()))
    start = time.perf_counter()
    response = await call_next(request)
    duration_ms = round((time.perf_counter() - start) * 1000, 2)

    route = normalize_path(request.url.path)

    logger.info("request_completed", extra={
        "request_id": request_id,
        "method": request.method,
        "route": route,
        "status_code": response.status_code,
        "duration_ms": duration_ms,
    })

    response.headers["X-Request-ID"] = request_id
    response.headers["X-Response-Time-ms"] = str(duration_ms)
    return response

If you already have structured logging, send these fields to logs and build latency/error graphs from them. Pair this with Logging Setup (Application + Server).

3) Add host metrics

Use Node Exporter on Linux.

Verify:

bash
curl -s http://127.0.0.1:9100/metrics | head -50

Track at minimum:

  • CPU utilization
  • load average
  • memory usage
  • swap usage
  • disk usage
  • disk IO wait
  • filesystem inode usage
  • network throughput
  • open file descriptors

If disk or memory pressure appears, continue with High CPU / Memory Usage.

4) Add container metrics if using Docker

Use cAdvisor for container-level CPU, memory, filesystem, and network visibility.

Run:

bash
docker run -d \
  --name=cadvisor \
  --restart=unless-stopped \
  -p 8080:8080 \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  gcr.io/cadvisor/cadvisor:latest

Verify:

bash
curl -s http://127.0.0.1:8080/metrics | head -50
docker stats

5) Add health and readiness endpoints

Use a simple uptime endpoint and, if needed, a dependency-aware readiness endpoint.

FastAPI example:

python
from fastapi import FastAPI
from fastapi.responses import JSONResponse
import os
import psycopg

app = FastAPI()

@app.get("/health")
def health():
    return {"ok": True}

@app.get("/ready")
def ready():
    db_url = os.environ["DATABASE_URL"]
    try:
        # connect_timeout keeps a down database from hanging the probe
        with psycopg.connect(db_url, connect_timeout=2) as conn:
            with conn.cursor() as cur:
                cur.execute("select 1;")
                cur.fetchone()
    except psycopg.Error:
        return JSONResponse(status_code=503, content={"ready": False})
    return {"ready": True}

Use /health for lightweight uptime checks. Use /ready when the service should only receive traffic if dependencies are working.

6) Monitor database performance

For PostgreSQL, enable pg_stat_statements and inspect slow queries.

Enable extension:

sql
-- Requires shared_preload_libraries = 'pg_stat_statements' in postgresql.conf
-- and a PostgreSQL restart before the extension collects data.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Check active connections and long-running queries:

bash
psql "$DATABASE_URL" -c "select pid, usename, state, wait_event_type, wait_event, query_start, left(query,120) from pg_stat_activity order by query_start asc;"

Check expensive queries:

bash
psql "$DATABASE_URL" -c "select query, calls, total_exec_time, mean_exec_time from pg_stat_statements order by total_exec_time desc limit 10;"

Monitor:

  • query duration
  • slow query count
  • connection pool usage
  • active connections
  • lock waits
  • deadlocks
  • cache hit ratio
  • replication lag if applicable

If request latency spikes together with DB query time, inspect indexes and transaction duration first.
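
The two pg_stat_statements columns answer different questions: total_exec_time finds queries that dominate overall load, while mean_exec_time finds individually slow ones. A small sketch of that triage over rows shaped like the query output above:

```python
def flag_slow(rows: list[dict], mean_ms_limit: float = 50.0) -> list[dict]:
    """Rank pg_stat_statements rows by total time, then flag high-mean queries.

    Rows are dicts with the columns selected above: query, calls,
    total_exec_time, mean_exec_time (times in milliseconds).
    A cheap query called very often can dominate total time without
    being individually slow; the mean filter separates the two cases.
    """
    ranked = sorted(rows, key=lambda r: r["total_exec_time"], reverse=True)
    return [r for r in ranked if r["mean_exec_time"] > mean_ms_limit]

rows = [
    {"query": "select * from orders where user_id = $1", "calls": 50000,
     "total_exec_time": 400000.0, "mean_exec_time": 8.0},
    {"query": "select * from events where payload ->> 'k' = $1", "calls": 200,
     "total_exec_time": 900000.0, "mean_exec_time": 4500.0},
]
suspects = flag_slow(rows)
```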

7) Monitor workers and queues

If you use Celery, RQ, Sidekiq-equivalent services, or custom workers, measure:

  • queue depth
  • job age
  • processing time
  • retries
  • failures
  • worker count
  • broker connection health

Celery quick checks:

bash
celery -A app inspect active
celery -A app inspect reserved
redis-cli info

A healthy API can still feel broken if jobs are backed up.
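
Queue depth alone misses stuck queues, so pair it with the age of the oldest job. A hedged sketch assuming a Redis list-based broker (Celery's default queue is a Redis list named "celery") and JSON job bodies carrying an enqueued_at timestamp, which depends on your serializer:

```python
import json
import time

def queue_health(client, queue: str = "celery",
                 depth_limit: int = 100, age_limit_s: float = 60.0) -> dict:
    """Check queue depth and oldest-job age against simple limits.

    `client` is anything with llen/lindex (e.g. redis.Redis). Jobs are
    assumed to be JSON with an "enqueued_at" unix timestamp; adapt that
    field to whatever your task serializer actually stores.
    """
    depth = client.llen(queue)
    oldest_age_s = 0.0
    raw = client.lindex(queue, -1)  # tail is oldest for LPUSH-based queues
    if raw:
        job = json.loads(raw)
        oldest_age_s = time.time() - job.get("enqueued_at", time.time())
    return {
        "depth": depth,
        "oldest_age_s": round(oldest_age_s, 1),
        "healthy": depth <= depth_limit and oldest_age_s <= age_limit_s,
    }

class FakeRedis:
    """Stand-in for redis.Redis, just for this example."""
    def __init__(self, jobs):
        self.jobs = jobs
    def llen(self, queue):
        return len(self.jobs)
    def lindex(self, queue, index):
        return self.jobs[index] if self.jobs else None

stale_job = json.dumps({"enqueued_at": time.time() - 300})
report = queue_health(FakeRedis([stale_job] * 5))
```

Here the queue is shallow but the oldest job is five minutes old, so the check reports unhealthy: exactly the stuck-worker case depth alone would miss.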

8) Monitor reverse proxy and API edges

For Nginx, track:

  • request count
  • upstream response time
  • active connections
  • 4xx count
  • 5xx count
  • timeout count

Validation commands:

bash
nginx -t
journalctl -u nginx -n 200 --no-pager
ss -tulpn
ss -s

For endpoint-level monitoring and abuse control, also use API Monitoring and Rate Limits.

9) Build only a few dashboards

Recommended dashboard set:

  1. Overview

    • uptime
    • request rate
    • 5xx rate
    • p95 latency
    • CPU
    • memory
    • disk
    • queue depth
  2. API performance

    • p50/p95/p99 by endpoint group
    • 4xx/5xx by endpoint group
    • in-flight requests
    • response size
  3. Database

    • active connections
    • slow queries
    • locks
    • top queries
    • replication lag
  4. Workers

    • pending jobs
    • processing time
    • retries
    • failures

Suggested visual:

  • dashboard wireframe with one top-level overview and drill-down panels for API, DB, and workers

10) Add alerts with actionability

Start with these alerts only:

text
- API p95 latency > 1000ms for 5m
- API 5xx rate > 2% for 5m
- CPU > 85% for 10m
- memory > 85% for 10m
- disk usage > 80%
- queue depth above normal baseline for 10m
- DB connections near max
- health check failure

Rules:

  • alert on sustained problems, not one-off spikes
  • page only for user-impacting or revenue-impacting failures
  • include remediation notes
  • include dashboard link
  • include runbook link

Use Debugging Production Issues as the incident workflow and tie metric alerts to fix pages.
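
The "for 5m" style rules above can be approximated without an alerting engine by requiring every sample in a rolling window to breach. A minimal sketch:

```python
from collections import deque

class SustainedAlert:
    """Fire only when a threshold is breached for N consecutive samples.

    With a 30s scrape interval, samples=10 approximates a "for 5m" rule,
    so one-off spikes never page anyone.
    """
    def __init__(self, threshold: float, samples: int):
        self.threshold = threshold
        self.window = deque(maxlen=samples)

    def observe(self, value: float) -> bool:
        self.window.append(value > self.threshold)
        return len(self.window) == self.window.maxlen and all(self.window)

# A single recovery sample (950) resets the streak; only the final
# third consecutive breach fires.
alert = SustainedAlert(threshold=1000.0, samples=3)
results = [alert.observe(v) for v in [1200, 950, 1300, 1400, 1500]]
```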

11) Validate the monitoring

Test each signal.

Examples:

bash
# basic health
curl -i http://127.0.0.1:8000/health

# quick synthetic load
ab -n 200 -c 20 http://127.0.0.1:8000/health

# system pressure
top
htop
free -m
vmstat 1 5
iostat -xz 1 5

Check whether:

  • latency increases are visible
  • error spikes are visible
  • dashboards update
  • alerts trigger
  • runbooks are linked
  • notifications reach the right channel

12) Review weekly

Keep the monitoring set small.

Review:

  • noisy alerts
  • broken dashboards
  • useless metrics
  • missing metrics discovered from incidents
  • threshold changes after product growth

If a metric does not change an operational decision, remove it.

Common causes

Most monitoring gaps or false confidence come from these issues:

  • no request timing instrumentation in the app
  • monitoring only uptime, not latency or error rate
  • slow database queries or missing indexes
  • CPU saturation from too few workers, loops, or heavy serialization
  • memory pressure from leaks, large caches, or oversized payloads
  • disk pressure from logs, uploads, backups, or missing rotation
  • external API latency affecting auth, payments, email, storage, or webhooks
  • noisy alerts with poor thresholds
  • dynamic URL labels creating high-cardinality metrics
  • no correlation between app logs, proxy logs, and request IDs

Debugging tips

Use these checks during incidents.

Basic endpoint and exporter checks

bash
curl -i http://127.0.0.1:8000/health
curl -s http://127.0.0.1:9100/metrics | head -50
curl -s http://127.0.0.1:8080/metrics | head -50

System resource checks

bash
top
htop
free -m
vmstat 1 5
iostat -xz 1 5
df -h
du -sh /var/log/* | sort -h
ps aux --sort=-%mem | head
ps aux --sort=-%cpu | head

Network and service checks

bash
ss -tulpn
ss -s
journalctl -u nginx -n 200 --no-pager
journalctl -u gunicorn -n 200 --no-pager
nginx -t

Docker checks

bash
docker stats
docker logs --tail=200 <container_name>

Redis and worker checks

bash
redis-cli info
celery -A app inspect active
celery -A app inspect reserved

PostgreSQL checks

bash
psql "$DATABASE_URL" -c "select now();"
psql "$DATABASE_URL" -c "select pid, usename, state, wait_event_type, wait_event, query_start, left(query,120) from pg_stat_activity order by query_start asc;"
psql "$DATABASE_URL" -c "select query, calls, total_exec_time, mean_exec_time from pg_stat_statements order by total_exec_time desc limit 10;"

Quick synthetic load

bash
ab -n 200 -c 20 http://127.0.0.1:8000/health

Practical interpretation:

  • low CPU + high latency often means IO wait, DB locks, or external dependency slowdown
  • rising memory over time suggests leaks, unbounded caches, or oversized jobs
  • bad p95 with normal p50 points to outliers, not constant slowness
  • error spikes right after deploys usually indicate regressions or config mismatch
  • queue depth growth with healthy API latency points to worker-side issues
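
The p50-versus-p95 point is easy to demonstrate: a handful of outliers leaves the median untouched while wrecking the tail. A small illustration using nearest-rank percentiles:

```python
import math

def pct(sorted_data: list[float], p: float) -> float:
    """Nearest-rank percentile over pre-sorted data."""
    return sorted_data[math.ceil(p * len(sorted_data)) - 1]

# 94 fast requests plus 6 slow outliers: p50 stays healthy, p95 does not
latencies_ms = sorted([80.0] * 94 + [2500.0] * 6)
p50 = pct(latencies_ms, 0.50)
p95 = pct(latencies_ms, 0.95)
```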

Pair metrics with exception visibility using Error Tracking with Sentry.

Checklist

  • /health endpoint exists and is monitored externally
  • /ready exists if dependency-aware routing is needed
  • request duration is measured for all API requests
  • path labels are normalized to avoid high-cardinality metrics
  • 5xx error rate is graphed and alerted
  • CPU, memory, disk, and load average are graphed
  • database slow queries are enabled and reviewed
  • queue depth and failed jobs are monitored
  • reverse proxy timings are available
  • dashboards exist for app, DB, worker, and proxy
  • alerts use actionable thresholds
  • alerts link to runbooks or fix pages
  • metrics retention matches budget and troubleshooting needs
  • monitoring is tested after deployment
  • critical flows like login, signup, checkout, and webhooks are monitored
  • production checks are included in SaaS Production Checklist

FAQ

What should I monitor first for a small SaaS?

Start with:

  • request latency
  • 5xx error rate
  • request throughput
  • CPU
  • memory
  • disk usage
  • database slow queries
  • queue backlog

These cover most incidents without overcomplicating setup.

Do I need Prometheus and Grafana immediately?

No. You can start with hosted monitoring, cloud metrics, or structured logs plus a few exporters. Add Prometheus and Grafana when you need more control or self-hosted metrics.

What latency metric matters most?

p95 is the most practical main signal for most small SaaS apps. p50 hides outliers. p99 is often too noisy early on. Track p95 per critical endpoint group.

How often should alerts fire?

Use sustained windows like 5 to 10 minutes for most alerts. Immediate alerts are appropriate for hard-down conditions or critical flows such as login or checkout failure.

Should I monitor every route separately?

No. Group routes by endpoint type or normalize dynamic paths. Monitoring every unique path creates high-cardinality metrics and expensive dashboards.

How do metrics differ from error tracking?

Metrics show aggregate behavior over time, such as latency, throughput, and error rate. Error tracking captures individual exceptions, stack traces, release impact, and affected users. In production, you usually want both.

Final takeaway

You do not need enterprise observability on day one.

You do need a small, reliable baseline:

  • latency
  • errors
  • throughput
  • host resources
  • database timings
  • queue health
  • critical endpoint visibility

Measure what helps you detect regressions and act quickly. If an alert does not change what you do next, remove it or rewrite it.