API Monitoring and Rate Limits

The essential playbook for implementing API monitoring and rate limits in your SaaS.

This page covers the minimum production setup for monitoring API health and enforcing rate limits. The goal is to track request volume, latency, status codes, and abuse patterns, then apply rate limits that protect your app without blocking normal users.

Quick Fix / Quick Setup

Start with a working baseline:

  • /health endpoint for uptime checks
  • request logging on every API call
  • Prometheus-compatible metrics
  • global rate limit
  • stricter limits on auth and expensive routes
  • Redis-backed counters if you run multiple instances

FastAPI example:

bash
pip install slowapi prometheus-fastapi-instrumentator redis
python
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

limiter = Limiter(
    key_func=get_remote_address,
    default_limits=["60/minute"]
)

app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

Instrumentator().instrument(app).expose(app)

@app.get("/health")
def health():
    return {"ok": True}

@app.get("/api/data")
@limiter.limit("30/minute")
def get_data(request: Request):
    return {"status": "ok"}

Minimum production notes:

  • use per-IP limits first
  • expose /metrics
  • add a dedicated /health
  • switch to Redis-backed shared rate limits if traffic goes through multiple app instances or containers

Basic request logging middleware example:

python
import time
import uuid
import logging
from fastapi import Request

logger = logging.getLogger("api")

@app.middleware("http")
async def log_requests(request: Request, call_next):
    request_id = str(uuid.uuid4())
    start = time.time()

    response = None
    try:
        response = await call_next(request)
        return response
    finally:
        duration_ms = round((time.time() - start) * 1000, 2)
        logger.info(
            "api_request",
            extra={
                "request_id": request_id,
                "method": request.method,
                "path": request.url.path,
                "query": str(request.url.query),
                "status_code": getattr(response, "status_code", 500),
                "duration_ms": duration_ms,
                "client_ip": request.client.host if request.client else None,
                "user_agent": request.headers.get("user-agent"),
            },
        )
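One caveat for the middleware above: stdlib logging formatters ignore `extra` fields unless you configure a formatter that serializes them. A minimal JSON formatter sketch (field names assume the middleware example; libraries like python-json-logger do the same job):

```python
import json
import logging

# Fields the middleware passes via `extra` (names assumed to match the example above).
REQUEST_FIELDS = (
    "request_id", "method", "path", "query", "status_code",
    "duration_ms", "client_ip", "user_agent",
)

class JsonFormatter(logging.Formatter):
    """Serialize the log message plus any request fields as one JSON line."""

    def format(self, record):
        payload = {"event": record.getMessage(), "level": record.levelname}
        for field in REQUEST_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("api")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

With this in place, each request becomes one machine-parsable JSON line, which is what log aggregators and dashboards expect.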

Recommended first alerts:

  • 5xx error rate spike
  • p95 latency spike
  • health check failures
  • sudden increase in 429 responses
Request flow: client -> reverse proxy -> app -> Redis limiter -> metrics/logging -> dashboard/alerts

What’s happening

Without API monitoring, you do not see rising error rates, slow endpoints, burst traffic, or abusive clients until users report problems.

Without rate limits, one client, bot, or buggy script can consume worker capacity, increase database load, and degrade service for everyone.

Basic health checks are not enough. You need endpoint-level visibility:

  • request count
  • status code distribution
  • latency percentiles
  • 429 rate-limit responses
  • top endpoints by traffic
  • top clients by request volume

Rate limiting must match your architecture:

  • in-memory counters are acceptable only for a single process or single instance
  • multiple instances require shared state such as Redis or an API gateway
  • proxy-aware client IP extraction must be configured correctly, or all traffic may collapse into one bucket
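The last point can be sketched as a small helper: honor X-Forwarded-For only when the direct peer is a proxy you control (the trusted-proxy address below is a placeholder; substitute your own):

```python
import ipaddress

# Proxies we control (assumption: replace with your real proxy addresses).
TRUSTED_PROXIES = {ipaddress.ip_address("10.0.0.5")}

def client_ip(peer_ip: str, x_forwarded_for: str = "") -> str:
    """Return the rate-limit IP, honoring X-Forwarded-For only from trusted proxies."""
    peer = ipaddress.ip_address(peer_ip)
    if peer in TRUSTED_PROXIES and x_forwarded_for:
        # With one trusted hop, the last entry is the address that connected
        # to the proxy, i.e. the real client.
        return x_forwarded_for.split(",")[-1].strip()
    # Direct connection or untrusted peer: the header may be spoofed,
    # so fall back to the socket address.
    return peer_ip
```

If this logic is wrong in either direction, you get one of the two failure modes above: all traffic collapsing into the proxy's IP bucket, or clients spoofing their way around limits.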

Step-by-step implementation

1. Define what to measure

Track these first:

  • requests per minute
  • status code counts: 2xx, 4xx, 5xx
  • p50, p95, and p99 latency
  • rate-limit hits
  • top endpoints by volume
  • top failing endpoints
  • active workers or process saturation
  • upstream dependency latency if your API calls other services

A practical minimum dashboard should answer:

  • Is the API up?
  • Which endpoints are slow?
  • Are 5xx errors increasing?
  • Are clients hitting rate limits?
  • Is one client causing burst load?

2. Add request logging middleware

Every request log should include:

  • request_id
  • user_id if authenticated
  • api_key_id or token identifier if applicable
  • client_ip
  • method
  • path
  • status_code
  • duration_ms
  • user_agent

Example with user ID support:

python
@app.middleware("http")
async def log_requests(request: Request, call_next):
    start = time.time()
    request_id = str(uuid.uuid4())
    request.state.request_id = request_id

    response = None
    try:
        response = await call_next(request)
        return response
    finally:
        duration_ms = round((time.time() - start) * 1000, 2)
        user_id = getattr(getattr(request, "state", None), "user_id", None)

        logger.info(
            "api_request",
            extra={
                "request_id": request_id,
                "user_id": user_id,
                "client_ip": request.client.host if request.client else None,
                "method": request.method,
                "path": request.url.path,
                "status_code": getattr(response, "status_code", 500),
                "duration_ms": duration_ms,
                "user_agent": request.headers.get("user-agent"),
            },
        )

If you are behind Nginx or a load balancer, make sure forwarded IP headers are set correctly.

Nginx example:

nginx
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

If you use trusted-proxy handling, trust only known proxies. Clients can set X-Forwarded-For themselves, so never trust the header from arbitrary public traffic.

3. Add API metrics collection

Expose metrics via /metrics or send them to your APM.

FastAPI + Prometheus:

python
from prometheus_fastapi_instrumentator import Instrumentator

Instrumentator().instrument(app).expose(app, endpoint="/metrics")

Metrics to capture:

  • request total by method/path/status
  • latency histogram by route
  • 429 responses
  • exception count
  • worker/process stats if available

If route labels are too granular, normalize paths. Avoid high-cardinality labels like raw user IDs or full URLs with IDs.
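A minimal normalization sketch (the ID and UUID patterns are assumptions; frameworks that expose route templates make this unnecessary):

```python
import re

# Replace path segments that look like UUIDs or numeric IDs with placeholders,
# so per-resource URLs share one metric label.
_UUID = re.compile(
    r"/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
_NUM = re.compile(r"/\d+(?=/|$)")

def normalize_path(path: str) -> str:
    """Collapse IDs so /users/42/orders/9 and /users/7/orders/1 share one label."""
    path = _UUID.sub("/{uuid}", path)
    return _NUM.sub("/{id}", path)
```

Without this, every user or order ID becomes its own time series, which bloats Prometheus memory and makes dashboards useless.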

4. Create dashboards

Build separate dashboards for:

Throughput

  • requests per minute
  • requests by endpoint
  • requests by status family

Errors

  • 5xx rate
  • 4xx rate
  • top failing routes
  • exception counts

Latency

  • p50/p95/p99 by endpoint
  • slowest endpoints
  • latency after deploys

Abuse and limits

  • total 429s
  • 429s by route
  • top IPs or API keys by request volume
  • burst traffic patterns


5. Add health and readiness checks

Use at least one lightweight health endpoint:

python
@app.get("/health")
def health():
    return {"ok": True}

If needed, add a deeper readiness check:

python
@app.get("/ready")
def ready():
    # verify DB, Redis, queue, or upstream dependencies
    return {"ready": True}

Use /health for simple uptime checks. Use /ready when you want deployment systems or orchestrators to verify dependencies.

Do not make your public uptime probe depend on every downstream service unless you intentionally want full dependency failure to mark the app unhealthy.
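The readiness logic above can be sketched as an aggregator over named dependency checks; the check callables are placeholders you wire to your own DB, Redis, or queue clients:

```python
def readiness(checks: dict) -> dict:
    """Run each named dependency check; a check passes unless it raises."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"failed: {exc}"
    ready = all(v == "ok" for v in results.values())
    return {"ready": ready, "checks": results}
```

In the /ready route, pass real checks (e.g. a DB `SELECT 1` or a Redis `PING`) and return a 503 status when `ready` is false so orchestrators stop routing traffic to the instance.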

6. Implement baseline rate limits

Start with a global limit, then override sensitive routes.

Example:

  • global: 60/minute per IP
  • login: 5/minute per IP
  • signup: 5/minute per IP
  • password reset: 3/minute per IP
  • expensive search/report route: 10/minute per user or API key

FastAPI route-specific example:

python
@app.post("/auth/login")
@limiter.limit("5/minute")
def login(request: Request):
    return {"ok": True}

@app.post("/auth/password-reset")
@limiter.limit("3/minute")
def password_reset(request: Request):
    return {"ok": True}

@app.get("/api/search")
@limiter.limit("10/minute")
def search(request: Request):
    return {"results": []}

Protect these more aggressively:

  • login
  • signup
  • password reset
  • magic link
  • OTP verification
  • search
  • export/report generation
  • AI or compute-heavy routes
  • public API endpoints

7. Use Redis for shared counters in multi-instance deployments

If you run:

  • multiple containers
  • multiple app replicas
  • autoscaling workers
  • load-balanced traffic

do not use local in-memory counters.

Use Redis-backed rate limiting or your API gateway.
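Under the hood, most Redis limiters use a fixed-window counter: one atomic INCR per (key, window), plus an EXPIRE so buckets clean themselves up. A sketch of that pattern with the client injected, so the same logic works against redis-py's `incr`/`expire` (the key format and injectable clock are assumptions for illustration):

```python
import time

def allow_request(client, key: str, limit: int, window_s: int, now=time.time) -> bool:
    """Fixed-window counter: shared across instances when `client` is Redis."""
    bucket = f"rate:{key}:{int(now() // window_s)}"
    count = client.incr(bucket)          # atomic increment, shared by all instances
    if count == 1:
        client.expire(bucket, window_s)  # first hit in the window starts its TTL
    return count <= limit
```

In practice you rarely hand-roll this; slowapi and similar libraries can typically be pointed at Redis via a storage URI (check the library's docs), but knowing the pattern makes the `redis-cli KEYS '*rate*'` output below readable.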

Redis checks:

bash
redis-cli PING
redis-cli KEYS '*rate*'
redis-cli MONITOR

Operational rule:

  • one instance: in-process can work temporarily
  • more than one instance: shared backend required

8. Return proper 429 responses

Clients need a predictable error response.

Example response:

http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
Content-Type: application/json
json
{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. Retry later."
}

If supported by your stack, include:

  • X-RateLimit-Limit
  • X-RateLimit-Remaining
  • Retry-After

This helps frontend and API consumers back off correctly.
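A sketch of assembling that response in one place, so every limited route returns the same shape (the X-RateLimit-* header names follow a common convention but are not standardized; Retry-After is):

```python
def too_many_requests(limit: int, retry_after_s: int) -> tuple:
    """Build a consistent 429 response: status, headers, and JSON body."""
    headers = {
        "Retry-After": str(retry_after_s),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "Content-Type": "application/json",
    }
    body = {
        "error": "rate_limit_exceeded",
        "message": "Too many requests. Retry later.",
    }
    return 429, headers, body
```

Centralizing this means a client can implement one backoff handler instead of parsing a different error shape per endpoint.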

9. Handle exemptions deliberately

Do not accidentally exempt routes.

Common cases:

  • health checks
  • internal admin routes
  • webhook endpoints
  • trusted internal automation

Webhook endpoints can be exempted, but only if you also have:

  • signature verification
  • idempotency handling
  • logging
  • replay protection where relevant
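Signature verification is usually an HMAC over the raw request body, compared in constant time. A generic sketch (the secret, hash algorithm, and header format vary by provider, so check their documentation):

```python
import hashlib
import hmac

def verify_signature(secret: bytes, payload: bytes, signature_hex: str) -> bool:
    """Compare the expected HMAC-SHA256 of the raw body against the header value."""
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking match position through timing.
    return hmac.compare_digest(expected, signature_hex)
```

Always verify against the raw bytes as received; re-serializing parsed JSON before hashing is a common cause of false signature failures.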

If you need help debugging webhook behavior, see Payment Webhooks Failing.

10. Add alerts

Start with these:

  • health check failures
  • 5xx error rate spike
  • p95 latency above threshold
  • sudden jump in 429s
  • sudden traffic spike from a single route or token

Practical alert examples:

  • 5xx > 2% for 5 minutes
  • p95 latency > 1000ms for 10 minutes
  • 429 count > baseline for 10 minutes
  • /health failing from 2 regions

Use alerts that map to action. Avoid noisy alerts with no owner.

For alert setup patterns, see Alerting System Setup.
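The "5xx > 2% for 5 minutes" rule reduces to a windowed ratio check. A sketch, assuming you already have request and 5xx counts for the window (the zero-traffic behavior is a design choice, not a rule):

```python
def should_alert_5xx(total_requests: int, server_errors: int,
                     threshold: float = 0.02) -> bool:
    """Fire when the 5xx share of traffic in the window exceeds the threshold."""
    if total_requests == 0:
        # No traffic: let the health-check alert cover a total outage instead.
        return False
    return server_errors / total_requests > threshold
```

Evaluating the ratio over a sustained window (rather than per scrape) is what keeps a single failed request on a quiet night from paging anyone.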

11. Review and tune limits with real data

Do not guess forever. Review weekly:

  • top routes by traffic
  • top 429-producing routes
  • clients frequently near limit
  • shared-IP false positives
  • frontend retry loops or polling

If real users behind NAT or office networks hit limits, move some rules from IP-only to layered keys:

  • per IP
  • per user ID
  • per API key
  • per route
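A layered key function can simply prefer the most specific stable identity available (attribute names here are assumptions; combine the result with the route path for per-route buckets):

```python
def rate_limit_key(api_key_id=None, user_id=None, client_ip=None) -> str:
    """Prefer API key, then user ID, then IP, so shared IPs aren't punished."""
    if api_key_id:
        return f"key:{api_key_id}"
    if user_id:
        return f"user:{user_id}"
    return f"ip:{client_ip or 'unknown'}"
```

This way an office full of users behind one NAT each get their own bucket once authenticated, while anonymous traffic still falls back to per-IP limits.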

12. Document your policy

Document:

  • global default limits
  • route-specific overrides
  • 429 response behavior
  • retry guidance
  • authenticated vs anonymous policy
  • webhook exemptions
  • support path for higher API limits

If you expose auth endpoints publicly, combine this page with your auth hardening work.

Common causes

  • No request-level metrics or logs, so API failures are only noticed through support tickets
  • Rate limiting stored in local memory while running multiple app instances, causing inconsistent enforcement
  • Reverse proxy headers not configured correctly, so all requests appear from one IP
  • Frontend polling, retries, or websocket fallback logic causing accidental request floods
  • No per-route overrides, so login or search endpoints are either unprotected or overly restricted
  • 429 responses returned without Retry-After or clear body, making clients retry incorrectly
  • Webhook endpoints accidentally rate-limited, causing delivery failures from payment providers
  • Metrics middleware registered incorrectly or after exception handlers, so failed requests are not counted

Debugging tips

Run these first:

bash
curl -i https://yourapp.com/health
curl -i https://yourapp.com/metrics
curl -I https://yourapp.com/api/data

Trigger the limiter:

bash
for i in {1..70}; do
  curl -s -o /dev/null -w "%{http_code}\n" https://yourapp.com/api/data
done

Inspect Redis-backed counters:

bash
redis-cli KEYS '*rate*'
redis-cli MONITOR

Check app logs:

bash
journalctl -u gunicorn -n 200 --no-pager
docker logs <app_container> --tail 200

Check proxy forwarding config:

bash
nginx -T | grep -i 'real_ip\|proxy_set_header\|x-forwarded-for'

Test forwarded IP behavior carefully:

bash
curl -H 'X-Forwarded-For: 1.2.3.4' -i https://yourapp.com/api/data

Generate load:

bash
ab -n 200 -c 20 https://yourapp.com/api/data
hey -n 500 -c 50 https://yourapp.com/api/data

What to look for:

  • all requests using the same IP bucket
  • 429s not appearing in metrics
  • failed requests missing from logs
  • high latency with low request volume
  • route-specific decorators not being applied
  • Redis not being reached by all app instances

If metrics look low but users report slowness, check:

  • database latency
  • upstream API latency
  • worker saturation
  • queue depth
  • connection pool exhaustion

If you need broader logging coverage, see Logging Setup (Application + Server).

If you need dashboard and latency guidance, see Metrics and Performance Monitoring.

If you need incident handling after alerts fire, see Incident Response Playbook.

Checklist

  • Health endpoint exists and is monitored
  • Request logs include request ID, IP, user ID, path, status, and duration
  • Metrics endpoint is exposed and scraped
  • Dashboard shows request rate, error rate, and latency percentiles
  • Alerts exist for 5xx spikes, latency spikes, and 429 spikes
  • Default global rate limit is configured
  • Sensitive endpoints have stricter custom limits
  • Shared backend like Redis is used for multi-instance deployments
  • Proxy headers are trusted only from known proxies
  • 429 responses include useful headers or error details
  • Webhook and internal routes are handled deliberately, not accidentally exempted
  • Rate limits are documented for frontend and API clients

Product CTA

Use a reusable production setup to standardize:

  • health checks
  • request logging
  • metrics
  • alerting
  • shared rate limiting

This avoids ad hoc middleware and inconsistent abuse protection across endpoints. The goal is one repeatable API baseline for every MVP and small SaaS service.

FAQ

Should I rate-limit authenticated and anonymous traffic differently?

Yes. Anonymous traffic is usually best limited by IP. Authenticated traffic should also use user ID or API key to avoid punishing shared IPs and to stop one account from abusing the API.

Do I need Redis for rate limiting?

If you run a single instance, in-process limits can work temporarily. If you run multiple instances, containers, or autoscaling, use Redis or a gateway-level limiter so counters are shared.

What should I alert on first?

Start with 5xx error rate, p95 latency, uptime failures, and sudden increases in 429 responses. These catch most API incidents early.

Can I exclude webhook endpoints from rate limits?

Yes, but do it intentionally. Payment or third-party webhooks should be protected with signature verification and logging if they are exempted from normal limits.

Should rate limiting happen in the app or at the reverse proxy?

For small SaaS, app-level is easier to start. For higher traffic or multi-service APIs, use gateway or proxy limits plus app-level limits on sensitive routes.

What is a safe default limit?

Start with conservative defaults like 60 requests/minute per IP for general endpoints, then tune from production traffic patterns.

Why am I seeing too many 429s from legitimate users?

Common causes are frontend retry loops, polling too frequently, office/shared IPs, and limits applied to static or health routes by mistake.

Final takeaway

Minimum viable API monitoring is:

  • request logs
  • latency and error dashboards
  • health checks
  • actionable alerts

Minimum viable rate limiting is:

  • one global default
  • stricter limits on auth and expensive endpoints
  • shared storage or gateway enforcement for multi-instance deployments

Before trusting your numbers, verify:

  • proxy/IP handling
  • metrics coverage
  • failed request logging
  • 429 response behavior

Treat 429 spikes and latency increases as production signals. Tune both client behavior and server capacity.