API Monitoring and Rate Limits
The essential playbook for implementing API monitoring and rate limits in your SaaS.
This page covers the minimum production setup for monitoring API health and enforcing rate limits. The goal is to track request volume, latency, status codes, and abuse patterns, then apply rate limits that protect your app without blocking normal users.
Quick Fix / Quick Setup
Start with a working baseline:
- /health endpoint for uptime checks
- request logging on every API call
- Prometheus-compatible metrics
- global rate limit
- stricter limits on auth and expensive routes
- Redis-backed counters if you run multiple instances
FastAPI example:
```
pip install slowapi prometheus-fastapi-instrumentator redis
```

```python
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

limiter = Limiter(
    key_func=get_remote_address,
    default_limits=["60/minute"],
)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

Instrumentator().instrument(app).expose(app)

@app.get("/health")
def health():
    return {"ok": True}

@app.get("/api/data")
@limiter.limit("30/minute")
def get_data(request: Request):
    return {"status": "ok"}
```
Minimum production notes:
- use per-IP limits first
- expose /metrics
- add a dedicated /health
- switch to Redis-backed shared rate limits if traffic goes through multiple app instances or containers
Basic request logging middleware example:
```python
import time
import uuid
import logging

from fastapi import Request

logger = logging.getLogger("api")

@app.middleware("http")
async def log_requests(request: Request, call_next):
    request_id = str(uuid.uuid4())
    start = time.time()
    response = None
    try:
        response = await call_next(request)
        return response
    finally:
        duration_ms = round((time.time() - start) * 1000, 2)
        logger.info(
            "api_request",
            extra={
                "request_id": request_id,
                "method": request.method,
                "path": request.url.path,
                "query": str(request.url.query),
                "status_code": getattr(response, "status_code", 500),
                "duration_ms": duration_ms,
                "client_ip": request.client.host if request.client else None,
                "user_agent": request.headers.get("user-agent"),
            },
        )
```
Recommended first alerts:
- 5xx error rate spike
- p95 latency spike
- health check failures
- sudden increase in 429 responses
Suggested visual:
- diagram showing request flow: client -> reverse proxy -> app -> Redis limiter -> metrics/logging -> dashboard/alerts
What’s happening
Without API monitoring, you do not see rising error rates, slow endpoints, burst traffic, or abusive clients until users report problems.
Without rate limits, one client, bot, or buggy script can consume worker capacity, increase database load, and degrade service for everyone.
Basic health checks are not enough. You need endpoint-level visibility:
- request count
- status code distribution
- latency percentiles
- 429 rate-limit responses
- top endpoints by traffic
- top clients by request volume
Rate limiting must match your architecture:
- in-memory counters are acceptable only for a single process or single instance
- multiple instances require shared state such as Redis or an API gateway
- proxy-aware client IP extraction must be configured correctly, or all traffic may collapse into one bucket
Step-by-step implementation
1. Define what to measure
Track these first:
- requests per minute
- status code counts: 2xx, 4xx, 5xx
- p50, p95, and p99 latency
- rate-limit hits
- top endpoints by volume
- top failing endpoints
- active workers or process saturation
- upstream dependency latency if your API calls other services
A practical minimum dashboard should answer:
- Is the API up?
- Which endpoints are slow?
- Are 5xx errors increasing?
- Are clients hitting rate limits?
- Is one client causing burst load?
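To make the latency questions concrete, p50/p95/p99 can be computed from logged durations with a simple nearest-rank percentile. This is a sanity-check sketch, not a replacement for a real histogram backend:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile over raw duration samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

# Example durations in milliseconds, as a request log might contain.
durations_ms = [12, 15, 14, 120, 18, 16, 950, 17, 13, 19]
p50 = percentile(durations_ms, 50)  # typical request
p95 = percentile(durations_ms, 95)  # tail latency that users actually feel
```

Note how a single slow outlier barely moves p50 but dominates p95, which is why the alerts below target p95 rather than averages.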
2. Add request logging middleware
Every request log should include:
- request_id
- user_id if authenticated
- api_key_id or token identifier if applicable
- client_ip
- method
- path
- status_code
- duration_ms
- user_agent
Example with user ID support:
```python
@app.middleware("http")
async def log_requests(request: Request, call_next):
    start = time.time()
    request_id = str(uuid.uuid4())
    request.state.request_id = request_id
    response = None
    try:
        response = await call_next(request)
        return response
    finally:
        duration_ms = round((time.time() - start) * 1000, 2)
        user_id = getattr(getattr(request, "state", None), "user_id", None)
        logger.info(
            "api_request",
            extra={
                "request_id": request_id,
                "user_id": user_id,
                "client_ip": request.client.host if request.client else None,
                "method": request.method,
                "path": request.url.path,
                "status_code": getattr(response, "status_code", 500),
                "duration_ms": duration_ms,
                "user_agent": request.headers.get("user-agent"),
            },
        )
```
If you are behind Nginx or a load balancer, make sure forwarded IP headers are set correctly.
Nginx example:
```nginx
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
```
If using trusted proxy handling, only trust known proxies. Do not blindly trust a public X-Forwarded-For header.
3. Add API metrics collection
Expose metrics via /metrics or send them to your APM.
FastAPI + Prometheus:
```python
from prometheus_fastapi_instrumentator import Instrumentator

Instrumentator().instrument(app).expose(app, endpoint="/metrics")
```
Metrics to capture:
- request total by method/path/status
- latency histogram by route
- 429 responses
- exception count
- worker/process stats if available
If route labels are too granular, normalize paths. Avoid high-cardinality labels like raw user IDs or full URLs with IDs.
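Normalization can be a small helper applied before labeling. The patterns below (numeric IDs, UUIDs) are illustrative assumptions; extend them for your own URL scheme:

```python
import re

# Collapse high-cardinality path segments before using them as metric labels.
UUID_SEGMENT = re.compile(
    r"/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)
NUMERIC_SEGMENT = re.compile(r"/\d+")

def normalize_path(path: str) -> str:
    # Replace UUIDs first, so their numeric chunks are not partially
    # matched by the numeric-ID pattern.
    path = UUID_SEGMENT.sub("/{uuid}", path)
    path = NUMERIC_SEGMENT.sub("/{id}", path)
    return path
```

This keeps `/api/users/123` and `/api/users/456` under one label, so your metrics backend tracks one route series instead of one per user.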
4. Create dashboards
Build separate dashboards for:
Throughput
- requests per minute
- requests by endpoint
- requests by status family
Errors
- 5xx rate
- 4xx rate
- top failing routes
- exception counts
Latency
- p50/p95/p99 by endpoint
- slowest endpoints
- latency after deploys
Abuse and limits
- total 429s
- 429s by route
- top IPs or API keys by request volume
- burst traffic patterns
Suggested visual:
- dashboard mockup showing throughput, latency, 5xx, and 429 panels
5. Add health and readiness checks
Use at least one lightweight health endpoint:
@app.get("/health")
def health():
return {"ok": True}If needed, add a deeper readiness check:
@app.get("/ready")
def ready():
# verify DB, Redis, queue, or upstream dependencies
return {"ready": True}Use /health for simple uptime checks.
Use /ready when you want deployment systems or orchestrators to verify dependencies.
Do not make your public uptime probe depend on every downstream service unless you intentionally want full dependency failure to mark the app unhealthy.
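One way to structure a readiness handler is to aggregate named dependency checks without letting any single check crash the endpoint. The check callables here are placeholders for your real DB, Redis, or queue pings:

```python
def readiness_report(checks: dict) -> dict:
    """Run named dependency checks; a raising check counts as not ready."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A failing dependency should mark readiness false, not 500 the probe.
            results[name] = False
    return {"ready": all(results.values()), "checks": results}

# Usage sketch inside /ready (db_ping / redis_ping are your own client calls):
# return readiness_report({"db": db_ping, "redis": redis_ping})
```

Returning per-dependency detail makes it obvious which dependency failed when an orchestrator pulls an instance out of rotation.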
6. Implement baseline rate limits
Start with a global limit, then override sensitive routes.
Example:
- global: 60/minute per IP
- login: 5/minute per IP
- signup: 5/minute per IP
- password reset: 3/minute per IP
- expensive search/report route: 10/minute per user or API key
FastAPI route-specific example:
```python
@app.post("/auth/login")
@limiter.limit("5/minute")
def login(request: Request):
    return {"ok": True}

@app.post("/auth/password-reset")
@limiter.limit("3/minute")
def password_reset(request: Request):
    return {"ok": True}

@app.get("/api/search")
@limiter.limit("10/minute")
def search(request: Request):
    return {"results": []}
```
Protect these more aggressively:
- login
- signup
- password reset
- magic link
- OTP verification
- search
- export/report generation
- AI or compute-heavy routes
- public API endpoints
7. Use Redis for shared counters in multi-instance deployments
If you run:
- multiple containers
- multiple app replicas
- autoscaling workers
- load-balanced traffic
do not use local in-memory counters.
Use Redis-backed rate limiting or your API gateway.
Redis checks:
```
redis-cli PING
redis-cli KEYS '*rate*'
redis-cli MONITOR
```
Note: KEYS and MONITOR are fine for debugging but expensive on busy production instances; prefer SCAN for large keyspaces.
Operational rule:
- one instance: in-process can work temporarily
- more than one instance: shared backend required
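The pattern behind Redis-backed limiting is a fixed window: INCR a per-key, per-window counter and EXPIRE it at the window edge (slowapi does this for you if you pass `storage_uri="redis://..."` to the Limiter). A minimal in-process sketch of the same logic, with a dict standing in for Redis:

```python
import time

class FixedWindowLimiter:
    """Fixed-window counter; the dict stands in for Redis INCR/EXPIRE."""

    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable for testing
        self.counters = {}  # (key, window_number) -> count

    def allow(self, key: str) -> bool:
        # All requests in the same window share one counter per key.
        window_number = int(self.clock()) // self.window
        bucket = (key, window_number)
        self.counters[bucket] = self.counters.get(bucket, 0) + 1
        return self.counters[bucket] <= self.limit
```

Because the dict lives in one process, this exact code demonstrates the multi-instance problem: two replicas would each allow the full limit. Moving the counter into Redis is what makes enforcement consistent.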
8. Return proper 429 responses
Clients need a predictable error response.
Example response:
```
HTTP/1.1 429 Too Many Requests
Retry-After: 60
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. Retry later."
}
```
If supported by your stack, include:
- X-RateLimit-Limit
- X-RateLimit-Remaining
- Retry-After
This helps frontend and API consumers back off correctly.
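A small helper can assemble those headers from limiter state. The limit, remaining, and reset values are assumed to come from your limiter backend:

```python
def rate_limit_headers(limit: int, remaining: int,
                       reset_epoch: int, now_epoch: int) -> dict:
    """Build the 429 response headers from current limiter state."""
    # Seconds until the window resets; never negative.
    retry_after = max(0, reset_epoch - now_epoch)
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "Retry-After": str(retry_after),
    }
```

A custom RateLimitExceeded exception handler can merge these headers into its JSON response so every 429 carries consistent back-off information.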
9. Handle exemptions deliberately
Do not accidentally exempt routes.
Common cases:
- health checks
- internal admin routes
- webhook endpoints
- trusted internal automation
Webhook endpoints can be exempted, but only if you also have:
- signature verification
- idempotency handling
- logging
- replay protection where relevant
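Signature verification for an exempted webhook route can be as small as an HMAC comparison. The secret and payload here are placeholders; follow your provider's documented signing scheme and header name:

```python
import hashlib
import hmac

def verify_webhook_signature(payload: bytes, received_sig: str, secret: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw body and compare in constant time."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    # compare_digest avoids timing side channels on the comparison.
    return hmac.compare_digest(expected, received_sig)
```

Verify against the raw request bytes, not a re-serialized JSON body; re-serialization changes key order and whitespace and breaks the signature.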
If you need help debugging webhook behavior, see Payment Webhooks Failing.
10. Add alerts
Start with these:
- health check failures
- 5xx error rate spike
- p95 latency above threshold
- sudden jump in 429s
- sudden traffic spike from a single route or token
Practical alert examples:
- 5xx > 2% for 5 minutes
- p95 latency > 1000ms for 10 minutes
- 429 count > baseline for 10 minutes
- /health failing from 2 regions
Use alerts that map to action. Avoid noisy alerts with no owner.
For alert setup patterns, see Alerting System Setup.
11. Review and tune limits with real data
Do not guess forever. Review weekly:
- top routes by traffic
- top 429-producing routes
- clients frequently near limit
- shared-IP false positives
- frontend retry loops or polling
If real users behind NAT or office networks hit limits, move some rules from IP-only to layered keys:
- per IP
- per user ID
- per API key
- per route
12. Document your policy
Document:
- global default limits
- route-specific overrides
- 429 response behavior
- retry guidance
- authenticated vs anonymous policy
- webhook exemptions
- support path for higher API limits
If you expose auth endpoints publicly, combine this page with your auth hardening work.
Common causes
- No request-level metrics or logs, so API failures are only noticed through support tickets
- Rate limiting stored in local memory while running multiple app instances, causing inconsistent enforcement
- Reverse proxy headers not configured correctly, so all requests appear from one IP
- Frontend polling, retries, or websocket fallback logic causing accidental request floods
- No per-route overrides, so login or search endpoints are either unprotected or overly restricted
- 429 responses returned without Retry-After or a clear body, making clients retry incorrectly
- Webhook endpoints accidentally rate-limited, causing delivery failures from payment providers
- Metrics middleware registered incorrectly or after exception handlers, so failed requests are not counted
Debugging tips
Run these first:
```
curl -i https://yourapp.com/health
curl -i https://yourapp.com/metrics
curl -I https://yourapp.com/api/data
```
Trigger the limiter:
```
for i in {1..70}; do
  curl -s -o /dev/null -w "%{http_code}\n" https://yourapp.com/api/data
done
```
Inspect Redis-backed counters:
```
redis-cli KEYS '*rate*'
redis-cli MONITOR
```
Check app logs:
```
journalctl -u gunicorn -n 200 --no-pager
docker logs <app_container> --tail 200
```
Check proxy forwarding config:
```
nginx -T | grep -i 'real_ip\|proxy_set_header\|x-forwarded-for'
```
Test forwarded IP behavior carefully:
```
curl -H 'X-Forwarded-For: 1.2.3.4' -i https://yourapp.com/api/data
```
Generate load:
```
ab -n 200 -c 20 https://yourapp.com/api/data
hey -n 500 -c 50 https://yourapp.com/api/data
```
What to look for:
- all requests using the same IP bucket
- 429s not appearing in metrics
- failed requests missing from logs
- high latency with low request volume
- route-specific decorators not being applied
- Redis not being reached by all app instances
If metrics look low but users report slowness, check:
- database latency
- upstream API latency
- worker saturation
- queue depth
- connection pool exhaustion
If you need broader logging coverage, see Logging Setup (Application + Server).
If you need dashboard and latency guidance, see Metrics and Performance Monitoring.
If you need incident handling after alerts fire, see Incident Response Playbook.
Checklist
- ✓ Health endpoint exists and is monitored
- ✓ Request logs include request ID, IP, user ID, path, status, and duration
- ✓ Metrics endpoint is exposed and scraped
- ✓ Dashboard shows request rate, error rate, and latency percentiles
- ✓ Alerts exist for 5xx spikes, latency spikes, and 429 spikes
- ✓ Default global rate limit is configured
- ✓ Sensitive endpoints have stricter custom limits
- ✓ Shared backend like Redis is used for multi-instance deployments
- ✓ Proxy headers are trusted only from known proxies
- ✓ 429 responses include useful headers or error details
- ✓ Webhook and internal routes are handled deliberately, not accidentally exempted
- ✓ Rate limits are documented for frontend and API clients
Product CTA
Use a reusable production setup to standardize:
- health checks
- request logging
- metrics
- alerting
- shared rate limiting
This avoids ad hoc middleware and inconsistent abuse protection across endpoints. The goal is one repeatable API baseline for every MVP and small SaaS service.
Related guides
- Logging Setup (Application + Server)
- Metrics and Performance Monitoring
- Alerting System Setup
- Incident Response Playbook
- SaaS Production Checklist
FAQ
Should I rate-limit authenticated and anonymous traffic differently?
Yes. Anonymous traffic is usually best limited by IP. Authenticated traffic should also use user ID or API key to avoid punishing shared IPs and to stop one account from abusing the API.
Do I need Redis for rate limiting?
If you run a single instance, in-process limits can work temporarily. If you run multiple instances, containers, or autoscaling, use Redis or a gateway-level limiter so counters are shared.
What should I alert on first?
Start with 5xx error rate, p95 latency, uptime failures, and sudden increases in 429 responses. These catch most API incidents early.
Can I exclude webhook endpoints from rate limits?
Yes, but do it intentionally. Payment or third-party webhooks should be protected with signature verification and logging if they are exempted from normal limits.
Should rate limiting happen in the app or at the reverse proxy?
For small SaaS, app-level is easier to start. For higher traffic or multi-service APIs, use gateway or proxy limits plus app-level limits on sensitive routes.
What is a safe default limit?
Start with conservative defaults like 60 requests/minute per IP for general endpoints, then tune from production traffic patterns.
Why am I seeing too many 429s from legitimate users?
Common causes are frontend retry loops, polling too frequently, office/shared IPs, and limits applied to static or health routes by mistake.
Final takeaway
Minimum viable API monitoring is:
- request logs
- latency and error dashboards
- health checks
- actionable alerts
Minimum viable rate limiting is:
- one global default
- stricter limits on auth and expensive endpoints
- shared storage or gateway enforcement for multi-instance deployments
Before trusting your numbers, verify:
- proxy/IP handling
- metrics coverage
- failed request logging
- 429 response behavior
Treat 429 spikes and latency increases as production signals. Tune both client behavior and server capacity.