High CPU / Memory Usage
The essential playbook for diagnosing and resolving high CPU / memory usage in your SaaS.
Use this page when your app server becomes slow, unresponsive, or gets killed due to CPU or memory pressure. The goal is to quickly identify whether the issue is caused by application code, database load, worker concurrency, background jobs, traffic spikes, or infrastructure limits, then apply a safe remediation.
Quick Fix
Start by confirming whether the spike comes from web workers, background jobs, database activity, or traffic. If memory is exhausted, reduce worker concurrency first. If CPU is saturated, identify hot endpoints, loops, or expensive queries before increasing server size.
# 1) Find top CPU and memory consumers
ps aux --sort=-%cpu | head -20
ps aux --sort=-%mem | head -20
# 2) Watch live process usage
htop
# 3) Check load and memory
uptime
free -h
vmstat 1 5
# 4) Check if the app was OOM-killed
dmesg -T | egrep -i 'killed process|out of memory|oom'
journalctl -k -n 200 --no-pager
# 5) Inspect service logs
journalctl -u gunicorn -n 200 --no-pager
journalctl -u celery -n 200 --no-pager
# 6) Inspect container usage if using Docker
docker stats --no-stream
docker ps
docker inspect <container_id>
# 7) Check Gunicorn/Uvicorn worker count and restart if overloaded
ps aux | egrep 'gunicorn|uvicorn|celery|rq|postgres|nginx'
systemctl restart gunicorn
systemctl restart celery
# 8) Check for runaway DB queries
psql "$DATABASE_URL" -c "select pid, now()-query_start as duration, state, wait_event_type, query from pg_stat_activity where state <> 'idle' order by duration desc limit 10;"
# 9) Check top query cost if pg_stat_statements is enabled
psql "$DATABASE_URL" -c "select query, calls, total_exec_time, mean_exec_time from pg_stat_statements order by total_exec_time desc limit 10;"
Immediate mitigation options:
- disable noisy cron jobs
- pause non-essential workers
- reduce worker concurrency if memory bound
- enable caching for hot endpoints
- rate limit abusive routes
- scale up temporarily only if service stability requires it
What’s happening
- High CPU usually means your app is spending too much time executing code, handling too many requests, or waiting inefficiently on database/network work.
- High memory usually means too many workers, large in-memory objects, memory leaks, unbounded caches, oversized uploads, or queued jobs holding data.
- In production, the symptom may appear as slow responses, 502/504 errors, container restarts, OOM kills, or failed health checks.
- The fix is to identify the overloaded component first: web app, worker process, database, reverse proxy, or the host itself.
Step-by-step implementation
1) Measure host pressure
Run these first:
top
htop
uptime
free -h
vmstat 1 5
iostat -xz 1 5
ss -s
What to look for:
- load average rising quickly
- available memory near zero
- swap usage increasing
- high iowait
- too many open TCP connections
- one process or service dominating CPU or RSS
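If you prefer scripted triage to eyeballing top, the same pressure signals can be read straight from /proc. A minimal Python sketch (Linux only; what counts as "near zero" memory is left to you):

```python
# Hypothetical triage helper: summarizes host pressure from /proc (Linux only).
# A minimal sketch -- not a replacement for top/vmstat, just a scripted view.

def read_loadavg(path="/proc/loadavg"):
    """Return the 1/5/15-minute load averages as floats."""
    with open(path) as f:
        parts = f.read().split()
    return float(parts[0]), float(parts[1]), float(parts[2])

def read_meminfo(path="/proc/meminfo"):
    """Return /proc/meminfo fields (values in kB) as a dict."""
    info = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            info[key] = int(rest.split()[0])  # first token is the kB value
    return info

if __name__ == "__main__":
    load1, load5, load15 = read_loadavg()
    mem = read_meminfo()
    avail_pct = 100 * mem["MemAvailable"] / mem["MemTotal"]
    print(f"load averages: {load1} {load5} {load15}")
    print(f"memory available: {avail_pct:.1f}%")
```

A rising load1 relative to load15 points at a fresh spike; low MemAvailable with growing swap points at memory pressure.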
If using Docker:
docker stats --no-stream
docker ps
2) Identify the process consuming resources
Check which service is responsible:
ps aux --sort=-%cpu | head -20
ps aux --sort=-%mem | head -20
ps aux | egrep 'gunicorn|uvicorn|celery|rq|postgres|nginx'
Typical interpretations:
- gunicorn or uvicorn high CPU: hot endpoints, expensive app code, too much request volume
- gunicorn or uvicorn high memory: too many workers, large responses, object retention
- celery or rq high CPU: retry storms, duplicate jobs, expensive batch work
- celery or rq high memory: large payloads, file processing, unbounded job batches
- postgres high CPU: slow queries, missing indexes, lock contention, polling
- nginx high CPU: connection flood, traffic spike, abusive clients
For a single process memory breakdown:
pmap -x <PID> | tail -n 20
smem -rk
3) Check for OOM kills
If the kernel is killing processes, fix memory pressure before deeper code investigation.
dmesg -T | egrep -i 'killed process|out of memory|oom'
journalctl -k -n 200 --no-pager
If you see OOM events:
- reduce web worker count
- reduce job worker concurrency
- stop non-essential scheduled jobs
- increase host or container memory temporarily
- lower parallel imports/exports
- confirm container memory limits are not too low
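Container memory limits can be confirmed from the cgroup filesystem itself. A hedged Python sketch that tries both the cgroup v2 and v1 paths (in v2, the literal value "max" means no limit is set):

```python
# Hypothetical check: read the container's cgroup memory limit.
# Paths cover cgroup v2 and v1; "max" (v2) means unlimited.

def parse_limit(text):
    """Parse a cgroup memory limit value; returns None when unlimited."""
    text = text.strip()
    if text == "max":  # cgroup v2 convention for "no limit"
        return None
    return int(text)

def read_first(paths):
    """Return the contents of the first readable file in paths, else None."""
    for p in paths:
        try:
            with open(p) as f:
                return f.read()
        except OSError:
            continue
    return None

LIMIT_PATHS = [
    "/sys/fs/cgroup/memory.max",                    # cgroup v2
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",  # cgroup v1
]

if __name__ == "__main__":
    raw = read_first(LIMIT_PATHS)
    if raw is None:
        print("no cgroup memory limit file found")
    else:
        limit = parse_limit(raw)
        print("memory limit:", "unlimited" if limit is None else f"{limit} bytes")
```

Compare the reported limit against the RSS totals from ps or docker stats; a limit only slightly above steady-state usage leaves no headroom for spikes.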
4) Verify web server worker settings
Too many workers is a common cause of memory exhaustion.
Check current process count:
ps aux | grep gunicorn
ps aux | grep uvicorn
Example Gunicorn service:
[Service]
ExecStart=/app/venv/bin/gunicorn app.wsgi:application \
--bind 127.0.0.1:8000 \
--workers 2 \
--threads 2 \
--timeout 60 \
--max-requests 1000 \
--max-requests-jitter 100
Guidelines:
- memory-bound app: reduce --workers
- occasional leaks: use --max-requests and --max-requests-jitter
- CPU saturation with free memory: benchmark before increasing workers
- long requests: review endpoint design before increasing timeout
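To turn these guidelines into a concrete starting number, one rough approach is to cap the worker count by both CPU and memory headroom. A sketch, using the common (2 × cores + 1) Gunicorn starting heuristic; the per-worker RSS figure in the example is a placeholder you should replace with a value measured via ps or pmap on your own app:

```python
# Rough worker-count sizing sketch. The (2 * cores + 1) rule is the common
# Gunicorn starting heuristic; the per-worker RSS figure is an assumption
# that must come from measuring your own workers.

import os

def suggested_workers(available_mem_mb, per_worker_rss_mb, cores=None):
    """Return a worker count capped by both CPU and memory headroom."""
    if cores is None:
        cores = os.cpu_count() or 1
    cpu_bound_cap = 2 * cores + 1                      # classic heuristic
    mem_bound_cap = max(1, available_mem_mb // per_worker_rss_mb)
    return min(cpu_bound_cap, mem_bound_cap)

# Example: 4 GB free, workers measured at ~300 MB RSS each, 2 cores
print(suggested_workers(4096, 300, cores=2))  # → 5 (capped by CPU, not memory)
```

On a memory-tight host the memory cap wins: with 1 GB free and the same 300 MB workers, the sketch returns 3 regardless of core count.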
After changes:
sudo systemctl daemon-reload
sudo systemctl restart gunicorn
sudo systemctl status gunicorn --no-pager
5) Inspect logs for traffic and endpoint patterns
Look for one route, one tenant, one IP, or one task driving the spike.
journalctl -u gunicorn -n 200 --no-pager
journalctl -u celery -n 200 --no-pager
tail -n 200 /var/log/nginx/access.log
tail -n 200 /var/log/nginx/error.log
Useful checks:
# Top IPs in nginx access log
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
# Top requested paths
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
If a small number of clients or routes is causing load:
- rate limit at Nginx
- cache hot GET endpoints
- block abusive clients
- reduce expensive synchronous work
Example Nginx rate limiting:
http {
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
server {
location /api/ {
limit_req zone=api_limit burst=20 nodelay;
proxy_pass http://127.0.0.1:8000;
}
}
}
Validate and reload:
nginx -T
sudo nginx -t
sudo systemctl reload nginx
6) Check database activity
High CPU or memory symptoms in the app often originate in the database.
Inspect active sessions:
psql "$DATABASE_URL" -c "select pid, usename, state, wait_event_type, now()-query_start as duration, query from pg_stat_activity order by duration desc limit 20;"
Inspect expensive queries:
psql "$DATABASE_URL" -c "select query, calls, total_exec_time, mean_exec_time from pg_stat_statements order by total_exec_time desc limit 10;"
Common DB-driven causes:
- missing indexes
- N+1 queries
- large table scans
- repeated polling
- too many open connections
- lock contention
If connections are high, verify pooling and worker counts.
Example app-side connection pooling guidance:
- keep worker count conservative
- avoid opening one DB connection per thread without limits
- use PgBouncer if connection churn is high
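If connection churn is the problem, a minimal PgBouncer configuration sketch might look like the following. The database name, addresses, and pool sizes are illustrative assumptions, not recommendations; transaction pooling is the usual choice for short web-app transactions:

```ini
; Minimal PgBouncer sketch -- appdb, port, and pool sizes are placeholders.
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20
```

Point the app's DATABASE_URL at port 6432 so many short-lived app connections share a small, stable pool of server connections.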
7) Review background jobs and schedulers
Check workers:
ps aux | egrep 'celery|rq'
journalctl -u celery -n 200 --no-pager
Look for:
- retry storms
- duplicate queue consumers
- cron overlap
- large payload processing
- image or CSV work done entirely in memory
- jobs fetching entire datasets
Immediate mitigations:
- lower concurrency
- stop one queue temporarily
- disable one noisy scheduled task
- break large jobs into smaller batches
- enforce idempotency on retries
Example Celery service with lower concurrency:
[Service]
ExecStart=/app/venv/bin/celery -A app worker --loglevel=INFO --concurrency=2
Restart after changes:
sudo systemctl restart celery
sudo systemctl status celery --no-pager
8) Check payload size, uploads, and response shape
Memory spikes often come from loading too much into RAM.
Review whether you are:
- reading whole CSV files into memory
- building large JSON responses
- exporting large datasets without streaming
- processing full images synchronously in request handlers
- returning unpaginated API lists
Safer patterns:
- stream uploads and downloads
- paginate responses
- chunk batch processing
- move heavy processing to workers
- store large files in object storage instead of local RAM-heavy processing
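As an illustration of the chunking pattern, this Python sketch reads a CSV in fixed-size batches so peak memory tracks the batch size rather than the file size (the batch size of 500 is an arbitrary placeholder):

```python
# Sketch of chunked CSV processing: rows are consumed in fixed-size batches
# instead of loading the whole file into memory at once.

import csv
from itertools import islice

def process_csv_in_batches(path, batch_size=500):
    """Yield lists of at most batch_size rows from a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        while True:
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            yield batch

# Usage: process and release each batch in turn, so peak memory is
# proportional to batch_size, not to the size of the uploaded file.
```

The same shape works for exports in the other direction: write rows per batch to a streaming response instead of building the full payload first.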
9) Apply immediate mitigations
Use the least risky change that restores service.
Priority order:
- reduce concurrency
- stop non-essential background work
- add rate limits for expensive routes
- cache high-read endpoints
- scale vertically if needed
- roll back a recent deploy if regression is suspected
Example systemd restart sequence:
sudo systemctl restart gunicorn
sudo systemctl restart celery
sudo systemctl reload nginx
Container restart:
docker restart <container_id>
Do not rely on restart alone unless you already captured enough evidence.
10) Implement durable fixes
After stabilization, fix the real cause:
- optimize slow queries
- add missing indexes
- eliminate N+1 queries
- paginate list endpoints
- stream files instead of buffering
- cap job batch sizes
- de-duplicate scheduled work
- fix loops or repeated external API calls
- reduce log volume on hot paths
- tune worker and thread counts from measurements
- add profiling and tracing
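As one example of fixing loops that hammer an external API, a bounded cache keeps a hot path from calling out on every request. A sketch; fetch_exchange_rate is a hypothetical stand-in for your own lookup, and the bounded maxsize avoids the unbounded-cache growth warned about above:

```python
# Sketch: cache repeated lookups on a hot path instead of calling an
# external service on every request. fetch_exchange_rate is hypothetical.

from functools import lru_cache

CALLS = {"count": 0}  # instrumentation so the effect is visible

@lru_cache(maxsize=256)  # bounded, so the cache cannot grow without limit
def fetch_exchange_rate(currency):
    """Stand-in for an external API call; cached so repeats are free."""
    CALLS["count"] += 1
    return {"USD": 1.0, "EUR": 1.08}.get(currency, 0.0)

for _ in range(1000):
    fetch_exchange_rate("EUR")  # only the first call does real work

print(CALLS["count"])  # → 1
```

The same pattern applies to per-request config lookups and feature-flag reads; just make sure cached values are safe to serve slightly stale.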
Common causes
- Too many Gunicorn/Uvicorn workers or threads for available RAM
- Celery or RQ worker concurrency set too high
- Memory leak from object retention, global caches, or long-lived processes
- N+1 queries or missing database indexes causing excessive CPU
- Long-running or stuck database queries
- Retry storms from failed background jobs or webhooks
- Infinite loops or inefficient code paths introduced in a recent deploy
- Large file uploads, exports, or image processing done fully in memory
- Unbounded API responses or missing pagination
- Traffic spike, bot traffic, or abusive clients hitting expensive endpoints
- Too many open database connections or no connection pooling
- Container memory limits too low for actual workload
- Verbose logging or synchronous external API calls under load
- Cron jobs or scheduled tasks overlapping and competing for resources
Debugging tips
- Compare CPU and memory over time. A sudden spike usually indicates traffic or a job event; a slow climb suggests a leak or cache growth.
- Correlate spikes with deploys, migrations, imports, cron schedules, billing runs, or webhook bursts.
- Restarting the process can temporarily hide a memory leak. Capture evidence first if possible.
- If one worker is much larger than others, inspect request types or endpoints causing object retention.
- If all workers are uniformly large, your base application memory footprint or worker count is likely too high.
- If CPU is low but response times are high, check I/O wait, DB locks, network timeouts, and thread exhaustion.
- If Postgres connections are high, check connection pooling and whether workers are opening too many connections.
- Use sampling profilers or application tracing for repeatable spikes instead of guessing from top output alone.
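For live production processes, a sampling profiler (for example py-spy for Python apps) attaches without code changes; for a spike you can reproduce locally, the stdlib cProfile is often enough. A sketch profiling a stand-in function:

```python
# Sketch: profile a suspect code path with the stdlib cProfile instead of
# guessing from top. hot_function is a stand-in for your own hot path.

import cProfile
import io
import pstats

def hot_function():
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
hot_function()
profiler.disable()

# Print the five most expensive calls by cumulative time
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

Read the cumulative-time column top-down: the first few rows that belong to your own code are where optimization effort pays off.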
Useful commands:
top
htop
uptime
free -h
vmstat 1 5
iostat -xz 1 5
ps aux --sort=-%cpu | head -20
ps aux --sort=-%mem | head -20
pmap -x <PID> | tail -n 20
smem -rk
dmesg -T | egrep -i 'killed process|out of memory|oom'
journalctl -k -n 200 --no-pager
journalctl -u gunicorn -n 200 --no-pager
journalctl -u celery -n 200 --no-pager
docker stats --no-stream
docker inspect <container_id>
ps aux | egrep 'gunicorn|uvicorn|celery|rq|postgres|nginx'
ss -s
ss -ltnp
nginx -T
psql "$DATABASE_URL" -c "select pid, usename, state, wait_event_type, now()-query_start as duration, query from pg_stat_activity order by duration desc limit 20;"
psql "$DATABASE_URL" -c "select query, calls, total_exec_time, mean_exec_time from pg_stat_statements order by total_exec_time desc limit 10;"
redis-cli info memory
redis-cli info stats
curl -I http://127.0.0.1/health
Checklist
- ✓ Confirmed whether issue is CPU, memory, or both
- ✓ Identified the exact process consuming resources
- ✓ Checked for OOM kills in kernel logs
- ✓ Reviewed app, worker, and database logs during the spike window
- ✓ Validated Gunicorn/Uvicorn/Celery/RQ concurrency settings
- ✓ Inspected slow queries and active database sessions
- ✓ Checked for retry storms, duplicate jobs, or runaway schedulers
- ✓ Verified uploads, exports, and batch jobs are streamed or chunked
- ✓ Applied rate limits or caching to hot routes if traffic-driven
- ✓ Added monitoring, alerts, and dashboards for CPU, memory, restart count, and request latency
FAQ
What is the fastest safe response to an active memory incident?
Reduce application and worker concurrency, stop non-essential background jobs, confirm whether OOM kills are occurring, and restore service stability before deeper profiling.
How do I separate app issues from database issues?
Check process-level CPU and memory first, then inspect active database queries and slow query stats. If app CPU is high with normal DB load, focus on application code. If DB sessions and query durations spike, focus on queries and indexes.
Why does the app recover after restart but fail again later?
That usually indicates a memory leak, cache growth, repeating job burst, or a traffic pattern that rebuilds the same pressure over time.
Can rate limiting help CPU and memory issues?
Yes. If a small set of endpoints or clients is creating disproportionate load, rate limiting and caching can stabilize the service quickly.
Should I use more workers to improve performance?
Not automatically. More workers may improve throughput for idle CPU capacity, but they also increase memory usage and can worsen contention if the database or host is already the bottleneck.
Final takeaway
Do not treat high CPU or memory as a generic scaling problem. Identify the overloaded process, confirm whether the trigger is traffic, code, database, or jobs, apply a safe mitigation, then implement a permanent fix with monitoring in place.