High CPU / Memory Usage
The essential playbook for diagnosing and resolving high CPU / memory usage in your SaaS.
Use this page when your app server becomes slow, unresponsive, or gets killed due to CPU or memory pressure. The goal is to quickly identify whether the issue is caused by application code, database load, worker concurrency, background jobs, traffic spikes, or infrastructure limits, then apply a safe remediation.
Quick Fix
Start by confirming whether the spike comes from web workers, background jobs, database activity, or traffic. If memory is exhausted, reduce worker concurrency first. If CPU is saturated, identify hot endpoints, loops, or expensive queries before increasing server size.
# 1) Find top CPU and memory consumers
ps aux --sort=-%cpu | head -20
ps aux --sort=-%mem | head -20
# 2) Watch live process usage
htop
# 3) Check load and memory
uptime
free -h
vmstat 1 5
# 4) Check if the app was OOM-killed
dmesg -T | egrep -i 'killed process|out of memory|oom'
journalctl -k -n 200 --no-pager
# 5) Inspect service logs
journalctl -u gunicorn -n 200 --no-pager
journalctl -u celery -n 200 --no-pager
# 6) Inspect container usage if using Docker
docker stats --no-stream
docker ps
docker inspect <container_id>
# 7) Check Gunicorn/Uvicorn worker count and restart if overloaded
ps aux | egrep 'gunicorn|uvicorn|celery|rq|postgres|nginx'
systemctl restart gunicorn
systemctl restart celery
# 8) Check for runaway DB queries
psql "$DATABASE_URL" -c "select pid, now()-query_start as duration, state, wait_event_type, query from pg_stat_activity where state <> 'idle' order by duration desc limit 10;"
# 9) Check top query cost if pg_stat_statements is enabled
psql "$DATABASE_URL" -c "select query, calls, total_exec_time, mean_exec_time from pg_stat_statements order by total_exec_time desc limit 10;"
Immediate mitigation options:
- disable noisy cron jobs
- pause non-essential workers
- reduce worker concurrency if memory bound
- enable caching for hot endpoints
- rate limit abusive routes
- scale up temporarily only if service stability requires it
What’s happening
- High CPU usually means your app is spending too much time executing code, handling too many requests, or waiting inefficiently on database/network work.
- High memory usually means too many workers, large in-memory objects, memory leaks, unbounded caches, oversized uploads, or queued jobs holding data.
- In production, the symptom may appear as slow responses, 502/504 errors, container restarts, OOM kills, or failed health checks.
- The fix is to identify the overloaded component first: web app, worker process, database, reverse proxy, or the host itself.
Step-by-step implementation
1) Measure host pressure
Run these first:
top
htop
uptime
free -h
vmstat 1 5
iostat -xz 1 5
ss -s
What to look for:
- load average rising quickly
- available memory near zero
- swap usage increasing
- high iowait
- too many open TCP connections
- one process or service dominating CPU or RSS
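If you prefer scripted triage to eyeballing top, the same pressure signals can be read straight from /proc. A minimal Python sketch (Linux only; what counts as "near zero" memory is left to you):

```python
# Hypothetical triage helper: summarizes host pressure from /proc (Linux only).
# A minimal sketch -- not a replacement for top/vmstat, just a scripted view.

def read_loadavg(path="/proc/loadavg"):
    """Return the 1/5/15-minute load averages as floats."""
    with open(path) as f:
        parts = f.read().split()
    return float(parts[0]), float(parts[1]), float(parts[2])

def read_meminfo(path="/proc/meminfo"):
    """Return /proc/meminfo fields (values in kB) as a dict."""
    info = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            info[key] = int(rest.split()[0])  # first token is the kB value
    return info

if __name__ == "__main__":
    load1, load5, load15 = read_loadavg()
    mem = read_meminfo()
    avail_pct = 100 * mem["MemAvailable"] / mem["MemTotal"]
    print(f"load averages: {load1} {load5} {load15}")
    print(f"memory available: {avail_pct:.1f}%")
```

A rising load1 relative to load15 points at a fresh spike; low MemAvailable with growing swap points at memory pressure.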
If using Docker:
docker stats --no-stream
docker ps
2) Identify the process consuming resources
Check which service is responsible:
ps aux --sort=-%cpu | head -20
ps aux --sort=-%mem | head -20
ps aux | egrep 'gunicorn|uvicorn|celery|rq|postgres|nginx'
Typical interpretations:
- gunicorn or uvicorn high CPU: hot endpoints, expensive app code, too much request volume
- gunicorn or uvicorn high memory: too many workers, large responses, object retention
- celery or rq high CPU: retry storms, duplicate jobs, expensive batch work
- celery or rq high memory: large payloads, file processing, unbounded job batches
- postgres high CPU: slow queries, missing indexes, lock contention, polling
- nginx high CPU: connection flood, traffic spike, abusive clients
For a single process memory breakdown:
pmap -x <PID> | tail -n 20
smem -rk
3) Check for OOM kills
If the kernel is killing processes, fix memory pressure before deeper code investigation.
dmesg -T | egrep -i 'killed process|out of memory|oom'
journalctl -k -n 200 --no-pager
If you see OOM events:
- reduce web worker count
- reduce job worker concurrency
- stop non-essential scheduled jobs
- increase host or container memory temporarily
- lower parallel imports/exports
- confirm container memory limits are not too low
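Container memory limits can be confirmed from the cgroup filesystem itself. A hedged Python sketch that tries both the cgroup v2 and v1 paths (in v2, the literal value "max" means no limit is set):

```python
# Hypothetical check: read the container's cgroup memory limit.
# Paths cover cgroup v2 and v1; "max" (v2) means unlimited.

def parse_limit(text):
    """Parse a cgroup memory limit value; returns None when unlimited."""
    text = text.strip()
    if text == "max":  # cgroup v2 convention for "no limit"
        return None
    return int(text)

def read_first(paths):
    """Return the contents of the first readable file in paths, else None."""
    for p in paths:
        try:
            with open(p) as f:
                return f.read()
        except OSError:
            continue
    return None

LIMIT_PATHS = [
    "/sys/fs/cgroup/memory.max",                    # cgroup v2
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",  # cgroup v1
]

if __name__ == "__main__":
    raw = read_first(LIMIT_PATHS)
    if raw is None:
        print("no cgroup memory limit file found")
    else:
        limit = parse_limit(raw)
        print("memory limit:", "unlimited" if limit is None else f"{limit} bytes")
```

Compare the reported limit against the RSS totals from ps or docker stats; a limit only slightly above steady-state usage leaves no headroom for spikes.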
4) Verify web server worker settings
Too many workers is a common cause of memory exhaustion.
Check current process count:
ps aux | grep gunicorn
ps aux | grep uvicorn
Example Gunicorn service:
[Service]
ExecStart=/app/venv/bin/gunicorn app.wsgi:application \
--bind 127.0.0.1:8000 \
--workers 2 \
--threads 2 \
--timeout 60 \
--max-requests 1000 \
--max-requests-jitter 100
Guidelines:
- memory-bound app: reduce --workers
- occasional leaks: use --max-requests and --max-requests-jitter
- CPU saturation with free memory: benchmark before increasing workers
- long requests: review endpoint design before increasing timeout
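To turn these guidelines into a concrete starting number, one rough approach is to cap the worker count by both CPU and memory headroom. A sketch, using the common (2 × cores + 1) Gunicorn starting heuristic; the per-worker RSS figure in the example is a placeholder you should replace with a value measured via ps or pmap on your own app:

```python
# Rough worker-count sizing sketch. The (2 * cores + 1) rule is the common
# Gunicorn starting heuristic; the per-worker RSS figure is an assumption
# that must come from measuring your own workers.

import os

def suggested_workers(available_mem_mb, per_worker_rss_mb, cores=None):
    """Return a worker count capped by both CPU and memory headroom."""
    if cores is None:
        cores = os.cpu_count() or 1
    cpu_bound_cap = 2 * cores + 1                      # classic heuristic
    mem_bound_cap = max(1, available_mem_mb // per_worker_rss_mb)
    return min(cpu_bound_cap, mem_bound_cap)

# Example: 4 GB free, workers measured at ~300 MB RSS each, 2 cores
print(suggested_workers(4096, 300, cores=2))  # → 5 (capped by CPU, not memory)
```

On a memory-tight host the memory cap wins: with 1 GB free and the same 300 MB workers, the sketch returns 3 regardless of core count.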
After changes:
sudo systemctl daemon-reload
sudo systemctl restart gunicorn
sudo systemctl status gunicorn --no-pager
5) Inspect logs for traffic and endpoint patterns
Look for one route, one tenant, one IP, or one task driving the spike.
journalctl -u gunicorn -n 200 --no-pager
journalctl -u celery -n 200 --no-pager
tail -n 200 /var/log/nginx/access.log
tail -n 200 /var/log/nginx/error.log
Useful checks:
# Top IPs in nginx access log
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
# Top requested paths
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
If a small number of clients or routes is causing load:
- rate limit at Nginx
- cache hot GET endpoints
- block abusive clients
- reduce expensive synchronous work
Example Nginx rate limiting:
http {
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
server {
location /api/ {
limit_req zone=api_limit burst=20 nodelay;
proxy_pass http://127.0.0.1:8000;
}
}
}
Validate and reload:
nginx -T
sudo nginx -t
sudo systemctl reload nginx
6) Check database activity
High CPU or memory symptoms in the app often originate in the database.
Inspect active sessions:
psql "$DATABASE_URL" -c "select pid, usename, state, wait_event_type, now()-query_start as duration, query from pg_stat_activity order by duration desc limit 20;"
Inspect expensive queries:
psql "$DATABASE_URL" -c "select query, calls, total_exec_time, mean_exec_time from pg_stat_statements order by total_exec_time desc limit 10;"
Common DB-driven causes:
- missing indexes
- N+1 queries
- large table scans
- repeated polling
- too many open connections
- lock contention
If connections are high, verify pooling and worker counts.
Example app-side connection pooling guidance:
- keep worker count conservative
- avoid opening one DB connection per thread without limits
- use PgBouncer if connection churn is high
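If connection churn is the problem, a minimal PgBouncer configuration sketch might look like the following. The database name, addresses, and pool sizes are illustrative assumptions, not recommendations; transaction pooling is the usual choice for short web-app transactions:

```ini
; Minimal PgBouncer sketch -- appdb, port, and pool sizes are placeholders.
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20
```

Point the app's DATABASE_URL at port 6432 so many short-lived app connections share a small, stable pool of server connections.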
7) Review background jobs and schedulers
Check workers:
ps aux | egrep 'celery|rq'
journalctl -u celery -n 200 --no-pager
Look for:
- retry storms
- duplicate queue consumers
- cron overlap
- large payload processing
- image or CSV work done entirely in memory
- jobs fetching entire datasets
Immediate mitigations:
- lower concurrency
- stop one queue temporarily
- disable one noisy scheduled task
- break large jobs into smaller batches
- enforce idempotency on retries
Example Celery service with lower concurrency:
[Service]
ExecStart=/app/venv/bin/celery -A app worker --loglevel=INFO --concurrency=2
Restart after changes:
sudo systemctl restart celery
sudo systemctl status celery --no-pager
8) Check payload size, uploads, and response shape
Memory spikes often come from loading too much into RAM.
Review whether you are:
- reading whole CSV files into memory
- building large JSON responses
- exporting large datasets without streaming
- processing full images synchronously in request handlers
- returning unpaginated API lists
Safer patterns:
- stream uploads and downloads
- paginate responses
- chunk batch processing
- move heavy processing to workers
- store large files in object storage instead of local RAM-heavy processing
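As an illustration of the chunking pattern, this Python sketch reads a CSV in fixed-size batches so peak memory tracks the batch size rather than the file size (the batch size of 500 is an arbitrary placeholder):

```python
# Sketch of chunked CSV processing: rows are consumed in fixed-size batches
# instead of loading the whole file into memory at once.

import csv
from itertools import islice

def process_csv_in_batches(path, batch_size=500):
    """Yield lists of at most batch_size rows from a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        while True:
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            yield batch

# Usage: process and release each batch in turn, so peak memory is
# proportional to batch_size, not to the size of the uploaded file.
```

The same shape works for exports in the other direction: write rows per batch to a streaming response instead of building the full payload first.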
9) Apply immediate mitigations
Use the least risky change that restores service.
Priority order:
- reduce concurrency
- stop non-essential background work
- add rate limits for expensive routes
- cache high-read endpoints
- scale vertically if needed
- roll back a recent deploy if regression is suspected
Example systemd restart sequence:
sudo systemctl restart gunicorn
sudo systemctl restart celery
sudo systemctl reload nginx
Container restart:
docker restart <container_id>
Do not rely on restart alone unless you already captured enough evidence.
10) Implement durable fixes
After stabilization, fix the real cause:
- optimize slow queries
- add missing indexes
- eliminate N+1 queries
- paginate list endpoints
- stream files instead of buffering
- cap job batch sizes
- de-duplicate scheduled work
- fix loops or repeated external API calls
- reduce log volume on hot paths
- tune worker and thread counts from measurements
- add profiling and tracing
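As one example of fixing loops that hammer an external API, a bounded cache keeps a hot path from calling out on every request. A sketch; fetch_exchange_rate is a hypothetical stand-in for your own lookup, and the bounded maxsize avoids the unbounded-cache growth warned about above:

```python
# Sketch: cache repeated lookups on a hot path instead of calling an
# external service on every request. fetch_exchange_rate is hypothetical.

from functools import lru_cache

CALLS = {"count": 0}  # instrumentation so the effect is visible

@lru_cache(maxsize=256)  # bounded, so the cache cannot grow without limit
def fetch_exchange_rate(currency):
    """Stand-in for an external API call; cached so repeats are free."""
    CALLS["count"] += 1
    return {"USD": 1.0, "EUR": 1.08}.get(currency, 0.0)

for _ in range(1000):
    fetch_exchange_rate("EUR")  # only the first call does real work

print(CALLS["count"])  # → 1
```

The same pattern applies to per-request config lookups and feature-flag reads; just make sure cached values are safe to serve slightly stale.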
Common causes
- Too many Gunicorn/Uvicorn workers or threads for available RAM
- Celery or RQ worker concurrency set too high
- Memory leak from object retention, global caches, or long-lived processes
- N+1 queries or missing database indexes causing excessive CPU
- Long-running or stuck database queries
- Retry storms from failed background jobs or webhooks
- Infinite loops or inefficient code paths introduced in a recent deploy
- Large file uploads, exports, or image processing done fully in memory
- Unbounded API responses or missing pagination
- Traffic spike, bot traffic, or abusive clients hitting expensive endpoints
- Too many open database connections or no connection pooling
- Container memory limits too low for actual workload
- Verbose logging or synchronous external API calls under load
- Cron jobs or scheduled tasks overlapping and competing for resources
Debugging tips
- Compare CPU and memory over time. A sudden spike usually indicates traffic or a job event; a slow climb suggests a leak or cache growth.
- Correlate spikes with deploys, migrations, imports, cron schedules, billing runs, or webhook bursts.
- Restarting the process can temporarily hide a memory leak. Capture evidence first if possible.
- If one worker is much larger than others, inspect request types or endpoints causing object retention.
- If all workers are uniformly large, your base application memory footprint or worker count is likely too high.
- If CPU is low but response times are high, check I/O wait, DB locks, network timeouts, and thread exhaustion.
- If Postgres connections are high, check connection pooling and whether workers are opening too many connections.
- Use sampling profilers or application tracing for repeatable spikes instead of guessing from top output alone.
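For live production processes, a sampling profiler (for example py-spy for Python apps) attaches without code changes; for a spike you can reproduce locally, the stdlib cProfile is often enough. A sketch profiling a stand-in function:

```python
# Sketch: profile a suspect code path with the stdlib cProfile instead of
# guessing from top. hot_function is a stand-in for your own hot path.

import cProfile
import io
import pstats

def hot_function():
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
hot_function()
profiler.disable()

# Print the five most expensive calls by cumulative time
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

Read the cumulative-time column top-down: the first few rows that belong to your own code are where optimization effort pays off.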
Useful commands:
top
htop
uptime
free -h
vmstat 1 5
iostat -xz 1 5
ps aux --sort=-%cpu | head -20
ps aux --sort=-%mem | head -20
pmap -x <PID> | tail -n 20
smem -rk
dmesg -T | egrep -i 'killed process|out of memory|oom'
journalctl -k -n 200 --no-pager
journalctl -u gunicorn -n 200 --no-pager
journalctl -u celery -n 200 --no-pager
docker stats --no-stream
docker inspect <container_id>
ps aux | egrep 'gunicorn|uvicorn|celery|rq|postgres|nginx'
ss -s
ss -ltnp
nginx -T
psql "$DATABASE_URL" -c "select pid, usename, state, wait_event_type, now()-query_start as duration, query from pg_stat_activity order by duration desc limit 20;"
psql "$DATABASE_URL" -c "select query, calls, total_exec_time, mean_exec_time from pg_stat_statements order by total_exec_time desc limit 10;"
redis-cli info memory
redis-cli info stats
curl -I http://127.0.0.1/health
Checklist
- ✓ Confirmed whether issue is CPU, memory, or both
- ✓ Identified the exact process consuming resources
- ✓ Checked for OOM kills in kernel logs
- ✓ Reviewed app, worker, and database logs during the spike window
- ✓ Validated Gunicorn/Uvicorn/Celery/RQ concurrency settings
- ✓ Inspected slow queries and active database sessions
- ✓ Checked for retry storms, duplicate jobs, or runaway schedulers
- ✓ Verified uploads, exports, and batch jobs are streamed or chunked
- ✓ Applied rate limits or caching to hot routes if traffic-driven
- ✓ Added monitoring, alerts, and dashboards for CPU, memory, restart count, and request latency
FAQ
What is the fastest safe response to an active memory incident?
Reduce application and worker concurrency, stop non-essential background jobs, confirm whether OOM kills are occurring, and restore service stability before deeper profiling.
How do I separate app issues from database issues?
Check process-level CPU and memory first, then inspect active database queries and slow query stats. If app CPU is high with normal DB load, focus on application code. If DB sessions and query durations spike, focus on queries and indexes.
Why does the app recover after restart but fail again later?
That usually indicates a memory leak, cache growth, repeating job burst, or a traffic pattern that rebuilds the same pressure over time.
Can rate limiting help CPU and memory issues?
Yes. If a small set of endpoints or clients is creating disproportionate load, rate limiting and caching can stabilize the service quickly.
Should I use more workers to improve performance?
Not automatically. More workers may improve throughput for idle CPU capacity, but they also increase memory usage and can worsen contention if the database or host is already the bottleneck.
Final takeaway
Do not treat high CPU or memory as a generic scaling problem. Identify the overloaded process, confirm whether the trigger is traffic, code, database, or jobs, apply a safe mitigation, then implement a permanent fix with monitoring in place.