Performance Checklist
A practical playbook for checking and fixing performance in a small SaaS.
Use this checklist to verify the main performance layers of a small SaaS before launch and after major changes. It focuses on practical bottlenecks: slow queries, missing caching, oversized assets, blocking requests, bad worker setup, and weak monitoring. The goal is to catch the common issues that make MVPs feel slow in production.
Related production checklists:
- SaaS Production Checklist
- Security Checklist
- Auth System Checklist
- Metrics and Performance Monitoring
- Database Performance Monitoring
Quick Fix / Quick Setup
Run this after deployment or before launch. If TTFB is high, start with app/database profiling. If static assets are slow, fix compression, caching headers, and file delivery first.
# Quick production performance sweep
# 1) Check app/server resource pressure
uptime
free -m
df -h
ps aux --sort=-%mem | head
ps aux --sort=-%cpu | head
# 2) Check web/app errors and slow behavior
sudo journalctl -u gunicorn -n 200 --no-pager
sudo tail -n 200 /var/log/nginx/access.log
sudo tail -n 200 /var/log/nginx/error.log
# 3) Check database pressure (Postgres)
psql "$DATABASE_URL" -c "select pid, now()-query_start as duration, state, query from pg_stat_activity where state <> 'idle' order by duration desc limit 10;"
psql "$DATABASE_URL" -c "select query, calls, total_exec_time, mean_exec_time from pg_stat_statements order by total_exec_time desc limit 10;"
# 4) Validate HTTP performance
curl -o /dev/null -s -w 'dns=%{time_namelookup} connect=%{time_connect} ttfb=%{time_starttransfer} total=%{time_total}\n' https://yourdomain.com/
# 5) Check worker/queue backlog
redis-cli ping
redis-cli llen default
# 6) Test compression and caching headers
curl -I https://yourdomain.com/static/app.css
curl -H 'Accept-Encoding: gzip' -I https://yourdomain.com/
What’s happening
Performance problems usually come from a small set of bottlenecks:
- slow database queries
- too few app workers
- blocking network calls
- oversized frontend assets
- missing cache layers
- background jobs running in the request path
A checklist is more useful than isolated tuning because most SaaS latency issues are cross-layer:
- app code
- database
- web server
- queue workers
- browser delivery
For small SaaS products, the target is not maximum complexity. The target is:
- predictable response times
- stable memory use
- enough monitoring to catch regressions early
Process Flow
Step-by-step implementation
1. Define performance budgets
Set concrete targets before tuning.
Example starting budgets:
- homepage TTFB: < 500ms
- API p95: < 800ms
- DB query p95 for common reads: < 100ms
- background queue delay: < 30s
- CSS/JS initial payload: as small as possible, preferably < 300KB gzipped for critical paths
- hero images and dashboard assets: resized and compressed
Track these in release reviews and in your main production checklist at SaaS Production Checklist.
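Budgets are only useful if they are checked mechanically. A minimal sketch, assuming the budget names and limits above (all values illustrative), that flags metrics over budget during a release review:

```python
# Hypothetical budget check: compare measured p95 values (ms) against the
# starting budgets above. Names and limits are illustrative assumptions.
BUDGETS_MS = {
    "homepage_ttfb": 500,
    "api_p95": 800,
    "db_read_p95": 100,
    "queue_delay": 30_000,
}

def over_budget(measured: dict) -> list:
    """Return the names of metrics whose measured value exceeds its budget."""
    return [name for name, limit in BUDGETS_MS.items()
            if measured.get(name, 0) > limit]

# Example: one metric over budget
print(over_budget({"homepage_ttfb": 620, "api_p95": 750}))  # ['homepage_ttfb']
```

Feed it measured p95 values from your monitoring and fail the review if the list is non-empty.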
2. Enable request timing and query visibility
If you cannot see slow requests, do not optimize yet.
Minimum setup:
- access logs with request duration
- app-level timing for key routes
- Postgres query visibility with pg_stat_statements
- error tracking and APM if available
Enable pg_stat_statements in Postgres:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
Example checks:
psql "$DATABASE_URL" -c "select query, calls, total_exec_time, mean_exec_time from pg_stat_statements order by total_exec_time desc limit 20;"
psql "$DATABASE_URL" -c "select pid, now()-query_start as duration, wait_event_type, wait_event, state, query from pg_stat_activity where state <> 'idle' order by duration desc limit 20;"
3. Profile the slowest endpoints
Identify the slowest 5 endpoints first.
Check:
- endpoint path
- request duration
- SQL time
- external API time
- response size
- auth/session overhead
- cache hit/miss behavior
Use timing output from curl:
curl -o /dev/null -s -w 'dns=%{time_namelookup} connect=%{time_connect} appconnect=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total}\n' https://yourdomain.com/dashboard
If authenticated pages are much slower than public pages, inspect:
- session backend
- user/tenant lookup queries
- permission checks
- dashboard aggregate queries
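App-level timing for key routes can start as small as a WSGI wrapper. A minimal, framework-agnostic sketch (the threshold and log destination are assumptions; wire it into your stack's logger):

```python
import time

# Minimal WSGI timing sketch: wraps any WSGI app and logs slow requests.
# slow_ms and the log callable are assumptions; adjust for your stack.
def timing_middleware(app, slow_ms=500, log=print):
    def wrapped(environ, start_response):
        start = time.perf_counter()
        try:
            return app(environ, start_response)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            if elapsed_ms >= slow_ms:
                log(f"SLOW {environ.get('PATH_INFO', '?')} {elapsed_ms:.0f}ms")
    return wrapped
```

Wrap the application object once at startup (e.g. `app = timing_middleware(app)`); the `finally` block ensures timing is recorded even when the handler raises.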
4. Optimize database access
Common fixes:
- add missing indexes
- remove N+1 queries
- paginate large result sets
- avoid SELECT *
- reduce repeated counts/aggregations in dashboards
- use precomputed summaries where needed
Examples of indexes often missed in SaaS apps:
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_projects_tenant_id ON projects(tenant_id);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_invoices_tenant_status ON invoices(tenant_id, status);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_events_created_at ON events(created_at DESC);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_members_user_id ON members(user_id);
Quick checks for table/index usage:
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
ORDER BY seq_scan DESC
LIMIT 20;
SELECT indexrelname, relname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC
LIMIT 20;
If you see high sequential scans on large tables, investigate indexing.
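The N+1 fix is the same in any ORM or raw SQL: fetch the child rows for the whole page in one query instead of one query per parent. A sketch using sqlite3 in-memory for the demo (table and column names are illustrative, not from this article):

```python
import sqlite3

# Demo schema: illustrative tenant-scoped tables, sqlite3 in-memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, tenant_id INTEGER, name TEXT);
    CREATE TABLE tasks (id INTEGER PRIMARY KEY, project_id INTEGER, title TEXT);
    INSERT INTO projects VALUES (1, 1, 'alpha'), (2, 1, 'beta');
    INSERT INTO tasks VALUES (1, 1, 't1'), (2, 1, 't2'), (3, 2, 't3');
""")

projects = conn.execute(
    "SELECT id, name FROM projects WHERE tenant_id = 1").fetchall()

# N+1 pattern (avoid): one tasks query per project in the loop.
# for pid, _ in projects:
#     conn.execute("SELECT title FROM tasks WHERE project_id = ?", (pid,))

# Fix: fetch all tasks for the page of projects in a single IN query.
ids = [pid for pid, _ in projects]
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT project_id, title FROM tasks "
    f"WHERE project_id IN ({placeholders}) ORDER BY project_id, id", ids
).fetchall()
tasks_by_project = {}
for pid, title in rows:
    tasks_by_project.setdefault(pid, []).append(title)
print(tasks_by_project)  # {1: ['t1', 't2'], 2: ['t3']}
```

Two queries total regardless of page size, instead of one per row; most ORMs expose this as eager loading (e.g. select-related/prefetch mechanisms).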
5. Add caching deliberately
Do not add Redis everywhere by default.
Good cache candidates:
- pricing/config data
- tenant settings
- dashboard summary cards
- expensive computed responses
- rate limit state
- session storage if required by your stack
Define:
- cache key format
- TTL
- invalidation rule
- tenant/user isolation
Example cache key pattern:
tenant:{tenant_id}:dashboard_summary:v3
user:{user_id}:permissions:v2
pricing:public:v1
Rules:
- include tenant or user context where needed
- version keys when response shape changes
- avoid caching mutable objects without invalidation strategy
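The cache-aside pattern with the key conventions above can be sketched in a few lines. The backend here is a plain dict with expiry timestamps so the example is self-contained; in production you would swap in Redis with the same key format and TTL:

```python
import time

# Cache-aside sketch. _store stands in for Redis: {key: (expires_at, value)}.
_store = {}

def cache_get_or_set(key, ttl_s, compute):
    """Return the cached value for key, recomputing after ttl_s seconds."""
    hit = _store.get(key)
    if hit and hit[0] > time.time():
        return hit[1]
    value = compute()
    _store[key] = (time.time() + ttl_s, value)
    return value

calls = []
def expensive_summary():
    calls.append(1)  # stand-in for slow aggregate queries
    return {"open_invoices": 4}

key = "tenant:42:dashboard_summary:v3"  # versioned, tenant-scoped key
cache_get_or_set(key, 60, expensive_summary)
cache_get_or_set(key, 60, expensive_summary)  # second call served from cache
print(len(calls))  # 1 — computed only once
```

Bumping the `v3` suffix invalidates every tenant's entry at once when the response shape changes, without touching individual keys.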
6. Move blocking work to background jobs
These should usually not run in the web request:
- email sending
- image processing
- webhook retries
- exports
- report generation
- billing sync tasks
- non-critical third-party updates
Check queue health:
redis-cli ping
redis-cli llen default
If queue depth grows and workers are healthy, inspect:
- slow job handlers
- retry loops
- large payloads
- dead jobs
- missing worker concurrency
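The shape of the change is the same whatever queue library you use: the request handler enqueues and returns, and a worker does the slow part. An in-process sketch with a thread and `queue.Queue` standing in for RQ/Celery plus Redis (all names illustrative):

```python
import queue
import threading

# In-process stand-in for a job queue: (function, args) tuples.
jobs = queue.Queue()

def worker():
    while True:
        fn, args = jobs.get()
        if fn is None:          # shutdown sentinel
            break
        fn(*args)
        jobs.task_done()

sent = []
def send_welcome_email(user_id):
    sent.append(user_id)        # stand-in for the slow SMTP call

threading.Thread(target=worker, daemon=True).start()

def signup_handler(user_id):
    # ... create the user synchronously, then enqueue instead of blocking ...
    jobs.put((send_welcome_email, (user_id,)))
    return "201 Created"

signup_handler(7)
jobs.join()                     # tests only; a web request never waits here
print(sent)  # [7]
```

The request returns as soon as the job is enqueued; retry and dead-letter behavior then live in the worker, not in the request path.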
7. Tune app concurrency
Set Gunicorn/Uvicorn worker counts based on CPU and memory, not guesswork.
Example Gunicorn config:
# gunicorn.conf.py
bind = "0.0.0.0:8000"
workers = 3
threads = 2
worker_class = "gthread"
timeout = 30
graceful_timeout = 30
keepalive = 5
max_requests = 1000
max_requests_jitter = 100
accesslog = "-"
errorlog = "-"
Review after deploy:
- CPU saturation
- memory per worker
- request queueing
- worker restarts
- timeout rates
Check process pressure:
ps aux --sort=-%mem | head
ps aux --sort=-%cpu | head
top -o %CPU
vmstat 1 5
iostat -xz 1 3
If memory grows over time, inspect:
- worker recycling
- in-process caches
- large ORM objects retained too long
- unbounded task payloads
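A common starting heuristic for sync/gthread workers is `2 * cores + 1`, capped by available memory. A sketch of that calculation; the 300 MB per-worker figure is an assumption, measure your own app's per-worker RSS and adjust:

```python
import os

def suggested_workers(cores=None, mem_available_mb=2048, per_worker_mb=300):
    """Starting worker count: 2*cores+1, capped by memory headroom.

    per_worker_mb is an assumed per-worker RSS; measure yours.
    """
    cores = cores or os.cpu_count() or 1
    by_cpu = 2 * cores + 1
    by_mem = max(1, mem_available_mb // per_worker_mb)
    return min(by_cpu, by_mem)

# 2 cores, 2 GB free: CPU suggests 5, memory allows 6 -> 5
print(suggested_workers(cores=2, mem_available_mb=2048))  # 5
```

Treat the result as a starting point only; confirm with the post-deploy checks above (CPU saturation, memory per worker, request queueing).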
8. Optimize static asset delivery
Static assets should not be a hidden bottleneck.
Requirements:
- minified CSS/JS
- gzip or Brotli enabled
- fingerprinted filenames
- long-lived cache headers for versioned assets
- image resizing
- optional CDN/object storage if traffic justifies it
Example Nginx config:
gzip on;
gzip_types text/plain text/css application/javascript application/json application/xml image/svg+xml;
gzip_min_length 1024;
location /static/ {
alias /var/www/app/static/;
expires 1y;
add_header Cache-Control "public, immutable";
access_log off;
}
Validate headers:
curl -I https://yourdomain.com/static/app.css
curl -H 'Accept-Encoding: gzip' -I https://yourdomain.com/
Check for:
- Cache-Control
- Content-Encoding
- correct Content-Type
- static files being served by Nginx, not app workers
9. Add timeouts and retry rules for external dependencies
Never allow external APIs to hang web workers indefinitely.
Cover:
- Stripe
- email provider
- OAuth providers
- internal HTTP services
- webhook deliveries
Rules:
- set connect timeout
- set read timeout
- retry only where safe
- use background jobs for non-critical retries
- add circuit breaking or failure isolation if traffic grows
If p95 is poor but CPU is low, external I/O is often the problem.
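A stdlib-only sketch of the timeout and retry rules above: an explicit timeout on every outbound call plus a bounded retry loop. The URL, timeout, and retry counts are illustrative; retry only safe, idempotent requests like this GET:

```python
import socket
import urllib.error
import urllib.request

def fetch_with_timeout(url, timeout_s=5.0, retries=2):
    """GET url with a hard timeout and a bounded number of retries."""
    last_err = None
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                return resp.read()
        except (urllib.error.URLError, socket.timeout) as err:
            last_err = err  # bounded retry; never loop forever in a worker
    raise last_err
```

The key property is that the call fails fast and loudly instead of pinning a web worker; non-critical retries beyond this budget belong in a background job.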
10. Add monitoring and alerts
Minimum performance monitoring:
- p50/p95 request latency
- 4xx/5xx rates
- CPU
- memory
- disk
- DB connections
- slow queries
- queue depth
- worker health
- uptime
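p50/p95 can be computed directly from access-log durations before any APM is in place. A nearest-rank percentile sketch (the sample durations are illustrative):

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile; good enough for a quick log sweep."""
    ordered = sorted(values)
    k = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[k]

# Request durations in ms, e.g. parsed from access logs with timing enabled
durations = [80, 95, 110, 120, 140, 150, 160, 200, 450, 1200]
print(percentile(durations, 50), percentile(durations, 95))  # 140 1200
```

Note how one slow outlier dominates p95 while p50 stays flat, which is exactly why the checklist tracks both.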
11. Re-test after changes
After every optimization:
- compare baseline vs new metrics
- verify no correctness regressions
- verify cache invalidation
- verify queue behavior
- verify memory remains stable over time
Do not keep tuning without confirming measurable gains.
Common causes
- Missing database indexes on tenant_id, user_id, status, created_at, or foreign keys
- N+1 ORM queries in dashboards, admin tables, and API serializers
- Too many synchronous tasks inside request handlers
- App worker count too low or too high for available memory
- No caching headers on static files
- Static files served by the app instead of Nginx or object storage
- Large unoptimized images and JavaScript bundles
- External API calls without timeouts causing hung workers
- Connection pool exhaustion between app and database
- Long-running database transactions or locks
- Queue workers down, causing delayed async work to fall back into user-facing flows
- Memory leaks from unbounded in-process caches or large object retention
- No observability, so regressions are only noticed after users report them
Debugging tips
Start by isolating the layer before changing infrastructure.
Basic host checks
uptime
free -m
df -h
ps aux --sort=-%cpu | head
ps aux --sort=-%mem | head
top -o %CPU
vmstat 1 5
iostat -xz 1 3
App and web logs
sudo journalctl -u gunicorn -n 200 --no-pager
sudo tail -n 200 /var/log/nginx/access.log
sudo tail -n 200 /var/log/nginx/error.log
HTTP timing
curl -o /dev/null -s -w 'dns=%{time_namelookup} connect=%{time_connect} ttfb=%{time_starttransfer} total=%{time_total}\n' https://yourdomain.com/
Database pressure
psql "$DATABASE_URL" -c "select pid, now()-query_start as duration, state, query from pg_stat_activity where state <> 'idle' order by duration desc limit 10;"
psql "$DATABASE_URL" -c "select query, calls, total_exec_time, mean_exec_time from pg_stat_statements order by total_exec_time desc limit 10;"
Queue checks
redis-cli ping
redis-cli llen default
Static asset checks
curl -I https://yourdomain.com/static/app.css
curl -H 'Accept-Encoding: gzip' -I https://yourdomain.com/
Diagnostic rules
- Compare app latency vs database latency before changing server size
- Use access logs with request timing to isolate slow endpoints first
- If only authenticated pages are slow, inspect DB queries, session storage, and permission checks
- If CPU is low but responses are slow, suspect I/O waits, DB locks, external APIs, or queue contention
- If memory keeps rising after deploys, inspect worker recycling and large caches
- If static assets are slow but HTML is fast, inspect compression, cache headers, and asset size
- If p95 is bad but p50 is fine, look for lock contention, cache misses, or one expensive code path
Checklist
Application
- ✓ Request timing enabled for main endpoints
- ✓ No debug mode, dev reloaders, or verbose SQL logging left on in production
- ✓ Expensive computations are cached or precomputed where appropriate
- ✓ External API calls have connect/read timeouts and retry rules
- ✓ File uploads and report generation do not block web requests
Database
- ✓ Slow query visibility is enabled
- ✓ Indexes exist for frequent filters, joins, and tenant/user lookups
- ✓ N+1 query patterns are removed from dashboard and list pages
- ✓ Pagination is used for large result sets
- ✓ Connection pool settings are sane for app worker count
Caching
- ✓ Redis or equivalent is used only where it reduces repeated expensive work
- ✓ Cache keys include tenant/user context where required
- ✓ Cache invalidation strategy is defined for mutable data
Static assets
- ✓ CSS/JS are minified and compressed
- ✓ Cache-Control headers are set for versioned assets
- ✓ Images are resized and modern formats considered where useful
Workers
- ✓ Background queue is running and monitored
- ✓ Retries and dead-letter behavior are defined for critical jobs
Infrastructure
- ✓ CPU, memory, disk, and open file limits are monitored
- ✓ Web server and app timeouts are configured intentionally
Monitoring
- ✓ p95 latency, error rate, queue depth, and DB health alerts exist
Release process
- ✓ Performance is checked after major feature launches and migration-heavy releases
- ✓ Performance review is included in SaaS Production Checklist
- ✓ Monitoring review is included in Monitoring Checklist
- ✓ Security changes did not accidentally degrade performance in Security Checklist
- ✓ Auth/session changes were reviewed in Auth System Checklist
Related guides
- Deploy SaaS with Nginx + Gunicorn
- Database Performance Monitoring
- Metrics and Performance Monitoring
- Scaling Basics (Vertical & Horizontal)
- Monitoring Checklist
FAQ
What should I optimize first?
Start with measurement, then fix the slowest endpoints and queries before changing servers or adding complexity.
Do I need Redis for every SaaS?
No. Use it when repeated reads, sessions, rate limits, or background jobs justify it.
What is a good first latency target?
For core pages and APIs, aim for consistent sub-second behavior with a reasonable p95 under normal load.
Should I optimize frontend or backend first?
Whichever is dominating user wait time. Measure TTFB, asset weight, and render delays before deciding.
How often should I run this checklist?
Before launch, after major features, after infra changes, and during recurring production reviews.
What should I check before scaling servers?
Confirm whether the bottleneck is actually compute. Slow queries, blocking external API calls, missing caching, or static asset issues are often cheaper and more effective to fix first.
How do I know if the database is the bottleneck?
Look for slow query durations, lock waits, high connection counts, and endpoints whose latency tracks query time. pg_stat_statements and slow query logs are the fastest way to confirm this.
Should every background task be moved off the request path?
Move anything user-facing that can safely be async: email, report generation, image processing, webhook retries, and non-critical syncing. Keep only work required for immediate correctness in the request path.
What is the minimum monitoring needed for performance?
Track request latency, error rate, CPU, memory, DB health, queue depth, and uptime. Add alerts for sustained p95 latency increases and worker or database failures.
How often should this checklist be reviewed?
Run it before launch, after major releases, after infrastructure changes, and during periodic production reviews to catch regressions early.
Final takeaway
Performance is a release discipline, not a one-time tuning task.
For a small SaaS, the biggest wins usually come from:
- query fixes
- background jobs
- caching
- asset delivery
- basic monitoring
Use this checklist to establish a baseline, catch regressions early, and only add complexity when metrics justify it.