SaaS Production Checklist — SaaS Builder Playbooks

Use this page as the final production readiness gate before launch or major updates. It is built for MVPs and small SaaS products that need a practical checklist, not enterprise process overhead. Review deploy, security, auth, payments, monitoring, backups, and rollback readiness in one pass.

Quick Fix / Quick Setup

txt

Production readiness quick setup:

1. Set APP_ENV=production and DEBUG=false
2. Confirm HTTPS is enabled and certificates auto-renew
3. Validate database backups and complete a restore test
4. Run migrations on staging, then production
5. Verify auth flows: register, login, reset, logout
6. Verify billing flows: checkout, webhook, cancel, failed payment
7. Confirm logging, Sentry, uptime, and alerts are active
8. Check worker/cron/background jobs are running
9. Validate static/media storage and file permissions
10. Document rollback steps and on-call contact

Launch gate:
- No default secrets
- No pending migrations
- No failing health checks
- No unhandled high-severity errors
- Monitoring and alerting tested

Best used with a staging environment that mirrors production. Add a simple go/no-go signoff and keep the checklist versioned in the repo.

What’s happening

Production failures often come from missing non-code setup:

secrets not loaded
TLS not enabled
webhooks pointing to the wrong URL
workers not running
backups never tested
monitoring installed but alerts disabled

A production checklist reduces launch risk by making those assumptions explicit.

For small SaaS teams, the goal is not heavy process. The goal is:

repeatable deploys
recoverability
basic security hygiene
correct auth and billing behavior under real traffic

[Diagram: Auth Lifecycle (Verify Email, Session, Logout)]

Step-by-step implementation

1. Create one tracked checklist

Keep the checklist in the repo.

txt

docs/production-checklist.md

Or use a tracked release issue template.

Recommended sections:

application config
infrastructure
data
auth
payments
observability
security
recovery
launch validation

Assign one owner per section, even if one person owns everything.

2. Validate application config

Confirm production settings are explicit and safe.

Example environment values:

env

APP_ENV=production
DEBUG=false
APP_URL=https://yourdomain.com
ALLOWED_HOSTS=yourdomain.com,www.yourdomain.com
SESSION_COOKIE_SECURE=true
CSRF_COOKIE_SECURE=true

Checks:

debug mode disabled
production domain matches app config
CORS and CSRF trusted origins match real frontend/backend domains
secrets come from env vars or a secret manager
no fallback development secrets

Quick commands:

bash

# Note: both commands print secret values; avoid shared or recorded terminals
printenv | sort
env | grep -E 'APP_ENV|DEBUG|DATABASE|REDIS|SECRET|STRIPE|DOMAIN'
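The checks above can also run as a script in the deploy pipeline. The following is a minimal sketch, assuming the variable names from the example environment values; `check_config`, `REQUIRED`, and the list of unsafe placeholder values are illustrative names, not part of any framework.

```python
import os

# Names follow the example environment values above; adjust to your framework.
REQUIRED = ["APP_ENV", "DEBUG", "APP_URL", "ALLOWED_HOSTS"]
# Hypothetical placeholder values that should never reach production.
UNSAFE_SECRET_VALUES = {"", "changeme", "secret", "password", "dev"}


def check_config(env):
    """Return a list of problems found in a mapping of environment variables."""
    problems = [f"{name} is not set" for name in REQUIRED if name not in env]
    if "APP_ENV" in env and env["APP_ENV"] != "production":
        problems.append("APP_ENV is not 'production'")
    if "DEBUG" in env and env["DEBUG"].strip().lower() not in ("false", "0"):
        problems.append("DEBUG is not disabled")
    if "APP_URL" in env and not env["APP_URL"].startswith("https://"):
        problems.append("APP_URL does not use https://")
    for name, value in env.items():
        # Report the variable name only, never the secret value itself.
        if "SECRET" in name and value.strip().lower() in UNSAFE_SECRET_VALUES:
            problems.append(f"{name} looks like a default or empty secret")
    return problems


def check_current_environment():
    """Run the checks against the environment of the current process."""
    return check_config(os.environ)
```

Run it in the app, worker, and scheduler environments separately, since a common failure is that they load different values. An empty list means no problems were found.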

3. Validate infrastructure and network path

Check the full path from public DNS to the app process.

Checks:

DNS resolves correctly
reverse proxy is running
app server is running
firewall allows expected ports only
processes restart automatically
disk and memory have headroom
server time is correct

Commands:

bash

dig yourdomain.com +short
nslookup yourdomain.com
curl -I https://yourdomain.com
curl -sS https://yourdomain.com/health
ss -tulpn
df -h
free -m
ps aux --sort=-%mem | head
ps aux --sort=-%cpu | head

If using systemd:

bash

sudo systemctl status nginx
sudo systemctl status gunicorn
sudo systemctl status celery

If using Docker:

bash

docker ps
docker compose ps

Minimal health check target:

txt

GET /health -> 200 OK
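One way to implement this target is a small handler that returns 200 only when its dependency probes pass. This is a sketch using plain WSGI from the standard library; `health_app` and `check_database` are hypothetical names, and the database probe is a stub you would replace with a real query.

```python
import json


def check_database():
    """Hypothetical dependency probe; replace with a real 'SELECT 1'."""
    return True


def health_app(environ, start_response, checks=None):
    """Minimal WSGI handler: 200 when every check passes, 503 otherwise."""
    if checks is None:
        checks = {"database": check_database}
    if environ.get("PATH_INFO") != "/health":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False  # a crashing probe counts as unhealthy
    healthy = all(results.values())
    status = "200 OK" if healthy else "503 Service Unavailable"
    body = json.dumps(
        {"status": "ok" if healthy else "degraded", "checks": results}
    ).encode()
    start_response(
        status,
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

Returning 503 on a failed probe lets the uptime monitor and the reverse proxy tell "process is up" apart from "app can actually serve requests". Keep the probes cheap so the endpoint stays fast under monitoring traffic.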

DNS

TLS

reverse proxy

app

database

Process Flow

4. Confirm HTTPS and proxy behavior

Checks:

HTTPS enabled
certificates auto-renew
HTTP redirects to HTTPS
reverse proxy forwards scheme/host headers correctly
cookies marked secure in production
webhook endpoints use the public HTTPS URL

Basic check:

bash

curl -I https://yourdomain.com
curl -I http://yourdomain.com

Expected:

HTTPS returns 200 or a redirect to an app route
HTTP returns a 301 or 308 redirect to HTTPS
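The expected results above, plus certificate expiry, can be checked from a script. The sketch below uses only the Python standard library; the function names are illustrative, and any alert threshold (for example, warning below 14 days) is an assumption to tune for your setup.

```python
import socket
import ssl
import time


def is_https_redirect(status, location):
    """True if an HTTP response permanently redirects to an https:// URL."""
    return status in (301, 308) and (location or "").startswith("https://")


def days_until_expiry(not_after, now=None):
    """Days left on a certificate, given its notAfter string.

    not_after uses the format returned by SSLSocket.getpeercert(),
    for example 'Jan 15 00:00:00 2030 GMT'.
    """
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(host, port=443, timeout=5):
    """Connect to host over TLS and return the certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, `days_until_expiry(fetch_not_after("yourdomain.com"))` gives the remaining lifetime of the live certificate. A value that keeps shrinking toward zero suggests auto-renewal is not working.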

Nginx example:

nginx

# Redirect all plain HTTP traffic to HTTPS
server {
    listen 80;
    server_name yourdomain.com www.yourdomain.com;
    return 301 https://$host$request_uri;
}

server {
    # On nginx 1.25.1 and later, prefer "listen 443 ssl;" plus a separate "http2 on;"
    listen 443 ssl http2;
    server_name yourdomain.com www.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8000;
        # Pass the original host, scheme, and client IP to the app
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

5. Validate database state, migrations, backups, and restore

Checks:

production DB credentials are correct
pending migrations reviewed
migrations run on staging first
backups scheduled
backup retention period is defined
restore test completed at least once
rollback policy documented for non-reversible migrations

Commands:

bash

# PostgreSQL connectivity
pg_isready
psql "$DATABASE_URL" -c 'select now();'
# Django migrations
python manage.py showmigrations
python manage.py migrate --plan
# Alembic migrations (if you use SQLAlchemy instead)
alembic current
alembic heads

Migration workflow:

bash

# staging first
python manage.py migrate

# then production during release window
python manage.py migrate

Minimum backup validation:

verify backup job success
verify backup files exist
restore into a separate database
confirm schema and expected row counts

Example restore validation items:

txt

- Can restore within target time
- App can connect to restored DB
- Critical tables exist
- Recent records exist
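The "critical tables exist" and row-count items can be checked with a short script run against the restored database. This is a sketch that works with any DB-API connection; `validate_restore` and the table names in `EXPECTED_MINIMUMS` are hypothetical and should be replaced with your own schema.

```python
import re
import sqlite3  # stand-in for local experiments; use your PostgreSQL driver in practice

# Hypothetical critical tables mapped to the minimum row count expected.
EXPECTED_MINIMUMS = {"users": 1, "subscriptions": 0}


def validate_restore(conn, expected_minimums):
    """Check that each critical table exists and holds enough rows.

    conn is any DB-API connection. Returns a list of problems; an empty
    list means the restored database passed these basic checks.
    """
    problems = []
    for table, minimum in expected_minimums.items():
        # Table names cannot be bound as parameters, so validate them strictly.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
            raise ValueError(f"unsafe table name: {table!r}")
        cursor = conn.cursor()
        try:
            cursor.execute(f"SELECT COUNT(*) FROM {table}")
            count = cursor.fetchone()[0]
        except Exception as exc:
            conn.rollback()  # PostgreSQL needs this before the next query
            problems.append(f"{table}: query failed ({exc})")
            continue
        if count < minimum:
            problems.append(f"{table}: {count} rows, expected at least {minimum}")
    return problems
```

Run it against the separate restore target, never against production. Checking for recent records needs a timestamp column, so that part depends on your schema and is left out here.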

6. Check static files, media, and storage persistence

Checks:

static assets built and served correctly
uploads persist after redeploy
object storage credentials are set
CDN URL matches production domain if used
file permissions allow app access without overly broad writes

Typical problems:

local filesystem uploads lost after container restart
wrong asset base URL
stale cached frontend bundle referencing wrong API domain

Validation:

bash

curl -I https://yourdomain.com/static/app.css

For uploads, perform a real upload in production and verify:

file appears in storage backend
file remains after restart or redeploy
correct access controls are applied

7. Test auth flows in production mode

Do not stop at “login page loads”. Test the full lifecycle.

Required checks:

register
login
logout
password reset
email verification
session persistence across requests
protected route behavior
token/session expiration behavior
secure cookies enabled

Key failure points:

wrong cookie domain
missing secure flag behind proxy
CSRF origin mismatch
callback URL mismatch
email sending not configured

Use the auth checklist for deeper validation: Auth System Checklist

8. Test payments and billing state sync

If revenue depends on it, validate this manually before launch.

Required checks:

checkout session succeeds
subscription or payment record is created
webhook signature verification works
duplicate events do not corrupt state
failed payments update user access correctly
cancellation and resume flows work
billing portal works if enabled
live keys are only used in production
app database stays in sync with provider state

Stripe-related commands:

bash

stripe listen
stripe events resend <event_id> --webhook-endpoint=<endpoint_id>

Critical checks:

webhook endpoint is public
raw request body handling is correct
signing secret matches environment
event processing is idempotent
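To make the raw-body, signing-secret, and idempotency checks concrete, here is a standard-library sketch. It follows the scheme Stripe documents for its `Stripe-Signature` header (HMAC-SHA256 over `<timestamp>.<raw body>`), but in a real app the official library's `stripe.Webhook.construct_event` is the safer choice. `verify_signature` and `EventProcessor` are illustrative names, and the in-memory set stands in for a database table with a unique event ID column.

```python
import hashlib
import hmac
import time


def verify_signature(raw_body, header, secret, tolerance=300, now=None):
    """Verify a 't=<ts>,v1=<hex>' header against the raw request bytes."""
    timestamp = None
    candidates = []
    for item in header.split(","):
        key, _, value = item.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit():
        return False
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > tolerance:
        return False  # stale or replayed event
    # Sign the bytes exactly as received; re-serialized JSON will not match.
    signed = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return any(
        hmac.compare_digest(expected.encode(), candidate.encode())
        for candidate in candidates
    )


class EventProcessor:
    """Apply each event at most once, keyed by the provider's event ID."""

    def __init__(self):
        self.seen = set()  # assumption: a DB table with a unique constraint in production

    def handle(self, event_id, apply_change):
        if event_id in self.seen:
            return False  # duplicate delivery, safely ignored
        apply_change()
        self.seen.add(event_id)
        return True
```

The signature must be computed over the unparsed request body, which is why middleware that parses JSON before the handler runs is a frequent cause of verification failures.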

Use the payment checklist for deeper validation: Payment System Checklist

9. Validate background jobs, workers, and schedulers

Checks:

worker process is running
scheduler/cron is running
queues are not stalled
retries are configured
failures are visible in logs or alerts

Commands:

bash

redis-cli ping
celery -A app inspect ping
sudo systemctl status celery

Examples of production-critical async work:

email delivery
payment webhook processing
report generation
cleanup tasks
subscription sync jobs

A common launch failure is deploying the app but not the worker.
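One simple way to catch a missing or stalled worker is a heartbeat check. The sketch below assumes each worker and scheduler records a Unix timestamp somewhere shared (a Redis key or a database row, for instance) on a regular interval; that mechanism, the function name, and the 120-second threshold are all assumptions, not features of Celery.

```python
import time


def stale_workers(heartbeats, max_age_seconds=120, now=None):
    """Return sorted names of workers whose last heartbeat is too old.

    heartbeats maps a worker name to the Unix timestamp of its most
    recent heartbeat.
    """
    now = time.time() if now is None else now
    return sorted(
        name
        for name, last_seen in heartbeats.items()
        if now - last_seen > max_age_seconds
    )
```

Wire the result into alerting: a non-empty list means a process that should be running has gone quiet, which is the situation `systemctl status` alone can miss when the process is up but stuck.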

10. Enable observability and alerting

Minimum production observability:

application logs
server logs
error tracking
uptime checks
alerting
queue and webhook failure visibility

Checks:

Sentry or equivalent receives test error
uptime monitor hits public URL and health endpoint
alerts notify the right person
logs are searchable during incidents
payment webhook failures generate alerts
worker failures generate alerts

Useful commands:

bash

sudo journalctl -u nginx -n 200 --no-pager
sudo journalctl -u gunicorn -n 200 --no-pager
tail -n 200 /var/log/nginx/error.log
tail -n 200 /var/log/nginx/access.log

Use the monitoring checklist for deeper validation: Monitoring Checklist

11. Check security basics before launch

Checks:

no default credentials
admin endpoints protected
secrets rotated from setup defaults
least privilege applied for DB and cloud roles
security headers configured
dependencies updated
rate limits applied to critical endpoints
production admin access restricted by role or network where possible

Minimum headers to consider:

txt

Strict-Transport-Security
X-Content-Type-Options
X-Frame-Options
Content-Security-Policy
Referrer-Policy
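A quick way to confirm these headers are actually served is to request a page and list what is missing. This is a minimal sketch using the standard library; the function names are illustrative, and it only checks presence, not whether each header's value is well chosen.

```python
import urllib.request

REQUIRED_HEADERS = [
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Content-Security-Policy",
    "Referrer-Policy",
]


def missing_security_headers(header_names):
    """Return the required headers absent from an iterable of header names."""
    present = {name.lower() for name in header_names}  # header names are case-insensitive
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]


def check_url(url, timeout=10):
    """Fetch url and report which required headers the response lacks."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return missing_security_headers(response.headers.keys())
```

Check the public URL rather than the app port directly, since headers are often added by the reverse proxy or CDN.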

Use the security checklist for deeper hardening: Security Checklist

12. Document recovery and rollback

A rollback plan needs to exist before the deploy, not be improvised during an incident.

Document:

current release version
previous stable release version
rollback command or image tag
migration rollback policy
maintenance mode process
backup restore process
incident contact path

Example rollback note:

txt

App rollback:
- Re-deploy previous image tag: app:v1.2.3

Database rollback:
- Only if migration is reversible and tested
- Otherwise restore from backup or apply forward fix

Be explicit about which migrations are unsafe to reverse.

13. Run a post-deploy smoke test

Execute a short repeatable smoke test after every production deploy.

Minimum smoke test:

homepage loads
login works
dashboard loads
one protected API endpoint returns success
checkout or billing page loads
webhook test passes
email sends
upload works
queue job executes
health endpoint returns success

Store this as a script or runbook. Do not rely on memory.
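As a starting point for that script, the sketch below covers the HTTP-reachability items only. The base URL and paths are placeholders (only `/health` appears in this checklist), and `run_smoke_test` accepts an injectable fetch function so the logic can be tested without a network.

```python
import urllib.error
import urllib.request

BASE_URL = "https://yourdomain.com"
# Hypothetical paths and expected status codes; adjust to your routes.
CHECKS = [
    ("homepage", "/", 200),
    ("health endpoint", "/health", 200),
    ("login page", "/login", 200),
]


def http_status(url, timeout=10):
    """Return the final HTTP status for url, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx and 5xx responses still carry a status
    except OSError:
        return None  # DNS failure, refused connection, timeout, TLS error


def run_smoke_test(base_url, checks, fetch=http_status):
    """Run each (name, path, expected_status) check and return the failures."""
    failures = []
    for name, path, expected in checks:
        status = fetch(base_url + path)
        if status != expected:
            failures.append(f"{name}: expected {expected}, got {status}")
    return failures
```

Calling `run_smoke_test(BASE_URL, CHECKS)` after a deploy returns an empty list when every check passes. Login, upload, email, webhook, and queue checks need flow-specific steps and are not covered by this sketch.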

Use the deployment checklist for release gating details: Deployment Checklist

Common causes

Production environment variables missing or inconsistent across app, worker, and scheduler
HTTPS or proxy misconfiguration causing auth/session failures
Migrations deployed without schema verification or backup safety
Webhook endpoints reachable but signature verification or event handling broken
Monitoring present but no actionable alerts configured
Uploads, emails, or queues depend on services not enabled in production
No tested rollback or restore path for bad deploys
Live payment mode enabled without validating real subscription state transitions
Running production with debug mode enabled
Static/media paths differ between local and production
DNS, TLS, and reverse proxy settings drift out of sync

Debugging tips

Check production config before assuming code is broken.

Validate the full path:

txt

DNS -> TLS -> reverse proxy -> app server -> database -> background jobs

Use repeatable commands:

bash

printenv | sort
env | grep -E 'APP_ENV|DEBUG|DATABASE|REDIS|SECRET|STRIPE|DOMAIN'
curl -I https://yourdomain.com
curl -sS https://yourdomain.com/health
dig yourdomain.com +short
nslookup yourdomain.com
sudo systemctl status nginx
sudo systemctl status gunicorn
sudo systemctl status celery
docker ps
docker compose ps
ss -tulpn
sudo journalctl -u nginx -n 200 --no-pager
sudo journalctl -u gunicorn -n 200 --no-pager
tail -n 200 /var/log/nginx/error.log
tail -n 200 /var/log/nginx/access.log
df -h
free -m
ps aux --sort=-%mem | head
ps aux --sort=-%cpu | head
python manage.py showmigrations
python manage.py migrate --plan
alembic current
alembic heads
pg_isready
psql "$DATABASE_URL" -c 'select now();'
redis-cli ping
celery -A app inspect ping
stripe listen
stripe events resend <event_id> --webhook-endpoint=<endpoint_id>

Debugging priorities:

compare staging and production env vars
inspect most recent deploy diff
inspect migration set
inspect webhook delivery logs
inspect worker health
confirm alerts actually fired

[Diagram: checkout sequence between Browser, App, and Stripe. Steps: Start Checkout, Create Session, Redirect to Checkout, Webhook (invoice.paid), Update Subscription.]

[Diagram: one-page architecture showing request flow, worker flow, and billing webhook flow.]

Checklist

Application config

✓ APP_ENV=production
✓ DEBUG=false
✓ allowed hosts/domains are correct
✓ secrets loaded from env or secret manager
✓ no secrets committed in repo
✓ CORS and CSRF settings match production domains

Infrastructure

✓ DNS points to correct target
✓ reverse proxy is running
✓ app server is running
✓ processes restart automatically
✓ firewall rules are correct
✓ CPU, memory, disk, and DB connections are within safe limits

HTTPS and networking

✓ HTTPS enabled
✓ certificates auto-renew
✓ HTTP redirects to HTTPS
✓ health endpoint returns success
✓ webhook endpoints use public production URL

Database

✓ production DB credentials verified
✓ no pending migrations
✓ schema version is correct
✓ backups are scheduled
✓ backup retention period is defined
✓ restore test completed

Static/media/storage

✓ static files load from expected path or storage backend
✓ uploads work
✓ uploads persist after redeploy
✓ file permissions are valid

Auth

✓ register works
✓ login works
✓ logout works
✓ email verification works
✓ password reset works
✓ sessions or JWT settings are production-safe
✓ admin endpoints are protected

Payments

✓ live payment keys only in production
✓ checkout works
✓ subscription creation works
✓ webhook signature verification works
✓ webhook handler is idempotent
✓ cancellation flow works
✓ failed payment handling works
✓ billing state sync is correct

Background jobs

✓ worker is running
✓ scheduler/cron is running
✓ queue is processing
✓ retries configured
✓ failures visible in logs/alerts

Observability

✓ application logs active
✓ error tracking active
✓ uptime monitoring active
✓ alerts active
✓ payment webhook failures alert
✓ queue failures alert

Security

✓ no default credentials
✓ secrets rotated from defaults
✓ rate limits or abuse protections enabled
✓ least privilege applied
✓ security headers configured
✓ dependencies patched

Recovery

✓ rollback steps documented
✓ previous stable release identified
✓ migration rollback policy documented
✓ maintenance mode plan exists
✓ restore steps documented

Launch validation

✓ post-deploy smoke test completed
✓ transactional emails send from production domain
✓ team knows where logs, dashboards, and contacts are stored
✓ no failing health checks
✓ no unhandled high-severity errors

FAQ

What is the minimum production checklist for an MVP SaaS?

At minimum: production env vars, HTTPS, backups, migrations, auth flow validation, payment flow validation, worker health, logs, error tracking, uptime checks, and rollback steps.

Should staging be identical to production?

As close as practical. Match runtime, database engine, storage pattern, webhook behavior, and environment variable structure.

How often should I run this checklist?

Before launch, before major deploys, after infrastructure changes, and after any incident that exposed a missing control.

Can I automate parts of this checklist?

Yes. Health checks, smoke tests, migration checks, TLS validation, backup jobs, and alert tests should be automated where possible.

What is the most commonly skipped item?

Restore testing for backups. Many teams create backups but never verify they can restore quickly and correctly.

Final takeaway

Production readiness is mostly about eliminating hidden setup gaps.

A good checklist turns launch from guesswork into a repeatable process.

For MVPs and small SaaS apps, focus on:

security basics
backups and restore testing
billing correctness
auth correctness
monitoring and alerting
rollback readiness

If those are covered, launch risk drops significantly.