SaaS Production Checklist

A practical production readiness checklist for launching and updating your SaaS.

Use this page as the final production readiness gate before launch or major updates. It is built for MVPs and small SaaS products that need a practical checklist, not enterprise process overhead. Review deploy, security, auth, payments, monitoring, backups, and rollback readiness in one pass.

Quick Fix / Quick Setup

txt
Production readiness quick setup:

1. Set APP_ENV=production and DEBUG=false
2. Confirm HTTPS is enabled and certificates auto-renew
3. Validate database backups and complete a restore test
4. Run migrations on staging, then production
5. Verify auth flows: register, login, reset, logout
6. Verify billing flows: checkout, webhook, cancel, failed payment
7. Confirm logging, Sentry, uptime, and alerts are active
8. Check worker/cron/background jobs are running
9. Validate static/media storage and file permissions
10. Document rollback steps and on-call contact

Launch gate:
- No default secrets
- No pending migrations
- No failing health checks
- No unhandled high-severity errors
- Monitoring and alerting tested

Best used with a staging environment that mirrors production. Add a simple go/no-go signoff and keep the checklist versioned in the repo.


What’s happening

Production failures often come from missing non-code setup:

  • secrets not loaded
  • TLS not enabled
  • webhooks pointing to the wrong URL
  • workers not running
  • backups never tested
  • monitoring installed but alerts disabled

A production checklist reduces launch risk by making those assumptions explicit.

For small SaaS teams, the goal is not heavy process. The goal is:

  • repeatable deploys
  • recoverability
  • basic security hygiene
  • correct auth and billing behavior under real traffic

Auth lifecycle: Register -> Verify Email -> Login -> Session -> Logout


Step-by-step implementation

1. Create one tracked checklist

Keep the checklist in the repo.

txt
docs/production-checklist.md

Or use a tracked release issue template.

Recommended sections:

  • application config
  • infrastructure
  • data
  • auth
  • payments
  • observability
  • security
  • recovery
  • launch validation

Assign one owner per section, even if one person owns everything.


2. Validate application config

Confirm production settings are explicit and safe.

Example environment values:

env
APP_ENV=production
DEBUG=false
APP_URL=https://yourdomain.com
ALLOWED_HOSTS=yourdomain.com,www.yourdomain.com
SESSION_COOKIE_SECURE=true
CSRF_COOKIE_SECURE=true

Checks:

  • debug mode disabled
  • production domain matches app config
  • CORS and CSRF trusted origins match real frontend/backend domains
  • secrets come from env vars or a secret manager
  • no fallback development secrets

Quick commands:

bash
# Caution: both commands print secret values; avoid running them where output is logged or shared
printenv | sort
env | grep -E 'APP_ENV|DEBUG|DATABASE|REDIS|SECRET|STRIPE|DOMAIN'
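
The config checks can be partly automated so they run on every deploy. The following is a minimal sketch in Python, assuming the variable names from the example environment values. `SECRET_KEY` and the set of placeholder values are hypothetical; substitute the names your app actually uses.

```python
import os

# Hypothetical placeholder values that suggest a development default leaked through.
PLACEHOLDER_SECRETS = {"", "changeme", "secret", "dev", "test"}


def check_production_config(env):
    """Return a list of human-readable problems found in the given environment mapping."""
    problems = []
    if env.get("APP_ENV") != "production":
        problems.append("APP_ENV is not 'production'")
    if env.get("DEBUG", "").lower() not in {"false", "0"}:
        problems.append("DEBUG is not disabled")
    if not env.get("APP_URL", "").startswith("https://"):
        problems.append("APP_URL does not use https")
    for name in ("SESSION_COOKIE_SECURE", "CSRF_COOKIE_SECURE"):
        if env.get(name, "").lower() != "true":
            problems.append(f"{name} is not true")
    # SECRET_KEY is an assumed variable name; substitute your app's secret names.
    if env.get("SECRET_KEY", "").lower() in PLACEHOLDER_SECRETS:
        problems.append("SECRET_KEY is missing or looks like a default")
    return problems


# Example: problems = check_production_config(os.environ)
```

Run it in the app, worker, and scheduler environments separately. An empty list means the basic settings look safe; it does not prove the values are correct.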

3. Validate infrastructure and network path

Check the full path from public DNS to the app process.

Checks:

  • DNS resolves correctly
  • reverse proxy is running
  • app server is running
  • firewall allows expected ports only
  • processes restart automatically
  • disk and memory have headroom
  • server time is correct

Commands:

bash
dig yourdomain.com +short
nslookup yourdomain.com
curl -I https://yourdomain.com
curl -sS https://yourdomain.com/health
ss -tulpn
df -h
free -m
ps aux --sort=-%mem | head
ps aux --sort=-%cpu | head

If using systemd:

bash
sudo systemctl status nginx
sudo systemctl status gunicorn
sudo systemctl status celery

If using Docker:

bash
docker ps
docker compose ps

Minimal health check target:

txt
GET /health -> 200 OK

Process flow: DNS -> TLS -> reverse proxy -> app -> database
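
A health endpoint is most useful when it also probes the dependencies the app cannot run without. Below is a minimal sketch as a plain WSGI app using only the standard library. `check_database` is a hypothetical probe; replace it with a real query against your database. Most frameworks can express the same idea as an ordinary route.

```python
import json


def check_database():
    """Hypothetical dependency probe; replace with a real query such as SELECT 1."""
    return True


def health_app(environ, start_response, checks=None):
    """Minimal WSGI app: /health returns 200 when every check passes, 503 otherwise."""
    checks = checks or {"database": check_database}
    if environ.get("PATH_INFO") != "/health":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = bool(probe())
        except Exception:
            # A probe that raises counts as a failed dependency.
            results[name] = False
    healthy = all(results.values())
    status = "200 OK" if healthy else "503 Service Unavailable"
    body = json.dumps({"status": "ok" if healthy else "fail", "checks": results}).encode()
    start_response(status, [("Content-Type", "application/json")])
    return [body]
```

Returning 503 on a failed probe lets uptime monitors and load balancers treat a broken database connection as an outage instead of a healthy app.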


4. Confirm HTTPS and proxy behavior

Checks:

  • HTTPS enabled
  • certificates auto-renew
  • HTTP redirects to HTTPS
  • reverse proxy forwards scheme/host headers correctly
  • cookies marked secure in production
  • webhook endpoints use the public HTTPS URL

Basic check:

bash
curl -I https://yourdomain.com
curl -I http://yourdomain.com

Expected:

  • HTTPS returns 200 or a redirect to an app route
  • HTTP returns 301 or 308 to HTTPS
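
The redirect expectation can be checked in a script instead of by eye. This is a sketch using only the standard library; the function names are illustrative. `http.client` does not follow redirects, so the first response is what gets inspected.

```python
import http.client
from urllib.parse import urlsplit


def is_valid_https_redirect(status, location, host):
    """True when an HTTP response permanently redirects to the same host over HTTPS."""
    if status not in (301, 308):
        return False
    parts = urlsplit(location or "")
    return parts.scheme == "https" and parts.hostname == host


def fetch_redirect(host, timeout=10):
    """Send a plain-HTTP HEAD request and return (status, Location header). Needs network."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Usage: `is_valid_https_redirect(*fetch_redirect("yourdomain.com"), "yourdomain.com")`. If you intentionally redirect the apex domain to `www`, compare against the `www` hostname instead.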

Nginx example:

nginx
server {
    listen 80;
    server_name yourdomain.com www.yourdomain.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    server_name yourdomain.com www.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
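
"Certificates auto-renew" is easy to assume and hard to notice when it silently stops. One way to verify it is to measure the days left on the certificate that is actually being served and alert below a threshold. This is a sketch using the standard library; the 14-day threshold in the usage note is an assumption, not a recommendation from any provider.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter timestamp (format used by getpeercert)."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


def fetch_not_after(host, port=443, timeout=5):
    """Read notAfter from the certificate served by host (requires network access)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Usage: alert when `days_until_expiry(fetch_not_after("yourdomain.com"))` drops below, say, 14. Because the default context validates the chain and hostname, the fetch also fails loudly on an already expired or mismatched certificate.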

5. Validate database state, migrations, backups, and restore

Checks:

  • production DB credentials are correct
  • pending migrations reviewed
  • migrations run on staging first
  • backups scheduled
  • backup retention policy is defined
  • restore test completed at least once
  • rollback policy documented for non-reversible migrations

Commands:

bash
pg_isready
psql "$DATABASE_URL" -c 'select now();'
python manage.py showmigrations
python manage.py migrate --plan
alembic current
alembic heads

Migration workflow:

bash
# staging first
python manage.py migrate

# then production during release window
python manage.py migrate
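
To enforce "no pending migrations" in a release script, the output of `python manage.py showmigrations` can be parsed for unapplied entries, which Django marks with `[ ]`. This is a sketch that assumes the default list format (app label on its own line, migrations indented beneath it). Recent Django versions also offer `python manage.py migrate --check`, which exits non-zero when unapplied migrations exist; prefer that if your version supports it.

```python
def pending_migrations(showmigrations_output):
    """Return (app, migration) pairs that `showmigrations` lists as unapplied ([ ])."""
    pending = []
    app = None
    for line in showmigrations_output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("[ ]"):
            pending.append((app, stripped[3:].strip()))
        elif not line[0].isspace():
            # Unindented lines are app labels; indented notes such as
            # "(no migrations)" are ignored.
            app = stripped
    return pending
```

Fail the deploy when the returned list is non-empty after the migrate step has run.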

Minimum backup validation:

  • verify backup job success
  • verify backup files exist
  • restore into a separate database
  • confirm schema and expected row counts

Example restore validation items:

txt
- Can restore within target time
- App can connect to restored DB
- Critical tables exist
- Recent records exist
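
The "critical tables exist" and "expected row counts" checks can be scripted against the restored database. The sketch below uses `sqlite3` purely as a self-contained stand-in; with PostgreSQL you would open the connection with your driver and catch its equivalent error class. The table names in the usage note are hypothetical.

```python
import sqlite3


def validate_restore(conn, expected_min_rows):
    """Check that each critical table exists and holds at least the expected rows.

    `expected_min_rows` maps table name -> minimum row count. Table names must come
    from a trusted, hard-coded list because they are interpolated into SQL.
    """
    problems = []
    for table, minimum in expected_min_rows.items():
        try:
            (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        except sqlite3.OperationalError:
            problems.append(f"{table}: table missing")
            continue
        if count < minimum:
            problems.append(f"{table}: {count} rows, expected at least {minimum}")
    return problems
```

Usage: `validate_restore(conn, {"users": 100, "subscriptions": 10})`, with minimums taken from recent production counts. Row counts confirm the restore is plausible, not that every record is intact.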

6. Check static files, media, and storage persistence

Checks:

  • static assets built and served correctly
  • uploads persist after redeploy
  • object storage credentials are set
  • CDN URL matches production domain if used
  • file permissions allow app access without overly broad writes

Typical problems:

  • local filesystem uploads lost after container restart
  • wrong asset base URL
  • stale cached frontend bundle referencing wrong API domain

Validation:

bash
curl -I https://yourdomain.com/static/app.css

For uploads, perform a real upload in production and verify:

  • file appears in storage backend
  • file remains after restart or redeploy
  • correct access controls are applied

7. Test auth flows in production mode

Do not stop at “login page loads”. Test the full lifecycle.

Required checks:

  • register
  • login
  • logout
  • password reset
  • email verification
  • session persistence across requests
  • protected route behavior
  • token/session expiration behavior
  • secure cookies enabled

Key failure points:

  • wrong cookie domain
  • missing secure flag behind proxy
  • CSRF origin mismatch
  • callback URL mismatch
  • email sending not configured
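
Cookie flag problems are quick to detect from the `Set-Cookie` header the login endpoint returns. Here is a sketch using the standard library's `http.cookies`; the cookie name `sessionid` in the usage note is an assumption.

```python
from http.cookies import SimpleCookie


def missing_cookie_flags(set_cookie_header):
    """Map each cookie name to the flags it lacks among Secure, HttpOnly, SameSite."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_header)
    report = {}
    for name, morsel in cookie.items():
        missing = []
        if not morsel["secure"]:
            missing.append("Secure")
        if not morsel["httponly"]:
            missing.append("HttpOnly")
        if not morsel["samesite"]:
            missing.append("SameSite")
        if missing:
            report[name] = missing
    return report
```

Usage: pass the header value from `curl -I` on a login response, for example `missing_cookie_flags("sessionid=abc123; Path=/; HttpOnly")`, which reports `Secure` and `SameSite` as missing. A missing `Secure` flag behind a proxy often means the app does not trust the forwarded scheme header.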

Use the auth checklist for deeper validation: Auth System Checklist


8. Test payments and billing state sync

If revenue depends on it, validate this manually before launch.

Required checks:

  • checkout session succeeds
  • subscription or payment record is created
  • webhook signature verification works
  • duplicate events do not corrupt state
  • failed payments update user access correctly
  • cancellation and resume flows work
  • billing portal works if enabled
  • live keys are only used in production
  • app database stays in sync with provider state

Stripe-related commands:

bash
stripe listen
stripe events resend <event_id> --webhook-endpoint=<endpoint_id>

Critical checks:

  • webhook endpoint is public
  • raw request body handling is correct
  • signing secret matches environment
  • event processing is idempotent
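
The raw-body, signing-secret, and idempotency checks fit together as shown in the sketch below. It illustrates the general HMAC approach with a `t=<timestamp>,v1=<signature>` header, the shape Stripe documents for its signature header. It is not a replacement for the provider's library: in a real Stripe integration, prefer `stripe.Webhook.construct_event`. `EventLog` is a hypothetical in-memory stand-in for a database table with a unique constraint on the event ID.

```python
import hashlib
import hmac
import time


def verify_signature(raw_body, header, secret, tolerance=300, now=None):
    """Verify a 't=<unix>,v1=<hex>' signature header over the raw request bytes."""
    pairs = (part.strip().split("=", 1) for part in header.split(",") if "=" in part)
    fields = dict(pairs)  # if a key repeats, the last value wins
    timestamp, candidate = fields.get("t", ""), fields.get("v1", "")
    if not timestamp.isdigit() or not candidate:
        return False
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > tolerance:
        return False  # too old or too far in the future: possible replay
    # Sign the exact bytes received; re-serialized JSON will not match.
    signed = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), candidate.encode())


class EventLog:
    """In-memory record of processed event IDs (use a unique DB key in production)."""

    def __init__(self):
        self._seen = set()

    def handle_once(self, event_id, handler):
        """Run handler only the first time event_id is seen. Returns True if it ran."""
        if event_id in self._seen:
            return False
        handler()
        self._seen.add(event_id)
        return True
```

The event ID is recorded only after the handler succeeds, so a handler that raises is retried on the provider's next delivery instead of being silently skipped.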

Use the payment checklist for deeper validation: Payment System Checklist


9. Validate background jobs, workers, and schedulers

Checks:

  • worker process is running
  • scheduler/cron is running
  • queues are not stalled
  • retries are configured
  • failures are visible in logs or alerts

Commands:

bash
redis-cli ping
celery -A app inspect ping
sudo systemctl status celery

Examples of production-critical async work:

  • email delivery
  • payment webhook processing
  • report generation
  • cleanup tasks
  • subscription sync jobs

A common launch failure is deploying the app but not the worker.
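
"Retries are configured" usually means bounded retries with a growing delay, followed by a visible failure. Below is a framework-free sketch of that behavior; the default attempt count and delay are assumptions. Celery exposes comparable task options (`autoretry_for`, `retry_backoff`, `max_retries`), which are the better choice when you already run Celery.

```python
import time


def run_with_retries(task, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call task(); on exception wait base_delay * 2**attempt and try again.

    The exception from the final attempt is re-raised so the failure reaches
    logs and alerts instead of disappearing.
    """
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter exists so the delay can be replaced in tests. Only retry work that is safe to repeat, such as idempotent webhook processing or email sends guarded by a sent flag.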


10. Enable observability and alerting

Minimum production observability:

  • application logs
  • server logs
  • error tracking
  • uptime checks
  • alerting
  • queue and webhook failure visibility

Checks:

  • Sentry or equivalent receives test error
  • uptime monitor hits public URL and health endpoint
  • alerts notify the right person
  • logs are searchable during incidents
  • payment webhook failures generate alerts
  • worker failures generate alerts
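
Logs are only searchable during an incident if they have a consistent structure. One common approach is one JSON object per line, sketched here with the standard `logging` module. The field names are a choice made for this example, not a standard.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as one JSON object per line so tools can filter by field."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level=logging.INFO):
    """Send all logs to stderr as JSON; journald or Docker can capture them from there."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```

Call `configure_logging()` once at startup in the app and in each worker, then trigger a deliberate test error to confirm it reaches both the logs and your error tracker.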

Useful commands:

bash
sudo journalctl -u nginx -n 200 --no-pager
sudo journalctl -u gunicorn -n 200 --no-pager
tail -n 200 /var/log/nginx/error.log
tail -n 200 /var/log/nginx/access.log

Use the monitoring checklist for deeper validation: Monitoring Checklist


11. Check security basics before launch

Checks:

  • no default credentials
  • admin endpoints protected
  • secrets rotated from setup defaults
  • least privilege applied for DB and cloud roles
  • security headers configured
  • dependencies updated
  • rate limits applied to critical endpoints
  • production admin access restricted by role or network where possible

Minimum headers to consider:

txt
Strict-Transport-Security
X-Content-Type-Options
X-Frame-Options
Content-Security-Policy
Referrer-Policy
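
A small check can confirm that the headers listed above are present on a production response. This sketch only tests for presence, not for sensible values; the function name is illustrative.

```python
REQUIRED_HEADERS = (
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Content-Security-Policy",
    "Referrer-Policy",
)


def missing_security_headers(headers):
    """Return required headers absent from a response (names are case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]
```

Usage: pass the header names from `curl -I https://yourdomain.com`, or a response's header mapping from your HTTP client, and fail the smoke test when the list is non-empty.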

Use the security checklist for deeper hardening: Security Checklist


12. Document recovery and rollback

Rollback needs to exist before deploy, not during the incident.

Document:

  • current release version
  • previous stable release version
  • rollback command or image tag
  • migration rollback policy
  • maintenance mode process
  • backup restore process
  • incident contact path

Example rollback note:

txt
App rollback:
- Re-deploy previous image tag: app:v1.2.3

Database rollback:
- Only if migration is reversible and tested
- Otherwise restore from backup or apply forward fix

Be explicit about which migrations are unsafe to reverse.


13. Run a post-deploy smoke test

Execute a short repeatable smoke test after every production deploy.

Minimum smoke test:

  • homepage loads
  • login works
  • dashboard loads
  • one protected API endpoint returns success
  • checkout or billing page loads
  • webhook test passes
  • email sends
  • upload works
  • queue job executes
  • health endpoint returns success

Store this as a script or runbook. Do not rely on memory.
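
A smoke test script can be as simple as a list of named checks that all run, with failures collected at the end. This is a minimal sketch using only the standard library; the check names and URLs in the usage note are placeholders for your own.

```python
import urllib.request


def http_ok(url, timeout=10):
    """True when GET url answers with a 2xx status (urlopen raises on 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return 200 <= response.status < 300


def run_smoke_tests(checks):
    """Run (name, callable) pairs in order; a check passes if it returns a truthy value.

    Returns (passed, failures), where failures maps check name -> reason.
    Every check runs even after a failure, so one report shows all problems.
    """
    failures = {}
    for name, check in checks:
        try:
            if not check():
                failures[name] = "returned a falsy result"
        except Exception as exc:
            failures[name] = f"{type(exc).__name__}: {exc}"
    return (not failures, failures)
```

Usage: `run_smoke_tests([("homepage", lambda: http_ok("https://yourdomain.com")), ("health", lambda: http_ok("https://yourdomain.com/health"))])`. Exit non-zero from the deploy script when `passed` is false so a failed smoke test blocks the release.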

Use the deployment checklist for release gating details: Deployment Checklist


Common causes

  • Production environment variables missing or inconsistent across app, worker, and scheduler
  • HTTPS or proxy misconfiguration causing auth/session failures
  • Migrations deployed without schema verification or backup safety
  • Webhook endpoints reachable but signature verification or event handling broken
  • Monitoring present but no actionable alerts configured
  • Uploads, emails, or queues depend on services not enabled in production
  • No tested rollback or restore path for bad deploys
  • Live payment mode enabled without validating real subscription state transitions
  • Running production with debug mode enabled
  • Static/media paths differ between local and production
  • DNS, TLS, and reverse proxy settings drift out of sync

Debugging tips

Check production config before assuming code is broken.

Validate the full path:

txt
DNS -> TLS -> reverse proxy -> app server -> database -> background jobs

Use repeatable commands:

bash
# Environment and config (output includes secret values)
printenv | sort
env | grep -E 'APP_ENV|DEBUG|DATABASE|REDIS|SECRET|STRIPE|DOMAIN'

# DNS and HTTP
dig yourdomain.com +short
nslookup yourdomain.com
curl -I https://yourdomain.com
curl -sS https://yourdomain.com/health

# Services, containers, and listening ports
sudo systemctl status nginx
sudo systemctl status gunicorn
sudo systemctl status celery
docker ps
docker compose ps
ss -tulpn

# Logs
sudo journalctl -u nginx -n 200 --no-pager
sudo journalctl -u gunicorn -n 200 --no-pager
tail -n 200 /var/log/nginx/error.log
tail -n 200 /var/log/nginx/access.log

# Host resources
df -h
free -m
ps aux --sort=-%mem | head
ps aux --sort=-%cpu | head

# Migrations and database
python manage.py showmigrations
python manage.py migrate --plan
alembic current
alembic heads
pg_isready
psql "$DATABASE_URL" -c 'select now();'

# Queue and workers
redis-cli ping
celery -A app inspect ping

# Stripe webhooks
stripe listen
stripe events resend <event_id> --webhook-endpoint=<endpoint_id>

Debugging priorities:

  1. compare staging and production env vars
  2. inspect most recent deploy diff
  3. inspect migration set
  4. inspect webhook delivery logs
  5. inspect worker health
  6. confirm alerts actually fired

Billing webhook flow (Browser, App, Stripe, DB): Start Checkout -> Create Session -> Redirect to Checkout -> Webhook (invoice.paid) -> Update Subscription

Figure: one-page architecture diagram showing request flow, worker flow, and billing webhook flow.


Checklist

Application config

  • APP_ENV=production
  • DEBUG=false
  • allowed hosts/domains are correct
  • secrets loaded from env or secret manager
  • no secrets committed in repo
  • CORS and CSRF settings match production domains

Infrastructure

  • DNS points to correct target
  • reverse proxy is running
  • app server is running
  • processes restart automatically
  • firewall rules are correct
  • CPU, memory, disk, and DB connections are within safe limits

HTTPS and networking

  • HTTPS enabled
  • certificates auto-renew
  • HTTP redirects to HTTPS
  • health endpoint returns success
  • webhook endpoints use public production URL

Database

  • production DB credentials verified
  • no pending migrations
  • schema version is correct
  • backups are scheduled
  • backup retention policy is defined
  • restore test completed

Static/media/storage

  • static files load from expected path or storage backend
  • uploads work
  • uploads persist after redeploy
  • file permissions are valid

Auth

  • register works
  • login works
  • logout works
  • email verification works
  • password reset works
  • sessions or JWT settings are production-safe
  • admin endpoints are protected

Payments

  • live payment keys only in production
  • checkout works
  • subscription creation works
  • webhook signature verification works
  • webhook handler is idempotent
  • cancellation flow works
  • failed payment handling works
  • billing state sync is correct

Background jobs

  • worker is running
  • scheduler/cron is running
  • queue is processing
  • retries configured
  • failures visible in logs/alerts

Observability

  • application logs active
  • error tracking active
  • uptime monitoring active
  • alerts active
  • payment webhook failures alert
  • queue failures alert

Security

  • no default credentials
  • secrets rotated from defaults
  • rate limits or abuse protections enabled
  • least privilege applied
  • security headers configured
  • dependencies patched

Recovery

  • rollback steps documented
  • previous stable release identified
  • migration rollback policy documented
  • maintenance mode plan exists
  • restore steps documented

Launch validation

  • post-deploy smoke test completed
  • transactional emails send from production domain
  • team knows where logs, dashboards, and contacts are stored
  • no failing health checks
  • no unhandled high-severity errors


FAQ

What is the minimum production checklist for an MVP SaaS?

At minimum: production env vars, HTTPS, backups, migrations, auth flow validation, payment flow validation, worker health, logs, error tracking, uptime checks, and rollback steps.

Should staging be identical to production?

As close as practical. Match runtime, database engine, storage pattern, webhook behavior, and environment variable structure.

How often should I run this checklist?

Before launch, before major deploys, after infrastructure changes, and after any incident that exposed a missing control.

Can I automate parts of this checklist?

Yes. Health checks, smoke tests, migration checks, TLS validation, backup jobs, and alert tests should be automated where possible.

What is the most commonly skipped item?

Restore testing for backups. Many teams create backups but never verify they can restore quickly and correctly.


Final takeaway

Production readiness is mostly about eliminating hidden setup gaps.

A good checklist turns launch from guesswork into a repeatable process.

For MVPs and small SaaS apps, focus on:

  • security basics
  • backups and restore testing
  • billing correctness
  • auth correctness
  • monitoring and alerting
  • rollback readiness

If those are covered, launch risk drops significantly.