SaaS Production Checklist
A practical production readiness checklist for your SaaS.
Use this page as the final production readiness gate before launch or major updates. It is built for MVPs and small SaaS products that need a practical checklist, not enterprise process overhead. Review deploy, security, auth, payments, monitoring, backups, and rollback readiness in one pass.
Quick Fix / Quick Setup
Production readiness quick setup:
1. Set APP_ENV=production and DEBUG=false
2. Confirm HTTPS is enabled and certificates auto-renew
3. Validate database backups and run a restore test
4. Run migrations on staging, then production
5. Verify auth flows: register, login, reset, logout
6. Verify billing flows: checkout, webhook, cancel, failed payment
7. Confirm logging, Sentry, uptime, and alerts are active
8. Check worker/cron/background jobs are running
9. Validate static/media storage and file permissions
10. Document rollback steps and on-call contact
Launch gate:
- No default secrets
- No pending migrations
- No failing health checks
- No unhandled high-severity errors
- Monitoring and alerting tested
Best used with a staging environment that mirrors production. Add a simple go/no-go signoff and keep the checklist versioned in the repo.
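The go/no-go signoff can be reduced to a small script that refuses to say "go" while any launch gate item is failing. The following is a minimal sketch, not a prescribed tool: the `GateCheck` type and `launch_decision` function are hypothetical names, and the pass/fail values would come from your own checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateCheck:
    """One launch gate item and whether it currently passes (hypothetical type)."""
    name: str
    passed: bool


def launch_decision(checks):
    """Return ("go", []) only if every check passed, else ("no-go", failed names)."""
    failed = [check.name for check in checks if not check.passed]
    return ("go" if not failed else "no-go", failed)


# Example values only; real results would come from your own validation steps.
checks = [
    GateCheck("no default secrets", True),
    GateCheck("no pending migrations", True),
    GateCheck("no failing health checks", False),
    GateCheck("no unhandled high-severity errors", True),
    GateCheck("monitoring and alerting tested", True),
]
decision, failed = launch_decision(checks)
print(decision, failed)
```

A single failed item blocks the launch, which matches the "No ..." wording of the gate.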
What’s happening
Production failures often come from missing non-code setup:
- secrets not loaded
- TLS not enabled
- webhooks pointing to the wrong URL
- workers not running
- backups never tested
- monitoring installed but alerts disabled
A production checklist reduces launch risk by making those assumptions explicit.
For small SaaS teams, the goal is not heavy process. The goal is:
- repeatable deploys
- recoverability
- basic security hygiene
- correct auth and billing behavior under real traffic
Step-by-step implementation
1. Create one tracked checklist
Keep the checklist in the repo.
docs/production-checklist.md
Or use a tracked release issue template.
Recommended sections:
- application config
- infrastructure
- data
- auth
- payments
- observability
- security
- recovery
- launch validation
Assign one owner per section, even if one person owns everything.
2. Validate application config
Confirm production settings are explicit and safe.
Example environment values:
APP_ENV=production
DEBUG=false
APP_URL=https://yourdomain.com
ALLOWED_HOSTS=yourdomain.com,www.yourdomain.com
SESSION_COOKIE_SECURE=true
CSRF_COOKIE_SECURE=true
Checks:
- debug mode disabled
- production domain matches app config
- CORS and CSRF trusted origins match real frontend/backend domains
- secrets come from env vars or a secret manager
- no fallback development secrets
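These checks can also be run as code before each release. The sketch below validates the example environment values shown above. It makes several assumptions: `SECRET_KEY` is a hypothetical variable name, `KNOWN_DEV_SECRETS` is an illustrative list, and the expected string values (`"false"`, `"true"`) follow the example rather than any particular framework.

```python
# Hypothetical list of placeholder secrets that must never reach production.
KNOWN_DEV_SECRETS = {"changeme", "dev", "secret", "insecure"}


def find_config_problems(env):
    """Return a list of human-readable problems found in an environment mapping."""
    problems = []
    if env.get("APP_ENV") != "production":
        problems.append("APP_ENV is not 'production'")
    if env.get("DEBUG", "").lower() != "false":
        problems.append("DEBUG is not 'false'")
    if not env.get("APP_URL", "").startswith("https://"):
        problems.append("APP_URL does not use https")
    for name in ("SESSION_COOKIE_SECURE", "CSRF_COOKIE_SECURE"):
        if env.get(name, "").lower() != "true":
            problems.append(f"{name} is not 'true'")
    # SECRET_KEY is an assumed name; substitute whatever your app actually reads.
    secret = env.get("SECRET_KEY", "")
    if not secret or secret in KNOWN_DEV_SECRETS:
        problems.append("SECRET_KEY is missing or a development default")
    return problems
```

In a release script this could be called as `find_config_problems(dict(os.environ))`, failing the deploy if the returned list is not empty.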
Quick commands:
printenv | sort
env | grep -E 'APP_ENV|DEBUG|DATABASE|REDIS|SECRET|STRIPE|DOMAIN'
3. Validate infrastructure and network path
Check the full path from public DNS to the app process.
Checks:
- DNS resolves correctly
- reverse proxy is running
- app server is running
- firewall allows expected ports only
- processes restart automatically
- disk and memory have headroom
- server time is correct
Commands:
dig yourdomain.com +short
nslookup yourdomain.com
curl -I https://yourdomain.com
curl -sS https://yourdomain.com/health
ss -tulpn
df -h
free -m
ps aux --sort=-%mem | head
ps aux --sort=-%cpu | head
If using systemd:
sudo systemctl status nginx
sudo systemctl status gunicorn
sudo systemctl status celery
If using Docker:
docker ps
docker compose ps
Minimal health check target:
GET /health -> 200 OK
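If the app does not expose a health endpoint yet, the target above can be met with very little code. This is a minimal sketch written as a plain WSGI callable, on the assumption that the app is served through a WSGI server such as gunicorn. The JSON body shape and the name `health_app` are illustrative, and a real endpoint would usually also check the database or cache.

```python
import json


def health_app(environ, start_response):
    """Minimal WSGI app: answer 200 on /health and 404 on every other path."""
    if environ.get("PATH_INFO") == "/health":
        status = "200 OK"
        body = json.dumps({"status": "ok"}).encode("utf-8")
    else:
        status = "404 Not Found"
        body = json.dumps({"status": "not found"}).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

For a local try-out, the standard library can serve it with `wsgiref.simple_server.make_server("127.0.0.1", 8000, health_app).serve_forever()`, after which `curl -sS http://127.0.0.1:8000/health` should return the JSON body.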
4. Confirm HTTPS and proxy behavior
Checks:
- HTTPS enabled
- certificates auto-renew
- HTTP redirects to HTTPS
- reverse proxy forwards scheme/host headers correctly
- cookies marked secure in production
- webhook endpoints use the public HTTPS URL
Basic check:
curl -I https://yourdomain.com
curl -I http://yourdomain.com
Expected:
- HTTPS returns 200 or redirect to app route
- HTTP returns 301 or 308 to HTTPS
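The expected HTTP behavior can be expressed as a small predicate for an automated check. The sketch below only judges a status code and a `Location` header that you have already fetched, for example with `curl -I`. The function name and parameters are hypothetical.

```python
from urllib.parse import urlsplit


def is_valid_https_redirect(status, location, expected_host):
    """True if a plain-HTTP response permanently redirects to HTTPS on the expected host."""
    if status not in (301, 308):
        return False
    parts = urlsplit(location or "")
    return parts.scheme == "https" and parts.hostname == expected_host


print(is_valid_https_redirect(301, "https://yourdomain.com/", "yourdomain.com"))
```

Checking the host as well as the scheme catches a redirect that points at a stale or staging domain.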
Nginx example:
server {
listen 80;
server_name yourdomain.com www.yourdomain.com;
return 301 https://$host$request_uri;
}
server {
listen 443 ssl http2;
server_name yourdomain.com www.yourdomain.com;
ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;
location / {
proxy_pass http://127.0.0.1:8000;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-Proto https;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}
5. Validate database state, migrations, backups, and restore
Checks:
- production DB credentials are correct
- pending migrations reviewed
- migrations run on staging first
- backups scheduled
- retention exists
- restore test completed at least once
- rollback policy documented for non-reversible migrations
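The "pending migrations reviewed" check is easy to automate for Django projects. The sketch below parses text in the list format printed by `python manage.py showmigrations`, where applied migrations are marked `[X]` and unapplied ones `[ ]`. It assumes that default list format; the function name is hypothetical, and an Alembic project would need a different check.

```python
def pending_migrations(showmigrations_output):
    """Return (app_label, migration_name) pairs for every unapplied migration."""
    pending = []
    app = None
    for line in showmigrations_output.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("("):
            # Skip blank lines and notes such as "(no migrations)".
            continue
        if stripped.startswith("[ ]"):
            pending.append((app, stripped[3:].strip()))
        elif not stripped.startswith("["):
            # A line without a checkbox is an app label.
            app = stripped
    return pending


sample = """\
auth
 [X] 0001_initial
billing
 [X] 0001_initial
 [ ] 0002_add_plan
sessions
 (no migrations)
"""
print(pending_migrations(sample))
```

A release script could capture the command's output and stop the deploy when the returned list is not empty and has not been reviewed.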
Commands:
pg_isready
psql "$DATABASE_URL" -c 'select now();'
python manage.py showmigrations
python manage.py migrate --plan
alembic current
alembic heads
Migration workflow:
# staging first
python manage.py migrate
# then production during release window
python manage.py migrate
Minimum backup validation:
- verify backup job success
- verify backup files exist
- restore into a separate database
- confirm schema and expected row counts
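The last step, confirming expected row counts, can be scripted once you have per-table counts from both databases. This is a minimal sketch under assumptions: the counts are collected elsewhere (for example with `select count(*)` per table), the function name is hypothetical, and the 1% tolerance is an arbitrary default to allow for rows written after the backup was taken.

```python
def compare_row_counts(source_counts, restored_counts, max_loss_ratio=0.01):
    """Return problems where a restored table is missing or has lost too many rows."""
    problems = []
    for table, expected in source_counts.items():
        actual = restored_counts.get(table)
        if actual is None:
            problems.append(f"{table}: missing from restored database")
        elif actual < expected * (1 - max_loss_ratio):
            problems.append(f"{table}: {actual} rows restored, expected about {expected}")
    return problems


# Example numbers only.
source = {"users": 1000, "subscriptions": 200}
restored = {"users": 995}
print(compare_row_counts(source, restored))
```

An empty result means every critical table exists in the restored copy with roughly the expected volume; it does not prove the data is correct, only that the restore is not obviously broken.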
Example restore validation items:
- Can restore within target time
- App can connect to restored DB
- Critical tables exist
- Recent records exist
6. Check static files, media, and storage persistence
Checks:
- static assets built and served correctly
- uploads persist after redeploy
- object storage credentials are set
- CDN URL matches production domain if used
- file permissions allow app access without overly broad writes
Typical problems:
- local filesystem uploads lost after container restart
- wrong asset base URL
- stale cached frontend bundle referencing wrong API domain
Validation:
curl -I https://yourdomain.com/static/app.css
For uploads, perform a real upload in production and verify:
- file appears in storage backend
- file remains after restart or redeploy
- correct access controls are applied
7. Test auth flows in production mode
Do not stop at “login page loads”. Test the full lifecycle.
Required checks:
- register
- login
- logout
- password reset
- email verification
- session persistence across requests
- protected route behavior
- token/session expiration behavior
- secure cookies enabled
Key failure points:
- wrong cookie domain
- missing secure flag behind proxy
- CSRF origin mismatch
- callback URL mismatch
- email sending not configured
Use the auth checklist for deeper validation: Auth System Checklist
8. Test payments and billing state sync
If revenue depends on it, validate this manually before launch.
Required checks:
- checkout session succeeds
- subscription or payment record is created
- webhook signature verification works
- duplicate events do not corrupt state
- failed payments update user access correctly
- cancellation and resume flows work
- billing portal works if enabled
- live keys are only used in production
- app database stays in sync with provider state
Stripe-related commands:
stripe listen
stripe events resend <event_id> --webhook-endpoint=<endpoint_id>
Critical checks:
- webhook endpoint is public
- raw request body handling is correct
- signing secret matches environment
- event processing is idempotent
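Raw body handling and the signing secret matter because webhook signatures are typically an HMAC computed over the exact bytes the provider sent. The sketch below is a generic illustration of timestamped HMAC-SHA256 verification, not any provider's actual scheme: the payload layout (`timestamp.body`), the function names, and the 300-second tolerance are assumptions. In production, prefer the verification helper from your payment provider's official library.

```python
import hashlib
import hmac


def sign(secret, timestamp, raw_body):
    """Compute a hex HMAC-SHA256 over b"<timestamp>.<raw body>" (illustrative layout)."""
    signed_payload = str(timestamp).encode("ascii") + b"." + raw_body
    return hmac.new(secret, signed_payload, hashlib.sha256).hexdigest()


def verify_signature(secret, timestamp, raw_body, signature, now, tolerance_seconds=300):
    """Reject stale timestamps, then compare signatures in constant time."""
    if abs(now - timestamp) > tolerance_seconds:
        return False
    expected = sign(secret, timestamp, raw_body)
    return hmac.compare_digest(expected, signature)


# Example values only; the secret must match the one configured for this environment.
secret = b"whsec_example"
raw_body = b'{"id": "evt_1", "type": "invoice.paid"}'
signature = sign(secret, 1700000000, raw_body)
print(verify_signature(secret, 1700000000, raw_body, signature, now=1700000010))
```

Verification has to run on the unparsed request bytes. If a framework parses the JSON and the handler re-serializes it, whitespace or key order can change and the signature will no longer match.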
Use the payment checklist for deeper validation: Payment System Checklist
9. Validate background jobs, workers, and schedulers
Checks:
- worker process is running
- scheduler/cron is running
- queues are not stalled
- retries are configured
- failures are visible in logs or alerts
Commands:
redis-cli ping
celery -A app inspect ping
sudo systemctl status celery
Examples of production-critical async work:
- email delivery
- payment webhook processing
- report generation
- cleanup tasks
- subscription sync jobs
A common launch failure is deploying the app but not the worker.
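One way to catch a missing worker is to have each background process record a heartbeat and to alert when one goes quiet. The sketch below covers only the detection step. It assumes each process writes its last-seen time somewhere you can read (a database row or cache key, for example); the function name and the two-minute threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_workers(heartbeats, now, max_silence=timedelta(minutes=2)):
    """Return the sorted names of processes whose last heartbeat is too old."""
    return sorted(
        name for name, last_seen in heartbeats.items() if now - last_seen > max_silence
    )


# Example values only: the scheduler has been silent for ten minutes.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
heartbeats = {
    "worker": now - timedelta(seconds=30),
    "scheduler": now - timedelta(minutes=10),
}
print(stale_workers(heartbeats, now))
```

A process that was never deployed will have no heartbeat at all, so the expected process names should also be compared against the keys that are present.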
10. Enable observability and alerting
Minimum production observability:
- application logs
- server logs
- error tracking
- uptime checks
- alerting
- queue and webhook failure visibility
Checks:
- Sentry or equivalent receives test error
- uptime monitor hits public URL and health endpoint
- alerts notify the right person
- logs are searchable during incidents
- payment webhook failures generate alerts
- worker failures generate alerts
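Logs are much easier to search during an incident when each line is structured. The sketch below is one possible approach using only Python's standard `logging` module: a formatter that emits one JSON object per record. The field names and the `JsonFormatter` class are assumptions, not a required schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


# Attach the formatter to a handler; "billing" is an example logger name.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("billing")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.error("webhook failed: %s", "evt_123")
```

With one JSON object per line, a log search or `grep` on `"level": "ERROR"` and a logger name narrows an incident down quickly.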
Useful commands:
sudo journalctl -u nginx -n 200 --no-pager
sudo journalctl -u gunicorn -n 200 --no-pager
tail -n 200 /var/log/nginx/error.log
tail -n 200 /var/log/nginx/access.log
Use the monitoring checklist for deeper validation: Monitoring Checklist
11. Check security basics before launch
Checks:
- no default credentials
- admin endpoints protected
- secrets rotated from setup defaults
- least privilege applied for DB and cloud roles
- security headers configured
- dependencies updated
- rate limits applied to critical endpoints
- production admin access restricted by role or network where possible
Minimum headers to consider:
Strict-Transport-Security
X-Content-Type-Options
X-Frame-Options
Content-Security-Policy
Referrer-Policy
Use the security checklist for deeper hardening: Security Checklist
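The header list above can be turned into a quick automated check. This sketch reports which of those headers are absent from a response's header mapping. It only checks presence, not whether the values are sensible, and the function name is hypothetical.

```python
REQUIRED_HEADERS = (
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Content-Security-Policy",
    "Referrer-Policy",
)


def missing_security_headers(response_headers):
    """Return the required headers not present, ignoring header-name case."""
    present = {name.lower() for name in response_headers}
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]


# Example response headers only.
example = {
    "strict-transport-security": "max-age=31536000",
    "X-Content-Type-Options": "nosniff",
    "Content-Type": "text/html",
}
print(missing_security_headers(example))
```

The input can be built from the output of `curl -I https://yourdomain.com` or from any HTTP client's response headers.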
12. Document recovery and rollback
Rollback needs to exist before deploy, not during the incident.
Document:
- current release version
- previous stable release version
- rollback command or image tag
- migration rollback policy
- maintenance mode process
- backup restore process
- incident contact path
Example rollback note:
App rollback:
- Re-deploy previous image tag: app:v1.2.3
Database rollback:
- Only if migration is reversible and tested
- Otherwise restore from backup or apply forward fix
Be explicit about which migrations are unsafe to reverse.
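The database part of the rollback note is a simple decision rule, and writing it down as code makes the policy unambiguous. This is a sketch under assumptions: each migration in the release is tagged by hand with a `safe_to_reverse` flag (reversible and tested), and the function name and result shape are hypothetical.

```python
def rollback_plan(migrations):
    """Decide the database rollback action for a release.

    migrations: list of (name, safe_to_reverse) pairs applied by the release, oldest first.
    """
    unsafe = [name for name, safe_to_reverse in migrations if not safe_to_reverse]
    if unsafe:
        # At least one migration cannot be reversed safely.
        return {"action": "restore-or-forward-fix", "blocked_by": unsafe}
    # Reverse in the opposite order to how the migrations were applied.
    return {"action": "reverse-migrations", "order": [name for name, _ in reversed(migrations)]}


# Example release: the second migration drops a column and cannot be undone.
release = [("0007_add_plan_table", True), ("0008_drop_legacy_column", False)]
print(rollback_plan(release))
```

Recording the flag at review time, rather than during an incident, is what keeps the decision quick when a rollback is actually needed.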
13. Run a post-deploy smoke test
Execute a short repeatable smoke test after every production deploy.
Minimum smoke test:
- homepage loads
- login works
- dashboard loads
- one protected API endpoint returns success
- checkout or billing page loads
- webhook test passes
- email sends
- upload works
- queue job executes
- health endpoint returns success
Store this as a script or runbook. Do not rely on memory.
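A scripted version covering the page-load items might look like the sketch below. It uses only the standard library and treats any status other than 200 as a failure. The paths `/login` and `/dashboard` are hypothetical examples, and checks that need a session, a webhook, or an upload would need additional steps that are not shown here.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the final HTTP status for a GET request, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code
    except OSError:
        # Covers DNS failures, refused connections, TLS errors, and timeouts.
        return None


def run_smoke_test(base_url, paths, fetch=http_status):
    """Fetch each path and return (path, status) for everything that is not 200."""
    failures = []
    for path in paths:
        status = fetch(base_url + path)
        if status != 200:
            failures.append((path, status))
    return failures


# Example paths; only "/" and "/health" come from this guide.
SMOKE_PATHS = ["/", "/health", "/login", "/dashboard"]
```

After a deploy, `run_smoke_test("https://yourdomain.com", SMOKE_PATHS)` should return an empty list. The `fetch` parameter exists so the runner can be tested without network access.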
Use the deployment checklist for release gating details: Deployment Checklist
Common causes
- Production environment variables missing or inconsistent across app, worker, and scheduler
- HTTPS or proxy misconfiguration causing auth/session failures
- Migrations deployed without schema verification or backup safety
- Webhook endpoints reachable but signature verification or event handling broken
- Monitoring present but no actionable alerts configured
- Uploads, emails, or queues depend on services not enabled in production
- No tested rollback or restore path for bad deploys
- Live payment mode enabled without validating real subscription state transitions
- Running production with debug mode enabled
- Static/media paths differ between local and production
- DNS, TLS, and reverse proxy settings drift out of sync
Debugging tips
Check production config before assuming code is broken.
Validate the full path:
DNS -> TLS -> reverse proxy -> app server -> database -> background jobs
Use repeatable commands:
printenv | sort
env | grep -E 'APP_ENV|DEBUG|DATABASE|REDIS|SECRET|STRIPE|DOMAIN'
curl -I https://yourdomain.com
curl -sS https://yourdomain.com/health
dig yourdomain.com +short
nslookup yourdomain.com
sudo systemctl status nginx
sudo systemctl status gunicorn
sudo systemctl status celery
docker ps
docker compose ps
ss -tulpn
sudo journalctl -u nginx -n 200 --no-pager
sudo journalctl -u gunicorn -n 200 --no-pager
tail -n 200 /var/log/nginx/error.log
tail -n 200 /var/log/nginx/access.log
df -h
free -m
ps aux --sort=-%mem | head
ps aux --sort=-%cpu | head
python manage.py showmigrations
python manage.py migrate --plan
alembic current
alembic heads
pg_isready
psql "$DATABASE_URL" -c 'select now();'
redis-cli ping
celery -A app inspect ping
stripe listen
stripe events resend <event_id> --webhook-endpoint=<endpoint_id>
Debugging priorities:
- compare staging and production env vars
- inspect most recent deploy diff
- inspect migration set
- inspect webhook delivery logs
- inspect worker health
- confirm alerts actually fired
Figure: One-page architecture diagram showing request flow, worker flow, and billing webhook flow.
Checklist
Application config
- ✓ APP_ENV=production
- ✓ DEBUG=false
- ✓ allowed hosts/domains are correct
- ✓ secrets loaded from env or secret manager
- ✓ no secrets committed in repo
- ✓ CORS and CSRF settings match production domains
Infrastructure
- ✓ DNS points to correct target
- ✓ reverse proxy is running
- ✓ app server is running
- ✓ processes restart automatically
- ✓ firewall rules are correct
- ✓ CPU, memory, disk, and DB connections are within safe limits
HTTPS and networking
- ✓ HTTPS enabled
- ✓ certificates auto-renew
- ✓ HTTP redirects to HTTPS
- ✓ health endpoint returns success
- ✓ webhook endpoints use public production URL
Database
- ✓ production DB credentials verified
- ✓ no pending migrations
- ✓ schema version is correct
- ✓ backups are scheduled
- ✓ retention exists
- ✓ restore test completed
Static/media/storage
- ✓ static files load from expected path or storage backend
- ✓ uploads work
- ✓ uploads persist after redeploy
- ✓ file permissions are valid
Auth
- ✓ register works
- ✓ login works
- ✓ logout works
- ✓ email verification works
- ✓ password reset works
- ✓ sessions or JWT settings are production-safe
- ✓ admin endpoints are protected
Payments
- ✓ live payment keys only in production
- ✓ checkout works
- ✓ subscription creation works
- ✓ webhook signature verification works
- ✓ webhook handler is idempotent
- ✓ cancellation flow works
- ✓ failed payment handling works
- ✓ billing state sync is correct
Background jobs
- ✓ worker is running
- ✓ scheduler/cron is running
- ✓ queue is processing
- ✓ retries configured
- ✓ failures visible in logs/alerts
Observability
- ✓ application logs active
- ✓ error tracking active
- ✓ uptime monitoring active
- ✓ alerts active
- ✓ payment webhook failures alert
- ✓ queue failures alert
Security
- ✓ no default credentials
- ✓ secrets rotated from defaults
- ✓ rate limits or abuse protections enabled
- ✓ least privilege applied
- ✓ security headers configured
- ✓ dependencies patched
Recovery
- ✓ rollback steps documented
- ✓ previous stable release identified
- ✓ migration rollback policy documented
- ✓ maintenance mode plan exists
- ✓ restore steps documented
Launch validation
- ✓ post-deploy smoke test completed
- ✓ transactional emails send from production domain
- ✓ team knows where logs, dashboards, and contacts are stored
- ✓ no failing health checks
- ✓ no unhandled high-severity errors
Related guides
- Deployment Checklist
- Security Checklist
- Monitoring Checklist
- Auth System Checklist
- Payment System Checklist
FAQ
What is the minimum production checklist for an MVP SaaS?
At minimum: production env vars, HTTPS, backups, migrations, auth flow validation, payment flow validation, worker health, logs, error tracking, uptime checks, and rollback steps.
Should staging be identical to production?
As close as practical. Match runtime, database engine, storage pattern, webhook behavior, and environment variable structure.
How often should I run this checklist?
Before launch, before major deploys, after infrastructure changes, and after any incident that exposed a missing control.
Can I automate parts of this checklist?
Yes. Health checks, smoke tests, migration checks, TLS validation, backup jobs, and alert tests should be automated where possible.
What is the most commonly skipped item?
Restore testing for backups. Many teams create backups but never verify they can restore quickly and correctly.
Final takeaway
Production readiness is mostly about eliminating hidden setup gaps.
A good checklist turns launch from guesswork into a repeatable process.
For MVPs and small SaaS apps, focus on:
- security basics
- backups and restore testing
- billing correctness
- auth correctness
- monitoring and alerting
- rollback readiness
If those are covered, launch risk drops significantly.