Backup and Recovery Checklist
A practical playbook for setting up backup and recovery in your SaaS.
Use this checklist to verify your SaaS can recover from data loss, bad deploys, accidental deletes, server failure, and provider issues. Focus on backups you can actually restore, not just jobs that appear to run.
This page is for MVPs and small production SaaS apps running on a VPS, Docker, or managed cloud services.
Quick Fix / Quick Setup
Run a minimum viable backup setup now.
# 1) Create a database backup now
mkdir -p /var/backups/myapp
pg_dump -Fc "$DATABASE_URL" > /var/backups/myapp/db-$(date +%F-%H%M).dump
# 2) Archive uploaded files now
rsync -a /var/www/myapp/media/ /var/backups/myapp/media-$(date +%F-%H%M)/
# 3) Copy env/secrets snapshot securely
cp /opt/myapp/.env /var/backups/myapp/env-$(date +%F-%H%M).bak
chmod 600 /var/backups/myapp/env-*.bak
# 4) Verify backup files exist
ls -lh /var/backups/myapp
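# 5) Test restore into a temporary database
# Restore the newest dump by name; re-running $(date ...) here could yield a different filename than step 1
createdb myapp_restore_test
pg_restore -d myapp_restore_test "$(ls -t /var/backups/myapp/db-*.dump | head -n 1)"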
# 6) Put backups on a different machine or object storage
# example with aws cli
aws s3 sync /var/backups/myapp s3://YOUR-BACKUP-BUCKET/myapp/
# 7) Add a daily cron job
crontab -e
# 15 2 * * * pg_dump -Fc "$DATABASE_URL" > /var/backups/myapp/db-$(date +\%F-\%H\%M).dump
Minimum safe setup:
- automated daily database backups
- off-server storage
- uploaded file backup
- env/secrets escrow
- a restore test at least monthly
If you only do one thing today, do a backup and a restore test.
What’s happening
Backups fail in production for predictable reasons:
- jobs run on the same server that later dies
- dumps are created but never copied offsite
- uploads are not included
- .env and deployment config exist only on the live machine
- restore steps are undocumented
- backup files are corrupt, empty, or too old to help
A working recovery plan for a small SaaS usually needs all of these:
- database backup
- uploaded file backup or object storage versioning
- environment variable and secret escrow
- deployment/config backup
- offsite storage
- restore drill with written steps
- retention policy
- monitoring for backup failures
Backups are not complete until you can restore them into a separate environment and boot the app.
Step-by-step implementation
1) Inventory what must be recoverable
At minimum, list:
- primary database
- uploaded files and private media
- .env or secret references
- Nginx config
- systemd units
- Docker Compose files
- cron jobs
- worker schedules
- storage bucket names
- webhook endpoints
- DNS records that affect app recovery
Example inventory file:
Database:
- postgres://... production primary
- nightly pg_dump custom format
- managed snapshot retention: 7 days
Files:
- /var/www/myapp/media
- S3 bucket: myapp-prod-uploads
- bucket versioning: enabled
Config:
- /etc/nginx/sites-available/myapp.conf
- /etc/systemd/system/myapp.service
- /opt/myapp/docker-compose.yml
- /opt/myapp/.env
Recovery targets:
- RPO: 24h
- RTO: 2h
2) Choose backup method by component
Recommended mapping:
| Component | Preferred method | Notes |
|---|---|---|
| Postgres | pg_dump -Fc | Portable, flexible restore |
| MySQL | mysqldump | Test against exact server version |
| Managed DB | provider snapshots + logical dumps | Use both if possible |
| Local uploads | rsync, tar, or sync to object storage | Keep separate from code |
| S3-compatible storage | bucket versioning + replication | Durability is not enough without restore steps |
| Config files | Git/private repo + encrypted archive | Do not rely on live box only |
| Secrets | secret manager or encrypted escrow | Never leave as single copy on server |
3) Automate database backups
Postgres cron example
mkdir -p /var/backups/myapp
chmod 700 /var/backups/myapp
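crontab -e
# cron does not load your shell profile, so DATABASE_URL must be defined in the crontab itself or in a wrapper script
15 2 * * * /usr/bin/pg_dump -Fc "$DATABASE_URL" > /var/backups/myapp/db-$(date +\%F-\%H\%M).dump
A safer pattern is a script with logging and retention: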
#!/usr/bin/env bash
set -euo pipefail
BACKUP_DIR="/var/backups/myapp"
STAMP="$(date +%F-%H%M)"
FILE="$BACKUP_DIR/db-$STAMP.dump"
mkdir -p "$BACKUP_DIR"
pg_dump -Fc "$DATABASE_URL" > "$FILE"
test -s "$FILE"
find "$BACKUP_DIR" -name 'db-*.dump' -mtime +7 -delete
Save as:
/usr/local/bin/myapp-db-backup.sh
chmod +x /usr/local/bin/myapp-db-backup.sh
Cron:
15 2 * * * /usr/local/bin/myapp-db-backup.sh >> /var/log/myapp-db-backup.log 2>&1
MySQL example
mysqldump --single-transaction --quick --routines --triggers myapp_prod > /var/backups/myapp/db-$(date +%F-%H%M).sql
4) Back up uploaded files
If uploads are local:
rsync -a --delete /var/www/myapp/media/ /var/backups/myapp/media-latest/
Timestamped archive example:
tar -czf /var/backups/myapp/media-$(date +%F-%H%M).tar.gz /var/www/myapp/media
If uploads are in S3 or compatible object storage:
- enable bucket versioning
- document restore commands
- consider bucket replication for another region/account
AWS versioning:
aws s3api put-bucket-versioning \
--bucket YOUR-BACKUP-BUCKET \
--versioning-configuration Status=Enabled
5) Back up secrets and config
Keep copies of these outside the app server:
- .env
- deployment manifests
- Docker Compose files
- Nginx config
- systemd units
- cron jobs
- TLS renewal config
- DNS records
- webhook endpoints and callback URLs
Example secure archive:
tar -czf /tmp/myapp-config-$(date +%F-%H%M).tar.gz \
/opt/myapp/.env \
/etc/nginx/sites-available/myapp.conf \
/etc/systemd/system/myapp.service \
/opt/myapp/docker-compose.yml
chmod 600 /tmp/myapp-config-*.tar.gz
If secret exposure is a concern, prefer encrypted storage or a secret manager.
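One possible way to encrypt a config archive before it leaves the server is symmetric encryption with openssl. This is a minimal sketch, not a recommendation over a secret manager. The function names encrypt_file and decrypt_file and the idea of keeping the passphrase in a separate file are assumptions for illustration; it also assumes OpenSSL 1.1.1 or newer for the -pbkdf2 option.

```bash
# Hypothetical helpers: encrypt or decrypt one file with a passphrase read from a file.
# Assumes OpenSSL 1.1.1+ (for -pbkdf2). The passphrase file must be stored separately
# from the backups, otherwise the encryption adds nothing.

encrypt_file() {
  local src passfile
  src="$1"
  passfile="$2"
  # umask 077 in a subshell so the encrypted output is created owner-only
  ( umask 077 && openssl enc -aes-256-cbc -pbkdf2 -salt \
      -in "$src" -out "${src}.enc" -pass "file:${passfile}" )
}

decrypt_file() {
  local src dest passfile
  src="$1"
  dest="$2"
  passfile="$3"
  ( umask 077 && openssl enc -d -aes-256-cbc -pbkdf2 \
      -in "$src" -out "$dest" -pass "file:${passfile}" )
}
```

For example, encrypt_file /tmp/myapp-config.tar.gz /root/.backup-passphrase would write /tmp/myapp-config.tar.gz.enc; both paths here are placeholders. Run decrypt_file during a restore drill so you know the passphrase copy actually works.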
6) Push backups off the primary server
Backups stored only on the primary server do not protect you when that server is lost.
Example to S3:
aws s3 sync /var/backups/myapp s3://YOUR-BACKUP-BUCKET/myapp/
Example to another VPS:
rsync -az /var/backups/myapp/ backupuser@backup-host:/srv/backups/myapp/
Minimum rule:
- at least one copy off the primary server
- ideally in another zone, region, or provider
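A copy that arrives offsite damaged is no better than no copy. One way to catch that is to write a checksum manifest next to the dumps before syncing, then verify it on the destination. The sketch below assumes GNU coreutils sha256sum and dump files named *.dump; the function names and the SHA256SUMS filename are illustrative choices, not part of any tool mentioned above.

```bash
# Hypothetical helpers for a checksum manifest covering the *.dump files in one directory.
# Assumes GNU coreutils sha256sum.

write_checksums() {
  local dir
  dir="$1"
  # Relative names are recorded so the manifest stays valid after the directory is copied elsewhere
  ( cd "$dir" && sha256sum -- *.dump > SHA256SUMS )
}

verify_checksums() {
  local dir
  dir="$1"
  # --quiet prints only failures; exit status is non-zero if any file does not match
  ( cd "$dir" && sha256sum -c --quiet SHA256SUMS )
}
```

A typical flow would be write_checksums /var/backups/myapp on the app server before the sync, then verify_checksums on the directory where the copy lands.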
7) Define retention
Use multiple restore points.
Example minimum retention for a small SaaS:
- daily backups: 7 to 14 days
- weekly backups: 4 to 8 weeks
- monthly backups: 3 to 12 months
You need enough history to recover from:
- accidental deletes discovered late
- bad migrations
- silent corruption
- compromised app behavior
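The tiers above can be turned into a keep-or-delete rule. The following is a sketch under stated assumptions: dailies kept 14 days, Sunday backups kept 8 weeks, first-of-month backups kept about 12 months, filenames in the db-YYYY-MM-DD-HHMM.dump form used earlier, and GNU date for the -d option. The function names should_keep and prune_backups are hypothetical, and the delete is left commented out so the first run is a dry run.

```bash
# Return 0 if a backup dated $1 (YYYY-MM-DD) should be kept as of $2 (YYYY-MM-DD).
# Assumed tiers: daily 14 days, Sundays 8 weeks, 1st of month 365 days. Requires GNU date.
should_keep() {
  local backup_date today backup_ts today_ts age_days dow dom
  backup_date="$1"
  today="$2"
  # If a date cannot be parsed, keep the file rather than risk deleting it
  backup_ts="$(date -u -d "$backup_date" +%s 2>/dev/null)" || return 0
  today_ts="$(date -u -d "$today" +%s 2>/dev/null)" || return 0
  age_days=$(( (today_ts - backup_ts) / 86400 ))
  dow="$(date -u -d "$backup_date" +%u)"   # 1 = Monday ... 7 = Sunday
  dom="$(date -u -d "$backup_date" +%d)"   # 01 ... 31
  [ "$age_days" -le 14 ] && return 0
  [ "$age_days" -le 56 ] && [ "$dow" = "7" ] && return 0
  [ "$age_days" -le 365 ] && [ "$dom" = "01" ] && return 0
  return 1
}

# Print a keep/delete decision for every db-*.dump in a directory (dry run).
prune_backups() {
  local dir today f name d
  dir="$1"
  today="$2"
  for f in "$dir"/db-*.dump; do
    [ -e "$f" ] || continue
    name="${f##*/}"                            # e.g. db-2024-06-10-0215.dump
    d="$(printf '%s' "$name" | cut -c4-13)"    # e.g. 2024-06-10
    if should_keep "$d" "$today"; then
      echo "keep   $f"
    else
      echo "delete $f"
      # rm -- "$f"   # uncomment only after reviewing the dry-run output
    fi
  done
}
```

Calling prune_backups /var/backups/myapp "$(date +%F)" lists what would be removed. Apply the same rule to the offsite copy, since that is the one you will depend on after a server loss.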
8) Write a recovery runbook
Document exact commands and order.
Recommended runbook structure:
1. Put app into maintenance mode
2. Confirm most recent valid backup set
3. Restore database to temp target
4. Restore uploads/media
5. Restore .env and deployment config
6. Start database/app/workers
7. Run validation checks
8. Switch traffic or disable maintenance mode
9. Monitor logs/errors
10. Record incident timeline and follow-ups
Include:
- owner
- where backups live
- credentials path
- restore commands
- service restart order
- RPO/RTO targets
- rollback criteria
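To compare a drill against the RTO target, it helps to record how long each runbook step takes. The wrapper below is one simple way to do that; run_step and the RUNBOOK_LOG variable are hypothetical names, and the default log path is only a placeholder.

```bash
# Hypothetical wrapper: run one runbook step, then append its name, exit status,
# and duration in seconds to a log file.
RUNBOOK_LOG="${RUNBOOK_LOG:-/tmp/restore-drill.log}"

run_step() {
  local name start end status
  name="$1"
  shift
  status=0
  start="$(date +%s)"
  "$@" || status=$?
  end="$(date +%s)"
  printf '%s\tstatus=%d\tseconds=%d\n' "$name" "$status" "$(( end - start ))" >> "$RUNBOOK_LOG"
  return "$status"
}
```

During a drill this could look like run_step restore-db pg_restore -d restore_test_db /var/backups/myapp/latest.dump. Summing the seconds column gives a rough measured restore time to hold against the RTO.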
9) Test restore regularly
Restore into:
- staging
- a temporary VM
- a temporary database
- a separate Docker environment
Postgres restore test:
createdb restore_test_db
pg_restore -d restore_test_db /var/backups/myapp/latest.dump
psql restore_test_db -c "\dt"
Validation checks after restore:
- app boots
- login works
- dashboard loads
- uploads are accessible
- billing flows still function
- email sending works
- webhooks verify correctly
- workers process jobs
[Figure: recovery flowchart showing database restore, file restore, app boot, validation checks, and go-live decision]
10) Monitor backup jobs
Alert on:
- missed scheduled jobs
- zero-byte dump files
- low disk space
- failed upload to offsite storage
- restore test failures
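Two of these alerts, missed jobs and zero-byte dumps, can be covered by a small freshness check that runs independently of the backup job. This is a sketch, assuming GNU stat (for -c %Y), dump files named db-*.dump, and a 26-hour default threshold for a daily schedule; check_latest_backup is a hypothetical name. Wire the non-zero exit status into whatever alerting you already use.

```bash
# Hypothetical check: fail if the newest db-*.dump is missing, empty, or older than a threshold.
# Assumes GNU stat. Default threshold of 26 hours leaves slack for a daily job.
check_latest_backup() {
  local dir max_age_hours latest now mtime age_hours
  dir="$1"
  max_age_hours="${2:-26}"
  # Newest file by modification time
  latest="$(ls -1t "$dir"/db-*.dump 2>/dev/null | head -n 1)"
  if [ -z "$latest" ]; then
    echo "ALERT: no backups found in $dir" >&2
    return 1
  fi
  if [ ! -s "$latest" ]; then
    echo "ALERT: $latest is empty" >&2
    return 1
  fi
  now="$(date +%s)"
  mtime="$(stat -c %Y "$latest")"
  age_hours=$(( (now - mtime) / 3600 ))
  if [ "$age_hours" -gt "$max_age_hours" ]; then
    echo "ALERT: $latest is ${age_hours}h old (limit ${max_age_hours}h)" >&2
    return 1
  fi
  echo "OK: $latest (${age_hours}h old)"
}
```

Running the check from a different scheduler entry than the backup itself, or from another machine against the offsite copy, means a dead cron daemon does not silence both the job and its alarm.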
Common causes
- Backups are written to the same server that later fails
- Database dumps run but never get copied offsite
- Uploaded media is omitted from backup scope
- Secrets and environment variables exist only on the production box
- Restore steps are undocumented or depend on tribal knowledge
- Backups are corrupted, zero-byte, or incomplete due to disk space issues
- Cron jobs fail silently because of missing environment variables or permissions
- Retention is too short, so the only good restore point has already been deleted
- Version mismatch between dump tool and database server breaks restore
- Teams assume managed hosting means application-level data is fully recoverable without testing
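Several of these causes come down to a partial or empty file sitting under a valid-looking backup name. One defensive pattern is to write to a temporary file and move it into place only if the command succeeded and produced output. The helper below is a sketch of that pattern; atomic_backup is a hypothetical name and the .partial suffix is an arbitrary choice.

```bash
# Hypothetical helper: run a command, capture stdout in a temp file, and rename it
# to the target only if the command exited 0 and the output is non-empty.
atomic_backup() {
  local target tmp
  target="$1"
  shift
  # Temp file lives in the same directory so the final mv is a rename, not a copy
  tmp="$(mktemp "${target}.partial.XXXXXX")" || return 1
  if "$@" > "$tmp" && [ -s "$tmp" ]; then
    mv -- "$tmp" "$target"
  else
    rm -f -- "$tmp"
    echo "backup failed: $target was not written" >&2
    return 1
  fi
}
```

With this in place, a call such as atomic_backup "/var/backups/myapp/db-$(date +%F-%H%M).dump" pg_dump -Fc "$DATABASE_URL" either leaves a complete dump or leaves nothing, so a full disk or a dropped connection cannot produce a file that later looks like a good restore point.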
Debugging tips
Use these commands to verify backup jobs, files, and restore readiness.
Scheduler and logs
crontab -l
systemctl status cron || systemctl status crond
journalctl -u cron -n 100 --no-pager || journalctl -u crond -n 100 --no-pager
grep -R "backup\|pg_dump\|mysqldump" /etc/cron* /opt /srv 2>/dev/null
Backup files and disk space
ls -lh /var/backups/myapp
du -sh /var/backups/myapp
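df -h
Database tool and connectivity checks
# Confirm the variable is set without printing credentials to the terminal or shell history
[ -n "${DATABASE_URL:-}" ] && echo "DATABASE_URL is set"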
pg_dump --version
pg_restore --version
pg_isready -d "$DATABASE_URL"
mysqldump --version
mysql --version
Postgres restore checks
createdb restore_test_db
pg_restore -l /var/backups/myapp/latest.dump | head
pg_restore -d restore_test_db /var/backups/myapp/latest.dump
psql restore_test_db -c "\dt"
Object storage checks
aws s3 ls s3://YOUR-BACKUP-BUCKET/myapp/
aws s3 cp /var/backups/myapp/latest.dump s3://YOUR-BACKUP-BUCKET/myapp/latest.dump
File backup checks
rsync --dry-run -a /var/www/myapp/media/ /var/backups/myapp/media-test/
tar -tzf backup.tar.gz | head
Docker checks
docker ps
docker exec -it your-db-container pg_dump --version
If restore failures are caused by app connectivity after recovery, also review:
- /database-connection-errors
- /database-migration-strategy
Checklist
Coverage
- ✓ Database backups are enabled
- ✓ Uploaded files are backed up or versioned
- ✓ Secrets and environment variables have a secure second copy
- ✓ Deployment-specific config is backed up
- ✓ Recovery steps are written down
Storage and retention
- ✓ At least one backup copy is off the primary server
- ✓ Backup access follows least privilege
- ✓ Backups are encrypted at rest and in transit
- ✓ Retention includes daily, weekly, and monthly restore points
- ✓ Disk usage and backup storage usage are monitored
Validation
- ✓ Backup files are non-zero and recent
- ✓ Restore has been tested in a separate environment
- ✓ App boots successfully after restore
- ✓ Login, dashboard, uploads, billing, and email are validated
- ✓ Restore time is within target RTO
- ✓ Restore point freshness is within target RPO
Operational readiness
- ✓ Backup jobs do not depend on a developer laptop
- ✓ Alerts go to a real channel
- ✓ Runbook includes commands, owners, and restart order
- ✓ Backups are taken before risky migrations or infra changes
- ✓ Last restore drill was completed within 30 days
For broader launch readiness, pair this with:
Related guides
- SaaS Production Checklist
- Security Checklist
- Auth System Checklist
- Database Migration Strategy
- Database Connection Errors
FAQ
Do I need both database backups and server snapshots?
Usually yes. Database dumps are portable and easier to restore selectively. Snapshots help recover full machine state faster. For small SaaS apps, combine logical database backups with offsite file/config backups at minimum.
How often should I test restores?
At least monthly for active production apps, and always before major infrastructure or database changes. Also test after changing backup scripts, storage providers, or database versions.
Should I back up the whole application codebase?
Code should already live in Git and your CI/CD system. Back up deployment-specific config, environment files, Nginx/systemd/Docker setup, and anything not reproducible from source control.
If I use S3 for uploads, do I still need backups?
Yes. Durable storage is not the same as a recovery plan. Enable versioning where possible and document how to restore or roll back deleted or overwritten objects.
What is the minimum viable backup setup for an MVP SaaS?
Daily automated database dump, offsite storage, uploaded file backup or object storage versioning, a secure copy of production env vars, and one tested restore procedure.
What should I exclude from backups?
Caches, temporary files, build artifacts, dependency directories, and anything easily reproducible. Focus on data, uploads, secrets, and deployment-specific config.
Final takeaway
A backup is only real if you have restored it successfully.
For a small SaaS, the minimum standard is:
- automated backups
- offsite copies
- protected secrets
- regular restore drills
Treat restore testing as part of production readiness, not as an optional ops task.