Backup and Recovery Checklist

A practical playbook for implementing backup and recovery in your SaaS.

Use this checklist to verify your SaaS can recover from data loss, bad deploys, accidental deletes, server failure, and provider issues. Focus on backups you can actually restore, not just jobs that appear to run.

This page is for MVPs and small production SaaS apps running on a VPS, Docker, or managed cloud services.

Quick Fix / Quick Setup

Run a minimum viable backup setup now.

bash
# 1) Create a database backup now
mkdir -p /var/backups/myapp
pg_dump -Fc "$DATABASE_URL" > /var/backups/myapp/db-$(date +%F-%H%M).dump

# 2) Archive uploaded files now
rsync -a /var/www/myapp/media/ /var/backups/myapp/media-$(date +%F-%H%M)/

# 3) Copy env/secrets snapshot securely
cp /opt/myapp/.env /var/backups/myapp/env-$(date +%F-%H%M).bak
chmod 600 /var/backups/myapp/env-*.bak

# 4) Verify backup files exist
ls -lh /var/backups/myapp

# 5) Test restore into a temporary database
createdb myapp_restore_test
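# (re-running date here could yield a different filename than step 1, so restore the newest dump)
pg_restore -d myapp_restore_test "$(ls -t /var/backups/myapp/db-*.dump | head -n 1)"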

# 6) Put backups on a different machine or object storage
# example with aws cli
aws s3 sync /var/backups/myapp s3://YOUR-BACKUP-BUCKET/myapp/

# 7) Add a daily cron job
crontab -e
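# cron does not load your shell environment: define DATABASE_URL in the crontab or call a wrapper script
# 15 2 * * * pg_dump -Fc "$DATABASE_URL" > /var/backups/myapp/db-$(date +\%F-\%H\%M).dump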

Minimum safe setup:

  • automated daily database backups
  • off-server storage
  • uploaded file backup
  • env/secrets escrow
  • a restore test at least monthly

If you only do one thing today, do a backup and a restore test.


What’s happening

Backups fail in production for predictable reasons:

  • jobs run on the same server that later dies
  • dumps are created but never copied offsite
  • uploads are not included
  • .env and deployment config exist only on the live machine
  • restore steps are undocumented
  • backup files are corrupt, empty, or too old to help

A working recovery plan for a small SaaS usually needs all of these:

  • database backup
  • uploaded file backup or object storage versioning
  • environment variable and secret escrow
  • deployment/config backup
  • offsite storage
  • restore drill with written steps
  • retention policy
  • monitoring for backup failures

Backups are not complete until you can restore them into a separate environment and boot the app.


Step-by-step implementation

1) Inventory what must be recoverable

At minimum, list:

  • primary database
  • uploaded files and private media
  • .env or secret references
  • Nginx config
  • systemd units
  • Docker Compose files
  • cron jobs
  • worker schedules
  • storage bucket names
  • webhook endpoints
  • DNS records that affect app recovery

Example inventory file:

txt
Database:
- postgres://... production primary
- nightly pg_dump custom format
- managed snapshot retention: 7 days

Files:
- /var/www/myapp/media
- S3 bucket: myapp-prod-uploads
- bucket versioning: enabled

Config:
- /etc/nginx/sites-available/myapp.conf
- /etc/systemd/system/myapp.service
- /opt/myapp/docker-compose.yml
- /opt/myapp/.env

Recovery targets:
- RPO: 24h
- RTO: 2h

2) Choose backup method by component

Recommended mapping:

| Component | Preferred method | Notes |
| --- | --- | --- |
| Postgres | pg_dump -Fc | Portable, flexible restore |
| MySQL | mysqldump | Test against exact server version |
| Managed DB | provider snapshots + logical dumps | Use both if possible |
| Local uploads | rsync, tar, or sync to object storage | Keep separate from code |
| S3-compatible storage | bucket versioning + replication | Durability is not enough without restore steps |
| Config files | Git/private repo + encrypted archive | Do not rely on live box only |
| Secrets | secret manager or encrypted escrow | Never leave as single copy on server |

3) Automate database backups

Postgres cron example

bash
mkdir -p /var/backups/myapp
chmod 700 /var/backups/myapp
crontab -e
cron
# DATABASE_URL must be defined in the crontab itself; cron does not read your shell profile
15 2 * * * /usr/bin/pg_dump -Fc "$DATABASE_URL" > /var/backups/myapp/db-$(date +\%F-\%H\%M).dump

A safer pattern is a script with logging and retention:

bash
#!/usr/bin/env bash
set -euo pipefail

BACKUP_DIR="/var/backups/myapp"
STAMP="$(date +%F-%H%M)"
FILE="$BACKUP_DIR/db-$STAMP.dump"

mkdir -p "$BACKUP_DIR"
pg_dump -Fc "$DATABASE_URL" > "$FILE"

test -s "$FILE"
find "$BACKUP_DIR" -name 'db-*.dump' -mtime +7 -delete

Save as:

bash
# save the script above as /usr/local/bin/myapp-db-backup.sh, then make it executable
chmod +x /usr/local/bin/myapp-db-backup.sh

Cron:

cron
15 2 * * * /usr/local/bin/myapp-db-backup.sh >> /var/log/myapp-db-backup.log 2>&1
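
One gap in the script above is that pg_dump writes straight to the final filename, so a dump that dies halfway can leave a partial file that looks like a backup. The following is a minimal sketch of one way to avoid that, not the only way: write to a temporary file in the same directory and rename it into place only if the command succeeded and produced output. The helper name atomic_backup is hypothetical.

```bash
#!/usr/bin/env bash
# atomic_backup DEST CMD [ARGS...]   (hypothetical helper name)
# Runs CMD, captures its stdout in a temp file next to DEST, and renames the
# temp file to DEST only if CMD exited 0 and the output is non-empty.
atomic_backup() {
  local dest="$1"; shift
  local tmp
  tmp="$(mktemp "${dest}.partial.XXXXXX")" || return 1
  if "$@" > "$tmp" && [ -s "$tmp" ]; then
    # rename within one directory, so readers never see a half-written file
    mv -f "$tmp" "$dest"
  else
    rm -f "$tmp"
    echo "backup failed: $dest was not written" >&2
    return 1
  fi
}

# Possible wiring into the backup script (same variables as above):
# atomic_backup "$BACKUP_DIR/db-$STAMP.dump" pg_dump -Fc "$DATABASE_URL"
```

Because the temporary name ends in .partial.XXXXXX, the existing retention pattern db-*.dump never matches a leftover partial file.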

MySQL example

bash
mysqldump --single-transaction --quick --routines --triggers myapp_prod > /var/backups/myapp/db-$(date +%F-%H%M).sql

4) Back up uploaded files

If uploads are local:

bash
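# --delete mirrors removals, so a file deleted by mistake in media/ also disappears from media-latest/;
# pair this mirror with the timestamped archives below
rsync -a --delete /var/www/myapp/media/ /var/backups/myapp/media-latest/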

Timestamped archive example:

bash
tar -czf /var/backups/myapp/media-$(date +%F-%H%M).tar.gz /var/www/myapp/media
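
A truncated archive is easy to miss until restore day. As a rough sketch (the function name archive_dir is an assumption for illustration), the archive step can be wrapped so it stores relative paths and immediately reads the archive back to confirm it is complete:

```bash
#!/usr/bin/env bash
# archive_dir SRC_DIR DEST_TAR_GZ   (hypothetical helper name)
# Archives SRC_DIR with paths relative to its parent directory, then lists the
# archive to confirm tar can read it end to end.
archive_dir() {
  local src="${1%/}" dest="$2"
  [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
  # -C avoids absolute paths inside the archive, which makes restores into a
  # different location simpler
  tar -czf "$dest" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  # listing forces a full read; a corrupt or truncated file fails here
  tar -tzf "$dest" > /dev/null || { rm -f "$dest"; return 1; }
}

# Example:
# archive_dir /var/www/myapp/media "/var/backups/myapp/media-$(date +%F-%H%M).tar.gz"
```

A listing check proves the archive is readable, not that the files inside are the right ones, so it complements a restore test rather than replacing it.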

If uploads are in S3 or compatible object storage:

  • enable bucket versioning
  • document restore commands
  • consider bucket replication for another region/account

AWS versioning:

bash
aws s3api put-bucket-versioning \
  --bucket YOUR-BACKUP-BUCKET \
  --versioning-configuration Status=Enabled

5) Back up secrets and config

Keep copies of these outside the app server:

  • .env
  • deployment manifests
  • Docker Compose files
  • Nginx config
  • systemd units
  • cron jobs
  • TLS renewal config
  • DNS records
  • webhook endpoints and callback URLs

Example archive with restricted permissions (encrypt it before copying it off the server):

bash
tar -czf /tmp/myapp-config-$(date +%F-%H%M).tar.gz \
  /opt/myapp/.env \
  /etc/nginx/sites-available/myapp.conf \
  /etc/systemd/system/myapp.service \
  /opt/myapp/docker-compose.yml
chmod 600 /tmp/myapp-config-*.tar.gz

If secret exposure is a concern, prefer encrypted storage or a secret manager.
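
For teams without a secret manager, one possible form of encrypted storage is to encrypt the config archive with a passphrase kept somewhere other than the app server. The sketch below assumes OpenSSL 1.1.1 or newer (for the -pbkdf2 option); the function names and the key file location are hypothetical. Note that this mode encrypts but does not authenticate the file, and tools such as gpg or age are common alternatives.

```bash
#!/usr/bin/env bash
# encrypt_file IN OUT KEYFILE / decrypt_file IN OUT KEYFILE  (hypothetical names)
# KEYFILE holds the passphrase on its first line; keep it off the app server.
encrypt_file() {
  # umask 077 in a subshell so the output is created owner-readable only
  ( umask 077 && openssl enc -aes-256-cbc -pbkdf2 -salt \
      -in "$1" -out "$2" -pass "file:$3" )
}

decrypt_file() {
  ( umask 077 && openssl enc -d -aes-256-cbc -pbkdf2 \
      -in "$1" -out "$2" -pass "file:$3" )
}

# Example (paths are placeholders):
# encrypt_file /tmp/myapp-config.tar.gz /var/backups/myapp/myapp-config.tar.gz.enc /root/backup.key
```

Whatever tool is used, include the decrypt step in the restore drill; an encrypted backup whose passphrase is lost is not recoverable.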

6) Push backups off the primary server

A backup stored only on the machine it protects does not survive the loss of that machine.

Example to S3:

bash
aws s3 sync /var/backups/myapp s3://YOUR-BACKUP-BUCKET/myapp/

Example to another VPS:

bash
rsync -az /var/backups/myapp/ backupuser@backup-host:/srv/backups/myapp/

Minimum rule:

  • at least one copy off the primary server
  • ideally in another zone, region, or provider
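
A copy that left the server is only useful if it arrived intact. One lightweight way to check, sketched here under the assumption of a Linux host with sha256sum available, is to write a checksum manifest before syncing and verify it against any downloaded copy. The function names and the SHA256SUMS filename are illustrative choices.

```bash
#!/usr/bin/env bash
# write_manifest DIR: record a SHA-256 checksum for every file under DIR
# in DIR/SHA256SUMS (hypothetical helper names and manifest filename).
write_manifest() {
  local dir="$1"
  ( cd "$dir" && find . -type f ! -name SHA256SUMS -print0 \
      | sort -z | xargs -0 -r sha256sum > SHA256SUMS )
}

# verify_manifest DIR: recompute checksums and compare with DIR/SHA256SUMS.
# Returns non-zero if any file is missing or differs.
verify_manifest() {
  local dir="$1"
  ( cd "$dir" && sha256sum -c SHA256SUMS > /dev/null )
}

# Example flow:
# write_manifest /var/backups/myapp
# aws s3 sync /var/backups/myapp s3://YOUR-BACKUP-BUCKET/myapp/
# ...later, on the restore host, after downloading the copy:
# verify_manifest /srv/restore/myapp
```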

7) Define retention

Use multiple restore points.

Example minimum retention for a small SaaS:

  • daily backups: 7 to 14 days
  • weekly backups: 4 to 8 weeks
  • monthly backups: 3 to 12 months

You need enough history to recover from:

  • accidental deletes discovered late
  • bad migrations
  • silent corruption
  • a security compromise that altered or deleted data
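
The find -mtime +7 -delete line in the step 3 script keeps a flat seven-day window, which does not give weekly or monthly restore points. A tiered policy can be sketched as follows. The tier boundaries here (14 days, 8 weeks, 12 months, Sundays as the weekly copy, the 1st as the monthly copy) are assumptions taken from the upper end of the ranges above, and the sketch relies on GNU date and on the db-YYYY-MM-DD-HHMM.dump naming used on this page.

```bash
#!/usr/bin/env bash
# should_keep BACKUP_DATE TODAY   (both YYYY-MM-DD; hypothetical helper)
# Returns 0 if a backup taken on BACKUP_DATE is still inside a retention tier.
should_keep() {
  local d="$1" today="$2" age dow dom
  age=$(( ( $(date -u -d "$today" +%s) - $(date -u -d "$d" +%s) ) / 86400 ))
  dow="$(date -u -d "$d" +%u)"   # 1 = Monday ... 7 = Sunday
  dom="$(date -u -d "$d" +%d)"   # day of month, zero-padded
  [ "$age" -le 14 ] && return 0                        # daily tier
  [ "$dow" = 7 ] && [ "$age" -le 56 ] && return 0      # weekly tier (Sundays)
  [ "$dom" = 01 ] && [ "$age" -le 365 ] && return 0    # monthly tier (1st)
  return 1
}

# list_expired DIR TODAY: print dumps that fall outside every tier.
# This only lists files; deleting them is left to the caller.
list_expired() {
  local dir="$1" today="$2" f base
  for f in "$dir"/db-*.dump; do
    [ -e "$f" ] || continue
    base="${f##*/}"; base="${base#db-}"
    should_keep "${base:0:10}" "$today" || printf '%s\n' "$f"
  done
}

# Example dry run:
# list_expired /var/backups/myapp "$(date +%F)"
```

Listing before deleting makes it possible to review what a new policy would remove, which matters because a retention bug destroys the very restore points it is meant to manage.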

8) Write a recovery runbook

Document exact commands and order.

Recommended runbook structure:

txt
1. Put app into maintenance mode
2. Confirm most recent valid backup set
3. Restore database to temp target
4. Restore uploads/media
5. Restore .env and deployment config
6. Start database/app/workers
7. Run validation checks
8. Switch traffic or disable maintenance mode
9. Monitor logs/errors
10. Record incident timeline and follow-ups

Include:

  • owner
  • where backups live
  • credentials path
  • restore commands
  • service restart order
  • RPO/RTO targets
  • rollback criteria

9) Test restore regularly

Restore into:

  • staging
  • a temporary VM
  • a temporary database
  • a separate Docker environment

Postgres restore test:

bash
createdb restore_test_db
pg_restore -d restore_test_db /var/backups/myapp/latest.dump
psql restore_test_db -c "\dt"

Validation checks after restore:

  • app boots
  • login works
  • dashboard loads
  • uploads are accessible
  • billing flows still function
  • email sending works
  • webhooks verify correctly
  • workers process jobs

Figure: recovery flowchart showing database restore, file restore, app boot, validation checks, and go-live decision.
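
The validation checks listed above are easier to repeat if they run as a script that reports every failure instead of stopping at the first one. The following is a minimal sketch of such a runner; run_check and FAILED are hypothetical names, and the health-check URL in the usage comment is a placeholder, not an endpoint this guide defines.

```bash
#!/usr/bin/env bash
# Count of failed checks; inspect it after all checks have run.
FAILED=0

# run_check NAME CMD [ARGS...]   (hypothetical helper)
# Runs CMD quietly and prints PASS or FAIL with the check name.
run_check() {
  local name="$1"; shift
  if "$@" > /dev/null 2>&1; then
    printf 'PASS  %s\n' "$name"
  else
    printf 'FAIL  %s\n' "$name"
    FAILED=$(( FAILED + 1 ))
  fi
}

# Example checks after a restore drill:
# run_check "tables restored" psql restore_test_db -c '\dt'
# run_check "app boots"       curl -fsS http://localhost:8000/healthz   # placeholder URL
# [ "$FAILED" -eq 0 ] || { echo "$FAILED check(s) failed"; exit 1; }
```

Checks such as login, billing, and email usually need app-specific commands or a manual step; the runner only gives them a consistent place to be recorded.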

10) Monitor backup jobs

Alert on:

  • missed scheduled jobs
  • zero-byte dump files
  • low disk space
  • failed upload to offsite storage
  • restore test failures
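
Two of these alerts, missed jobs and zero-byte dumps, can be caught by a single freshness check that runs separately from the backup job itself. A minimal sketch, assuming the db-*.dump timestamped naming used on this page and a find that supports -mmin; the function name and the 26-hour threshold in the example are illustrative:

```bash
#!/usr/bin/env bash
# check_backup_fresh DIR MAX_AGE_HOURS   (hypothetical helper)
# Prints OK or ALERT and returns non-zero when the newest dump is missing,
# empty, or older than MAX_AGE_HOURS.
check_backup_fresh() {
  local dir="$1" max_hours="$2" f newest=""
  # timestamped names sort chronologically, so the last glob match is newest
  for f in "$dir"/db-*.dump; do
    [ -e "$f" ] && newest="$f"
  done
  [ -n "$newest" ] || { echo "ALERT: no backups found in $dir"; return 1; }
  [ -s "$newest" ] || { echo "ALERT: zero-byte backup $newest"; return 1; }
  if [ -n "$(find "$newest" -mmin +"$(( max_hours * 60 ))")" ]; then
    echo "ALERT: newest backup is older than ${max_hours}h: $newest"
    return 1
  fi
  echo "OK: $newest"
}

# Example: a daily job plus some slack before alerting
# check_backup_fresh /var/backups/myapp 26 || echo "send this to your alert channel"
```

Running the check from a different schedule, or ideally a different host that inspects the offsite copy, means a dead cron daemon or a dead server still produces an alert.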


Common causes

  • Backups are written to the same server that later fails
  • Database dumps run but never get copied offsite
  • Uploaded media is omitted from backup scope
  • Secrets and environment variables exist only on the production box
  • Restore steps are undocumented or depend on tribal knowledge
  • Backups are corrupted, zero-byte, or incomplete due to disk space issues
  • Cron jobs fail silently because of missing environment variables or permissions
  • Retention is too short, so the only good restore point has already been deleted
  • Version mismatch between dump tool and database server breaks restore
  • Teams assume managed hosting means application-level data is fully recoverable without testing

Debugging tips

Use these commands to verify backup jobs, files, and restore readiness.

Scheduler and logs

bash
crontab -l
systemctl status cron || systemctl status crond
journalctl -u cron -n 100 --no-pager || journalctl -u crond -n 100 --no-pager
grep -R "backup\|pg_dump\|mysqldump" /etc/cron* /opt /srv 2>/dev/null

Backup files and disk space

bash
ls -lh /var/backups/myapp
du -sh /var/backups/myapp
df -h

Database tool and connectivity checks

bash
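# print the connection string with the password masked, so it does not end up in scrollback or logs
echo "$DATABASE_URL" | sed -E 's#(://[^:/@]+:)[^@]*@#\1***@#'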
pg_dump --version
pg_restore --version
pg_isready -d "$DATABASE_URL"
mysqldump --version
mysql --version

Postgres restore checks

bash
createdb restore_test_db
pg_restore -l /var/backups/myapp/latest.dump | head
pg_restore -d restore_test_db /var/backups/myapp/latest.dump
psql restore_test_db -c "\dt"

Object storage checks

bash
aws s3 ls s3://YOUR-BACKUP-BUCKET/myapp/
aws s3 cp /var/backups/myapp/latest.dump s3://YOUR-BACKUP-BUCKET/myapp/latest.dump

File backup checks

bash
rsync --dry-run -a /var/www/myapp/media/ /var/backups/myapp/media-test/
tar -tzf backup.tar.gz | head

Docker checks

bash
docker ps
docker exec -it your-db-container pg_dump --version

If restore failures are caused by app connectivity after recovery, also review:

  • /database-connection-errors
  • /database-migration-strategy

Checklist

Coverage

  • Database backups are enabled
  • Uploaded files are backed up or versioned
  • Secrets and environment variables have a secure second copy
  • Deployment-specific config is backed up
  • Recovery steps are written down

Storage and retention

  • At least one backup copy is off the primary server
  • Backup access follows least privilege
  • Backups are encrypted at rest and in transit
  • Retention includes daily, weekly, and monthly restore points
  • Disk usage and backup storage usage are monitored

Validation

  • Backup files are non-zero and recent
  • Restore has been tested in a separate environment
  • App boots successfully after restore
  • Login, dashboard, uploads, billing, and email are validated
  • Restore time is within target RTO
  • Restore point freshness is within target RPO

Operational readiness

  • Backup jobs do not depend on a developer laptop
  • Alerts go to a real channel
  • Runbook includes commands, owners, and restart order
  • Backups are taken before risky migrations or infra changes
  • Last restore drill was completed within 30 days


FAQ

Do I need both database backups and server snapshots?

Usually yes. Database dumps are portable and easier to restore selectively. Snapshots help recover full machine state faster. For small SaaS apps, combine logical database backups with offsite file/config backups at minimum.

How often should I test restores?

At least monthly for active production apps, and always before major infrastructure or database changes. Also test after changing backup scripts, storage providers, or database versions.

Should I back up the whole application codebase?

Code should already live in Git and your CI/CD system. Back up deployment-specific config, environment files, Nginx/systemd/Docker setup, and anything not reproducible from source control.

If I use S3 for uploads, do I still need backups?

Yes. Durable storage is not the same as a recovery plan. Enable versioning where possible and document how to restore or roll back deleted or overwritten objects.

What is the minimum viable backup setup for an MVP SaaS?

Daily automated database dump, offsite storage, uploaded file backup or object storage versioning, a secure copy of production env vars, and one tested restore procedure.

What should I exclude from backups?

Caches, temporary files, build artifacts, dependency directories, and anything easily reproducible. Focus on data, uploads, secrets, and deployment-specific config.


Final takeaway

A backup is only real if you have restored it successfully.

For a small SaaS, the minimum standard is:

  • automated backups
  • offsite copies
  • protected secrets
  • regular restore drills

Treat restore testing as part of production readiness, not as an optional ops task.