Zero Downtime Deployment

The essential playbook for implementing zero downtime deployment in your SaaS.

A zero downtime deployment replaces app processes without taking the site offline. For small SaaS products, the goal is simple: new code goes live, in-flight requests finish, health checks stay green, and rollback is fast if the release is bad.

This page focuses on practical deployment patterns for Gunicorn, Nginx, systemd, and container-based setups.

Quick Fix / Quick Setup

Use this if you deploy on a VPS with immutable release directories and a graceful Gunicorn reload:

bash
# Example: blue/green-style release switch with symlink + Gunicorn reload
set -e
APP_DIR=/var/www/myapp
RELEASES=$APP_DIR/releases
CURRENT=$APP_DIR/current
NEW_RELEASE=$RELEASES/$(date +%Y%m%d%H%M%S)

mkdir -p "$NEW_RELEASE"
rsync -a . "$NEW_RELEASE"/
cd "$NEW_RELEASE"
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
alembic upgrade head

ln -sfn "$NEW_RELEASE" "$CURRENT"
sudo systemctl reload gunicorn
curl -f http://127.0.0.1:8000/health

# rollback if health check fails
# ln -sfn /var/www/myapp/releases/<previous_release> /var/www/myapp/current
# sudo systemctl reload gunicorn

Use reload only if your app server supports graceful worker replacement. Run destructive database migrations separately or make them backward-compatible before switching traffic.


What’s happening

Downtime usually happens when old app processes stop before new ones are ready.

A safe deployment keeps at least one healthy app instance serving traffic during code replacement.

For a small SaaS setup, zero downtime usually depends on these controls:

  • immutable release directories
  • health checks
  • graceful worker replacement
  • backward-compatible database migrations
  • fast rollback to the previous release

Nginx should continue routing requests while Gunicorn workers restart gracefully or while traffic shifts between old and new releases.

The database is often the real blocker. App restarts are easy compared to schema changes that lock tables, break compatibility, or invalidate queued jobs.


Step-by-step implementation

1. Pick a deployment pattern

Use one of these patterns:

  • Graceful reload: best for one VPS with Gunicorn and moderate traffic
  • Blue/green: best when you can run old and new versions briefly at the same time
  • Rolling: best when multiple app instances sit behind a load balancer or reverse proxy

For most small SaaS apps on one server, graceful reload plus immutable releases is the simplest reliable option.

2. Add a health endpoint

Your deploy should never switch traffic before the app proves it can serve requests.

Minimal Flask or FastAPI-style endpoint:

python
@app.get("/health")
def health():
    return {"status": "ok"}

If database availability is critical to request handling, include a lightweight DB check. Keep it fast. Do not run expensive queries.

Example with SQLAlchemy:

python
from sqlalchemy import text

@app.get("/health")
def health():
    db.session.execute(text("SELECT 1"))
    return {"status": "ok"}

Use both local and public checks during deploy:

bash
curl -f http://127.0.0.1:8000/health
curl -f https://yourdomain.com/health

3. Build outside the live path

Do not deploy directly into the active code directory.

Recommended layout:

text
/var/www/myapp/
├── current -> /var/www/myapp/releases/20260420123000
├── releases/
│   ├── 20260419110000
│   └── 20260420123000
└── shared/
    ├── .env
    ├── uploads/
    └── logs/

Keep shared state outside the release directory:

  • secrets
  • uploads
  • runtime sockets
  • logs
  • cache volumes if needed
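One common approach is to symlink shared resources into each new release right after it is built, before switching traffic. A minimal sketch, assuming the layout above; the `link_shared` helper name is illustrative:

```shell
# link_shared: symlink shared state into a freshly built release directory.
# $1 = app root (e.g. /var/www/myapp), $2 = new release directory.
link_shared() {
  ln -sfn "$1/shared/.env"    "$2/.env"
  ln -sfn "$1/shared/uploads" "$2/uploads"
  ln -sfn "$1/shared/logs"    "$2/logs"
}
```

Call it as `link_shared /var/www/myapp "$NEW_RELEASE"` during the build step, so the release is complete before the current symlink moves.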

4. Prepare a systemd service that points to current

Example gunicorn.service:

ini
[Unit]
Description=Gunicorn for myapp
After=network.target

[Service]
User=www-data
Group=www-data
WorkingDirectory=/var/www/myapp/current
EnvironmentFile=/var/www/myapp/shared/.env
ExecStart=/var/www/myapp/current/.venv/bin/gunicorn app:app \
  --workers 3 \
  --bind 127.0.0.1:8000 \
  --timeout 60 \
  --graceful-timeout 30 \
  --access-logfile - \
  --error-logfile -
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
KillSignal=SIGTERM
TimeoutStopSec=60

[Install]
WantedBy=multi-user.target

Reload systemd if you change the unit:

bash
sudo systemctl daemon-reload
sudo systemctl restart gunicorn

For regular deploys, prefer:

bash
sudo systemctl reload gunicorn

Do not use restart unless you accept a stop/start cycle.

5. Put Nginx in front of Gunicorn

Example Nginx server block:

nginx
server {
    listen 80;
    server_name yourdomain.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        proxy_connect_timeout 5s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    location /health {
        proxy_pass http://127.0.0.1:8000/health;
        access_log off;
    }
}

Validate config before reload:

bash
sudo nginx -t
sudo systemctl reload nginx

If you use Unix sockets, keep the socket path stable across releases or let systemd manage socket creation consistently.
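One way to keep the socket path stable is to place it under shared/, outside the release directories, and reference it through a named upstream. A sketch, not a drop-in config; the upstream name and socket path are assumptions:

```nginx
# Hypothetical upstream using a socket kept outside the release dirs,
# so the path survives release switches.
upstream myapp_backend {
    server unix:/var/www/myapp/shared/gunicorn.sock fail_timeout=0;
}
```

Gunicorn would then bind with --bind unix:/var/www/myapp/shared/gunicorn.sock, and the server block's proxy_pass would point at http://myapp_backend instead of the TCP address.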

6. Run pre-deploy checks

Before touching live traffic:

bash
set -e
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
pytest
alembic current

If your app has asset compilation:

bash
npm ci
npm run build

If config is environment-driven, validate required variables before switch:

bash
test -n "$DATABASE_URL"
test -n "$SECRET_KEY"
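With more than a couple of variables, a small helper keeps the check readable and reports which variable is missing. A bash sketch; the `require_env` name is illustrative:

```shell
# require_env: fail if any named shell variable is unset or empty.
# Uses bash indirect expansion (${!var}), so run under bash.
require_env() {
  local var
  for var in "$@"; do
    if [ -z "${!var}" ]; then
      echo "missing required env var: $var" >&2
      return 1
    fi
  done
}
```

Run it before the traffic switch, e.g. `require_env DATABASE_URL SECRET_KEY` under `set -e`, so a misconfigured release never goes live.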

7. Use safe database migrations

Zero downtime app deploys fail most often because of schema changes.

Use an expand/contract pattern:

  1. add new nullable columns, tables, or indexes
  2. deploy code that can handle old and new schema
  3. backfill data separately
  4. remove old columns only after old code is gone

Safe examples:

  • add nullable column
  • add new table
  • add index concurrently where supported
  • write code that reads both old and new fields temporarily

Unsafe examples during live traffic:

  • dropping columns old code still reads
  • renaming columns without compatibility layer
  • blocking table rewrites during peak traffic
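The expand step and the transitional read can be illustrated end to end. A sketch using SQLite as a stand-in for your real database; the `users` table and its column names are hypothetical:

```python
import sqlite3

# Expand/contract walk-through. SQLite stands in for the real database;
# table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('ada')")

# Expand: add the new column as nullable, so old code keeps working.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Transitional code reads the new field and falls back to the old one,
# so it handles rows written by either schema version.
row = conn.execute(
    "SELECT COALESCE(display_name, name) FROM users WHERE id = 1"
).fetchone()

# Backfill separately, after the deploy; contract (drop 'name') only
# once no running code reads the old column.
conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")
```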

Run migration status checks:

bash
alembic current
alembic history

If you need a full migration strategy, see Database Migration Strategy.

8. Switch traffic gracefully

For a symlink-based deploy:

bash
APP_DIR=/var/www/myapp
PREVIOUS=$(readlink -f "$APP_DIR/current")
NEW_RELEASE=$APP_DIR/releases/$(date +%Y%m%d%H%M%S)

mkdir -p "$NEW_RELEASE"
rsync -a . "$NEW_RELEASE"/
cd "$NEW_RELEASE"

python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
alembic upgrade head

ln -sfn "$NEW_RELEASE" "$APP_DIR/current"
sudo systemctl reload gunicorn

curl -f http://127.0.0.1:8000/health
curl -f https://yourdomain.com/health

If the health check fails:

bash
ln -sfn "$PREVIOUS" /var/www/myapp/current
sudo systemctl reload gunicorn

9. Verify post-deploy state

Check these immediately after release:

bash
systemctl status gunicorn
journalctl -u gunicorn -n 200 --no-pager
journalctl -u nginx -n 200 --no-pager
curl -I http://127.0.0.1:8000/health
curl -I https://yourdomain.com/health
readlink -f /var/www/myapp/current
tail -n 200 /var/log/nginx/error.log
tail -n 200 /var/log/nginx/access.log

Look for:

  • 502 or 503 responses
  • worker boot errors
  • missing env vars
  • import errors
  • static asset 404s
  • DB connection failures
  • background workers using old code

For production triage workflows, see Debugging Production Issues.

10. Keep rollback immediate

Rollback should be a traffic switch, not a restore operation.

Good rollback:

  • switch current symlink back
  • revert image tag
  • point Nginx upstream back to old app
  • reload services gracefully

Bad rollback:

  • rebuild app from scratch under pressure
  • restore a full backup for a bad code push
  • manually edit files in the live directory

Capture release metadata in every deploy:

bash
echo "release=$(date +%Y%m%d%H%M%S)"
echo "git_sha=$(git rev-parse --short HEAD)"
echo "deployed_at=$(date -Iseconds)"
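It helps to persist that metadata inside the release directory rather than only echoing it, so you can tell releases apart when rolling back. A sketch; the `RELEASE_INFO` filename and helper name are assumptions:

```shell
# write_release_info: record what was deployed into the release dir,
# so rollback targets are identifiable later.
write_release_info() {
  local release_dir=$1
  {
    echo "release=$(basename "$release_dir")"
    echo "git_sha=$(git rev-parse --short HEAD 2>/dev/null || echo unknown)"
    echo "deployed_at=$(date -Iseconds)"
  } > "$release_dir/RELEASE_INFO"
}
```

Called once per deploy, e.g. `write_release_info "$NEW_RELEASE"`, it gives every directory under releases/ a self-describing marker.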

Process Flow

  1. build
  2. migrate
  3. warm
  4. health check
  5. switch traffic
  6. verify
  7. rollback (if needed)


Common causes

These are the most common reasons “zero downtime” deployments still cause outages:

  • using systemctl restart instead of graceful reload
  • running breaking database migrations with incompatible app code
  • deploying directly into the live directory
  • no health endpoint, so traffic shifts too early
  • single Gunicorn worker configuration
  • non-versioned static assets
  • background workers running old code against new payloads
  • changing Nginx upstream or socket path without a clean handoff
  • not enough RAM to run old and new processes briefly
  • rollback requiring a backup restore instead of a fast release switch

Debugging tips

Use these commands during or after a failed deploy:

bash
systemctl status gunicorn
journalctl -u gunicorn -n 200 --no-pager
journalctl -u nginx -n 200 --no-pager
nginx -t
ps aux | grep gunicorn
ss -ltnp | grep 8000
curl -I http://127.0.0.1:8000/health
curl -I https://yourdomain.com/health
readlink -f /var/www/myapp/current
ls -lah /var/www/myapp/releases
tail -n 200 /var/log/nginx/error.log
tail -n 200 /var/log/nginx/access.log
alembic current
alembic history
docker ps
docker logs <container_name> --tail 200

Additional checks:

Confirm Gunicorn is actually reloading gracefully

Watch worker PIDs before and after reload:

bash
ps -ef | grep gunicorn
sudo systemctl reload gunicorn
sleep 2
ps -ef | grep gunicorn

You want to see new workers appear before old ones fully disappear.

Check for Nginx upstream failures

Search for common upstream errors:

bash
grep -i "upstream\|connect() failed\|502\|503" /var/log/nginx/error.log | tail -n 50

Check release pointer state

bash
readlink -f /var/www/myapp/current
ls -lah /var/www/myapp/releases

If current points to the wrong release, rollback may be a symlink issue, not an app issue.

Check worker and web deploy sync

If you use Celery or RQ, verify worker version and queue state. Incompatible job payloads often look like partial deploy failures.
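One lightweight guard is to tag each job payload with the version of the code that enqueued it, so a mismatched worker fails loudly instead of silently mishandling the job. A sketch with hypothetical names; APP_VERSION would come from your release metadata:

```python
# Tag job payloads with the enqueuing code version so a worker running
# a different release can detect the mismatch. Names are hypothetical.
APP_VERSION = "20260420123000"

def make_job(payload: dict) -> dict:
    """Wrap a payload with the version of the code that enqueued it."""
    return {"version": APP_VERSION, "payload": payload}

def handle_job(job: dict) -> dict:
    """Process a job only if it was enqueued by the same code version."""
    if job.get("version") != APP_VERSION:
        raise RuntimeError(
            f"job version {job.get('version')} != worker version {APP_VERSION}"
        )
    return job["payload"]
```

In practice the worker might requeue or dead-letter mismatched jobs instead of raising, but the version tag is what makes the mismatch visible at all.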


Checklist

  • Health endpoint exists and is used during deploy
  • New release builds outside the live path
  • Database migrations are backward-compatible
  • Gunicorn reload is graceful, not stop/start
  • Nginx continues serving during process replacement
  • Static files are versioned or atomically switched
  • Background workers are updated safely
  • Rollback path is tested
  • Logs and metrics are checked after release
  • Old release is retained until deploy is confirmed stable

For broader release hardening, review Deployment Checklist and SaaS Production Checklist.



FAQ

What is the minimum setup for zero downtime on a VPS?

Use Nginx in front of Gunicorn, run multiple Gunicorn workers, deploy to a new release directory, run compatible migrations, switch the current symlink, and reload Gunicorn gracefully.

Can I use zero downtime deployment with Flask or FastAPI?

Yes. The framework matters less than the process manager, reverse proxy, health checks, and migration strategy.

When should I avoid automatic migrations during deploy?

Avoid automatic migrations when they are large, blocking, or destructive. Run those in a planned step with compatibility checks and rollback planning.

How do I know if my reload is graceful?

Watch active requests during deploy, confirm old workers exit after finishing work, and verify Nginx does not show a spike in 502 or 503 responses.

What breaks zero downtime most often?

Schema incompatibility, direct in-place file deployment, and restarting the entire app stack at once are the most common causes.

Can a single VPS do zero downtime deployment?

Yes, if you use graceful reloads or briefly run old and new app processes side by side within available CPU and RAM.

Are database migrations the main risk?

Usually yes. Process replacement is manageable. Schema changes are where most deployment failures happen.

Should I use blue/green or rolling?

Blue/green is simpler on one host if resources allow two versions at once. Rolling is better with multiple instances.

Is Docker required?

No. Release directories plus systemd and Nginx are enough for many small SaaS products.

Can I guarantee zero dropped requests?

Not completely. You can reduce the risk significantly, but long-running requests, forced kills, bad health checks, and resource exhaustion can still interrupt traffic.


Final takeaway

Zero downtime deployment is mostly operational discipline:

  • build separately
  • migrate safely
  • warm the new version
  • switch traffic gracefully
  • verify health
  • keep rollback immediate

For a small SaaS product, the simplest reliable setup is usually:

  • immutable release directories
  • health checks
  • graceful Gunicorn reloads
  • backward-compatible database changes
  • a tested rollback path

If your current deploy still uses in-place file changes or restart, fix that first.