Zero Downtime Deployment
The essential playbook for implementing zero downtime deployment in your SaaS.
A zero downtime deployment replaces app processes without taking the site offline. For small SaaS products, the goal is simple: new code goes live, in-flight requests finish, health checks stay green, and rollback is fast if the release is bad.
This page focuses on practical deployment patterns for Gunicorn, Nginx, systemd, and container-based setups.
Quick Fix / Quick Setup
Use this if you deploy on a VPS with immutable release directories and a graceful Gunicorn reload:
# Example: blue/green-style release switch with symlink + Gunicorn reload
set -e
APP_DIR=/var/www/myapp
RELEASES=$APP_DIR/releases
CURRENT=$APP_DIR/current
NEW_RELEASE=$RELEASES/$(date +%Y%m%d%H%M%S)
mkdir -p "$NEW_RELEASE"
rsync -a . "$NEW_RELEASE"/
cd "$NEW_RELEASE"
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
alembic upgrade head
ln -sfn "$NEW_RELEASE" "$CURRENT"
sudo systemctl reload gunicorn
curl -f http://127.0.0.1:8000/health
# rollback if health check fails
# ln -sfn /var/www/myapp/releases/<previous_release> /var/www/myapp/current
# sudo systemctl reload gunicorn
Use reload only if your app server supports graceful worker replacement. Run destructive database migrations separately or make them backward-compatible before switching traffic.
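If your deploy tooling is Python rather than shell, the same atomic `ln -sfn` switch can be sketched with a temporary link plus `os.rename`, which on POSIX replaces the destination atomically. Paths and names here are illustrative:

```python
import os

def switch_current(app_dir, new_release):
    """Repoint app_dir/current at new_release atomically.

    Mirrors `ln -sfn`: create the symlink under a temp name, then
    rename it over `current` so readers never see a missing link.
    """
    tmp = os.path.join(app_dir, "current.tmp")
    if os.path.lexists(tmp):  # leftover from an interrupted deploy
        os.remove(tmp)
    os.symlink(new_release, tmp)
    os.rename(tmp, os.path.join(app_dir, "current"))
```

Rolling back is the same call with the previous release directory as the argument.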
What’s happening
Downtime usually happens when old app processes stop before new ones are ready.
A safe deployment keeps at least one healthy app instance serving traffic during code replacement.
For a small SaaS setup, zero downtime usually depends on these controls:
- immutable release directories
- health checks
- graceful worker replacement
- backward-compatible database migrations
- fast rollback to the previous release
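The health-check control above is essentially a polling gate: keep probing until the new processes answer, and only then treat the deploy as done. A minimal sketch, with `check` standing in for an HTTP probe of `/health`:

```python
import time

def wait_healthy(check, attempts=10, delay=0.5):
    """Poll `check` until it returns True or attempts run out.

    In a real deploy script, `check` would wrap a GET against
    /health returning status 200; here it is any callable.
    """
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False
```

If this returns False, the deploy script should roll back instead of declaring success.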
Nginx should continue routing requests while Gunicorn workers restart gracefully or while traffic shifts between old and new releases.
The database is often the real blocker. App restarts are easy compared to schema changes that lock tables, break compatibility, or invalidate queued jobs.
Step-by-step implementation
1. Pick a deployment pattern
Use one of these patterns:
- Graceful reload: best for one VPS with Gunicorn and moderate traffic
- Blue/green: best when you can run old and new versions briefly at the same time
- Rolling: best when multiple app instances sit behind a load balancer or reverse proxy
For most small SaaS apps on one server, graceful reload plus immutable releases is the simplest reliable option.
2. Add a health endpoint
Your deploy should never switch traffic before the app proves it can serve requests.
Minimal Flask or FastAPI-style endpoint:
@app.get("/health")
def health():
    return {"status": "ok"}
If database availability is critical to request handling, include a lightweight DB check. Keep it fast. Do not run expensive queries.
Example with SQLAlchemy:
from sqlalchemy import text

@app.get("/health")
def health():
    db.session.execute(text("SELECT 1"))
    return {"status": "ok"}
Use both local and public checks during deploy:
curl -f http://127.0.0.1:8000/health
curl -f https://yourdomain.com/health
3. Build outside the live path
Do not deploy directly into the active code directory.
Recommended layout:
/var/www/myapp/
├── current -> /var/www/myapp/releases/20260420123000
├── releases/
│   ├── 20260419110000
│   └── 20260420123000
└── shared/
    ├── .env
    ├── uploads/
    └── logs/
Keep shared state outside the release directory:
- secrets
- uploads
- runtime sockets
- logs
- cache volumes if needed
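In Python-based deploy tooling, wiring shared state into a fresh release is just symlinking each shared name into the new directory. A sketch, assuming the layout above (names are illustrative):

```python
import os

def link_shared(release_dir, shared_dir, names=(".env", "uploads", "logs")):
    """Symlink each shared file or directory into the new release."""
    for name in names:
        link = os.path.join(release_dir, name)
        if os.path.lexists(link):  # remove a stale symlink from a prior run
            os.remove(link)
        os.symlink(os.path.join(shared_dir, name), link)
```

Run this after building the release directory and before switching the `current` symlink.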
4. Prepare a systemd service that points to current
Example gunicorn.service:
[Unit]
Description=Gunicorn for myapp
After=network.target
[Service]
User=www-data
Group=www-data
WorkingDirectory=/var/www/myapp/current
EnvironmentFile=/var/www/myapp/shared/.env
ExecStart=/var/www/myapp/current/.venv/bin/gunicorn app:app \
--workers 3 \
--bind 127.0.0.1:8000 \
--timeout 60 \
--graceful-timeout 30 \
--access-logfile - \
--error-logfile -
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
KillSignal=SIGTERM
TimeoutStopSec=60
[Install]
WantedBy=multi-user.target
Reload systemd if you change the unit:
sudo systemctl daemon-reload
sudo systemctl restart gunicorn
For regular deploys, prefer:
sudo systemctl reload gunicorn
Do not use restart unless you accept a stop/start cycle.
5. Put Nginx in front of Gunicorn
Example Nginx server block:
server {
    listen 80;
    server_name yourdomain.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_connect_timeout 5s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    location /health {
        proxy_pass http://127.0.0.1:8000/health;
        access_log off;
    }
}
Validate config before reload:
sudo nginx -t
sudo systemctl reload nginx
If you use Unix sockets, keep the socket path stable across releases or let systemd manage socket creation consistently.
6. Run pre-deploy checks
Before touching live traffic:
set -e
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
pytest
alembic current
If your app has asset compilation:
npm ci
npm run build
If config is environment-driven, validate required variables before switch:
test -n "$DATABASE_URL"
test -n "$SECRET_KEY"
7. Use safe database migrations
Zero downtime app deploys fail most often because of schema changes.
Use an expand/contract pattern:
- add new nullable columns, tables, or indexes
- deploy code that can handle old and new schema
- backfill data separately
- remove old columns only after old code is gone
Safe examples:
- add nullable column
- add new table
- add index concurrently where supported
- write code that reads both old and new fields temporarily
Unsafe examples during live traffic:
- dropping columns old code still reads
- renaming columns without compatibility layer
- blocking table rewrites during peak traffic
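The "read both old and new fields" step can be as small as a fallback accessor in app code, shipped with the expand migration and deleted with the contract one. A sketch with hypothetical column names `full_name` (new) and `name` (old):

```python
def display_name(row):
    """Prefer the new full_name column, fall back to the legacy name.

    Works against both schemas while the backfill runs, so old and
    new app code can serve traffic during the traffic switch.
    """
    return row.get("full_name") or row.get("name") or ""
```

Once the backfill finishes and old code is gone, the fallback (and then the old column) can be removed.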
Run migration status checks:
alembic current
alembic history
If you need a full migration strategy, see Database Migration Strategy.
8. Switch traffic gracefully
For a symlink-based deploy:
APP_DIR=/var/www/myapp
PREVIOUS=$(readlink -f "$APP_DIR/current")
NEW_RELEASE=$APP_DIR/releases/$(date +%Y%m%d%H%M%S)
mkdir -p "$NEW_RELEASE"
rsync -a . "$NEW_RELEASE"/
cd "$NEW_RELEASE"
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
alembic upgrade head
ln -sfn "$NEW_RELEASE" "$APP_DIR/current"
sudo systemctl reload gunicorn
curl -f http://127.0.0.1:8000/health
curl -f https://yourdomain.com/health
If the health check fails:
ln -sfn "$PREVIOUS" /var/www/myapp/current
sudo systemctl reload gunicorn
9. Verify post-deploy state
Check these immediately after release:
systemctl status gunicorn
journalctl -u gunicorn -n 200 --no-pager
journalctl -u nginx -n 200 --no-pager
curl -I http://127.0.0.1:8000/health
curl -I https://yourdomain.com/health
readlink -f /var/www/myapp/current
tail -n 200 /var/log/nginx/error.log
tail -n 200 /var/log/nginx/access.log
Look for:
- 502 or 503 responses
- worker boot errors
- missing env vars
- import errors
- static asset 404s
- DB connection failures
- background workers using old code
For production triage workflows, see Debugging Production Issues.
10. Keep rollback immediate
Rollback should be a traffic switch, not a restore operation.
Good rollback:
- switch the current symlink back
- revert the image tag
- point Nginx upstream back to old app
- reload services gracefully
Bad rollback:
- rebuild app from scratch under pressure
- restore a full backup for a bad code push
- manually edit files in the live directory
Capture release metadata in every deploy:
echo "release=$(date +%Y%m%d%H%M%S)"
echo "git_sha=$(git rev-parse --short HEAD)"
echo "deployed_at=$(date -Iseconds)"
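If the deploy script is Python, the same metadata can be written as a small JSON file next to the release so rollback tooling can read it later. A sketch; the git sha is passed in rather than shelled out, and the filename is illustrative:

```python
import json
from datetime import datetime, timezone

def write_release_meta(path, release, git_sha):
    """Record what was deployed and when, for audit and rollback."""
    meta = {
        "release": release,
        "git_sha": git_sha,
        "deployed_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "w") as f:
        json.dump(meta, f, indent=2)
    return meta
```

A rollback script can then load the previous release's file to find exactly which directory and commit to switch back to.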
Common causes
These are the most common reasons “zero downtime” deployments still cause outages:
- using systemctl restart instead of graceful reload
- running breaking database migrations with incompatible app code
- deploying directly into the live directory
- no health endpoint, so traffic shifts too early
- single Gunicorn worker configuration
- non-versioned static assets
- background workers running old code against new payloads
- changing Nginx upstream or socket path without a clean handoff
- not enough RAM to run old and new processes briefly
- rollback requiring a backup restore instead of a fast release switch
Debugging tips
Use these commands during or after a failed deploy:
systemctl status gunicorn
journalctl -u gunicorn -n 200 --no-pager
journalctl -u nginx -n 200 --no-pager
nginx -t
ps aux | grep gunicorn
ss -ltnp | grep 8000
curl -I http://127.0.0.1:8000/health
curl -I https://yourdomain.com/health
readlink -f /var/www/myapp/current
ls -lah /var/www/myapp/releases
tail -n 200 /var/log/nginx/error.log
tail -n 200 /var/log/nginx/access.log
alembic current
alembic history
docker ps
docker logs <container_name> --tail 200
Additional checks:
Confirm Gunicorn is actually reloading gracefully
Watch worker PIDs before and after reload:
ps -ef | grep gunicorn
sudo systemctl reload gunicorn
sleep 2
ps -ef | grep gunicorn
You want to see new workers appear before old ones fully disappear.
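That before/after comparison can be automated: during a graceful reload the master PID persists while worker PIDs change, whereas a hard restart replaces every PID at once. A sketch over plain PID sets captured before and after the reload:

```python
def looks_graceful(before_pids, after_pids):
    """True if some PIDs survived (the master) while workers changed.

    A completely disjoint set suggests a stop/start restart; an
    identical set suggests the reload did nothing at all.
    """
    before, after = set(before_pids), set(after_pids)
    return bool(before & after) and before != after
```

This is a heuristic, not proof: it confirms process overlap, not that in-flight requests finished cleanly.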
Check for Nginx upstream failures
Search for common upstream errors:
grep -i "upstream\|connect() failed\|502\|503" /var/log/nginx/error.log | tail -n 50
Check release pointer state
readlink -f /var/www/myapp/current
ls -lah /var/www/myapp/releases
If current points to the wrong release, rollback may be a symlink issue, not an app issue.
Check worker and web deploy sync
If you use Celery or RQ, verify worker version and queue state. Incompatible job payloads often look like partial deploy failures.
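One cheap guard is stamping each job with the release that enqueued it, so workers can warn or requeue on a mismatch instead of crashing on an unexpected payload shape. A sketch; the `_release` field and the release id are hypothetical:

```python
CURRENT_RELEASE = "20260420123000"  # hypothetical, read from release metadata

def tag_job(payload, release=CURRENT_RELEASE):
    """Stamp outgoing jobs with the deploying release id."""
    return dict(payload, _release=release)

def same_release(payload, release=CURRENT_RELEASE):
    """Workers can requeue or log a warning when this returns False."""
    return payload.get("_release") == release
```

A mismatch does not always mean failure, but it turns a confusing partial-deploy symptom into an explicit log line.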
Checklist
- ✓ Health endpoint exists and is used during deploy
- ✓ New release builds outside the live path
- ✓ Database migrations are backward-compatible
- ✓ Gunicorn reload is graceful, not stop/start
- ✓ Nginx continues serving during process replacement
- ✓ Static files are versioned or atomically switched
- ✓ Background workers are updated safely
- ✓ Rollback path is tested
- ✓ Logs and metrics are checked after release
- ✓ Old release is retained until deploy is confirmed stable
For broader release hardening, review Deployment Checklist and SaaS Production Checklist.
Related guides
- Database Migration Strategy
- Debugging Production Issues
- Deployment Checklist
- SaaS Production Checklist
FAQ
What is the minimum setup for zero downtime on a VPS?
Use Nginx in front of Gunicorn, run multiple Gunicorn workers, deploy to a new release directory, run compatible migrations, switch the current symlink, and reload Gunicorn gracefully.
Can I use zero downtime deployment with Flask or FastAPI?
Yes. The framework matters less than the process manager, reverse proxy, health checks, and migration strategy.
When should I avoid automatic migrations during deploy?
Avoid automatic migrations when they are large, blocking, or destructive. Run those in a planned step with compatibility checks and rollback planning.
How do I know if my reload is graceful?
Watch active requests during deploy, confirm old workers exit after finishing work, and verify Nginx does not show a spike in 502 or 503 responses.
What breaks zero downtime most often?
Schema incompatibility, direct in-place file deployment, and restarting the entire app stack at once are the most common causes.
Can a single VPS do zero downtime deployment?
Yes, if you use graceful reloads or briefly run old and new app processes side by side within available CPU and RAM.
Are database migrations the main risk?
Usually yes. Process replacement is manageable. Schema changes are where most deployment failures happen.
Should I use blue/green or rolling?
Blue/green is simpler on one host if resources allow two versions at once. Rolling is better with multiple instances.
Is Docker required?
No. Release directories plus systemd and Nginx are enough for many small SaaS products.
Can I guarantee zero dropped requests?
Not completely. You can reduce the risk significantly, but long-running requests, forced kills, bad health checks, and resource exhaustion can still interrupt traffic.
Final takeaway
Zero downtime deployment is mostly operational discipline:
- build separately
- migrate safely
- warm the new version
- switch traffic gracefully
- verify health
- keep rollback immediate
For a small SaaS product, the simplest reliable setup is usually:
- immutable release directories
- health checks
- graceful Gunicorn reloads
- backward-compatible database changes
- a tested rollback path
If your current deploy still uses in-place file changes or restart, fix that first.