App Crashes on Deployment
The essential playbook for diagnosing and fixing app crashes after deployment in your SaaS.
Use this page when your SaaS deploy completes but the app process dies, restarts in a loop, fails health checks, or returns 500/502 right after release.
The goal is to isolate whether the crash is caused by:
- startup commands
- missing environment variables
- dependency or build issues
- database connectivity
- migrations
- permissions
- process manager configuration
This applies to:
- VPS deployments
- Docker hosts
- Gunicorn + Nginx setups
- systemd-managed Python apps
- basic production app servers behind a reverse proxy
Quick Fix / Quick Setup
Start with the app process, not Nginx. Reproduce the startup failure with the exact production command and environment.
# 1) Check service status and recent logs
sudo systemctl status myapp --no-pager
journalctl -u myapp -n 200 --no-pager
# 2) Test the app manually with the same environment
cd /srv/myapp
source .venv/bin/activate
export $(grep -v '^#' .env | xargs)
gunicorn app.main:app --bind 0.0.0.0:8000 --workers 2
# 3) Verify Python/package/runtime paths
which python
python --version
pip freeze | tail -n 50
# 4) Check env vars, DB, and migrations
printenv | sort | grep -E 'ENV|SECRET|DATABASE|REDIS|STRIPE'
python -c "import os; print(os.getenv('DATABASE_URL'))"
python manage.py migrate || alembic upgrade head
# 5) If behind Nginx, confirm upstream is actually listening
ss -ltnp | grep 8000
curl -I http://127.0.0.1:8000/
sudo nginx -t
Most deployment crashes come from one of five sources:
- wrong start command
- missing environment variables
- dependency mismatch
- failed migrations
- wrong bind host or port
Re-run the app manually first, then fix the first real traceback you see.
What’s happening
A deployment crash usually means the application process cannot complete startup or is being killed shortly after launch.
Typical symptoms:
- systemd service enters failed state
- Docker container exits immediately
- Gunicorn workers boot then die
- health checks fail and trigger restart loops
- Nginx returns 502 Bad Gateway or 500
Key rule:
- the useful signal is usually in the first traceback or fatal log line
- the last log line often just shows the restart symptom
If the app works locally but crashes in production, compare these assumptions:
- environment variables
- Python or runtime version
- working directory
- file permissions
- network access to DB/Redis/APIs
- database schema state
- process manager command and service user
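One way to compare these assumptions is to print the same runtime "fingerprint" in both environments and diff the output. A minimal sketch (the env-var filter is an example; adjust it to your app):

```python
# Print a runtime fingerprint to compare local vs. production.
# Run in both environments and diff the output.
import os
import platform
import sys

fingerprint = {
    "python": sys.version.split()[0],
    "executable": sys.executable,          # which interpreter is actually running
    "platform": platform.platform(),
    "cwd": os.getcwd(),                    # working-directory assumptions
    "user": os.getenv("USER", "unknown"),
    # Only the key names, never the values, so secrets stay out of logs.
    "env_keys": sorted(k for k in os.environ if any(
        s in k for s in ("DATABASE", "REDIS", "SECRET", "ENV", "PORT"))),
}
for key, value in fingerprint.items():
    print(f"{key}: {value}")
```

Any line that differs between the two runs is a candidate cause.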
Process Flow
Step-by-step implementation
1) Inspect service and web server logs
Check the app service first.
sudo systemctl status myapp --no-pager
journalctl -u myapp -n 200 --no-pager
journalctl -xe --no-pager
If using Nginx:
sudo nginx -t
sudo tail -n 200 /var/log/nginx/error.log
Look for:
- ModuleNotFoundError
- ImportError
- Permission denied
- Address already in use
- No such file or directory
- migration failures
- DB connection failures
- OOM or abrupt exits
2) Reproduce the crash manually
Run the exact production command on the server.
cd /srv/myapp
source .venv/bin/activate
export $(grep -v '^#' .env | xargs)
gunicorn app.main:app --bind 0.0.0.0:8000 --workers 2
For Uvicorn:
uvicorn app.main:app --host 0.0.0.0 --port 8000
For Django checks:
python manage.py check
python manage.py migrate
This often exposes the real traceback faster than restart-loop logs.
3) Validate the app entrypoint
Confirm the module path in your service config matches your codebase.
Examples:
gunicorn app.main:app
gunicorn -k uvicorn.workers.UvicornWorker app.main:app
gunicorn myproject.wsgi:application
uvicorn app.main:app --host 0.0.0.0 --port 8000
Test import directly:
python -c "import importlib; importlib.import_module('app.main')"
If import fails, the app cannot boot.
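The same idea can be extended to check the whole `module:callable` target string, not just the module. A hypothetical helper (uses `json:dumps` as a stdlib stand-in for your real `app.main:app`):

```python
# Validate a Gunicorn/Uvicorn target like "app.main:app" before deploying:
# the module must import AND the named WSGI/ASGI callable must exist.
import importlib

def check_entrypoint(target: str) -> str:
    module_path, _, attr = target.partition(":")
    attr = attr or "application"  # Gunicorn's default attribute name
    try:
        module = importlib.import_module(module_path)
    except ImportError as exc:
        return f"import failed: {exc}"
    if not hasattr(module, attr):
        return f"module loaded but has no attribute {attr!r}"
    return "ok"

print(check_entrypoint("json:dumps"))          # stdlib stand-in -> ok
print(check_entrypoint("no.such.module:app"))  # reports the import failure
```

Run this with your real target string on the server; a non-"ok" result is the same failure Gunicorn would hit at boot.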
4) Check systemd service configuration
Example systemd unit:
[Unit]
Description=MyApp Gunicorn
After=network.target
[Service]
User=www-data
Group=www-data
WorkingDirectory=/srv/myapp
EnvironmentFile=/srv/myapp/.env
ExecStart=/srv/myapp/.venv/bin/gunicorn app.main:app --bind 127.0.0.1:8000 --workers 2
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
Reload and restart after edits:
sudo systemctl daemon-reload
sudo systemctl restart myapp
sudo systemctl status myapp --no-pager
Verify:
- WorkingDirectory exists
- ExecStart points to the correct virtualenv binary
- EnvironmentFile path is correct
- service user can read the app directory and env file
5) Confirm environment variables are actually loaded
Do not assume .env in your shell is the same as runtime env in systemd or Docker.
Check env values:
env | sort
printenv | grep -E 'DATABASE|REDIS|SECRET|ENV|PORT'
python -c "import os; print(os.getenv('DATABASE_URL'))"
Common missing values:
- DATABASE_URL
- REDIS_URL
- SECRET_KEY
- ALLOWED_HOSTS
- SMTP credentials
- storage credentials
- payment keys
- OAuth secrets
Apps using strict settings loaders often fail immediately if one required variable is missing or malformed.
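A fail-fast loader is easy to sketch yourself if your framework does not provide one. A minimal example (variable names are illustrative):

```python
# Fail-fast env validation at startup: crash with a clear message
# before the app binds a port, instead of deep inside a request.
import os

REQUIRED = ["DATABASE_URL", "SECRET_KEY", "REDIS_URL"]  # example names

def validate_env(environ=os.environ):
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {missing}")

# An incomplete environment fails loudly and names the culprits:
try:
    validate_env({"DATABASE_URL": "postgres://localhost/db"})
except RuntimeError as exc:
    print(exc)
```

Calling this at the top of your settings module turns a cryptic mid-startup crash into one explicit log line.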
6) Compare runtime versions and dependencies
Check Python path and version:
which python
python --version
pip freeze
Typical failures:
- local uses Python 3.12, server uses 3.10
- package compiled for different runtime
- wrong virtualenv activated
- build installed partial dependencies
- service file points to old release path
If you use lockfiles, reinstall from the lockfile in production.
7) Check database connectivity and migrations
A release can fail if startup code expects schema changes that are not applied.
Test DB connectivity:
python -c "import os; print(os.getenv('DATABASE_URL'))"
nc -vz localhost 5432
pg_isready
Run migrations:
python manage.py migrate
# or
alembic upgrade head
If Redis is required during boot:
redis-cli ping
If startup depends on DB or Redis and either is unavailable, the app may exit before serving traffic.
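If `nc` is not installed, the same reachability check can be done from Python. A sketch, equivalent in spirit to `nc -vz host port` (hosts and ports are examples):

```python
# Preflight TCP reachability check for DB/Redis before starting the app.
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

for name, host, port in [("postgres", "127.0.0.1", 5432),
                         ("redis", "127.0.0.1", 6379)]:
    status = "reachable" if can_connect(host, port) else "UNREACHABLE"
    print(f"{name} {host}:{port} -> {status}")
```

Note this only proves the port accepts connections, not that credentials or the schema are correct.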
8) Verify filesystem paths and permissions
Common failures:
- log directory not writable
- socket directory owned by root
- SQLite file inaccessible
- temp directory permissions invalid
- upload or media directory missing
- static directory path incorrect
Check writable paths as the service user:
sudo -u www-data test -w /srv/myapp && echo writable || echo not-writable
df -h
free -m
If using Unix sockets, confirm the directory exists and permissions match both app and Nginx users.
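The writability checks can also be done from Python, which catches ACL and read-only-mount cases that `test -w` can miss. Paths are examples; run it as the service user (e.g. via `sudo -u www-data python3 ...`):

```python
# Verify the write assumptions the service user needs, by actually writing.
import os
import tempfile

def is_writable(path: str) -> bool:
    # os.access can be fooled by ACLs and read-only mounts,
    # so attempt a real (temporary) write instead.
    if not os.path.isdir(path):
        return False
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        return False

for path in ["/srv/myapp", "/srv/myapp/logs", tempfile.gettempdir()]:
    print(path, "writable" if is_writable(path) else "not-writable")
```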
9) Check bind host, port, and upstream
If the app is healthy but Nginx cannot reach it, check listening ports:
ss -ltnp
curl -I http://127.0.0.1:8000/
In containers, bind to 0.0.0.0, not 127.0.0.1.
Bad:
uvicorn app.main:app --host 127.0.0.1 --port 8000
Good:
uvicorn app.main:app --host 0.0.0.0 --port 8000
If using Nginx, confirm upstream matches the app:
location / {
proxy_pass http://127.0.0.1:8000;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
Validate config:
sudo nginx -t
If you are seeing proxy symptoms, also check 502 Bad Gateway Fix Guide.
10) Check Docker container startup
Inspect container state:
docker ps -a
docker logs --tail 200 <container_name>
docker inspect <container_name>
Look for:
- wrong CMD or entrypoint
- failed shell script in startup
- health check failures
- container exit code
- missing env file
- image tag drift
Example Compose anti-pattern:
command: gunicorn wrong.module:app
Correct example:
command: gunicorn app.main:app --bind 0.0.0.0:8000 --workers 2
If you need a full production reference, see Docker Production Setup for SaaS.
11) Check framework-specific production settings
Flask
Confirm the Gunicorn target is correct and production does not depend on local-only FLASK_ENV assumptions.
FastAPI
Use the correct worker class when using Gunicorn:
gunicorn -k uvicorn.workers.UvicornWorker app.main:app --bind 0.0.0.0:8000
Django
Check:
- ALLOWED_HOSTS
- SECRET_KEY
- STATIC_ROOT
- DEBUG=False behavior
- WSGI or ASGI module path
- migrations
Example:
python manage.py check --deploy
python manage.py migrate
12) Check memory, disk, and resource kills
Some crashes are not Python errors. The process may be killed by the host.
free -m
df -h
top
journalctl -xe --no-pager
docker inspect <container_name>
Signs:
- no traceback
- process exits abruptly
- container state shows OOM kill
- kernel logs mention memory pressure
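A quick way to read these signs is the process or container exit code: values above 128 conventionally mean "killed by signal (code − 128)", so 137 is SIGKILL, the classic OOM-kill signature. A small decoder sketch:

```python
# Decode an exit status from `docker inspect` or `systemctl status`.
# Codes > 128 usually mean the process was killed by signal (code - 128).
import signal

def describe_exit(code: int) -> str:
    if code > 128:
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            return f"exit {code} (unknown signal)"
        return f"killed by {name} (signal {code - 128})"
    return f"exited with status {code}"

print(describe_exit(137))  # container OOM kills typically show this code
print(describe_exit(139))  # segfault
print(describe_exit(1))    # ordinary application error
```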
13) Test the full chain after the fix
Once the app starts, verify each layer in order:
curl -I http://127.0.0.1:8000/
sudo nginx -t
curl -I https://yourdomain.com/
Then test the health endpoint if you have one:
curl -I https://yourdomain.com/health
For a full deployment baseline, review Deploy SaaS with Nginx + Gunicorn and Environment Setup on VPS.
Common causes
Most deployment crashes come from one of these:
- Incorrect app start command or wrong module path
- Missing required environment variables or malformed config values
- Dependency mismatch between local and production
- Wrong Python version or missing virtualenv activation
- Database unavailable or DATABASE_URL incorrect
- Migrations not applied before restart
- Redis, broker, or cache connection failure during startup
- ALLOWED_HOSTS, SECRET_KEY, or framework-specific production settings missing
- Permission denied for log, socket, temp, media, or SQLite files
- App binding to 127.0.0.1 or wrong port inside Docker/platform runtime
- Nginx upstream points to a missing socket or port
- Container entrypoint or CMD misconfigured
- Health check endpoint failing and causing restart loops
- Out-of-memory kill or resource limits terminating the process
- Startup code calling external APIs or services that are unavailable
Common deployment patterns that trigger crashes:
- service file still points to an old module or old virtualenv path
- build succeeded with cached dependencies, but runtime uses a different interpreter
- deploy runs migrations after restart instead of before traffic switch
- app writes to local disk in a read-only container
- health check path depends on DB or auth and marks app unhealthy
- env vars exist in CI but not on the actual server
- Nginx points to a stale socket path
Debugging tips
Use these commands during isolation:
sudo systemctl status myapp --no-pager
journalctl -u myapp -n 200 --no-pager
journalctl -xe --no-pager
ps aux | grep -E 'gunicorn|uvicorn|python|celery'
ss -ltnp
curl -I http://127.0.0.1:8000/
sudo nginx -t
sudo tail -n 200 /var/log/nginx/error.log
docker ps -a
docker logs --tail 200 <container_name>
docker inspect <container_name>
python --version
which python
pip freeze
env | sort
printenv | grep -E 'DATABASE|REDIS|SECRET|ENV|PORT'
python -c "import os; print(os.getenv('DATABASE_URL'))"
python -c "import importlib; importlib.import_module('app.main')"
gunicorn app.main:app --bind 0.0.0.0:8000
uvicorn app.main:app --host 0.0.0.0 --port 8000
python manage.py check
python manage.py migrate
alembic upgrade head
nc -vz localhost 5432
pg_isready
redis-cli ping
free -m
df -h
top
sudo -u www-data test -w /srv/myapp && echo writable || echo not-writable
Practical rules:
- Fix the first crash in logs, not the final restart message.
- Run the exact production command manually.
- Reduce startup complexity by disabling optional integrations temporarily.
- Keep /health lightweight and unauthenticated.
- Use one source of truth for config loading.
- Pin runtime and dependency versions.
- If the issue is broad and spans multiple services, use Debugging Production Issues.
- If the app boots but exceptions continue at runtime, add Error Tracking with Sentry.
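The "keep /health lightweight" rule can be sketched as a stdlib-only WSGI endpoint that confirms the process is up without touching the database or auth, so a dependency hiccup does not trigger a restart loop (module and route names are examples):

```python
# A deliberately lightweight /health endpoint: process-liveness only,
# no DB query, no auth, so it cannot fail for dependency reasons.
def app(environ, start_response):
    if environ.get("PATH_INFO") == "/health":
        body = b"ok"
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]

# Serve with e.g.: gunicorn healthapp:app --bind 127.0.0.1:8000
```

If you need deeper readiness checks (DB, cache), expose them on a separate endpoint so liveness and readiness can fail independently.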
[Diagram: checklist showing app startup dependencies and where each can fail.]
Checklist
- ✓ Service logs inspected and first traceback identified
- ✓ App start command manually tested on server
- ✓ Correct Python or runtime version confirmed
- ✓ Virtualenv or container image verified
- ✓ Required environment variables present in runtime
- ✓ Database reachable and migrations applied
- ✓ Redis or cache reachable if required
- ✓ Module path and Gunicorn/Uvicorn/WSGI config validated
- ✓ Nginx upstream port or socket matches app config
- ✓ Static, media, log, and temp directories writable
- ✓ Health check endpoint returns 200
- ✓ Restart tested and public URL verified
- ✓ Production deploy steps reviewed against SaaS Production Checklist
Product CTA
If you want fewer failed releases, use a deployment workflow that enforces:
- preflight config validation
- environment checks before restart
- explicit migration steps
- health checks before traffic switch
- log aggregation for startup failures
- predictable rollback paths
This is especially useful for solo builders shipping frequent MVP updates. A small deployment toolkit or internal release script that validates env, DB access, service config, and upstream health before restart will remove most of the guesswork from production deploys.
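Such a release script does not need to be elaborate. A tiny preflight sketch in the spirit of the workflow above (check names, env vars, and ports are all illustrative, not a real tool):

```python
# Minimal pre-restart preflight: each check returns an error string or None;
# the deploy script aborts before restarting if any check fails.
import os
import socket

def check_env():
    missing = [k for k in ("DATABASE_URL", "SECRET_KEY") if not os.getenv(k)]
    return f"missing env: {missing}" if missing else None

def check_db(host="127.0.0.1", port=5432):
    try:
        with socket.create_connection((host, port), timeout=2):
            return None
    except OSError as exc:
        return f"db unreachable: {exc}"

def preflight(checks):
    errors = [msg for check in checks if (msg := check()) is not None]
    for msg in errors:
        print("FAIL:", msg)
    return len(errors) == 0

ok = preflight([check_env, check_db])
print("preflight:", "ok" if ok else "failed - aborting restart")
```

Wire this into your deploy step so a failed check stops the restart and leaves the old release serving traffic.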
Related guides
- Deploy SaaS with Nginx + Gunicorn
- Docker Production Setup for SaaS
- Environment Setup on VPS
- 502 Bad Gateway Fix Guide
- Database Connection Errors
FAQ
What is the fastest way to diagnose an app crash after deployment?
Run the exact production start command manually on the server with the same environment variables. That usually exposes the real traceback faster than reading restart-loop logs alone.
Why do I see 502 from Nginx when the real problem is the app?
A 502 usually means Nginx cannot reach the upstream app process. If the app crashes before binding its socket or port, Nginx only shows the proxy symptom.
Can environment variables cause immediate startup failure?
Yes. Many apps validate configuration at import time or startup. A missing DATABASE_URL, SECRET_KEY, SMTP credential, or storage config can terminate the process before it serves requests.
Should migrations run before or after restarting the app?
Usually before switching traffic to the new release. Restarting first can cause code to hit schema changes that do not exist yet.
How do I tell if the process is being killed by memory limits?
Check system logs, container inspect output, and memory metrics. OOM kills often appear as abrupt exits without a normal Python traceback.
Should I debug Nginx first?
No. First prove the app process can start and listen locally on the server. Then debug the proxy layer.
What if logs are empty?
The process may be exiting before stdout is captured. Run the command manually, check the systemd unit configuration, or inspect Docker logs and entrypoint scripts.
Why does it restart in a loop?
Health checks or the process manager detect a failed startup and automatically retry. The loop is a symptom, not the cause.
Final takeaway
Deployment crashes are usually straightforward once you reproduce them with the exact production command and environment.
Use this order:
- start at the app process
- capture the first traceback
- verify config and dependency paths
- confirm DB, Redis, and filesystem assumptions
- test the proxy only after the app is listening
Once fixed, add:
- env validation
- explicit migration steps
- lightweight health checks
- centralized error capture
- predictable restart and rollback workflow
For broader production hardening, review SaaS Production Checklist.