App Crashes on Deployment

The essential playbook for diagnosing and fixing app crashes after deployment in your SaaS.

Use this page when your SaaS deploy completes but the app process dies, restarts in a loop, fails health checks, or returns 500/502 right after release.

The goal is to isolate whether the crash is caused by:

  • startup commands
  • missing environment variables
  • dependency or build issues
  • database connectivity
  • migrations
  • permissions
  • process manager configuration

This applies to:

  • VPS deployments
  • Docker hosts
  • Gunicorn + Nginx setups
  • systemd-managed Python apps
  • basic production app servers behind a reverse proxy

Quick Fix / Quick Setup

Start with the app process, not Nginx. Reproduce the startup failure with the exact production command and environment.

bash
# 1) Check service status and recent logs
sudo systemctl status myapp --no-pager
journalctl -u myapp -n 200 --no-pager

# 2) Test the app manually with the same environment
cd /srv/myapp
source .venv/bin/activate
export $(grep -v '^#' .env | xargs)
gunicorn app.main:app --bind 0.0.0.0:8000 --workers 2

# 3) Verify Python/package/runtime paths
which python
python --version
pip freeze | tail -n 50

# 4) Check env vars, DB, and migrations
printenv | sort | grep -E 'ENV|SECRET|DATABASE|REDIS|STRIPE'
python -c "import os; print(os.getenv('DATABASE_URL'))"
python manage.py migrate   # Django
alembic upgrade head       # Alembic (use whichever applies to your stack)

# 5) If behind Nginx, confirm upstream is actually listening
ss -ltnp | grep 8000
curl -I http://127.0.0.1:8000/
sudo nginx -t

Most deployment crashes come from one of five sources:

  • wrong start command
  • missing environment variables
  • dependency mismatch
  • failed migrations
  • wrong bind host or port

Re-run the app manually first, then fix the first real traceback you see.


What’s happening

A deployment crash usually means the application process cannot complete startup or is being killed shortly after launch.

Typical symptoms:

  • systemd service enters failed state
  • Docker container exits immediately
  • Gunicorn workers boot then die
  • health checks fail and trigger restart loops
  • Nginx returns 502 Bad Gateway or 500

Key rule:

  • the useful signal is usually in the first traceback or fatal log line
  • the last log line often just shows the restart symptom
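
Rather than scrolling through restart noise, you can jump straight to that first traceback by filtering the captured log. A small sketch, assuming a Python app and a log piped in from `journalctl -u myapp --no-pager`:

```shell
#!/bin/sh
# Print the first Python traceback found on stdin, then stop reading.
# Usage: journalctl -u myapp --no-pager | first_traceback
first_traceback() {
  awk '/Traceback \(most recent call last\)/ {found=1}
       found {print}
       found && /^[A-Za-z_.]+(Error|Exception)/ {exit}'
}
```

This prints from the first `Traceback` header through the exception line and ignores the later restart messages, which are usually just symptoms.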

If the app works locally but crashes in production, compare these assumptions:

  • environment variables
  • Python or runtime version
  • working directory
  • file permissions
  • network access to DB/Redis/APIs
  • database schema state
  • process manager command and service user

Process Flow

reverse proxy → app server → app import → env config → DB/cache → app ready

A failure at any stage past the proxy surfaces at the reverse proxy as a 500 or 502.

Step-by-step implementation

1) Inspect service and web server logs

Check the app service first.

bash
sudo systemctl status myapp --no-pager
journalctl -u myapp -n 200 --no-pager
journalctl -xe --no-pager

If using Nginx:

bash
sudo nginx -t
sudo tail -n 200 /var/log/nginx/error.log

Look for:

  • ModuleNotFoundError
  • ImportError
  • Permission denied
  • Address already in use
  • No such file or directory
  • migration failures
  • DB connection failures
  • OOM or abrupt exits

2) Reproduce the crash manually

Run the exact production command on the server.

bash
cd /srv/myapp
source .venv/bin/activate
export $(grep -v '^#' .env | xargs)
gunicorn app.main:app --bind 0.0.0.0:8000 --workers 2

For Uvicorn:

bash
uvicorn app.main:app --host 0.0.0.0 --port 8000

For Django checks:

bash
python manage.py check
python manage.py migrate

This often exposes the real traceback faster than restart-loop logs.

3) Validate the app entrypoint

Confirm the module path in your service config matches your codebase.

Examples:

bash
gunicorn app.main:app
gunicorn -k uvicorn.workers.UvicornWorker app.main:app
gunicorn myproject.wsgi:application
uvicorn app.main:app --host 0.0.0.0 --port 8000

Test import directly:

bash
python -c "import importlib; importlib.import_module('app.main')"

If import fails, the app cannot boot.

4) Check systemd service configuration

Example systemd unit:

ini
[Unit]
Description=MyApp Gunicorn
After=network.target

[Service]
User=www-data
Group=www-data
WorkingDirectory=/srv/myapp
EnvironmentFile=/srv/myapp/.env
ExecStart=/srv/myapp/.venv/bin/gunicorn app.main:app --bind 127.0.0.1:8000 --workers 2
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Reload and restart after edits:

bash
sudo systemctl daemon-reload
sudo systemctl restart myapp
sudo systemctl status myapp --no-pager

Verify:

  • WorkingDirectory exists
  • ExecStart points to the correct virtualenv binary
  • EnvironmentFile path is correct
  • service user can read the app directory and env file
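
If one of those checks fails, a drop-in override created with `sudo systemctl edit myapp` lets you adjust the unit without editing the base file. A minimal sketch (the variable shown is only an example):

```ini
# /etc/systemd/system/myapp.service.d/override.conf
# Only the keys set here override the base unit.
[Service]
Environment=PYTHONUNBUFFERED=1
```

Restart the service afterwards so the override takes effect.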

5) Confirm environment variables are actually loaded

Do not assume .env in your shell is the same as runtime env in systemd or Docker.

Check env values:

bash
env | sort
printenv | grep -E 'DATABASE|REDIS|SECRET|ENV|PORT'
python -c "import os; print(os.getenv('DATABASE_URL'))"

Common missing values:

  • DATABASE_URL
  • REDIS_URL
  • SECRET_KEY
  • ALLOWED_HOSTS
  • SMTP credentials
  • storage credentials
  • payment keys
  • OAuth secrets

Apps using strict settings loaders often fail immediately if one required variable is missing or malformed.
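
You can replicate that fail-fast behavior in a deploy preflight step so the missing variable shows up before the restart rather than as a crash loop. A sketch; the variable names in the example are placeholders, not a canonical set:

```shell
#!/bin/sh
# Fail fast if any required environment variable is unset or empty.
# Note: uses eval on the names passed in, so only pass trusted literals.
require_env() {
  missing=""
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "missing required env vars:$missing" >&2
    return 1
  fi
  echo "env ok"
}

# Example: require_env DATABASE_URL SECRET_KEY REDIS_URL
```

Run it in the deploy script before restarting the service; a nonzero exit aborts the release with a readable message instead of a restart loop.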

6) Compare runtime versions and dependencies

Check Python path and version:

bash
which python
python --version
pip freeze

Typical failures:

  • local uses Python 3.12, server uses 3.10
  • package compiled for different runtime
  • wrong virtualenv activated
  • build installed partial dependencies
  • service file points to old release path

If you use lockfiles, reinstall from the lockfile in production.
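
Which reinstall command applies depends on your tooling. A small helper that inspects the release directory can make the deploy script explicit about it (filenames and commands here are assumptions; adjust to your stack):

```shell
#!/bin/sh
# Print the dependency reinstall command for whichever lockfile is present.
pick_install_cmd() {
  dir="${1:-.}"
  if [ -f "$dir/poetry.lock" ]; then
    echo "poetry install --sync"
  elif [ -f "$dir/requirements.lock" ]; then
    echo "pip install --no-deps -r requirements.lock"
  elif [ -f "$dir/requirements.txt" ]; then
    echo "pip install -r requirements.txt"
  else
    echo "no lockfile found in $dir" >&2
    return 1
  fi
}
```

Failing loudly when no lockfile is found is the point: a deploy that silently installs nothing is how partial-dependency crashes happen.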

7) Check database connectivity and migrations

A release can fail if startup code expects schema changes that are not applied.

Test DB connectivity:

bash
python -c "import os; print(os.getenv('DATABASE_URL'))"
nc -vz localhost 5432
pg_isready

Run migrations:

bash
python manage.py migrate
# or
alembic upgrade head

If Redis is required during boot:

bash
redis-cli ping

If startup depends on DB or Redis and either is unavailable, the app may exit before serving traffic.
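
If the dependency is merely slow to come up (common right after a host reboot), a bounded retry before starting the app avoids a crash loop. A sketch; the probe commands in the comments are examples:

```shell
#!/bin/sh
# Retry a probe command until it succeeds or attempts run out.
# Usage: wait_for <attempts> <delay_seconds> <command...>
wait_for() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "dependency never became ready: $*" >&2
  return 1
}

# Examples: wait_for 30 2 pg_isready -q
#           wait_for 30 2 redis-cli ping
```

Call it from the deploy script or an `ExecStartPre=` hook so the app only starts once its hard dependencies answer.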

8) Verify filesystem paths and permissions

Common failures:

  • log directory not writable
  • socket directory owned by root
  • SQLite file inaccessible
  • temp directory permissions invalid
  • upload or media directory missing
  • static directory path incorrect

Check writable paths as the service user:

bash
sudo -u www-data test -w /srv/myapp && echo writable || echo not-writable
df -h
free -m

If using Unix sockets, confirm the directory exists and permissions match both app and Nginx users.
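
For the socket case, the Nginx side looks roughly like this, with Gunicorn bound via `--bind unix:/run/myapp/myapp.sock` (all paths here are examples):

```nginx
# Socket upstream: the path must match the app's bind and be readable by Nginx.
location / {
    proxy_pass http://unix:/run/myapp/myapp.sock;
    proxy_set_header Host $host;
}
```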

9) Check bind host, port, and upstream

If the app is healthy but Nginx cannot reach it, check listening ports:

bash
ss -ltnp
curl -I http://127.0.0.1:8000/

In containers, bind to 0.0.0.0, not 127.0.0.1.

Bad:

bash
uvicorn app.main:app --host 127.0.0.1 --port 8000

Good:

bash
uvicorn app.main:app --host 0.0.0.0 --port 8000

If using Nginx, confirm upstream matches the app:

nginx
location / {
    proxy_pass http://127.0.0.1:8000;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}

Validate config:

bash
sudo nginx -t

If you are seeing proxy symptoms, also see the 502 Bad Gateway Fix Guide.

10) Check Docker container startup

Inspect container state:

bash
docker ps -a
docker logs --tail 200 <container_name>
docker inspect <container_name>

Look for:

  • wrong CMD or entrypoint
  • failed shell script in startup
  • health check failures
  • container exit code
  • missing env file
  • image tag drift
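
The exit code itself carries information: codes above 128 mean the process died from a signal (code minus 128), so 137 is SIGKILL (often the OOM killer or a `docker stop` timeout) and 139 is SIGSEGV. A tiny decoder sketch:

```shell
#!/bin/sh
# Translate a container exit code into the signal that caused it, if any.
explain_exit() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "normal exit with status $code"
  fi
}

# explain_exit 137  -> "killed by signal 9"  (9 = SIGKILL, often the OOM killer)
# explain_exit 139  -> "killed by signal 11" (11 = SIGSEGV)
```

Feed it the `ExitCode` from `docker inspect` to decide whether you are chasing a Python error or a resource kill.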

Example Compose anti-pattern:

yaml
command: gunicorn wrong.module:app

Correct example:

yaml
command: gunicorn app.main:app --bind 0.0.0.0:8000 --workers 2

If you need a full production reference, see Docker Production Setup for SaaS.

11) Check framework-specific production settings

Flask

Confirm the Gunicorn target is correct and production does not depend on local-only FLASK_ENV assumptions.

FastAPI

Use the correct worker class when using Gunicorn:

bash
gunicorn -k uvicorn.workers.UvicornWorker app.main:app --bind 0.0.0.0:8000

Django

Check:

  • ALLOWED_HOSTS
  • SECRET_KEY
  • STATIC_ROOT
  • DEBUG=False behavior
  • WSGI or ASGI module path
  • migrations

Example:

bash
python manage.py check --deploy
python manage.py migrate

12) Check memory, disk, and resource kills

Some crashes are not Python errors. The process may be killed by the host.

bash
free -m
df -h
top
journalctl -xe --no-pager
docker inspect <container_name>

Signs:

  • no traceback
  • process exits abruptly
  • container state shows OOM kill
  • kernel logs mention memory pressure

13) Test the full chain after the fix

Once the app starts, verify each layer in order:

bash
curl -I http://127.0.0.1:8000/
sudo nginx -t
curl -I https://yourdomain.com/

Then test the health endpoint if you have one:

bash
curl -I https://yourdomain.com/health

For a full deployment baseline, review Deploy SaaS with Nginx + Gunicorn and Environment Setup on VPS.


Common causes

Most deployment crashes come from one of these:

  • Incorrect app start command or wrong module path
  • Missing required environment variables or malformed config values
  • Dependency mismatch between local and production
  • Wrong Python version or missing virtualenv activation
  • Database unavailable or DATABASE_URL incorrect
  • Migrations not applied before restart
  • Redis, broker, or cache connection failure during startup
  • ALLOWED_HOSTS, SECRET_KEY, or framework-specific production settings missing
  • Permission denied for log, socket, temp, media, or SQLite files
  • App binding to 127.0.0.1 or wrong port inside Docker/platform runtime
  • Nginx upstream points to a missing socket or port
  • Container entrypoint or CMD misconfigured
  • Health check endpoint failing and causing restart loops
  • Out-of-memory kill or resource limits terminating the process
  • Startup code calling external APIs or services that are unavailable

Common deployment patterns that trigger crashes:

  • service file still points to an old module or old virtualenv path
  • build succeeded with cached dependencies, but runtime uses a different interpreter
  • deploy runs migrations after restart instead of before traffic switch
  • app writes to local disk in a read-only container
  • health check path depends on DB or auth and marks app unhealthy
  • env vars exist in CI but not on the actual server
  • Nginx points to a stale socket path

Debugging tips

Use these commands during isolation:

bash
sudo systemctl status myapp --no-pager
journalctl -u myapp -n 200 --no-pager
journalctl -xe --no-pager
ps aux | grep -E 'gunicorn|uvicorn|python|celery'
ss -ltnp
curl -I http://127.0.0.1:8000/
sudo nginx -t
sudo tail -n 200 /var/log/nginx/error.log
docker ps -a
docker logs --tail 200 <container_name>
docker inspect <container_name>
python --version
which python
pip freeze
env | sort
printenv | grep -E 'DATABASE|REDIS|SECRET|ENV|PORT'
python -c "import os; print(os.getenv('DATABASE_URL'))"
python -c "import importlib; importlib.import_module('app.main')"
gunicorn app.main:app --bind 0.0.0.0:8000
uvicorn app.main:app --host 0.0.0.0 --port 8000
python manage.py check
python manage.py migrate
alembic upgrade head
nc -vz localhost 5432
pg_isready
redis-cli ping
free -m
df -h
top
sudo -u www-data test -w /srv/myapp && echo writable || echo not-writable

Practical rules:

  • Fix the first crash in logs, not the final restart message.
  • Run the exact production command manually.
  • Reduce startup complexity by disabling optional integrations temporarily.
  • Keep /health lightweight and unauthenticated.
  • Use one source of truth for config loading.
  • Pin runtime and dependency versions.
  • If the issue is broad and spans multiple services, use Debugging Production Issues.
  • If the app boots but exceptions continue at runtime, add Error Tracking with Sentry.

(Diagram: checklist of app startup dependencies and where each can fail.)


Checklist

  • Service logs inspected and first traceback identified
  • App start command manually tested on server
  • Correct Python or runtime version confirmed
  • Virtualenv or container image verified
  • Required environment variables present in runtime
  • Database reachable and migrations applied
  • Redis or cache reachable if required
  • Module path and Gunicorn/Uvicorn/WSGI config validated
  • Nginx upstream port or socket matches app config
  • Static, media, log, and temp directories writable
  • Health check endpoint returns 200
  • Restart tested and public URL verified
  • Production deploy steps reviewed against SaaS Production Checklist

Product CTA

If you want fewer failed releases, use a deployment workflow that enforces:

  • preflight config validation
  • environment checks before restart
  • explicit migration steps
  • health checks before traffic switch
  • log aggregation for startup failures
  • predictable rollback paths

This is especially useful for solo builders shipping frequent MVP updates. A small deployment toolkit or internal release script that validates env, DB access, service config, and upstream health before restart will remove most of the guesswork from production deploys.
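
Those gates can be sketched as one preflight runner that blocks the restart when any check fails (the commands in the example wiring are placeholders for your real checks):

```shell
#!/bin/sh
# Run each argument as a check command; abort at the first failure.
preflight() {
  for check in "$@"; do
    if ! sh -c "$check" >/dev/null 2>&1; then
      echo "preflight failed: $check" >&2
      return 1
    fi
  done
  echo "preflight ok"
}

# Example wiring (placeholders):
# preflight 'test -n "$DATABASE_URL"' 'pg_isready -q' \
#           'curl -fsS http://127.0.0.1:8000/health'
```

A release script that calls `preflight ... && systemctl restart myapp` turns most of the failure modes on this page into a readable abort message instead of downtime.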



FAQ

What is the fastest way to diagnose an app crash after deployment?

Run the exact production start command manually on the server with the same environment variables. That usually exposes the real traceback faster than reading restart-loop logs alone.

Why do I see 502 from Nginx when the real problem is the app?

A 502 usually means Nginx cannot reach the upstream app process. If the app crashes before binding its socket or port, Nginx only shows the proxy symptom.

Can environment variables cause immediate startup failure?

Yes. Many apps validate configuration at import time or startup. A missing DATABASE_URL, SECRET_KEY, SMTP credential, or storage config can terminate the process before it serves requests.

Should migrations run before or after restarting the app?

Usually before switching traffic to the new release. Restarting first can cause code to hit schema changes that do not exist yet.

How do I tell if the process is being killed by memory limits?

Check system logs, container inspect output, and memory metrics. OOM kills often appear as abrupt exits without a normal Python traceback.

Should I debug Nginx first?

No. First prove the app process can start and listen locally on the server. Then debug the proxy layer.

What if logs are empty?

The process may be exiting before stdout is captured. Run the command manually, check the systemd unit configuration, or inspect Docker logs and entrypoint scripts.

Why does it restart in a loop?

Health checks or the process manager detect a failed startup and automatically retry. The loop is a symptom, not the cause.


Final takeaway

Deployment crashes are usually straightforward once you reproduce them with the exact production command and environment.

Use this order:

  1. start at the app process
  2. capture the first traceback
  3. verify config and dependency paths
  4. confirm DB, Redis, and filesystem assumptions
  5. test the proxy only after the app is listening

Once fixed, add:

  • env validation
  • explicit migration steps
  • lightweight health checks
  • centralized error capture
  • predictable restart and rollback workflow

For broader production hardening, review SaaS Production Checklist.