App Crashes on Deployment
The essential playbook for diagnosing and fixing app crashes after deployment in your SaaS.
Use this page when your SaaS deploy completes but the app process dies, restarts in a loop, fails health checks, or returns 500/502 right after release.
The goal is to isolate whether the crash is caused by:
- startup commands
- missing environment variables
- dependency or build issues
- database connectivity
- migrations
- permissions
- process manager configuration
This applies to:
- VPS deployments
- Docker hosts
- Gunicorn + Nginx setups
- systemd-managed Python apps
- basic production app servers behind a reverse proxy
Quick Fix / Quick Setup
Start with the app process, not Nginx. Reproduce the startup failure with the exact production command and environment.
# 1) Check service status and recent logs
sudo systemctl status myapp --no-pager
journalctl -u myapp -n 200 --no-pager
# 2) Test the app manually with the same environment
cd /srv/myapp
source .venv/bin/activate
export $(grep -v '^#' .env | xargs)
gunicorn app.main:app --bind 0.0.0.0:8000 --workers 2
# 3) Verify Python/package/runtime paths
which python
python --version
pip freeze | tail -n 50
# 4) Check env vars, DB, and migrations
printenv | sort | grep -E 'ENV|SECRET|DATABASE|REDIS|STRIPE'
python -c "import os; print(os.getenv('DATABASE_URL'))"
python manage.py migrate || alembic upgrade head
# 5) If behind Nginx, confirm upstream is actually listening
ss -ltnp | grep 8000
curl -I http://127.0.0.1:8000/
sudo nginx -t
Most deployment crashes come from one of five sources:
- wrong start command
- missing environment variables
- dependency mismatch
- failed migrations
- wrong bind host or port
Re-run the app manually first, then fix the first real traceback you see.
What’s happening
A deployment crash usually means the application process cannot complete startup or is being killed shortly after launch.
Typical symptoms:
- systemd service enters failed state
- Docker container exits immediately
- Gunicorn workers boot then die
- health checks fail and trigger restart loops
- Nginx returns 502 Bad Gateway or 500
Key rule:
- the useful signal is usually in the first traceback or fatal log line
- the last log line often just shows the restart symptom
If the app works locally but crashes in production, compare these assumptions:
- environment variables
- Python or runtime version
- working directory
- file permissions
- network access to DB/Redis/APIs
- database schema state
- process manager command and service user
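One way to compare these assumptions is to print the same runtime "fingerprint" in both environments and diff the output. A minimal sketch (the env-var filter is an example; adjust it to your app):

```python
# Print a runtime fingerprint to compare local vs. production.
# Run in both environments and diff the output.
import os
import platform
import sys

fingerprint = {
    "python": sys.version.split()[0],
    "executable": sys.executable,          # which interpreter is actually running
    "platform": platform.platform(),
    "cwd": os.getcwd(),                    # working-directory assumptions
    "user": os.getenv("USER", "unknown"),
    # Only the key names, never the values, so secrets stay out of logs.
    "env_keys": sorted(k for k in os.environ if any(
        s in k for s in ("DATABASE", "REDIS", "SECRET", "ENV", "PORT"))),
}
for key, value in fingerprint.items():
    print(f"{key}: {value}")
```

Any line that differs between the two runs is a candidate cause.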
Process Flow
Step-by-step implementation
1) Inspect service and web server logs
Check the app service first.
sudo systemctl status myapp --no-pager
journalctl -u myapp -n 200 --no-pager
journalctl -xe --no-pager
If using Nginx:
sudo nginx -t
sudo tail -n 200 /var/log/nginx/error.log
Look for:
- ModuleNotFoundError
- ImportError
- Permission denied
- Address already in use
- No such file or directory
- migration failures
- DB connection failures
- OOM or abrupt exits
2) Reproduce the crash manually
Run the exact production command on the server.
cd /srv/myapp
source .venv/bin/activate
export $(grep -v '^#' .env | xargs)
gunicorn app.main:app --bind 0.0.0.0:8000 --workers 2
For Uvicorn:
uvicorn app.main:app --host 0.0.0.0 --port 8000
For Django checks:
python manage.py check
python manage.py migrate
This often exposes the real traceback faster than restart-loop logs.
3) Validate the app entrypoint
Confirm the module path in your service config matches your codebase.
Examples:
gunicorn app.main:app
gunicorn -k uvicorn.workers.UvicornWorker app.main:app
gunicorn myproject.wsgi:application
uvicorn app.main:app --host 0.0.0.0 --port 8000
Test import directly:
python -c "import importlib; importlib.import_module('app.main')"
If import fails, the app cannot boot.
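The same idea can be extended to check the whole `module:callable` target string, not just the module. A hypothetical helper (uses `json:dumps` as a stdlib stand-in for your real `app.main:app`):

```python
# Validate a Gunicorn/Uvicorn target like "app.main:app" before deploying:
# the module must import AND the named WSGI/ASGI callable must exist.
import importlib

def check_entrypoint(target: str) -> str:
    module_path, _, attr = target.partition(":")
    attr = attr or "application"  # Gunicorn's default attribute name
    try:
        module = importlib.import_module(module_path)
    except ImportError as exc:
        return f"import failed: {exc}"
    if not hasattr(module, attr):
        return f"module loaded but has no attribute {attr!r}"
    return "ok"

print(check_entrypoint("json:dumps"))          # stdlib stand-in -> ok
print(check_entrypoint("no.such.module:app"))  # reports the import failure
```

Run this with your real target string on the server; a non-"ok" result is the same failure Gunicorn would hit at boot.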
4) Check systemd service configuration
Example systemd unit:
[Unit]
Description=MyApp Gunicorn
After=network.target
[Service]
User=www-data
Group=www-data
WorkingDirectory=/srv/myapp
EnvironmentFile=/srv/myapp/.env
ExecStart=/srv/myapp/.venv/bin/gunicorn app.main:app --bind 127.0.0.1:8000 --workers 2
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
Reload and restart after edits:
sudo systemctl daemon-reload
sudo systemctl restart myapp
sudo systemctl status myapp --no-pager
Verify:
- WorkingDirectory exists
- ExecStart points to the correct virtualenv binary
- EnvironmentFile path is correct
- service user can read the app directory and env file
5) Confirm environment variables are actually loaded
Do not assume .env in your shell is the same as runtime env in systemd or Docker.
Check env values:
env | sort
printenv | grep -E 'DATABASE|REDIS|SECRET|ENV|PORT'
python -c "import os; print(os.getenv('DATABASE_URL'))"
Common missing values:
- DATABASE_URL
- REDIS_URL
- SECRET_KEY
- ALLOWED_HOSTS
- SMTP credentials
- storage credentials
- payment keys
- OAuth secrets
Apps using strict settings loaders often fail immediately if one required variable is missing or malformed.
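A fail-fast loader is easy to sketch yourself if your framework does not provide one. A minimal example (variable names are illustrative):

```python
# Fail-fast env validation at startup: crash with a clear message
# before the app binds a port, instead of deep inside a request.
import os

REQUIRED = ["DATABASE_URL", "SECRET_KEY", "REDIS_URL"]  # example names

def validate_env(environ=os.environ):
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {missing}")

# An incomplete environment fails loudly and names the culprits:
try:
    validate_env({"DATABASE_URL": "postgres://localhost/db"})
except RuntimeError as exc:
    print(exc)
```

Calling this at the top of your settings module turns a cryptic mid-startup crash into one explicit log line.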
6) Compare runtime versions and dependencies
Check Python path and version:
which python
python --version
pip freeze
Typical failures:
- local uses Python 3.12, server uses 3.10
- package compiled for different runtime
- wrong virtualenv activated
- build installed partial dependencies
- service file points to old release path
If you use lockfiles, reinstall from the lockfile in production.
7) Check database connectivity and migrations
A release can fail if startup code expects schema changes that are not applied.
Test DB connectivity:
python -c "import os; print(os.getenv('DATABASE_URL'))"
nc -vz localhost 5432
pg_isready
Run migrations:
python manage.py migrate
# or
alembic upgrade head
If Redis is required during boot:
redis-cli ping
If startup depends on DB or Redis and either is unavailable, the app may exit before serving traffic.
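If `nc` is not installed, the same reachability check can be done from Python. A sketch, equivalent in spirit to `nc -vz host port` (hosts and ports are examples):

```python
# Preflight TCP reachability check for DB/Redis before starting the app.
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

for name, host, port in [("postgres", "127.0.0.1", 5432),
                         ("redis", "127.0.0.1", 6379)]:
    status = "reachable" if can_connect(host, port) else "UNREACHABLE"
    print(f"{name} {host}:{port} -> {status}")
```

Note this only proves the port accepts connections, not that credentials or the schema are correct.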
8) Verify filesystem paths and permissions
Common failures:
- log directory not writable
- socket directory owned by root
- SQLite file inaccessible
- temp directory permissions invalid
- upload or media directory missing
- static directory path incorrect
Check writable paths as the service user:
sudo -u www-data test -w /srv/myapp && echo writable || echo not-writable
df -h
free -m
If using Unix sockets, confirm the directory exists and permissions match both app and Nginx users.
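The writability checks can also be done from Python, which catches ACL and read-only-mount cases that `test -w` can miss. Paths are examples; run it as the service user (e.g. via `sudo -u www-data python3 ...`):

```python
# Verify the write assumptions the service user needs, by actually writing.
import os
import tempfile

def is_writable(path: str) -> bool:
    # os.access can be fooled by ACLs and read-only mounts,
    # so attempt a real (temporary) write instead.
    if not os.path.isdir(path):
        return False
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        return False

for path in ["/srv/myapp", "/srv/myapp/logs", tempfile.gettempdir()]:
    print(path, "writable" if is_writable(path) else "not-writable")
```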
9) Check bind host, port, and upstream
If the app is healthy but Nginx cannot reach it, check listening ports:
ss -ltnp
curl -I http://127.0.0.1:8000/
In containers, bind to 0.0.0.0, not 127.0.0.1.
Bad:
uvicorn app.main:app --host 127.0.0.1 --port 8000
Good:
uvicorn app.main:app --host 0.0.0.0 --port 8000
If using Nginx, confirm upstream matches the app:
location / {
proxy_pass http://127.0.0.1:8000;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
Validate config:
sudo nginx -t
If you are seeing proxy symptoms, also check 502 Bad Gateway Fix Guide.
10) Check Docker container startup
Inspect container state:
docker ps -a
docker logs --tail 200 <container_name>
docker inspect <container_name>
Look for:
- wrong CMD or entrypoint
- failed shell script in startup
- health check failures
- container exit code
- missing env file
- image tag drift
Example Compose anti-pattern:
command: gunicorn wrong.module:app
Correct example:
command: gunicorn app.main:app --bind 0.0.0.0:8000 --workers 2
If you need a full production reference, see Docker Production Setup for SaaS.
11) Check framework-specific production settings
Flask
Confirm the Gunicorn target is correct and production does not depend on local-only FLASK_ENV assumptions.
FastAPI
Use the correct worker class when using Gunicorn:
gunicorn -k uvicorn.workers.UvicornWorker app.main:app --bind 0.0.0.0:8000
Django
Check:
- ALLOWED_HOSTS
- SECRET_KEY
- STATIC_ROOT
- DEBUG=False behavior
- WSGI or ASGI module path
- migrations
Example:
python manage.py check --deploy
python manage.py migrate
12) Check memory, disk, and resource kills
Some crashes are not Python errors. The process may be killed by the host.
free -m
df -h
top
journalctl -xe --no-pager
docker inspect <container_name>
Signs:
- no traceback
- process exits abruptly
- container state shows OOM kill
- kernel logs mention memory pressure
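A quick way to read these signs is the process or container exit code: values above 128 conventionally mean "killed by signal (code − 128)", so 137 is SIGKILL, the classic OOM-kill signature. A small decoder sketch:

```python
# Decode an exit status from `docker inspect` or `systemctl status`.
# Codes > 128 usually mean the process was killed by signal (code - 128).
import signal

def describe_exit(code: int) -> str:
    if code > 128:
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            return f"exit {code} (unknown signal)"
        return f"killed by {name} (signal {code - 128})"
    return f"exited with status {code}"

print(describe_exit(137))  # container OOM kills typically show this code
print(describe_exit(139))  # segfault
print(describe_exit(1))    # ordinary application error
```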
13) Test the full chain after the fix
Once the app starts, verify each layer in order:
curl -I http://127.0.0.1:8000/
sudo nginx -t
curl -I https://yourdomain.com/
Then test the health endpoint if you have one:
curl -I https://yourdomain.com/health
For a full deployment baseline, review Deploy SaaS with Nginx + Gunicorn and Environment Setup on VPS.
Common causes
Most deployment crashes come from one of these:
- Incorrect app start command or wrong module path
- Missing required environment variables or malformed config values
- Dependency mismatch between local and production
- Wrong Python version or missing virtualenv activation
- Database unavailable or DATABASE_URL incorrect
- Migrations not applied before restart
- Redis, broker, or cache connection failure during startup
- ALLOWED_HOSTS, SECRET_KEY, or framework-specific production settings missing
- Permission denied for log, socket, temp, media, or SQLite files
- App binding to 127.0.0.1 or wrong port inside Docker/platform runtime
- Nginx upstream points to a missing socket or port
- Container entrypoint or CMD misconfigured
- Health check endpoint failing and causing restart loops
- Out-of-memory kill or resource limits terminating the process
- Startup code calling external APIs or services that are unavailable
Common deployment patterns that trigger crashes:
- service file still points to an old module or old virtualenv path
- build succeeded with cached dependencies, but runtime uses a different interpreter
- deploy runs migrations after restart instead of before traffic switch
- app writes to local disk in a read-only container
- health check path depends on DB or auth and marks app unhealthy
- env vars exist in CI but not on the actual server
- Nginx points to a stale socket path
Debugging tips
Use these commands during isolation:
sudo systemctl status myapp --no-pager
journalctl -u myapp -n 200 --no-pager
journalctl -xe --no-pager
ps aux | grep -E 'gunicorn|uvicorn|python|celery'
ss -ltnp
curl -I http://127.0.0.1:8000/
sudo nginx -t
sudo tail -n 200 /var/log/nginx/error.log
docker ps -a
docker logs --tail 200 <container_name>
docker inspect <container_name>
python --version
which python
pip freeze
env | sort
printenv | grep -E 'DATABASE|REDIS|SECRET|ENV|PORT'
python -c "import os; print(os.getenv('DATABASE_URL'))"
python -c "import importlib; importlib.import_module('app.main')"
gunicorn app.main:app --bind 0.0.0.0:8000
uvicorn app.main:app --host 0.0.0.0 --port 8000
python manage.py check
python manage.py migrate
alembic upgrade head
nc -vz localhost 5432
pg_isready
redis-cli ping
free -m
df -h
top
sudo -u www-data test -w /srv/myapp && echo writable || echo not-writable
Practical rules:
- Fix the first crash in logs, not the final restart message.
- Run the exact production command manually.
- Reduce startup complexity by disabling optional integrations temporarily.
- Keep /health lightweight and unauthenticated.
- Use one source of truth for config loading.
- Pin runtime and dependency versions.
- If the issue is broad and spans multiple services, use Debugging Production Issues.
- If the app boots but exceptions continue at runtime, add Error Tracking with Sentry.
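The "keep /health lightweight" rule can be sketched as a stdlib-only WSGI endpoint that confirms the process is up without touching the database or auth, so a dependency hiccup does not trigger a restart loop (module and route names are examples):

```python
# A deliberately lightweight /health endpoint: process-liveness only,
# no DB query, no auth, so it cannot fail for dependency reasons.
def app(environ, start_response):
    if environ.get("PATH_INFO") == "/health":
        body = b"ok"
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]

# Serve with e.g.: gunicorn healthapp:app --bind 127.0.0.1:8000
```

If you need deeper readiness checks (DB, cache), expose them on a separate endpoint so liveness and readiness can fail independently.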
[Diagram: checklist showing app startup dependencies and where each can fail.]
Checklist
- ✓ Service logs inspected and first traceback identified
- ✓ App start command manually tested on server
- ✓ Correct Python or runtime version confirmed
- ✓ Virtualenv or container image verified
- ✓ Required environment variables present in runtime
- ✓ Database reachable and migrations applied
- ✓ Redis or cache reachable if required
- ✓ Module path and Gunicorn/Uvicorn/WSGI config validated
- ✓ Nginx upstream port or socket matches app config
- ✓ Static, media, log, and temp directories writable
- ✓ Health check endpoint returns 200
- ✓ Restart tested and public URL verified
- ✓ Production deploy steps reviewed against SaaS Production Checklist
Product CTA
If you want fewer failed releases, use a deployment workflow that enforces:
- preflight config validation
- environment checks before restart
- explicit migration steps
- health checks before traffic switch
- log aggregation for startup failures
- predictable rollback paths
This is especially useful for solo builders shipping frequent MVP updates. A small deployment toolkit or internal release script that validates env, DB access, service config, and upstream health before restart will remove most of the guesswork from production deploys.
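Such a release script does not need to be elaborate. A tiny preflight sketch in the spirit of the workflow above (check names, env vars, and ports are all illustrative, not a real tool):

```python
# Minimal pre-restart preflight: each check returns an error string or None;
# the deploy script aborts before restarting if any check fails.
import os
import socket

def check_env():
    missing = [k for k in ("DATABASE_URL", "SECRET_KEY") if not os.getenv(k)]
    return f"missing env: {missing}" if missing else None

def check_db(host="127.0.0.1", port=5432):
    try:
        with socket.create_connection((host, port), timeout=2):
            return None
    except OSError as exc:
        return f"db unreachable: {exc}"

def preflight(checks):
    errors = [msg for check in checks if (msg := check()) is not None]
    for msg in errors:
        print("FAIL:", msg)
    return len(errors) == 0

ok = preflight([check_env, check_db])
print("preflight:", "ok" if ok else "failed - aborting restart")
```

Wire this into your deploy step so a failed check stops the restart and leaves the old release serving traffic.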
Related guides
- Deploy SaaS with Nginx + Gunicorn
- Docker Production Setup for SaaS
- Environment Setup on VPS
- 502 Bad Gateway Fix Guide
- Database Connection Errors
FAQ
What is the fastest way to diagnose an app crash after deployment?
Run the exact production start command manually on the server with the same environment variables. That usually exposes the real traceback faster than reading restart-loop logs alone.
Why do I see 502 from Nginx when the real problem is the app?
A 502 usually means Nginx cannot reach the upstream app process. If the app crashes before binding its socket or port, Nginx only shows the proxy symptom.
Can environment variables cause immediate startup failure?
Yes. Many apps validate configuration at import time or startup. A missing DATABASE_URL, SECRET_KEY, SMTP credential, or storage config can terminate the process before it serves requests.
Should migrations run before or after restarting the app?
Usually before switching traffic to the new release. Restarting first can cause code to hit schema changes that do not exist yet.
How do I tell if the process is being killed by memory limits?
Check system logs, container inspect output, and memory metrics. OOM kills often appear as abrupt exits without a normal Python traceback.
Should I debug Nginx first?
No. First prove the app process can start and listen locally on the server. Then debug the proxy layer.
What if logs are empty?
The process may be exiting before stdout is captured. Run the command manually, check the systemd unit configuration, or inspect Docker logs and entrypoint scripts.
Why does it restart in a loop?
Health checks or the process manager detect a failed startup and automatically retry. The loop is a symptom, not the cause.
Final takeaway
Deployment crashes are usually straightforward once you reproduce them with the exact production command and environment.
Use this order:
- start at the app process
- capture the first traceback
- verify config and dependency paths
- confirm DB, Redis, and filesystem assumptions
- test the proxy only after the app is listening
Once fixed, add:
- env validation
- explicit migration steps
- lightweight health checks
- centralized error capture
- predictable restart and rollback workflow
For broader production hardening, review SaaS Production Checklist.