Database Connection Errors
The essential playbook for diagnosing and fixing database connection errors in your SaaS.
Use this page when your app cannot connect to the database, shows connection refused or timeout errors, fails after deployment, or intermittently loses DB access in production. The goal is to isolate whether the failure is caused by credentials, network reachability, TLS, connection limits, container networking, or application configuration.
Quick Fix / Quick Setup
Run these checks from the same VM or container where the app is running.
# 1) Verify env vars actually loaded in the running process
printenv | grep -E 'DATABASE_URL|DB_HOST|DB_PORT|DB_NAME|DB_USER|DB_PASSWORD'
# 2) Test raw TCP reachability to the database host
nc -vz $DB_HOST ${DB_PORT:-5432}
# 3) Test login directly with the DB client
# Postgres
PGPASSWORD="$DB_PASSWORD" psql -h "$DB_HOST" -p "${DB_PORT:-5432}" -U "$DB_USER" -d "$DB_NAME" -c 'select 1;'
# MySQL
mysql -h "$DB_HOST" -P "${DB_PORT:-3306}" -u "$DB_USER" -p"$DB_PASSWORD" -e 'select 1' "$DB_NAME"
# 4) If using Docker, confirm service name and network
docker ps
docker network ls
docker inspect <app_container>
# 5) Restart app only after verifying DB is reachable
systemctl restart gunicorn || docker compose restart app
If direct DB client access fails from the same host or container, this is not an ORM bug. Fix network, credentials, firewall, TLS, or database availability first.
What’s happening
A database connection error means the app failed at one of these layers:
- Config load
- DNS resolution
- TCP socket connection
- TLS negotiation
- Authentication
- Database selection
- Connection pool checkout
Typical production error patterns:
- connection refused
- connection timed out
- no route to host
- password authentication failed
- database does not exist
- could not translate host name
- SSL required
- too many connections
- pool timeout or checkout timeout errors
Important runtime differences:
- In Docker, localhost points to the current container, not the database container.
- In VPS deployments, firewall rules, bind addresses, and provider allowlists commonly block access.
- In managed databases, SSL mode and client IP allowlists are frequent failure points.
- Under load, connection pool exhaustion can appear as intermittent application 500s.
Step-by-step implementation
1) Extract the exact database error
Do not debug a generic 500. Pull the actual database exception from application logs.
# systemd app
journalctl -u gunicorn -n 200 --no-pager
# Docker app
docker logs <app_container> --tail 200
Also check the database logs:
# Postgres
journalctl -u postgresql -n 200 --no-pager
# MySQL
journalctl -u mysql -n 200 --no-pager
# Docker DB
docker logs <db_container> --tail 200
Look for messages that clearly indicate one category:
- name resolution failure
- refused connection
- auth failure
- SSL requirement
- max connections
- pool timeout
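The categories above can be sorted mechanically when you are scanning many log lines. This is a minimal sketch, not a complete parser: the `PATTERNS` table and the `classify_db_error` name are hypothetical, and the message fragments are only the common wordings mentioned in this guide plus a few typical driver phrasings. Pool patterns are checked first because pool timeout messages often also contain "timed out".

```python
import re

# Hypothetical lookup table: (category, regex on the lowercased message).
# Order matters: more specific categories come before generic ones.
PATTERNS = [
    ("pool", r"queuepool|pool timeout|checkout timeout"),
    ("max_connections", r"too many connections"),
    ("dns", r"could not translate host name|name or service not known"),
    ("tcp_refused", r"connection refused"),
    ("network", r"timed out|no route to host"),
    ("auth", r"password authentication failed|access denied"),
    ("database", r"database .* does not exist|unknown database"),
    ("tls", r"ssl"),
]


def classify_db_error(message: str) -> str:
    """Return the first matching category for a log message, or 'unknown'."""
    lowered = message.lower()
    for category, pattern in PATTERNS:
        if re.search(pattern, lowered):
            return category
    return "unknown"
```

Treat the result as a hint about which step to run next, not as a diagnosis. Extend the table with the exact wording your driver emits.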
2) Verify runtime configuration
Check what the running process actually sees.
printenv | grep -E 'DATABASE_URL|DB_HOST|DB_PORT|DB_NAME|DB_USER|DB_PASSWORD'
Common issues:
- old values still loaded in systemd
- stale .env file in Docker Compose
- CI/CD secret updated but app not restarted
- app reading DATABASE_URL while you changed only DB_*
- password contains special characters and breaks URL parsing
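For the special-character case in the list above, one way to avoid hand-encoding mistakes is to build the URL in code. The sketch below uses only the Python standard library; `build_db_url` is a hypothetical helper, and the user, password, and host values are placeholders.

```python
from urllib.parse import quote, unquote, urlsplit


def build_db_url(scheme, user, password, host, port, dbname):
    """Build a connection URL with the user and password percent-encoded."""
    # safe="" forces characters such as @ / : $ to be encoded too.
    return (
        f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{dbname}"
    )


# Placeholder credentials for illustration only.
url = build_db_url("postgresql", "appuser", "p@ss/w$rd", "db.example.com", 5432, "appdb")
parts = urlsplit(url)

print(url)
# Round trip: the parsed host, port, and decoded password should match the inputs.
print(parts.hostname, parts.port, unquote(parts.password))
```

If the round trip does not return the original password, the URL your app receives is not the one you think you configured.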
If using a connection URL, verify encoding. Example:
postgresql://user:pa%24%24word@db.example.com:5432/appdb
Not:
postgresql://user:pa$$word@db.example.com:5432/appdb
3) Test DNS resolution from the app runtime
getent hosts $DB_HOST
nslookup $DB_HOST
dig +short $DB_HOST
If this fails:
- wrong hostname
- internal DNS issue
- stale DNS after infrastructure change
- service name mismatch in Docker Compose
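Minimal images often lack dig and nslookup. If the app runtime has Python, the same check can be done with the resolver the app itself uses. A small sketch, assuming a hypothetical `resolve_host` helper and a default port of 5432:

```python
import socket


def resolve_host(host: str, port: int = 5432) -> list[str]:
    """Return the unique IP addresses the OS resolver gives for host, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Resolution failed: wrong name, DNS outage, or missing Compose service.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

An empty list points at one of the DNS causes above. An unexpected address usually means stale DNS or a hostname that resolves differently inside the container.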
For Docker Compose, the host should often be the service name:
services:
  app:
    environment:
      DB_HOST: db
  db:
    image: postgres:16
Not localhost.
4) Test raw TCP connectivity
nc -vz $DB_HOST ${DB_PORT:-5432}
# or
telnet $DB_HOST ${DB_PORT:-5432}
Interpretation:
- succeeded -> network path works, move to auth/TLS
- connection refused -> host reachable, port closed or service not listening
- timeout -> firewall, security group, routing, allowlist, wrong IP, overloaded DB
- name resolution error -> DNS problem
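The interpretation table above maps directly onto the exceptions Python raises for a failed socket connection. The following is a rough sketch for environments without nc or telnet; `probe_tcp` and its return labels are hypothetical names.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and name the most likely failing layer."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # network path works, move on to auth/TLS
    except socket.gaierror:
        return "dns"  # hostname did not resolve
    except ConnectionRefusedError:
        return "refused"  # host reachable, nothing listening on that port
    except socket.timeout:
        return "timeout"  # firewall, allowlist, routing, or overloaded DB
    except OSError:
        return "unreachable"  # e.g. no route to host
```

Call it from the app container or VM, for example `probe_tcp(os.environ["DB_HOST"], 5432)`, so the result reflects the network path the app really uses.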
If the app runs in Docker, enter the container and test from there:
docker exec -it <app_container> sh
nc -vz $DB_HOST ${DB_PORT:-5432}
5) Test direct database authentication
Postgres
PGPASSWORD="$DB_PASSWORD" psql \
  -h "$DB_HOST" \
  -p "${DB_PORT:-5432}" \
  -U "$DB_USER" \
  -d "$DB_NAME" \
  -c 'select 1;'
MySQL
mysql \
  -h "$DB_HOST" \
  -P "${DB_PORT:-3306}" \
  -u "$DB_USER" \
  -p"$DB_PASSWORD" \
  -e 'select 1' \
  "$DB_NAME"
If TCP works but login fails, focus on:
- wrong username or password
- wrong database name
- host-based access rules
- missing grants
- SSL mode mismatch
- stale secret in runtime environment
6) Check database server health
Service status
systemctl status postgresql
systemctl status mysql
Listening sockets
ss -ltnp | grep -E '5432|3306'
You want to confirm the DB is listening on the expected interface.
Examples:
- 127.0.0.1:5432 only -> reachable only locally on that host
- 0.0.0.0:3306 or private IP -> reachable externally depending on firewall rules
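The two examples above can be checked programmatically against the local-address column of the ss output. This is an illustrative sketch using the standard ipaddress module; `classify_listen_address` and its three labels are made-up names.

```python
import ipaddress


def classify_listen_address(local_addr: str) -> str:
    """Classify an 'address:port' string such as those printed by ss -ltn."""
    host, _, _port = local_addr.rpartition(":")
    # Drop IPv6 brackets and any interface scope suffix (for example %lo).
    host = host.strip("[]").split("%")[0]
    if host in ("*", ""):
        return "all-interfaces"
    ip = ipaddress.ip_address(host)
    if ip.is_loopback:
        return "loopback-only"  # reachable only from the same host
    if ip.is_unspecified:
        return "all-interfaces"  # 0.0.0.0 or ::, subject to firewall rules
    return "specific-interface"  # bound to one address, e.g. a private IP
```

A result of loopback-only on the database host explains a connection refused error from any other machine or container.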
7) Validate server-side access configuration
Postgres
Check listen_addresses:
# postgresql.conf
listen_addresses = '*'
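port = 5432
Check host rules in pg_hba.conf:
# TYPE DATABASE USER ADDRESS METHOD
host appdb appuser 10.0.0.0/24 scram-sha-256
Reload after changes. A reload is enough for pg_hba.conf edits; changing listen_addresses or port requires a full restart of the service:
sudo systemctl reload postgresql
MySQL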
Check bind address:
[mysqld]
bind-address = 0.0.0.0
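port = 3306
Verify grants. The host part must match the account the app connects as:
SHOW GRANTS FOR 'appuser'@'10.%';
Grant example. MySQL 8.0 no longer accepts IDENTIFIED BY inside GRANT, so create the user first:
CREATE USER IF NOT EXISTS 'appuser'@'10.%' IDENTIFIED BY 'strong-password';
GRANT ALL PRIVILEGES ON appdb.* TO 'appuser'@'10.%';
For managed DBs, use provider network allowlists instead of opening public access broadly.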
8) Verify firewall and network policy
On VPS:
sudo ufw status
Cloud platforms may also require security group or network rule updates.
Check that the app server IP is allowed to reach the DB port:
- Postgres: 5432
- MySQL: 3306
If managed DB access works locally but not from production, this is often an allowlist problem.
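To rule out an allowlist gap quickly, compare the app server's outbound IP against the CIDR ranges configured in the firewall, pg_hba.conf, or the provider console. A minimal sketch with the standard ipaddress module; `is_allowed` is a hypothetical helper and the addresses in the usage note are examples.

```python
import ipaddress


def is_allowed(client_ip: str, allowlist: list[str]) -> bool:
    """Return True if client_ip falls inside any CIDR range in allowlist."""
    ip = ipaddress.ip_address(client_ip)
    # strict=False accepts entries like 10.0.0.5/24 that have host bits set.
    return any(ip in ipaddress.ip_network(cidr, strict=False) for cidr in allowlist)
```

For instance, `is_allowed("10.0.1.15", ["10.0.0.0/24"])` is False: the server sits in a neighbouring subnet, so a rule written for 10.0.0.0/24 will not match it.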
9) Check TLS / SSL mode
Managed providers often require SSL. If your app disables it, auth may fail or the connection may be rejected.
Examples:
SQLAlchemy Postgres URL
DATABASE_URL="postgresql+psycopg://appuser:password@db.example.com:5432/appdb?sslmode=require"
psycopg connect args
import os
from sqlalchemy import create_engine
engine = create_engine(
    os.environ["DATABASE_URL"],
    connect_args={"sslmode": "require"},  # passed through to the psycopg connection
    pool_pre_ping=True,
)
MySQL SQLAlchemy URL
DATABASE_URL="mysql+pymysql://appuser:password@db.example.com:3306/appdb?ssl_ca=/path/to/ca.pem"
If your provider requires a CA bundle, pass it explicitly according to your driver.
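When it is unclear whether a Postgres server speaks TLS at all, you can ask it before involving the ORM. In the PostgreSQL wire protocol a client opens with an 8-byte SSLRequest message, and the server answers with a single byte: S if it is willing to negotiate TLS, N if not. The sketch below sends only that message; `postgres_offers_tls` is a hypothetical helper, it applies to Postgres only, and it does not validate certificates or tell you whether the server requires TLS.

```python
import socket
import struct

# SSLRequest: Int32 message length (8) followed by Int32 request code 80877103.
SSL_REQUEST = struct.pack("!ii", 8, 80877103)


def postgres_offers_tls(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Send a PostgreSQL SSLRequest and report whether the server answers 'S'."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(SSL_REQUEST)
        reply = sock.recv(1)
    return reply == b"S"
```

If this returns False while your provider documents SSL as mandatory, you are probably talking to the wrong host or to a proxy in front of the database.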
10) Fix Docker and Compose networking
Common working Compose example:
services:
  app:
    build: .
    environment:
      DB_HOST: db
      DB_PORT: 5432
      DB_NAME: appdb
      DB_USER: appuser
      DB_PASSWORD: secret
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:16
    environment:
      POSTGRES_DB: appdb
      POSTGRES_USER: appuser
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser -d appdb"]
      interval: 5s
      timeout: 5s
      retries: 10
Checks:
docker ps
docker network ls
docker inspect <app_container>
docker inspect <db_container>
Do not assume depends_on alone solves readiness unless you also use health checks or app retry logic.
11) Check pool exhaustion and max connections
If errors only appear under traffic spikes, inspect connection counts and pool settings.
Typical symptoms:
- app works after restart, then fails again under load
- random 500s
- timeout waiting for connection from pool
- DB reports too many connections
SQLAlchemy example
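import os
from sqlalchemy import create_engine
engine = create_engine(
    os.environ["DATABASE_URL"],
    pool_size=5,         # persistent connections kept per process
    max_overflow=10,     # extra connections allowed during bursts
    pool_timeout=30,     # seconds to wait for a free connection
    pool_recycle=1800,   # replace connections older than 30 minutes
    pool_pre_ping=True,  # test a connection before handing it out
)
Guidelines for small SaaS apps:
- keep pool size conservative
- each worker process has its own pool, so multiply (pool_size + max_overflow) by the total worker/process count
- ensure total possible app connections stay below DB max_connections
- enable pool_pre_ping=True for stale dropped connections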
Example planning:
- 2 Gunicorn workers
- pool_size=5
- max_overflow=5
Potential peak app connections: 2 * (5 + 5) = 20
If the DB only supports 25 total connections and background jobs also use the DB, this can fail quickly.
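The planning arithmetic above is easy to encode so it can be rerun whenever worker counts change. A small sketch; the function names and the `reserved` parameter (connections set aside for background jobs, migrations, or admin sessions) are assumptions for illustration.

```python
def peak_app_connections(workers: int, pool_size: int, max_overflow: int) -> int:
    """Worst-case connections opened by the web app: one pool per worker process."""
    return workers * (pool_size + max_overflow)


def connection_headroom(
    max_connections: int,
    workers: int,
    pool_size: int,
    max_overflow: int,
    reserved: int = 0,
) -> int:
    """Connections left on the DB after the app's peak and any reserved slots."""
    return max_connections - reserved - peak_app_connections(workers, pool_size, max_overflow)


# The example from the text: 2 workers, pool_size=5, max_overflow=5, DB limit 25.
print(peak_app_connections(2, 5, 5))
print(connection_headroom(25, 2, 5, 5, reserved=3))
```

A headroom near zero or below means the configuration can hit the database limit under load even though each individual pool looks small.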
12) Add startup retry only after fixing root cause
If the app starts before the DB is ready, a short retry loop helps. It does not replace correct networking or credentials.
Example shell entrypoint:
#!/usr/bin/env sh
set -e
until nc -z "$DB_HOST" "${DB_PORT:-5432}"; do
  echo "waiting for db..."
  sleep 2
done
exec gunicorn app:app --bind 0.0.0.0:8000
Prefer application-level retry for transient startup conditions.
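An application-level retry might look like the sketch below. It is driver-agnostic: `connect` is any zero-argument callable that opens a connection, and `retry_on` is the tuple of exception types to treat as transient. Both names, the defaults, and the backoff values are assumptions; most drivers raise their own error classes, so pass those in rather than relying on the `OSError` default.

```python
import time


def connect_with_retry(
    connect,
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: tuple = (OSError,),
    sleep=time.sleep,
):
    """Call connect() until it succeeds, backing off exponentially between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the real error
            # Delay doubles each time: 0.5s, 1s, 2s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            print(f"db not ready (attempt {attempt}/{attempts}): {exc}; retrying in {delay:.1f}s")
            sleep(delay)
```

Keep the attempt count small. A retry loop that runs for minutes hides the credential and network problems this page is meant to surface.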
13) Verify migrations are not blocking startup
If your deploy runs migrations on boot, a failing migration can make the app look like it has a plain DB connectivity problem.
Check migration logs and run manually if needed.
See also: Database Migration Strategy
14) Validate the fix end-to-end
After changes:
- restart the app
- run a simple read query
- test one write path
- verify logs stop showing connection failures
Example:
systemctl restart gunicorn
journalctl -u gunicorn -n 100 --no-pager
Or:
docker compose restart app
docker logs <app_container> --tail 100
Common causes
- Wrong DATABASE_URL or mismatched DB_HOST, DB_PORT, DB_USER, DB_PASSWORD, or DB_NAME
- Using localhost inside a container when the database runs elsewhere
- Database service is down, crashed, restarting, or still booting
- Firewall, security group, or provider allowlist blocks traffic
- Database listens only on 127.0.0.1
- Incorrect pg_hba.conf rules in Postgres
- Missing MySQL host grants
- SSL/TLS required by provider but disabled in app
- App pool exhaustion or DB max_connections reached
- DNS resolution failure after infra changes
- Secret rotation happened but app still uses old values
- PgBouncer or pooler mode incompatible with ORM behavior
- Long-running transactions causing connection starvation
Debugging tips
- Compare the exact production connection string with the one that works locally.
- Run all network and DB-client tests from the app container or VM, not your laptop.
- Match timestamps between app logs and DB logs.
- If failures only happen during load, inspect pool settings, active connections, and slow queries.
- In Docker Compose, confirm the resolved environment values and service names.
- If using managed Postgres or MySQL, verify sslmode=require or required CA settings.
- Diff the latest deploy for changes in secrets, image tag, migration command, or worker count.
- Temporary mitigation: reduce app worker count or pool size to lower DB pressure.
- Check for leaked sessions in background jobs, scripts, or request handlers.
- If using a proxy or pooler, verify its limits separately from the main database.
Useful commands:
printenv | grep -E 'DATABASE_URL|DB_HOST|DB_PORT|DB_NAME|DB_USER|DB_PASSWORD'
getent hosts $DB_HOST
nslookup $DB_HOST
dig +short $DB_HOST
nc -vz $DB_HOST ${DB_PORT:-5432}
telnet $DB_HOST ${DB_PORT:-5432}
PGPASSWORD="$DB_PASSWORD" psql -h "$DB_HOST" -p "${DB_PORT:-5432}" -U "$DB_USER" -d "$DB_NAME" -c 'select 1;'
mysql -h "$DB_HOST" -P "${DB_PORT:-3306}" -u "$DB_USER" -p"$DB_PASSWORD" -e 'select 1' "$DB_NAME"
docker ps
docker logs <app_container> --tail 200
docker logs <db_container> --tail 200
docker inspect <app_container>
docker exec -it <app_container> sh
systemctl status postgresql
systemctl status mysql
ss -ltnp | grep -E '5432|3306'
sudo ufw status
journalctl -u gunicorn -n 200 --no-pager
journalctl -u postgresql -n 200 --no-pager
journalctl -u mysql -n 200 --no-pager
Checklist
- ✓ Database host resolves from the app runtime environment
- ✓ Configured port is reachable from the app runtime environment
- ✓ Credentials work with native DB client
- ✓ Database is running and accepting external or private-network connections
- ✓ Firewall, security groups, and allowlists permit traffic from the app
- ✓ TLS/SSL settings match provider requirements
- ✓ Connection pool size is reasonable for database max_connections
- ✓ App and worker processes use the same correct database config
- ✓ Migrations completed successfully
- ✓ Logs confirm a successful test query after the fix
For broader release validation, use the SaaS Production Checklist.
Related guides
- Debugging Production Issues
- SaaS Production Checklist
- Error Tracking with Sentry
- Environment Setup on VPS
- App Crashes on Deployment
FAQ
Why am I getting connection refused?
The database host is reachable, but nothing is listening on that port or the service is bound to a different interface. Check DB process status, listening sockets, and host/port configuration.
What does connection timeout usually mean?
The app cannot complete the TCP connection in time. Common causes are firewall rules, wrong hostname, private network issues, provider allowlists, or a heavily overloaded database.
Why does authentication fail even though the password looks correct?
The running app may be using a different secret than expected, the user may not have permission for that host or database, or special characters in the connection string may need URL encoding.
How do I fix database errors in Docker Compose?
Use the database service name as DB_HOST, ensure both services share a network, add health checks or retry logic, and verify the app is not trying to connect before the database is ready.
Can too many connections cause random 500 errors?
Yes. When the pool is exhausted or the database reaches max_connections, requests can fail intermittently. Reduce pool size, find slow queries, and close leaked sessions.
Why does the app connect locally but fail in production?
Production differs in hostnames, firewalls, SSL requirements, network topology, startup ordering, and environment variable loading. Verify from the real runtime environment.
Why does localhost fail in Docker?
Inside a container, localhost refers to that container itself. Use the database service name or the external database hostname.
How do I know if this is a credential issue or a network issue?
If TCP connection fails, it is usually network or service reachability. If TCP works but DB client login fails, it is authentication, authorization, or TLS configuration.
Should I add wait-for-db scripts?
They help with startup timing, but they do not fix bad credentials, firewall rules, DNS errors, or wrong DB hosts. Use them only as a short-term readiness mitigation.
Final takeaway
Debug database connection errors in this order:
- config
- DNS
- TCP reachability
- authentication
- TLS
- pool and load behavior
Always test from the actual runtime environment using the same credentials as the app.
Most production failures come from wrong env vars, blocked network access, SSL mismatches, Docker hostname mistakes, or exhausted connections, not from the ORM alone.