Database Connection Errors

The essential playbook for diagnosing and fixing database connection errors in your SaaS.

Use this page when your app cannot connect to the database, shows connection refused or timeout errors, fails after deployment, or intermittently loses DB access in production. The goal is to isolate whether the failure is caused by credentials, network reachability, TLS, connection limits, container networking, or application configuration.

Quick Fix / Quick Setup

Run these checks from the same VM or container where the app is running.

bash
# 1) Verify env vars in this shell (for the exact running process, check /proc/<pid>/environ)
printenv | grep -E 'DATABASE_URL|DB_HOST|DB_PORT|DB_NAME|DB_USER|DB_PASSWORD'

# 2) Test raw TCP reachability to the database host
nc -vz $DB_HOST ${DB_PORT:-5432}

# 3) Test login directly with the DB client
# Postgres
PGPASSWORD="$DB_PASSWORD" psql -h "$DB_HOST" -p "${DB_PORT:-5432}" -U "$DB_USER" -d "$DB_NAME" -c 'select 1;'

# MySQL
mysql -h "$DB_HOST" -P "${DB_PORT:-3306}" -u "$DB_USER" -p"$DB_PASSWORD" -e 'select 1' "$DB_NAME"

# 4) If using Docker, confirm service name and network
docker ps
docker network ls
docker inspect <app_container>

# 5) Restart app only after verifying DB is reachable
systemctl restart gunicorn || docker compose restart app

If direct DB client access fails from the same host or container, this is not an ORM bug. Fix network, credentials, firewall, TLS, or database availability first.

What’s happening

A database connection error means the app failed at one of these layers:

  1. Config load
  2. DNS resolution
  3. TCP socket connection
  4. TLS negotiation
  5. Authentication
  6. Database selection
  7. Connection pool checkout

Typical production error patterns:

  • connection refused
  • connection timed out
  • no route to host
  • password authentication failed
  • database does not exist
  • could not translate host name
  • SSL required
  • too many connections
  • pool timeout or checkout timeout errors

Important runtime differences:

  • In Docker, localhost points to the current container, not the database container.
  • In VPS deployments, firewall rules, bind addresses, and provider allowlists commonly block access.
  • In managed databases, SSL mode and client IP allowlists are frequent failure points.
  • Under load, connection pool exhaustion can appear as intermittent application 500s.
Process Flow

Match the exact error string first, then work through the layers in order: DNS, TCP, auth, TLS, pool exhaustion, DB server limits.

Step-by-step implementation

1) Extract the exact database error

Do not debug a generic 500. Pull the actual database exception from application logs.

bash
# systemd app
journalctl -u gunicorn -n 200 --no-pager

# Docker app
docker logs <app_container> --tail 200

Also check the database logs:

bash
# Postgres
journalctl -u postgresql -n 200 --no-pager

# MySQL
journalctl -u mysql -n 200 --no-pager

# Docker DB
docker logs <db_container> --tail 200

Look for messages that clearly indicate one category:

  • name resolution failure
  • refused connection
  • auth failure
  • SSL requirement
  • max connections
  • pool timeout
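The category match above can be scripted so triage is consistent across incidents. A minimal sketch; the substring patterns are illustrative, not exhaustive, and you should extend them for your driver's actual messages:

```python
import re

# Illustrative error-message patterns -> triage category, checked in order.
# Pool timeouts must be matched before the generic timeout pattern.
PATTERNS = [
    (r"could not translate host name|name or service not known|getaddrinfo", "dns"),
    (r"connection refused", "tcp-refused"),
    (r"pool.*(timeout|timed out)|checkout.*(timeout|timed out)", "pool exhaustion"),
    (r"too many connections", "db server limits"),
    (r"password authentication failed|access denied", "auth"),
    (r"ssl|tls", "tls"),
    (r"timed out|timeout", "tcp-timeout"),
]

def classify(error_text: str) -> str:
    """Map a raw database error string to one triage category."""
    lower = error_text.lower()
    for pattern, category in PATTERNS:
        if re.search(pattern, lower):
            return category
    return "unknown"
```

Run new log lines through this before debugging, so a "pool exhaustion" incident is not mistaken for a network timeout.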

2) Verify runtime configuration

Check what the running process actually sees.

bash
printenv | grep -E 'DATABASE_URL|DB_HOST|DB_PORT|DB_NAME|DB_USER|DB_PASSWORD'

Common issues:

  • old values still loaded in systemd
  • stale .env file in Docker Compose
  • CI/CD secret updated but app not restarted
  • app reading DATABASE_URL while you changed only DB_*
  • password contains special characters and breaks URL parsing

If using a connection URL, verify encoding. Example:

text
postgresql://user:pa%24%24word@db.example.com:5432/appdb

Not:

text
postgresql://user:pa$$word@db.example.com:5432/appdb
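Rather than encoding by hand, build the URL programmatically. A stdlib-only sketch; `build_pg_url` is a hypothetical helper name:

```python
from urllib.parse import quote

def build_pg_url(user: str, password: str, host: str, db: str, port: int = 5432) -> str:
    """Build a Postgres URL, percent-encoding credentials so characters
    like $, @, /, and : cannot break URL parsing."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{db}"
    )

print(build_pg_url("user", "pa$$word", "db.example.com", "appdb"))
# -> postgresql://user:pa%24%24word@db.example.com:5432/appdb
```

This guarantees the encoded form above instead of relying on whoever last rotated the secret to remember the rule.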

3) Test DNS resolution from the app runtime

bash
getent hosts $DB_HOST
nslookup $DB_HOST
dig +short $DB_HOST

If this fails:

  • wrong hostname
  • internal DNS issue
  • stale DNS after infrastructure change
  • service name mismatch in Docker Compose

For Docker Compose, the host should often be the service name:

yaml
services:
  app:
    environment:
      DB_HOST: db

  db:
    image: postgres:16

Not localhost.

4) Test raw TCP connectivity

bash
nc -vz $DB_HOST ${DB_PORT:-5432}
# or
telnet $DB_HOST ${DB_PORT:-5432}

Interpretation:

  • succeeded -> network path works, move to auth/TLS
  • connection refused -> host reachable, port closed or service not listening
  • timeout -> firewall, security group, routing, allowlist, wrong IP, overloaded DB
  • name resolution error -> DNS problem
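Slim app images often ship without nc or telnet. The same interpretation table can be reproduced with a stdlib-only probe; the return labels are illustrative:

```python
import socket

def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify TCP reachability roughly the way nc -vz output would."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # network path works, move to auth/TLS
    except socket.gaierror:
        return "dns-failure"       # hostname did not resolve
    except ConnectionRefusedError:
        return "refused"           # host reachable, nothing listening on port
    except TimeoutError:
        return "timeout"           # firewall, allowlist, routing, or overload
    except OSError:
        return "unreachable"       # e.g. no route to host

# Usage: probe("db.example.com", 5432)
```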

If the app runs in Docker, enter the container and test from there:

bash
docker exec -it <app_container> sh
nc -vz $DB_HOST ${DB_PORT:-5432}

5) Test direct database authentication

Postgres

bash
PGPASSWORD="$DB_PASSWORD" psql \
  -h "$DB_HOST" \
  -p "${DB_PORT:-5432}" \
  -U "$DB_USER" \
  -d "$DB_NAME" \
  -c 'select 1;'

MySQL

bash
mysql \
  -h "$DB_HOST" \
  -P "${DB_PORT:-3306}" \
  -u "$DB_USER" \
  -p"$DB_PASSWORD" \
  -e 'select 1' \
  "$DB_NAME"

If TCP works but login fails, focus on:

  • wrong username or password
  • wrong database name
  • host-based access rules
  • missing grants
  • SSL mode mismatch
  • stale secret in runtime environment

6) Check database server health

Service status

bash
systemctl status postgresql
systemctl status mysql

Listening sockets

bash
ss -ltnp | grep -E '5432|3306'

You want to confirm the DB is listening on the expected interface.

Examples:

  • 127.0.0.1:5432 only -> reachable only locally on that host
  • 0.0.0.0:3306 or private IP -> reachable externally depending on firewall rules

7) Validate server-side access configuration

Postgres

Check listen_addresses:

conf
# postgresql.conf
listen_addresses = '*'
port = 5432

Check host rules in pg_hba.conf:

conf
# TYPE  DATABASE   USER      ADDRESS           METHOD
host    appdb      appuser   10.0.0.0/24       scram-sha-256

Reload after changes:

bash
sudo systemctl reload postgresql

MySQL

Check bind address:

conf
[mysqld]
bind-address = 0.0.0.0
port = 3306

Verify grants:

sql
SHOW GRANTS FOR 'appuser'@'%';

Grant example (MySQL 8+ requires creating the user before granting; `GRANT ... IDENTIFIED BY` was removed):

sql
CREATE USER IF NOT EXISTS 'appuser'@'10.%' IDENTIFIED BY 'strong-password';
GRANT ALL PRIVILEGES ON appdb.* TO 'appuser'@'10.%';
FLUSH PRIVILEGES;

For managed DBs, use provider network allowlists instead of opening public access broadly.

8) Verify firewall and network policy

On VPS:

bash
sudo ufw status

Cloud platforms may also require security group or network rule updates.

Check that the app server IP is allowed to reach the DB port:

  • Postgres: 5432
  • MySQL: 3306

If managed DB access works locally but not from production, this is often an allowlist problem.

9) Check TLS / SSL mode

Managed providers often require SSL. If your app disables it, auth may fail or the connection may be rejected.

Examples:

SQLAlchemy Postgres URL

bash
DATABASE_URL="postgresql+psycopg://appuser:password@db.example.com:5432/appdb?sslmode=require"

SQLAlchemy engine (picks up sslmode from the URL above)

python
import os
from sqlalchemy import create_engine

engine = create_engine(
    os.environ["DATABASE_URL"],
    pool_pre_ping=True,
)

MySQL SQLAlchemy URL

bash
# PyMySQL enables TLS when a CA path is supplied
DATABASE_URL="mysql+pymysql://appuser:password@db.example.com:3306/appdb?ssl_ca=/etc/ssl/certs/ca-certificates.crt"

If your provider requires a CA bundle, pass it explicitly according to your driver.

10) Fix Docker and Compose networking

Common working Compose example:

yaml
services:
  app:
    build: .
    environment:
      DB_HOST: db
      DB_PORT: 5432
      DB_NAME: appdb
      DB_USER: appuser
      DB_PASSWORD: secret
    depends_on:
      db:
        condition: service_healthy

  db:
    image: postgres:16
    environment:
      POSTGRES_DB: appdb
      POSTGRES_USER: appuser
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser -d appdb"]
      interval: 5s
      timeout: 5s
      retries: 10

Checks:

bash
docker ps
docker network ls
docker inspect <app_container>
docker inspect <db_container>

Do not assume depends_on alone solves readiness unless you also use health checks or app retry logic.

11) Check pool exhaustion and max connections

If errors only appear under traffic spikes, inspect connection counts and pool settings.

Typical symptoms:

  • app works after restart, then fails again under load
  • random 500s
  • timeout waiting for connection from pool
  • DB reports too many connections

SQLAlchemy example

python
from sqlalchemy import create_engine

engine = create_engine(
    DATABASE_URL,
    pool_size=5,
    max_overflow=10,
    pool_timeout=30,
    pool_recycle=1800,
    pool_pre_ping=True,
)

Guidelines for small SaaS apps:

  • keep pool size conservative
  • multiply pool size by total worker/process count
  • ensure total possible app connections stay below DB max_connections
  • enable pool_pre_ping=True for stale dropped connections

Example planning:

  • 2 Gunicorn workers
  • pool_size=5
  • max_overflow=5

Potential peak app connections: 2 * (5 + 5) = 20

If the DB only supports 25 total connections and background jobs also use the DB, this can fail quickly.
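The planning arithmetic above is worth encoding so it runs in CI or a deploy check. A small sketch; the function names and `reserved_for_jobs` parameter are illustrative:

```python
def peak_app_connections(workers: int, pool_size: int, max_overflow: int) -> int:
    """Worst case: every worker fills its pool plus overflow at once."""
    return workers * (pool_size + max_overflow)

def fits(db_max_connections: int, workers: int, pool_size: int,
         max_overflow: int, reserved_for_jobs: int = 0) -> bool:
    """True if the app's worst case plus background-job connections
    stays strictly below the server's max_connections."""
    peak = peak_app_connections(workers, pool_size, max_overflow)
    return peak + reserved_for_jobs < db_max_connections

print(peak_app_connections(2, 5, 5))          # -> 20
print(fits(25, 2, 5, 5, reserved_for_jobs=5))  # -> False: 20 + 5 is not below 25
```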

12) Add startup retry only after fixing root cause

If the app starts before the DB is ready, a short retry loop helps. It does not replace correct networking or credentials.

Example shell entrypoint:

bash
#!/usr/bin/env sh
set -e

until nc -z "$DB_HOST" "${DB_PORT:-5432}"; do
  echo "waiting for db..."
  sleep 2
done

exec gunicorn app:app --bind 0.0.0.0:8000

Prefer application-level retry for transient startup conditions.
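An application-level retry can stay driver-agnostic. A hedged sketch where `connect_fn` stands in for whatever actually opens your connection (for example `psycopg.connect(...)` or `engine.connect`):

```python
import time

def connect_with_retry(connect_fn, attempts: int = 10, delay: float = 2.0):
    """Call connect_fn until it succeeds or attempts are exhausted.
    A successful connection returns immediately; only failures sleep."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect_fn()
        except Exception as exc:  # narrow this to your driver's OperationalError in real code
            last_error = exc
            print(f"db not ready (attempt {attempt}/{attempts}): {exc}")
            time.sleep(delay)
    raise RuntimeError("database never became ready") from last_error
```

Catching a narrow driver exception matters: a wrong password should fail fast, not retry ten times and mask the real error.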

13) Verify migrations are not blocking startup

If your deploy runs migrations on boot, a failing migration can make the app look like it has a plain DB connectivity problem.

Check migration logs and run manually if needed.

See also: Database Migration Strategy

14) Validate the fix end-to-end

After changes:

  1. restart the app
  2. run a simple read query
  3. test one write path
  4. verify logs stop showing connection failures

Example:

bash
systemctl restart gunicorn
journalctl -u gunicorn -n 100 --no-pager

Or:

bash
docker compose restart app
docker logs <app_container> --tail 100

Common causes

  • Wrong DATABASE_URL or mismatched DB_HOST, DB_PORT, DB_USER, DB_PASSWORD, or DB_NAME
  • Using localhost inside a container when the database runs elsewhere
  • Database service is down, crashed, restarting, or still booting
  • Firewall, security group, or provider allowlist blocks traffic
  • Database listens only on 127.0.0.1
  • Incorrect pg_hba.conf rules in Postgres
  • Missing MySQL host grants
  • SSL/TLS required by provider but disabled in app
  • App pool exhaustion or DB max_connections reached
  • DNS resolution failure after infra changes
  • Secret rotation happened but app still uses old values
  • PgBouncer or pooler mode incompatible with ORM behavior
  • Long-running transactions causing connection starvation

Debugging tips

  • Compare the exact production connection string with the one that works locally.
  • Run all network and DB-client tests from the app container or VM, not your laptop.
  • Match timestamps between app logs and DB logs.
  • If failures only happen during load, inspect pool settings, active connections, and slow queries.
  • In Docker Compose, confirm the resolved environment values and service names.
  • If using managed Postgres or MySQL, verify sslmode=require or required CA settings.
  • Diff the latest deploy for changes in secrets, image tag, migration command, or worker count.
  • Temporary mitigation: reduce app worker count or pool size to lower DB pressure.
  • Check for leaked sessions in background jobs, scripts, or request handlers.
  • If using a proxy or pooler, verify its limits separately from the main database.

Useful commands:

bash
printenv | grep -E 'DATABASE_URL|DB_HOST|DB_PORT|DB_NAME|DB_USER|DB_PASSWORD'
getent hosts $DB_HOST
nslookup $DB_HOST
dig +short $DB_HOST
nc -vz $DB_HOST ${DB_PORT:-5432}
telnet $DB_HOST ${DB_PORT:-5432}
PGPASSWORD="$DB_PASSWORD" psql -h "$DB_HOST" -p "${DB_PORT:-5432}" -U "$DB_USER" -d "$DB_NAME" -c 'select 1;'
mysql -h "$DB_HOST" -P "${DB_PORT:-3306}" -u "$DB_USER" -p"$DB_PASSWORD" -e 'select 1' "$DB_NAME"
docker ps
docker logs <app_container> --tail 200
docker logs <db_container> --tail 200
docker inspect <app_container>
docker exec -it <app_container> sh
systemctl status postgresql
systemctl status mysql
ss -ltnp | grep -E '5432|3306'
sudo ufw status
journalctl -u gunicorn -n 200 --no-pager
journalctl -u postgresql -n 200 --no-pager
journalctl -u mysql -n 200 --no-pager

Checklist

  • Database host resolves from the app runtime environment
  • Configured port is reachable from the app runtime environment
  • Credentials work with native DB client
  • Database is running and accepting external or private-network connections
  • Firewall, security groups, and allowlists permit traffic from the app
  • TLS/SSL settings match provider requirements
  • Connection pool size is reasonable for database max_connections
  • App and worker processes use the same correct database config
  • Migrations completed successfully
  • Logs confirm a successful test query after the fix

For broader release validation, use the SaaS Production Checklist.

FAQ

Why am I getting connection refused?

The database host is reachable, but nothing is listening on that port, or the service is bound to a different interface. Check DB process status, listening sockets, and host/port configuration.

What does connection timeout usually mean?

The app cannot complete the TCP connection in time. Common causes are firewall rules, wrong hostname, private network issues, provider allowlists, or a heavily overloaded database.

Why does authentication fail even though the password looks correct?

The running app may be using a different secret than expected, the user may not have permission for that host or database, or special characters in the connection string may need URL encoding.

How do I fix database errors in Docker Compose?

Use the database service name as DB_HOST, ensure both services share a network, add health checks or retry logic, and verify the app is not trying to connect before the database is ready.

Can too many connections cause random 500 errors?

Yes. When the pool is exhausted or the database reaches max_connections, requests can fail intermittently. Reduce pool size, find slow queries, and close leaked sessions.

Why does the app connect locally but fail in production?

Production differs in hostnames, firewalls, SSL requirements, network topology, startup ordering, and environment variable loading. Verify from the real runtime environment.

Why does localhost fail in Docker?

Inside a container, localhost refers to that container itself. Use the database service name or the external database hostname.

How do I know if this is a credential issue or a network issue?

If TCP connection fails, it is usually network or service reachability. If TCP works but DB client login fails, it is authentication, authorization, or TLS configuration.

Should I add wait-for-db scripts?

They help with startup timing, but they do not fix bad credentials, firewall rules, DNS errors, or wrong DB hosts. Use them only as a short-term readiness mitigation.

Final takeaway

Debug database connection errors in this order:

  1. config
  2. DNS
  3. TCP reachability
  4. authentication
  5. TLS
  6. pool and load behavior

Always test from the actual runtime environment using the same credentials as the app.

Most production failures come from wrong env vars, blocked network access, SSL mismatches, Docker hostname mistakes, or exhausted connections, not from the ORM alone.