Database Connection Errors

The essential playbook for diagnosing and fixing database connection errors in your SaaS.

Use this page when your app cannot connect to the database, shows connection refused or timeout errors, fails after deployment, or intermittently loses DB access in production. The goal is to isolate whether the failure is caused by credentials, network reachability, TLS, connection limits, container networking, or application configuration.

Quick Fix / Quick Setup

Run these checks from the same VM or container where the app is running.

bash
# 1) Verify env vars in this shell (for the exact running process, check /proc/<pid>/environ)
printenv | grep -E 'DATABASE_URL|DB_HOST|DB_PORT|DB_NAME|DB_USER|DB_PASSWORD'

# 2) Test raw TCP reachability to the database host
nc -vz $DB_HOST ${DB_PORT:-5432}

# 3) Test login directly with the DB client
# Postgres
PGPASSWORD="$DB_PASSWORD" psql -h "$DB_HOST" -p "${DB_PORT:-5432}" -U "$DB_USER" -d "$DB_NAME" -c 'select 1;'

# MySQL
mysql -h "$DB_HOST" -P "${DB_PORT:-3306}" -u "$DB_USER" -p"$DB_PASSWORD" -e 'select 1' "$DB_NAME"

# 4) If using Docker, confirm service name and network
docker ps
docker network ls
docker inspect <app_container>

# 5) Restart app only after verifying DB is reachable
systemctl restart gunicorn || docker compose restart app

If direct DB client access fails from the same host or container, this is not an ORM bug. Fix network, credentials, firewall, TLS, or database availability first.

What’s happening

A database connection error means the app failed at one of these layers:

  1. Config load
  2. DNS resolution
  3. TCP socket connection
  4. TLS negotiation
  5. Authentication
  6. Database selection
  7. Connection pool checkout

Typical production error patterns:

  • connection refused
  • connection timed out
  • no route to host
  • password authentication failed
  • database does not exist
  • could not translate host name
  • SSL required
  • too many connections
  • pool timeout or checkout timeout errors

Important runtime differences:

  • In Docker, localhost points to the current container, not the database container.
  • In VPS deployments, firewall rules, bind addresses, and provider allowlists commonly block access.
  • In managed databases, SSL mode and client IP allowlists are frequent failure points.
  • Under load, connection pool exhaustion can appear as intermittent application 500s.
Process Flow

Match the exact error string first, then work through the layers in order: DNS, TCP, auth, TLS, pool exhaustion, DB server limits.

Step-by-step implementation

1) Extract the exact database error

Do not debug a generic 500. Pull the actual database exception from application logs.

bash
# systemd app
journalctl -u gunicorn -n 200 --no-pager

# Docker app
docker logs <app_container> --tail 200

Also check the database logs:

bash
# Postgres
journalctl -u postgresql -n 200 --no-pager

# MySQL
journalctl -u mysql -n 200 --no-pager

# Docker DB
docker logs <db_container> --tail 200

Look for messages that clearly indicate one category:

  • name resolution failure
  • refused connection
  • auth failure
  • SSL requirement
  • max connections
  • pool timeout
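The category match above can be scripted so triage is consistent across incidents. A minimal sketch; the substring patterns are illustrative, not exhaustive, and you should extend them for your driver's actual messages:

```python
import re

# Illustrative error-message patterns -> triage category, checked in order.
# Pool timeouts must be matched before the generic timeout pattern.
PATTERNS = [
    (r"could not translate host name|name or service not known|getaddrinfo", "dns"),
    (r"connection refused", "tcp-refused"),
    (r"pool.*(timeout|timed out)|checkout.*(timeout|timed out)", "pool exhaustion"),
    (r"too many connections", "db server limits"),
    (r"password authentication failed|access denied", "auth"),
    (r"ssl|tls", "tls"),
    (r"timed out|timeout", "tcp-timeout"),
]

def classify(error_text: str) -> str:
    """Map a raw database error string to one triage category."""
    lower = error_text.lower()
    for pattern, category in PATTERNS:
        if re.search(pattern, lower):
            return category
    return "unknown"
```

Run new log lines through this before debugging, so a "pool exhaustion" incident is not mistaken for a network timeout.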

2) Verify runtime configuration

Check what the running process actually sees.

bash
printenv | grep -E 'DATABASE_URL|DB_HOST|DB_PORT|DB_NAME|DB_USER|DB_PASSWORD'

Common issues:

  • old values still loaded in systemd
  • stale .env file in Docker Compose
  • CI/CD secret updated but app not restarted
  • app reading DATABASE_URL while you changed only DB_*
  • password contains special characters and breaks URL parsing

If using a connection URL, verify encoding. Example:

text
postgresql://user:pa%24%24word@db.example.com:5432/appdb

Not:

text
postgresql://user:pa$$word@db.example.com:5432/appdb
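Rather than encoding by hand, build the URL programmatically. A stdlib-only sketch; `build_pg_url` is a hypothetical helper name:

```python
from urllib.parse import quote

def build_pg_url(user: str, password: str, host: str, db: str, port: int = 5432) -> str:
    """Build a Postgres URL, percent-encoding credentials so characters
    like $, @, /, and : cannot break URL parsing."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{db}"
    )

print(build_pg_url("user", "pa$$word", "db.example.com", "appdb"))
# -> postgresql://user:pa%24%24word@db.example.com:5432/appdb
```

This guarantees the encoded form above instead of relying on whoever last rotated the secret to remember the rule.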

3) Test DNS resolution from the app runtime

bash
getent hosts $DB_HOST
nslookup $DB_HOST
dig +short $DB_HOST

If this fails:

  • wrong hostname
  • internal DNS issue
  • stale DNS after infrastructure change
  • service name mismatch in Docker Compose

For Docker Compose, the host should often be the service name:

yaml
services:
  app:
    environment:
      DB_HOST: db

  db:
    image: postgres:16

Not localhost.

4) Test raw TCP connectivity

bash
nc -vz $DB_HOST ${DB_PORT:-5432}
# or
telnet $DB_HOST ${DB_PORT:-5432}

Interpretation:

  • succeeded -> network path works, move to auth/TLS
  • connection refused -> host reachable, port closed or service not listening
  • timeout -> firewall, security group, routing, allowlist, wrong IP, overloaded DB
  • name resolution error -> DNS problem
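Slim app images often ship without nc or telnet. The same interpretation table can be reproduced with a stdlib-only probe; the return labels are illustrative:

```python
import socket

def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify TCP reachability roughly the way nc -vz output would."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # network path works, move to auth/TLS
    except socket.gaierror:
        return "dns-failure"       # hostname did not resolve
    except ConnectionRefusedError:
        return "refused"           # host reachable, nothing listening on port
    except TimeoutError:
        return "timeout"           # firewall, allowlist, routing, or overload
    except OSError:
        return "unreachable"       # e.g. no route to host

# Usage: probe("db.example.com", 5432)
```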

If the app runs in Docker, enter the container and test from there:

bash
docker exec -it <app_container> sh
nc -vz $DB_HOST ${DB_PORT:-5432}

5) Test direct database authentication

Postgres

bash
PGPASSWORD="$DB_PASSWORD" psql \
  -h "$DB_HOST" \
  -p "${DB_PORT:-5432}" \
  -U "$DB_USER" \
  -d "$DB_NAME" \
  -c 'select 1;'

MySQL

bash
mysql \
  -h "$DB_HOST" \
  -P "${DB_PORT:-3306}" \
  -u "$DB_USER" \
  -p"$DB_PASSWORD" \
  -e 'select 1' \
  "$DB_NAME"

If TCP works but login fails, focus on:

  • wrong username or password
  • wrong database name
  • host-based access rules
  • missing grants
  • SSL mode mismatch
  • stale secret in runtime environment

6) Check database server health

Service status

bash
systemctl status postgresql
systemctl status mysql

Listening sockets

bash
ss -ltnp | grep -E '5432|3306'

You want to confirm the DB is listening on the expected interface.

Examples:

  • 127.0.0.1:5432 only -> reachable only locally on that host
  • 0.0.0.0:3306 or private IP -> reachable externally depending on firewall rules

7) Validate server-side access configuration

Postgres

Check listen_addresses:

conf
# postgresql.conf
listen_addresses = '*'
port = 5432

Check host rules in pg_hba.conf:

conf
# TYPE  DATABASE   USER      ADDRESS           METHOD
host    appdb      appuser   10.0.0.0/24       scram-sha-256

Reload after changes:

bash
sudo systemctl reload postgresql

MySQL

Check bind address:

conf
[mysqld]
bind-address = 0.0.0.0
port = 3306

Verify grants:

sql
SHOW GRANTS FOR 'appuser'@'%';

Grant example (MySQL 8+ requires creating the user before granting; `GRANT ... IDENTIFIED BY` was removed):

sql
CREATE USER IF NOT EXISTS 'appuser'@'10.%' IDENTIFIED BY 'strong-password';
GRANT ALL PRIVILEGES ON appdb.* TO 'appuser'@'10.%';
FLUSH PRIVILEGES;

For managed DBs, use provider network allowlists instead of opening public access broadly.

8) Verify firewall and network policy

On VPS:

bash
sudo ufw status

Cloud platforms may also require security group or network rule updates.

Check that the app server IP is allowed to reach the DB port:

  • Postgres: 5432
  • MySQL: 3306

If managed DB access works locally but not from production, this is often an allowlist problem.

9) Check TLS / SSL mode

Managed providers often require SSL. If your app disables it, auth may fail or the connection may be rejected.

Examples:

SQLAlchemy Postgres URL

bash
DATABASE_URL="postgresql+psycopg://appuser:password@db.example.com:5432/appdb?sslmode=require"

SQLAlchemy engine (picks up sslmode from the URL above)

python
import os
from sqlalchemy import create_engine

engine = create_engine(
    os.environ["DATABASE_URL"],
    pool_pre_ping=True,
)

MySQL SQLAlchemy URL

bash
# PyMySQL enables TLS when a CA path is supplied
DATABASE_URL="mysql+pymysql://appuser:password@db.example.com:3306/appdb?ssl_ca=/etc/ssl/certs/ca-certificates.crt"

If your provider requires a CA bundle, pass it explicitly according to your driver.

10) Fix Docker and Compose networking

Common working Compose example:

yaml
services:
  app:
    build: .
    environment:
      DB_HOST: db
      DB_PORT: 5432
      DB_NAME: appdb
      DB_USER: appuser
      DB_PASSWORD: secret
    depends_on:
      db:
        condition: service_healthy

  db:
    image: postgres:16
    environment:
      POSTGRES_DB: appdb
      POSTGRES_USER: appuser
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser -d appdb"]
      interval: 5s
      timeout: 5s
      retries: 10

Checks:

bash
docker ps
docker network ls
docker inspect <app_container>
docker inspect <db_container>

Do not assume depends_on alone solves readiness unless you also use health checks or app retry logic.

11) Check pool exhaustion and max connections

If errors only appear under traffic spikes, inspect connection counts and pool settings.

Typical symptoms:

  • app works after restart, then fails again under load
  • random 500s
  • timeout waiting for connection from pool
  • DB reports too many connections

SQLAlchemy example

python
from sqlalchemy import create_engine

engine = create_engine(
    DATABASE_URL,
    pool_size=5,
    max_overflow=10,
    pool_timeout=30,
    pool_recycle=1800,
    pool_pre_ping=True,
)

Guidelines for small SaaS apps:

  • keep pool size conservative
  • multiply pool size by total worker/process count
  • ensure total possible app connections stay below DB max_connections
  • enable pool_pre_ping=True for stale dropped connections

Example planning:

  • 2 Gunicorn workers
  • pool_size=5
  • max_overflow=5

Potential peak app connections: 2 * (5 + 5) = 20

If the DB only supports 25 total connections and background jobs also use the DB, this can fail quickly.
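The planning arithmetic above is worth encoding so it runs in CI or a deploy check. A small sketch; the function names and `reserved_for_jobs` parameter are illustrative:

```python
def peak_app_connections(workers: int, pool_size: int, max_overflow: int) -> int:
    """Worst case: every worker fills its pool plus overflow at once."""
    return workers * (pool_size + max_overflow)

def fits(db_max_connections: int, workers: int, pool_size: int,
         max_overflow: int, reserved_for_jobs: int = 0) -> bool:
    """True if the app's worst case plus background-job connections
    stays strictly below the server's max_connections."""
    peak = peak_app_connections(workers, pool_size, max_overflow)
    return peak + reserved_for_jobs < db_max_connections

print(peak_app_connections(2, 5, 5))          # -> 20
print(fits(25, 2, 5, 5, reserved_for_jobs=5))  # -> False: 20 + 5 is not below 25
```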

12) Add startup retry only after fixing root cause

If the app starts before the DB is ready, a short retry loop helps. It does not replace correct networking or credentials.

Example shell entrypoint:

bash
#!/usr/bin/env sh
set -e

until nc -z "$DB_HOST" "${DB_PORT:-5432}"; do
  echo "waiting for db..."
  sleep 2
done

exec gunicorn app:app --bind 0.0.0.0:8000

Prefer application-level retry for transient startup conditions.
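An application-level retry can stay driver-agnostic. A hedged sketch where `connect_fn` stands in for whatever actually opens your connection (for example `psycopg.connect(...)` or `engine.connect`):

```python
import time

def connect_with_retry(connect_fn, attempts: int = 10, delay: float = 2.0):
    """Call connect_fn until it succeeds or attempts are exhausted.
    A successful connection returns immediately; only failures sleep."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect_fn()
        except Exception as exc:  # narrow this to your driver's OperationalError in real code
            last_error = exc
            print(f"db not ready (attempt {attempt}/{attempts}): {exc}")
            time.sleep(delay)
    raise RuntimeError("database never became ready") from last_error
```

Catching a narrow driver exception matters: a wrong password should fail fast, not retry ten times and mask the real error.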

13) Verify migrations are not blocking startup

If your deploy runs migrations on boot, a failing migration can make the app look like it has a plain DB connectivity problem.

Check migration logs and run manually if needed.

See also: Database Migration Strategy

14) Validate the fix end-to-end

After changes:

  1. restart the app
  2. run a simple read query
  3. test one write path
  4. verify logs stop showing connection failures

Example:

bash
systemctl restart gunicorn
journalctl -u gunicorn -n 100 --no-pager

Or:

bash
docker compose restart app
docker logs <app_container> --tail 100

Common causes

  • Wrong DATABASE_URL or mismatched DB_HOST, DB_PORT, DB_USER, DB_PASSWORD, or DB_NAME
  • Using localhost inside a container when the database runs elsewhere
  • Database service is down, crashed, restarting, or still booting
  • Firewall, security group, or provider allowlist blocks traffic
  • Database listens only on 127.0.0.1
  • Incorrect pg_hba.conf rules in Postgres
  • Missing MySQL host grants
  • SSL/TLS required by provider but disabled in app
  • App pool exhaustion or DB max_connections reached
  • DNS resolution failure after infra changes
  • Secret rotation happened but app still uses old values
  • PgBouncer or pooler mode incompatible with ORM behavior
  • Long-running transactions causing connection starvation

Debugging tips

  • Compare the exact production connection string with the one that works locally.
  • Run all network and DB-client tests from the app container or VM, not your laptop.
  • Match timestamps between app logs and DB logs.
  • If failures only happen during load, inspect pool settings, active connections, and slow queries.
  • In Docker Compose, confirm the resolved environment values and service names.
  • If using managed Postgres or MySQL, verify sslmode=require or required CA settings.
  • Diff the latest deploy for changes in secrets, image tag, migration command, or worker count.
  • Temporary mitigation: reduce app worker count or pool size to lower DB pressure.
  • Check for leaked sessions in background jobs, scripts, or request handlers.
  • If using a proxy or pooler, verify its limits separately from the main database.

Useful commands:

bash
printenv | grep -E 'DATABASE_URL|DB_HOST|DB_PORT|DB_NAME|DB_USER|DB_PASSWORD'
getent hosts $DB_HOST
nslookup $DB_HOST
dig +short $DB_HOST
nc -vz $DB_HOST ${DB_PORT:-5432}
telnet $DB_HOST ${DB_PORT:-5432}
PGPASSWORD="$DB_PASSWORD" psql -h "$DB_HOST" -p "${DB_PORT:-5432}" -U "$DB_USER" -d "$DB_NAME" -c 'select 1;'
mysql -h "$DB_HOST" -P "${DB_PORT:-3306}" -u "$DB_USER" -p"$DB_PASSWORD" -e 'select 1' "$DB_NAME"
docker ps
docker logs <app_container> --tail 200
docker logs <db_container> --tail 200
docker inspect <app_container>
docker exec -it <app_container> sh
systemctl status postgresql
systemctl status mysql
ss -ltnp | grep -E '5432|3306'
sudo ufw status
journalctl -u gunicorn -n 200 --no-pager
journalctl -u postgresql -n 200 --no-pager
journalctl -u mysql -n 200 --no-pager

Checklist

  • Database host resolves from the app runtime environment
  • Configured port is reachable from the app runtime environment
  • Credentials work with native DB client
  • Database is running and accepting external or private-network connections
  • Firewall, security groups, and allowlists permit traffic from the app
  • TLS/SSL settings match provider requirements
  • Connection pool size is reasonable for database max_connections
  • App and worker processes use the same correct database config
  • Migrations completed successfully
  • Logs confirm a successful test query after the fix

For broader release validation, use the SaaS Production Checklist.

FAQ

Why am I getting connection refused?

The database host is reachable, but nothing is listening on that port, or the service is bound to a different interface. Check DB process status, listening sockets, and host/port configuration.

What does connection timeout usually mean?

The app cannot complete the TCP connection in time. Common causes are firewall rules, wrong hostname, private network issues, provider allowlists, or a heavily overloaded database.

Why does authentication fail even though the password looks correct?

The running app may be using a different secret than expected, the user may not have permission for that host or database, or special characters in the connection string may need URL encoding.

How do I fix database errors in Docker Compose?

Use the database service name as DB_HOST, ensure both services share a network, add health checks or retry logic, and verify the app is not trying to connect before the database is ready.

Can too many connections cause random 500 errors?

Yes. When the pool is exhausted or the database reaches max_connections, requests can fail intermittently. Reduce pool size, find slow queries, and close leaked sessions.

Why does the app connect locally but fail in production?

Production differs in hostnames, firewalls, SSL requirements, network topology, startup ordering, and environment variable loading. Verify from the real runtime environment.

Why does localhost fail in Docker?

Inside a container, localhost refers to that container itself. Use the database service name or the external database hostname.

How do I know if this is a credential issue or a network issue?

If TCP connection fails, it is usually network or service reachability. If TCP works but DB client login fails, it is authentication, authorization, or TLS configuration.

Should I add wait-for-db scripts?

They help with startup timing, but they do not fix bad credentials, firewall rules, DNS errors, or wrong DB hosts. Use them only as a short-term readiness mitigation.

Final takeaway

Debug database connection errors in this order:

  1. config
  2. DNS
  3. TCP reachability
  4. authentication
  5. TLS
  6. pool and load behavior

Always test from the actual runtime environment using the same credentials as the app.

Most production failures come from wrong env vars, blocked network access, SSL mismatches, Docker hostname mistakes, or exhausted connections, not from the ORM alone.