Monitoring Checklist
The essential playbook for implementing baseline monitoring in your SaaS.
Use this checklist to verify your SaaS has the minimum monitoring needed to detect failures, debug issues fast, and respond before users report problems. This page is built for small deployments on VPS, Docker, or simple cloud setups.
Quick Fix / Quick Setup
Monitoring baseline quick setup
1. Error tracking
- Install Sentry in app backend
- Set SENTRY_DSN in production
- Verify one test exception reaches dashboard
2. Uptime checks
- Add external checks for:
- homepage or app URL
- /health endpoint
- API base endpoint
- Alert to email or Slack
3. Structured logs
- Send app logs to stdout or journald
- Capture Nginx/Gunicorn logs
- Include request_id, user_id, path, status_code
4. Metrics
- Track at minimum:
- CPU
- memory
- disk
- restart count
- request latency
- error rate
- DB connections
5. Alerts
- Alert on:
- uptime failure
- 5xx spike
- app crash/restart loop
- disk > 85%
- memory > 90%
- queue backlog growth
6. Runbook
- Document where to check:
- app logs
- web server logs
- Sentry
- uptime monitor
- database status
- recent deploys
If you only do five things today: enable Sentry, create a /health endpoint, add uptime checks, centralize logs, and configure alerts for 5xx errors and server resource exhaustion.
What’s happening
Monitoring is your minimum production visibility layer.
Without it, failures are usually discovered by users first. That creates slower incident response, longer outages, and poor debugging after deploys.
For a small SaaS, the baseline is not full observability. It is enough signal to answer these questions fast:
- Is the app up?
- Is the app healthy?
- Are users getting errors?
- Did the latest deploy break something?
- Is the server running out of memory, disk, or CPU?
- Are background jobs stuck?
- Where do I look first during an incident?
This checklist covers:
- Application error tracking
- Server and process health
- HTTP uptime and endpoint checks
- Performance and latency baselines
- Database visibility
- Background job visibility
- Actionable alerts
- Incident response readiness
Step-by-step implementation
1. Create a health endpoint
Create a fast endpoint that reports basic app readiness and dependency status.
Minimum requirements:
- returns 200 when healthy
- checks database connectivity
- includes version or commit SHA
- avoids expensive queries
Example JSON response:
{
"status": "ok",
"service": "app",
"version": "git-sha-or-release-id",
"database": "ok",
"timestamp": "2026-04-20T12:00:00Z"
}
Basic test:
curl -i https://yourdomain.com/health
curl -sS https://yourdomain.com/health | jq .
If your health endpoint includes dependency checks, keep them lightweight. Do not turn it into a slow diagnostic page.
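A framework-agnostic sketch of the payload builder behind that JSON response. `check_database` is a placeholder for whatever cheap connectivity check your stack provides (for example, `SELECT 1`):

```python
from datetime import datetime, timezone

def build_health_payload(check_database, version="git-sha-or-release-id"):
    """Build the /health response body. check_database is any cheap
    callable that returns truthy when the DB answers a trivial query."""
    try:
        db_ok = bool(check_database())
    except Exception:
        db_ok = False
    return {
        "status": "ok" if db_ok else "degraded",
        "service": "app",
        "version": version,
        "database": "ok" if db_ok else "error",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

# Wire this into any framework: return HTTP 200 when status is "ok",
# 503 otherwise, with the dict serialized as JSON.
```

Catching the DB exception inside the endpoint matters: a health check that itself throws a 500 on DB failure tells your uptime monitor less than a clean "degraded" response.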
2. Add external uptime checks
Monitor from outside your infrastructure.
At minimum, check:
- main app URL
- /health endpoint
- API base endpoint if separate
Example targets:
https://yourdomain.com/
https://yourdomain.com/health
https://api.yourdomain.com/health
Recommended alert routing:
- email for solo operators
- Slack for active team visibility
- pager tool if this is revenue-critical
Use consecutive failure thresholds to avoid noisy alerts.
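The consecutive-failure logic is what keeps checks quiet; most hosted uptime tools implement something like this sketch (the probe uses stdlib `urllib`, and the alert decision is a pure function you can tune):

```python
from collections import deque
from urllib.request import urlopen

def probe(url, timeout=5):
    """Return True if the endpoint answers with a 2xx status."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        return False

def should_alert(history, threshold=3):
    """Alert only after `threshold` consecutive failures, so a single
    dropped check or network blip does not page anyone."""
    recent = list(history)[-threshold:]
    return len(recent) == threshold and not any(recent)

# Rolling window of probe results per target:
history = deque(maxlen=10)
# each check interval:
#   history.append(probe("https://yourdomain.com/health"))
#   if should_alert(history): send the email/Slack notification
```

A threshold of 3 checks at a 1-minute interval means roughly 3 minutes to first alert, which is a reasonable trade-off for a small SaaS.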
3. Install error tracking
Use app-level exception tracking such as Sentry.
Backend checklist:
- install SDK
- set SENTRY_DSN in production
- set environment to production
- set release/version tag
- send one test exception
Example env:
SENTRY_DSN=https://examplePublicKey@o0.ingest.sentry.io/0
SENTRY_ENVIRONMENT=production
SENTRY_RELEASE=git-sha-or-version
Verify with a deliberate test error after deploy.
If you have a browser app, add frontend error tracking too.
4. Standardize logs
Use one consistent log format across app, proxy, and workers.
Include:
- timestamp
- severity
- request_id
- user_id if available
- path
- method
- status_code
- latency
- exception name and stack trace
Preferred outputs:
- stdout for containers
- journald for systemd-managed services on VPS
- reverse proxy access and error logs retained separately
Example JSON log line:
{
"ts": "2026-04-20T12:00:00Z",
"level": "error",
"request_id": "req_123",
"user_id": "user_456",
"method": "POST",
"path": "/api/orders",
"status_code": 500,
"message": "database timeout"
}
You need access to all of these log sources:
- application logs
- Nginx logs
- Gunicorn or app server logs
- worker logs
- scheduler or cron logs
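The JSON log line above can be produced with a stdlib-only formatter; a minimal sketch, assuming the per-request fields are passed via `extra=`:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line. Request-scoped fields
    (request_id, user_id, ...) are attached via logger's `extra=`."""
    def format(self, record):
        line = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
        }
        for key in ("request_id", "user_id", "method", "path", "status_code"):
            if hasattr(record, key):
                line[key] = getattr(record, key)
        if record.exc_info:
            line["exception"] = self.formatException(record.exc_info)
        return json.dumps(line)

handler = logging.StreamHandler()  # stdout/stderr for containers/journald
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("database timeout",
             extra={"request_id": "req_123", "method": "POST",
                    "path": "/api/orders", "status_code": 500})
```

One JSON object per line keeps the output greppable and trivially parseable by any log shipper later.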
5. Make logs easy to inspect
For systemd-based VPS:
systemctl status nginx
systemctl status gunicorn
systemctl status celery
journalctl -u nginx -n 200 --no-pager
journalctl -u gunicorn -n 200 --no-pager
journalctl -u celery -n 200 --no-pager
For Docker:
docker ps
docker logs --tail=200 <container_name>
docker stats --no-stream
For Nginx file logs:
grep ' 5[0-9][0-9] ' /var/log/nginx/access.log | tail -n 50
tail -n 200 /var/log/nginx/error.log
nginx -t
6. Track infrastructure metrics
At minimum, capture:
- CPU
- memory
- disk
- load
- network
- process restarts
- container restarts
Useful commands for direct inspection:
df -h
free -m
top
htop
uptime
ss -tulpn
ps aux --sort=-%mem | head
ps aux --sort=-%cpu | head
For a small SaaS, lightweight options are enough:
- Netdata
- node_exporter with a basic dashboard
- platform-provided metrics on managed services
7. Track app metrics
Add app-level metrics where possible:
- request count
- p95 latency
- 4xx rate
- 5xx rate
- queue depth
- failed jobs
- DB pool usage
- cache hit rate if used
If you cannot instrument everything now, prioritize:
- request latency
- 5xx rate
- worker failures
- DB connections
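Even without a metrics stack, an in-process rolling counter gives you the 5xx rate; a sketch, where the middleware hook is whatever your framework provides:

```python
from collections import deque

class ErrorRateWindow:
    """Track the 5xx rate over the last `size` requests."""
    def __init__(self, size=1000):
        self.statuses = deque(maxlen=size)

    def record(self, status_code):
        self.statuses.append(status_code)

    def error_rate(self):
        """Fraction of recent requests that returned 5xx."""
        if not self.statuses:
            return 0.0
        errors = sum(1 for s in self.statuses if s >= 500)
        return errors / len(self.statuses)

window = ErrorRateWindow(size=1000)
# call window.record(response.status_code) from middleware,
# and alert when window.error_rate() exceeds your baseline
```

A fixed-size window over recent requests is deliberately simple: it reacts to bursts and forgets old traffic without any time bookkeeping.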
8. Monitor database health
You do not need full DBA tooling for an MVP, but you need to know if the database is unavailable or exhausted.
Track:
- database reachable/unreachable
- connection usage
- slow queries if supported
- storage growth
- replication status if relevant
Health endpoint should verify basic DB connectivity, but alerts should also exist outside the app if possible.
9. Monitor background jobs and workers
If you use queues, cron jobs, or workers, monitor them separately.
Track:
- worker process count
- failed jobs
- queue depth
- oldest job age
- restart loops
If web requests are healthy but async jobs are failing, users still experience production issues.
10. Configure alerts
Set threshold-based alerts for the first set of real failure conditions.
Alert on:
- main site down
- health endpoint failing multiple times
- 5xx burst
- app crash or restart loop
- disk usage above 85%
- memory above 90%
- DB unavailable
- DB connection exhaustion
- queue backlog growth
- webhook failures for critical integrations
Avoid low-value noisy alerts until you know your baseline.
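The disk and memory thresholds above can be checked from a small cron script; a sketch using stdlib `shutil` for disk (the memory percentage is platform-specific, so it is left as an input you would fill from `free -m` or `/proc/meminfo`):

```python
import shutil

def disk_usage_percent(path="/"):
    """Percentage of the filesystem used at `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def check_thresholds(disk_pct, mem_pct, disk_limit=85.0, mem_limit=90.0):
    """Return alert messages for any breached threshold
    (85% disk / 90% memory, matching the alert list above)."""
    alerts = []
    if disk_pct > disk_limit:
        alerts.append(f"disk usage {disk_pct:.0f}% > {disk_limit:.0f}%")
    if mem_pct > mem_limit:
        alerts.append(f"memory usage {mem_pct:.0f}% > {mem_limit:.0f}%")
    return alerts

if __name__ == "__main__":
    # run from cron; pipe any output to mail or a Slack webhook
    for alert in check_thresholds(disk_usage_percent(), mem_pct=0.0):
        print(alert)
```

Printing only on breach pairs well with cron's default behavior of mailing any output, so silence means healthy.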
11. Add deploy markers
Tag monitoring data with version or release metadata.
Add:
- commit SHA in health endpoint
- release tag in Sentry
- deploy timestamp in logs
- release note marker in monitoring dashboards if supported
This reduces time-to-root-cause after regressions.
12. Create an incident runbook
Document exactly where to look first.
Minimum runbook sections:
- uptime monitor URL
- Sentry project URL
- dashboard URL
- app log command
- Nginx log command
- worker log command
- rollback command or deploy command
- database status page or admin access path
- owner contact details
13. Test the full monitoring chain
Force one event for each path:
- one application exception
- one failed health check
- one alert delivery test
If possible, verify:
- event reaches Sentry
- uptime tool marks endpoint failed
- alert arrives in email/Slack
- logs show the event
- version tag is visible
14. Review alert noise weekly
For small deployments, alert quality matters more than alert quantity.
Each week, review:
- duplicate alerts
- false positives
- alerts with no clear owner
- thresholds too strict or too loose
- missing checks discovered during recent incidents
Common causes
Common monitoring gaps:
- No external uptime check configured, so outages are discovered by users first.
- Health endpoint exists but does not verify database or dependency health.
- Error tracking installed only in development or missing DSN in production.
- Logs are split across app, proxy, and workers with no consistent access path.
- Alerts are too noisy, so important notifications are ignored.
- No monitoring for background workers, scheduled jobs, or queue backlog.
- No visibility into disk usage, causing failures when logs or uploads fill storage.
- Deploys are not tagged in monitoring tools, making regressions hard to trace.
- Metrics are collected but no thresholds or alerts are defined.
- Monitoring is configured once and never tested after infrastructure changes.
Debugging tips
Use these commands during setup and incident response.
curl -i https://yourdomain.com/health
curl -sS https://yourdomain.com/health | jq .
systemctl status nginx
systemctl status gunicorn
systemctl status celery
journalctl -u nginx -n 200 --no-pager
journalctl -u gunicorn -n 200 --no-pager
journalctl -u celery -n 200 --no-pager
docker ps
docker logs --tail=200 <container_name>
docker stats --no-stream
df -h
free -m
top
htop
uptime
ss -tulpn
ps aux --sort=-%mem | head
ps aux --sort=-%cpu | head
nginx -t
curl -I https://yourdomain.com
grep ' 5[0-9][0-9] ' /var/log/nginx/access.log | tail -n 50
tail -n 200 /var/log/nginx/error.log
Fast triage order:
- Check external uptime status
- Hit /health
- Inspect recent deploys/releases
- Check app logs
- Check error tracker
- Check Nginx and worker logs
- Check CPU, memory, disk
- Check DB connectivity and queue backlog
Checklist
Use this before launch and after infrastructure changes.
- ✓ Health endpoint exists and is reachable without authentication or with controlled access.
- ✓ External uptime checks are configured for the main app and API.
- ✓ Error tracking is installed and a test event has been verified.
- ✓ Application logs are structured and retained.
- ✓ Nginx, Gunicorn, Docker, systemd, or platform logs are accessible.
- ✓ Server metrics are visible in one dashboard.
- ✓ Request latency and 5xx rates are tracked.
- ✓ Database health and connection usage are monitored.
- ✓ Background jobs and queue backlog are monitored.
- ✓ Alert channels are configured and tested.
- ✓ Recent deploy version or commit SHA is visible in logs or monitoring tools.
- ✓ A documented incident response path exists.
- ✓ On-call or owner contact details are current.
- ✓ Log retention and privacy rules are defined.
- ✓ Monitoring has been tested after the latest deployment.
Common setup patterns for small SaaS
- VPS setup: UptimeRobot or Better Stack for uptime, Sentry for errors, journald plus Nginx logs for logs, and a lightweight metrics agent such as Netdata or node_exporter.
- Docker setup: container stdout logs, Docker restart monitoring, cAdvisor or platform metrics, Sentry for app exceptions, and external uptime checks.
- Managed platform setup: use platform logs and metrics, still add external uptime checks, and keep app-level error tracking enabled.
- Background worker setup: monitor worker process count, failed jobs, queue depth, and oldest queued job age.
What to alert on first
- Main site down or health endpoint failing for multiple consecutive checks.
- Burst of 5xx responses over a short window.
- App process crashing or restarting repeatedly.
- Disk usage approaching full capacity.
- Memory exhaustion or swap thrashing.
- Database connection exhaustion or DB unavailable.
- Job queue backlog increasing without recovery.
- Webhook endpoint failure spikes for payments or integrations.
Diagram: request flow from user to Nginx to app to DB with monitoring touchpoints marked.
Related guides
- Logging Setup (Application + Server)
- Error Tracking with Sentry
- Uptime Monitoring Setup
- Alerting System Setup
- Incident Response Playbook
- SaaS Production Checklist
- Security Checklist
- Auth System Checklist
FAQ
What is the minimum monitoring stack for an MVP SaaS?
At minimum: external uptime checks, a health endpoint, structured app and server logs, error tracking such as Sentry, and basic server metrics with alerts for downtime, 5xx spikes, memory, and disk usage.
Should I monitor both the homepage and a health endpoint?
Yes. The homepage confirms user-facing availability. The health endpoint confirms application readiness and can include dependency checks such as database connectivity.
Do I need full observability tooling for a small SaaS?
No. Start with simple monitoring that you will actually check and maintain. Add tracing and advanced metrics only when traffic, complexity, or team size justifies it.
How often should I test alerts?
Test after initial setup, after major deployment or infrastructure changes, and on a recurring schedule such as monthly. Untested alerts are unreliable.
What should a health endpoint return?
A 200 response for healthy state, optional JSON with service name, version, timestamp, and dependency checks. Keep it fast and avoid expensive operations.
How long should logs be retained?
Enough to investigate incidents and regressions. For small SaaS products, start with at least 7 to 30 days depending on cost, legal requirements, and traffic volume.
Final takeaway
Monitoring is not one tool. You need coverage across uptime, logs, exceptions, metrics, and alerts.
For a small SaaS, the minimum production standard is simple:
- health endpoint
- uptime checks
- structured logs
- error tracking
- server metrics
- tested alerts
If an issue happens and you do not know where to look in the first two minutes, the monitoring setup is incomplete.