Uptime Monitoring Setup — SaaS Builder Playbooks

Uptime monitoring is the minimum production safety net. For a small SaaS, you need an external monitor that checks your app from outside your infrastructure, hits a stable health endpoint, and alerts you fast when the app is down or degraded.

This setup is enough for most MVPs and small production apps:

one public health endpoint
one external uptime provider
one alert channel you actually watch
checks for main domain, API domain, and SSL/TLS

If you do nothing else for production visibility, do this first.

Quick Fix / Quick Setup

Add a minimal public health endpoint and monitor it externally.

python

# 1) Add a minimal health endpoint

# FastAPI
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

@app.get("/healthz")
def healthz():
    return JSONResponse({"status": "ok"}, status_code=200)

python

# Flask
from flask import Flask, jsonify

app = Flask(__name__)

@app.get("/healthz")
def healthz():
    return jsonify({"status": "ok"}), 200

Verify it from outside:

bash

curl -i https://yourdomain.com/healthz

Monitor these externally every 1 minute if possible:

text

https://yourdomain.com/healthz
https://www.yourdomain.com/healthz
https://api.yourdomain.com/healthz

Alert on:

non-200 response
no response within 10 seconds
SSL certificate problems

Send the alert only after 2 or more consecutive failures, so a single network blip does not page you.

Use an external provider, not a cron job on the same VPS. Start with one public health endpoint, one alert channel, and multi-region checks if available.
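To make the alert rules above concrete, here is a minimal sketch of what a single check evaluates. It is only an illustration for manual spot checks, not a replacement for an external provider. The names `ProbeResult`, `classify`, and `probe` are hypothetical, and the 10-second limit is the value suggested above.

```python
import http.client
import ssl
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    ok: bool
    reason: str
    status: Optional[int] = None
    elapsed_s: Optional[float] = None


def classify(status: int, elapsed_s: float, timeout_s: float = 10.0) -> ProbeResult:
    """Apply the alert rules to one completed HTTP response."""
    if elapsed_s > timeout_s:
        return ProbeResult(False, "timeout", status, elapsed_s)
    if status != 200:
        # Redirects (301/302) count as failures: /healthz should answer 200 directly.
        return ProbeResult(False, "non-200 response", status, elapsed_s)
    return ProbeResult(True, "ok", status, elapsed_s)


def probe(host: str, path: str = "/healthz", timeout_s: float = 10.0) -> ProbeResult:
    """Run one HTTPS check. http.client does not follow redirects."""
    start = time.monotonic()
    conn = http.client.HTTPSConnection(host, timeout=timeout_s)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except ssl.SSLError as exc:
        # Must come before OSError, because ssl.SSLError is a subclass of it.
        return ProbeResult(False, f"SSL certificate problem: {exc}")
    except (OSError, http.client.HTTPException) as exc:
        # Covers DNS failures, refused connections, and socket timeouts.
        return ProbeResult(False, f"connection failed: {exc}")
    finally:
        conn.close()
    return classify(status, time.monotonic() - start, timeout_s)
```

Calling `probe("yourdomain.com")` from a machine outside your infrastructure gives a quick second opinion when a provider alert looks suspicious.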

What’s happening

Uptime monitoring checks whether your app is reachable over the public internet.

It catches issues that app logs alone may miss:

DNS failures
expired or invalid SSL certificates
reverse proxy misconfiguration
crashed app processes
bad deploys
broken routing
partial regional outages

A useful uptime setup should:

run outside your infrastructure
hit a lightweight endpoint
alert quickly
avoid false positives from one-off network blips

For a small SaaS, monitor at minimum:

main app domain
API domain
critical user-facing paths if they matter to revenue or auth flows

Process flow: monitor → DNS → CDN/WAF → Nginx/load balancer → app → database

Step-by-step implementation

1) Define what "up" means

For most small SaaS deployments, basic uptime means:

DNS resolves
HTTPS works
reverse proxy accepts traffic
app responds successfully
routing is correct

This is what /healthz should confirm.

If you also want dependency checks like database access, expose a separate readiness endpoint such as /readyz.

2) Add a public `/healthz` endpoint

Keep it cheap and stable.

Good response:

json

{"status":"ok"}

Avoid in /healthz:

expensive queries
third-party API calls
full-page rendering
auth requirements
redirects

Example FastAPI:

python

from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

@app.get("/healthz")
def healthz():
    return JSONResponse({"status": "ok"}, status_code=200)

@app.get("/readyz")
def readyz():
    # Optional deeper check
    # Verify DB or queue only if needed
    return JSONResponse({"status": "ready"}, status_code=200)

Example Flask:

python

from flask import Flask, jsonify

app = Flask(__name__)

@app.get("/healthz")
def healthz():
    return jsonify({"status": "ok"}), 200

@app.get("/readyz")
def readyz():
    return jsonify({"status": "ready"}), 200
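The /readyz handlers above return a static response. If you do want a dependency-aware check, one possible shape is sketched below. It assumes SQLite as a stand-in for your real database, and `check_ready` is a hypothetical helper name. Swap in your own driver and its exception type.

```python
import sqlite3
from typing import Callable, Dict, Tuple


def check_ready(connect: Callable[[], sqlite3.Connection]) -> Tuple[Dict[str, str], int]:
    """Return a (JSON body, HTTP status) pair for a /readyz handler."""
    try:
        conn = connect()
        try:
            # Cheapest possible round trip to the database.
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
    except sqlite3.Error:
        # 503 signals that the process is running but a dependency is unavailable.
        return {"status": "unavailable"}, 503
    return {"status": "ready"}, 200
```

In Flask the handler body could then be `body, code = check_ready(connect); return jsonify(body), code`. Keep /healthz independent of this check so a transient database issue does not mark the whole app as down.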

3) Expose the endpoint through your production stack

If you use Nginx in front of Gunicorn/Uvicorn, make sure /healthz is reachable through the public domain.

Example Nginx server block:

nginx

server {
    listen 80;
    server_name yourdomain.com www.yourdomain.com api.yourdomain.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    server_name yourdomain.com www.yourdomain.com api.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Test it externally:

bash

curl -i https://yourdomain.com/healthz
curl -i https://www.yourdomain.com/healthz
curl -i https://api.yourdomain.com/healthz

If your deployment stack is not stable yet, see Deploy SaaS with Nginx + Gunicorn.

4) Create external uptime checks

Use any external uptime provider that supports:

HTTPS checks
custom interval
SSL monitoring
multi-region checks
email or chat alerts

Recommended initial settings:

interval: 60 seconds
timeout: 10 seconds
expected status: 200
failure threshold: 2 consecutive failures
recovery notifications: enabled
region checks: multiple if available
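The failure threshold and recovery settings above behave like a small state machine. The sketch below shows that logic under the assumption of a threshold of 2; `MonitorState` is a hypothetical name, and real providers may implement this differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitorState:
    failure_threshold: int = 2
    consecutive_failures: int = 0
    alerting: bool = False

    def record(self, check_ok: bool) -> Optional[str]:
        """Record one check result; return "alert", "recovery", or None."""
        if check_ok:
            self.consecutive_failures = 0
            if self.alerting:
                self.alerting = False
                return "recovery"
            return None
        self.consecutive_failures += 1
        if not self.alerting and self.consecutive_failures >= self.failure_threshold:
            # Fire once when the threshold is reached, not on every failed check.
            self.alerting = True
            return "alert"
        return None


state = MonitorState()
events = [state.record(ok) for ok in [True, False, True, False, False, False, True]]
```

A single failed check followed by a success produces no event, which is how the threshold filters out one-off network blips.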

Monitor these separately:

https://yourdomain.com/healthz
https://www.yourdomain.com/healthz
https://api.yourdomain.com/healthz

Do not combine all services into one health endpoint if they are independently exposed.

5) Add SSL certificate monitoring

Enable certificate expiry and TLS validation alerts if your provider supports them.

This catches:

expired certs
broken renewal
invalid chain
hostname mismatch

Manual verification:

bash

openssl s_client -connect yourdomain.com:443 -servername yourdomain.com
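If you want to script the expiry part of this check, the following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the 14-day warning window mentioned afterwards is an assumption, not a value from this guide.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Convert a certificate notAfter string to days remaining (negative if expired)."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout_s: float = 10.0) -> str:
    """Open a verified TLS connection and return the leaf certificate's notAfter field."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, `days_until_expiry(fetch_not_after("yourdomain.com")) < 14` could trigger a renewal warning. An invalid chain or hostname mismatch makes `fetch_not_after` raise `ssl.SSLCertVerificationError` instead of returning a date.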

6) Configure alerting

Start simple.

Recommended alert design:

email
one real-time channel such as Slack, Discord, PagerDuty, or SMS
alert after 2 consecutive failures
send a recovery notification
repeat every 15 to 30 minutes during an ongoing incident

Each alert should include:

failing URL
status code
response time
failing region
first failure timestamp
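As an illustration of the fields listed above, here is a small sketch of an alert payload rendered as plain text. The `Alert` class and its field names are hypothetical; most providers let you template these values rather than build them yourself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Alert:
    url: str
    status_code: Optional[int]  # None when no HTTP response was received
    response_time_s: float
    region: str
    first_failure: datetime

    def render(self) -> str:
        """Format the alert as a short plain-text message for email or chat."""
        status = self.status_code if self.status_code is not None else "no response"
        return (
            f"DOWN {self.url}\n"
            f"status: {status}\n"
            f"response time: {self.response_time_s:.2f}s\n"
            f"region: {self.region}\n"
            f"first failure: {self.first_failure.isoformat()}"
        )


alert = Alert(
    url="https://yourdomain.com/healthz",
    status_code=502,
    response_time_s=0.134,
    region="eu-west",  # placeholder region name
    first_failure=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
message = alert.render()
```

Keeping the timestamp in UTC makes it easier to line the alert up with deploy and server logs.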

Link the alert destination or runbook to:

7) Add one optional synthetic check

Basic uptime checks only verify reachability.

If a user-facing path is critical, add one synthetic check for:

login page load
pricing or checkout page availability
API auth callback landing page

Do not replace /healthz with synthetic monitoring. Use synthetic checks as a second layer.

8) Document ownership and response path

For each monitor, document:

who gets alerted
where alerts go
where the runbook lives
which logs to check first
who can roll back a deploy

This should live in the same place as your production runbooks and checklist. See SaaS Production Checklist.

9) Test the monitor before launch

Do not assume alerting works.

Test by doing one of these in staging:

point the monitor at a known failing URL
stop the app process
block the endpoint temporarily
break DNS on a temporary test domain

Confirm:

alert fires
correct person/channel receives it
recovery notification arrives after fix

Process flow: endpoint → provider → alert channel → failure test → recovery

Common causes

Typical reasons uptime monitoring fails or reports a real outage:

no public health endpoint
health endpoint returns redirect instead of 200
DNS record points to wrong server
Nginx or reverse proxy returns 502 or 503
Gunicorn, Uvicorn, or app process is down
firewall or security group blocks traffic
SSL certificate expired or chain invalid
health endpoint depends on database and fails during transient DB issues
CDN or WAF blocks probes
recent deploy broke env vars, startup command, or route registration
provider outage affects only some regions

Debugging tips

Start from outside, then work inward.

External checks

bash

curl -i https://yourdomain.com/healthz
curl -vk https://yourdomain.com/healthz
dig yourdomain.com +short
dig api.yourdomain.com +short
nslookup yourdomain.com
openssl s_client -connect yourdomain.com:443 -servername yourdomain.com
ping yourdomain.com

Server checks

bash

systemctl status nginx
systemctl status gunicorn
journalctl -u nginx -n 100 --no-pager
journalctl -u gunicorn -n 100 --no-pager
tail -n 100 /var/log/nginx/access.log
tail -n 100 /var/log/nginx/error.log
ss -tulpn | grep -E ':80|:443|:8000'
curl -I http://127.0.0.1:8000/healthz

What to compare

When a monitor reports downtime:

compare failure timestamp to deploy timestamp
check Nginx access/error logs
check app logs
verify DNS resolution
verify TLS validity
verify upstream app process
test from another region or network
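The first comparison above, failure timestamp against deploy timestamp, is easy to automate if you keep a list of deploy times. This is a rough sketch; the function name and the 30-minute window are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def deploy_before_failure(
    failure_at: datetime,
    deploys: Iterable[datetime],
    window: timedelta = timedelta(minutes=30),
) -> Optional[datetime]:
    """Return the most recent deploy within `window` before the failure, or None."""
    candidates = [d for d in deploys if timedelta(0) <= failure_at - d <= window]
    return max(candidates, default=None)


deploys = [
    datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 11, 50, tzinfo=timezone.utc),
]
failure = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
suspect = deploy_before_failure(failure, deploys)
```

If a deploy is returned, rolling it back is usually the fastest first step. If nothing is returned, move on to the DNS, TLS, and proxy checks.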

If the monitor fails but the same request works from your local machine, suspect:

DNS propagation
CDN/WAF rules
firewall
regional provider routing issue

If health checks pass but users still report broken flows, add synthetic checks and error tracking via Error Tracking with Sentry.

If the public endpoint is failing with proxy errors, see Debugging Production Issues.

Checklist

✓ public /healthz endpoint exists
✓ /healthz returns 200 quickly
✓ endpoint is reachable over HTTPS from outside your infrastructure
✓ main domain is monitored
✓ API domain is monitored
✓ www domain is monitored if users access it
✓ alert channel is configured
✓ alert channel has been tested
✓ failure threshold is set to 2 or more consecutive failures
✓ recovery notifications are enabled
✓ SSL certificate monitoring is enabled
✓ monitors are labeled by environment
✓ staging and production alerts are separated
✓ runbook for responding to alerts is documented
✓ at least one staged alert test has been completed

Related guides

Incident Response Playbook — incident handling after an alert fires
Debugging Production Issues — step-by-step production triage
Error Tracking with Sentry — capture exceptions when the app is technically up but broken
SaaS Production Checklist — production launch and monitoring checklist

FAQ

What should my health endpoint return?

Return HTTP 200 with a small static JSON payload such as:

json

{"status":"ok"}

Keep it fast and avoid heavy dependency checks.

Should uptime monitors check authenticated pages?

Not for the primary uptime check. Use a public health endpoint first. Add synthetic authenticated flows only if they are business-critical.

Why is my app marked down even though the server is running?

Uptime monitors validate the full public path:

DNS
TLS
reverse proxy
routing
application response

A running process alone does not mean the service is reachable.

How many endpoints should I monitor?

At minimum:

main app domain
API domain

Optionally add:

www domain
status page
webhook receiver path
login or billing synthetic path

What is the difference between uptime monitoring and error tracking?

Uptime monitoring tells you whether the service is reachable.

Error tracking captures application exceptions and stack traces when requests fail.

Should `/healthz` check the database?

Usually no. Keep /healthz cheap and stable. If you need dependency-aware checks, use a separate /readyz endpoint.

What interval should I use?

Use 1 minute for production if possible. If budget is tight, 5 minutes is an acceptable starting point.

Should I monitor the homepage or a health endpoint?

Prefer a dedicated health endpoint for reliable uptime checks. Add homepage or browser-based synthetic checks separately if the UI itself is critical.

Do I need multi-region checks?

Yes, if your provider supports them. Multi-region checks reduce false positives and catch regional routing failures.

Can I rely only on my cloud provider status page?

No. You need monitoring against your own domain, DNS, TLS, and app path.

Final takeaway

For a small SaaS, uptime monitoring should be simple, external, and tested.

Start with:

one fast public /healthz endpoint
one-minute checks
real alerting
SSL monitoring
tested response path

Then add:

incident handling via Incident Response Playbook
production debugging via Debugging Production Issues
error tracking via Error Tracking with Sentry
launch validation via SaaS Production Checklist

That gives you the minimum reliable monitoring stack for an MVP or small SaaS in production.
