Handling Background Jobs (Celery / RQ)

The essential playbook for handling background jobs (Celery/RQ) in your SaaS.

Background jobs move slow or failure-prone work out of the request cycle. Use them for email sending, webhooks, reports, file processing, imports, and scheduled tasks.

For most small SaaS apps:

  • Use Celery when you need retries, scheduling, routing, or more operational control.
  • Use RQ when you want the simplest Redis-backed queue with minimal setup.

Background processing is part of deployment, not just app code. Your worker, broker, scheduler, env vars, logging, and restart policies must ship together. For app structure and env setup, see Structuring a Flask/FastAPI SaaS Project and Environment Variables and Secrets Management. For broader production context, see SaaS Architecture Overview (From MVP to Production).

Quick Fix / Quick Setup

bash
# Celery quick setup (Redis broker)
pip install celery redis

python
# celery_app.py
from celery import Celery

celery = Celery(
    "app",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/0",
)
celery.conf.update(
    task_serializer="json",
    result_serializer="json",
    accept_content=["json"],
    timezone="UTC",
    enable_utc=True,
    task_acks_late=True,
    worker_prefetch_multiplier=1,
)

@celery.task(bind=True, autoretry_for=(Exception,), retry_backoff=True, max_retries=5)
def send_email(self, user_id):
    print(f"send email to {user_id}")

python
# enqueue (e.g., from a request handler)
from celery_app import send_email
send_email.delay(123)

bash
# run worker
celery -A celery_app.celery worker --loglevel=info

# optional scheduler for periodic jobs
celery -A celery_app.celery beat --loglevel=info

bash
# RQ quick setup
pip install rq redis

python
# tasks.py
def send_email(user_id):
    print(f"send email to {user_id}")

python
# enqueue.py
from redis import Redis
from rq import Queue
from tasks import send_email

redis_conn = Redis(host="localhost", port=6379, db=0)
q = Queue("default", connection=redis_conn)
job = q.enqueue(send_email, 123)
print(job.id)

bash
# run worker
rq worker default

Start with:

  • Redis
  • one queue: default
  • one worker process
  • idempotent tasks
  • retries with backoff
  • no heavy work inside web requests

Rules for the first production version:

  • Pass IDs, not ORM objects.
  • Fetch current state inside the task.
  • Add failure visibility before launch.
  • Keep worker env vars aligned with the web app.
  • Enqueue after DB commit.

request flow diagram showing app -> queue -> worker -> external service/database.

What’s happening

A request should return quickly. Background jobs handle work that can complete after the response.

Basic flow:

  1. Web app accepts a request.
  2. App stores required DB state.
  3. App enqueues a job message.
  4. Worker pulls the job from Redis or broker.
  5. Worker executes the task.
  6. Worker retries, fails, or marks completion.

Key points:

  • Celery usually uses Redis or RabbitMQ.
  • RQ uses Redis only.
  • Periodic jobs need a scheduler:
    • Celery: celery beat
    • RQ: rq-scheduler or cron that enqueues work
  • Jobs are eventually consistent.
  • Your app must tolerate:
    • delayed execution
    • duplicate execution
    • retries
    • partial failure

If your SaaS architecture is still being defined, review SaaS Architecture Overview (From MVP to Production).

Step-by-step implementation

1. Identify work to offload

Good candidates:

  • email sending
  • webhook fan-out
  • image or file processing
  • CSV imports
  • report generation
  • cleanup jobs
  • scheduled billing or renewal checks

Do not offload short work that must complete before responding.

2. Install and verify Redis

bash
# local
redis-server

# verify
redis-cli ping

Expected:

bash
PONG

Use env vars, not hardcoded connection strings. See Environment Variables and Secrets Management.

Example:

bash
export REDIS_URL=redis://localhost:6379/0

3. Create a queue module

Celery example

python
# celery_app.py
import os
from celery import Celery

REDIS_URL = os.environ["REDIS_URL"]

celery = Celery(
    "app",
    broker=REDIS_URL,
    backend=REDIS_URL,
)

celery.conf.update(
    task_serializer="json",
    result_serializer="json",
    accept_content=["json"],
    timezone="UTC",
    enable_utc=True,
    task_acks_late=True,
    worker_prefetch_multiplier=1,
    task_time_limit=300,
    task_soft_time_limit=240,
)

RQ example

python
# queue_app.py
import os
from redis import Redis
from rq import Queue

redis_conn = Redis.from_url(os.environ["REDIS_URL"])
default_queue = Queue("default", connection=redis_conn)

4. Write tasks that accept primitive arguments only

Do this:

python
# tasks.py
def send_email(user_id: int):
    # fetch current data inside the task
    print(f"send email to {user_id}")

Do not do this:

python
# bad: stale object / serialization problems
def send_email(user):
    print(user.email)

5. Enqueue after DB commit

Do not enqueue a job until the database state it depends on has been committed.

Pattern:

python
# pseudo-code
user = create_user(...)
db.session.commit()
send_email.delay(user.id)  # Celery

Or for RQ:

python
job = default_queue.enqueue(send_email, user.id)

If you enqueue before commit, the worker may run before the row exists.
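
A minimal, framework-agnostic sketch of this ordering: buffer enqueue calls during the request and flush them only after the commit succeeds. `OutboxBuffer` is an illustrative helper, not a Celery or RQ feature.

```python
# Sketch: defer enqueues until after the DB commit succeeds.
class OutboxBuffer:
    def __init__(self):
        self._pending = []

    def enqueue_later(self, enqueue_fn, *args):
        # record the call instead of enqueueing immediately
        self._pending.append((enqueue_fn, args))

    def flush(self):
        # call this right after db.session.commit() succeeds
        for enqueue_fn, args in self._pending:
            enqueue_fn(*args)
        self._pending.clear()
```

During a request: `buffer.enqueue_later(send_email.delay, user.id)`, then `db.session.commit()`, then `buffer.flush()`. A failed commit means nothing is ever enqueued.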

6. Add retries with backoff

Use retries only for transient failures:

  • network timeouts
  • temporary upstream API errors
  • short broker/storage disruptions

Do not retry forever on:

  • validation errors
  • missing required records
  • bad input
  • permanent permission failures

Celery retry example

python
from celery import Celery

celery = Celery("app")

@celery.task(
    bind=True,
    autoretry_for=(TimeoutError, ConnectionError),
    retry_backoff=True,
    retry_jitter=True,
    max_retries=5,
)
def deliver_webhook(self, webhook_id: int):
    print(f"deliver webhook {webhook_id}")

RQ retry example

python
from rq import Retry
from queue_app import default_queue
from tasks import send_email

job = default_queue.enqueue(
    send_email,
    123,
    retry=Retry(max=5, interval=[10, 30, 60, 120, 300]),
)

7. Make tasks idempotent

A task can run more than once because of retries, crashes, or deploy interruption.

Patterns that work:

  • unique DB constraints
  • processed flags
  • idempotency keys for external APIs
  • upserts instead of blind inserts
  • checking current status before side effects

Example:

python
def process_invoice(invoice_id: int):
    invoice = get_invoice(invoice_id)

    if invoice.sent_at is not None:
        return

    external_id = send_invoice_to_provider(invoice)
    mark_invoice_sent(invoice_id, external_id=external_id)

For payment-related jobs, idempotency is mandatory.
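
For the external-API case, one common sketch is deriving a deterministic idempotency key from the job's identity, so every retry presents the same key to the provider. The function name and key scheme here are illustrative, not a specific vendor's API.

```python
import uuid

def idempotency_key(task_name: str, object_id: int) -> str:
    # uuid5 is deterministic: the same task + object always yields the
    # same key, so retries deduplicate on the provider side
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{task_name}:{object_id}"))
```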

8. Add worker process startup

Celery

bash
celery -A celery_app.celery worker --loglevel=info

RQ

bash
rq worker default

The worker is a separate service. Do not run jobs in the web process.

9. Add scheduling for periodic jobs

Celery Beat

bash
celery -A celery_app.celery beat --loglevel=info

Example config:

python
from celery.schedules import crontab

celery.conf.beat_schedule = {
    "cleanup-expired-uploads": {
        "task": "tasks.cleanup_expired_uploads",
        "schedule": crontab(minute="*/15"),
    },
}

RQ options

Use one of:

  • rq-scheduler
  • OS cron that calls a script and enqueues jobs

Example cron-triggered enqueue script:

python
# enqueue_cleanup.py
from queue_app import default_queue
from tasks import cleanup_expired_uploads

default_queue.enqueue(cleanup_expired_uploads)

bash
# crontab entry: run every 15 minutes
*/15 * * * * /path/to/venv/bin/python /app/enqueue_cleanup.py

10. Add production supervision

Use systemd, Supervisor, or container restart policies.

systemd example for Celery worker

ini
# /etc/systemd/system/celery.service
[Unit]
Description=Celery Worker
After=network.target redis.service

[Service]
User=www-data
WorkingDirectory=/app
Environment="REDIS_URL=redis://127.0.0.1:6379/0"
ExecStart=/app/venv/bin/celery -A celery_app.celery worker --loglevel=info
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

systemd example for RQ worker

ini
# /etc/systemd/system/rq-worker.service
[Unit]
Description=RQ Worker
After=network.target redis.service

[Service]
User=www-data
WorkingDirectory=/app
Environment="REDIS_URL=redis://127.0.0.1:6379/0"
ExecStart=/app/venv/bin/rq worker default
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Start and enable:

bash
sudo systemctl daemon-reload
sudo systemctl enable --now celery
# or
sudo systemctl enable --now rq-worker

11. Add queue separation only when needed

Start with one queue. Add more only if job classes interfere.

Useful split:

  • priority: password reset email, auth notifications, user-facing webhook retries
  • default: normal app tasks
  • bulk: imports, exports, backfills, reports

Celery worker with queue selection:

bash
celery -A celery_app.celery worker -Q priority,default --loglevel=info

RQ separate workers:

bash
rq worker priority
rq worker default
rq worker bulk

12. Add observability and failure review

Log fields:

  • task name
  • job ID
  • user/account/tenant ID
  • object ID
  • queue name
  • retry attempt
  • duration
  • result state

Track:

  • queue depth
  • oldest queued job age
  • failed jobs count
  • worker restarts
  • task runtime percentiles
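
One lightweight way to capture the log fields above is emitting one JSON line per task run. The field names here are illustrative, not a fixed schema.

```python
import json
import time

def task_log_line(task_name, job_id, tenant_id, object_id,
                  queue, attempt, started_at, state):
    # One JSON object per task run: grep-able locally, and parses
    # cleanly in any log aggregator.
    return json.dumps({
        "task": task_name,
        "job_id": job_id,
        "tenant_id": tenant_id,
        "object_id": object_id,
        "queue": queue,
        "attempt": attempt,
        "duration_ms": round((time.monotonic() - started_at) * 1000),
        "state": state,
    })
```

Call it at the end of each task (success or failure) and hand the line to your logger.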

For production validation, use SaaS Production Checklist.

Worker topology diagram: web service -> queue -> worker -> external service.

Common causes

Most job failures, and most jobs that never run at all, stem from infrastructure mismatches, not task code.

  • Worker service is not running.
  • Worker is subscribed to the wrong queue.
  • Redis is unreachable or using the wrong DB index.
  • Firewall rules block Redis access.
  • Task import path is wrong.
  • Worker never registered the task.
  • Job payload contains unserializable objects.
  • Task was enqueued before database commit.
  • Scheduler is not running for periodic jobs.
  • Old workers still run old code after deploy.
  • Long-running jobs hit time limits.
  • External API calls hang because no timeout was set.
  • Memory or CPU pressure kills workers.
  • Retry policy is wrong:
    • retries forever
    • retries too fast
    • no retries on transient failures
    • retries permanent failures
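
For the no-timeout failure above, here is a stdlib-only sketch of webhook delivery with an explicit timeout; the function and its parameters are illustrative.

```python
import json
import urllib.request

def deliver_webhook(url: str, payload: dict, timeout: float = 10.0) -> int:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # the timeout bounds every socket wait; without it a hung endpoint
    # can block the worker (and, with prefetch=1, its queue) indefinitely
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A timed-out call raises an exception, which lets your retry policy treat it as a transient failure instead of silently stalling the worker.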

Debugging tips

Start with the queue, then the worker, then task code.

Check Redis connectivity

bash
redis-cli ping
redis-cli llen rq:queue:default   # RQ queue depth
redis-cli llen celery             # Celery default-queue depth
redis-cli keys '*celery*'

Check worker processes

bash
ps aux | grep -E 'celery|rq|redis'

Celery inspection

bash
celery -A celery_app.celery inspect active
celery -A celery_app.celery inspect registered
celery -A celery_app.celery inspect stats
celery -A celery_app.celery report

RQ inspection

bash
rq info

Run a worker manually to observe output:

bash
rq worker default

View logs

bash
journalctl -u celery -n 200 --no-pager
journalctl -u rq-worker -n 200 --no-pager
docker logs <worker-container-name> --tail 200

Enqueue a trivial test task

Celery:

bash
python -c "from celery_app import send_email; r=send_email.delay(1); print(r.id)"

If a trivial task works, your broker and worker path are likely correct. Then debug business logic.

Useful checks

  • Confirm the queue actually contains jobs.
  • Confirm the worker logs show task registration at startup.
  • Confirm the worker and web app use the same env vars.
  • Confirm deploy updated worker code, not only web code.
  • Measure queue latency separately from task runtime.
  • Inspect failed job registries or result backend entries.
  • If jobs run twice, investigate:
    • acknowledgements
    • retries
    • worker crashes
    • deploy interruption

If this is happening after a deployment change, validate your service model and env propagation with Environment Variables and Secrets Management and Structuring a Flask/FastAPI SaaS Project.

Checklist

  • Redis is reachable from app and worker.
  • Worker process starts on boot.
  • Worker restarts automatically on failure.
  • Tasks are idempotent.
  • Jobs pass primitive arguments only.
  • Retries are configured for transient failures.
  • Timeouts are configured for external calls.
  • Long-running jobs are isolated if needed.
  • Queues are separated only when priorities require it.
  • Periodic scheduler is configured if needed.
  • Logs include task and object identifiers.
  • Failed jobs are visible and reviewable.
  • Requeue procedure exists.
  • Deploy procedure updates worker and web together.
  • In-flight jobs are handled safely during deploys.
  • Worker and web use the same env vars and credentials.

Cross-check final launch readiness with SaaS Production Checklist.

FAQ

Should I pick Celery or RQ for a small SaaS?

Pick RQ for the simplest Redis-only setup. Pick Celery if you need scheduled tasks, richer retries, multiple queues, or more control over worker behavior.

What jobs should not be offloaded?

Short, deterministic work that must complete before the response should stay inline. Background jobs are for slow, retryable, or asynchronous work.

How do I make tasks safe to retry?

Use idempotent updates, unique constraints, processed flags, external API idempotency keys, and avoid side effects before state is persisted.

Why are jobs enqueued but not executing?

Usually one of these:

  • worker is down
  • worker is listening to a different queue
  • worker cannot import the task
  • worker cannot connect to Redis

When should I add separate queues?

Add them when high-priority jobs like auth emails or webhooks are delayed by low-priority batch jobs such as imports, exports, or report generation.

Do I need a result backend?

Only if you need job results or framework-level task status lookup. Many small SaaS apps store task outcome in their own database instead.
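
A sketch of that database-side approach, using SQLite for illustration; the table and column names are assumptions, not a fixed schema.

```python
import datetime
import sqlite3

def record_job_outcome(db: sqlite3.Connection, job_id: str,
                       state: str, detail: str = "") -> None:
    # upsert so a retried job overwrites its previous outcome row
    db.execute(
        "INSERT OR REPLACE INTO job_results (job_id, state, detail, finished_at) "
        "VALUES (?, ?, ?, ?)",
        (job_id, state, detail,
         datetime.datetime.now(datetime.timezone.utc).isoformat()),
    )
    db.commit()
```

Tasks call this on completion; your dashboard or support tooling then queries `job_results` instead of a framework result backend.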

Can I run background jobs in the web process?

No. That blocks requests and makes jobs unreliable during restarts, crashes, or request timeouts.

Final takeaway

Background jobs are required once your SaaS sends emails, processes files, calls flaky APIs, or runs scheduled work.

Keep the first version simple:

  • Redis
  • one queue
  • one worker
  • idempotent tasks
  • retries with backoff
  • failure visibility
  • supervised worker processes

Use RQ when simplicity matters most. Use Celery when retries, scheduling, routing, and queue control become core infrastructure. Ship worker deployment, Redis, env vars, logging, and restart behavior as one production unit, not as separate afterthoughts.