Handling Background Jobs (Celery / RQ)
The essential playbook for implementing background jobs (Celery / RQ) in your SaaS.
Background jobs move slow or failure-prone work out of the request cycle. Use them for email sending, webhooks, reports, file processing, imports, and scheduled tasks.
For most small SaaS apps:
- Use Celery when you need retries, scheduling, routing, or more operational control.
- Use RQ when you want the simplest Redis-backed queue with minimal setup.
Background processing is part of deployment, not just app code. Your worker, broker, scheduler, env vars, logging, and restart policies must ship together. For app structure and env setup, see Structuring a Flask/FastAPI SaaS Project and Environment Variables and Secrets Management. For broader production context, see SaaS Architecture Overview (From MVP to Production).
Quick Fix / Quick Setup
```shell
# Celery quick setup (Redis broker)
pip install celery redis
```

```python
# celery_app.py
from celery import Celery

celery = Celery(
    "app",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/0",
)

celery.conf.update(
    task_serializer="json",
    result_serializer="json",
    accept_content=["json"],
    timezone="UTC",
    enable_utc=True,
    task_acks_late=True,
    worker_prefetch_multiplier=1,
)

@celery.task(bind=True, autoretry_for=(Exception,), retry_backoff=True, max_retries=5)
def send_email(self, user_id):
    print(f"send email to {user_id}")
```

```python
# enqueue
# from celery_app import send_email
# send_email.delay(123)
```

```shell
# run worker
celery -A celery_app.celery worker --loglevel=info

# optional scheduler for periodic jobs
celery -A celery_app.celery beat --loglevel=info
```
```shell
# RQ quick setup
pip install rq redis
```

```python
# tasks.py
def send_email(user_id):
    print(f"send email to {user_id}")
```

```python
# enqueue.py
from redis import Redis
from rq import Queue

from tasks import send_email

redis_conn = Redis(host="localhost", port=6379, db=0)
q = Queue("default", connection=redis_conn)
job = q.enqueue(send_email, 123)
print(job.id)
```

```shell
# run worker
rq worker default
```

Start with:
- Redis
- one queue: default
- one worker process
- idempotent tasks
- retries with backoff
- no heavy work inside web requests
Rules for the first production version:
- Pass IDs, not ORM objects.
- Fetch current state inside the task.
- Add failure visibility before launch.
- Keep worker env vars aligned with the web app.
- Enqueue after DB commit.
[Diagram: request flow showing app -> queue -> worker -> external service/database]
What’s happening
A request should return quickly. Background jobs handle work that can complete after the response.
Basic flow:
- Web app accepts a request.
- App stores required DB state.
- App enqueues a job message.
- Worker pulls the job from the broker (Redis or RabbitMQ).
- Worker executes the task.
- Worker retries, fails, or marks completion.
Key points:
- Celery usually uses Redis or RabbitMQ.
- RQ uses Redis only.
- Periodic jobs need a scheduler:
  - Celery: `celery beat`
  - RQ: `rq-scheduler` or cron that enqueues work
- Jobs are eventually consistent.
- Your app must tolerate:
- delayed execution
- duplicate execution
- retries
- partial failure
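Duplicate execution is the tolerance that trips people up most often. An in-process illustration of the "claim once" guard (`claim_once` is an illustrative name; in production the marker lives in a unique DB constraint or a Redis key, not a Python set, because workers restart and run in parallel):

```python
# Illustration only: tasks must tolerate duplicate delivery.
# In production, back this marker with a unique DB constraint or Redis.
_processed: set = set()

def claim_once(job_key: str) -> bool:
    """Return True the first time a key is seen, False for duplicates."""
    if job_key in _processed:
        return False
    _processed.add(job_key)
    return True
```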
If your SaaS architecture is still being defined, review SaaS Architecture Overview (From MVP to Production).
Step-by-step implementation
1. Identify work to offload
Good candidates:
- email sending
- webhook fan-out
- image or file processing
- CSV imports
- report generation
- cleanup jobs
- scheduled billing or renewal checks
Do not offload short work that must complete before responding.
2. Install and verify Redis
```shell
# local
redis-server

# verify
redis-cli ping
```

Expected:

```
PONG
```

Use env vars, not hardcoded connection strings. See Environment Variables and Secrets Management.
Example:

```shell
export REDIS_URL=redis://localhost:6379/0
```

3. Create a queue module
Celery example
```python
# celery_app.py
import os

from celery import Celery

REDIS_URL = os.environ["REDIS_URL"]

celery = Celery(
    "app",
    broker=REDIS_URL,
    backend=REDIS_URL,
)

celery.conf.update(
    task_serializer="json",
    result_serializer="json",
    accept_content=["json"],
    timezone="UTC",
    enable_utc=True,
    task_acks_late=True,
    worker_prefetch_multiplier=1,
    task_time_limit=300,
    task_soft_time_limit=240,
)
```

RQ example

```python
# queue_app.py
import os

from redis import Redis
from rq import Queue

redis_conn = Redis.from_url(os.environ["REDIS_URL"])
default_queue = Queue("default", connection=redis_conn)
```

4. Write tasks that accept primitive arguments only
Do this:
```python
# tasks.py
def send_email(user_id: int):
    # fetch current data inside the task
    print(f"send email to {user_id}")
```

Do not do this:

```python
# bad: stale object / serialization problems
def send_email(user):
    print(user.email)
```

5. Enqueue after DB commit
Do not enqueue a job before the database transaction is guaranteed to exist.
Pattern:
```python
# pseudo-code
user = create_user(...)
db.session.commit()
send_email.delay(user.id)  # Celery
```

Or for RQ:

```python
job = default_queue.enqueue(send_email, user.id)
```

If you enqueue before commit, the worker may run before the row exists.
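One way to make this ordering hard to get wrong is to collect enqueues during the transaction and flush them only after commit. A minimal sketch; `PendingJobs` is a hypothetical helper, not a Celery or RQ API:

```python
class PendingJobs:
    """Collect enqueue calls during a transaction; flush after commit."""

    def __init__(self):
        self._pending = []

    def add(self, enqueue_fn, *args):
        # e.g. add(send_email.delay, user.id)
        # or   add(default_queue.enqueue, send_email, user.id)
        self._pending.append((enqueue_fn, args))

    def flush(self):
        # Call this only after db.session.commit() has succeeded.
        jobs, self._pending = self._pending, []
        for enqueue_fn, args in jobs:
            enqueue_fn(*args)
```

Frameworks usually expose an after-commit hook (e.g. SQLAlchemy session events) where `flush()` can be wired in once, instead of remembering the ordering at every call site.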
6. Add retries with backoff
Use retries only for transient failures:
- network timeouts
- temporary upstream API errors
- short broker/storage disruptions
Do not retry forever on:
- validation errors
- missing required records
- bad input
- permanent permission failures
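A simple way to keep this distinction explicit in code is a transient/permanent classifier that mirrors the exception tuple you pass to `autoretry_for`; the names below are illustrative:

```python
# Transient errors: the only ones worth listing in autoretry_for.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)

class PermanentTaskError(Exception):
    """Raised for failures retrying cannot fix (bad input, missing rows)."""

def should_retry(exc: Exception) -> bool:
    """Retry only transient failures; let permanent ones fail fast."""
    return isinstance(exc, TRANSIENT_ERRORS)
```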
Celery retry example
```python
# reuse the configured app from celery_app.py
from celery_app import celery

@celery.task(
    bind=True,
    autoretry_for=(TimeoutError, ConnectionError),
    retry_backoff=True,
    retry_jitter=True,
    max_retries=5,
)
def deliver_webhook(self, webhook_id: int):
    print(f"deliver webhook {webhook_id}")
```

RQ retry example

```python
from rq import Retry

from queue_app import default_queue
from tasks import send_email

job = default_queue.enqueue(
    send_email,
    123,
    retry=Retry(max=5, interval=[10, 30, 60, 120, 300]),
)
```

7. Make tasks idempotent
A task can run more than once because of retries, crashes, or deploy interruption.
Patterns that work:
- unique DB constraints
- processed flags
- idempotency keys for external APIs
- upserts instead of blind inserts
- checking current status before side effects
Example:
```python
def process_invoice(invoice_id: int):
    invoice = get_invoice(invoice_id)
    if invoice.sent_at is not None:
        return
    external_id = send_invoice_to_provider(invoice)
    mark_invoice_sent(invoice_id, external_id=external_id)
```

For payment-related jobs, idempotency is mandatory.
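For external providers that support idempotency keys (Stripe's `Idempotency-Key` header is the best-known example; the exact mechanism varies by provider), derive the key deterministically from your own IDs so every retry of the same logical operation reuses it. A sketch, with illustrative names:

```python
import hashlib

def idempotency_key(scope: str, object_id: int) -> str:
    # Deterministic: a retried task sends the same key for the same
    # operation, so the provider can deduplicate the request server-side.
    return hashlib.sha256(f"{scope}:{object_id}".encode()).hexdigest()
```

Usage: compute `idempotency_key("invoice-send", invoice_id)` inside the task and pass it to the provider the way its API documents.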
8. Add worker process startup
Celery

```shell
celery -A celery_app.celery worker --loglevel=info
```

RQ

```shell
rq worker default
```

The worker is a separate service. Do not run jobs in the web process.
9. Add scheduling for periodic jobs
Celery Beat
```shell
celery -A celery_app.celery beat --loglevel=info
```

Example config:

```python
from celery.schedules import crontab

celery.conf.beat_schedule = {
    "cleanup-expired-uploads": {
        "task": "tasks.cleanup_expired_uploads",
        "schedule": crontab(minute="*/15"),
    },
}
```

RQ options
Use one of:
- rq-scheduler
- OS cron that calls a script and enqueues jobs

Example cron-triggered enqueue script:

```python
# enqueue_cleanup.py
from queue_app import default_queue
from tasks import cleanup_expired_uploads

default_queue.enqueue(cleanup_expired_uploads)
```

```
*/15 * * * * /path/to/venv/bin/python /app/enqueue_cleanup.py
```

10. Add production supervision
Use systemd, Supervisor, or container restart policies.
systemd example for Celery worker
```ini
# /etc/systemd/system/celery.service
[Unit]
Description=Celery Worker
After=network.target redis.service

[Service]
User=www-data
WorkingDirectory=/app
Environment="REDIS_URL=redis://127.0.0.1:6379/0"
ExecStart=/app/venv/bin/celery -A celery_app.celery worker --loglevel=info
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

systemd example for RQ worker

```ini
# /etc/systemd/system/rq-worker.service
[Unit]
Description=RQ Worker
After=network.target redis.service

[Service]
User=www-data
WorkingDirectory=/app
Environment="REDIS_URL=redis://127.0.0.1:6379/0"
ExecStart=/app/venv/bin/rq worker default
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Start and enable:

```shell
sudo systemctl daemon-reload
sudo systemctl enable --now celery
# or
sudo systemctl enable --now rq-worker
```

11. Add queue separation only when needed
Start with one queue. Add more only if job classes interfere.
Useful split:
- priority: password reset emails, auth notifications, user-facing webhook retries
- default: normal app tasks
- bulk: imports, exports, backfills, reports
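Routing then happens at enqueue time. A sketch of a routing helper; the task names in the table are illustrative, not from this guide's examples:

```python
def queue_for(task_name: str) -> str:
    """Pick a queue by task name; adjust the sets to your own tasks."""
    priority = {"send_password_reset", "send_auth_notification"}
    bulk = {"import_csv", "export_report", "generate_report"}
    if task_name in priority:
        return "priority"
    if task_name in bulk:
        return "bulk"
    return "default"
```

With Celery, pass the queue explicitly: `send_password_reset.apply_async(args=[user_id], queue=queue_for("send_password_reset"))`. With RQ, enqueue on a `Queue(queue_for("import_csv"), connection=redis_conn)` instance.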
Celery worker with queue selection:

```shell
celery -A celery_app.celery worker -Q priority,default --loglevel=info
```

RQ separate workers:

```shell
rq worker priority
rq worker default
rq worker bulk
```

12. Add observability and failure review
Log fields:
- task name
- job ID
- user/account/tenant ID
- object ID
- queue name
- retry attempt
- duration
- result state
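A sketch of one way to emit those fields as a structured log line; `job_log` is an illustrative helper, not part of Celery or RQ:

```python
import json

def job_log(task: str, job_id: str, queue: str, attempt: int,
            duration_s: float, state: str, **ids) -> str:
    """Render the fields above as one JSON log line.

    **ids carries the user/account/tenant/object identifiers.
    """
    record = {"task": task, "job_id": job_id, "queue": queue,
              "attempt": attempt, "duration_s": round(duration_s, 3),
              "state": state, **ids}
    return json.dumps(record, sort_keys=True)
```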
Track:
- queue depth
- oldest queued job age
- failed jobs count
- worker restarts
- task runtime percentiles
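The two metrics that page you first are queue depth and oldest-job age. A sketch of threshold checks over those values (`queue_health` is illustrative; with RQ, depth is `len(queue)` and the oldest job's `enqueued_at` is readable from the queue's jobs, while with the Redis broker Celery's depth is roughly the `LLEN` of the queue key):

```python
from datetime import datetime, timezone

def queue_health(depth: int, oldest_enqueued_at, now=None,
                 max_depth: int = 1000, max_age_s: float = 300.0) -> list:
    """Return alert strings when depth or oldest-job age crosses a threshold."""
    now = now or datetime.now(timezone.utc)
    alerts = []
    if depth > max_depth:
        alerts.append(f"queue depth {depth} > {max_depth}")
    if oldest_enqueued_at is not None:
        age = (now - oldest_enqueued_at).total_seconds()
        if age > max_age_s:
            alerts.append(f"oldest job is {age:.0f}s old (max {max_age_s:.0f}s)")
    return alerts
```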
For production validation, use SaaS Production Checklist.
Worker Topology
Common causes
Most job failures or missing execution come from infrastructure mismatch, not task code.
- Worker service is not running.
- Worker is subscribed to the wrong queue.
- Redis is unreachable or using the wrong DB index.
- Firewall rules block Redis access.
- Task import path is wrong.
- Worker never registered the task.
- Job payload contains unserializable objects.
- Task was enqueued before database commit.
- Scheduler is not running for periodic jobs.
- Old workers still run old code after deploy.
- Long-running jobs hit time limits.
- External API calls hang because no timeout was set.
- Memory or CPU pressure kills workers.
- Retry policy is wrong:
- retries forever
- retries too fast
- no retries on transient failures
- retries permanent failures
Debugging tips
Start with the queue, then the worker, then task code.
Check Redis connectivity
```shell
redis-cli ping
redis-cli llen celery            # Celery's default queue key
redis-cli llen rq:queue:default  # RQ's default queue key
redis-cli keys '*celery*'
```

Check worker processes

```shell
ps aux | grep -E 'celery|rq|redis'
```

Celery inspection

```shell
celery -A celery_app.celery inspect active
celery -A celery_app.celery inspect registered
celery -A celery_app.celery inspect stats
celery -A celery_app.celery report
```

RQ inspection

```shell
rq info
```

Run a worker manually to observe output:

```shell
rq worker default
```

View logs

```shell
journalctl -u celery -n 200 --no-pager
journalctl -u rq-worker -n 200 --no-pager
docker logs <worker-container-name> --tail 200
```

Enqueue a trivial test task
Celery:

```shell
python -c "from celery_app import send_email; r=send_email.delay(1); print(r.id)"
```

If a trivial task works, your broker and worker path are likely correct. Then debug business logic.
Useful checks
- Confirm the queue actually contains jobs.
- Confirm the worker logs show task registration at startup.
- Confirm the worker and web app use the same env vars.
- Confirm deploy updated worker code, not only web code.
- Measure queue latency separately from task runtime.
- Inspect failed job registries or result backend entries.
- If jobs run twice, investigate:
- acknowledgements
- retries
- worker crashes
- deploy interruption
If this is happening after a deployment change, validate your service model and env propagation with Environment Variables and Secrets Management and Structuring a Flask/FastAPI SaaS Project.
Checklist
- ✓ Redis is reachable from app and worker.
- ✓ Worker process starts on boot.
- ✓ Worker restarts automatically on failure.
- ✓ Tasks are idempotent.
- ✓ Jobs pass primitive arguments only.
- ✓ Retries are configured for transient failures.
- ✓ Timeouts are configured for external calls.
- ✓ Long-running jobs are isolated if needed.
- ✓ Queues are separated only when priorities require it.
- ✓ Periodic scheduler is configured if needed.
- ✓ Logs include task and object identifiers.
- ✓ Failed jobs are visible and reviewable.
- ✓ Requeue procedure exists.
- ✓ Deploy procedure updates worker and web together.
- ✓ In-flight jobs are handled safely during deploys.
- ✓ Worker and web use the same env vars and credentials.
Cross-check final launch readiness with SaaS Production Checklist.
FAQ
Should I pick Celery or RQ for a small SaaS?
Pick RQ for the simplest Redis-only setup. Pick Celery if you need scheduled tasks, richer retries, multiple queues, or more control over worker behavior.
What jobs should not be offloaded?
Short, deterministic work that must complete before the response should stay inline. Background jobs are for slow, retryable, or asynchronous work.
How do I make tasks safe to retry?
Use idempotent updates, unique constraints, processed flags, external API idempotency keys, and avoid side effects before state is persisted.
Why are jobs enqueued but not executing?
Usually one of these:
- worker is down
- worker is listening to a different queue
- worker cannot import the task
- worker cannot connect to Redis
When should I add separate queues?
Add them when high-priority jobs like auth emails or webhooks are delayed by low-priority batch jobs such as imports, exports, or report generation.
Do I need a result backend?
Only if you need job results or framework-level task status lookup. Many small SaaS apps store task outcome in their own database instead.
Can I run background jobs in the web process?
No. That blocks requests and makes jobs unreliable during restarts, crashes, or request timeouts.
Final takeaway
Background jobs are required once your SaaS sends emails, processes files, calls flaky APIs, or runs scheduled work.
Keep the first version simple:
- Redis
- one queue
- one worker
- idempotent tasks
- retries with backoff
- failure visibility
- supervised worker processes
Use RQ when simplicity matters most. Use Celery when retries, scheduling, routing, and queue control become core infrastructure. Ship worker deployment, Redis, env vars, logging, and restart behavior as one production unit, not as separate afterthoughts.