A Guide to Async Patterns: Django, Celery, asyncio, React

The codebase for a Reddit scraping side project I'm building has a dozen places where someone might say "this thing runs in the background." Or "this thing streams live." Or "the frontend keeps refreshing." Read three of them in a row and they all look like the same kind of thing. They're not. They're solving different problems with different machinery, and the machinery has different rules.

I've spent a non-trivial fraction of debugging time on async problems where the bug wasn't in the code itself — it was in which kind of async I thought was in play. Fix that confusion at the top of the call stack and the actual bug is usually obvious. This post is the map I wish I had when I started: seven concrete async patterns I use across this codebase, plus four bridges that connect the backend async work to the frontend UI. Most "async" in this codebase is one of those seven; mixing them up is where the bugs live.

The kitchen analogy

The codebase has four time-domains, and they map cleanly onto the areas of a restaurant kitchen.

  • The hot line — regular HTTP requests. A customer orders, a plate arrives in seconds. Synchronous, blocking, fast. Most of the API is this.
  • The prep kitchen — Celery tasks. Work done out-of-band so the dining room stays responsive. The customer doesn't wait at the table for the salad dressing to be whisked.
  • The teppanyaki bar — Server-Sent Events. The customer sits at the bar and watches their dish being made in real time, narrated as it goes. The kitchen and the customer share a synchronised view of progress.
  • The phone tree — webhooks. The kitchen calls customers who set up a number to call, with retries if no one picks up.

Different time-domains, different rules. A waiter (synchronous handler) who tries to do a 20-minute braise on the hot line blocks every other customer in the room. A prep cook (Celery task) who tries to narrate live to the teppanyaki bar (SSE) has no way to send chatter back over the broker. The kitchen analogy is silly, but it sticks — once a pattern is anchored to its "kitchen area" you mostly stop using the wrong machinery for the wrong job.

The seven patterns

I count seven concrete async patterns in this codebase. Some of them stack on top of each other (SSE is built from async views + async generators, sitting on top of Django's ASGI runtime). One of them cross-cuts almost everything (cancellation). The rest are mostly independent — different problems, different solutions.

Figure: The Seven Patterns. Two entry points, two composition layers, two specialised patterns, and one that cross-cuts everything.

  • 1. Celery tasks: broker-coordinated background work
  • 2. async/await + Django ORM: event-loop concurrency in the web tier
  • 3. Async generators: producer/consumer with free backpressure
  • 4. SSE wire format: transport on top of async views + generators
  • 5. Cooperative cancellation: a flag and three checkpoints (cross-cuts 1, 3, 4, 6)
  • 6. Webhook retry chain: self-rescheduling Celery task
  • 7. Adaptive frontend polling: cadence keyed to row state

Patterns 1 + 2 are entry points · 3 + 4 are composition · 6 + 7 are specialised · 5 is everywhere.

A few things to call out on the map. Patterns 1 and 2 are entry points — Celery tasks and async/await views are the two ways "this thing might not return right away" enters the system. Patterns 3 and 4 compose — async generators are the engine that SSE drives. Patterns 6 and 7 are specialised — webhook retry chains and adaptive frontend polling are each their own thing, built on top of the basics. And Pattern 5 cross-cuts — cooperative cancellation appears wherever long-running work happens, which means it appears almost everywhere.

The rest of this post walks each of the seven in turn. None of them is doing anything exotic on its own. The interesting part is how they interact — which is what the gotchas section near the end is about.

Pattern 1 — Celery tasks

The oldest async pattern in any Django codebase. A task is a Python function decorated with @shared_task; a worker process (separate from the web tier) picks it up off a Redis broker and runs it; the web request that enqueued it returned in milliseconds. The codebase has seven task files across six apps (scraper, discovery, projects, content_analysis, webhooks, exports).

The decorator carries the usual knobs: bind=True (the task gets self), max_retries (Celery-managed retry, usually 0 in this codebase because we own retry state), soft_time_limit vs time_limit (graceful warning vs hard kill). What surprises people new to Celery is that @shared_task is a capability flag, not a requirement to use .delay(). You can call a decorated task as a plain Python function inline, and it just runs on the calling process. The bulk orchestrator does exactly this:

from typing import Any

from celery import shared_task

@shared_task(bind=True, max_retries=0, soft_time_limit=1800, time_limit=2100)
def run_scraping_job_task(self, scraping_job_id: str) -> dict[str, Any]:
    job = ScrapingJob.objects.select_related("project").get(id=scraping_job_id)
    job.mark_running()
    for subreddit in job.config.subreddits:
        try:
            # Inline call, NOT scrape_subreddit_posts_task.delay(...)
            scrape_subreddit_posts_task(job.id, subreddit)
        except Exception as exc:
            # Record the failure and move on to the next subreddit
            job.errors.append({"subreddit": subreddit, "error": str(exc)})
    if job.errors:
        job.mark_partial()
    else:
        job.mark_completed()
    return {"job_id": str(job.id), "status": job.status}

If scrape_subreddit_posts_task.delay(...) were used instead, every subreddit would fan out to a separate worker — different workers, different processes, no shared transaction. By calling inline, the orchestrator keeps everything on one worker, in one transaction-able context, with deterministic ordering. The orchestrator decides when to fan out and when not to; the @shared_task decorator just makes the option available. For a richer Celery example — a task that re-schedules itself via apply_async(countdown=…) to implement a retry chain — see Durable Outbound Webhooks with HMAC and Exponential Backoff.
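To make the "capability flag" idea concrete without a broker, here is a toy sketch in plain Python. It is not Celery and does not reproduce Celery's internals: ToyTask, toy_task, and QUEUE are hypothetical names invented for illustration. It only shows the shape of the choice: the decorated function stays callable inline, and .delay() is an extra door that records the call for someone else to run.

```python
from typing import Any, Callable


class ToyTask:
    """Toy stand-in for a task object; NOT Celery's implementation."""

    def __init__(self, func: Callable[..., Any], queue: list) -> None:
        self.func = func
        self.queue = queue

    def __call__(self, *args: Any) -> Any:
        # Inline call: runs immediately, in the caller's process.
        return self.func(*args)

    def delay(self, *args: Any) -> None:
        # Hand-off: only records the call; a separate worker would run it later.
        self.queue.append((self.func.__name__, args))


QUEUE: list = []  # hypothetical in-memory "broker"


def toy_task(func: Callable[..., Any]) -> ToyTask:
    return ToyTask(func, QUEUE)


@toy_task
def scrape(subreddit: str) -> str:
    return f"scraped {subreddit}"


def orchestrate(subreddits: list[str]) -> list[str]:
    # Inline calls: results come back in order, in this same process.
    return [scrape(name) for name in subreddits]
```

Calling orchestrate(["python", "django"]) returns both results in order and leaves QUEUE empty; scrape.delay("rust") returns nothing and only appends an entry to QUEUE.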

Pattern 2 — Async/await + Django ORM

This pattern combines Python's asyncio event loop, the native async ORM methods Django has shipped since 4.1 (aget, aupdate_or_create, aget_or_create, etc.), and the sync_to_async bridge for everything in the Python ecosystem that doesn't yet have an async equivalent. Two locations in the codebase use this: the SSE streaming views in scraper/streaming_views.py, and the async scraper adapters inside the data_provider package.

The plain shape of an async view:

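from django.http import JsonResponse, StreamingHttpResponse

async def stream_job(request, job_id):
    await _authenticate_or_401(request)
    # select_related loads the project in the same query; a lazy job.project
    # lookup would be a sync ORM call inside the async view
    job = await ScrapingJob.objects.select_related("project").aget(id=job_id)
    if job.project.user_id != request.user.id:
        return JsonResponse({"detail": "Not found"}, status=404)
    return StreamingHttpResponse(
        _event_stream(job),
        content_type="text/event-stream",
    )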

Two non-obvious things going on. First, this is a plain Django async view, not a DRF @api_view-decorated view. DRF 3.16 still doesn't support async dispatch — the request would be handled synchronously even if the function is async def. So in code that touches the streaming layer, I authenticate manually:

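from asgiref.sync import sync_to_async
from django.core.exceptions import SuspiciousOperation

async def _authenticate_or_401(request):
    # authenticate() is sync and hits the DB, so run it in a worker thread
    auth_result = await sync_to_async(JWTAuthentication().authenticate)(request)
    if auth_result is None:
        # Note: Django answers SuspiciousOperation with a 400, not a literal 401
        raise SuspiciousOperation("Unauthenticated")
    request.user, _ = auth_result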

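JWTAuthentication().authenticate does a database lookup through the sync ORM. Called directly inside an async view, that query would be rejected by Django's async-safety check (SynchronousOnlyOperation), and any other blocking work in the call would stall the event loop. The sync_to_async wrapper shunts it to a worker thread, so the loop keeps spinning. Mildly annoying boilerplate that goes away the day DRF lands async support.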

Second, the ORM calls are aget, not get. The a* prefix marks Django's native async query methods. A plain get called directly in an async view does not quietly work: Django raises SynchronousOnlyOperation, so use the a* variant or wrap the call in sync_to_async. Be deliberate about everything else too, because Django only guards its own ORM. A sync HTTP client or a time.sleep inside an async view blocks the loop with no warning at all.
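The effect of moving a blocking call off the loop can be seen with the standard library alone. The sketch below uses asyncio.to_thread as a stand-in for sync_to_async (a swapped-in technique, not what the codebase uses), and blocking_lookup, heartbeat, and run are hypothetical names. A heartbeat task counts how often the loop gets a turn while the blocking call is in progress.

```python
import asyncio
import time


def blocking_lookup() -> str:
    # Stand-in for a sync call such as a DB-backed auth check.
    time.sleep(0.2)
    return "user-1"


async def heartbeat(ticks: list[int], stop: asyncio.Event) -> None:
    # Appends a tick every 10 ms for as long as the loop is free to run it.
    while not stop.is_set():
        ticks.append(1)
        await asyncio.sleep(0.01)


async def run(offload: bool) -> int:
    ticks: list[int] = []
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # give the heartbeat its first turn
    if offload:
        await asyncio.to_thread(blocking_lookup)  # loop stays responsive
    else:
        blocking_lookup()  # loop is frozen for the whole call
    stop.set()
    await beat
    return len(ticks)


blocked_ticks = asyncio.run(run(offload=False))
offloaded_ticks = asyncio.run(run(offload=True))
```

With the direct call the heartbeat only gets the single tick it recorded before the freeze; with the offloaded call it keeps ticking throughout the 200 ms.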

Pattern 3 — Async generators

async def functions that contain yield. They look like ordinary generators except the consumer drives them with async for. The codebase has three implementations of async def scrape_subreddit_posts_stream(...) (native JSON, Playwright, nodriver CDP), and one consumer that drives them:

async def _event_stream(job):
    async for dto in scraper.scrape_subreddit_posts_stream(subreddit, limit):
        # Cancel checkpoint
        await sync_to_async(job.refresh_from_db)(fields=["cancel_requested"])
        if job.cancel_requested:
            yield b"event: cancelled\ndata: {}\n\n"
            return
        # Persist and emit
        await ScrapedPost.objects.aupdate_or_create(...)
        yield _fmt_event("post", dto.model_dump())

The bit that's worth dwelling on: backpressure is free. The consumer's loop body decides the pace. If it's doing a DB write that takes 200ms per post, the producer stays suspended at its yield for those 200ms before it produces the next item. There is no queue between them, no manual signalling, no asyncio.Queue with a maxsize. The generator protocol itself is the signal: async for awaits the producer's __anext__(), which returns when the next item is ready; the loop body runs; and the producer does nothing until the loop asks again.
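A minimal sketch of that lockstep, using only asyncio. The names producer, consumer, and log are hypothetical, and asyncio.sleep stands in for the slow DB write. The log shows that the producer never runs ahead of the consumer.

```python
import asyncio

log: list[str] = []


async def producer(n: int):
    for i in range(n):
        log.append(f"produce {i}")
        yield i  # suspended here until the consumer asks for the next item


async def consumer() -> None:
    async for item in producer(3):
        await asyncio.sleep(0.01)  # stand-in for a slow DB write
        log.append(f"consume {item}")


asyncio.run(consumer())
```

After the run, log alternates strictly: produce 0, consume 0, produce 1, consume 1, produce 2, consume 2. No item is produced before the previous one has been fully handled.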

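The second nearly free thing is cleanup. When the generator is closed (because the client disconnected, because cancellation fired, because an exception bubbled up), Python throws GeneratorExit into the producer at the suspended yield and the producer's try/finally block runs. In the Playwright scraper, that means await browser.close(). In the native JSON scraper, it's nothing. One caveat: leaving an async for early does not by itself close an async generator right away. Closing happens when something awaits its aclose(), or later when the event loop finalizes the garbage-collected generator, so wrap the loop in contextlib.aclosing (or call aclose() explicitly) wherever cleanup timing matters. Either way, no leaked resources.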

These generators are the engine behind the data_provider package — for the factory/registry layer that picks which scraper backend produces the events at runtime, see One Interface, Five Scraping Backends.

Pattern 4 — Server-Sent Events (SSE)

A transport layer, not an execution model. SSE rides on top of a plain HTTP response that the server keeps open and writes to incrementally. Django's StreamingHttpResponse wraps an iterable (in our case, an async generator) and flushes each chunk as it's yielded. The wire format is a tiny, strict text grammar:

event: post
data: {"id": "t3_abc", "title": "...", "author": "..."}

:keepalive

event: complete
data: {"posts_scraped": 30}

Three rules: event: names the message type, data: carries one line of UTF-8 (usually JSON), and a blank line ends the message. A line starting with : is a comment — browsers ignore it, but proxies see traffic, which is what the :keepalive lines are for.
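The three rules are small enough to capture in a formatter. The sketch below is an illustration, not the codebase's _fmt_event (whose body isn't shown here); format_sse_event and KEEPALIVE are hypothetical names. It emits one data: line per line of payload text, which is how the SSE format carries multi-line data.

```python
import json
from typing import Any

KEEPALIVE = b":keepalive\n\n"  # comment frame; clients ignore it


def format_sse_event(event: str, payload: Any) -> bytes:
    """Serialise one SSE message: event line, data line(s), blank line."""
    body = json.dumps(payload)
    # json.dumps (without indent) emits no raw newlines, so this produces a
    # single data: line; the split keeps the framing valid for other text too.
    data_lines = "".join(f"data: {line}\n" for line in body.split("\n"))
    return f"event: {event}\n{data_lines}\n".encode("utf-8")


frame = format_sse_event("complete", {"posts_scraped": 30})
```

Here frame holds the event line, one data line with the JSON object, and the terminating blank line, encoded as UTF-8 bytes ready to be yielded from a streaming response.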

This is the layer the SSE article lives in. For the full breakdown of the wire format, the proxy_buffering off trap that makes SSE look broken in production until you find it, and why I use fetch + ReadableStream instead of the browser's built-in EventSource, see Server-Sent Events, in Production. For this article, the takeaway is that SSE is just Pattern 2 (async views) producing Pattern 3 (async generators) with a wire format that the browser knows how to chop into frames. The "real-time" feeling is the natural consequence of incremental flushes, not a separate piece of machinery.

Pattern 5 — Cooperative cancellation

The cross-cutting one. Cancellation in this codebase is a boolean column on the job row, cancel_requested = BooleanField(default=False), and three observation points that read it. The bulk orchestrator (Celery, Pattern 1) checks the flag between subreddits. The SSE streaming view (Pattern 4 over Pattern 3) checks at the top of every generator iteration. The exports task checks between sheets.

Figure: Two-Sided Cancel. One click fires two halves in parallel: the server stops working, the client stops reading.

  • Server half (client → server / generator): POST /cancel, flag flip, checkpoint sees it, emit cancelled.
  • Client half (controller → fetch reader): AbortController.abort(), consumer ignores late events.

Cancellation is safe but delayed. If the producer is mid-network-call to Reddit when the cancel arrives, the cancel doesn't fire until the network call returns and control flow reaches the next checkpoint. You can't preempt arbitrary code — you can only check the flag at known safe points. Pick those points carefully (between subreddits, not in the middle of one) and the worst-case latency is bounded by the longest operation between checkpoints. In practice that's a few seconds, which is fine for a user pressing Cancel.
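The "safe but delayed" behaviour can be sketched with an in-memory stand-in for the job row. Job, run_job, and the cancel_after parameter are hypothetical and exist only to simulate a cancel request arriving while a unit of work is in progress; the real flag lives in the database.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    """In-memory stand-in for the job row; the real flag is a DB column."""
    cancel_requested: bool = False
    done: list[str] = field(default_factory=list)
    status: str = "running"


def run_job(job: Job, subreddits: list[str], cancel_after: Optional[str] = None) -> Job:
    for name in subreddits:
        # Checkpoint: the only place a cancel can take effect.
        if job.cancel_requested:
            job.status = "cancelled"
            return job
        job.done.append(name)  # stand-in for the uninterruptible scrape
        if name == cancel_after:
            job.cancel_requested = True  # simulates the user's POST /cancel
    job.status = "completed"
    return job


cancelled = run_job(Job(), ["a", "b", "c"], cancel_after="b")
finished = run_job(Job(), ["a", "b", "c"])
```

In the cancelled run, unit "b" still completes because the flag flipped while it was in progress; the stop only happens at the checkpoint before "c". That gap is the bounded latency described above.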

The other half of cancellation lives on the client. The frontend hook owns an AbortController; when the user clicks Cancel, it does two things: it sends a POST /cancel to flip the server flag, and it calls controller.abort() to kill the local fetch. Both halves are necessary. The POST /cancel ensures the server-side work stops (and that any other consumer of the same job sees the cancelled state). The controller.abort() ensures the SSE consumer ignores any events that were already in flight when the cancel decision was made. Forget either half and you end up with weird state: either the server keeps working invisibly, or the client renders ghost events from a stream the user already "stopped."

Pattern 6 — Webhook retry chain

A Celery task that re-schedules itself. When a webhook delivery fails, the courier task computes the next backoff (1m, 5m, 15m, 1h, 6h — the exponential schedule) and calls deliver_webhook_task.apply_async(args=[delivery_id], countdown=seconds). After five failed attempts the row is marked dead and the chain stops. State — attempt_count, next_retry_at, status — lives on the WebhookDelivery row in the database.
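The schedule lookup itself is a few lines. The sketch below is an assumption-laden illustration, not the courier task: BACKOFF_SECONDS and next_countdown are hypothetical names, and it assumes failed_attempts counts failures so far and that every delay in the schedule is followed by one retry. How the real code counts attempts before marking a row dead may differ.

```python
from typing import Optional

# Delays quoted in the text: 1m, 5m, 15m, 1h, 6h (in seconds).
BACKOFF_SECONDS = [60, 300, 900, 3600, 21600]


def next_countdown(failed_attempts: int) -> Optional[int]:
    """Delay before the next retry, or None when the schedule is used up.

    Assumption: failed_attempts is 1 after the first failure, and each
    delay in the schedule is followed by exactly one retry.
    """
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be >= 1")
    if failed_attempts > len(BACKOFF_SECONDS):
        return None  # the caller would mark the delivery dead
    return BACKOFF_SECONDS[failed_attempts - 1]


total_wait_hours = sum(BACKOFF_SECONDS) / 3600
```

The value returned by next_countdown is what would be passed as countdown to apply_async. Summing the five delays gives a little over seven hours of waiting in total.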

The reason I don't use Celery's built-in self.retry(...) is the central argument of the webhooks article — owning retry state in the database (rather than the broker) buys you queue observability (the next attempt is visible to the web UI as WebhookDelivery.objects.filter(status='failed', next_retry_at__isnull=False)) and idempotency (the is_terminal() guard at task entry no-ops on accidental double-delivery). For the full architectural argument, the visualization of the 7-hour retry timeline, and the HMAC signing details, see Durable Outbound Webhooks with HMAC and Exponential Backoff.

For this article, the takeaway is that this is also Pattern 1 (a Celery task), but the task's failure path schedules itself rather than calling self.retry. Same broker, same workers, same decorator — different control flow for "what happens when this fails."

Pattern 7 — Adaptive frontend polling

The seventh pattern lives in the React frontend, not the Python backend. A useEffect hook schedules a setInterval whose cadence depends on the most recently fetched state. While pending, poll every 3 seconds. While running, poll every 5 seconds. When the state hits terminal (one of complete, failed, cancelled), the polling stops.

useEffect(() => {
  if (intervalRef.current) clearInterval(intervalRef.current)
  if (!jobs) return
  const hasRunning = jobs.some((j) => j.status === 'running')
  const hasPending = jobs.some((j) => j.status === 'pending')
  if (!hasRunning && !hasPending) return
  const cadence = hasRunning ? 5000 : 3000
  intervalRef.current = setInterval(() => { refetch() }, cadence)
  return () => { if (intervalRef.current) clearInterval(intervalRef.current) }
}, [jobs, refetch])

Figure: Polling Cadence. State drives cadence; terminal means no polling at all.

  • pending: poll every 3 s
  • running: poll every 5 s
  • terminal: no polling

Each component owns its own interval: no global subscription manager, no shared state to debug.

The insight: polling is state-driven, not time-driven. A polling interval picked once and held forever wastes traffic when nothing is happening and feels slow exactly when the user wants the most responsiveness — at state transitions. By rescheduling on every fetch, the cadence is always relative to the latest known state. The pending→running transition is the moment a user cares most; the 3-second cadence catches it fast. Once running, the work is going to take a while regardless, so the 5-second cadence is fine. When terminal, polling stops completely — no traffic, no battery, no proxy load.

This is intentionally not a global subscription manager. Each component owns its own interval. The trade is that two components polling the same endpoint will each fire their own request — which is fine at our scale (a handful of open tabs, maybe 4 endpoints between them) and would only become a problem at much higher concurrency. The simpler architecture means there's no global state to debug when something goes weird.
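The cadence decision in the hook is a pure function of the fetched statuses, which makes it easy to restate and test in isolation. The sketch below mirrors that decision in Python purely for illustration; the real logic is the React hook shown earlier, and polling_cadence_ms is a hypothetical name.

```python
from typing import Iterable, Optional


def polling_cadence_ms(statuses: Iterable[str]) -> Optional[int]:
    """Mirror of the hook's decision: interval in ms, or None to stop polling."""
    seen = set(statuses)
    if "running" in seen:
        return 5000  # work is underway and will take a while
    if "pending" in seen:
        return 3000  # catch the pending-to-running transition quickly
    return None  # everything is terminal, or there are no jobs
```

As in the hook, a single running job is enough to select the 5-second cadence even when other jobs are still pending.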

Execution models compared

Step back from the patterns and notice that they sit on top of three different execution models:

  • Celery worker: a separate process, coordinated via a broker (Redis here). One worker can run many tasks, but each task gets its own execution context. Web tier hands off and forgets; the worker may live for hours.
  • Async event loop: a single thread inside a single web-worker process, multitasking cooperatively. Many in-flight requests can share one process if they spend most of their time await-ing I/O. There's no parallelism here — only concurrency — but the cost-per-in-flight-request is tiny.
  • Frontend polling: the browser sending small, sequential requests on a schedule. Each request hits a normal sync endpoint that returns quickly with a snapshot of state. The "async-ness" is in the cadence, not the protocol.
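The "concurrency, not parallelism" point about the event loop can be checked with a small timing sketch. fake_request and serve_five are hypothetical names, and asyncio.sleep stands in for awaiting network or database I/O: five in-flight waits on one thread finish in roughly the time of one, not five.

```python
import asyncio
import time


async def fake_request(i: int) -> int:
    await asyncio.sleep(0.1)  # stand-in for awaiting network or DB I/O
    return i


async def serve_five() -> tuple[list[int], float]:
    start = time.perf_counter()
    # All five coroutines wait at the same time on a single thread.
    results = await asyncio.gather(*(fake_request(i) for i in range(5)))
    return list(results), time.perf_counter() - start


results, elapsed = asyncio.run(serve_five())
```

Run back to back, the five waits would take about half a second; overlapped on the loop they take about a tenth. The saving only exists because each request spends its time awaiting, which is why CPU-bound work gains nothing from this model.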

Figure: Execution Models. What each model does at runtime, drawn on the same 30-second wall-clock axis.

  • Pattern 1, Celery worker (web tier hands off, worker runs long). Latency to done: ~25 s. Connections held: 0 on web tier, 1 on worker. Infra: web + worker + broker.
  • Pattern 2, Async event loop (one process, five overlapping in-flight requests). Latency to done: ~6 s each. Concurrent in flight: 5 sharing 1 worker. Infra: web tier only.
  • Pattern 7, Frontend polling (repeated small requests every 3 s, last one returns terminal). Latency to done: bounded by cadence (≤ 3 s). Requests/min: ~20. Infra: none beyond HTTP.

Different latency, throughput, isolation, failure profiles. Pick by what you need. Celery for "long, can-afford-to-be-slow, must-survive-restart" (typical Django use case: 30-second scrape, 5-minute export). Async event loop for "many in-flight things in the web tier" (typical use case: a streaming endpoint that holds the connection open for minutes while yielding events). Polling for "show progress on long-running work without standing up new infrastructure" (typical use case: status row that updates from pending → running → complete).

The mistake is using the wrong one. A polling loop that fetches every 100ms because someone wants "real-time" is doing the work of an SSE connection at ten times the cost. An async view that does ten sequential synchronous DB queries blocks the loop for the entire round-trip, defeating the point. A Celery task that finishes in 50ms is just overhead — the dispatch alone takes longer than the work.

The four bridges

Each backend async pattern needs a way to surface to the user-facing code. The codebase has four distinct bridges, each with different latency / cancellation / infrastructure trade-offs.

  • Bridge A — SSE → useScrapeStream hook: backend emits SSE frames; frontend consumes via fetch + ReadableStream and a hand-written wire-format parser. Near-zero latency, two-sided cancel (server flag + client AbortController), no extra infrastructure beyond what HTTP already provides.
  • Bridge B — Celery → adaptive polling: backend returns 202 Accepted with a job_id; frontend polls /api/.../jobs/{id}/ every 3 or 5 seconds until terminal. Latency bounded by the cadence; cancellation is just a flag flip (next poll sees the new state); no new infra (no Channels, no WebSockets).
  • Bridge C — Sync HTTP → async work: /fast/ endpoints block until they have results (5–15 seconds), /deep/ endpoints dispatch a Celery task and return 202. Different user intents — quick keyword extraction vs full LLM-driven discovery — drive which endpoint shape gets used.
  • Bridge D — Backend → external (webhooks): backend is the client, external receiver is the server. The frontend sees this indirectly — by polling the WebhookDelivery history, which becomes a user-visible audit trail of attempts.

Table: The Four Bridges. Backend async → frontend UI: same job, four different transports.

  • A, SSE (backend pushes, frontend reads stream). Latency to UI: near-zero. Cancellation: two-sided, server flag + AbortController. Infra: plain HTTP. Reach for it when: live feed of many events while the user watches.
  • B, Polling (frontend fetches every N seconds). Latency to UI: 3–5 s. Cancellation: one-sided, server flag, next poll sees it. Infra: plain HTTP. Reach for it when: one terminal transition, no infra appetite.
  • C, Fast / Deep (sync vs async routes by user intent). Latency to UI: sync 5–15 s; async 202 + poll. Cancellation: sync has no cancel; async is the same as B. Infra: plain HTTP. Reach for it when: one feature with two latency expectations.
  • D, Webhooks (backend is the client, external receives). Latency to UI: seconds to hours (retries). Cancellation: none; delivery is the user intent, not a process. Infra: plain HTTP + retry chain. Reach for it when: notify external integrations of completions.

None of the bridges need infrastructure beyond plain HTTP; that's the cost ceiling.

Three things to notice.

  • The bridges aren't interchangeable. SSE gives near-zero latency at the cost of a complex parser and a two-sided cancel; polling gives mediocre latency at the cost of nothing.
  • Cancellation gets harder as the bridge gets more interactive. Bridge B (polling) cancels in one round-trip; Bridge A (SSE) cancels in two parallel paths that have to meet.
  • Infrastructure adds up. None of these bridges need new infrastructure, and that's deliberate. The day I need true many-to-many push (one server event to N clients at once), Channels and a real WebSocket layer become unavoidable. Today, none of the bridges need that, and the simpler stack pays back on every deployment.

Gotchas — when patterns interact

Each pattern in isolation is straightforward. The bugs happen at the seams.

Two writers on the same row. The SSE streaming path has three patterns running on it at once: an async view (Pattern 2), an async generator (Pattern 3), and a cancellation checkpoint (Pattern 5). All three are reading and writing to the same ScrapingJob row. If a Celery bulk task (Pattern 1) targets the same job at the same time, both writers race over mark_completed(). The fix is convention, not code: the streaming path targets one subreddit at a time; the bulk path targets many. They never overlap on the same job ID. Enforcing this in code (with a "streaming-mode" flag on the job) is on the list, but a clear convention plus a code-review rule has been enough so far.

.delay() vs inline calls. Calling task.delay() inside another task fans out to N workers. Sometimes that's what you want (parallelism!), sometimes it's catastrophic (every worker now holds a DB connection, the orchestrator can't reason about completion order). Inline calls keep the work on the calling worker — same process, same transaction-able context. Pick deliberately; the default of .delay() is the wrong default for orchestrator-style tasks.

Async ORM inside sync helpers. The a* ORM methods are great. Calling one of them inside a sync helper function (via async_to_sync(...)) once per row in a loop over 1000 rows is terrible — you pay the cost of an async↔sync bridge per row, the connection pool churns, and a Postgres lock that would be held briefly in batch becomes a contention hotspot. If you need async ORM inside sync code, batch it: await asyncio.gather(*[Model.objects.aupdate(...) for ...]) and bridge once at the top.

Polling stops when fetched state is terminal, not when desired state is terminal. If a user clicks "Retry" on a dead delivery, the row flips to pending and you enqueue a fresh task — but the polling loop won't restart unless the component refetches. The fix is to always refetch on user actions (cancel, retry, trigger), not just on schedule. The polling cadence handles the "what's happening now" question; explicit refetches handle the "I just did something" question.

Closing: the debugging mental model

Seven async patterns, each with a specific problem-shape it solves. Four bridges between backend async and frontend code, each with a different latency / visibility / cancel profile. When debugging anything async, the first question is "which of the seven is in play?" — and the second is "what's the bridge to the user-facing surface?" Most async bugs in this codebase have turned out to be misidentified patterns: a developer reaching for .delay() when they meant inline, an async view that accidentally does sync DB work, a polling loop that doesn't restart after a user action.

This article is the architectural overview that ties the prior four pieces together: Server-Sent Events, in Production is the deep dive on Pattern 4. One Interface, Five Scraping Backends is the deep dive on the factory layer that produces the async generators of Pattern 3. An LLM Abstraction Layer with Pre-Built Agents is Pattern 2 applied to LLM provider calls (with a ReAct loop on top). Durable Outbound Webhooks with HMAC and Exponential Backoff is the deep dive on Pattern 6. The series intentionally goes from features (the prior four) to patterns (this one) — features are easier to motivate, patterns are what you reach for next time.