Server-Sent Events, in Production
I'm watching subreddit posts appear, one at a time, on a page I built. There's no refresh, no polling spinner, no "next 10" button. The posts just arrive — first one, then another, then a third — in the same order the backend pulls them from Reddit. Behind the scenes, a long-running job is scraping a subreddit; in front, a React component is rendering each post the moment it lands.
The question worth asking, the one that ends up shaping the whole architecture, is how do the bytes get there? The answer is older than WebSockets, simpler than Django Channels, and quietly powers a surprising slice of the modern web. It's called Server-Sent Events, and for a one-way live feed it is — almost always — the right tool.
This post is the production walkthrough I wish I had when I picked it up: what SSE actually is, how the server holds a connection open, why I deliberately don't use the browser's built-in EventSource API, the one infrastructure flag that breaks SSE silently behind every reverse proxy, and when to reach for something heavier instead.
What SSE actually is
The clearest analogy I've found: SSE is a one-way radio broadcast over a phone call.
The frontend dials a number (a normal HTTP GET). The backend picks up. Instead of saying one sentence and hanging up — which is what every normal HTTP request does — the backend keeps the line open and talks continuously, sending one short message at a time, until either the work is done or someone hangs up.
Each "sentence" the backend sends has a strict shape — the SSE wire format — so the frontend can chop the audio into discrete messages. Every fifteen seconds or so the backend coughs (:keepalive) so middlemen on the line don't think the call has gone idle and disconnect it.
Crucially: only one direction. The frontend can't talk back over the same line. If the frontend wants to say "stop," it has to make a separate phone call to a different number. That asymmetry — server talks, client listens — is the whole reason SSE exists as a thing distinct from WebSockets. WebSockets is two phones with simultaneous talk. SSE is a one-direction broadcast over plain HTTP. The simpler model unlocks simpler infrastructure: there's no protocol upgrade, no special server, no custom proxy config beyond one flag we'll meet shortly.
The cleanest mental model: SSE is "an incremental HTTP response, with a known wire format, parsed by the browser." It works because modern HTTP servers can hold a response open and flush bytes as a generator yields them. Everything else is convention.
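To see how little the framework matters, here's a toy ASGI app with no Django in it at all. The module and event names are invented for the example, but the mechanics are the same ones Django's ASGI handler drives for you: send the response headers once, then keep pushing body chunks until you're done.

import asyncio
import json

# Toy ASGI app: send headers once, then hold the body open and push one SSE
# frame per second with more_body=True. Run with `uvicorn sse_toy:app`.
async def app(scope, receive, send):
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/event-stream"),
                    (b"cache-control", b"no-cache")],
    })
    for i in range(3):
        frame = f"event: tick\ndata: {json.dumps({'n': i})}\n\n".encode("utf-8")
        await send({"type": "http.response.body", "body": frame, "more_body": True})
        await asyncio.sleep(1)
    # An empty body with more_body=False ends the stream cleanly.
    await send({"type": "http.response.body", "body": b"", "more_body": False})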
Transport Shapes
Polling vs WebSocket vs SSE — what the wire actually does over 10 seconds:
- Polling: high overhead. A request every 2 s; most responses come back empty.
- WebSocket: bidirectional. One upgrade, then messages flow both ways.
- SSE: server → client. One open response; events flow down.
A message on the wire
What actually flows over the wire is boring, and that's the point. The SSE format is a tiny, strict, line-delimited text protocol:
event: post
data: {"id":42,"title":"...","author":"..."}

:keepalive

event: complete
data: {"posts_scraped":30}
Three rules. An event: line names the event so the client can route it. A data: line carries the payload (almost always a JSON string, but spec-wise it's just UTF-8 text). A blank line marks the end of one message — required, no exceptions. A line starting with : is a comment; browsers throw it away, but proxies still see traffic. That's what :keepalive is for — it tells every middlebox on the path that yes, this connection is still alive, please don't kill it.
Wire Format
Anatomy of a single SSE message frame:
- event: names the event so the client can route it to the right handler.
- data: carries the payload. UTF-8, one line; JSON by convention, anything by spec.
- A blank line ends the frame. It's the required delimiter between messages; without it, the parser never finishes a frame.
- :keepalive is a comment line. Browsers ignore it; proxies see traffic and keep the connection open.
That's the entire protocol. No framing, no length prefixes, no binary. You could parse it with a regex. The browser has built-in code that does it for you (we'll get to why I don't use that), but if you ever need to debug an SSE stream, you can just curl -N it and read the lines.
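If you want to see that for yourself, here's a throwaway Python sketch that parses a captured stream. It's not the production client (that lives in the React hook below), just proof of how little machinery the format needs:

import re

# Field lines the spec defines; everything else is ignored.
FIELD = re.compile(r"^(event|data|id|retry): ?(.*)$")

def parse_frames(text: str):
    """Yield (event_name, data) pairs from a raw SSE stream."""
    for frame in text.split("\n\n"):
        event, data_lines = "message", []       # "message" is the spec's default event name
        for line in frame.splitlines():
            if line.startswith(":"):             # comment, e.g. :keepalive
                continue
            match = FIELD.match(line)
            if match is None:
                continue
            field, value = match.groups()
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)
        if data_lines:
            yield event, "\n".join(data_lines)

raw = 'event: post\ndata: {"id": 42}\n\n:keepalive\n\nevent: complete\ndata: {"posts_scraped": 30}\n\n'
print(list(parse_frames(raw)))
# [('post', '{"id": 42}'), ('complete', '{"posts_scraped": 30}')]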
The server side: holding a connection open
The server-side trick is keeping an HTTP response open across many yields without blocking a worker. In Django, three things, in order, make that possible.
ASGI runtime. WSGI — the older Python web protocol — assumes request-in, response-out, one shot. There's no story for streaming. ASGI is its async-aware successor; it lets a server hold a response open across await points. In practice this means running Django under gunicorn -k uvicorn.workers.UvicornWorker. Without ASGI, you cannot stream.
Async views. A regular Django view is a synchronous function; the request blocks a thread until it returns. An async view is async def, which means the view can await and yield without holding a thread captive. (One side note: Django REST Framework doesn't yet support async dispatch, so the streaming view is a plain Django async view, not a DRF @api_view. I authenticate manually inside it with sync_to_async(JWTAuthentication().authenticate). Mildly annoying; not a dealbreaker.)
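For the curious, that manual authentication amounts to a few lines. A sketch, assuming the JWTAuthentication class from djangorestframework-simplejwt and skipping error handling:

from asgiref.sync import sync_to_async
from django.http import JsonResponse
from rest_framework_simplejwt.authentication import JWTAuthentication

async def _authenticated_user(request):
    # JWTAuthentication.authenticate() is synchronous, so bridge it onto a
    # thread pool. It returns (user, validated_token) on success, or None
    # when no credentials were presented; invalid tokens raise.
    result = await sync_to_async(JWTAuthentication().authenticate)(request)
    return result[0] if result is not None else None

async def stream_job(request, job_id):
    user = await _authenticated_user(request)
    if user is None:
        return JsonResponse({"detail": "Authentication required"}, status=401)
    ...  # look up the job, check ownership, return the StreamingHttpResponse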
StreamingHttpResponse plus an async generator. This is where the magic happens:
from django.http import StreamingHttpResponse

async def stream_job(request, job_id):
    # ... authenticate, look up the job, check ownership ...
    return StreamingHttpResponse(
        _event_stream(job),                # async generator
        content_type="text/event-stream",  # the SSE content type
    )
_event_stream(job) is an async def function with yield statements in it. That makes it an async generator. Django's ASGI handler iterates over it (async for chunk in generator) and writes each yielded byte string to the wire as it arrives. No buffering at the Django layer. What you yield is what the client gets, in order, as soon as you yield it.
Here's roughly what the generator looks like:
import json
import time
from typing import AsyncIterator

from asgiref.sync import sync_to_async


async def _event_stream(job) -> AsyncIterator[bytes]:
    await sync_to_async(job.mark_running)()
    last_keepalive = time.monotonic()
    count = 0  # posts emitted so far, reported in the final "complete" event

    async for dto in scraper.scrape_subreddit_posts_stream(...):
        # Cancel checkpoint — the only place we honor a stop request
        await sync_to_async(job.refresh_from_db)(fields=["cancel_requested"])
        if job.cancel_requested:
            await sync_to_async(job.mark_cancelled)()
            yield _fmt_event("cancelled", {"detail": "Cancelled by user"})
            return

        # Persist + emit
        await ScrapedPost.objects.aupdate_or_create(...)
        yield _fmt_event("post", dto.model_dump(mode="json"))
        count += 1

        # Heartbeat
        now = time.monotonic()
        if now - last_keepalive >= 15:
            yield b":keepalive\n\n"
            last_keepalive = now

    await sync_to_async(job.mark_completed)()
    yield _fmt_event("complete", {"posts_scraped": count})


def _fmt_event(kind: str, data: dict) -> bytes:
    return f"event: {kind}\ndata: {json.dumps(data)}\n\n".encode("utf-8")
Three details worth pointing at. async for dto in scraper.scrape_subreddit_posts_stream(...) — the scraper itself is an async generator, so back-pressure is automatic; the next post isn't produced until this consumer is ready for it. aupdate_or_create is the ORM's native async entry point (Django 4.1+); under the hood Django still bridges the query to a worker thread, but the call site is a plain await and never blocks the event loop. sync_to_async(job.mark_cancelled)() — for custom model methods that don't have async variants, asgiref.sync.sync_to_async shunts the call to a thread pool explicitly. I use it only for occasional lifecycle helpers; the per-event writes go through the async ORM methods directly.
One pedantic but load-bearing detail: StreamingHttpResponse expects an iterable of bytes, not strings. _fmt_event returns bytes. The keepalive is b":keepalive\n\n". If you accidentally yield a str, Django rejects it. Pin the generator's signature as AsyncIterator[bytes] and the type checker will catch this for you.
The client side: why I don't use EventSource
The browser ships an API called EventSource. It is built for exactly this. It has automatic reconnection. Tutorials love it. The minimal example is five lines:
const es = new EventSource("/api/scraper/jobs/123/stream/")
es.addEventListener("post", (e) => console.log(JSON.parse(e.data)))
es.addEventListener("complete", () => es.close())
I do not use it. The reason is simple and not negotiable: EventSource cannot send an Authorization: Bearer ... header. At all. There is no option, no constructor argument, no second-pass API. You can pass query parameters, but a JWT in a query parameter ends up in your reverse-proxy access logs, your CDN logs, your browser history, and any intermediate logging layer — leaking your bearer token in four places at once. You can use cookies (and I'd love to), but my auth is currently localStorage-based for unrelated reasons.
So I deliberately use fetch + ReadableStream and parse SSE by hand. It's about thirty lines of code, and it teaches you what the browser was doing for you all along:
async function consume(streamUrl: string, token: string, controller: AbortController) {
  const res = await fetch(streamUrl, {
    method: "GET",
    headers: { Authorization: `Bearer ${token}` },
    signal: controller.signal,
  })
  if (!res.ok || !res.body) throw new Error(`Stream failed: HTTP ${res.status}`)

  const reader = res.body.getReader()
  const decoder = new TextDecoder()
  let buffer = ""

  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    buffer += decoder.decode(value, { stream: true })

    // SSE messages are delimited by a blank line ("\n\n"). A single
    // network chunk can contain several complete messages plus a
    // trailing partial one. Keep the partial as the next buffer.
    const messages = buffer.split("\n\n")
    buffer = messages.pop() ?? ""

    for (const raw of messages) {
      const ev = parseSSEMessage(raw)
      if (ev) applyEvent(ev)
    }
  }
}
Two subtleties matter here. The buffer-the-tail pattern is the canonical SSE parser shape: split on \n\n, treat the last fragment as incomplete, hold it for the next read. Skip this and you'll randomly drop the first message after every chunk boundary. The AbortController owns cancellation; when the user clicks Cancel, I controller.abort() on the local fetch and fire a separate POST /cancel to set the server-side flag the generator checks (more on that below).
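The server half of that cancel handshake can be a single flag write. A sketch, assuming Django 4.2+ async ORM helpers (aget/asave); the model name is invented, and the real view also authenticates and checks ownership before touching the job:

from django.http import JsonResponse

from .models import ScrapeJob  # hypothetical model name for the job table

async def cancel_job(request, job_id):
    # Flip the flag the streaming generator polls at its cancel checkpoint.
    job = await ScrapeJob.objects.aget(pk=job_id)
    job.cancel_requested = True
    await job.asave(update_fields=["cancel_requested"])
    return JsonResponse({"status": "cancel_requested"})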
Browser APIs
EventSource vs fetch + ReadableStream — capabilities side by side
| Capability | EventSource | fetch + ReadableStream |
|---|---|---|
| Custom request headers (e.g. Authorization: Bearer …) | No | Yes |
| Cookie auth | Yes | Yes |
| Auto-reconnect with Last-Event-Id replay | Yes | No (manual retry) |
| AbortController for cancel | No | Yes |
| Parses the SSE wire format for you | Yes | No (you parse by hand) |
| Code surface | ~5 lines | ~30 lines |
The trade is honest: I give up the browser's free auto-reconnect. When the stream dies (proxy timeout, server restart, network blip), my hook surfaces state: "error" and the UI shows a Retry button. That's worse than EventSource's silent reconnect-and-replay, but it's the price of HTTP-native auth. If I ever migrate to httpOnly cookies for auth, I'll happily revisit using EventSource.
The reason I'm spelling this out: a developer reading the code next month is going to be tempted to "simplify" it back to EventSource and silently break auth. I left a comment at the top of the hook explaining exactly this so it doesn't happen.
The full request, end to end
Stepping back, here's the full path a single SSE request takes — from the moment the React hook calls fetch to the moment a React component re-renders with a new post.
Lifecycle
The full path of a single SSE request, top to bottom:
1. The React hook calls fetch with the Authorization: Bearer header and an AbortController signal.
2. nginx, with proxy_buffering off, passes the request straight through to the uvicorn ASGI worker.
3. The Django async view authenticates, looks up the job, and returns a StreamingHttpResponse wrapping the async generator.
4. The generator yields one formatted frame per scraped post, plus :keepalive comments, and Django's ASGI handler writes each chunk to the wire as it's yielded.
5. The fetch reader decodes each chunk, splits its buffer on blank lines, and hands complete messages to the hook.
6. The hook applies each event to state, and the React component re-renders with the new post.
The thing that surprises people coming from a WSGI background: one uvicorn worker holds the entire stream. Not a thread per request — a coroutine. The worker can hold hundreds or thousands of concurrent SSE connections cheaply because each one is just suspended coroutine state, waiting for the next async for iteration. The expensive resource isn't the worker; it's the database connection the generator holds for the duration. That's the real ceiling, and the one to watch when you scale.
The trap that breaks SSE in production
If you skim everything else, read this section.
You build the streaming view. It works in dev. The events arrive perfectly. You ship it. Then a colleague messages you: "the live feed is broken on staging." You check the API directly with curl -N — events stream fine. You open the browser dev tools — no errors. The frontend connection is open. Nothing is happening. Ninety seconds later, boom, every event arrives at once, in a single burst.
The culprit is your reverse proxy.
By default, nginx buffers the entire HTTP response before forwarding it to the client. For normal requests that's fine, even preferable: it lets nginx absorb a slow client without tying up the upstream worker. For SSE, it's a death sentence. "The entire response" for an SSE stream is "everything that ever gets yielded." On a thirty-second scrape, that's thirty seconds of nothing followed by a fire-hose. On a long-running stream, that's potentially hours of silence.
The trap
What proxy_buffering does to an SSE stream:
- proxy_buffering on (the default): the proxy holds every yield; the browser sees nothing until the stream ends.
- proxy_buffering off: the same yields flush through one at a time; the browser stays live.
The fix is one line of nginx config:
location /api/scraper/jobs/ {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_buffering off;       # the critical one
    proxy_read_timeout 24h;    # streams stay open a long time
    proxy_send_timeout 24h;
}
proxy_buffering off tells nginx to pass each chunk through to the client as soon as it arrives from the upstream, instead of accumulating the whole response. The long timeouts prevent the proxy from killing what looks like an idle connection (your 15-second keepalives are the second half of that defense — they prove the connection is alive without actually being events the client cares about).
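There's also a belt-and-braces complement you can apply from the application side: nginx honors an X-Accel-Buffering response header, so the view can opt its own response out of buffering even if someone later forgets the location block. A sketch layered on the earlier view:

async def stream_job(request, job_id):
    # ... authenticate, look up the job, check ownership ...
    response = StreamingHttpResponse(
        _event_stream(job),
        content_type="text/event-stream",
    )
    # nginx skips proxy buffering for this response when it sees this header.
    response["X-Accel-Buffering"] = "no"
    # Keep caches out of the way; a cached SSE response is never what you want.
    response["Cache-Control"] = "no-cache"
    return response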
The exact same trap applies on Coolify, Cloudflare, Fly, Railway, and basically every edge layer. Each has its own config flag for the same thing. Whenever you deploy SSE to a new layer, the first thing to check is the streaming-response semantics. The symptom — "works locally, dead in production, nothing in the logs because no errors fired" — is otherwise the kind of thing that quietly eats an afternoon.
When SSE wins, and when it doesn't
The case for SSE is narrower than its case against. Here's how I think about the three plausible alternatives.
SSE vs WebSockets. WebSockets is the more general protocol. It is also more complex everywhere — server, client, proxy, mental model. For a one-way live feed, WebSockets buys you nothing concrete: there's no bidirectional traffic to justify the bidirectional protocol, the proxy config is harder (Upgrade and Connection: upgrade headers, special timeout handling, often a flag to enable WS support at all), the auth story is worse (token-in-URL leaks; first-message-handshake auth is yet more code), and you usually need new infrastructure (Django Channels, a channel layer like Redis, a separate worker class). I'd reach for WebSockets the moment I needed immediate bidirectional commands — "change the sort order without restarting the stream" — and not a moment sooner.
SSE vs Django Channels. Channels is the canonical way to do WebSockets in Django, and it earns its keep when you need real broadcasting: many connected clients seeing the same updates, server-initiated push to specific users on events that aren't tied to a request, or genuinely bidirectional commands. For a single user watching one job stream, it's overkill. The setup tax — a whole second async layer, a Redis-backed channel layer, routing.py, consumer classes, ProtocolTypeRouter — only pays off when you actually use any of the broadcasting machinery. I don't, so I don't pay it.
SSE vs polling. Polling — the frontend asking "anything new?" every N seconds — is the right call for low-event-rate updates where you're mostly waiting for a single terminal status flip. A scraping job that takes 60 seconds and produces 30 posts is high-event-rate; polling that with a 3-second cadence means twenty round-trips of full headers and JSON payloads, most of them returning "still running, no new posts." SSE does it as one connection with thirty small frames. The math is decisive. I use SSE for the live feed and polling for the other job-status checks (a 3-second cadence while pending, 5s while running, none once terminal) because those checks fit the polling shape much better than the streaming one.
The scale ceiling for SSE on a modest async stack: one uvicorn worker can comfortably hold a few thousand concurrent open streams as coroutines. The real bottleneck shows up at the database — each stream usually holds one connection for its duration, so a thousand concurrent streams wants a thousand DB connections (or a connection pooler in front). For my use case — one user, one stream at a time — SSE is essentially free.
What I left out
A pragmatic deep-dive doesn't get to cover everything. Things I deliberately skipped:
- Testing SSE. Unit-testing async generators and integration-testing the wire format are their own small craft; I'll write that up separately once I've made all the mistakes.
- Browser compatibility. ReadableStream is broadly supported in modern browsers, but the very-old corners of Safari and any IE-class browser need a polyfill or fallback path. If you have to support those, EventSource (with cookie auth) becomes more attractive again.
- The httpOnly-cookie migration. Moving auth out of localStorage would let me go back to EventSource for the simpler API surface. It's on the roadmap.
- Multi-user broadcasting. If I ever need many clients seeing the same stream, that's Channels-and-Redis territory, not SSE-with-a-fan-out.
- SSE spec extras. Last-Event-Id for resumable streams, the retry: field for custom reconnect intervals, named event types beyond the four I use. The spec has more room than I've used; the implementation has more room than I've described.
Closing
The architecture is the message. SSE survives in production for one-way real-time because it's just HTTP, parsed by a tiny convention everyone already understands. The temptation, always, is to reach for the more general tool — WebSockets, Channels, a message broker — because the more general tool is more impressive on a résumé and more flexible in principle. The discipline is to notice when the more general tool is solving a problem you don't have, and the more boring tool would let you ship the same feature in half the code and a fraction of the infrastructure.
For a live feed of items appearing one at a time, on a page, in the right order, with no clicks and no spinners — SSE is the boring tool. Reach for it first.