An LLM Abstraction Layer with Pre-Built Agents
In a Reddit scraping side project I'm building, the code that calls an LLM doesn't pick one model and stop there — it picks from three, based on what the call actually needs. Gemini is the default: cheap, fast, generous free tier. Claude is the reasoning specialist: I reach for it when I need careful structured output or a complex tool-calling flow. OpenAI is the familiar one: wide ecosystem support, strong tool-calling, good when I want the most-tested option.
The orchestrator code calling these doesn't get to know which one ran. It asks for an LLM (get_llm()), or asks for one that produces a typed Pydantic object (get_llm_with_structured_output(Schema)), or just asks a high-level question (discover_subreddits(["python", "async"])) — and gets back the same shape of answer regardless of provider. The layer that makes this work is a small Python package called ai_provider, and it's the third in this project's series of "make many things look like one thing" abstractions. This post opens it up.
The analogy
The cleanest way in is by analogy: ai_provider is a translation agency with three resident translators and a binder of pre-built lesson plans.
The agency has three translators on staff. There's the fast and cheap one (Gemini) — generous free tier, quick turnaround, the default. There's the reasoning specialist (Claude) — best when you need careful structured output or complex tool-calling. And there's the familiar one (GPT) — wide ecosystem, strong tool-calling, the one most people in the building already know how to talk to.
When you walk in with a job, you can either pick a translator yourself (get_llm(provider="anthropic")) or let the receptionist pick the house default (just get_llm(), which reads settings.LLM_DEFAULT_PROVIDER). Whoever you get, they hand you their work in the same notebook format — LangChain's BaseChatModel. Your downstream code (agents, structured-output wrappers, tool-binding) doesn't care which translator filled it in; the format is identical.
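In code, that receptionist logic is just a keyword argument. A minimal sketch; get_llm and the provider string come from the package, the variable names are mine:

from ai_provider import get_llm

llm_default = get_llm()                     # house default: settings.LLM_DEFAULT_PROVIDER
llm_claude = get_llm(provider="anthropic")  # pick the translator yourself
# Both come back as LangChain BaseChatModel instances;
# downstream code can't tell them apart.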
The agency also keeps a binder of pre-built lesson plans — these are the agents. Each one combines a translator (LLM), a textbook the translator has to follow (the system prompt), a set of reference materials (tools), and a grading rubric (a Pydantic schema the LLM must populate exactly). Two lesson plans are baked in today: one that finds Reddit communities for a list of topics, and one that extracts searchable keywords from a fuzzy English description.
In short, ai_provider is data_provider's shape applied to LLMs instead of scrapers: the same enum-keyed factory, the same per-provider adapters, plus a small library of pre-baked applications on top.
The architecture, in one picture
Three caller patterns sit at the top of the package. The cheapest one asks for a raw LLM (get_llm()). The middle one asks for an LLM that's been wrapped to produce a Pydantic object (get_llm_with_structured_output(Schema)). The richest one asks for a whole pre-built agent (discover_subreddits(...) and friends). All three of them eventually go through the same factory, which dispatches to one of three provider-specific builders, and all three builders return an instance of LangChain's BaseChatModel.
If you read the earlier post on the data_provider package, this is the same shape applied to a different domain — same enum-keyed factory, same per-provider builders, with pre-baked agent applications layered on top. The two packages are siblings.
Architecture
Three caller shapes, one factory, three builders, one converged interface
A few things to call out on the diagram. The interface contract isn't ours. Where data_provider defined its own ABCs (BaseRedditScraper, BaseRedditScraperAsync, etc.) inside the package, ai_provider re-exports LangChain's BaseChatModel and calls that the contract. There's a deliberate reason for this — wrapping it would break LangGraph's introspection — and we'll spend a whole section on it later. For now, just notice: the converged interface every adapter returns is something from an external library, not something we own.
The agents layer sits above the raw LLM layer, not next to it. Each agent (built in agents.py) calls get_llm() internally to produce a BaseChatModel, then wraps that model in create_react_agent(...) along with a system prompt, a list of tools, and a response_format schema. The agent IS an LLM in a control loop with extra context — not a separate provider.
The schemas live in their own file (schemas.py), not under a providers_core/dtos.py like in data_provider. They're the same architectural concept (Pydantic types that enforce the shape of what comes back), but the naming differs for a deliberate reason that comes up in the gotchas section.
The three providers
Why three? Each one earns its slot for a different reason.
Gemini is the default. Google's free tier is generous enough that you can do real work without ever pulling out a credit card, and gemini-2.0-flash (the package's default model) is fast — sub-second responses for typical structured-output prompts. The cost profile is the lowest of the three. Unless a call has a specific reason to want something else, Gemini is what runs.
Claude is the reasoning specialist. The models in use are claude-opus-4-7 for the heaviest reasoning and tool-calling work, and claude-sonnet-4-6 as the everyday workhorse — both with the careful structured-output behavior that makes Anthropic models the right pick for agent flows where the LLM has to reliably populate a complex schema. Tool-calling on Claude is particularly clean — it's much less likely to invent a tool that doesn't exist or skip the structured-output schema at the last step. Higher-tier cost, especially Opus — you reach for it when reasoning quality justifies the bill.
OpenAI is the familiar one. The models in use are gpt-4o (the proven general-purpose option) and gpt-5 (the newer one, when you want the bleeding edge) — both bringing the wide ecosystem familiarity that comes from OpenAI being the API everyone tested first. Tool-calling is mature and well-documented. When you specifically need OpenAI features (vision input, the Assistants API, certain newer model variants), this is the slot.
Providers
Three LLMs, each pulling its weight for a different reason
| Provider | Models | Strengths | Cost | Reach for it when… |
|---|---|---|---|---|
| Gemini (LLMProvider.GEMINI) | gemini-2.0-flash | fast, generous free tier | $ | the default; no special demand |
| Claude (LLMProvider.ANTHROPIC) | claude-opus-4-7, claude-sonnet-4-6 | careful structured output, clean tool-calling | $$$ | reasoning, complex tool flows, schema-heavy work |
| OpenAI (LLMProvider.OPENAI) | gpt-4o, gpt-5 | broad ecosystem, mature tool-calling | $$ | OpenAI-specific features, familiar surface |
The default is set in one place — settings.LLM_DEFAULT_PROVIDER, currently "gemini". Every call site that doesn't pass a provider argument inherits that default. To switch the whole project to Claude tomorrow, you'd change one setting. To switch one specific call site, you pass provider="anthropic" at the call. The factory enforces validation: ask for "foobar" and you get a ValueError with the list of valid providers, not a silent fall-through to the default.
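Here's a minimal sketch of that factory under the shape described above. LLMProvider and _LLM_REGISTRY appear in the package; the builder names build_anthropic_llm and build_openai_llm and the exact signatures are my guesses (build_gemini_llm shows up later in the post):

from enum import Enum

class LLMProvider(str, Enum):
    GEMINI = "gemini"
    ANTHROPIC = "anthropic"
    OPENAI = "openai"

# settings and the three build_* functions live elsewhere in the package
_LLM_REGISTRY = {
    LLMProvider.GEMINI: build_gemini_llm,
    LLMProvider.ANTHROPIC: build_anthropic_llm,
    LLMProvider.OPENAI: build_openai_llm,
}

def get_llm(provider=None, **kwargs):
    name = provider or settings.LLM_DEFAULT_PROVIDER
    try:
        key = LLMProvider(name)   # Enum lookup raises ValueError on "foobar"
    except ValueError:
        valid = [p.value for p in LLMProvider]
        raise ValueError(f"Unknown provider {name!r}; valid providers: {valid}") from None
    return _LLM_REGISTRY[key](**kwargs)  # a BaseChatModel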
The three caller shapes
There are exactly three ways callers consume the package. Each builds on the one below it. Walking through them top to bottom is the fastest way to understand the surface area.
Caller Shapes
Three public-API patterns, each composing on top of the one before it: a plain LLM (free-form text: get_llm(), then llm.invoke("What does ts_rank measure?") returns a BaseMessage), structured output (a typed Pydantic object: get_llm_with_structured_output(KeywordExtractionResult), then llm.invoke("...") returns a KeywordExtractionResult), and a pre-built agent (ReAct loop + tools + schema: discover_subreddits(["python", "async", "scraping"]) returns a SubredditDiscoveryResult).
Shape A — Plain LLM. Two lines:
from ai_provider import get_llm
llm = get_llm()
response = llm.invoke("What does ts_rank measure?")
What runs: get_llm() resolves the provider (default if not passed), looks up the provider's builder function in _LLM_REGISTRY, invokes the builder, and returns the resulting BaseChatModel. The builder reads the API key from an env var, calls LangChain's init_chat_model(...), and returns the model. The caller invokes it and LangChain handles the HTTP round-trip. The return value is a BaseMessage (free-form text in response.content).
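A sketch of one builder under those assumptions; build_gemini_llm and the init_chat_model call are named in this post, while the GOOGLE_API_KEY variable name and the default arguments are my guesses:

import os
from langchain.chat_models import init_chat_model

def build_gemini_llm(model="gemini-2.0-flash", temperature=0.0, **kwargs):
    # Fail fast with a clear message if the key is missing.
    if not os.environ.get("GOOGLE_API_KEY"):
        raise RuntimeError("GOOGLE_API_KEY is not set")
    return init_chat_model(
        f"google_genai:{model}",   # provider-prefixed model string
        temperature=temperature,
        **kwargs,
    )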
Shape B — Structured output. Same as Shape A, but the LLM must produce a Pydantic-validated object instead of free-form text:
from ai_provider import get_llm_with_structured_output, KeywordExtractionResult
llm = get_llm_with_structured_output(KeywordExtractionResult)
result = llm.invoke("I'm researching what indie founders complain about")
# result is a KeywordExtractionResult instance:
# result.keywords → list[ExtractedKeyword]
# each ExtractedKeyword has .keyword, .confidence, .reasoning
What runs: same as Shape A, plus .with_structured_output(schema) wraps the model. LangChain handles the JSON-schema / function-calling dance with the underlying provider so the model is forced to return a payload that fits the schema. On .invoke(), validation happens inside LangChain and you get a Python object back, not a BaseMessage. This is the same trust-boundary pattern the data_provider article spelled out — strict Pydantic types catch malformed output at the boundary — except here LangChain owns the round-trip.
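For concreteness, here's what KeywordExtractionResult plausibly looks like. The field names come from the comment above; the exact types and descriptions are my assumptions:

from pydantic import BaseModel, Field

class ExtractedKeyword(BaseModel):
    keyword: str = Field(description="A short, searchable keyword or phrase")
    confidence: float = Field(description="How confident the model is, 0.0-1.0")
    reasoning: str = Field(description="Why this keyword fits the input text")

class KeywordExtractionResult(BaseModel):
    keywords: list[ExtractedKeyword]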
Shape C — Pre-built agent. One line:
from ai_provider import discover_subreddits
result = discover_subreddits(["python", "async", "scraping"])
# result.subreddits is list[DiscoveredSubredditSchema] with relevance reasons
That one line wraps a whole create_react_agent(...) invocation behind the scenes. The agent gets a system prompt, a serp_tool (BrightData SERP search), and a response_format=SubredditDiscoveryResult schema. Under .invoke(), LangGraph runs a ReAct loop: the LLM emits tool calls, the SERP tool runs, results come back, the LLM either calls more tools or finalizes, and the final output is constrained to the schema. The whole loop is what makes Shape C interesting enough to deserve its own visualization later in the post.
The thing worth noticing about the three shapes: each one composes on top of the one below. Shape A is just a model. Shape B is Shape A.with_structured_output(Schema). Shape C is "Shape B + tools + a ReAct loop", wrapped behind a one-line convenience function. The implementation is a stack, not three parallel features.
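That stacking suggests get_llm_with_structured_output is little more than a two-liner (a sketch; the real function may add validation or per-provider defaults):

def get_llm_with_structured_output(schema, provider=None, **kwargs):
    # Shape B = Shape A + .with_structured_output(schema)
    return get_llm(provider=provider, **kwargs).with_structured_output(schema)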
Why we don't wrap BaseChatModel
If you walked into providers_core/base.py expecting to find a class BaseLLM(ABC) with abstract methods — the way data_provider/providers_core/base.py defines BaseRedditScraper(ABC) — you'd hit something surprising:
from langchain_core.language_models.chat_models import BaseChatModel
__all__ = ["BaseChatModel"]
That's the whole file. No abstract methods. No local ABC. Just a re-export of LangChain's BaseChatModel and a __all__ so other code can import it from our package without reaching directly into LangChain's namespace.
This looks like an asymmetry with data_provider. It is. It's deliberate.
The Asymmetry
Why ai_provider re-exports BaseChatModel instead of wrapping it
If we wrapped (the "don't" path): providers_core/base.py would define a local BaseLLM(ABC) with an abstract invoke, and GeminiAdapter(BaseLLM) would delegate invoke to the underlying chat model. But GeminiAdapter is NOT a BaseChatModel, so introspection fails: create_react_agent, .with_structured_output, and .bind_tools all expect BaseChatModel methods, and our wrapper doesn't satisfy them.

What we actually do (the "do" path): re-export, don't wrap. providers_core/base.py imports BaseChatModel from langchain_core and re-exports it, and each per-provider builder returns a BaseChatModel directly:

def build_gemini_llm(model, temperature, **kwargs):
    return init_chat_model(
        f"google_genai:{model}",
        temperature=temperature,
        **kwargs,
    )  # returns a BaseChatModel directly

Because it's a real BaseChatModel, create_react_agent, .with_structured_output, and .bind_tools all introspect it and find everything they need.
The reason is that LangGraph and the wider LangChain ecosystem don't just trust that something behaves like a chat model — they introspect BaseChatModel directly. create_react_agent(model=...), model.with_structured_output(Schema), model.bind_tools([...]) — all of these look at specific attributes and method signatures on the actual BaseChatModel class. If you wrapped it like this:
# DON'T DO THIS
from abc import ABC, abstractmethod

class BaseLLM(ABC):
    @abstractmethod
    def invoke(self, prompt: str) -> str: ...

class GeminiAdapter(BaseLLM):
    def invoke(self, prompt: str) -> str: ...
…then your GeminiAdapter instance no longer IS a BaseChatModel from LangChain's perspective. The introspection fails — create_react_agent doesn't recognize it as a valid model argument, with_structured_output doesn't exist on your class, bind_tools is missing. You could try inheriting from both (class GeminiAdapter(BaseLLM, BaseChatModel)) and you've achieved nothing — you've just paid the cost of a second ABC for zero added value.
So ai_provider makes a deliberate trade: the interface contract lives in an external library, not in the package. The other layers — enum, factory, per-provider builders, validators, Pydantic schemas — are still ours, because those are the parts where wrapping adds clear value (validation, defaults, error messages, future-flexibility). The chat-model interface itself stays where LangChain owns it. Less symmetry with data_provider, but the symmetry would have cost us LangGraph compatibility, and that cost wasn't worth paying.
The lesson generalizes: when the value of an abstraction comes from an external library's introspection, don't wrap it. Re-export and add value at the other layers instead.
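A quick way to convince yourself the trade works; the import path for the re-export is my assumption based on the file layout above:

from ai_provider import get_llm
from ai_provider.providers_core.base import BaseChatModel  # the re-export

llm = get_llm()
assert isinstance(llm, BaseChatModel)          # a real chat model, not a wrapper
assert hasattr(llm, "bind_tools")              # what create_react_agent introspects
assert hasattr(llm, "with_structured_output")  # what Shape B builds on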
Pre-built agents and the ReAct loop
The package's most interesting layer sits above the raw LLM. discover_subreddits(...) and extract_keywords(...) are thin convenience wrappers over get_reddit_discovery_agent() and get_keyword_extraction_agent() — and both agents are built with the same recipe:
def get_reddit_discovery_agent():
    model = get_llm(temperature=0.1)   # Shape A under the hood
    serp_tool = get_serp_tool()        # BrightDataSERP search
    agent = create_react_agent(
        model=model,
        tools=[serp_tool],
        prompt=SUBREDDIT_DISCOVERY_SYSTEM_PROMPT,
        response_format=SubredditDiscoveryResult,
    )
    return agent
Four arguments, four responsibilities: the model decides what to do, the tools are what it can do, the system prompt is how it's been instructed to behave, and response_format constrains the final output to a Pydantic schema. When you call agent.invoke({"messages": [...]}), LangGraph runs a ReAct loop until either the LLM produces a structured output or it gives up.
ReAct Loop
What runs inside create_react_agent: LLM, tool, repeat, then structured output. The two actors are the LLM (a BaseChatModel) and the serp_tool (BrightDataSERP); the final-output zone stays empty until the loop terminates. The LLM decides what runs next; LangGraph just runs the loop and enforces the schema at the end.
The loop, step by step:

1. LangGraph sends the system prompt plus the user's message to the LLM.
2. The LLM decides ("I should search for python subreddits first") and emits a tool call.
3. LangGraph runs the named tool with the LLM's arguments and gets results back.
4. LangGraph sends those results to the LLM as a tool message.
5. The LLM looks at what came back and either calls another tool ("let me also search for async subreddits") or decides it has enough information to answer.
6. When the LLM is ready to answer, LangGraph constrains the final output to the response_format schema: the answer doesn't come out as text, it comes out as a SubredditDiscoveryResult Python object with the schema's fields populated. The result dict has a "structured_response" key holding the validated Pydantic instance.
An agent, in this framing, is a state machine on top of an LLM, with the LLM acting as the controller. LangGraph just runs the loop and enforces the schema at the end.
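Calling the agent directly, rather than through the discover_subreddits convenience wrapper, looks roughly like this. The "structured_response" key is described above; the message wording is mine:

agent = get_reddit_discovery_agent()
result = agent.invoke({
    "messages": [("user", "Find subreddits for: python, async, scraping")],
})
discovery = result["structured_response"]  # the validated SubredditDiscoveryResult
for sub in discovery.subreddits:           # list[DiscoveredSubredditSchema]
    print(sub)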
A small implementation detail worth noticing: the agent is created fresh on every call. No singleton caching. Each discover_subreddits(...) call builds a new BaseChatModel, a new BrightDataSERP tool, and a new create_react_agent(...) instance. The cost of construction is tiny (a few attribute assignments — the expensive part is the actual LLM and tool calls), and the benefit is that if you mutate the system prompt or swap a tool, the very next call sees the new configuration without a process restart. The keyword-extraction agent (get_keyword_extraction_agent) is the simpler sibling: same create_react_agent, but with tools=[] (no SERP needed — the LLM just analyzes the user's input text). Same recipe, fewer moving parts.
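Under those constraints, the keyword-extraction sibling plausibly reduces to this. The prompt constant's name is my guess, patterned on the discovery agent's; everything else mirrors the recipe above:

def get_keyword_extraction_agent():
    model = get_llm(temperature=0.1)
    return create_react_agent(
        model=model,
        tools=[],  # no SERP tool: the LLM just analyzes the input text
        prompt=KEYWORD_EXTRACTION_SYSTEM_PROMPT,  # hypothetical constant name
        response_format=KeywordExtractionResult,
    )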
Two more gotchas
Schemas, not DTOs. ai_provider/schemas.py and data_provider/providers_core/dtos.py serve the same architectural role: Pydantic types that act as the trust boundary between external systems and our domain. But they're named differently, and that's on purpose. In the LLM world, "schema" is the term of art for the JSON-schema-like structure the model is constrained to; LangChain's API uses with_structured_output(schema=...), not with_structured_output(dto=...). Renaming the file to dtos.py would create unnecessary friction with LangChain idioms. So the same architectural idea wears two hats depending on which package you're in. Annoying but defensible: consistency with LangChain idioms beats consistency across our own packages.
Stateless agents, but the SERP tool isn't free. Each call to discover_subreddits([...]) runs one agent invocation, which the LLM can internally fan out into one to five SERP search calls (each one a paid BrightData request) before producing its structured output. A single call is fine. A for loop calling discover_subreddits on 100 different topic lists is roughly 100–500 SERP requests, which adds up quickly on the BrightData bill if you forget to think about it. The agent is stateless, but external tool calls aren't free, and tight loops over agent invocations are where surprise costs live.
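The arithmetic, spelled out; hundred_topic_lists is a stand-in for your own data, and the loop below is the anti-pattern, not a recommendation:

# each invocation: 1 agent run → 1–5 paid BrightData SERP requests
for topics in hundred_topic_lists:   # 100 iterations...
    discover_subreddits(topics)      # ...≈100–500 SERP requests on the bill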
Closing
ai_provider is data_provider's twin. Same enum-keyed factory. Same per-provider adapters. Same Pydantic-typed boundary between the messy outside world and our orderly Python domain. The differences are deliberate: the interface contract is borrowed from LangChain instead of defined locally (because wrapping would break LangGraph), and a small library of pre-baked agents sits on top of the raw LLM layer (because the agent-construction recipe is just enough boilerplate that hiding it behind a one-liner is worth the abstraction).
The trilogy of this project's *_provider layers — data_provider for scraping backends, ai_provider for LLMs, and (if you read it) the SSE post for the live-feed pipeline that consumes both — shares one principle: when a layer of the system might need to swap out one underlying technology for another, wrap that swappability behind a factory and let the callers pretend the choice doesn't exist. It earns its keep when the underlying things multiply. If you only ever call one LLM, this whole package is dead weight: a factory function calling a single builder, a registry with one entry. The abstraction pattern is worth reaching for the moment a second provider shows up; before that, it's premature.