How to Build Type-Safe LLM Agents with Pydantic AI: A Step-by-Step Guide

Introduction

Large language models (LLMs) are powerful, but their free-form text responses can be unreliable in production. By combining Pydantic AI with type-safe structured outputs, validation retries, and dependency injection, you can build agents that are both robust and maintainable. This guide walks you through the entire process, from setting up your environment to deploying with production trade-offs in mind. You’ll learn how Pydantic AI enforces data contracts, how retries improve accuracy, how tools and function calling extend agent capabilities, and how RunContext handles external dependencies.

What You Need

  • Python 3.9+ installed on your machine
  • An LLM API key (e.g., OpenAI, Anthropic, or Google)
  • Pydantic AI library (pip install pydantic-ai)
  • Pydantic (included with pydantic-ai)
  • Basic familiarity with Python and async concepts
  • A code editor or IDE

Step-by-Step Instructions

Step 1: Set Up Your Environment

Create a new Python virtual environment and install the required packages:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install pydantic-ai

Store your API key as an environment variable (e.g., OPENAI_API_KEY) to keep it secure.
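It can help to verify the key is visible to Python before any agent code runs, so a missing key fails fast with a clear message. A minimal sketch (the `require_env` helper is illustrative, not part of Pydantic AI):

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the agent.")
    return value
```

Calling `require_env("OPENAI_API_KEY")` at startup surfaces configuration problems immediately instead of deep inside an agent call.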

Step 2: Define Pydantic Models for Structured Output

Type safety starts with defining the exact data shape you expect from the LLM. Create a Pydantic model that describes the fields and types you need.

from pydantic import BaseModel

class WeatherReport(BaseModel):
    city: str
    temperature: float
    unit: str  # e.g., "Celsius" or "Fahrenheit"
    conditions: str

This model ensures the LLM’s response will be parsed into a validated Python object. If the output doesn’t match the schema, Pydantic AI will raise a validation error.
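You can see the contract in action without calling a model: Pydantic coerces well-formed data into typed objects and rejects anything that doesn't fit the schema. A quick local check, reusing the model above:

```python
from pydantic import BaseModel, ValidationError

class WeatherReport(BaseModel):
    city: str
    temperature: float
    unit: str
    conditions: str

# Well-formed data parses into a typed object; the string "21" is coerced to 21.0.
report = WeatherReport.model_validate(
    {"city": "Paris", "temperature": "21", "unit": "Celsius", "conditions": "cloudy"}
)
assert report.temperature == 21.0

# A malformed payload (wrong types, missing fields) raises ValidationError.
try:
    WeatherReport.model_validate({"city": "Paris", "temperature": "warm"})
except ValidationError as exc:
    print(f"{exc.error_count()} validation errors")
```

This same validation is what the agent applies to the LLM's raw output on every run.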

Step 3: Configure the LLM Agent

Initialize a Pydantic AI agent with your chosen LLM and attach your output schema via the result_type parameter.

from pydantic_ai import Agent

agent = Agent(
    'openai:gpt-4',  # or any supported model
    result_type=WeatherReport,
    system_prompt="You are a helpful weather assistant."
)

Now every call to the agent will aim to return a WeatherReport instance. The agent automatically attempts to parse the LLM's response into your model.

Step 4: Implement Validation Retries

LLMs sometimes produce malformed or incomplete outputs. Pydantic AI lets you set a maximum number of retries when validation fails. This improves reliability without manual intervention.

agent = Agent(
    'openai:gpt-4',
    result_type=WeatherReport,
    retries=3,  # try up to 3 times
)

During a retry, the agent sends the previous error message back to the LLM, asking it to fix the output. This works well for simple format issues but increases latency and cost—so choose a number that balances speed and accuracy.
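Conceptually, the retry mechanism is a loop: validate, and on failure ask again with the error attached. The sketch below simulates that loop in plain Python (this illustrates the idea, not Pydantic AI's internals):

```python
from pydantic import BaseModel, ValidationError

class WeatherReport(BaseModel):
    city: str
    temperature: float
    unit: str
    conditions: str

def run_with_retries(ask_model, retries: int = 3) -> WeatherReport:
    """Ask the model, feeding validation errors back until the output parses."""
    feedback = None
    for _ in range(retries):
        raw = ask_model(feedback)            # the model sees the previous error, if any
        try:
            return WeatherReport.model_validate(raw)
        except ValidationError as exc:
            feedback = str(exc)              # next attempt includes the error text
    raise RuntimeError(f"Validation still failing after {retries} attempts")

# Simulated model: returns a bad payload first, then a corrected one.
responses = iter([
    {"city": "Paris", "temperature": "warm"},                 # fails validation
    {"city": "Paris", "temperature": 22, "unit": "Celsius",
     "conditions": "sunny"},                                  # passes
])
report = run_with_retries(lambda feedback: next(responses))
```

Each failed attempt costs another round trip, which is why the retry count should stay small.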

Step 5: Add Tools and Function Calling

Tools (or functions) allow the agent to interact with external APIs, databases, or perform computations. Define a tool using a regular Python function and register it with the agent.

from pydantic_ai import Tool

def get_weather_api(city: str) -> dict:
    # Replace with real API call
    return {"temperature": 22, "unit": "Celsius", "conditions": "sunny"}

tool_weather = Tool(
    name="get_weather",
    description="Fetch current weather for a city",
    function=get_weather_api
)
agent = Agent(
    'openai:gpt-4',
    result_type=WeatherReport,
    tools=[tool_weather],
    retries=2
)

The agent will now decide when to call the tool, parse the output, and incorporate it into the final structured result. This pattern is essential for agents that need live data or side effects.
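Under the hood, tool calling follows a simple dispatch pattern: the model names a tool and its arguments, the runtime executes it, and the result flows back into the conversation. A plain-Python simulation of that dispatch step (the hard-coded model "request" is illustrative, not Pydantic AI's internals):

```python
def get_weather_api(city: str) -> dict:
    # Stub tool; a real implementation would call a weather service.
    return {"temperature": 22, "unit": "Celsius", "conditions": "sunny"}

# Registry mapping tool names to callables, as the agent maintains internally.
TOOLS = {"get_weather": get_weather_api}

# A fake "model turn": the model asks for a tool by name, with arguments.
model_request = {"tool": "get_weather", "args": {"city": "Paris"}}

# The runtime dispatches the call; the result is fed back to the model.
tool_result = TOOLS[model_request["tool"]](**model_request["args"])
```

Pydantic AI handles this loop for you, including validating tool arguments against the function's type hints.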

Step 6: Use Dependency Injection with RunContext

For production applications, you often need to pass in external dependencies (like a database session, user session, or configuration). Pydantic AI provides RunContext to inject such objects into your agent’s tools and prompts.

from pydantic_ai import RunContext
from dataclasses import dataclass

@dataclass
class MyDeps:
    api_key: str
    user_id: str

async def get_weather_api(ctx: RunContext[MyDeps], city: str) -> dict:
    # fetch_weather is your own async HTTP helper; ctx.deps carries the key
    return await fetch_weather(city, api_key=ctx.deps.api_key)

agent = Agent(
    'openai:gpt-4',
    result_type=WeatherReport,
    deps_type=MyDeps,
    tools=[Tool(get_weather_api)]
)

When running the agent, provide the dependencies:

deps = MyDeps(api_key="secret123", user_id="user42")
result = await agent.run("What is the weather in Paris?", deps=deps)

This keeps your code clean and testable: you can swap dependencies in tests or different environments.
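The testability benefit is visible without any LLM in the loop: because the tool only touches the outside world through its context object, a unit test can hand it a stub. A minimal sketch (FakeContext and the stub values are illustrative, not part of Pydantic AI):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class MyDeps:
    api_key: str
    user_id: str

@dataclass
class FakeContext:
    """Stand-in for RunContext[MyDeps] in tests: only the .deps attribute is used."""
    deps: MyDeps

async def get_weather_api(ctx, city: str) -> dict:
    # The fake key proves the tool never needs a real network call in tests.
    return {"city": city, "auth": ctx.deps.api_key, "temperature": 22}

ctx = FakeContext(deps=MyDeps(api_key="test-key", user_id="user42"))
result = asyncio.run(get_weather_api(ctx, "Paris"))
```

Swapping `FakeContext` for the real `RunContext` at runtime is exactly the decoupling dependency injection buys you.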

Step 7: Handle Production Trade‑Offs

Running type-safe LLM agents at scale introduces several trade-offs to consider:

  • Cost and latency: Retries and tool calls multiply token usage. Set retry limits carefully and consider caching results for repeated queries.
  • Rate limits: LLM APIs throttle requests. Implement exponential backoff or queueing to handle high concurrency.
  • Error recovery: When retries are exhausted, fall back to a default or ask the user for clarification instead of failing silently.
  • Monitoring: Log validation failures and tool call metrics to detect drift or broken dependencies early.
  • Type safety overhead: Complex nested models increase parsing time. Keep schemas as simple as possible for high‑throughput scenarios.

Balancing these factors is key to a reliable, cost‑effective agent.
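As a concrete example of the rate-limit point above, a throttled call can be wrapped in exponential backoff with jitter using only the standard library (the delays and the exception type standing in for a rate-limit error are illustrative):

```python
import random
import time

def with_backoff(call, max_attempts: int = 5, base_delay: float = 0.5):
    """Retry a throttled call, doubling the delay each attempt plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError:                  # stand-in for a rate-limit error
            if attempt == max_attempts - 1:
                raise                         # out of attempts: propagate
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Simulated API that is throttled twice before succeeding.
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

result = with_backoff(flaky_call, base_delay=0.01)
```

The jitter term spreads retries from concurrent clients so they don't hammer the API in lockstep.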

Tips for Success

  • Start simple: Begin with a single output model and no retries. Gradually add complexity as you validate each part.
  • Test with diverse prompts: Try edge cases (empty or very long input) to see how well your validation retries handle them.
  • Use RunContext for everything external: It makes your agent testable and decouples logic from infrastructure.
  • Monitor your retry rate: A high frequency of validation failures may indicate your model is too strict or the LLM prompt needs improvement.
  • Consider streaming: For real‑time applications, use Pydantic AI’s streaming support to show partial progress while maintaining type safety.
