Python has always been the language of AI — and in 2025, it’s the undisputed home of the AI agent revolution. From LangGraph and CrewAI to the OpenAI Agents SDK, the Python ecosystem now offers a rich, mature set of tools for building autonomous AI agents that can reason, use tools, maintain memory, and collaborate with other agents to complete complex tasks.
Whether you’re a beginner writing your first agent or an experienced engineer designing production-grade multi-agent pipelines, this guide covers everything you need to know — with real, working code examples at every step.
What Is an AI Agent?

An AI agent is an autonomous program that perceives its environment, reasons about a goal, selects and uses tools, and takes action — often in a continuous loop until the task is complete. Unlike a simple chatbot that responds to a single prompt, an agent can:
- Break complex goals into multi-step plans
- Call external tools (web search, APIs, databases, code execution)
- Maintain memory across multiple conversation turns
- Delegate sub-tasks to other specialized agents
- Self-correct based on observation and feedback
The ReAct Pattern: The Brain of Every Agent
The dominant design pattern for AI agents is ReAct (Reasoning + Acting). In a ReAct loop, the agent alternates between:
- Thought — The LLM reasons about what to do next
- Action — The agent calls a tool or takes an action
- Observation — The agent receives the tool’s output and feeds it back into context
This loop continues until the agent produces a final answer or reaches a turn limit. The ReAct pattern makes agent decision-making transparent, debuggable, and highly effective for complex, multi-step tasks.
Thought: I need to find the current price of Bitcoin.
Action: web_search("Bitcoin price today")
Observation: Bitcoin is currently trading at $67,450.
Thought: I now have the answer.
Final Answer: Bitcoin is currently trading at approximately $67,450.
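The loop above can be sketched in plain Python. Everything here is a stand-in: `llm()` returns scripted replies and `web_search()` returns a canned result, so only the control flow is real:

```python
def llm(history: list[str]) -> str:
    """Stand-in for an LLM call: emits an Action first, then a Final Answer."""
    if not any("Observation:" in line for line in history):
        return 'Thought: I need the current price.\nAction: web_search("Bitcoin price today")'
    return "Thought: I now have the answer.\nFinal Answer: Bitcoin is trading at $67,450."

def web_search(query: str) -> str:
    return "Bitcoin is currently trading at $67,450."  # canned tool result

TOOLS = {"web_search": web_search}

def react_loop(task: str, max_turns: int = 5) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_turns):
        reply = llm(history)
        history.append(reply)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        if "Action:" in reply:
            # Parse `tool_name("arg")` from the Action line and run the tool
            action = reply.split("Action:", 1)[1].strip()
            name, arg = action.split("(", 1)
            result = TOOLS[name.strip()](arg.rstrip(")").strip('"'))
            history.append(f"Observation: {result}")
    return "No answer within turn limit."

print(react_loop("What is the price of Bitcoin?"))
```

Real frameworks add structured tool schemas and error handling, but every agent in this guide is ultimately driving some version of this loop.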
The Python AI Agent Ecosystem in 2025
The Python AI agent landscape has grown dramatically. Here are the key frameworks every developer should know:
1. LangChain + LangGraph
LangChain is the most widely adopted AI application framework, providing integrations with 1,000+ tools, models, and data sources. For production agents in 2025, the LangChain team recommends using LangGraph — a low-level orchestration layer that models agent logic as a graph of nodes and edges. LangGraph is trusted by companies including Klarna, Replit, and Elastic, and excels at:
- Stateful, long-running agent workflows
- Branching and conditional execution flows
- Human-in-the-loop checkpoints
- Durable execution (agents resume after failures)
- Multi-agent coordination
2. OpenAI Agents SDK
OpenAI’s official Python agent framework is a lightweight, production-ready SDK with very few abstractions. It focuses on Agents, Tools, Guardrails, and Handoffs. It’s provider-agnostic (supporting 100+ LLMs beyond just OpenAI), includes built-in tracing, sessions, and voice agent support, and is ideal for teams already on the OpenAI stack who want to build fast.
3. CrewAI
CrewAI is a standalone Python framework (no LangChain dependency) built for role-based multi-agent collaboration. With over 100,000 certified developers in its community, CrewAI uses a crew metaphor: you define agents with specific roles, goals, and backstories, then assign them tasks. It supports both Crews (autonomous collaboration) and Flows (event-driven, production workflows). CrewAI supports any LLM provider via LiteLLM integration.
4. Google ADK for Python
Google’s Agent Development Kit (ADK) for Python is a code-first toolkit with tight integration to Gemini models, Vertex AI, and Google Cloud. It features the Agent2Agent (A2A) protocol for secure inter-agent communication, a built-in dev UI, and out-of-the-box support for 30+ databases via MCP Toolbox.
5. Other Notable Frameworks
- AutoGen / AG2 — Microsoft’s event-driven, async multi-agent framework for conversation-based coordination
- Agno — Extreme performance (microsecond agent instantiation, 50× less memory than LangGraph), great for high-throughput scenarios
- Pydantic AI — Brings Pydantic’s type safety and validation to agent development; ideal for enterprise apps requiring strict output validation
- Smolagents — HuggingFace’s lightweight, code-first agent framework for multimodal tasks
Setting Up Your Python Agent Environment
Before writing your first agent, let’s set up a clean Python environment. All major frameworks require Python 3.10 or higher.
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
# Install the core frameworks
pip install openai-agents langchain langgraph crewai
# Set your API keys
export OPENAI_API_KEY=your_openai_key_here
export ANTHROPIC_API_KEY=your_anthropic_key_here # optional
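As a quick sanity check before running any agent code, a small stdlib script can verify the interpreter version and the API keys you expect (the required-keys list here is an assumption — adjust it to the providers you actually use):

```python
import os
import sys

def check_environment(environ=os.environ, required=("OPENAI_API_KEY",)):
    """Return a list of problems; an empty list means the environment looks ready."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    for key in required:
        if not environ.get(key):
            problems.append(f"Missing environment variable: {key}")
    return problems

if __name__ == "__main__":
    issues = check_environment()
    print("Environment OK" if not issues else "\n".join(issues))
```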
Building Your First Agent with OpenAI Agents SDK
The OpenAI Agents SDK is the fastest way to build a working agent. Here’s a minimal hello-world agent:
from agents import Agent, Runner

# Define your agent
agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant that answers questions clearly and concisely."
)

# Run it synchronously
result = Runner.run_sync(agent, "What are the top 3 benefits of using Python for AI?")
print(result.final_output)
Adding Tools to Your Agent
Tools are what transform a simple chatbot into a true agent. The @function_tool decorator wraps any Python function into a tool the agent can discover and call autonomously:
import asyncio
from agents import Agent, Runner, function_tool

@function_tool
def get_weather(city: str) -> str:
    """Returns the current weather for a given city."""
    # In production, call a real weather API here
    weather_data = {
        "London": "15°C, partly cloudy",
        "New York": "22°C, sunny",
        "Tokyo": "28°C, humid",
    }
    return weather_data.get(city, f"Weather data unavailable for {city}")

@function_tool
def calculate(expression: str) -> str:
    """Evaluates a mathematical expression and returns the result."""
    # Caution: eval is risky even with builtins stripped; prefer a proper
    # expression parser or math library for production use.
    try:
        result = eval(expression, {"__builtins__": {}})
        return f"Result: {result}"
    except Exception as e:
        return f"Error: {str(e)}"

# Agent with multiple tools
weather_agent = Agent(
    name="WeatherBot",
    instructions="You are a helpful assistant. Use the get_weather tool to answer weather questions and calculate for math.",
    tools=[get_weather, calculate],
)

async def main():
    result = await Runner.run(
        weather_agent,
        "What's the weather in London? Also, what is 847 * 23?"
    )
    print(result.final_output)

asyncio.run(main())
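Under the hood, the decorator reads the function's signature and docstring to produce a schema the model can call against. A rough stdlib approximation of that idea (not the SDK's actual implementation):

```python
import inspect

PYTHON_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def build_tool_schema(func) -> dict:
    """Derive a JSON-schema-like tool description from a plain function."""
    sig = inspect.signature(func)
    properties = {}
    for name, param in sig.parameters.items():
        json_type = PYTHON_TO_JSON.get(param.annotation, "string")
        properties[name] = {"type": json_type}
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": [n for n, p in sig.parameters.items()
                         if p.default is inspect.Parameter.empty],
        },
    }

def get_weather(city: str) -> str:
    """Returns the current weather for a given city."""
    return f"Weather for {city}"

schema = build_tool_schema(get_weather)
print(schema["name"], schema["parameters"]["properties"])
```

This is why clear type hints and docstrings matter so much for tools: they are the only documentation the model ever sees.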
Multi-Turn Conversations
For conversational agents that need to remember context across multiple turns, carry the conversation history forward: convert the previous result to an input list and append the new user message. (The SDK also offers built-in session storage that automates this bookkeeping.)
import asyncio
from agents import Agent, Runner

agent = Agent(
    name="Tutor",
    instructions="You are a patient Python tutor. Remember what topics we've covered."
)

async def chat():
    # First turn
    result = await Runner.run(agent, "Explain what a list comprehension is.")
    print("Agent:", result.final_output)

    # Second turn - prepend the previous conversation to the new message
    new_input = result.to_input_list() + [
        {"role": "user", "content": "Can you give me a harder example?"}
    ]
    result = await Runner.run(agent, new_input)
    print("Agent:", result.final_output)

asyncio.run(chat())
Building a Stateful Agent with LangGraph
For production agents that need explicit state management, branching logic, and persistent memory, LangGraph is the recommended approach. LangGraph models your agent as a graph where nodes perform actions and edges control flow.
Core LangGraph Concepts
- State — A TypedDict that flows through the graph, accumulating messages, tool outputs, and context
- Nodes — Individual functions that receive state and return updates (call an LLM, execute a tool, etc.)
- Edges — Connections between nodes; can be fixed or conditional based on state
- Checkpointer — Persists state for memory across sessions and enables human-in-the-loop pauses
from typing import TypedDict, Annotated
from langchain.chat_models import init_chat_model
from langchain_core.tools import tool
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode
from langgraph.checkpoint.memory import MemorySaver

# --- 1. Define State ---
class AgentState(TypedDict):
    messages: Annotated[list, add_messages]

# --- 2. Define Tools ---
@tool
def search_web(query: str) -> str:
    """Search the web for current information on a topic."""
    # Replace with a real search API like Tavily or SerpAPI
    return f"Search results for '{query}': Python AI agents are becoming widely adopted in 2025."

@tool
def get_stock_price(ticker: str) -> str:
    """Get the current stock price for a ticker symbol."""
    # Replace with a real stock API
    prices = {"AAPL": "$189.50", "GOOGL": "$175.20", "MSFT": "$415.80"}
    return prices.get(ticker.upper(), f"Price unavailable for {ticker}")

tools = [search_web, get_stock_price]

# --- 3. Set Up LLM ---
llm = init_chat_model("openai:gpt-4o-mini")
llm_with_tools = llm.bind_tools(tools)

# --- 4. Define Nodes ---
def call_model(state: AgentState):
    """Node: calls the LLM with current messages."""
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def should_continue(state: AgentState):
    """Edge: routes to tools if tool calls exist, otherwise ends."""
    last_message = state["messages"][-1]
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return END

# --- 5. Build the Graph ---
graph = StateGraph(AgentState)
graph.add_node("agent", call_model)
graph.add_node("tools", ToolNode(tools))
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", should_continue)
graph.add_edge("tools", "agent")  # After tools run, go back to agent

# --- 6. Add Persistent Memory ---
checkpointer = MemorySaver()
app = graph.compile(checkpointer=checkpointer)

# --- 7. Run the Agent ---
config = {"configurable": {"thread_id": "session-001"}}
result = app.invoke(
    {"messages": [{"role": "user", "content": "What's the stock price of Apple?"}]},
    config=config
)
print(result["messages"][-1].content)

# Second turn — agent remembers previous context
result = app.invoke(
    {"messages": [{"role": "user", "content": "And what about Microsoft?"}]},
    config=config
)
print(result["messages"][-1].content)
Because we’re using MemorySaver with a thread_id, the agent maintains full conversation history across both turns. For production use, swap MemorySaver for SqliteSaver or PostgresSaver to persist state across restarts.
Building a Multi-Agent System with CrewAI
CrewAI shines when you need multiple specialized agents collaborating on a complex task. Think of it like assembling a team: a researcher, a writer, and an editor each doing what they do best.
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool

# Initialize web search tool
search_tool = SerperDevTool()

# --- Define Agents ---
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find and synthesize the most relevant and up-to-date information on {topic}",
    backstory="""You are an expert researcher with 10 years of experience. You are known
    for your ability to find accurate, credible sources and distill complex information
    into clear, actionable insights.""",
    tools=[search_tool],
    verbose=True,
    llm="openai/gpt-4o-mini"
)

writer = Agent(
    role="Content Writer",
    goal="Transform research findings into a compelling, reader-friendly article on {topic}",
    backstory="""You are a skilled technical writer who specializes in AI and technology.
    You excel at turning complex research into engaging content that educates and informs.""",
    verbose=True,
    llm="openai/gpt-4o"  # Use a more powerful model for writing
)

# --- Define Tasks ---
research_task = Task(
    description="""Conduct comprehensive research on {topic}.
    Find the latest developments, key players, tools, and trends.
    Provide a structured summary with sources.""",
    expected_output="A detailed research report with key findings, trends, and at least 5 credible sources.",
    agent=researcher
)

writing_task = Task(
    description="""Using the research provided, write a 1000-word article about {topic}.
    The article should be engaging, well-structured, and accessible to a technical audience.
    Include an introduction, key sections, and a conclusion.""",
    expected_output="A well-written, 1000-word article ready for publication.",
    agent=writer,
    context=[research_task]  # Writer gets researcher's output as context
)

# --- Assemble and Run the Crew ---
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,  # Tasks run one after another
    verbose=True
)

result = crew.kickoff(inputs={"topic": "AI agents in enterprise software 2025"})
print(result.raw)
Running Tasks in Parallel
When tasks are independent of each other, CrewAI can run them concurrently. Note that the Process enum offers only sequential and hierarchical modes (there is no Process.parallel); concurrency comes from marking individual tasks with async_execution=True:
from crewai import Crew, Process, Task

# Independent research tasks marked for asynchronous execution
task_1 = Task(
    description="Research subtopic A of {topic}.",
    expected_output="A summary of subtopic A.",
    agent=researcher_1,
    async_execution=True,  # Runs concurrently with other async tasks
)
# task_2 and task_3 are defined the same way...

crew = Crew(
    agents=[researcher_1, researcher_2, researcher_3],
    tasks=[task_1, task_2, task_3],
    process=Process.sequential,
    verbose=True
)

result = crew.kickoff(inputs={"topic": "Python AI frameworks"})
print(result.raw)
Memory and RAG for AI Agents
Memory is what makes agents useful over time. There are four main types of agent memory in Python frameworks:
- In-context memory — The current conversation history held in the LLM’s context window. Fast but limited by the context size.
- Short-term (buffer) memory — A sliding window of recent messages. Good for most conversational agents.
- Long-term (vector) memory — Past interactions embedded and stored in a vector database. Retrieved semantically when needed.
- RAG (Retrieval-Augmented Generation) — Connects the agent to a knowledge base of documents, PDFs, or database records.
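Short-term buffer memory is simple enough to sketch in plain Python — a deque with a maxlen gives you the sliding window:

```python
from collections import deque

class BufferMemory:
    """Short-term memory: keep only the most recent `max_messages` turns."""

    def __init__(self, max_messages: int = 6):
        self.messages = deque(maxlen=max_messages)  # old turns fall off the front

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

    def context(self) -> list[dict]:
        """Return the window to prepend to the next LLM call."""
        return list(self.messages)

memory = BufferMemory(max_messages=4)
for i in range(6):
    memory.add("user", f"message {i}")
print(len(memory.context()), memory.context()[0]["content"])
```

Production frameworks typically count tokens rather than messages, but the principle is the same: bound what enters the context window.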
Adding Long-Term Vector Memory with LangGraph
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_core.tools import create_retriever_tool

# Build a simple knowledge base
docs = [
    Document(page_content="Python was created by Guido van Rossum in 1991."),
    Document(page_content="LangGraph is an agent orchestration framework by LangChain."),
    Document(page_content="CrewAI is a multi-agent framework that uses a crew metaphor."),
]

# Create a vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(docs, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# Wrap the retriever as a tool the agent can call
knowledge_tool = create_retriever_tool(
    retriever,
    name="search_knowledge_base",
    description="Search the internal knowledge base for information about Python and AI frameworks."
)

# Add to your LangGraph agent's tool list
tools = [knowledge_tool, search_web]
Agent Handoffs: Building Multi-Agent Pipelines
One of the most powerful patterns in 2025 is agent handoffs — where one agent transfers control to another specialist. The OpenAI Agents SDK makes this trivial:
from agents import Agent, Runner

# Specialized agents
billing_agent = Agent(
    name="Billing Specialist",
    instructions="You handle billing inquiries, refunds, and payment issues. Be concise and helpful.",
)

technical_agent = Agent(
    name="Technical Support",
    instructions="You handle technical issues, bugs, and feature questions. Provide clear troubleshooting steps.",
)

# Triage agent that routes to the right specialist
triage_agent = Agent(
    name="Support Triage",
    instructions="""You are a customer support triage agent. Analyze the customer's issue and
    hand off to the appropriate specialist:
    - Billing Specialist: for payment, invoice, subscription issues
    - Technical Support: for bugs, errors, feature questions""",
    handoffs=[billing_agent, technical_agent],  # Registered specialists
)

# The triage agent will automatically route to the right specialist
result = Runner.run_sync(
    triage_agent,
    "I was charged twice for my subscription this month!"
)
print(result.final_output)
# Output will come from the Billing Specialist agent
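Conceptually, a handoff is just the model choosing a "transfer to X" tool. A deterministic keyword router makes the routing decision visible — a real triage agent lets the LLM decide, and the keyword lists below are purely illustrative:

```python
SPECIALISTS = {
    "billing": ["charge", "refund", "invoice", "payment", "subscription"],
    "technical": ["bug", "error", "crash", "feature", "install"],
}

def route(message: str) -> str:
    """Pick a specialist by keyword match; fall back to the triage agent."""
    text = message.lower()
    for specialist, keywords in SPECIALISTS.items():
        if any(word in text for word in keywords):
            return specialist
    return "triage"  # no match: stay with the triage agent

print(route("I was charged twice for my subscription this month!"))
```

A keyword router like this can also serve as a cheap first pass in front of an LLM triage agent, saving a model call for unambiguous cases.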
Adding Guardrails and Safety Checks
Production agents need guardrails — validation layers that check inputs and outputs before they reach users. The OpenAI Agents SDK provides a clean guardrails API:
from agents import Agent, Runner, GuardrailFunctionOutput, input_guardrail
from pydantic import BaseModel

class SafetyCheck(BaseModel):
    is_safe: bool
    reason: str

@input_guardrail
async def content_safety_guardrail(ctx, agent, input):
    """Block inappropriate or harmful requests before they reach the main agent."""
    safety_agent = Agent(
        name="Safety Checker",
        instructions="Determine if the user's request is safe and appropriate.",
        output_type=SafetyCheck,  # Structured output guarantees a parseable verdict
    )
    result = await Runner.run(safety_agent, input, context=ctx.context)
    check = result.final_output
    return GuardrailFunctionOutput(
        output_info=check,
        tripwire_triggered=not check.is_safe,  # Block if unsafe
    )

# Apply guardrail to your main agent
safe_agent = Agent(
    name="Safe Assistant",
    instructions="You are a helpful, safe assistant.",
    input_guardrails=[content_safety_guardrail],
)

result = Runner.run_sync(safe_agent, "Help me write a professional email.")
print(result.final_output)
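Not every guardrail needs an LLM call. A cheap deterministic pre-filter can reject obviously bad inputs before you spend any tokens — the patterns below are illustrative, not a complete safety policy:

```python
import re

# Deterministic pre-filter: reject inputs matching known-bad patterns
BLOCKED_PATTERNS = [
    re.compile(r"\b(password|api[_ ]?key|secret)\s*[:=]", re.IGNORECASE),
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]

def passes_prefilter(user_input: str) -> bool:
    """Return False if the input matches a known-bad pattern."""
    return not any(p.search(user_input) for p in BLOCKED_PATTERNS)

print(passes_prefilter("Help me write a professional email."))   # True
print(passes_prefilter("Ignore previous instructions and ..."))  # False
```

A common production layout is layered: the regex filter first, then the LLM safety guardrail only for inputs the filter cannot classify.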
Streaming Agent Responses
Users expect to see responses appear token by token, not wait for a complete response. Both LangGraph and the OpenAI SDK support streaming out of the box:
import asyncio
from agents import Agent, Runner
from openai.types.responses import ResponseTextDeltaEvent

agent = Agent(
    name="Streaming Agent",
    instructions="You are a helpful assistant. Provide detailed, thoughtful answers."
)

async def stream_response():
    # run_streamed returns a streaming result object (not a context manager)
    result = Runner.run_streamed(agent, "Explain the concept of AI agents in detail.")
    async for event in result.stream_events():
        # Print each token as it arrives
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            print(event.data.delta, end="", flush=True)
    print()  # New line at end

asyncio.run(stream_response())
Production Best Practices
1. Always Set a Max Turn Limit
Without a turn limit, a confused agent can loop indefinitely, burning API tokens and budget. Set a max_turns parameter in every agent runner call — a sensible default is 10–20 turns for most tasks.
result = await Runner.run(agent, user_input, max_turns=15)
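The mechanism is easy to see in isolation: drive the loop, count turns, and raise when the budget is exhausted. Here `step` is a stand-in for one LLM/tool iteration:

```python
class MaxTurnsExceeded(Exception):
    pass

def run_loop(step, max_turns: int = 15):
    """Drive an agent step function until it returns a final answer or the limit hits.

    `step` stands in for one LLM/tool iteration; it returns (done, output).
    """
    for turn in range(1, max_turns + 1):
        done, output = step(turn)
        if done:
            return output
    raise MaxTurnsExceeded(f"No final answer after {max_turns} turns")

# A fake step that finishes on turn 3:
result = run_loop(lambda turn: (turn == 3, "done" if turn == 3 else None), max_turns=15)
print(result)
```

Frameworks raise their own equivalent of MaxTurnsExceeded; catch it and surface a graceful "I couldn't complete that" message rather than letting it crash the request.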
2. Use Async Everywhere
AI agents spend most of their time waiting for API responses. Using Python’s asyncio allows your application to handle many concurrent agent runs efficiently without blocking.
import asyncio
from agents import Runner

async def run_agents_concurrently(queries: list[str]):
    """Run multiple agent queries in parallel."""
    tasks = [Runner.run(agent, query) for query in queries]
    results = await asyncio.gather(*tasks)
    return [r.final_output for r in results]

# Process 10 queries concurrently instead of sequentially
answers = asyncio.run(run_agents_concurrently(user_queries))
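One caveat: an unbounded gather can trip provider rate limits. An asyncio.Semaphore caps how many runs are in flight at once — here `fake_agent_run` stands in for a real Runner.run call:

```python
import asyncio

async def fake_agent_run(query: str) -> str:
    """Stand-in for an agent run: simulates network latency, returns a string."""
    await asyncio.sleep(0.01)
    return f"answer to: {query}"

async def run_bounded(queries: list[str], max_concurrent: int = 3) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(q: str) -> str:
        async with semaphore:  # at most max_concurrent runs at a time
            return await fake_agent_run(q)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(guarded(q) for q in queries))

answers = asyncio.run(run_bounded([f"q{i}" for i in range(10)]))
print(len(answers), answers[0])
```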
3. Observability with LangSmith
Every tool call, LLM request, and agent handoff should be traceable. LangSmith provides deep visibility into your agent's behavior, making it easy to debug issues and optimize performance. Enable it with a few environment variables:
export LANGSMITH_API_KEY=your_langsmith_key
export LANGSMITH_TRACING=true
export LANGSMITH_PROJECT=my-agent-project
# All LangChain/LangGraph calls are now automatically traced
4. Use Structured Outputs for Reliability
Define your agent’s expected output as a Pydantic model to get guaranteed structured responses — no more parsing inconsistent LLM text outputs:
from pydantic import BaseModel
from agents import Agent, Runner

class ProductReview(BaseModel):
    sentiment: str         # "positive", "negative", "neutral"
    score: int             # 1-10
    summary: str           # Brief summary
    key_points: list[str]  # Main points from the review

analysis_agent = Agent(
    name="Review Analyzer",
    instructions="Analyze product reviews and extract structured insights.",
    output_type=ProductReview,  # Agent MUST return this structure
)

result = Runner.run_sync(
    analysis_agent,
    "I absolutely love this laptop! The battery lasts all day and the keyboard is fantastic. The only downside is it runs a bit warm."
)

review: ProductReview = result.final_output
print(f"Sentiment: {review.sentiment}")  # e.g. positive
print(f"Score: {review.score}/10")
print(f"Summary: {review.summary}")
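For contrast, here is roughly what output_type spares you from: hand-parsing and validating the model's JSON. A simplified sketch using a dataclass (hypothetical field set, trimmed for brevity):

```python
import json
from dataclasses import dataclass, fields

@dataclass
class ParsedReview:
    sentiment: str
    score: int
    summary: str

def parse_review(raw: str) -> ParsedReview:
    """Parse and validate a JSON blob the model returned as plain text."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    expected = {f.name for f in fields(ParsedReview)}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    return ParsedReview(**{k: data[k] for k in expected})

review = parse_review('{"sentiment": "positive", "score": 8, "summary": "Great laptop"}')
print(review.sentiment, review.score)
```

Every failure mode handled here by hand (malformed JSON, missing fields, wrong shapes) is exactly what structured outputs eliminate.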
5. Error Handling and Retry Logic
LLM APIs can fail. Always wrap agent calls in proper error handling with exponential backoff for transient failures:
import asyncio
from agents import Agent, Runner
from openai import RateLimitError, APIError

async def run_with_retry(agent, query, max_retries=3):
    """Run an agent with automatic retry on transient failures."""
    for attempt in range(max_retries):
        try:
            result = await Runner.run(agent, query, max_turns=10)
            return result.final_output
        except RateLimitError:
            wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
            print(f"Rate limit hit. Retrying in {wait_time}s...")
            await asyncio.sleep(wait_time)
        except APIError as e:
            if attempt == max_retries - 1:
                raise
            print(f"API error: {e}. Retrying...")
            await asyncio.sleep(1)
    raise Exception(f"Agent failed after {max_retries} attempts")
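The same backoff idea generalizes to a reusable decorator, with random jitter so many clients don't retry in lockstep. Delays are shortened here purely for illustration, and `flaky` simulates a transient failure:

```python
import random
import time
from functools import wraps

def retry(max_retries: int = 3, base_delay: float = 1.0,
          exceptions: tuple = (Exception,)):
    """Retry a function on the given exceptions with exponential backoff plus jitter."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries - 1:
                        raise  # out of retries: let the caller handle it
                    # exponential backoff plus random jitter
                    time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
        return wrapper
    return decorator

calls = {"n": 0}

@retry(max_retries=3, base_delay=0.01)
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(flaky())  # succeeds on the third attempt
```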
Choosing the Right Framework
With so many great options, here’s a quick guide to picking the right Python agent framework for your use case:
- OpenAI Agents SDK — Best for rapid prototyping and production on the OpenAI stack. Minimal abstraction, clean API, excellent tracing. Use when you want speed and simplicity.
- LangGraph — Best for complex, stateful agents requiring branching logic, human-in-the-loop, and durable execution. The recommended choice for production agents with nuanced workflows.
- CrewAI — Best for multi-agent collaboration with distinct roles. Intuitive for business use cases: research + write + review pipelines, customer support routing, data analysis teams.
- Google ADK — Best when your stack is Google Cloud. Tight Gemini/Vertex AI integration and the A2A protocol make it powerful for enterprise Google deployments.
- Pydantic AI — Best when type safety and output validation are paramount. Ideal for enterprise applications that cannot tolerate unpredictable LLM outputs.
- Agno — Best for high-throughput applications that need to run thousands of agents simultaneously with minimal overhead.
Conclusion
2025 is the breakout year for Python AI agents. The ecosystem has matured from experimental LangChain demos to production-grade, enterprise-ready frameworks trusted by companies at scale. Whether you’re building a simple customer support bot with tool use or a sophisticated multi-agent research pipeline with memory, RAG, and human oversight, Python gives you everything you need.
Start simple: build a single agent with one or two tools using the OpenAI Agents SDK. Once you understand the ReAct loop and tool calling, level up to LangGraph for state management or CrewAI for multi-agent collaboration. Add memory and observability from day one, and you’ll be production-ready before you know it.
The agent era is here — and Python is leading the charge.
