Description
Summary
Provide answers for someone in this scenario:
I am building an AI agent into my product for the first time. I am an engineer. What are the things I need to know before starting?
First post in our AI agent series.
Headline options
- Things to know before building an AI agent
- Things to know before building AI agents
- Things we wish we knew before building an AI agent
- What we wish we'd known before building AI agents
What (if any) keywords are we targeting?
AI agents?
Outline (optional)
- Do you need an AI agent?
- Just because everyone is doing it doesn’t mean you should - Your mom
- “When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed.” - https://www.anthropic.com/engineering/building-effective-agents
- An agent is overkill for many use cases. Before building an agent, consider simpler alternatives (a minimal sketch follows this section):

  | Use case | Solution |
  | --- | --- |
  | One-shot Q&A | Simple LLM call with good prompting |
  | Single task (SQL generation, code completion) | Specialized model + structured output |
  | Multi-step but predictable flow | Hardcoded workflow with LLM steps |
  | Complex, dynamic, multi-tool reasoning | Agent |

- PostHog AI early on was mostly one-shot: "Currently, Max's usage is mostly one-shot, with users asking data-related questions that lead to insights generation (Max's primary feature)."
- Build an agent when:
- The task requires dynamic tool selection based on context
- Multi-step reasoning where each step informs the next
- The user's intent is ambiguous and needs clarification
- Cross-product/cross-system correlation is needed
- Don't build an agent when:
- The workflow is predictable and linear
- You can template the solution with variables
- Latency is critical (agents are slow)
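Picking up the "hardcoded workflow with LLM steps" row from the table above, a minimal sketch of the simpler alternative: the code fixes the step order and only delegates text generation to the model. `call_llm` is a hypothetical helper standing in for your provider's API.

```python
# A predictable multi-step flow as a hardcoded workflow with LLM steps,
# not an agent: the code decides the order, the model only fills in text.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical helper; wire up your provider here

def summarize_feedback(raw_feedback: str) -> str:
    # Step 1: classify. Step 2: summarize. The sequence never changes,
    # so there is nothing for an agent to decide.
    category = call_llm(f"Classify this feedback as bug/feature/other:\n{raw_feedback}")
    return call_llm(f"Summarize this {category} feedback in one sentence:\n{raw_feedback}")
```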
- Where to put your agent: placement matters
- You can't just bolt on a chat panel with an LLM and some context and call it a day
- Agents are like adding a new UI to your product. This means they will split your attention if you have multiple.
- Do you have capacity to do this? Maybe you need to go all-in and be agent-first.
- "SQL editor integration was the theme... almost all the interviewees described that as the most valuable AI feature."
- Embed where users already work. Map friction points.
- SQL editor → SQL generation (highest value for Max)
- Session replay → filter editing
- Don't force users to context-switch
- Use page context to pre-configure
- From engineering/2025-09-21-ai-platform.md:217-219:
"Products can also be enabled by default depending on the page the user is currently working on."
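A sketch of what page-based pre-configuration could look like; the routes and tool names below are illustrative, not PostHog's actual ones.

```python
# Enable tools by default based on the page the user is working on.
# Route and tool names are made up for illustration.

PAGE_TOOLS = {
    "/sql": ["generate_sql", "explain_query"],
    "/replay": ["edit_filters", "summarize_session"],
}

def tools_for_page(route: str, base_tools: list[str]) -> list[str]:
    # Base tools are always available; page-specific ones are added on match.
    return base_tools + PAGE_TOOLS.get(route, [])
```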
- Although an obvious spot for agents to start is a chat, they don't end there. Agents fragment into inline actions, background jobs, suggestions, autocomplete, and more.
- Form factors

  | Form factor | Best for | Drawbacks |
  | --- | --- | --- |
  | Sidebar chat | Quick questions, multi-turn | Limited result display |
  | Modal/overlay | One-shot tasks | Interrupts workflow |
  | Inline suggestions | Code/query completion | Narrow scope |
  | Background + notification | Deep research, reports | User loses connection |

- Scope of autonomy. Send emails, make purchases, modify files? Start conservative.
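One way to start conservative: a hard allowlist plus an approval gate for side-effectful tools. A sketch assuming a simple tool-call dict; the tool names and the `run_tool` dispatcher are hypothetical.

```python
# Conservative autonomy: read-only tools run freely, side-effectful
# tools return a pending state until the user explicitly approves.

SAFE_TOOLS = {"search_docs", "query_insights"}       # read-only, illustrative
NEEDS_APPROVAL = {"send_email", "delete_dashboard"}  # side effects, illustrative

def run_tool(tool_call: dict):
    raise NotImplementedError  # dispatch to your real tool implementations

def execute(tool_call: dict, approved: bool = False):
    name = tool_call["name"]
    if name in NEEDS_APPROVAL and not approved:
        return {"status": "pending", "reason": f"{name} requires user approval"}
    if name in SAFE_TOOLS or approved:
        return run_tool(tool_call)
    return {"status": "rejected", "reason": f"{name} is not on the allowlist"}
```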
- Your context is your advantage
- "the more we constrain the domain, the better the results" - Jonathan Mieloo
- PostHog's advantage comes from having "competences and expertise on agents that other companies don't have" because they're "building agents with production data at scale" - Em
- "The agent has very little understanding what is the customer's product. Even simple questions with a deep and complex context behind the question aren't possible to answer without retrieving more information."
- You need:
- Access to your product's taxonomy/schema
- A schema for your data, with validation
- User context (who they are, what they've done)
- Domain knowledge about your product
- Idempotent endpoints
- Dry-run / preview mode
- Different types of context
- Conversation preservation → Don't lose context on reload
- Long-term memory → Facts learned from past interactions
- RAG → Search existing data/insights
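A sketch of how the three context types could come together before each turn; all three loaders are hypothetical stand-ins for your own stores.

```python
# Assemble the three kinds of context before every agent turn.
# The loaders are stubs for your own storage and retrieval layers.

def load_conversation(conversation_id: str) -> list[dict]:
    ...  # persisted messages, so a reload doesn't lose context

def load_memories(user_id: str) -> list[str]:
    ...  # long-term facts learned from past interactions

def rag_search(query: str, k: int = 5) -> list[str]:
    ...  # search existing data/insights relevant to the query

def build_context(conversation_id: str, user_id: str, query: str) -> dict:
    return {
        "history": load_conversation(conversation_id),
        "memories": load_memories(user_id),
        "retrieved": rag_search(query),
    }
```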
- Pick the right architecture
- Because context is so important you need to make sure your architecture supports it
- Beware context loss between agents, the "black box problem"
- You’re probably not going to win because you’ve created some genius new architecture for agents. Don’t use innovation points here.
- You don’t need to invent this all yourself.
- Single loop rather than multi loop. A single loop beats subagents; context is everything. (Minimal loop sketch at the end of this section.)
- To-dos are super powerful
- "The core insight is that by adding todo-list tooling (a tool_write tool to write a todo list, a todo_read tool to check the current list), the agent is better suited to keep track of its work across long-running tasks."
- LLMs are glue
- Modular, extensible architecture with separate modes rather than monolithic systems.
- feat(max): agent abstraction posthog#40539 “Refactors the core AI agent into a modular graph and separate modes, making it simpler for engineers to build and extend various AI features like Deep Research or Product Analytics.”
- Plan for specialized modes to preserve context and optimize performance within specific execution loops
- “Tool expansion”
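The single-loop shape fits in a few lines. This is a sketch, not PostHog's implementation; `llm_step` and `run_tool` are hypothetical stand-ins for your provider's tool-calling API and your own dispatcher.

```python
# One agent, one context, one loop: tools run until the model answers
# directly or the step budget runs out. No subagents, no context handoff.

def llm_step(messages: list[dict], tools: list[dict]) -> dict:
    raise NotImplementedError  # hypothetical: one tool-calling model request

def run_tool(call: dict) -> str:
    raise NotImplementedError  # hypothetical: dispatch to a real tool

def agent_loop(messages: list[dict], tools: list[dict], max_steps: int = 20) -> str:
    for _ in range(max_steps):
        reply = llm_step(messages, tools)
        messages.append(reply)
        if not reply.get("tool_calls"):   # model answered directly
            return reply["content"]
        for call in reply["tool_calls"]:  # results go back into the same context
            messages.append({"role": "tool", "content": run_tool(call)})
    return "Stopped: step budget exhausted."
```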
- Design tools well and beware tool explosion (could combine with architecture)
- Group tools for jobs to be done
- "The problem with this approach is that teams have to declare one tool at a time, but what they would rather need to do is declare a batch of tools that represent how their product solves the customer's job to be done."
- Tools should have clear input/output (see the sketch after this list)
- "We need to build tools that can interconnect in the right order, e.g. session summaries will require a cohort of users / filters as an input, which can be derived from a generated insight."
- Use scorers like HogQL completeness (0 or 1: does the HogQL compile?) and SQL completeness (0 or 1: does the SQL run without a runtime exception?)
- Keep tool count manageable
- "There isn't much literature about how well language models perform with 100+ tools, but we can reasonably assume performance could degrade. Every single request includes instructions for all tools, even completely irrelevant ones."
- Dynamic tool loading based on context/intent.
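A sketch of what "clear input/output" can look like in practice: declared schemas let one tool's output feed the next, per the session-summaries example above. Tool names and schemas are illustrative, not a real PostHog API.

```python
# Declared input/output schemas let tools interconnect in the right
# order: generate_insight produces the filters summarize_sessions needs.

GENERATE_INSIGHT = {
    "name": "generate_insight",
    "input_schema": {
        "type": "object",
        "properties": {"question": {"type": "string"}},
        "required": ["question"],
    },
    "output_schema": {  # a filter set that downstream tools can consume
        "type": "object",
        "properties": {"filters": {"type": "object"}},
    },
}

SUMMARIZE_SESSIONS = {
    "name": "summarize_sessions",
    "input_schema": {  # accepts exactly what generate_insight emits
        "type": "object",
        "properties": {"filters": {"type": "object"}},
        "required": ["filters"],
    },
}
```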
- Observability and evaluation from day one
- Evals and monitoring are critical to agent quality - Georgiy
- "Max is still not great when we assess its quality. There are numerous hallucinations, incorrect agent trajectories, incorrect contexts... we cannot quickly react to internal experiments."
- "Teams currently lack the ability to systematically evaluate their AI system quality, detect regressions, and understand performance patterns."
- You need:
- Tracing for every LLM call (inputs, outputs, latency, cost)
- Trace IDs that span the full conversation
- Ability to replay/debug specific interactions
- Curated datasets of real user queries
- Automated scorers (LLM-as-judge, deterministic checks; see the sketch below)
- Baseline metrics before you ship
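The deterministic scorers can be very small. A sketch of a SQL-completeness-style check, using sqlite3 purely as a stand-in for the real query engine:

```python
import sqlite3

def sql_completeness(sql: str) -> int:
    """1 if the generated SQL executes without error, else 0.
    sqlite3 is a stand-in here; score against your real engine."""
    try:
        sqlite3.connect(":memory:").execute(sql)
        return 1
    except sqlite3.Error:
        return 0

# sql_completeness("SELECT 1")    -> 1
# sql_completeness("SELEC oops")  -> 0
```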
- Don't lose sight of real user pain points (could combine with evaluation for a section on "trust")
- Can you create an agent good enough to trust? Repeated failures kill trust. Overconfidence is worse than ignorance.
- Show uncertainty, sources, steps.
- "Show every step... A black box is uninspiring no matter how good the results are."
- "The most common complaint was inconsistent performance and unexpected failures... 'Max seems like it suddenly got stupider'... Generic error messages without clear explanations."
- "Users weren't sure what Max could and couldn't do, leading to failed attempts at complex queries: 'Not clear what data and tools does Max have exactly?'"