Description
Summary
Provide answers for someone in this scenario:
I am building an AI agent into my product for the first time. I am an engineer. What are the things I need to know before starting?
First post in our AI agent series.
Headline options
- Things to know before building an AI agent
- Things to know before building AI agents
- Things we wish we knew before building an AI agent
- What we wish we'd known before building AI agents
What (if any) keywords are we targeting?
AI agents?
Outline (optional)
- Do you need an AI agent?
- Just because everyone is doing it doesn’t mean you should - Your mom
- “When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed.” - https://www.anthropic.com/engineering/building-effective-agents
- An agent is overkill for many use cases. Before building an agent, consider simpler alternatives (a minimal sketch follows this section):

  | Use case | Solution |
  | --- | --- |
  | One-shot Q&A | Simple LLM call with good prompting |
  | Single task (SQL generation, code completion) | Specialized model + structured output |
  | Multi-step but predictable flow | Hardcoded workflow with LLM steps |
  | Complex, dynamic, multi-tool reasoning | Agent |

- PostHog AI early on was mostly one-shot: "Currently, Max's usage is mostly one-shot, with users asking data-related questions that lead to insights generation (Max's primary feature)."
- Build an agent when:
- The task requires dynamic tool selection based on context
- Multi-step reasoning where each step informs the next
- The user's intent is ambiguous and needs clarification
- Cross-product/cross-system correlation is needed
- Don't build an agent when:
- The workflow is predictable and linear
- You can template the solution with variables
- Latency is critical (agents are slow)
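Picking up the "hardcoded workflow with LLM steps" row from the table above, a minimal sketch of the simpler alternative: the code fixes the step order and only delegates text generation to the model. `call_llm` is a hypothetical helper standing in for your provider's API.

```python
# A predictable multi-step flow as a hardcoded workflow with LLM steps,
# not an agent: the code decides the order, the model only fills in text.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical helper; wire up your provider here

def summarize_feedback(raw_feedback: str) -> str:
    # Step 1: classify. Step 2: summarize. The sequence never changes,
    # so there is nothing for an agent to decide.
    category = call_llm(f"Classify this feedback as bug/feature/other:\n{raw_feedback}")
    return call_llm(f"Summarize this {category} feedback in one sentence:\n{raw_feedback}")
```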
- Where to put your agent: placement matters
- You can't just bolt on a chat panel with an LLM and some context and call it a day
- Agents are like adding a new UI to your product. This means they will split your attention if you have multiple.
- Do you have capacity to do this? Maybe you need to go all-in and be agent-first.
- "SQL editor integration was the theme... almost all the interviewees described that as the most valuable AI feature."
- Embed where users already work. Map friction points.
- SQL editor → SQL generation (highest value for Max)
- Session replay → filter editing
- Don't force users to context-switch
- Use page context to pre-configure
- From engineering/2025-09-21-ai-platform.md:217-219:
"Products can also be enabled by default depending on the page the user is currently working on."
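A sketch of what page-based pre-configuration could look like; the routes and tool names below are illustrative, not PostHog's actual ones.

```python
# Enable tools by default based on the page the user is working on.
# Route and tool names are made up for illustration.

PAGE_TOOLS = {
    "/sql": ["generate_sql", "explain_query"],
    "/replay": ["edit_filters", "summarize_session"],
}

def tools_for_page(route: str, base_tools: list[str]) -> list[str]:
    # Base tools are always available; page-specific ones are added on match.
    return base_tools + PAGE_TOOLS.get(route, [])
```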
- Although an obvious spot for agents to start is a chat, they don't end there. Agents fragment into inline actions, background jobs, suggestions, autocomplete, and more.
- Form factors

  | Form factor | Best for | Drawbacks |
  | --- | --- | --- |
  | Sidebar chat | Quick questions, multi-turn | Limited result display |
  | Modal/overlay | One-shot tasks | Interrupts workflow |
  | Inline suggestions | Code/query completion | Narrow scope |
  | Background + notification | Deep research, reports | User loses connection |

- Scope of autonomy. Send emails, make purchases, modify files? Start conservative.
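One way to start conservative: a hard allowlist plus an approval gate for side-effectful tools. A sketch assuming a simple tool-call dict; the tool names and the `run_tool` dispatcher are hypothetical.

```python
# Conservative autonomy: read-only tools run freely, side-effectful
# tools return a pending state until the user explicitly approves.

SAFE_TOOLS = {"search_docs", "query_insights"}       # read-only, illustrative
NEEDS_APPROVAL = {"send_email", "delete_dashboard"}  # side effects, illustrative

def run_tool(tool_call: dict):
    raise NotImplementedError  # dispatch to your real tool implementations

def execute(tool_call: dict, approved: bool = False):
    name = tool_call["name"]
    if name in NEEDS_APPROVAL and not approved:
        return {"status": "pending", "reason": f"{name} requires user approval"}
    if name in SAFE_TOOLS or approved:
        return run_tool(tool_call)
    return {"status": "rejected", "reason": f"{name} is not on the allowlist"}
```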
- Your context is your advantage
- "the more we constrain the domain, the better the results" - Jonathan Mieloo
- PostHog's advantage comes from having "competences and expertise on agents that other companies don't have" because they're "building agents with production data at scale" - Em
- "The agent has very little understanding what is the customer's product. Even simple questions with a deep and complex context behind the question aren't possible to answer without retrieving more information."
- You need:
- Access to your product's taxonomy/schema
- A schema for your data, with validation
- User context (who they are, what they've done)
- Domain knowledge about your product
- Idempotent endpoints
- Dry-run / preview mode
- Different types of context
- Conversation preservation → Don't lose context on reload
- Long-term memory → Facts learned from past interactions
- RAG → Search existing data/insights
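A sketch of how the three context types could come together before each turn; all three loaders are hypothetical stand-ins for your own stores.

```python
# Assemble the three kinds of context before every agent turn.
# The loaders are stubs for your own storage and retrieval layers.

def load_conversation(conversation_id: str) -> list[dict]:
    ...  # persisted messages, so a reload doesn't lose context

def load_memories(user_id: str) -> list[str]:
    ...  # long-term facts learned from past interactions

def rag_search(query: str, k: int = 5) -> list[str]:
    ...  # search existing data/insights relevant to the query

def build_context(conversation_id: str, user_id: str, query: str) -> dict:
    return {
        "history": load_conversation(conversation_id),
        "memories": load_memories(user_id),
        "retrieved": rag_search(query),
    }
```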
- Pick the right architecture
- Because context is so important you need to make sure your architecture supports it
- Beware context loss between agents, the "black box problem"
- You’re probably not going to win because you’ve created some genius new architecture for agents. Don’t use innovation points here.
- You don’t need to invent this all yourself.
- Single loop rather than multi loop. A single loop beats subagents; context is everything. (Minimal loop sketch at the end of this section.)
- To-dos are super powerful
- "The core insight is that by adding todo-list tooling (a tool_write tool to write a todo list, a todo_read tool to check the current list), the agent is better suited to keep track of its work across long-running tasks."
- LLMs are glue
- Modular, extensible architecture with separate modes rather than monolithic systems.
- feat(max): agent abstraction posthog#40539 “Refactors the core AI agent into a modular graph and separate modes, making it simpler for engineers to build and extend various AI features like Deep Research or Product Analytics.”
- Plan for specialized modes to preserve context and optimize performance within specific execution loops
- “Tool expansion”
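The single-loop shape fits in a few lines. This is a sketch, not PostHog's implementation; `llm_step` and `run_tool` are hypothetical stand-ins for your provider's tool-calling API and your own dispatcher.

```python
# One agent, one context, one loop: tools run until the model answers
# directly or the step budget runs out. No subagents, no context handoff.

def llm_step(messages: list[dict], tools: list[dict]) -> dict:
    raise NotImplementedError  # hypothetical: one tool-calling model request

def run_tool(call: dict) -> str:
    raise NotImplementedError  # hypothetical: dispatch to a real tool

def agent_loop(messages: list[dict], tools: list[dict], max_steps: int = 20) -> str:
    for _ in range(max_steps):
        reply = llm_step(messages, tools)
        messages.append(reply)
        if not reply.get("tool_calls"):   # model answered directly
            return reply["content"]
        for call in reply["tool_calls"]:  # results go back into the same context
            messages.append({"role": "tool", "content": run_tool(call)})
    return "Stopped: step budget exhausted."
```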
- Design tools well and beware tool explosion (could combine with architecture)
- Group tools for jobs to be done
- "The problem with this approach is that teams have to declare one tool at a time, but what they would rather need to do is declare a batch of tools that represent how their product solves the customer's job to be done."
- Tools should have clear input/output (see the sketch after this list)
- "We need to build tools that can interconnect in the right order, e.g. session summaries will require a cohort of users / filters as an input, which can be derived from a generated insight."
- Use scorers like HogQL completeness (0 or 1: does the HogQL compile?) and SQL completeness (0 or 1: does the SQL run without a runtime exception?)
- Keep tool count manageable
- "There isn't much literature about how well language models perform with 100+ tools, but we can reasonably assume performance could degrade. Every single request includes instructions for all tools, even completely irrelevant ones."
- Dynamic tool loading based on context/intent.
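A sketch of what "clear input/output" can look like in practice: declared schemas let one tool's output feed the next, per the session-summaries example above. Tool names and schemas are illustrative, not a real PostHog API.

```python
# Declared input/output schemas let tools interconnect in the right
# order: generate_insight produces the filters summarize_sessions needs.

GENERATE_INSIGHT = {
    "name": "generate_insight",
    "input_schema": {
        "type": "object",
        "properties": {"question": {"type": "string"}},
        "required": ["question"],
    },
    "output_schema": {  # a filter set that downstream tools can consume
        "type": "object",
        "properties": {"filters": {"type": "object"}},
    },
}

SUMMARIZE_SESSIONS = {
    "name": "summarize_sessions",
    "input_schema": {  # accepts exactly what generate_insight emits
        "type": "object",
        "properties": {"filters": {"type": "object"}},
        "required": ["filters"],
    },
}
```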
- Observability and evaluation from day one
- Evals and monitoring are critical to agent quality - Georgiy
- "Max is still not great when we assess its quality. There are numerous hallucinations, incorrect agent trajectories, incorrect contexts... we cannot quickly react to internal experiments."
- "Teams currently lack the ability to systematically evaluate their AI system quality, detect regressions, and understand performance patterns."
- You need:
- Tracing for every LLM call (inputs, outputs, latency, cost)
- Trace IDs that span the full conversation
- Ability to replay/debug specific interactions
- Curated datasets of real user queries
- Automated scorers (LLM-as-judge, deterministic checks; see the sketch below)
- Baseline metrics before you ship
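The deterministic scorers can be very small. A sketch of a SQL-completeness-style check, using sqlite3 purely as a stand-in for the real query engine:

```python
import sqlite3

def sql_completeness(sql: str) -> int:
    """1 if the generated SQL executes without error, else 0.
    sqlite3 is a stand-in here; score against your real engine."""
    try:
        sqlite3.connect(":memory:").execute(sql)
        return 1
    except sqlite3.Error:
        return 0

# sql_completeness("SELECT 1")    -> 1
# sql_completeness("SELEC oops")  -> 0
```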
- Don't lose sight of real user pain points (could combine with evaluation for a section on "trust")
- Can you create an agent good enough to trust? Repeated failures kill trust. Overconfidence is worse than ignorance.
- Show uncertainty, sources, steps.
- "Show every step... A black box is uninspiring no matter how good the results are."
- "The most common complaint was inconsistent performance and unexpected failures... 'Max seems like it suddenly got stupider'... Generic error messages without clear explanations."
- "Users weren't sure what Max could and couldn't do, leading to failed attempts at complex queries: 'Not clear what data and tools does Max have exactly?'"