Newsletter: Things to know before building an AI agent #14476

@ivanagas

Description

Summary

Provide answers for someone in this scenario:

I am building an AI agent into my product for the first time. I am an engineer. What are the things I need to know before starting?

First post in our AI agent series.

Headline options

  • Things to know before building an AI agent
  • Things to know before building AI agents
  • Things we wish we knew before building an AI agent
  • What we wished we knew before building AI agents

What (if any) keywords are we targeting?

AI agents?

Outline (optional)

  • Do you need an AI agent?

    • Just because everyone is doing it doesn’t mean you should - Your mom
    • “When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed.” - https://www.anthropic.com/engineering/building-effective-agents
    • An agent is overkill for many use cases. Before building an agent, consider simpler alternatives:
      | Use case | Solution |
      | --- | --- |
      | One-shot Q&A | Simple LLM call with good prompting |
      | Single task (SQL generation, code completion) | Specialized model + structured output |
      | Multi-step but predictable flow | Hardcoded workflow with LLM steps |
      | Complex, dynamic, multi-tool reasoning | Agent |
    • Early PostHog AI usage bore this out: "Currently, Max's usage is mostly one-shot, with users asking data-related questions that lead to insights generation (Max's primary feature)."
    • Build an agent when:
      • The task requires dynamic tool selection based on context
      • Multi-step reasoning where each step informs the next
      • The user's intent is ambiguous and needs clarification
      • Cross-product/cross-system correlation is needed
    • Don't build an agent when:
      • The workflow is predictable and linear
      • You can template the solution with variables
      • Latency is critical (agents are slow)
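To make the "hardcoded workflow with LLM steps" alternative concrete, here is a minimal sketch: the control flow is fixed in code, and the LLM only fills in narrow, well-scoped steps. `call_llm` is a hypothetical stand-in for a real model client.

```python
# Sketch of a hardcoded workflow with LLM steps (not an agent):
# the sequence is fixed; only individual steps call the model.

def call_llm(prompt: str) -> str:
    # Placeholder -- swap in your real LLM client call.
    return f"LLM output for: {prompt!r}"

def summarize_feedback(raw_feedback: list[str]) -> str:
    # Step 1 (deterministic): clean inputs, no LLM needed.
    cleaned = [f.strip() for f in raw_feedback if f.strip()]
    # Step 2 (LLM): one well-scoped summarization call.
    summary = call_llm("Summarize this user feedback:\n" + "\n".join(cleaned))
    # Step 3 (LLM): classify into a fixed set of themes.
    return call_llm(f"Pick one theme (bug, UX, pricing) for: {summary}")
```

Because the flow is templated, it is faster, cheaper, and easier to evaluate than a tool-selecting agent doing the same job.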
  • Where to put your agent: placement matters

    • You can’t just bolt on a chat panel with an LLM and some context and call it a day

    • Agents are like adding a new UI to your product. This means they will split your attention if you have multiple.

      • Do you have capacity to do this? Maybe you need to go all-in and be agent first.
    • "SQL editor integration was the theme... almost all the interviewees described that as the most valuable AI feature."

    • Embed where users already work. Map friction points.

      • SQL editor → SQL generation (highest value for Max)
      • Session replay → filter editing
      • Don't force users to context-switch
    • Use page context to pre-configure

      • From engineering/2025-09-21-ai-platform.md:217-219:
        "Products can also be enabled by default depending on the page the user is currently working on."
    • Although an obvious spot for agents to start is a chat, they don’t end there. Agents fragment into inline actions, background jobs, suggestions, autocomplete, and more.

    • Form factors

      | Form factor | Best for | Drawbacks |
      | --- | --- | --- |
      | Sidebar chat | Quick questions, multi-turn | Limited result display |
      | Modal/overlay | One-shot tasks | Interrupts workflow |
      | Inline suggestions | Code/query completion | Narrow scope |
      | Background + notification | Deep research, reports | User loses connection |
    • Scope of autonomy. Send emails, make purchases, modify files? Start conservative.

  • Your context is your advantage

    • "the more we constrain the domain, the better the results" - Jonathan Mieloo
    • PostHog's advantage comes from having "competences and expertise on agents that other companies don't have" because they're "building agents with production data at scale" - Em
    • "The agent has very little understanding what is the customer's product. Even simple questions with a deep and complex context behind the question aren't possible to answer without retrieving more information."
    • You need:
      • Access to your product's taxonomy/schema
      • Have a schema for your data, validation
      • User context (who they are, what they've done)
      • Domain knowledge about your product
      • Idempotent endpoints
      • Dry-run / preview mode
    • Different types of context
      • Conversation preservation → Don't lose context on reload
      • Long-term memory → Facts learned from past interactions
      • RAG → Search existing data/insights
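The context types above ultimately get assembled into a single prompt. A hedged sketch, where all three retrieval functions (`get_schema`, `get_user_facts`, `search_insights`) are hypothetical stand-ins for a real schema store, memory store, and vector search:

```python
# Sketch: combining schema, long-term memory, and RAG results into one
# prompt. The retrieval functions below are illustrative stubs.

def get_schema() -> str:
    return "events(event, timestamp, person_id), persons(id, properties)"

def get_user_facts(user_id: str) -> list[str]:
    return ["prefers weekly granularity"]  # long-term memory

def search_insights(question: str) -> list[str]:
    return ["Signup conversion insight"]  # RAG hits

def build_prompt(user_id: str, question: str) -> str:
    parts = [
        "## Schema\n" + get_schema(),
        "## Known user facts\n" + "\n".join(get_user_facts(user_id)),
        "## Related insights\n" + "\n".join(search_insights(question)),
        "## Question\n" + question,
    ]
    return "\n\n".join(parts)
```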
  • Pick the right architecture

    • Because context is so important you need to make sure your architecture supports it
      • context loss between agents, the "black box problem"
    • You’re probably not going to win because you’ve created some genius new architecture for agents. Don’t use innovation points here.
    • You don’t need to invent this all yourself.
    • Single loop rather than multi-loop: a single loop beats subagents because context is everything.
    • To-dos are super powerful
    • "The core insight is that by adding todo-list tooling (a tool_write tool to write a todo list, a todo_read tool to check the current list), the agent is better suited to keep track of its work across long-running tasks."
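A minimal sketch of that todo-list tooling, with an in-memory dict standing in for real storage purely for illustration:

```python
# Sketch: two tools the agent can call to persist its plan across a
# long-running task. Storage is an in-memory dict for illustration.

_todos: dict[str, list[dict]] = {}

def todo_write(session_id: str, items: list[str]) -> str:
    """Tool: overwrite the todo list for a session."""
    _todos[session_id] = [{"task": t, "done": False} for t in items]
    return f"Wrote {len(items)} todos."

def todo_read(session_id: str) -> list[dict]:
    """Tool: return the current todo list so the agent can re-orient."""
    return _todos.get(session_id, [])

def todo_complete(session_id: str, index: int) -> str:
    """Tool: mark one item done after the agent finishes it."""
    _todos[session_id][index]["done"] = True
    return f"Marked todo {index} done."
```

The payoff is that after many tool calls, the agent can re-read its own list instead of relying on a long, lossy conversation history.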
    • LLMs are glue
    • Modular, extensible architecture with separate modes rather than monolithic systems.
      • feat(max): agent abstraction posthog#40539 “Refactors the core AI agent into a modular graph and separate modes, making it simpler for engineers to build and extend various AI features like Deep Research or Product Analytics.”
      • Plan for specialized modes to preserve context and optimize performance within specific execution loops
      • “Tool expansion”
  • Design tools well and beware tool explosion (could combine with architecture)

    • Group tools for jobs to be done
      • "The problem with this approach is that teams have to declare one tool at a time, but what they would rather need to do is declare a batch of tools that represent how their product solves the customer's job to be done."
    • Tools should have clear input/output
      • "We need to build tools that can interconnect in the right order, e.g. session summaries will require a cohort of users / filters as an input, which can be derived from a generated insight."
      • Use scorers like HogQL completeness (1 if the HogQL compiles, else 0) and SQL completeness (1 if the SQL runs without a runtime exception, else 0)
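A deterministic completeness scorer can be a few lines. In this sketch, an in-memory SQLite database with a toy `events` table stands in for the real query engine:

```python
# Sketch of a deterministic "SQL completeness" scorer: 1 if the
# generated SQL executes without error, 0 otherwise. SQLite is an
# illustrative stand-in for the real engine.
import sqlite3

def sql_completeness(sql: str) -> int:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event TEXT, timestamp TEXT)")
    try:
        conn.execute(sql)
        return 1
    except sqlite3.Error:
        return 0
    finally:
        conn.close()
```

Deterministic checks like this are cheap to run on every generation, so they make good baseline scorers before adding LLM-as-judge evaluation.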
    • Keep tool count manageable
      • "There isn't much literature about how well language models perform with 100+ tools, but we can reasonably assume performance could degrade. Every single request includes instructions for all tools, even completely irrelevant ones."
      • Dynamic tool loading based on context/intent.
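Dynamic tool loading can be as simple as keying the tool set off the user's current page, so each request only carries relevant tool definitions. The registry contents below are hypothetical:

```python
# Sketch of dynamic tool loading: instead of sending 100+ tool
# definitions on every request, expose a subset based on context.
# Registry contents are illustrative, not PostHog's actual tools.

TOOL_REGISTRY = {
    "sql_editor": ["generate_sql", "explain_query"],
    "session_replay": ["edit_filters", "summarize_session"],
    "insights": ["create_insight", "edit_insight"],
}

ALWAYS_ON = ["search_docs"]  # tools available in every context

def tools_for_context(page: str) -> list[str]:
    return ALWAYS_ON + TOOL_REGISTRY.get(page, [])
```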
  • Observability and evaluation from day one

    • Evals and monitoring are critical to agent quality - Georgiy
    • "Max is still not great when we assess its quality. There are numerous hallucinations, incorrect agent trajectories, incorrect contexts... we cannot quickly react to internal experiments."
    • "Teams currently lack the ability to systematically evaluate their AI system quality, detect regressions, and understand performance patterns."
    • You need:
      • Tracing for every LLM call (inputs, outputs, latency, cost)
      • Trace IDs that span the full conversation
      • Ability to replay/debug specific interactions
      • Curated datasets of real user queries
      • Automated scorers (LLM-as-judge, deterministic checks)
      • Baseline metrics before you ship
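The tracing requirements above can be sketched as a thin wrapper around every LLM call. Here the sink is a plain list; in production you would ship these records to your LLM observability tool:

```python
# Sketch: record inputs, outputs, latency, and a conversation-spanning
# trace ID for every LLM call. The in-memory sink is illustrative.
import time
import uuid

TRACES: list[dict] = []

def traced_llm_call(trace_id: str, prompt: str, call) -> str:
    start = time.monotonic()
    output = call(prompt)
    TRACES.append({
        "trace_id": trace_id,  # one ID reused across the conversation
        "input": prompt,
        "output": output,
        "latency_s": time.monotonic() - start,
    })
    return output

# Usage: mint one trace_id per conversation, pass it to every call.
conversation_id = str(uuid.uuid4())
traced_llm_call(conversation_id, "hello", lambda p: p.upper())
```

Because every record carries the same `trace_id`, you can replay a full conversation when debugging a bad trajectory.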
  • Don’t lose sight of real user pain points (could combine with evaluation for section on “trust”)

    • Can you create an agent good enough to trust? Repeated failures kill trust. Overconfidence is worse than ignorance.
      • Show uncertainty, sources, steps.
      • "Show every step... A black box is uninspiring no matter how good the results are."
    • "The most common complaint was inconsistent performance and unexpected failures... 'Max seems like it suddenly got stupider'... Generic error messages without clear explanations."
    • "Users weren't sure what Max could and couldn't do, leading to failed attempts at complex queries: 'Not clear what data and tools does Max have exactly?'"
