Session indexing: FTS search, sessions-index.json fast path, richer metadata #23

@danshapiro

Summary

Improve session discovery and search by adding a persistent index layer, leveraging Claude Code's pre-computed session index, and extracting richer metadata from JSONL files.

This spec is intentionally implementation-agnostic — it describes what should change, not how to wire it into the current codebase.


1. Full-Text Search Index

Problem

The current fullText search tier runs a regex line by line over live JSONL files. This is slow for large session histories (1000+ sessions) and cannot rank results.

Requirements

  • Index session content into a persistent store with full-text search capability (e.g. SQLite FTS5, or similar)
  • Indexed fields: first_prompt, summary, user message text
  • Search should return ranked results (relevance scoring)
  • Index must stay in sync as sessions are created/updated — either via the existing file watcher or periodic rescan
  • Incremental updates: only re-index sessions whose files have changed (mtime-based skip)
  • Index should be rebuildable from scratch if corrupted or deleted
  • Store index in ~/.freshell/ alongside existing config
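As a rough illustration of the requirements above, here is a minimal sketch of an FTS-backed index with mtime-based skipping. It assumes SQLite built with FTS5 (the spec names it only as one option). The table names `files` and `sessions_fts`, the column `user_text`, and both function names are hypothetical.

```python
import sqlite3

def open_index(db_path):
    """Create or reopen the index. Deleting the file and calling this again
    gives an empty index that can be rebuilt from the JSONL files."""
    db = sqlite3.connect(db_path)
    # Hypothetical bookkeeping table: last indexed mtime per session file.
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "session_id TEXT PRIMARY KEY, mtime REAL NOT NULL)"
    )
    # Hypothetical FTS5 table covering the three indexed fields.
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS sessions_fts USING fts5("
        "session_id UNINDEXED, first_prompt, summary, user_text)"
    )
    return db

def index_session(db, session_id, mtime, first_prompt, summary, user_text):
    """Re-index one session unless its recorded mtime is unchanged.
    Returns True if the index was written, False if the session was skipped."""
    row = db.execute(
        "SELECT mtime FROM files WHERE session_id = ?", (session_id,)
    ).fetchone()
    if row is not None and row[0] == mtime:
        return False  # file unchanged since last index pass
    with db:  # one transaction: replace the FTS row and record the new mtime
        db.execute("DELETE FROM sessions_fts WHERE session_id = ?", (session_id,))
        db.execute(
            "INSERT INTO sessions_fts (session_id, first_prompt, summary, user_text) "
            "VALUES (?, ?, ?, ?)",
            (session_id, first_prompt, summary, user_text),
        )
        db.execute(
            "INSERT OR REPLACE INTO files (session_id, mtime) VALUES (?, ?)",
            (session_id, mtime),
        )
    return True
```

A file watcher or periodic rescan would call `index_session` with the file's current mtime; unchanged sessions cost one primary-key lookup.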

Behavior

  • Title-tier search continues to work as today (fast client-side filter)
  • userMessages and fullText tiers hit the index instead of scanning files
  • Search results include match snippets (the line/context that matched)
  • Partial results should be clearly indicated if indexing is still in progress
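One way the ranked, snippet-bearing results described above might look, again assuming SQLite with FTS5. The table layout, the `search` function, its return shape, and the `indexing_complete` flag are all hypothetical. `bm25()` and `snippet()` are FTS5 auxiliary functions.

```python
import sqlite3

def build_demo_index():
    """In-memory stand-in for the persistent index (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE VIRTUAL TABLE sessions_fts USING fts5(session_id UNINDEXED, user_text)"
    )
    db.executemany(
        "INSERT INTO sessions_fts (session_id, user_text) VALUES (?, ?)",
        [("s1", "please fix the parser bug"), ("s2", "write release notes")],
    )
    return db

def search(db, query, indexing_complete=True, limit=20):
    """Return ranked matches with snippets, plus a flag for partial results."""
    rows = db.execute(
        """
        SELECT session_id,
               snippet(sessions_fts, 1, '[', ']', '...', 8),
               bm25(sessions_fts) AS score
        FROM sessions_fts
        WHERE sessions_fts MATCH ?
        ORDER BY score          -- bm25 is lower for better matches
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()
    return {
        "partial": not indexing_complete,  # lets the UI flag incomplete results
        "results": [
            {"session_id": sid, "snippet": snip, "score": score}
            for sid, snip, score in rows
        ],
    }
```

Note that `MATCH` takes FTS5 query syntax, so raw user input containing quotes or operators would need escaping before it reaches this function.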

2. sessions-index.json Fast Path

Problem

On startup (and rescan), every JSONL file is opened and partially parsed to extract metadata. Claude Code maintains a sessions-index.json file that contains pre-computed session metadata, but we don't use it.

Requirements

  • On startup/rescan, check for ~/.claude/projects/*/sessions-index.json
  • If present and newer than our cached state, read session metadata from it instead of parsing individual JSONL files
  • Fall back to JSONL parsing for sessions not present in the index (e.g. other providers, or if the index is stale/missing)
  • Document the expected schema of sessions-index.json based on what Claude Code actually writes

Behavior

  • Startup time should improve significantly for users with many sessions
  • No user-visible behavior change — same session list, same metadata
  • If sessions-index.json is malformed or missing, degrade gracefully to current behavior
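The fast path with graceful fallback could be sketched as follows. The schema of sessions-index.json is not documented in this spec, so the `entries` and `sessionId` keys below are placeholders, not what Claude Code actually writes. The assumption that a session ID equals the JSONL file stem, and the function names, are also hypothetical.

```python
import json
from pathlib import Path

def read_sessions_index(index_path, cached_mtime):
    """Return {session_id: metadata} from the index file, or None when the
    fast path is unusable (missing, malformed, or not newer than our cache).
    On None the caller keeps its cached state or falls back to JSONL parsing."""
    path = Path(index_path)
    try:
        if path.stat().st_mtime <= cached_mtime:
            return None
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, bad encoding, or bad JSON
        return None
    # ASSUMED shape: {"entries": [{"sessionId": "...", ...}, ...]}
    entries = data.get("entries") if isinstance(data, dict) else None
    if not isinstance(entries, list):
        return None
    return {
        e["sessionId"]: e
        for e in entries
        if isinstance(e, dict) and isinstance(e.get("sessionId"), str)
    }

def load_metadata(jsonl_paths, index, parse_jsonl):
    """Use index entries where present; parse the JSONL file otherwise."""
    index = index or {}
    result = {}
    for p in jsonl_paths:
        session_id = Path(p).stem  # ASSUMED: file name stem is the session ID
        result[session_id] = (
            index[session_id] if session_id in index else parse_jsonl(p)
        )
    return result
```

Passing the existing parser in as `parse_jsonl` keeps the current behavior as the fallback for sessions the index does not cover.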

3. Richer Metadata Extraction

Problem

We currently extract title, summary, message count, cwd, and timestamps from JSONL files. There's more useful metadata available that would improve browsing and filtering.

Requirements

Token usage:

  • Parse usage objects from assistant messages (input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens)
  • Aggregate totals per session
  • Display in session cards and detail views
  • Useful for cost awareness and identifying expensive sessions
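Per-session aggregation might look like the sketch below. The four usage field names come from this spec. The surrounding entry shape (`"type": "assistant"` with the usage object under `message.usage`) is an assumption about the JSONL layout, and `aggregate_usage` is a hypothetical name.

```python
import json

USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)

def aggregate_usage(jsonl_lines):
    """Sum token usage across assistant messages in one session.
    Lines that are unparseable or lack a usage object are skipped."""
    totals = dict.fromkeys(USAGE_FIELDS, 0)
    for line in jsonl_lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue
        # ASSUMED entry shape: {"type": "assistant", "message": {"usage": {...}}}
        if not isinstance(entry, dict) or entry.get("type") != "assistant":
            continue
        message = entry.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        for field in USAGE_FIELDS:
            value = usage.get(field)
            if isinstance(value, int):
                totals[field] += value
    return totals
```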

Git branch:

  • Extract git branch from session metadata (typically in early system/config entries)
  • Display as a badge on session cards
  • Enable filtering/grouping by branch

Model info:

  • Extract which model(s) were used during the session
  • Display formatted model name (strip claude- prefix and date suffix, e.g. claude-opus-4-5-20251101 → opus-4-5)
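The name formatting can be expressed as a single regular expression. This is a sketch that assumes the date suffix is always eight digits, as in the example above. `format_model_name` is a hypothetical helper.

```python
import re

# "claude-" prefix, then the name, then an optional "-YYYYMMDD" suffix.
_MODEL_RE = re.compile(r"^claude-(?P<name>.+?)(?:-\d{8})?$")

def format_model_name(model_id):
    """Strip the claude- prefix and date suffix; return other IDs unchanged."""
    match = _MODEL_RE.match(model_id)
    return match.group("name") if match else model_id
```

For instance, `format_model_name("claude-opus-4-5-20251101")` returns `"opus-4-5"`, and an ID without the `claude-` prefix comes back as-is.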

Duration:

  • Compute session duration from first to last entry timestamps
  • Display as human-readable duration
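A possible duration helper, assuming entry timestamps are ISO 8601 strings (the spec does not state the format). It takes the earliest and latest timestamps instead of the literal first and last entries, so out-of-order entries do not produce a negative duration. The function names and the output format are illustrative.

```python
from datetime import datetime

def parse_ts(ts):
    """Parse an ISO 8601 timestamp; older Pythons reject a trailing 'Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def session_duration_seconds(timestamps):
    """Seconds between the earliest and latest timestamp (0.0 if empty)."""
    times = [parse_ts(t) for t in timestamps]
    if not times:
        return 0.0
    return (max(times) - min(times)).total_seconds()

def format_duration(seconds):
    """Render a duration using its two most significant units."""
    seconds = int(seconds)
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}h {minutes}m"
    if minutes:
        return f"{minutes}m {secs}s"
    return f"{secs}s"
```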

Tool usage summary:

  • Count tool invocations by type (Read, Write, Edit, Bash, Grep, etc.)
  • Useful for understanding what a session did at a glance (e.g. "heavy editing session" vs "mostly research")
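Counting invocations by tool could be done in the same parse pass. The sketch below assumes tool calls appear as content blocks of type `tool_use` carrying a `name` field inside `message.content`. That shape is an assumption about the JSONL layout, and `count_tool_uses` is a hypothetical name.

```python
import json
from collections import Counter

def count_tool_uses(jsonl_lines):
    """Count tool invocations by tool name across one session's entries."""
    counts = Counter()
    for line in jsonl_lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue
        # ASSUMED shape: {"message": {"content": [{"type": "tool_use", "name": "Read"}]}}
        message = entry.get("message") if isinstance(entry, dict) else None
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue  # plain-string content carries no tool blocks
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                name = block.get("name")
                if isinstance(name, str):
                    counts[name] += 1
    return counts
```

A card could then label the session from `counts.most_common()`, for example treating a majority of Edit and Write calls as a "heavy editing session".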

Behavior

  • All new metadata fields are optional — sessions missing them display gracefully
  • Metadata is extracted during the existing parse pass (or from the FTS index if implemented)
  • New fields available in search results and session cards
  • No new API endpoints required — extend existing session metadata shape

Non-Goals (for this spec)

  • Session transcript viewer (full conversation rendering in a pane) — separate feature, builds on top of this indexing work
  • Conversation tree/branching model — separate feature
  • Conversation minimap — separate feature
  • UI redesign of HistoryView — separate; this spec focuses on the data layer

Inspiration

claude-session-viewer uses SQLite + FTS5 with mtime-based skip optimization and a sessions-index.json fast path. Their metadata includes token usage, git branch, model name, and subagent counts. Worth referencing for schema decisions.


Open Questions

  • Should the FTS index live in SQLite (proven, claude-session-viewer uses it) or something lighter (e.g. MiniSearch in-memory with serialization)?
  • How much of user message content should be indexed? Full text vs first N characters?
  • Should token costs be estimated in dollars (requires model pricing lookup) or just shown as raw token counts?
