| 1 | +# @aipexstudio/dom-snapshot |
| 2 | + |
| 3 | +A lightweight library for capturing DOM snapshots without relying on the Chrome DevTools Protocol (CDP) Accessibility Tree (AXTree). It provides a pure JavaScript/TypeScript way to build structured page snapshots for web automation, testing, and AI-powered browser agents. |
| 4 | + |
| 5 | +## Why Not CDP AXTree? |
| 6 | + |
| 7 | +Traditional approaches to capturing page structure often rely on CDP's Accessibility Tree, which has several limitations: |
| 8 | + |
| 9 | +- **Browser dependency**: Requires Chrome/Chromium with DevTools Protocol |
| 10 | +- **Performance overhead**: CDP communication adds latency |
| 11 | +- **Complex setup**: Needs browser debugging port configuration |
| 12 | +- **Limited portability**: Doesn't work in all browser contexts |
| 13 | + |
| 14 | +This library takes a different approach: it traverses the DOM directly and builds a semantic snapshot that mimics the structure of an accessibility tree, and it works in any browser environment with just JavaScript. |
| 15 | + |
| 16 | +## Features |
| 17 | + |
| 18 | +- **Pure DOM-based**: No CDP or browser extensions required |
| 19 | +- **Accessibility-aware**: Captures semantic roles, names, and states following ARIA patterns |
| 20 | +- **Interactive element focus**: Prioritizes buttons, links, inputs, and other actionable elements |
| 21 | +- **Hidden element filtering**: By default, skips elements hidden via `aria-hidden="true"`, the `hidden` attribute, the `inert` attribute, `display: none`, or `visibility: hidden` |
| 22 | +- **Stable node IDs**: Assigns persistent `data-aipex-nodeid` attributes for reliable element targeting |
| 23 | +- **Text content extraction**: Captures static text nodes for full page context |
| 24 | +- **Configurable options**: Control text length limits, hidden element inclusion, and text node capture |
| 25 | +- **Search functionality**: Built-in glob pattern search across snapshot text |
| 26 | + |
| 27 | +## Installation |
| 28 | + |
| 29 | +```bash |
| 30 | +npm install @aipexstudio/dom-snapshot |
| 31 | +# or |
| 32 | +pnpm add @aipexstudio/dom-snapshot |
| 33 | +``` |
| 34 | + |
| 35 | +## Usage |
| 36 | + |
| 37 | +### Basic Snapshot Collection |
| 38 | + |
| 39 | +```typescript |
| 40 | +import { collectDomSnapshot, collectDomSnapshotInPage } from '@aipexstudio/dom-snapshot'; |
| 41 | + |
| 42 | +// Collect snapshot from current page |
| 43 | +const snapshot = collectDomSnapshotInPage(); |
| 44 | + |
| 45 | +// Or specify a custom document |
| 46 | +const customSnapshot = collectDomSnapshot(document, { |
| 47 | + maxTextLength: 160, // Max characters for element text (default: 160, does not affect StaticText) |
| 48 | + includeHidden: false, // Include hidden elements (default: false) |
| 49 | + captureTextNodes: true, // Capture StaticText nodes (default: true) |
| 50 | +}); |
| 51 | + |
| 52 | +console.log(snapshot.totalNodes); // Total nodes captured |
| 53 | +console.log(snapshot.root); // Root node of the tree |
| 54 | +console.log(snapshot.idToNode); // Flat map of id -> node |
| 55 | +console.log(snapshot.metadata.url); // Page URL |
| 56 | +``` |
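
As a rough sketch of how the returned tree might be consumed, the snippet below walks a node tree depth-first and collects nodes with interactive roles. `SnapshotNodeLike` is a trimmed, local stand-in for the node shape documented under Node Structure; `collectInteractive` and the `INTERACTIVE_ROLES` list are hypothetical helpers for illustration and are not part of the package.

```typescript
// Local stand-in for the documented node shape (assumption: only these fields are needed here).
interface SnapshotNodeLike {
  id: string;
  role: string;
  name?: string;
  children: SnapshotNodeLike[];
}

// Hypothetical list of roles treated as "interactive" for this example.
const INTERACTIVE_ROLES = ["button", "link", "textbox", "checkbox", "radio", "slider", "combobox"];

// Depth-first, pre-order walk: a node is visited before its children,
// so results come back in document order.
function collectInteractive(
  node: SnapshotNodeLike,
  found: SnapshotNodeLike[] = []
): SnapshotNodeLike[] {
  if (INTERACTIVE_ROLES.indexOf(node.role) !== -1) found.push(node);
  for (const child of node.children) collectInteractive(child, found);
  return found;
}

// Hand-built tree standing in for `snapshot.root`.
const sampleRoot: SnapshotNodeLike = {
  id: "dom_root",
  role: "RootWebArea",
  name: "My Page",
  children: [
    { id: "dom_a", role: "button", name: "Submit", children: [] },
    {
      id: "dom_group",
      role: "group",
      children: [
        { id: "dom_b", role: "StaticText", name: "Welcome to our site", children: [] },
        { id: "dom_c", role: "link", name: "Learn More", children: [] },
      ],
    },
  ],
};

const interactiveIds = collectInteractive(sampleRoot).map((n) => n.id);
console.log(interactiveIds);
```

With a real snapshot, the same walk could presumably start from `snapshot.root`, provided its nodes carry the `role` and `children` fields described under Node Structure.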
| 57 | + |
| 58 | +### Converting to Text Format |
| 59 | + |
| 60 | +```typescript |
| 61 | +import { collectDomSnapshot, DomSnapshotManager } from '@aipexstudio/dom-snapshot'; |
| 62 | + |
| 63 | +const manager = new DomSnapshotManager(); |
| 64 | + |
| 65 | +// Collect raw snapshot |
| 66 | +const serialized = collectDomSnapshot(document); |
| 67 | + |
| 68 | +// Convert to TextSnapshot format |
| 69 | +const textSnapshot = manager.buildTextSnapshot(serialized, { tabId: 1 }); |
| 70 | + |
| 71 | +// Format as readable text representation |
| 72 | +const formatted = manager.formatSnapshot(textSnapshot); |
| 73 | +console.log(formatted); |
| 74 | +``` |
| 75 | + |
| 76 | +Output example: |
| 77 | +``` |
| 78 | +→uid=dom_abc123 RootWebArea "My Page" <body> |
| 79 | + uid=dom_def456 button "Submit" <button> |
| 80 | + uid=dom_ghi789 textbox "Email" <input> desc="Enter your email" |
| 81 | + StaticText "Welcome to our site" |
| 82 | + *uid=dom_jkl012 link "Learn More" <a> |
| 83 | +``` |
| 84 | + |
| 85 | +Markers: |
| 86 | +- `*` - Currently focused element |
| 87 | +- `→` - Ancestor of focused element |
| 88 | +- ` ` (space) - Regular element |
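
The line format and markers above can be read back with a small parser. The following is a minimal sketch whose pattern is inferred only from the example output; the exact grammar of the formatted text is an assumption, and `parseSnapshotLine` and `ParsedLine` are hypothetical names, not exports of the package.

```typescript
interface ParsedLine {
  focused: boolean;       // line carries the `*` marker
  focusAncestor: boolean; // line carries the `→` marker
  uid?: string;           // absent for lines such as StaticText
  role: string;
  name?: string;
}

// Assumed line shape, inferred from the example output only:
//   [indent][marker][uid=<id> ]<role>[ "<name>"][ <tag>][ extra...]
const LINE_PATTERN = /^\s*([*→])?\s*(?:uid=(\S+)\s+)?(\S+)(?:\s+"([^"]*)")?/;

function parseSnapshotLine(line: string): ParsedLine | null {
  const match = LINE_PATTERN.exec(line);
  if (!match) return null; // empty or whitespace-only line
  return {
    focused: match[1] === "*",
    focusAncestor: match[1] === "→",
    uid: match[2],
    role: match[3] ?? "",
    name: match[4],
  };
}

const focusedLink = parseSnapshotLine('  *uid=dom_jkl012 link "Learn More" <a>');
const staticText = parseSnapshotLine(' StaticText "Welcome to our site"');
console.log(focusedLink, staticText);
```

This sketch ignores indentation depth and does not handle names containing escaped quotes; both would need to be checked against the real formatter output.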
| 89 | + |
| 90 | +### Searching Snapshots |
| 91 | + |
| 92 | +```typescript |
| 93 | +import { searchSnapshotText } from '@aipexstudio/dom-snapshot'; |
| 94 | + |
| 95 | +const formatted = manager.formatSnapshot(textSnapshot); // `manager` and `textSnapshot` come from the previous example |
| 96 | + |
| 97 | +// Simple text search |
| 98 | +const simpleResult = searchSnapshotText(formatted, 'Submit'); |
| 99 | + |
| 100 | +// Multiple terms with | separator |
| 101 | +const multiTermResult = searchSnapshotText(formatted, '登录 | Login | Sign In'); |
| 102 | + |
| 103 | +// Glob pattern search |
| 104 | +const result = searchSnapshotText(formatted, 'button* | *submit*', { |
| 105 | + useGlob: true, |
| 106 | + contextLevels: 2, // Lines of context around matches |
| 107 | + caseSensitive: false, |
| 108 | +}); |
| 109 | + |
| 110 | +console.log(result.matchedLines); // Line numbers of matches |
| 111 | +console.log(result.contextLines); // All lines to display (with context) |
| 112 | +console.log(result.totalMatches); // Total match count |
| 113 | +``` |
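
To make the query syntax concrete, here is a minimal sketch of one way a `|`-separated query with `*` wildcards could be matched line by line. It is not the library's implementation: `globToRegExp` and `findMatchingLines` are hypothetical helpers, and the choices to match anywhere within a line and to report 1-based line numbers are assumptions.

```typescript
// Escape regex metacharacters (everything except `*`), then turn `*` into `.*`.
function globToRegExp(term: string, caseSensitive: boolean): RegExp {
  const escaped = term.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(escaped.replace(/\*/g, ".*"), caseSensitive ? "" : "i");
}

// Split the query on `|`, trim each term, and return the 1-based numbers
// of the lines that match at least one term.
function findMatchingLines(text: string, query: string, caseSensitive = false): number[] {
  const patterns = query
    .split("|")
    .map((term) => term.trim())
    .filter((term) => term.length > 0)
    .map((term) => globToRegExp(term, caseSensitive));

  const matched: number[] = [];
  text.split("\n").forEach((line, index) => {
    if (patterns.some((pattern) => pattern.test(line))) matched.push(index + 1);
  });
  return matched;
}

const sampleText = [
  'uid=dom_abc123 RootWebArea "My Page" <body>',
  ' uid=dom_def456 button "Submit" <button>',
  ' StaticText "Welcome to our site"',
  ' uid=dom_jkl012 link "登录" <a>',
].join("\n");

console.log(findMatchingLines(sampleText, "button* | 登录")); // lines 2 and 4
console.log(findMatchingLines(sampleText, "sub*mit"));        // line 2
```

Because the patterns are unanchored, a leading or trailing `*` is redundant in this sketch; a stricter variant could anchor each term to word boundaries instead.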
| 114 | + |
| 115 | +## API Reference |
| 116 | + |
| 117 | +### `collectDomSnapshot(document, options?)` |
| 118 | + |
| 119 | +Collects a DOM snapshot from the specified document. |
| 120 | + |
| 121 | +**Parameters:** |
| 122 | +- `document` - The Document to snapshot |
| 123 | +- `options` - Optional configuration: |
| 124 | + - `maxTextLength` (number, default: 160) - Maximum text length for element nodes (does not affect StaticText nodes which preserve full content) |
| 125 | + - `includeHidden` (boolean, default: false) - Include hidden elements |
| 126 | + - `captureTextNodes` (boolean, default: true) - Capture text nodes as StaticText |
| 127 | + |
| 128 | +**Returns:** `SerializedDomSnapshot` |
| 129 | + |
| 130 | +### `collectDomSnapshotInPage(options?)` |
| 131 | + |
| 132 | +Convenience function that calls `collectDomSnapshot` with the current `document`. |
| 133 | + |
| 134 | +### `DomSnapshotManager` |
| 135 | + |
| 136 | +Manager class for converting and formatting snapshots. |
| 137 | + |
| 138 | +**Methods:** |
| 139 | +- `buildTextSnapshot(source, options?)` - Convert serialized snapshot to TextSnapshot |
| 140 | +- `formatSnapshot(snapshot)` - Format TextSnapshot as readable text |
| 141 | + |
| 142 | +### `searchSnapshotText(text, query, options?)` |
| 143 | + |
| 144 | +Search snapshot text with optional glob patterns. |
| 145 | + |
| 146 | +**Parameters:** |
| 147 | +- `text` - The formatted snapshot text |
| 148 | +- `query` - Search query (use `|` to separate multiple terms) |
| 149 | +- `options` - Optional configuration: |
| 150 | + - `contextLevels` (number, default: 1) - Lines of context around matches |
| 151 | + - `caseSensitive` (boolean, default: false) - Case-sensitive search |
| 152 | + - `useGlob` (boolean, auto-detected when omitted) - Enable glob pattern matching |
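
The relationship between matched lines and context lines can be sketched as follows. This is an illustration under assumptions, not the library's code: it treats `contextLevels` as a count of lines before and after each match (as the option description suggests), uses 1-based line numbers, and `expandWithContext` is a hypothetical helper.

```typescript
// Expand each matched line by `contextLevels` lines on both sides,
// clamp to the text bounds, and merge overlapping ranges.
function expandWithContext(
  matchedLines: number[],
  contextLevels: number,
  totalLines: number
): number[] {
  const keep: { [line: number]: boolean } = {};
  for (const line of matchedLines) {
    const start = Math.max(1, line - contextLevels);
    const end = Math.min(totalLines, line + contextLevels);
    for (let n = start; n <= end; n++) keep[n] = true;
  }
  // Object keys are strings, so convert back to numbers and sort numerically.
  return Object.keys(keep)
    .map(Number)
    .sort((a, b) => a - b);
}

console.log(expandWithContext([2, 7], 1, 8)); // [1, 2, 3, 6, 7, 8]
console.log(expandWithContext([2, 3], 1, 10)); // [1, 2, 3, 4]
```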
| 153 | + |
| 154 | +## Node Structure |
| 155 | + |
| 156 | +Each captured node includes: |
| 157 | + |
| 158 | +```typescript |
| 159 | +interface DomSnapshotNode { |
| 160 | + id: string; // Unique node identifier |
| 161 | + role: string; // Semantic role (button, link, textbox, etc.) |
| 162 | + name?: string; // Accessible name |
| 163 | + value?: string; // Current value (for inputs) |
| 164 | + description?: string; // Additional description |
| 165 | + children: DomSnapshotNode[]; // Child nodes |
| 166 | + tagName?: string; // HTML tag name |
| 167 | + |
| 168 | + // State properties |
| 169 | + checked?: boolean | 'mixed'; // Checkbox/radio state |
| 170 | + pressed?: boolean | 'mixed'; // Toggle button state |
| 171 | + disabled?: boolean; // Disabled state |
| 172 | + focused?: boolean; // Focus state |
| 173 | + selected?: boolean; // Selection state |
| 174 | + expanded?: boolean; // Expanded state |
| 175 | + |
| 176 | + // Additional properties |
| 177 | + placeholder?: string; // Input placeholder |
| 178 | + href?: string; // Link URL |
| 179 | + title?: string; // Element title |
| 180 | + textContent?: string; // Text content |
| 181 | + inputType?: string; // Input type attribute |
| 182 | +} |
| 183 | +``` |
| 184 | + |
| 185 | +## Role Mapping |
| 186 | + |
| 187 | +The library maps HTML elements to semantic roles: |
| 188 | + |
| 189 | +| HTML Element | Role | |
| 190 | +|-------------|------| |
| 191 | +| `<button>` | button | |
| 192 | +| `<a href="...">` | link | |
| 193 | +| `<input type="text">` | textbox | |
| 194 | +| `<input type="checkbox">` | checkbox | |
| 195 | +| `<input type="radio">` | radio | |
| 196 | +| `<input type="range">` | slider | |
| 197 | +| `<select>` | combobox | |
| 198 | +| `<textarea>` | textbox | |
| 199 | +| `<img>` | image | |
| 200 | +| Elements with `contenteditable` | textbox | |
| 201 | + |
| 202 | +Explicit `role` attributes are respected and take precedence. |
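
The table and the precedence rule can be summarized in a small resolver. This is a hedged sketch over a plain-object stand-in for an element, not the library's code: `ElementLike` and `resolveRole` are hypothetical, the `"generic"` fallback is an assumption, and so is the choice to check `contenteditable` only after the tag-based rules.

```typescript
// Plain-object stand-in for a DOM element, so the sketch runs without a browser.
interface ElementLike {
  tagName: string;
  attributes: { [name: string]: string | undefined };
}

function roleForInputType(type: string): string {
  switch (type) {
    case "text": return "textbox";
    case "checkbox": return "checkbox";
    case "radio": return "radio";
    case "range": return "slider";
    default: return "generic"; // assumption: other input types are not covered by the table
  }
}

function resolveRole(el: ElementLike): string {
  const attrs = el.attributes;

  // An explicit `role` attribute takes precedence over any tag-based mapping.
  const explicit = attrs["role"];
  if (explicit) return explicit;

  const tag = el.tagName.toLowerCase();
  switch (tag) {
    case "button": return "button";
    case "select": return "combobox";
    case "textarea": return "textbox";
    case "img": return "image";
    case "a":
      if (attrs["href"] !== undefined) return "link";
      break;
    case "input":
      // An <input> without a `type` attribute behaves as type="text" in HTML.
      return roleForInputType((attrs["type"] ?? "text").toLowerCase());
  }

  // Assumption: contenteditable="false" does not make an element editable.
  const editable = attrs["contenteditable"];
  if (editable !== undefined && editable !== "false") return "textbox";

  return "generic"; // hypothetical fallback role
}

console.log(resolveRole({ tagName: "DIV", attributes: { role: "button" } }));
console.log(resolveRole({ tagName: "INPUT", attributes: { type: "range" } }));
```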
| 203 | + |
| 204 | +## Skipped Elements |
| 205 | + |
| 206 | +The following are automatically excluded from snapshots: |
| 207 | + |
| 208 | +- `<script>`, `<style>`, `<noscript>`, `<template>`, `<svg>`, `<head>`, `<meta>`, `<link>` |
| 209 | +- Elements with `aria-hidden="true"` |
| 210 | +- Elements with `hidden` attribute |
| 211 | +- Elements with `inert` attribute |
| 212 | +- Elements with `display: none` |
| 213 | +- Elements with `visibility: hidden` |
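
The exclusion rules above can be expressed as a single predicate. The sketch below works on a plain-object stand-in so that it runs without a browser; in a page, the style values would typically come from `getComputedStyle(element)`. `SkipCandidate` and `shouldSkip` are hypothetical, and the assumption that `includeHidden` bypasses only the visibility checks (not the tag list) is a guess rather than documented behavior.

```typescript
// Plain-object stand-in for an element plus its computed style.
interface SkipCandidate {
  tagName: string;
  attributes: { [name: string]: string | undefined };
  computedStyle: { display: string; visibility: string };
}

const SKIPPED_TAGS = ["script", "style", "noscript", "template", "svg", "head", "meta", "link"];

function shouldSkip(el: SkipCandidate, includeHidden = false): boolean {
  // Non-content tags are always skipped in this sketch.
  if (SKIPPED_TAGS.indexOf(el.tagName.toLowerCase()) !== -1) return true;

  // Assumption: `includeHidden` turns off the remaining visibility checks.
  if (includeHidden) return false;

  const attrs = el.attributes;
  if (attrs["aria-hidden"] === "true") return true;
  if (attrs["hidden"] !== undefined || attrs["inert"] !== undefined) return true;

  return el.computedStyle.display === "none" || el.computedStyle.visibility === "hidden";
}

const visibleStyle = { display: "block", visibility: "visible" };
console.log(shouldSkip({ tagName: "SCRIPT", attributes: {}, computedStyle: visibleStyle }));
console.log(shouldSkip({ tagName: "DIV", attributes: { hidden: "" }, computedStyle: visibleStyle }));
```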
| 214 | + |
| 215 | +## Use Cases |
| 216 | + |
| 217 | +- **Web Automation**: Provide page context to AI agents for browser automation |
| 218 | +- **Testing**: Capture page state for snapshot testing |
| 219 | +- **Accessibility Auditing**: Analyze semantic structure of pages |
| 220 | +- **Content Extraction**: Extract meaningful content from web pages |
| 221 | +- **Browser Extensions**: Build tools that need page structure without CDP |
| 222 | + |
| 223 | +## License |
| 224 | + |
| 225 | +MIT |