fstsed

Search and enrich at near-ripgrep speed

fstsed is a high-performance search-and-enrichment tool for cases where every match needs its own context. It is not a faster ripgrep. Instead, fstsed brings ripgrep-class performance to a different problem:

Searching for many strings at once, where each string carries its own structured metadata.

fstsed is built on BurntSushi's fst crate and is inspired by the blog post "Index 1,600,000,000 Keys with Automata and Rust" and, of course, by ripgrep itself.

My other tool, geoipsed, uses regexes to match only IP addresses and enriches them only with metadata from MaxMind and Spur. fstsed, by contrast, is completely flexible: you can match any string and bundle any metadata you want.

Why Inline Enrichment Matters

When investigating logs, reports, or documents, raw matches are never enough. You need context.

Before

2024-01-10 14:23:45 Connection established to 192.168.1.105
2024-01-10 14:23:51 DNS query: flowersbyirene.com
2024-01-10 14:24:02 HTTP request to suspicious-domain.net

After enrichment with fstsed

2024-01-10 14:23:45 Connection established to 192.168.1.105 (High-priority APT indicator, last seen: Jan 8)
2024-01-10 14:23:51 DNS query: flowersbyirene.com (Attributed to APT99, confidence: high)
2024-01-10 14:24:02 HTTP request to suspicious-domain.net (Phishing infrastructure)

Inline enrichment eliminates context switching. You read enriched output directly, even when working with tens or hundreds of thousands of indicators.

Use Cases

Tool	Patterns	Enrichment	Scaling
`grep -f searchterms.txt`	Regex and fixed strings	None	Linear in the number of keys
`rg -f searchterms.txt --replace "replacement"`	Regex and fixed strings	One replacement string shared by all search terms (it may include named capture groups from the regex)	Optimized multi-pattern search
`fstsed -f searchterms.fst`	Fixed strings	Replacement text per search term	Near-constant as the number of keys grows

fstsed trades some raw speed (compared with ripgrep, most noticeably on small key sets) for rich, structured, per-indicator context.

Key Features

Per-indicator metadata: each search term carries a full JSON object of metadata.
Template-driven output: use --template to control which fields from the JSON metadata object appear in the output.
Predictable performance at scale: search time grows only modestly from 10 to 100,000+ indicators.
JSON-aware search mode: optionally search only inside JSON string values, with proper decoding and re-encoding.
Word-boundary-aware matching: prevents partial matches (apple won't match pineapples).
Nested metadata access: reference deeply nested fields using JSON Pointer (RFC 6901).
Compact storage: FST databases are Zstd-compressed to keep disk usage low.

Quick Start

Build once, reuse everywhere.

# Build an FST database from JSON threat intelligence.
# Each record's indicator_key value becomes a search term, and any field of
# that record can be used to enrich matches, e.g.:
# {"indicator_key": "indicator_value", "threat_actor": "APT99", "severity": "high", "alias": "Operation Red"}
fstsed --build -f intel.fst -k indicator_key threat_intel.json

# Enrich logs with selected fields
cat logs.txt | fstsed -f intel.fst \
  --template "{key} | {threat_actor} | severity: {severity}"

# Same database, different analysis context (assumes records also carry a campaign field)
fstsed -f intel.fst logs.txt \
  --template "{key} ({campaign})"

# JSON-only search mode: search within JSON string values
# (assumes records also carry a remediation_steps field)
fstsed -f intel.fst --json events.json \
  --template "{key} | remediate: {remediation_steps}"

Templates are fstsed's core abstraction: they let you change the output without rebuilding the database.
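The --json mode above searches decoded string values and JSON-encodes its decorations. As a rough illustration only, here is a Python sketch of that round trip. It assumes one JSON document per line and parses each line fully (fstsed may scan quoted strings differently), matches plain substrings without word boundaries, and uses Python's str.format as a stand-in for fstsed's template engine; `json_mode_line` and `map_strings` are hypothetical names.

```python
import json
import re

def map_strings(node, fn):
    """Apply fn to every JSON string value; object keys are left untouched."""
    if isinstance(node, str):
        return fn(node)
    if isinstance(node, list):
        return [map_strings(item, fn) for item in node]
    if isinstance(node, dict):
        return {k: map_strings(v, fn) for k, v in node.items()}
    return node  # numbers, booleans and null pass through unchanged

def json_mode_line(line, lookup, template):
    """Decode one JSON line, decorate matches in string values, re-encode it."""
    # Longest keys first so the alternation prefers longer matches (no word boundaries here)
    pattern = re.compile("|".join(sorted(map(re.escape, lookup), key=len, reverse=True)))

    def decorate(text):
        # str.format stands in for fstsed templates: {key} plus top-level fields
        return pattern.sub(
            lambda m: template.format(**{**lookup[m.group(0)], "key": m.group(0)}), text
        )

    # json.loads undoes escapes such as \u002e; json.dumps re-escapes any quotes we add
    return json.dumps(map_strings(json.loads(line), decorate), ensure_ascii=False)
```

Because the decorated text is re-encoded, quotes introduced by a template such as `{key} ("{actor}")` come out escaped as `\"`, so each output line remains valid JSON for downstream tools.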

Usage Reference

Find and replace/decorate text at scale using finite state transducers (fst)

Usage: fstsed [OPTIONS] -f <FST> [PATH]...

Arguments:
  [PATH]...
          Input file(s) or directories to process. Leave empty or use "-" to read from stdin. (In build mode, only the first path is used)

Options:
  -o, --only-matching
          Show only nonempty parts of lines that match

      --color [<WHEN>]
          This flag controls when to use colors to highlight matched (non-empty) strings and the rendered template.

          fstsed will suppress color output by default in some other
          circumstances as well. These include, but are not limited to:

          • When the NO_COLOR environment variable is set (regardless of value).

          Possible values:
          - always: Always use color highlighting
          - never:  Never use color highlighting
          - auto:   Use color highlighting only when writing to a terminal (default)

          [default: auto]

  -f <FST>
          Specify fst db to use in search or to create in build mode

      --build
          Build mode. Build an fst from JSON data instead of querying one. Specify the output path with the -f
          parameter. Only the first input file (or stdin) is used to build the fst

  -k, --key <KEY>
          When building a fst, extract the given json field to use as the key in the fst database. Key may also be
          provided as a jsonpointer, e.g. /obj/array/1/item

          [default: key]

      --sorted
          When building a fst, set this if the keys of the input JSON are already lexicographically sorted. This makes
          construction much faster. If this is set but the keys are not sorted, fst creation will fail with an
          error

  -t, --template <TEMPLATE>
          Specify the format of the fstsed match decoration. Field names are enclosed in {}, for example "{field1}
          any fixed string {field2} & {field3}". Fields may be json keys or jsonpointers {/obj/array/1/item}

  -j, --json
          JSON search mode. fstsed treats input as JSON, searching only inside quoted strings. All strings are
          decoded before searching, and all template decorations are properly JSON-encoded
          in the output for subsequent processing

  -w, --threads <NUM>
          The number of threads to use for searching

  -u, --no-ignore
          Do not respect ignore files (.gitignore, .ignore, etc.)

      --hidden
          For recursive directory scanning, search hidden files and directories

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version

Examples:
  Build an FST database from JSON:
    fstsed --build -f data.fst -k key data.json

  Basic find and replace:
    echo "match" | fstsed -f data.fst

  Use a template for decoration:
    echo "match" | fstsed -f data.fst --template "{key} ({info})"

  JSON search mode (only search inside quoted JSON strings):
    fstsed -f data.fst --json input.json
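When input is already sorted, --sorted lets the build skip its own sort. The fst crate builds from keys in lexicographic byte order, so a reasonable assumption is that fstsed expects keys sorted by their UTF-8 bytes. A minimal Python sketch that pre-sorts JSON records on a key field (`sorted_jsonl` is a hypothetical helper):

```python
import json

def sorted_jsonl(records, key_field):
    """Serialize records as JSON Lines ordered by the UTF-8 bytes of key_field."""
    # Comparing encoded bytes gives byte-wise lexicographic order
    ordered = sorted(records, key=lambda r: r[key_field].encode("utf-8"))
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in ordered)
```

Writing its output to, say, sorted.json (a hypothetical file name) should then allow `fstsed --build --sorted -f intel.fst -k indicator_key sorted.json`. If the order does not match what fstsed expects, the --sorted help text above says the build errors out rather than producing a bad database.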

Template Syntax

Templates control how matches are rendered in output.

Variables

Syntax	Description
`{key}`	The matched search term
`{value}`	Full JSON payload (entire record)
`{fieldname}`	Top-level JSON field value
`{/path/to/field}`	Nested field via JSON Pointer (RFC 6901)
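Nested references follow RFC 6901. A short Python sketch of how such a pointer resolves against a record; `resolve_pointer` is a hypothetical name, and this version simply raises on a missing field, which may differ from how fstsed renders one:

```python
def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer such as "/metadata/attribution/actor"."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    node = doc
    for token in pointer[1:].split("/"):
        # RFC 6901 order: decode ~1 to "/" first, then ~0 to "~"
        token = token.replace("~1", "/").replace("~0", "~")
        # Array elements are addressed by index, object members by name
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node
```

The escape order matters: a field literally named `a/b` is addressed as `/a~1b`, and one named `m~n` as `/m~0n`.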

Examples

# Simple replacement
--template "{key}"

# Key with single field
--template "{key} ({threat_actor})"

# Multiple fields with formatting
--template "[{severity}] {key} - {description}"

# Nested JSON access
--template "{key}: {/metadata/attribution/actor}"

# Include literal braces by doubling them
--template "{{literal braces}} {key}"

Default Template

When no template is specified, fstsed uses:

<{key}|{value}>

This outputs the matched key and its full JSON payload, useful for debugging or when you need all metadata.
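To make the variable rules concrete, here is a Python sketch of a renderer for `{key}`, `{value}`, top-level `{fieldname}` and doubled braces. It approximates the documented behavior and is not fstsed's implementation: `render` is a hypothetical name, JSON Pointer fields are left out for brevity, and printing strings bare while printing other values (including a missing field) as JSON text is an assumption inferred from the Volexity output in this README.

```python
import json
import re

# "{{" and "}}" are literal braces; "{name}" is a placeholder
TOKEN = re.compile(r"\{\{|\}\}|\{([^{}]*)\}")

def render(template, key, record):
    """Render a template for one matched key and its JSON record."""
    def substitute(m):
        if m.group(0) == "{{":
            return "{"
        if m.group(0) == "}}":
            return "}"
        name = m.group(1)
        if name == "key":
            return key
        if name == "value":
            # Compact JSON, as in the <{key}|{value}> default output
            return json.dumps(record, separators=(",", ":"), ensure_ascii=False)
        field = record.get(name)
        # Assumption: strings render bare, anything else as JSON text (missing -> null)
        return field if isinstance(field, str) else json.dumps(field)
    return TOKEN.sub(substitute, template)
```

With the default template `<{key}|{value}>`, this reproduces the shape of the Volexity output shown in the Examples section.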

Performance

fstsed's defining characteristic is predictable, ripgrep-scale performance as key counts grow.

Benchmarks below were run against a 100MB realistic log corpus (~1% positive match rate) on a modern multicore system.

Tool	Keys	Time (s)	Throughput (MB/s)
fstsed	10	0.41	245
fstsed	100,000	0.92	110
rg -F -w	10	0.08	1314
rg -F -w	100,000	0.98	103
grep -F -w	100	102	~1

Takeaways

ripgrep remains the fastest general-purpose search tool, especially for small pattern sets
fstsed matches ripgrep throughput at large key counts (110 vs 103 MB/s at 100,000 keys in this run)
grep is already impractical at 100 patterns (over 100 seconds for the corpus)
fstsed shines when you need custom per-match transformations that ripgrep can't provide

Examples

IOC Enrichment with Real Threat Intel (Volexity)

This example demonstrates building a real-world IOC database and enriching text with source-aware metadata.

1. Get the data

git clone https://github.com/volexity/threat-intel.git

# Start an IPython session with pandas available
uvx --with pandas ipython

2. Convert CSV files to JSON

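import glob
import os

import pandas as pd

# Find every indicator CSV in the cloned repository
csvs = glob.glob("threat-intel/**/*.csv", recursive=True)

def conv(csv):
    """Convert one CSV to JSON Lines, tagging each row with its source file."""
    df = pd.read_csv(csv)
    # Store the path relative to the repo root, as in the sample output below
    df["path"] = os.path.relpath(csv, "threat-intel")
    # Some pandas versions end lines=True output with a newline; strip it so
    # joining the files does not leave blank lines
    return df.to_json(orient="records", lines=True).rstrip("\n")

with open("volexity.json", "w") as f:
    f.write("\n".join(map(conv, csvs)) + "\n")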

3. Build FST database

fstsed --build -f volexity.fst -k value volexity.json

# or pipe from stdin
cat volexity.json | fstsed --build -f volexity.fst -k value

4. Enrich and analyze

Basic search (default template shows full JSON record):

$ echo "test of avsvmcloud.com metadata" | fstsed -f volexity.fst

test of <avsvmcloud.com|{"value":"avsvmcloud.com","type":"hostname","notes":null,"path":"2020/2020-12-14 - DarkHalo Leverages SolarWinds Compromise to Breach Organizations/indicators/indicators.csv"}> metadata

Custom template for cleaner output:

$ echo "test of avsvmcloud.com metadata" | fstsed -f volexity.fst \
    --template "{key} (a {type} from {path} report)"

test of avsvmcloud.com (a hostname from 2020/2020-12-14 - DarkHalo Leverages SolarWinds Compromise to Breach Organizations/indicators/indicators.csv report) metadata

More Examples

MITRE ATT&CK Technique Tagging

Enrich code or logs with ATT&CK technique context:

# Build from ATT&CK data
cat attack_patterns.json | fstsed --build -f attack.fst -k pattern

# Tag findings with technique IDs and tactics
fstsed -f attack.fst suspicious_script.ps1 \
  --template "{key} [ATT&CK: {technique_id} - {tactic}]"

Translation and Highlighting

fstsed is not limited to infosec use cases. Here's a translation highlighting example:

# Build translation database
echo '{"key":"ဗိုလ်ချုပ်မှူးကြီး","translated":"Senior General of Myanmar Army"}' \
  | fstsed -f myanmar.fst --build -k key

# Highlight and translate in foreign text
fstsed -f myanmar.fst article.txt --template "<{key}> ({translated})"

Then, taking the lede from a BBC article as a test case, the output shows matched terms with inline translations:

fstsed -f myanmar.fst bbc.txt --template "<{key}> ({translated})"

လွန်ခဲ့တဲ့ ၅ နှစ်က တပ်မတော်ကာကွယ်ရေးဦးစီးချုပ် ရဲ့သက်တမ်းဟာ အကန့်အသတ်မရှိတဲ့ သဘောဖြစ်နေလို့ ၆၅ နှစ်ကန့်သတ်ပြီးပြင်ခဲ့တယ်လို့ <ဗိုလ်ချုပ်မှူးကြီး> (Senior General of Myanmar Army) မင်းအောင်လှိုင်က ပြောခဲ့ပြီး သူ့အသက် ၆၅ နှစ်ပြည့်ဖို့ လပိုင်းအလိုမှာ အာဏာသိမ်းကာ အဲ့ဒီ့ကန့်သတ်ချက်ကို ပယ်ဖျက်လိုက်တဲ့ အတွက် တပ်မတော်ကာကွယ်ရေးဦးစီးချုပ်သက်တမ်းဟာ အကန့်အသတ်မဲ့ ပြန်ဖြစ်သွားပါတယ်။

Even though I can't read any Burmese, I still know which key phrase matched, what that phrase means in my native language, and roughly where the match occurred in the document.
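For more than one entry, the glossary can be written as a JSON file instead of echoed. A minimal Python sketch, using the one entry from above; `glossary_to_jsonl` is a hypothetical helper, and the one-record-per-line layout mirrors the Volexity conversion earlier. `ensure_ascii=False` only keeps the Burmese readable in the file: `\uXXXX` escapes should decode to the same key in any conforming JSON parser.

```python
import json

# Source phrase -> English gloss; add further entries here
glossary = {
    "ဗိုလ်ချုပ်မှူးကြီး": "Senior General of Myanmar Army",
}

def glossary_to_jsonl(entries):
    """Emit one {"key": ..., "translated": ...} record per line."""
    return "".join(
        json.dumps({"key": k, "translated": v}, ensure_ascii=False) + "\n"
        for k, v in entries.items()
    )

with open("myanmar.json", "w", encoding="utf-8") as f:
    f.write(glossary_to_jsonl(glossary))
```

The file then builds the same way as the other examples: `fstsed --build -f myanmar.fst -k key myanmar.json`.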

When to Use fstsed

Use fstsed when:

You have many search terms (100s to 100,000s)
Each term has associated metadata (threat intel, translations, annotations)
Inline context matters more than raw matching speed
You want to reuse the same database with different output templates

Use ripgrep when:

You need the fastest possible literal search
You have a handful of patterns
You don't need per-match metadata
You need regex or capture group substitution

Limitations

Current Constraints

Note

fstsed is optimized for literal string matching with rich metadata. The following constraints are by design.

Word boundaries only: matches must start and end at word boundaries. apple won't match inside pineapple or even in apples.
Literal strings only: patterns are exact strings, not regular expressions. Use ripgrep for regex needs.
No null bytes in keys: keys must not contain null bytes (\0), because the null byte is used internally as the SENTINEL that separates keys from values.
Immutable databases: FST databases cannot be updated incrementally. Any change requires a full rebuild.
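Since a full rebuild is the only way to fix a bad database, it can pay to validate input first, for example against empty CSV cells that pandas exports as null. A small Python pre-flight check (`check_keys` is a hypothetical helper) flags records whose key is missing, not a string, or contains a null byte. Treating non-string keys as problems is an assumption; how fstsed handles them is not documented here.

```python
import json

def check_keys(lines, key_field):
    """Return (line_number, problem) pairs for records with unusable keys."""
    problems = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        key = json.loads(line).get(key_field)
        if not isinstance(key, str) or key == "":
            problems.append((n, "missing or non-string key"))
        elif "\0" in key:
            # Null bytes are reserved for the internal key/value separator
            problems.append((n, "key contains a null byte"))
    return problems
```

For the Volexity data, passing the lines of volexity.json with key field "value" lists any rows to clean up before running --build.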

Matching Semantics & Build Notes

Boundary Matching

Important

Keys that begin or end with a word character ([a-zA-Z0-9_]) must be preceded or followed, respectively, by a non-word character or by the start or end of the input.

Key	Input	Matches?
`apple`	`an apple`	✅ Yes
`apple`	`an apple.`	✅ Yes
`apple`	`an apple,`	✅ Yes
`apple`	`pineapple`	❌ No
`apple`	`apples`	❌ No
`192.168.1.1`	`ip:192.168.1.1`	✅ Yes
`192.168.1.1`	`192.168.1.15`	❌ No
`10.10.1.1`	`110.10.1.1`	❌ No
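The rule in this table fits in a few lines. This Python sketch (hypothetical helper names) checks a single key against a string using the ASCII word class above; it is a reference model of the semantics, not how fstsed scans text.

```python
import string

# ASCII word characters, matching the [a-zA-Z0-9_] class described above
WORD_CHARS = set(string.ascii_letters + string.digits + "_")

def bounded_match_at(text, start, key):
    """True if key occurs at text[start] and satisfies the boundary rule."""
    end = start + len(key)
    if not key or text[start:end] != key:
        return False
    # Leading word char: the preceding character (if any) must be a non-word char
    if key[0] in WORD_CHARS and start > 0 and text[start - 1] in WORD_CHARS:
        return False
    # Trailing word char: the following character (if any) must be a non-word char
    if key[-1] in WORD_CHARS and end < len(text) and text[end] in WORD_CHARS:
        return False
    return True

def matches(key, text):
    """True if key appears anywhere in text with valid boundaries."""
    return any(bounded_match_at(text, i, key) for i in range(len(text) - len(key) + 1))
```

For example, `matches("10.10.1.1", "110.10.1.1")` is False: the key starts with the word character 1 and the only occurrence is preceded by another 1.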

Shadowed Keys (Prefix Handling)

Important: Due to word boundary requirements, purely alphanumeric keys like abc and abcde do NOT shadow each other. They match independently when word-bounded.

True shadowing occurs when a longer key consists of a shorter key followed by a non-word character and more text. Wherever the longer key matches, the shorter key is not reported:

Shorter Key	Longer Key	Shadowing?	Reason
`API`	`API-KEY`	✅ Yes	Hyphen is non-word char
`user`	`user@domain`	✅ Yes	`@` is non-word char
`file`	`file.txt`	✅ Yes	Dot is non-word char
`abc`	`abcde`	❌ No	Both need word boundaries; match independently
`test`	`testing`	❌ No	Both need word boundaries; match independently

Examples with shadowed keys:

Input	Match	Why
`use API-KEY here`	`API-KEY`	`API` is shadowed by the longer key and not reported
`send user@domain`	`user@domain`	`user` is shadowed by the longer key and not reported
`read file.txt`	`file.txt`	`file` is shadowed by the longer key and not reported

Examples WITHOUT shadowing:

Input	Match	Why
`hello abc test`	`abc`	Word-bounded match
`foo abcde test`	`abcde`	Word-bounded match
`abc and abcde`	Both	Each matches independently when word-bounded
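Both tables are consistent with a leftmost-longest scan: at each position the longest bounded key wins and scanning resumes after it. The Python sketch below models that behavior by brute force over the key list, where fstsed would walk its FST; the resume-after-match policy and the `find_matches` name are assumptions made for illustration.

```python
import string

WORD_CHARS = set(string.ascii_letters + string.digits + "_")

def is_bounded(text, start, end):
    """Apply the word-boundary rule to the candidate span text[start:end]."""
    if text[start] in WORD_CHARS and start > 0 and text[start - 1] in WORD_CHARS:
        return False
    if text[end - 1] in WORD_CHARS and end < len(text) and text[end] in WORD_CHARS:
        return False
    return True

def find_matches(text, keys):
    """Leftmost-longest scan: at each position keep the longest bounded key."""
    # Brute force over keys, longest first; fstsed walks its FST instead
    candidates = sorted({k for k in keys if k}, key=len, reverse=True)
    found, i = [], 0
    while i < len(text):
        for key in candidates:
            end = i + len(key)
            if text.startswith(key, i) and is_bounded(text, i, end):
                found.append(key)
                i = end  # skip past the match, so a shadowed shorter key is never reported
                break
        else:
            i += 1  # no key matched here; advance one character
    return found
```

With keys `API` and `API-KEY`, `find_matches("use API-KEY here", ...)` returns only `API-KEY`, while `find_matches("abc and abcde", ["abc", "abcde"])` returns both keys.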

Keys can contain internal non-word characters:

User-Agent strings: Mozilla/5.0
IP addresses: 192.168.1.1
File paths: C:\\Windows\\System32
API endpoints: api-v2-endpoint
Email patterns: user@example.com

These are valid keys because the non-word characters are internal, not at the boundaries where matching occurs.

Build Process

1. Input JSON records are parsed.
2. The specified key field (-k) becomes the search term.
3. The entire JSON record becomes the metadata payload.
4. Records are sorted and compiled into an FST.
5. The FST is Zstd-compressed and written to disk.
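The five steps map onto a short Python sketch. Several parts are stand-ins or assumptions: a sorted list plays the role of the FST (fstsed uses BurntSushi's fst crate), zlib from the standard library replaces Zstd, and storing each entry as key + null byte + JSON payload is inferred from the null-byte restriction above rather than taken from fstsed's source. `build` and `lookup` are hypothetical names, and -k is treated as a top-level field only.

```python
import bisect
import json
import zlib

SENTINEL = b"\x00"  # assumption: separates each key from its JSON payload

def build(lines, key_field):
    """Parse JSON Lines, pair each key with its record, sort, and compress."""
    entries = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)                          # step 1: parse
        key = record[key_field].encode("utf-8")            # step 2: search term
        payload = json.dumps(record, separators=(",", ":")).encode("utf-8")  # step 3
        entries.append(key + SENTINEL + payload)
    entries.sort()                                         # step 4: FSTs need sorted input
    blob = zlib.compress(b"\n".join(entries))              # step 5: zlib instead of Zstd
    return entries, blob

def lookup(entries, key):
    """Return every record stored under key via a prefix search on key + SENTINEL."""
    prefix = key.encode("utf-8") + SENTINEL
    i = bisect.bisect_left(entries, prefix)
    out = []
    while i < len(entries) and entries[i].startswith(prefix):
        out.append(json.loads(entries[i][len(prefix):]))
        i += 1
    return out
```

Because the null byte sorts before every other byte, appending it keeps entries ordered by key, and the prefix search for key + `\0` ensures that looking up `a.com` does not also return entries for `a.com.au`.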

Database Inspection

# Optional: install fst-bin for database inspection
cargo install fst-bin

# Dump all keys in an FST
fst range your.fst

# Check if a specific key exists
fst grep your.fst "exact-key"

Installation

From Source

git clone https://github.com/erichutchins/fstsed.git
cd fstsed
cargo build --release

# Install to ~/.cargo/bin
cargo install --path .

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Acknowledgements

fstsed is built on BurntSushi's excellent fst crate and is inspired by the performance principles behind ripgrep. This project would not exist without that foundational work.

This project was developed with significant assistance from Claude 4.5 Sonnet and Google Gemini, via Zed and AntiGravity editors, respectively.

License

MIT OR Apache-2.0
