Grep your dataframes
Search and filter dataframes with grep-like patterns. Works with pandas, polars, and any backend supported by Narwhals.
```python
# Find what you're looking for
df.grep("active")                      # Simple search
df.grep("@gmail.com")                  # Find patterns
df.grep(r"^\d{3}-\d{4}$", regex=True)  # Regex support
```

- Familiar - grep-like interface for row-based dataframe filtering
- Fast - Backend-agnostic, works with your preferred library
- Simple - Three ways to use: function, pipe, or accessor
- Efficient - Lazy evaluation with polars/daft for large datasets
```bash
uv add nwgrep
```

```python
from nwgrep import nwgrep
import polars as pl

df = pl.DataFrame({
    "name": ["Alice", "Bob", "Eve"],
    "status": ["active", "locked", "active"],
})

# Find all rows containing "active"
result = nwgrep(df, "active")
# ┌───────┬────────┐
# │ name  ┆ status │
# │ ---   ┆ ---    │
# │ str   ┆ str    │
# ╞═══════╪════════╡
# │ Alice ┆ active │
# │ Eve   ┆ active │
# └───────┴────────┘
```

Choose the style that fits your workflow:
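Call it as a function:

```python
from nwgrep import nwgrep

result = nwgrep(df, "active")
```

Chain calls with `.pipe()`; each call filters the output of the previous one:

```python
result = (
    df
    .pipe(nwgrep, "active")
    .pipe(nwgrep, "@example.com", columns=["email"])
)
```

For pandas and polars, call `register_grep_accessor()` once to add a `.grep()` method directly to DataFrames: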
```python
from nwgrep import register_grep_accessor

register_grep_accessor()

df.grep("active")                          # Search all columns
df.grep("ALICE", case_sensitive=False)     # Case-insensitive
df.grep("example.com", columns=["email"])  # Specific columns
```

```python
# Case-insensitive search
df.grep("ACTIVE", case_sensitive=False)

# Invert match (like grep -v)
df.grep("test", invert=True)

# Regex patterns
df.grep(r".*@example\.com", regex=True)

# Multiple patterns (OR logic)
df.grep(["Alice", "Bob"])

# Whole word matching
df.grep("active", whole_word=True)

# Column-specific search
df.grep("pattern", columns=["name", "email"])

# Highlight matching cells in notebooks (pandas/polars; needs the notebook extra)
df.grep("error", highlight=True)  # Returns styled output with highlighted cells
```

Search parquet, feather, and other binary formats directly:
```bash
# Install the CLI
uv tool install "nwgrep[cli]"

# Basic search
nwgrep "error" logfile.parquet

# Case insensitive + regex
nwgrep -i -E "warn(ing)?" data.feather

# Column-specific search
nwgrep --columns email "@gmail.com" users.parquet

# Count matching rows
nwgrep --count "pattern" data.parquet

# List files with matches (like grep -l)
nwgrep -l "error" *.parquet

# Show only matching values (like grep -o)
nwgrep -o "error" data.parquet

# Stream as NDJSON (lazy evaluation)
nwgrep --format ndjson "pattern" huge_file.parquet
```

Works seamlessly with any dataframe library thanks to Narwhals:
| Backend | Support | Notes |
|---|---|---|
| pandas | Yes | Full support |
| polars | Yes | DataFrame and LazyFrame |
| pyarrow | Yes | Table support |
| dask | Yes | Distributed dataframes |
| daft | Yes | Lazy evaluation |
| cuDF | Yes | GPU acceleration |
| modin | Yes | Parallel pandas |
Same code, any backend. Switch freely without rewriting your filters.
Basic installation:
```bash
uv add nwgrep
# or
pip install nwgrep
```

With optional extras:
```bash
uv add nwgrep               # core library
uv add "nwgrep[cli]"        # CLI for searching parquet/feather files using polars
uv add "nwgrep[notebook]"   # highlighting in notebooks (pandas/polars)
uv add "nwgrep[all]"        # include all features (cli + notebook)
```

Note: nwgrep is designed to be added to an existing environment that already has a dataframe library (pandas, polars, etc.) installed. It does not install these backends by default, except for polars when you install the `[cli]` extra.
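Because nwgrep relies on whatever backend is already installed, it can help to confirm what your environment provides before adding it. The following is a minimal standard-library sketch; `available_backends` and `BACKENDS` are hypothetical names that nwgrep does not ship, and the strings are the usual import names of the libraries in the support table (cuDF imports as `cudf`).

```python
import importlib.util
from typing import Dict, Iterable

# Hypothetical list: usual import names of the backends in the support table.
BACKENDS = ("pandas", "polars", "pyarrow", "dask", "daft", "cudf", "modin")


def available_backends(names: Iterable[str] = BACKENDS) -> Dict[str, bool]:
    """Report which libraries are importable, without actually importing them."""
    # find_spec returns None for a top-level module that cannot be found.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in available_backends().items():
        print(f"{name:8} {'installed' if present else 'missing'}")
```

`find_spec` only locates the module, so the check stays cheap even for heavy libraries such as cuDF or dask.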
- Backend agnostic: Write once, run on any Narwhals-supported dataframe library
- Multiple search modes: Literal, regex, case-sensitive/insensitive
- Column filtering: Search all columns or specific ones
- Lazy evaluation: Efficient with large datasets (polars/daft)
- Familiar interface: grep-like flags and behavior (`-i`, `-v`, `-E`)
- Type safe: Full type hints, checked with ty
- Flexible API: Function, pipe, or accessor, your choice
- CLI included: Search binary formats such as parquet and feather from the command line
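To make the search modes concrete, here is a small pure-Python model of grep-style row filtering over a list of dicts. It is a sketch under stated assumptions, not nwgrep's implementation: it assumes a row matches when any searched cell (converted to a string) matches any pattern, that search is literal unless `regex=True`, that `whole_word=True` means regex word boundaries (`\b`), and that null cells never match. The `grep_rows` helper and the sample `users` rows are hypothetical.

```python
import re
from typing import Any, Dict, Iterable, List, Optional, Sequence, Union

# A row is modeled as a plain dict of column name -> value (illustration only).
Row = Dict[str, Any]


def grep_rows(
    rows: Iterable[Row],
    pattern: Union[str, Sequence[str]],
    *,
    columns: Optional[Sequence[str]] = None,
    case_sensitive: bool = True,
    regex: bool = False,
    invert: bool = False,
    whole_word: bool = False,
) -> List[Row]:
    """Hypothetical model of grep-style row filtering; not nwgrep's code."""
    # Multiple patterns are combined with OR logic.
    patterns = [pattern] if isinstance(pattern, str) else list(pattern)
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = []
    for p in patterns:
        body = p if regex else re.escape(p)  # literal unless regex=True
        if whole_word:
            body = rf"\b(?:{body})\b"  # assumption: word-boundary semantics
        compiled.append(re.compile(body, flags))

    kept = []
    for row in rows:
        # Search only the requested columns, or every column by default.
        cells = [row[c] for c in columns] if columns else list(row.values())
        matched = any(
            rx.search(str(cell))
            for rx in compiled
            for cell in cells
            if cell is not None  # assumption: nulls never match
        )
        # invert=True keeps the rows that do NOT match, like grep -v.
        if matched != invert:
            kept.append(row)
    return kept


users = [
    {"name": "Alice", "status": "active"},
    {"name": "Bob", "status": "locked"},
    {"name": "Eve", "status": "inactive"},
]

grep_rows(users, "active")                       # Alice, Eve ("inactive" contains "active")
grep_rows(users, "active", whole_word=True)      # Alice
grep_rows(users, "ALICE", case_sensitive=False)  # Alice
grep_rows(users, ["Alice", "Bob"])               # Alice, Bob (OR logic)
grep_rows(users, "active", invert=True)          # Bob
grep_rows(users, r"^(Alice|Bob)$", regex=True)   # Alice, Bob
```

In this model a plain substring search for `"active"` also hits `"inactive"`, which is exactly the case `whole_word=True` rules out; likewise, an anchored pattern such as `^(Alice|Bob)$` only behaves as a regex when `regex=True` is passed.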
Full documentation is available at erichutchins.github.io/nwgrep:
- Installation Guide - Setup for all backends
- Usage Examples - Comprehensive examples
- API Reference - Complete function reference
- CLI Reference - Command-line usage
```python
# Assumes register_grep_accessor() has been called

# Filter by status
active_users = df.grep("active", columns=["status"])

# Find Gmail users
gmail_users = df.grep("@gmail.com", columns=["email"])

# Find error and critical log entries
errors = df.grep(["ERROR", "CRITICAL"], columns=["level"])

# Find rows where no column contains an email-like value
missing_email = df.grep(r"\w+@\w+\.\w+", regex=True, invert=True)
```

```python
result = (
    df
    .grep("active", columns=["status"])       # Active users
    .grep("@company.com", columns=["email"])  # Company emails
    .grep("admin", invert=True)               # Exclude admins
)
```

nwgrep is a certified Narwhals plugin, enabling truly backend-agnostic code:
```python
import narwhals as nw
from nwgrep import nwgrep


def process_any_dataframe(df_native):
    """Works with pandas, polars, pyarrow, or any Narwhals-supported backend."""
    df = nw.from_native(df_native)
    result = nwgrep(df, "pattern")
    return nw.to_native(result)
```

Contributions welcome! See CONTRIBUTING.md for development setup and guidelines.
MIT License - see LICENSE file for details.
Built with Narwhals