erichutchins/nwgrep

nwgrep

Grep your dataframes

Search and filter dataframes with grep-like patterns. Works with pandas, polars, and any backend supported by Narwhals.

At a Glance

# Find what you're looking for
df.grep("active")              # Simple search
df.grep("@gmail.com")          # Find patterns
df.grep(r"^\d{3}-\d{4}$", regex=True)  # Regex support

Why nwgrep?

  • 🔍 Familiar - grep-like interface for row-based dataframe filtering
  • 🚀 Fast - Backend-agnostic, works with your preferred library
  • 🎯 Simple - Three ways to use: function, pipe, or accessor
  • ⚡ Efficient - Lazy evaluation with polars/daft for large datasets

Quick Start

uv add nwgrep

from nwgrep import nwgrep
import polars as pl

df = pl.DataFrame({
    "name": ["Alice", "Bob", "Eve"],
    "status": ["active", "locked", "active"],
})

# Find all rows containing "active"
result = nwgrep(df, "active")

# ┌───────┬────────┐
# │ name  ┆ status │
# │ ---   ┆ ---    │
# │ str   ┆ str    │
# ╞═══════╪════════╡
# │ Alice ┆ active │
# │ Eve   ┆ active │
# └───────┴────────┘

Three Ways to Use

Choose the style that fits your workflow:

1. Direct Function

from nwgrep import nwgrep
result = nwgrep(df, "active")

2. Pipe Method

result = (
    df
    .pipe(nwgrep, "active")
    .pipe(nwgrep, "@example.com", columns=["email"])
)
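
Both pandas and polars define DataFrame.pipe(function, *args, **kwargs) as a call to function(df, *args, **kwargs), which is why nwgrep fits into a method chain. The sketch below imitates that contract with a toy class and a stand-in filter. ToyFrame and contains are hypothetical names used only for illustration; they are not part of nwgrep.

```python
class ToyFrame:
    """Hypothetical stand-in for a dataframe: a list of row dicts."""

    def __init__(self, rows):
        self.rows = list(rows)

    def pipe(self, function, *args, **kwargs):
        # Same contract as DataFrame.pipe in pandas and polars:
        # the frame is passed as the first positional argument.
        return function(self, *args, **kwargs)


def contains(frame, pattern, columns=None):
    """Stand-in filter: keep rows where a searched cell contains `pattern`."""
    kept = [
        row for row in frame.rows
        if any(
            pattern in str(value)
            for key, value in row.items()
            if columns is None or key in columns
        )
    ]
    return ToyFrame(kept)


frame = ToyFrame([
    {"name": "Alice", "status": "active", "email": "alice@example.com"},
    {"name": "Bob", "status": "active", "email": "bob@gmail.com"},
    {"name": "Eve", "status": "locked", "email": "eve@example.com"},
])

# Each pipe call narrows the previous result, so chained filters act as AND.
chained = (
    frame
    .pipe(contains, "active")
    .pipe(contains, "@example.com", columns=["email"])
)

# The same thing written as nested function calls.
nested = contains(contains(frame, "active"), "@example.com", columns=["email"])
```

Here chained and nested both hold only the Alice row: the pipe form reads top to bottom, while the nested form reads inside out.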

3. Accessor Method

For polars and pandas backends, you can register an accessor that adds a .grep method directly to the DataFrame:

from nwgrep import register_grep_accessor
register_grep_accessor()

df.grep("active")                    # Search all columns
df.grep("ALICE", case_sensitive=False)  # Case-insensitive
df.grep("example.com", columns=["email"])  # Specific columns
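
How register_grep_accessor works internally is not described here. As a rough mental model only, registering an accessor can be pictured as attaching a function to an existing class at runtime, so every instance gains the method. The sketch uses a toy class; ToyFrame, toy_grep, and register_toy_accessor are hypothetical names, and the real library may use the backends' own extension hooks instead.

```python
class ToyFrame:
    """Hypothetical stand-in for a dataframe: a list of row dicts."""

    def __init__(self, rows):
        self.rows = list(rows)


def toy_grep(frame, pattern):
    """Stand-in search: keep rows where any cell contains `pattern`."""
    return ToyFrame(
        row for row in frame.rows
        if any(pattern in str(value) for value in row.values())
    )


def register_toy_accessor():
    # Assigning a plain function to the class makes it a bound method,
    # so frame.grep("x") calls toy_grep(frame, "x").
    ToyFrame.grep = toy_grep


frame = ToyFrame([
    {"name": "Alice", "status": "active"},
    {"name": "Bob", "status": "locked"},
])

had_grep_before = hasattr(frame, "grep")  # False: nothing registered yet
register_toy_accessor()
matches = frame.grep("active")            # now available on existing instances
```

This also illustrates why registration is an explicit call: the method only exists after the registering function has run.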

Powerful Search Options

# Case-insensitive search
df.grep("ACTIVE", case_sensitive=False)

# Invert match (like grep -v)
df.grep("test", invert=True)

# Regex patterns
df.grep(r".*@example\.com", regex=True)

# Multiple patterns (OR logic)
df.grep(["Alice", "Bob"])

# Whole word matching
df.grep("active", whole_word=True)

# Column-specific search
df.grep("pattern", columns=["name", "email"])

# Highlight matching cells in notebooks (pandas/polars)
df.grep("error", highlight=True)  # Returns styled output with highlighted cells
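
To make the interaction between these options concrete, here is a pure-Python sketch of one plausible reading of their semantics, applied to a list of row dicts. It is not nwgrep's implementation. grep_rows and names are hypothetical helpers, and treating whole_word as regex word boundaries (\b) is an assumption.

```python
import re


def grep_rows(rows, patterns, *, columns=None, case_sensitive=True,
              regex=False, invert=False, whole_word=False):
    """Toy row filter that reuses the keyword names shown above."""
    if isinstance(patterns, str):
        patterns = [patterns]
    # Literal patterns are escaped so characters like "." match themselves.
    parts = [p if regex else re.escape(p) for p in patterns]
    if whole_word:
        # Assumption: "whole word" means regex word boundaries.
        parts = [rf"\b(?:{p})\b" for p in parts]
    # Several patterns are joined with "|" to get OR logic.
    flags = 0 if case_sensitive else re.IGNORECASE
    matcher = re.compile("|".join(f"(?:{p})" for p in parts), flags)

    def row_matches(row):
        return any(
            matcher.search(str(value)) is not None
            for key, value in row.items()
            if columns is None or key in columns
        )

    # invert=True keeps the rows that do NOT match, like grep -v.
    return [row for row in rows if row_matches(row) != invert]


rows = [
    {"name": "Alice", "status": "active", "email": "alice@example.com"},
    {"name": "Bob", "status": "inactive", "email": "bob@test.org"},
    {"name": "Eve", "status": "active", "email": "eve@example.com"},
]


def names(result):
    return [row["name"] for row in result]


substring_hits = names(grep_rows(rows, "active"))                   # Bob too: "inactive"
whole_word_hits = names(grep_rows(rows, "active", whole_word=True))  # Bob excluded
inverted_hits = names(grep_rows(rows, "test", invert=True))          # rows without "test"
```

Under these assumptions a plain substring search for "active" also matches "inactive", which is the case whole-word matching is meant to exclude.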

Command Line Interface

Search parquet, feather, and other binary formats directly:

# Install the CLI
uv tool install "nwgrep[cli]"

# Basic search
nwgrep "error" logfile.parquet

# Case insensitive + regex
nwgrep -i -E "warn(ing)?" data.feather

# Column-specific search
nwgrep --columns email "@gmail.com" users.parquet

# Count matching rows
nwgrep --count "pattern" data.parquet

# List files with matches (like grep -l)
nwgrep -l "error" *.parquet

# Show only matching values (like grep -o)
nwgrep -o "error" data.parquet

# Stream as NDJSON (lazy evaluation)
nwgrep --format ndjson "pattern" huge_file.parquet
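
NDJSON is one JSON object per line, so matching rows can be written as they are found instead of being collected first. The following standard-library sketch shows that output shape with a lazy generator. It is not the CLI's implementation, and iter_matches and write_ndjson are hypothetical helpers.

```python
import io
import json


def iter_matches(rows, pattern):
    """Yield matching rows lazily: nothing is scanned until iteration starts."""
    for row in rows:
        if any(pattern in str(value) for value in row.values()):
            yield row


def write_ndjson(rows, stream):
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    for row in rows:
        stream.write(json.dumps(row) + "\n")
        count += 1
    return count


log_rows = [
    {"level": "INFO", "msg": "started"},
    {"level": "ERROR", "msg": "disk full"},
    {"level": "ERROR", "msg": "retry failed"},
]

# An in-memory buffer stands in for stdout or a file.
buffer = io.StringIO()
written = write_ndjson(iter_matches(log_rows, "ERROR"), buffer)
ndjson_text = buffer.getvalue()
```

Because each line is a complete JSON document, a consumer can parse the output line by line without loading the whole result.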

Backend Support

Works with any dataframe library that Narwhals supports:

Backend   Support   Notes
pandas    ✅        Full support
polars    ✅        DataFrame and LazyFrame
pyarrow   ✅        Table support
dask      ✅        Distributed dataframes
daft      ✅        Lazy evaluation
cuDF      ✅        GPU acceleration
modin     ✅        Parallel pandas

Same code, any backend. Switch freely without rewriting your filters.

Installation

Basic installation:

uv add nwgrep
# or
pip install nwgrep

With optional extras:

uv add nwgrep               # core library
uv add "nwgrep[cli]"        # CLI for searching parquet/feather files using polars
uv add "nwgrep[notebook]"   # highlighting in notebooks (pandas/polars)
uv add "nwgrep[all]"        # include all features (cli + notebook)

Note: nwgrep is designed to be added to an existing environment with a dataframe library (pandas, polars, etc.) already installed. It does not install these backends by default, except for polars when installing the [cli] extra.
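
Since the backends are not installed for you, it can help to check which ones the current environment already provides. This is a small standard-library sketch; the helper name installed_backends is hypothetical, and the module names are the usual top-level import names of the backends in the table above.

```python
from importlib.util import find_spec

# Top-level import names of the backends listed in the support table.
CANDIDATE_BACKENDS = ("pandas", "polars", "pyarrow", "dask", "daft", "cudf", "modin")


def installed_backends(candidates=CANDIDATE_BACKENDS):
    """Map each candidate module name to whether it can be imported."""
    # find_spec returns None for a missing top-level module without importing it.
    return {name: find_spec(name) is not None for name in candidates}


available = installed_backends()
missing = sorted(name for name, ok in available.items() if not ok)
```

If available shows no backend at all, install one (for example polars or pandas) alongside nwgrep before using it.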

Features

  • 🚀 Backend agnostic: Write once, run on any dataframe library
  • 🔍 Multiple search modes: Literal, regex, case-sensitive/insensitive
  • 📊 Column filtering: Search all columns or specific ones
  • ⚡ Lazy evaluation: Efficient with large datasets (polars/daft)
  • 🎯 Familiar interface: grep-like flags and behavior (-i, -v, -E)
  • 🔧 Type safe: Full type hints with ty type checking
  • 🎨 Flexible API: Function, pipe, or accessor - your choice
  • 🖥️ CLI included: Search binary formats from the command line

Documentation

Full documentation is available at erichutchins.github.io/nwgrep

Quick Examples

Find Active Users

users = df.grep("active", columns=["status"])

Email Domain Search

gmail_users = df.grep("@gmail.com", columns=["email"])

Log Analysis

errors = df.grep(["ERROR", "CRITICAL"], columns=["level"])

Data Quality Checks

# Find rows without email addresses
missing_email = df.grep(r"\w+@\w+\.\w+", regex=True, invert=True)

Pipeline Filtering

result = (
    df
    .grep("active", columns=["status"])     # Active users
    .grep("@company.com", columns=["email"]) # Company emails
    .grep("admin", invert=True)              # Exclude admins
)

Narwhals Integration

nwgrep is a certified Narwhals plugin, enabling truly backend-agnostic code:

import narwhals as nw
from nwgrep import nwgrep

def process_any_dataframe(df_native):
    """Works with pandas, polars, pyarrow, or any Narwhals-supported backend"""
    df = nw.from_native(df_native)
    result = nwgrep(df, "pattern")
    return nw.to_native(result)

Contributing

Contributions welcome! See CONTRIBUTING.md for development setup and guidelines.

License

MIT License - see LICENSE file for details.


Built with Narwhals
