Add mkdocs configuration
pavelzw committed Jul 13, 2024
1 parent 4282163 commit 5c60bee
Showing 14 changed files with 1,487 additions and 214 deletions.
23 changes: 23 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,23 @@
name: Docs
on:
  pull_request:
  push:
    branches:
      - main
permissions:
  contents: write
jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout branch
        uses: actions/checkout@v4
      - name: Set up pixi
        uses: prefix-dev/setup-pixi@ba3bb36eb2066252b2363392b7739741bb777659
        with:
          environments: docs
      - name: Build docs
        run: pixi run -e docs docs-build
      - name: Deploy docs
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        run: pixi run -e docs mkdocs gh-deploy --force
1 change: 1 addition & 0 deletions README.md
@@ -1,6 +1,7 @@
# multiregex

[![CI](https://img.shields.io/github/actions/workflow/status/quantco/multiregex/ci.yml?style=flat-square&branch=main)](https://github.com/quantco/multiregex/actions/workflows/ci.yml)
[![Documentation](https://img.shields.io/badge/docs-7E56C2?style=flat-square)](https://quantco.github.io/multiregex)
[![conda-forge](https://img.shields.io/conda/vn/conda-forge/multiregex?logoColor=white&logo=conda-forge&style=flat-square)](https://prefix.dev/channels/conda-forge/packages/multiregex)
[![pypi-version](https://img.shields.io/pypi/v/multiregex.svg?logo=pypi&logoColor=white&style=flat-square)](https://pypi.org/project/multiregex)
[![python-version](https://img.shields.io/pypi/pyversions/multiregex?logoColor=white&logo=python&style=flat-square)](https://pypi.org/project/multiregex)
20 changes: 0 additions & 20 deletions docs/Makefile

This file was deleted.

3 changes: 3 additions & 0 deletions docs/api-documentation.md
@@ -0,0 +1,3 @@
# API Documentation

::: multiregex
13 changes: 0 additions & 13 deletions docs/changelog.rst

This file was deleted.

107 changes: 0 additions & 107 deletions docs/conf.py

This file was deleted.

5 changes: 5 additions & 0 deletions docs/index.md
@@ -0,0 +1,5 @@
# multiregex

Quickly match many regexes against a string. Provides 2-10x speedups over naïve regex matching.

[API Documentation](api-documentation.md)
17 changes: 0 additions & 17 deletions docs/index.rst

This file was deleted.

35 changes: 0 additions & 35 deletions docs/make.bat

This file was deleted.

58 changes: 58 additions & 0 deletions mkdocs.yml
@@ -0,0 +1,58 @@
site_name: multiregex
site_description: Quickly match many regexes against a string. Provides 2-10x speedups over naïve regex matching.
site_url: https://quantco.github.io/multiregex
theme:
  name: material
  palette:
    # Palette toggle for automatic mode
    - media: "(prefers-color-scheme)"
      toggle:
        icon: material/brightness-auto
        name: Switch to light mode
    # Palette toggle for light mode
    - media: "(prefers-color-scheme: light)"
      scheme: default
      toggle:
        icon: material/brightness-7
        name: Switch to dark mode
      primary: deep purple
    # Palette toggle for dark mode
    - media: "(prefers-color-scheme: dark)"
      scheme: slate
      toggle:
        icon: material/brightness-4
        name: Switch to system preference
      primary: deep purple
  features:
    - content.action.edit
    - search.suggest
    - search.highlight
    - content.code.annotate
    - content.code.copy
  icon:
    repo: fontawesome/brands/github-alt
    edit: material/pencil
repo_name: quantco/multiregex
repo_url: https://github.com/quantco/multiregex
edit_uri: edit/main/docs/
plugins:
  - search
  - mkdocstrings:
      handlers:
        python:
          options:
            unwrap_annotated: true
            show_symbol_type_heading: true
            docstring_style: numpy
            docstring_section_style: spacy
            separate_signature: true
            merge_init_into_class: true

nav:
  - index.md
  - api-documentation.md
markdown_extensions:
  - admonition
  - pymdownx.highlight
  - pymdownx.superfences
  - pymdownx.inlinehilite
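
Note on `docstring_style: numpy`: with this option, mkdocstrings parses docstrings in the NumPy layout, which is why the Python hunks in this commit use underlined `Parameters` and `Examples` sections. A minimal sketch of a docstring in that layout follows; `count_matches` is a hypothetical helper written for illustration and is not part of multiregex.

```python
import re
from typing import Iterable


def count_matches(patterns: Iterable[str], s: str) -> int:
    """Count how many patterns match somewhere in a string.

    Hypothetical helper for illustration only; not part of multiregex.

    Parameters
    ----------
    patterns
        Regex patterns given as strings.
    s
        The string to search.

    Returns
    -------
    int
        Number of patterns with at least one match in `s`.
    """
    # re.search returns None when there is no match, which is falsy.
    return sum(1 for pattern in patterns if re.search(pattern, s))
```

Leaving the types out of the `Parameters` entries, as the commit does for `run`, relies on the signature annotations to carry the type information.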
38 changes: 20 additions & 18 deletions multiregex/__init__.py
@@ -1,16 +1,18 @@
r"""Speed up regex matching with non-regex substring "prematchers", similar to
Bloom filters.
r"""Speed up regex matching with non-regex substring "prematchers", similar to Bloom filters.
For each regex pattern we use a list of simple (non-regex) substring prematchers.
When evaluating regex patterns on a string, we use the prematchers to restrict
the set of regex patterns to be run. Hence, the prematchers _must_ match each string
unless it's impossible for the corresponding regex to match, similar to Bloom filters.
Examples:
r"\bfoo\b" -> ["foo"]
r"(foo|bar) \s*" -> ["foo ", "bar "]
r"Gemäß Richtlinie" -> ["gemäß richtlinie"]
# Prematchers are all-lowercase (to support re.IGNORECASE).
Examples
--------
```python
r"\bfoo\b" -> ["foo"]
r"(foo|bar) \s*" -> ["foo ", "bar "]
r"Gemäß Richtlinie" -> ["gemäß richtlinie"]
# Prematchers are all-lowercase (to support re.IGNORECASE).
```
Prematchers are generated automatically from the regexes where possible, see
`RegexMatcher.generate_prematchers`. You must provide a handcrafted list of
@@ -31,6 +33,7 @@
import sre_constants
import sre_parse
from typing import (
Callable,
Dict,
Iterable,
List,
@@ -68,21 +71,21 @@ def __init__(
patterns: Iterable[
Union[PatternOrStr, Tuple[PatternOrStr, Optional[Iterable[str]]]]
],
count_prematcher_false_positives=False,
count_prematcher_false_positives: bool = False,
):
"""
"""Create a new `RegexMatcher` instance.
Parameters
----------
patterns : list of patterns or (pattern, prematchers) tuples
patterns: list of patterns or (pattern, prematchers) tuples
The patterns to match against. Patterns may either be instances of
`re.Pattern` (results from `re.compile`) or strings.
If given as list of `(pattern, prematchers)` tuples, `prematchers`
are custom prematchers (iterables of strings) or `None` for automatic
prematchers using `generate_prematchers`. To disable prematchers for
a specific pattern (i.e., always run the "slow" matcher without any
prematching), use a `(pattern, [])` tuple.
count_prematcher_false_positives : bool, default: False
If true, enable "profiling" to check the effectiveness of prematchers on
count_prematcher_false_positives: If true, enable "profiling" to check the effectiveness of prematchers on
the input strings given to ``search``, ``match``, and ``fullmatch``.
Use ``format_prematcher_false_positives`` to retrieve the profile.
"""
@@ -163,16 +166,16 @@ def _make_automaton(enumerated_patterns):
)
return _ahocorasick_make_automaton(pattern_candidates_by_prematchers)

def run(self, match_func, s, enable_prematchers=True):
def run(self, match_func: Callable[[Pattern, str], re.Match], s: str, enable_prematchers: bool = True):
"""Quickly run `match_func` against `s` for all patterns.
Parameters
----------
match_func : Callable[str] -> Match
match_func
The base matching function, e.g. `re.search`.
s : str
s
The string to match against.
enable_prematchers : bool (default True)
enable_prematchers
If false, do not use prematchers; use `match_func` only.
"""
if enable_prematchers:
@@ -322,8 +325,7 @@ def _sre_find_terminals(sre_ast):


def _ahocorasick_make_automaton(words: Dict[str, V]) -> "ahocorasick.Automaton[V]":
"""Make an ahocorasick automaton from a dictionary of `needle -> value`
items."""
"""Make an ahocorasick automaton from a dictionary of `needle -> value` items."""
automaton = ahocorasick.Automaton() # type: ahocorasick.Automaton[V]
for word, value in words.items():
_ahocorasick_ensure_successful(automaton.add_word(word, value))
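
Note on the prematcher idea described in the module docstring: the sketch below illustrates it with the standard library only. It is not the library's implementation. multiregex builds an Aho-Corasick automaton over the prematchers, whereas this sketch swaps in plain `in` substring checks. The function name `run_with_prematchers`, the `entries` layout, and the rule that any one prematcher being present is enough to run the regex are assumptions made for illustration.

```python
import re
from typing import Callable, List, Optional, Sequence, Tuple

# Each entry pairs a compiled pattern with hand-written lowercase prematchers.
# An empty prematcher list means "always run the regex" (assumed convention).
PatternEntry = Tuple[re.Pattern, Sequence[str]]


def run_with_prematchers(
    entries: Sequence[PatternEntry],
    match_func: Callable[[re.Pattern, str], Optional[re.Match]],
    s: str,
) -> List[Tuple[re.Pattern, re.Match]]:
    """Run `match_func` only for patterns whose prematchers occur in `s`."""
    lowered = s.lower()  # prematchers are lowercase, so compare on lowercase text
    results = []
    for pattern, prematchers in entries:
        # Skip the regex when none of its candidate substrings is present.
        if prematchers and not any(p in lowered for p in prematchers):
            continue
        match = match_func(pattern, s)
        if match is not None:
            results.append((pattern, match))
    return results


entries = [
    (re.compile(r"\bfoo\b"), ["foo"]),
    (re.compile(r"(foo|bar) \s*"), ["foo ", "bar "]),
    (re.compile(r"\d+"), []),  # no prematchers: the regex always runs
]
hits = run_with_prematchers(entries, re.search, "a bar 42")
print([pattern.pattern for pattern, _ in hits])
```

Passing `re.match` or `re.fullmatch` as `match_func` changes the matching mode without touching the filtering step, which mirrors the role of the `match_func` parameter of `run` in the hunk above.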