1 change: 1 addition & 0 deletions .gitignore
@@ -69,6 +69,7 @@ doc/sg_execution_times.rst
.DS_Store
doc/_templates/demo_table_report_generated.html
doc/reference/*.rst
doc/benchmark_indications.rst

# Pkl files for benchmarks
benchmarks/*.pkl
56 changes: 36 additions & 20 deletions benchmarks/README.md → benchmarks/benchmark_indications.rst
@@ -1,35 +1,43 @@
Benchmarks
==========

Objectives
----------

This folder contains benchmarks used by the skrub maintainers to:

- Experiment on new algorithms
- Validate decisions based on empirical evidence
- Tune (hyper)parameters in the library

These benchmarks do not aim at replacing the tests within skrub.

Implementing a benchmark
------------------------

A mini-framework consisting of a few functions is made available under ``utils``.

Check out other benchmarks to see how they are used.

Launching a benchmark
------------------------

.. note::

   Launching a benchmark is usually something you don't want to do as a user.
   Benchmarks are long and expensive to run. Their code is provided for reproducibility.

Each benchmark implements a standard command-line interface with at least two
options: ``--run`` and ``--plot``.

Before launching a benchmark, make sure the environment is properly set up.
First, install the required packages. We recommend installing the latest versions
of everything (omit ``--upgrade`` if you don't want to):

.. code:: sh

   pip install --upgrade -e ".[benchmarks]"

It has also been reported that Python >=3.9 is required.

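To check the interpreter in your environment against this requirement, a hypothetical shell helper such as ``check_python_version`` (not part of skrub) can be used. It is a sketch that assumes the interpreter is on your ``PATH`` as ``python3``:

.. code:: sh

   # Hypothetical helper (not part of skrub): succeeds if the given
   # interpreter (default: python3) is Python 3.9 or later.
   check_python_version() {
       "${1:-python3}" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 9) else 1)'
   }

   if check_python_version; then
       echo "Python version OK"
   else
       echo "Python >= 3.9 is required" >&2
   fi

Pass another interpreter name, for example ``check_python_version python``, if that is what your environment uses.
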
@@ -38,21 +46,29 @@ docstring to see if it requires any additional setup.
Usually, you will find a date, which might be relevant, and sometimes a commit
hash, which you can use to check out the code as it was when the benchmark was run:

.. code:: sh

   git checkout <commit_hash>

Finally, launch the benchmark with the ``--run`` option:

.. code:: sh

   python bench_tablevectorizer_tuning.py --run

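The checkout and launch steps above can also be wrapped so that you return to your current branch once the run is over. This is a sketch: ``run_at_commit`` is a hypothetical helper, not something shipped with skrub, and it assumes your working tree has no uncommitted changes that would block the checkout.

.. code:: sh

   # Hypothetical helper (not part of skrub): check out a commit, run a
   # command there, then switch back to the previously checked-out branch.
   run_at_commit() {
       local commit="$1"
       shift
       git checkout --quiet "$commit" || return 1
       "$@"
       local rc=$?
       # "-" refers to the branch (or commit) checked out before the last switch.
       git checkout --quiet -
       return "$rc"
   }

For example: ``run_at_commit <commit_hash> python bench_tablevectorizer_tuning.py --run``. The command's exit status is preserved, and the original branch is restored even if the benchmark fails.
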
Analyzing results
~~~~~~~~~~~~~~~~~

The results of the benchmarks run by the maintainers are pushed to the ``results/``
folder in Parquet format.

As mentioned earlier, benchmarks implement a ``--plot`` option used to display
the results visually. Using ``--plot`` without ``--run`` allows you to plot
the saved results without re-running the benchmark.

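For instance, a small wrapper can plot the saved results when they exist and fall back to running the benchmark otherwise. This is a sketch under assumptions not stated in this guide: ``plot_benchmark`` is a hypothetical helper, the ``<name>`` in result file names is assumed to be the script name without ``.py``, and ``--run`` and ``--plot`` are assumed to be accepted together.

.. code:: sh

   # Hypothetical helper (not part of skrub). Assumes result files are named
   # results/<script name without .py>-<YYYYMMDD>.parquet and that --run and
   # --plot can be passed together.
   plot_benchmark() {
       local script="$1"
       local name
       name="$(basename "$script" .py)"
       # Any dated results file for this benchmark counts as saved results.
       if ls results/"$name"-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].parquet >/dev/null 2>&1; then
           python "$script" --plot          # plot saved results only
       else
           python "$script" --run --plot    # run first, then plot
       fi
   }

Run it from the ``benchmarks`` folder, e.g. ``plot_benchmark bench_tablevectorizer_tuning.py``.
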
Format
------

Results are saved in the ``results/`` subfolder, with file names following the pattern
``<name>-<YYYYMMDD>.parquet``.
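
Because ``YYYYMMDD`` dates sort chronologically as plain strings, the most recent results file for a benchmark can be found with a simple sort. The function below is a hypothetical sketch, not part of skrub, and assumes it is run from the ``benchmarks`` folder:

.. code:: sh

   # Hypothetical helper (not part of skrub): print the most recent results
   # file for a benchmark name, based on the <name>-<YYYYMMDD>.parquet scheme.
   latest_result() {
       local name="$1"
       # Requiring exactly 8 digits after "$name-" avoids matching other
       # benchmarks whose names start with "$name-"; YYYYMMDD strings sort
       # in chronological order, so the last line is the newest run.
       ls results/"$name"-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].parquet 2>/dev/null \
           | sort | tail -n 1
   }

For example, ``latest_result <name>`` prints the path of the newest saved run, or nothing if there is none.
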
5 changes: 0 additions & 5 deletions benchmarks/results/README.md

This file was deleted.

1 change: 1 addition & 0 deletions doc/conf.py
@@ -50,6 +50,7 @@
shutil.copyfile("../RELEASE_PROCESS.rst", "RELEASE_PROCESS.rst")
shutil.copyfile("../CHANGES.rst", "CHANGES.rst")
shutil.copyfile("../CONTRIBUTING.rst", "CONTRIBUTING.rst")
shutil.copyfile("../benchmarks/benchmark_indications.rst", "benchmark_indications.rst")

# -- General configuration ------------------------------------------------

1 change: 1 addition & 0 deletions doc/development.rst
@@ -16,3 +16,4 @@ facilitate learning on databases.
CONTRIBUTING
tutorial_example
RELEASE_PROCESS
benchmark_indications
2 changes: 0 additions & 2 deletions examples/README.txt

This file was deleted.