diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml
new file mode 100644
index 0000000..9c54df5
--- /dev/null
+++ b/.github/workflows/deploy.yml
@@ -0,0 +1,52 @@
+name: Deploy
+on:
+ push:
+ branches:
+ - doc # TODO: change to tag only
+
+jobs:
+ deploy:
+ runs-on: ubuntu-latest
+ permissions:
+ contents: write # To push a branch
+ pages: write # To push to a GitHub Pages site
+ id-token: write # To update the deployment status
+ steps:
+ - uses: actions/checkout@v4
+ with:
+ fetch-depth: 0
+ - name: Install latest mdbook
+ run: |
+ tag=$(curl 'https://api.github.com/repos/rust-lang/mdbook/releases/latest' | jq -r '.tag_name')
+ url="https://github.com/rust-lang/mdbook/releases/download/${tag}/mdbook-${tag}-x86_64-unknown-linux-gnu.tar.gz"
+ mkdir mdbook
+ curl -sSL $url | tar -xz --directory=./mdbook
+ echo `pwd`/mdbook >> $GITHUB_PATH
+ # - name: Install latest mdbook-pagetoc
+ # run: |
+ # tag=$(curl 'https://api.github.com/repos/slowsage/mdbook-pagetoc/releases/latest' | jq -r '.tag_name')
+ # url="https://github.com/slowsage/mdbook-pagetoc/releases/download/${tag}/mdbook-pagetoc-${tag}-x86_64-unknown-linux-gnu.tar.gz"
+ # curl -sSL $url | tar -xz --directory=./mdbook
+ - name: Install latest mdbook-pagetoc
+ uses: baptiste0928/cargo-install@v2
+ with:
+ crate: mdbook-pagetoc
+ locked: false
+ - name: Run tests
+ run: |
+ cd doc
+ mdbook test
+ - name: Build Book
+ run: |
+ cd doc
+ mdbook build
+ - name: Setup Pages
+ uses: actions/configure-pages@v2
+ - name: Upload artifact
+ uses: actions/upload-pages-artifact@v1
+ with:
+          # Upload the built book only
+ path: 'doc/book'
+ - name: Deploy to GitHub Pages
+ id: deployment
+ uses: actions/deploy-pages@v1
\ No newline at end of file
diff --git a/README.md b/README.md
index 3a0d204..9ff6444 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1 @@
# Sitemap Web Scraper
-
-## Bash completion
-
-Source the completion script in your `~/.bashrc` file:
-
-```bash
-echo 'source <(sws completion)' >> ~/.bashrc
-```
diff --git a/doc/.gitignore b/doc/.gitignore
new file mode 100644
index 0000000..927206b
--- /dev/null
+++ b/doc/.gitignore
@@ -0,0 +1,4 @@
+book
+theme/index.hbs
+theme/pagetoc.css
+theme/pagetoc.js
\ No newline at end of file
diff --git a/doc/book.toml b/doc/book.toml
new file mode 100644
index 0000000..d155b15
--- /dev/null
+++ b/doc/book.toml
@@ -0,0 +1,12 @@
+[book]
+authors = ["Romain Leroux"]
+language = "en"
+multilingual = false
+src = "src"
+title = "Sitemap Web Scraper"
+
+# https://crates.io/crates/mdbook-pagetoc
+[preprocessor.pagetoc]
+[output.html]
+additional-css = ["theme/pagetoc.css"]
+additional-js = ["theme/pagetoc.js"]
\ No newline at end of file
diff --git a/doc/src/README.md b/doc/src/README.md
new file mode 100644
index 0000000..c4aaf31
--- /dev/null
+++ b/doc/src/README.md
@@ -0,0 +1,39 @@
+# Introduction
+
+Sitemap Web Scraper, or [sws][], is a tool for simple, flexible, yet performant web
+page scraping. It consists of a [CLI][] that executes a [Lua][] [JIT][lua-jit] script
+and outputs a [CSV][] file.
+
+All the logic for crawling/scraping is defined in Lua and executed on multiple threads
+in [Rust][]. The actual parsing of HTML is done in Rust. Standard [CSS
+selectors][css-sel] are also implemented in Rust (using Servo's [html5ever][] and
+[selectors][]). Both functionalities are accessible through a Lua API for flexible
+scraping logic.
+
+As for the crawling logic, multiple seeding options are available: [robots.txt][robots],
+[sitemaps][], or a custom list of HTML pages. By default, sitemaps (either provided or
+extracted from `robots.txt`) will be crawled recursively and the discovered HTML pages
+will be scraped with the provided Lua script. It's also possible to dynamically add page
+links to the crawling queue when scraping an HTML page. See the [crawl][sub-crawl]
+subcommand and the [Lua scraper][lua-scraper] for more details.
+
+In addition, the Lua scraping script can be used on HTML pages stored as local files,
+without any crawling. See the [scrap][sub-scrap] subcommand doc for more details.
+
+Furthermore, the CLI is composed of `crates` that can be used independently in a custom
+Rust program.
+
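+For instance, a typical invocation could look like the following sketch, where the
+script path and the output file name are only placeholders:
+
+```sh
+# Crawl the pages seeded by the Lua script and write the scraped records to a CSV file.
+sws crawl --script path/to/scrape_logic.lua -o results.csv
+```
+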
+[sws]: https://github.com/lerouxrgd/sws
+[cli]: https://en.wikipedia.org/wiki/Command-line_interface
+[rust]: https://www.rust-lang.org/
+[lua]: https://www.lua.org/
+[lua-jit]: https://luajit.org/
+[csv]: https://en.wikipedia.org/wiki/Comma-separated_values
+[css-sel]: https://www.w3schools.com/cssref/css_selectors.asp
+[html5ever]: https://crates.io/crates/html5ever
+[selectors]: https://crates.io/crates/selectors
+[robots]: https://en.wikipedia.org/wiki/Robots.txt
+[sitemaps]: https://www.sitemaps.org/
+[sub-crawl]: ./crawl_overview.html
+[sub-scrap]: ./scrap_overview.html
+[lua-scraper]: ./lua_scraper.html
diff --git a/doc/src/SUMMARY.md b/doc/src/SUMMARY.md
new file mode 100644
index 0000000..3176c5e
--- /dev/null
+++ b/doc/src/SUMMARY.md
@@ -0,0 +1,13 @@
+# Summary
+
+[Introduction](README.md)
+
+[Getting Started](getting_started.md)
+
+- [Subcommand: crawl](./crawl_overview.md)
+ - [Crawler Configuration](./crawl_config.md)
+
+- [Subcommand: scrap](./scrap_overview.md)
+
+- [Lua Scraper](./lua_scraper.md)
+ - [Lua API Overview](./lua_api_overview.md)
diff --git a/doc/src/crawl_config.md b/doc/src/crawl_config.md
new file mode 100644
index 0000000..395b7f1
--- /dev/null
+++ b/doc/src/crawl_config.md
@@ -0,0 +1,83 @@
+# Crawler Config
+
+The crawler's configurable parameters are:
+
+| Parameter      | Default | Description |
+|----------------|---------|-------------|
+| user_agent     | "SWSbot" | The `User-Agent` header that will be used in all HTTP requests |
+| page_buffer    | 10_000 | The size of the pages download queue. When the queue is full, new downloads are on hold. This parameter is particularly relevant when using concurrent throttling. |
+| throttle       | `Concurrent(100)` if `robot` is `None` <br> Otherwise `Delay(N)` where `N` is read from the `robots.txt` field `Crawl-delay: N` | A throttling strategy for HTML page downloads. <br> `Concurrent(N)` means at most `N` downloads at the same time, `PerSecond(N)` means at most `N` downloads per second, `Delay(N)` means wait for `N` seconds between downloads |
+| num_workers    | max(1, num_cpus-2) | The number of CPU cores that will be used for scraping pages in parallel using the provided Lua script. |
+| on_dl_error    | `SkipAndLog` | Behaviour when an error occurs while downloading an HTML page. The other possible value is `Fail`. |
+| on_xml_error   | `SkipAndLog` | Behaviour when an error occurs while processing an XML sitemap. The other possible value is `Fail`. |
+| on_scrap_error | `SkipAndLog` | Behaviour when an error occurs while scraping an HTML page in Lua. The other possible value is `Fail`. |
+| robot          | `None` | An optional `robots.txt` URL used to retrieve a specific `Throttle::Delay`. <br> ⚠ Conflicts with `seedRobotsTxt` in the [Lua Scraper][lua-scraper], meaning that when `robot` is defined the `seed` cannot be a `robots.txt` too. |
+
+These parameters can be changed through the Lua script or through CLI arguments.
+
+The priority order is: `CLI (highest priority) > Lua > Default values`
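+
+As a quick illustration of this priority order (using the Lua and CLI parameter names
+detailed in the sections below, with arbitrary example values):
+
+```sh
+# Suppose scrape_logic.lua sets: sws.crawlerConfig = { numWorkers = 2 }
+# The CLI argument takes precedence over the Lua value, so 8 workers are used.
+sws crawl --script scrape_logic.lua -o results.csv --num-workers 8
+```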
+
+[lua-scraper]: ./lua_scraper.html#seed-definition
+
+## Lua override
+
+You can override parameters in Lua through the global variable `sws.crawlerConfig`.
+
+| Parameter | Lua name | Example Lua value |
+|----------------|--------------|-------------------------------------|
+| user_agent | userAgent | "SWSbot" |
+| page_buffer | pageBuffer | 10000 |
+| throttle | throttle | { Concurrent = 100 } |
+| num_workers | numWorkers | 4 |
+| on_dl_error | onDlError | "SkipAndLog" |
+| on_xml_error | onXmlError | "Fail" |
+| on_scrap_error | onScrapError | "SkipAndLog" |
+| robot | robot | "https://www.google.com/robots.txt" |
+
+
+Here is an example of crawler configuration parameters set using Lua:
+
+```lua
+-- You don't have to specify all parameters, only the ones you want to override.
+sws.crawlerConfig = {
+ userAgent = "SWSbot",
+ pageBuffer = 10000,
+ throttle = { Concurrent = 100 }, -- or: { PerSecond = 100 }, { Delay = 2 }
+ numWorkers = 4,
+ onDlError = "SkipAndLog", -- or: "Fail"
+ onXmlError = "SkipAndLog",
+ onScrapError = "SkipAndLog",
+ robot = nil,
+}
+```
+
+## CLI override
+
+You can override parameters through the CLI arguments.
+
+| Parameter | CLI argument name | Example CLI argument value |
+|----------------------|-------------------|-------------------------------------|
+| user_agent | --user-agent | 'SWSbot' |
+| page_buffer | --page-buffer | 10000 |
+| throttle (Concurrent) | --conc-dl        | 100                                 |
+| throttle (PerSecond) | --rps | 10 |
+| throttle (Delay) | --delay | 2 |
+| num_workers | --num-workers | 4 |
+| on_dl_error | --on-dl-error | skip-and-log |
+| on_xml_error | --on-xml-error | fail |
+| on_scrap_error | --on-scrap-error | skip-and-log |
+| robot | --robot | 'https://www.google.com/robots.txt' |
+
+Here is an example of crawler configuration parameters set using CLI arguments:
+
+```sh
+sws crawl --script path/to/scrape_logic.lua -o results.csv \
+  --user-agent 'SWSbot' \
+  --page-buffer 10000 \
+  --conc-dl 100 \
+  --num-workers 4 \
+  --on-dl-error skip-and-log \
+  --on-xml-error fail \
+  --on-scrap-error skip-and-log \
+  --robot 'https://www.google.com/robots.txt'
+```
diff --git a/doc/src/crawl_overview.md b/doc/src/crawl_overview.md
new file mode 100644
index 0000000..109b56f
--- /dev/null
+++ b/doc/src/crawl_overview.md
@@ -0,0 +1,23 @@
+# Subcommand: crawl
+
+```text
+Crawl sitemaps and scrap pages content
+
+Usage: sws crawl [OPTIONS] --script