doc: Update readme, add contributing.md, add git cliff to generate changelog, add pre-commit to automatically fmt and update changelog
vemonet committed Dec 21, 2023
1 parent 8391056 commit 61da9f9
Showing 7 changed files with 300 additions and 86 deletions.
12 changes: 1 addition & 11 deletions .gitignore
@@ -1,13 +1,3 @@
# Generated by Cargo
# will have compiled files and executables
/target/

# Remove Cargo.lock from gitignore if creating an executable, leave it for libraries
# More information here http://doc.crates.io/guide.html#cargotoml-vs-cargolock
Cargo.lock

# These are backup files generated by rustfmt
**/*.rs.bk
/target/
**/*.rs.bk
Cargo.lock
tarpaulin-report.html
34 changes: 34 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,34 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
      - id: check-added-large-files
        name: 🐘 Check for added large files
      - id: check-toml
        name: ✔️ Check TOML
      - id: check-yaml
        name: ✔️ Check YAML
        args:
          - --unsafe
      - id: end-of-file-fixer
        name: 🪚 Fix end of files
      - id: trailing-whitespace
        name: ✂️ Trim trailing whitespaces
  - repo: local
    hooks:
      - id: rustfmt
        name: 🦀 Format Rust files
        description: Check if all files follow the rustfmt style
        entry: cargo fmt
        language: system
        pass_filenames: false
      - id: git-cliff
        name: 🏔️ Update changelog
        entry: git cliff -o CHANGELOG.md
        language: system
        pass_filenames: false
ci:
  autofix_commit_msg: 🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
  autoupdate_commit_msg: ⬆ [pre-commit.ci] pre-commit autoupdate
48 changes: 48 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,48 @@
# 📜 Changelog

All notable changes to this project will be documented in this file.

## [unreleased]

### ⛰️ Features

- Add functions to find_prefixes, find_postfixes, and find_longest_prefix. Rename files - ([ccbae67](https://github.com/vemonet/ptrie/commit/ccbae673304c0d052e8625f2040b2a2005afc408))

### 🧪 Testing

- Improve tests, add GitHub actions workflows for testing and releasing, remove travis CI, update benchmark script - ([8391056](https://github.com/vemonet/ptrie/commit/839105644ff00e1ac9a8fee08bf0c5f6eb2fddf8))

## [0.4.0](https://github.com/vemonet/ptrie/compare/0.3.0..0.4.0) - 2018-07-09

### ⚡ Performance

- Extracts values to a vector from nodes - ([8032921](https://github.com/vemonet/ptrie/commit/8032921117659093525956f35b0bee8c2b508b5b))
- Improves the performance - ([1542fd9](https://github.com/vemonet/ptrie/commit/1542fd90728d6e4c5123af031b635d9c7e282e81))
- Fixes the case of existing value overriding in the trie - ([89c08ad](https://github.com/vemonet/ptrie/commit/89c08ad74d7994efc97f307f46c78b537e80a3c2))

### 🎨 Styling

- Fixes formatting with rust-fmt - ([57b8acf](https://github.com/vemonet/ptrie/commit/57b8acf6ddaed88c391a7548982fcef8fa7eb491))

## [0.3.0](https://github.com/vemonet/ptrie/compare/0.2.1..0.3.0) - 2017-12-19

### ⚙️ Miscellaneous Tasks

- Improves the performance by keys localization in memory

Previous version of the TrieNode structure caused cache miss on each
comparison iteration.

Placing the child key in the node itself makes these comparisons much
faster because the keys are localized in CPU cache
- ([2cc8e88](https://github.com/vemonet/ptrie/commit/2cc8e882f32e99044b8e6a89a236de4accb9f5b0))

## [0.2.1](https://github.com/vemonet/ptrie/compare/0.2.0..0.2.1) - 2017-12-19

## [0.2.0](https://github.com/vemonet/ptrie/compare/0.1.2..0.2.0) - 2017-12-17

## [0.1.2](https://github.com/vemonet/ptrie/compare/0.1.1..0.1.2) - 2017-12-12

## [0.1.1](https://github.com/vemonet/ptrie/tree/0.1.1) - 2017-12-12

<!-- generated by git-cliff -->
77 changes: 77 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,77 @@
# 🧑‍💻 Contributing

The usual process to make a contribution is to:

1. Check for existing related [issues on GitHub](https://github.com/vemonet/ptrie/issues)
2. [Fork](https://github.com/vemonet/ptrie/fork) the repository and create a new branch
3. Make your changes
4. Make sure formatting, linting, and tests pass.
5. Add tests if possible to cover the lines you added.
6. Commit, and send a Pull Request.

## 🛠️ Development

Install dependencies:

```bash
rustup update
rustup toolchain install nightly
rustup component add rustfmt clippy
cargo install cargo-tarpaulin git-cliff cargo-outdated
pipx install pre-commit
pre-commit install
```

### 🧪 Tests

Run tests:

```bash
cargo test
```

Tests with coverage:

```bash
cargo tarpaulin -p ptrie --doc --tests --out html
```

> Start web server for the cov report: `python -m http.server`
### 📚 Docs

Generate docs locally:

```bash
cargo doc --all --all-features
```

> Start web server for the generated docs: `python -m http.server --directory target/doc`
### ⏱️ Benchmark

Running benchmarks requires enabling Rust nightly: `rustup default nightly`

```bash
cargo bench
```

## 🏷️ New release

Publishing artifacts is handled by the `build.yml` workflow. Make sure you have set the following tokens as secrets for this repository: `CRATES_IO_TOKEN`, `CODECOV_TOKEN`

1. Make sure dependencies have been updated:

```bash
cargo update
cargo outdated
```

2. Bump the version in the `Cargo.toml` file, create a new tag with `git`, and update changelog using [`git-cliff`](https://git-cliff.org):

```bash
git tag -a 0.5.0 -m "v0.5.0"
git cliff -o CHANGELOG.md
```

3. Commit, and push. The `release.yml` workflow will automatically create the release on GitHub, and publish to crates.io.
138 changes: 65 additions & 73 deletions README.md
@@ -1,102 +1,94 @@
# GTrie
<h1 align="center">
🎄 Prefix Trie
</h1>

[![Build Status](https://travis-ci.org/aserebryakov/trie-rs.svg?branch=master)](https://travis-ci.org/aserebryakov/trie-rs)
<p align="center">
<a href="https://crates.io/crates/ptrie">
<img alt="Crates.io" src="https://img.shields.io/crates/v/ptrie" />
</a>
<a href="https://github.com/vemonet/ptrie/actions/workflows/test.yml">
<img alt="Test" src="https://github.com/vemonet/ptrie/actions/workflows/test.yml/badge.svg" />
</a>
<a href="https://github.com/vemonet/ptrie/actions/workflows/release.yml">
<img alt="Release" src="https://github.com/vemonet/ptrie/actions/workflows/release.yml/badge.svg" />
</a>
<a href="https://docs.rs/ptrie">
<img alt="Documentation" src="https://docs.rs/ptrie/badge.svg" />
</a>
<a href="https://codecov.io/gh/vemonet/ptrie/branch/main">
<img src="https://codecov.io/gh/vemonet/ptrie/branch/main/graph/badge.svg" alt="Codecov status" />
</a>
<a href="https://github.com/vemonet/ptrie/blob/main/LICENSE">
<img alt="MIT license" src="https://img.shields.io/badge/License-MIT-brightgreen.svg" />
</a>
</p>

Trie is the library that implements the [trie](https://en.wikipedia.org/wiki/Trie).
`PTrie` is a versatile implementation of the [trie data structure](https://en.wikipedia.org/wiki/Trie), tailored for efficient prefix searching within a collection of objects, such as strings, with no dependencies.

Trie is a generic data structure, written `Trie<T, U>` where `T` is node key type and `U` is a
value type.
The structure is defined as `Trie<K, V>`, where `K` represents the type of keys in each node, and `V` is the type of the associated values.
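
As a rough illustration of what a `Trie<K, V>` shape means, here is a minimal sketch of a generic trie. It is not ptrie's actual internal layout: the names `SketchTrie`, `Node`, and `get` are hypothetical, and a `HashMap` per node is assumed purely for brevity.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical node type for illustration only (not ptrie's internals).
/// Each node maps one key element of type `K` to a child node and may hold a value `V`.
struct Node<K, V> {
    children: HashMap<K, Node<K, V>>,
    value: Option<V>,
}

impl<K: Eq + Hash, V> Node<K, V> {
    fn new() -> Self {
        Node { children: HashMap::new(), value: None }
    }
}

/// Hypothetical trie keyed by sequences of `K`, storing values of type `V`.
pub struct SketchTrie<K, V> {
    root: Node<K, V>,
}

impl<K: Eq + Hash, V> SketchTrie<K, V> {
    pub fn new() -> Self {
        SketchTrie { root: Node::new() }
    }

    /// Walks down one node per key element, creating missing nodes,
    /// then stores the value at the final node.
    pub fn insert<I: Iterator<Item = K>>(&mut self, key: I, value: V) {
        let mut node = &mut self.root;
        for k in key {
            node = node.children.entry(k).or_insert_with(Node::new);
        }
        node.value = Some(value);
    }

    /// Returns the value stored exactly at `key`, if any.
    pub fn get<I: Iterator<Item = K>>(&self, key: I) -> Option<&V> {
        let mut node = &self.root;
        for k in key {
            node = node.children.get(&k)?;
        }
        node.value.as_ref()
    }
}
```

With this sketch, inserting `"ab".bytes()` creates two nodes below the root (one per byte), and only the second one carries a value, so looking up `"a".bytes()` yields `None`.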

## 💭 Motivation

# Motivation
The trie is particularly effective for operations involving common prefix identification and retrieval, making it a good choice for applications that require fast and efficient prefix-based search functionalities.

Trie may be faster than other data structures in some cases.
## 🚀 Usage

For example, `Trie` may be used as a replacement for `std::HashMap` in case of a dictionary where
the number of words in dictionary is significantly less than number of different words in the
input and matching probability is low.
### ✨ Find prefixes


# Usage
PTrie can return the values of all keys in the trie that are prefixes of a given string, sorted in ascending order of key length.
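
To show why the results come back ordered by length, here is a minimal, self-contained sketch of a prefix walk over a byte-keyed trie. It is an assumption about how such a lookup can work, not ptrie's implementation; `Node`, `insert`, and this free-standing `find_prefixes` are hypothetical helpers written for this example.

```rust
use std::collections::HashMap;

/// Hypothetical byte-keyed node, for illustration only.
#[derive(Default)]
struct Node {
    children: HashMap<u8, Node>,
    value: Option<&'static str>,
}

/// Inserts `value` under `key`, creating one node per byte.
fn insert(root: &mut Node, key: &str, value: &'static str) {
    let mut node = root;
    for b in key.bytes() {
        node = node.children.entry(b).or_default();
    }
    node.value = Some(value);
}

/// Follows `input` byte by byte and records every value met on the way.
/// Shorter keys are reached first, so the result is ordered by key length.
/// The walk stops as soon as the trie has no child for the next byte.
fn find_prefixes(root: &Node, input: &str) -> Vec<&'static str> {
    let mut found = Vec::new();
    let mut node = root;
    for b in input.bytes() {
        match node.children.get(&b) {
            Some(child) => node = child,
            None => break,
        }
        if let Some(v) = node.value {
            found.push(v);
        }
    }
    found
}
```

The cost of the walk is bounded by the length of the input rather than by the number of stored keys, which is the main appeal of a trie for this kind of query.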

```rust
use gtrie::Trie;

let mut t = Trie::new();

t.insert("this".chars(), 1);
t.insert("trie".chars(), 2);
t.insert("contains".chars(), 3);
t.insert("a".chars(), 4);
t.insert("number".chars(), 5);
t.insert("of".chars(), 6);
t.insert("words".chars(), 7);

assert_eq!(t.contains_key("number".chars()), true);
assert_eq!(t.contains_key("not_existing_key".chars()), false);
assert_eq!(t.get_value("words".chars()), Some(7));
assert_eq!(t.get_value("none".chars()), None);
```
use ptrie::Trie;

# Benchmarks
let mut trie = Trie::new();

Benchmark `std::HashMap<String, String>` vs `gtrie::Trie` shows that `Trie` is
significantly faster in the case of key mismatch but significantly slower in the case of
matching key.
trie.insert("a".bytes(), "A");
trie.insert("ab".bytes(), "AB");
trie.insert("abc".bytes(), "ABC");
trie.insert("abcde".bytes(), "ABCDE");

let prefixes = trie.find_prefixes("abcd".bytes());
assert_eq!(prefixes, vec!["A", "AB", "ABC"]);
```
$ cargo bench
test hash_map_massive_match ... bench: 150,127 ns/iter (+/- 12,986)
test hash_map_massive_mismatch_on_0 ... bench: 93,246 ns/iter (+/- 5,108)
test hash_map_massive_mismatch_on_0_one_symbol_key ... bench: 93,706 ns/iter (+/- 5,908)
test hash_map_match ... bench: 24 ns/iter (+/- 3)
test hash_map_mismatch ... bench: 20 ns/iter (+/- 0)
test trie_massive_match ... bench: 231,343 ns/iter (+/- 4,940)
test trie_massive_mismatch_on_0 ... bench: 28,743 ns/iter (+/- 8,401)
test trie_massive_mismatch_on_1 ... bench: 28,734 ns/iter (+/- 1,839)
test trie_massive_mismatch_on_2 ... bench: 28,760 ns/iter (+/- 2,582)
test trie_massive_mismatch_on_3 ... bench: 28,829 ns/iter (+/- 2,504)
test trie_match ... bench: 10 ns/iter (+/- 1)
test trie_mismatch ... bench: 5 ns/iter (+/- 0)
```

## Important

Search performance depends heavily on the data stored in `Trie`: it may be
significantly faster or significantly slower than `std::HashMap`.


# Contribution

Source code and issues are hosted on GitHub:

https://github.com/aserebryakov/trie-rs
### 🔍 Find postfixes

PTrie can also find all strings in the trie that begin with a specified prefix.

# License

[MIT License](https://opensource.org/licenses/MIT)
```rust
use ptrie::Trie;

let mut trie = Trie::new();

# Changelog
trie.insert("app".bytes(), "App");
trie.insert("apple".bytes(), "Apple");
trie.insert("applet".bytes(), "Applet");
trie.insert("apricot".bytes(), "Apricot");

#### 0.4.0
let strings = trie.find_postfixes("app".bytes());
assert_eq!(strings, vec!["App", "Apple", "Applet"]);
```
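
The lookup above can be pictured as two steps: descend to the node for the prefix, then collect every value in the subtree below it. The following is a minimal sketch under that assumption, not ptrie's implementation; `Node`, `insert`, `collect`, and this free-standing `find_postfixes` are hypothetical, and a `BTreeMap` is used only to make the traversal order deterministic (ptrie's own ordering may differ).

```rust
use std::collections::BTreeMap;

/// Hypothetical byte-keyed node, for illustration only.
#[derive(Default)]
struct Node {
    children: BTreeMap<u8, Node>,
    value: Option<&'static str>,
}

/// Inserts `value` under `key`, creating one node per byte.
fn insert(root: &mut Node, key: &str, value: &'static str) {
    let mut node = root;
    for b in key.bytes() {
        node = node.children.entry(b).or_default();
    }
    node.value = Some(value);
}

/// Pre-order depth-first traversal: the node's own value first,
/// then its children in ascending byte order.
fn collect(node: &Node, out: &mut Vec<&'static str>) {
    if let Some(v) = node.value {
        out.push(v);
    }
    for child in node.children.values() {
        collect(child, out);
    }
}

/// Descends to the node reached by `prefix`, then gathers every value below it.
/// Returns an empty vector when no stored key starts with `prefix`.
fn find_postfixes(root: &Node, prefix: &str) -> Vec<&'static str> {
    let mut node = root;
    for b in prefix.bytes() {
        match node.children.get(&b) {
            Some(child) => node = child,
            None => return Vec::new(),
        }
    }
    let mut out = Vec::new();
    collect(node, &mut out);
    out
}
```

Keys such as `"apricot"` live in a different branch below `"ap"`, so the traversal that starts at the `"app"` node never visits them.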

* Significant performance improvement due to switch to data oriented model
### 🔑 Key-based Retrieval Functions

#### 0.3.0
PTrie provides functions to check for the existence of a key and to retrieve the associated value.

* Significantly improved performance of the key mismatch case
* API is updated to be closer to `std::HashMap`
```rust
use ptrie::Trie;

#### 0.2.1
let mut trie = Trie::new();
trie.insert("app".bytes(), "App");

* Benchmarks are improved
assert!(trie.contains_key("app".bytes()));
assert!(!trie.contains_key("not_existing_key".bytes()));
assert_eq!(trie.get_value("app".bytes()), Some("App"));
assert_eq!(trie.get_value("none".bytes()), None);
```

#### 0.2.0
## 🏷️ Features

* API is updated to be closer to `std::HashMap`
The `serde` feature implements Serde's `Serialize` and `Deserialize` traits for the `Trie` and `TrieNode` structs.

#### 0.1.1
## 📜 License

* Basic trie implementation
[MIT License](https://opensource.org/licenses/MIT)
3 changes: 1 addition & 2 deletions benches/benchmark.rs
@@ -229,7 +229,6 @@ fn trie_postfixes_match(b: &mut Bencher) {
})
}


#[bench]
fn trie_prefix_longest_match(b: &mut Bencher) {
let mut t = ptrie::Trie::new();
@@ -242,4 +241,4 @@ fn trie_prefix_longest_match(b: &mut Bencher) {
assert!(t.find_longest_prefix(key.bytes()).is_some());
}
})
}
}