-
Notifications
You must be signed in to change notification settings - Fork 553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Find keywords using perfect hashing #1590
base: main
Are you sure you want to change the base?
Conversation
This updates the keyword lookup algorthm to use the phf crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for thsi @davisp
]; | ||
}; | ||
pub fn lookup(keyword: &str) -> Keyword { | ||
let keyword = keyword.to_ascii_uppercase(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to optimize lookup I think we should also avoid this call to to_ascii_upprcase
as it allocates / copies the string which I believe is significiantly more costly than some comparisons
https://doc.rust-lang.org/std/ascii/trait.AsciiExt.html#tymethod.to_ascii_uppercase
https://doc.rust-lang.org/std/primitive.str.html#method.to_uppercase
Since all keywords are ASCII, what if we did the binary search character by character
Here is some brain 🤮 :
let first_char = string.first().to_upper()
// find range of keywords within the
let range = binary_search(keywords, first_char, 0);
// if not unique location, proceed to second character, etc
...
use std::path::Path; | ||
|
||
fn read_keywords() -> Vec<(String, Option<String>)> { | ||
let path = Path::new("src").join("keywords.txt"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is quite clever
As much as I love not rewriting code, I am quite loath to take on a new dependency in this crate. My concerns are
That being said, Are you wiling to try the "compare without allocating approach" described in #1591 (comment)? |
Also an open question regarding maintenance of the phf crate rust-phf/rust-phf#318 |
This updates the keyword lookup algorthm to use the phf crate.
This is one possible approach to solving #1588 at the cost of pulling in a new dependency on phf. I will note that phf is rather small and is being used in a
no_std
configuration, but it is a new dependency which may be enough to not go this route.Local measuring with the sqlparser_bench shows roughly a 5% speedup. I'll open a second PR for comparison with my other approach to just scope the binary search to words starting with the same first letter.