
export func insert_sig, query_one, query_sig, query_sig_return_distance,tokens2signature,iter,size #25

Open
wants to merge 6 commits into
base: master

@Ox0400 Ox0400 commented Jul 11, 2023

No description provided.

@Ox0400 Ox0400 changed the title from "export func insert_sig, query_one, query_sig, query_sig_return_distance" to "export func insert_sig, query_one, query_sig, query_sig_return_distance,tokens2signature,iter,size" Jul 11, 2023
@serega
Owner

serega commented Jul 14, 2023

Thanks for the pull request @Ox0400 . I will take a look during the weekend.

@serega
Owner

serega commented Jul 23, 2023

Hi. I'm curious: how do you plan to use the new methods? Personally, I only use the Rust part of the project and do not use the Python bindings; I implemented them to learn Python/Rust integration.
My intention for the Python version is to use the Python wrappers simhash.py and minhash.py, where you can provide your own tokenizer. Are you using gaoya.gaoya.simhash.SimHash128StringIntIndex directly?

@Ox0400
Author

Ox0400 commented Aug 16, 2023

Hi. I'm curious: how do you plan to use the new methods? Personally, I only use the Rust part of the project and do not use the Python bindings; I implemented them to learn Python/Rust integration. My intention for the Python version is to use the Python wrappers simhash.py and minhash.py, where you can provide your own tokenizer. Are you using gaoya.gaoya.simhash.SimHash128StringIntIndex directly?

Hi, yes, I was using SimHash64StringIntIndex, like this.

from typing import List, Tuple

from gaoya.simhash import SimHashStringIndex

class SimHashTool(SimHashStringIndex):
    def size(self) -> int:
        return self.index.size()

    def iter(self) -> List[Tuple[int, int]]:
        # [(100, 879782272769711604), (101, 879782272769711604)]
        return self.index.iter()

    def par_bulk_tokens2signatures(self, tokens_list: List[List[str]]) -> List[int]:
        return self.index.par_bulk_tokens2signatures(tokens_list)

    def par_bulk_insert_sig_pairs(self, id_sig_pairs: List[Tuple[int, int]]) -> int:
        self.index.par_bulk_insert_sig_pairs(id_sig_pairs)
        return self.size()

    def query_tokens_return_distance(self, tokens: List[str]) -> List[Tuple[int,int]]:
        return self.index.query_tokens_return_distance(tokens)

    def insert_tokens(self, doc_id: int, tokens: List[str]) -> None:
        self.index.insert_tokens(doc_id, tokens)

    def par_bulk_insert_tokens_pairs(self, id_tokens_pairs: List[Tuple[int, List[str]]]) -> int:
        self.index.par_bulk_insert_tokens_pairs(id_tokens_pairs)
        return self.size()
    # more functions ....
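
For readers who are not familiar with what these exported methods operate on, here is a minimal pure-Python sketch of the idea behind a token-to-signature function and a signature query that returns distances. It is not gaoya's implementation: the names `tokens_to_signature`, `hamming_distance`, and `ToySimHashIndex` are hypothetical, the token hash (BLAKE2b) is an assumption chosen only because it is in the standard library, and the query is a plain linear scan rather than an indexed lookup.

```python
import hashlib
from typing import Dict, List, Tuple


def tokens_to_signature(tokens: List[str], bits: int = 64) -> int:
    """Build a SimHash signature by a per-bit majority vote over token hashes."""
    counts = [0] * bits
    for token in tokens:
        # Assumption: BLAKE2b as the token hash; a real library may use another hash.
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    signature = 0
    for i, count in enumerate(counts):
        if count > 0:
            signature |= 1 << i
    return signature


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


class ToySimHashIndex:
    """Hypothetical stand-in for an index keyed by integer document ids."""

    def __init__(self, max_distance: int = 5) -> None:
        self.max_distance = max_distance
        self.signatures: Dict[int, int] = {}

    def insert_sig(self, doc_id: int, signature: int) -> None:
        self.signatures[doc_id] = signature

    def insert_tokens(self, doc_id: int, tokens: List[str]) -> None:
        self.insert_sig(doc_id, tokens_to_signature(tokens))

    def query_sig_return_distance(self, signature: int) -> List[Tuple[int, int]]:
        # Linear scan; returns (doc_id, distance) pairs within max_distance,
        # ordered by distance and then by id.
        hits = [
            (doc_id, hamming_distance(signature, stored))
            for doc_id, stored in self.signatures.items()
        ]
        hits = [hit for hit in hits if hit[1] <= self.max_distance]
        return sorted(hits, key=lambda hit: (hit[1], hit[0]))

    def size(self) -> int:
        return len(self.signatures)

    def iter(self) -> List[Tuple[int, int]]:
        # (doc_id, signature) pairs, in insertion order.
        return list(self.signatures.items())
```

Separating signature computation from insertion, as the wrapper above does with `par_bulk_tokens2signatures` and `par_bulk_insert_sig_pairs`, lets signatures be computed once, stored, and re-inserted later without re-tokenizing the documents.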

3 participants