Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python bindings #948

Open
wants to merge 119 commits into
base: master
Choose a base branch
from
Open

python bindings #948

wants to merge 119 commits into from

Conversation

erip
Copy link
Collaborator

@erip erip commented Jul 18, 2022

Description

Adds python bindings using pybind11 for high-level bindings.

Added dependencies: pybind11

How to test

python3 -m venv .venv && source .venv/bin/activate
pip install pybind11
cd python
python -m pip install .

Checklist

  • I have tested the code manually
  • I have run regression tests
  • I have read and followed CONTRIBUTING.md
  • I have updated CHANGELOG.md

CMakeLists.txt Show resolved Hide resolved
@alvations alvations marked this pull request as ready for review July 19, 2022 21:27
@@ -3,7 +3,10 @@ add_subdirectory(3rd_party)
include_directories(.)
include_directories(3rd_party)
include_directories(3rd_party/SQLiteCpp/include)
include_directories(3rd_party/sentencepiece)
# include_directories(3rd_party/sentencepiece)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be commented out? I am not sure why spm_encode isn't found in CI on OS X, but I suspect this is the culprit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. I've updated GitHub workflows recently, which may fix issues on macOS. I will rebase this branch with the current master and re-run builds.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @snukky! It still seems like there's something fishy happening on the spm front...

@snukky
Copy link
Member

snukky commented Nov 2, 2022

Is this still WIP as the title suggests or is it ready for reviews as it's not marked as a draft?

@erip erip changed the title [WIP] python bindings python bindings Nov 2, 2022
@erip
Copy link
Collaborator Author

erip commented Nov 2, 2022

This is ready for review. There are some conflicts, but I won't be able to address them until early next week. The CI failure seems to be related to disk on the OS X build. We'll need to follow a similar recipe as Windows to clear intermediate build directories and this should be OK.

@erip
Copy link
Collaborator Author

erip commented Nov 12, 2022

It seems like the OS X error here is due to 3pp zstr here. I'm not sure why OS X is falling into this branch since L30 should cover the OS X case... Is there a missing cmake flag somewhere?

Copy link
Member

@snukky snukky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks Elijah for this PR again. As it was forever since this PR was open, I'm happy to take over, rebase with the current master and introduce the changes I'm proposing. I had mostly comments on improving documentation of the code.

I think other remaining tasks related to this code would be 1) fixing MacOS compilation, and 2) adding some regression tests on actual models, but we can take care of them in separate PRs.

Thanks!

@@ -49,5 +49,16 @@ jobs:
./marian --version
./marian-decoder --version
./marian-scorer --version
./spm_encode --version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the motivation for this to get removed?

@@ -115,6 +115,7 @@ jobs:
-DCOMPILE_CPU=${{ matrix.cpu }} \
-DCOMPILE_CUDA=${{ matrix.gpu }} \
-DCOMPILE_EXAMPLES=${{ matrix.examples }} \
-DUSE_TCMALLOC=OFF \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python bindings doesn't work with TCMalloc? Why? Would be great to have it fixed or at least commented/documented.

Comment on lines -142 to -144
./marian-scorer --version
./marian-server --version
./spm_encode --version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, why do we need to remove these checks?

Comment on lines +137 to +138
cd ..
rd /s /q build
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for debugging? I would suggest adding a short comment above explaining what it does and why it is done.

@@ -1,4 +1,3 @@
# Config files from CMake
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is still valid, isn't it?

@@ -0,0 +1,27 @@
import sys
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a short comment at the top of this file what this script does and a usage example.


public:
virtual ~TranslateService() {}

TranslateService(const std::string& cliString)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: consider adding a short docstring explaining what it is used for.

@@ -0,0 +1,27 @@
import sys
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is any of those files used for regression tests? Could it easily be?

}
}

std::string run(const std::string& input) override {
std::vector<std::string> run(const std::vector<std::string>& inputs, const std::string& yamlOverridesStr="") override {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the second parameter deserves a short description in a docstring.

return utils::split(translations, "\n", /*keepEmpty=*/true);
}

std::string run(const std::string& input, const std::string& yamlOverridesStr="") override {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function also could receive a docstring.

@emjotde
Copy link
Member

emjotde commented Aug 21, 2023

@snukky let's go ahead with this. We can also recruit @mjpost and @rjai to help.

@emjotde
Copy link
Member

emjotde commented Aug 21, 2023

Oh, and @thammegowda of course

@emjotde
Copy link
Member

emjotde commented Aug 21, 2023

A first thing that I see is that this should use the python package of SentencePiece from src/3rdparty

emjotde pushed a commit that referenced this pull request Aug 5, 2024
* This code is same as  [public github repo tg/pybind-new branch](#1013). Git histories seems slightly different between public and private repo so we are seeing a lot of commits
* This builds on top of work by Elijah #948
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants