Commit

feat: add windows support and refactor setup (#11)
* ci: add win

* feat: add setup.py download and build

* style: fix with black and isort and ruff

* ci: fix windows commands

* ci: add win to package build

* ci: add `shell: bash` for commands to execute on win

* fix: use str(lang_so_file)

* docs: add windows mentioned
k4black authored Nov 17, 2023
1 parent 8a944ea commit beb64bb
Showing 11 changed files with 105 additions and 66 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/publish.yml
@@ -51,7 +51,7 @@ jobs:
uses: ./.github/workflows/reusable-build.yml
with:
CIBW_SKIP: "pp* cp36-* cp37-*"
CIBW_BUILD: "cp*-macosx* cp*-manylinux*"
CIBW_BUILD: "cp*-macosx* cp*-manylinux* cp*-win*"
VERSION: ${{ github.ref_name }}
secrets: inherit
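
The `CIBW_BUILD` and `CIBW_SKIP` values above are space-separated glob patterns over build identifiers. As a rough illustration of what adding `cp*-win*` changes, the sketch below matches a few identifiers with `fnmatchcase`. The helper name `is_selected` is hypothetical, and the matching here is a simplification, not cibuildwheel's actual selection code.

```python
from fnmatch import fnmatchcase

# Patterns copied from the workflow above.
CIBW_BUILD = "cp*-macosx* cp*-manylinux* cp*-win*"
CIBW_SKIP = "pp* cp36-* cp37-*"


def is_selected(identifier: str, build: str = CIBW_BUILD, skip: str = CIBW_SKIP) -> bool:
    """Hypothetical helper: True if the identifier matches a build pattern and no skip pattern."""
    matches_build = any(fnmatchcase(identifier, pattern) for pattern in build.split())
    matches_skip = any(fnmatchcase(identifier, pattern) for pattern in skip.split())
    return matches_build and not matches_skip


if __name__ == "__main__":
    for identifier in ("cp311-win_amd64", "cp311-manylinux_x86_64", "cp37-win_amd64", "cp311-musllinux_x86_64"):
        print(identifier, is_selected(identifier))
```

Under these patterns a Windows CPython identifier is now selected, while musllinux builds still fall outside the build list.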

2 changes: 1 addition & 1 deletion .github/workflows/reusable-build.yml
@@ -17,7 +17,7 @@ jobs:
build-wheels:
strategy:
matrix:
os: [ubuntu-latest, macos-latest]
os: [ubuntu-latest, macos-latest, windows-latest]
fail-fast: false
name: Build wheels on ${{ matrix.os }}
runs-on: ${{ matrix.os }}
15 changes: 10 additions & 5 deletions .github/workflows/test.yml
@@ -41,7 +41,7 @@ jobs:
with:
python-version: '3.12'
cache: 'pip' # caching pip dependencies
- name: Install dependencies
- name: Install lib from source and dependencies
run: |
python -m pip install -e .[test]
- name: Run tests
@@ -52,15 +52,15 @@ jobs:
uses: ./.github/workflows/reusable-build.yml
with:
CIBW_SKIP: "pp* cp36-* cp37-*"
CIBW_BUILD: "cp*-macosx* cp*-manylinux*"
CIBW_BUILD: "cp*-macosx* cp*-manylinux* cp*-win*"
secrets: inherit

full-tests-python:
needs: [fast-tests-python, external-build-workflow]
strategy:
matrix:
python-version: ['3.8', '3.9', '3.10', '3.11', '3.12']
os: [ubuntu-latest, macos-latest]
os: [ubuntu-latest, macos-latest, windows-latest]
fail-fast: false
name: Test wheel on ${{ matrix.os }} and Python ${{ matrix.python-version }}
runs-on: ${{ matrix.os }}
@@ -72,16 +72,21 @@ jobs:
path: dist
- name: Show dist files
run: ls -lah ./dist
shell: bash
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
cache: 'pip' # caching pip dependencies
- name: Install dependencies
- name: Remove sdist package to force install wheel later
run: |
rm -rf ./dist/*.tar.gz
shell: bash
- name: Install lib and dependencies
run: |
# force install package from local dist directory
pip uninstall -y codebleu || true
# TODO: check the sdist package is not installed
rm -rf ./dist/*.tar.gz
pip install --upgrade --no-deps --no-index --find-links=./dist codebleu
# install dependencies for the package and tests
pip install .[test]
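
The workflow removes the sdist with `rm -rf` and therefore needs `shell: bash` on Windows runners. A shell-independent alternative could do the same cleanup in Python. This is only a sketch: the helper name `remove_sdists` is hypothetical and is not part of this commit.

```python
from pathlib import Path
from typing import List


def remove_sdists(dist_dir: Path) -> List[str]:
    """Hypothetical helper: delete *.tar.gz archives so that only wheels remain in dist_dir."""
    removed = []
    for archive in sorted(dist_dir.glob("*.tar.gz")):
        archive.unlink()  # remove the source distribution archive
        removed.append(archive.name)
    return removed


if __name__ == "__main__":
    dist = Path("dist")
    if dist.is_dir():
        print("removed:", remove_sdists(dist))
```

With the sdist gone, `pip install --no-index --find-links=./dist codebleu` can only resolve to a wheel, which is the intent stated in the step name.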
16 changes: 9 additions & 7 deletions .gitignore
@@ -4,6 +4,15 @@ codebleu/parser/*.so
codebleu/parser/*.dylib
codebleu/parser/*.dll

/codebleu/parser/tree-sitter-c-sharp/
/codebleu/parser/tree-sitter-go/
/codebleu/parser/tree-sitter-java/
/codebleu/parser/tree-sitter-javascript/
/codebleu/parser/tree-sitter-php/
/codebleu/parser/tree-sitter-python/
/codebleu/parser/tree-sitter-ruby/
/tree_sitter/
codebleu/*.so


# Byte-compiled / optimized / DLL files
@@ -166,10 +175,3 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
/codebleu/parser/tree-sitter-c-sharp/
/codebleu/parser/tree-sitter-go/
/codebleu/parser/tree-sitter-java/
/codebleu/parser/tree-sitter-javascript/
/codebleu/parser/tree-sitter-php/
/codebleu/parser/tree-sitter-python/
/codebleu/parser/tree-sitter-ruby/
4 changes: 2 additions & 2 deletions README.md
@@ -5,7 +5,7 @@
[![PyPI version](https://badge.fury.io/py/codebleu.svg)](https://badge.fury.io/py/codebleu)


This repository contains an unofficial `CodeBLEU` implementation that supports Linux and MacOS. It is available through `PyPI` and the `evaluate` library.
This repository contains an unofficial `CodeBLEU` implementation that supports `Linux`, `MacOS` and `Windows`. It is available through `PyPI` and the `evaluate` library.

The code is based on the original [CodeXGLUE/CodeBLEU](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/evaluator/CodeBLEU) and updated version by [XLCoST/CodeBLEU](https://github.com/reddy-lab-code-research/XLCoST/tree/main/code/translation/evaluator/CodeBLEU). It has been refactored, tested, built for macOS, and multiple improvements have been made to enhance usability

@@ -28,7 +28,7 @@ The metric has shown higher correlation with human evaluation than `BLEU` and `a
## Installation

As this library require `so` file compilation it is platform dependent.
Currently available for `Linux` (manylinux) and `MacOS` with Python 3.8+.
Currently available for `Linux` (manylinux), `MacOS` and `Windows` with Python 3.8+.

The metrics is available as [pip package](https://pypi.org/project/codebleu/) and can be installed as indicated above:
```bash
6 changes: 3 additions & 3 deletions codebleu/codebleu.py
@@ -17,7 +17,7 @@ def calc_codebleu(
weights: Tuple[float, float, float, float] = (0.25, 0.25, 0.25, 0.25),
tokenizer: Optional[Callable] = None,
keywords_dir: Path = PACKAGE_DIR / "keywords",
lang_so_file: Path = PACKAGE_DIR / "parser" / "my-languages.so",
lang_so_file: Path = PACKAGE_DIR / "my-languages.so",
) -> Dict[str, float]:
"""Calculate CodeBLEU score
@@ -69,10 +69,10 @@ def make_weights(reference_tokens, key_word_list):
weighted_ngram_match_score = weighted_ngram_match.corpus_bleu(tokenized_refs_with_weights, tokenized_hyps)

# calculate syntax match
syntax_match_score = syntax_match.corpus_syntax_match(references, hypothesis, lang, lang_so_file)
syntax_match_score = syntax_match.corpus_syntax_match(references, hypothesis, lang, str(lang_so_file))

# calculate dataflow match
dataflow_match_score = dataflow_match.corpus_dataflow_match(references, hypothesis, lang, lang_so_file)
dataflow_match_score = dataflow_match.corpus_dataflow_match(references, hypothesis, lang, str(lang_so_file))

alpha, beta, gamma, theta = weights
code_bleu_score = (
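
The `str(lang_so_file)` change converts the `Path` default into a plain string before it is handed to the syntax and dataflow matchers, presumably because the code that loads the shared library expects `str` (that motivation is an assumption; the commit message only says "fix: use str(lang_so_file)"). A minimal sketch of the same normalization, with a hypothetical helper name:

```python
import os
from pathlib import Path
from typing import Union


def as_library_path(lang_so_file: Union[str, Path]) -> str:
    """Hypothetical helper: accept str or Path and always return a plain str path."""
    return os.fspath(Path(lang_so_file))


if __name__ == "__main__":
    print(as_library_path(Path("codebleu") / "my-languages.so"))
```

Accepting both types at the boundary keeps the public `Path` default while giving string-only consumers what they need.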
20 changes: 0 additions & 20 deletions codebleu/parser/build.py

This file was deleted.

11 changes: 0 additions & 11 deletions codebleu/parser/build.sh

This file was deleted.

12 changes: 6 additions & 6 deletions pyproject.toml
@@ -1,17 +1,17 @@
[build-system]
requires = ["setuptools>=61.0.0", "wheel", "tree-sitter>=0.20.0,<1.0.0"]
requires = ["setuptools>=61.0.0", "wheel", "tree-sitter>=0.20.0,<1.0.0", "requests>=2.0.0,<3.0.0"]
build-backend = "setuptools.build_meta"


[project]
name = "codebleu"
description = "Unofficial CodeBLEU implementation that supports Linux and MacOS available on PyPI."
description = "Unofficial CodeBLEU implementation that supports Linux, MacOS and Windows available on PyPI."
readme = "README.md"
license = {text = "MIT License"}
authors = [
{name = "Konstantin Chernyshev", email = "[email protected]"},
]
keywords = ["codebleu", "code", "bleu", "nlp", "natural language processing", "programming", "evaluate", "evaluation", "code generation", "matrics"]
keywords = ["codebleu", "code", "bleu", "nlp", "natural language processing", "programming", "evaluate", "evaluation", "code generation", "metrics"]
dynamic = ["version"]

requires-python = ">=3.8"
@@ -77,7 +77,7 @@ warn_redundant_casts = true
warn_unused_ignores = true
warn_unreachable = true
allow_untyped_decorators = true
exclude = ["codebleu/parser/tree-sitter", "codebleu/parser/tree-sitter/python"]
exclude = ["codebleu/parser/tree-sitter", "codebleu/parser/tree-sitter/python", "tree_sitter"]

[tool.pytest.ini_options]
minversion = "6.0"
@@ -86,7 +86,7 @@ python_files = "test_*.py"
addopts = "--cov=codebleu/ --cov-report term-missing"

[tool.coverage.run]
omit = ["tests/*", "codebleu/parser/tree-sitter/*"]
omit = ["tests/*", "codebleu/parser/tree-sitter/*", "tree_sitter"]


[tool.isort]
@@ -95,7 +95,7 @@ src_paths = ["codebleu", "tests"]
known_first_party = ["codebleu", "tests"]
line_length = 120
combine_as_imports = true
skip = ["build", "dist", ".venv", ".eggs", ".mypy_cache", ".pytest_cache", ".git", ".tox", ".nox", "codebleu/parser"]
skip = ["build", "dist", ".venv", ".eggs", ".mypy_cache", ".pytest_cache", ".git", ".tox", ".nox", "codebleu/parser", "tree_sitter"]

[tool.black]
line_length=120
75 changes: 69 additions & 6 deletions setup.py
@@ -1,24 +1,87 @@
import subprocess
from __future__ import annotations

import io
import shutil
import zipfile
from pathlib import Path

import requests
from setuptools import setup
from setuptools.dist import Distribution

from tree_sitter import Language

ROOT = Path(__file__).parent


subprocess.run(
["bash", "build.sh"],
cwd=ROOT / "codebleu" / "parser",
check=True,
tree_sitter_languages = {
"go": "https://github.com/tree-sitter/tree-sitter-go/archive/refs/tags/v0.20.0.zip",
"javascript": "https://github.com/tree-sitter/tree-sitter-javascript/archive/refs/tags/v0.20.1.zip",
"python": "https://github.com/tree-sitter/tree-sitter-python/archive/refs/tags/v0.20.4.zip",
"ruby": "https://github.com/tree-sitter/tree-sitter-ruby/archive/refs/tags/v0.19.0.zip",
"php": "https://github.com/tree-sitter/tree-sitter-php/archive/refs/tags/v0.19.0.zip",
"java": "https://github.com/tree-sitter/tree-sitter-java/archive/refs/tags/v0.20.2.zip",
"c-sharp": "https://github.com/tree-sitter/tree-sitter-c-sharp/archive/refs/tags/v0.20.0.zip",
"c": "https://github.com/tree-sitter/tree-sitter-c/archive/refs/tags/v0.20.6.zip",
"cpp": "https://github.com/tree-sitter/tree-sitter-cpp/archive/refs/tags/v0.20.3.zip",
}


def download_tree_sitter_languages(languages: dict[str, str], languages_folder: Path) -> list[str]:
if languages_folder.exists():
shutil.rmtree(languages_folder)
languages_folder.mkdir(parents=True)

extracted_folders: list[str] = []
for lang, url in languages.items():
# Download the ZIP file
response = requests.get(url)
response.raise_for_status()

# Extract the ZIP file
with zipfile.ZipFile(io.BytesIO(response.content)) as zip_f:
zip_f.extractall(languages_folder)
extracted_folders.append(zip_f.namelist()[0]) # get the name of the extracted folder

return extracted_folders


def build_tree_sitter_languages(languages: dict[str, str], languages_folder: Path, target_lib_file: Path) -> str:
extracted_folders = download_tree_sitter_languages(languages, languages_folder)

Language.build_library(
str(target_lib_file),
[str(languages_folder / lang_folder) for lang_folder in extracted_folders],
)

return str(target_lib_file)


build_tree_sitter_languages(
tree_sitter_languages,
ROOT / "tree_sitter",
ROOT / "codebleu" / "my-languages.so",
)


# tree_sitter_extension = Extension(
# 'codebleu.tree_sitter',
# sources=[],
# include_dirs=[],
# libraries=[],
# extra_objects=[
#
# ],
# )


class PlatformSpecificDistribution(Distribution):
"""Distribution which always forces a binary package with platform name"""

def has_ext_modules(self):
return True


setup(distclass=PlatformSpecificDistribution)
setup(
distclass=PlatformSpecificDistribution,
)
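
`download_tree_sitter_languages` above takes `zip_f.namelist()[0]` as the name of the extracted folder, which relies on the archive listing its root directory first. The sketch below shows a slightly more defensive way to find that folder; it builds a small in-memory archive so it runs without network access. The function names and the demo archive contents are hypothetical and are not part of this commit.

```python
import io
import zipfile


def top_level_folder(zip_bytes: bytes) -> str:
    """Hypothetical helper: return the single root folder shared by every entry in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zip_f:
        roots = {name.split("/", 1)[0] for name in zip_f.namelist()}
    if len(roots) != 1:
        raise ValueError(f"expected exactly one top-level folder, found {sorted(roots)}")
    return roots.pop()


def make_demo_archive() -> bytes:
    """Build a tiny archive shaped like a source release (placeholder contents only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zip_f:
        zip_f.writestr("tree-sitter-demo-0.0.0/src/parser.c", "/* placeholder */")
        zip_f.writestr("tree-sitter-demo-0.0.0/grammar.js", "// placeholder")
    return buffer.getvalue()


if __name__ == "__main__":
    print(top_level_folder(make_demo_archive()))
```

Checking all entries instead of the first one makes the failure explicit if an archive ever ships with an unexpected layout.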
8 changes: 4 additions & 4 deletions tests/test_codebleu.py
@@ -65,17 +65,17 @@ def test_error_when_input_length_mismatch() -> None:
["public static int Sign ( double d ) { return ( float ) ( ( d == 0 ) ? 0 : ( c < 0.0 ) ? - 1 : 1) ; }"],
["public static int Sign ( double d ) { return ( int ) ( ( d == 0 ) ? 0 : ( d < 0 ) ? - 1 : 1) ; }"],
0.7846,
11/19, # In example, it is 13/21, but with new version of tree-sitter it is 11/19
2/3,
11 / 19, # In example, it is 13/21, but with new version of tree-sitter it is 11/19
2 / 3,
0.7019, # Should be 0.7238 if AST=13/21 in the paper, however at the moment tee-sitter AST is 11/19
),
# https://arxiv.org/pdf/2009.10297.pdf "3.4 Two Examples" at the page 4
(
["public static int Sign ( double d ) { return ( float ) ( ( d == 0 ) ? 0 : ( c < 0.0 ) ? - 1 : 1) ;"],
["public static int Sign ( double d ) { return ( int ) ( ( d == 0 ) ? 0 : ( d < 0 ) ? - 1 : 1) ; }"],
0.7543,
11/19, # In example, it is 13/21, but with new version of tree-sitter it is 11/19
2/3,
11 / 19, # In example, it is 13/21, but with new version of tree-sitter it is 11/19
2 / 3,
0.6873, # Should be 0.6973 if AST=13/21 in the paper, however at the moment tee-sitter AST is 11/19
),
# https://arxiv.org/pdf/2009.10297.pdf "3.4 Two Examples" at the page 4
