Merge pull request AntonOsika#719 from pbharrin/moretests
ATheorell authored Sep 19, 2023
2 parents 2058edb + 38dd734 commit dd1b94e
Showing 4 changed files with 30 additions and 2 deletions.
17 changes: 17 additions & 0 deletions evals/EVAL_NEW_CODE_RESULTS.md
@@ -12,3 +12,20 @@
|:---------------------------|:-------------|:------------------------------------|:-------|
| projects/password_gen_eval | password_gen | check_executable_exits_normally ||
| projects/password_gen_eval | password_gen | check_executable_satisfies_function ||
## 2023-09-18

### Existing Code Evaluation Summary:

| Project | Evaluation | All Tests Pass |
|:----------------------------|:-------------------|:-----------------|
| projects/currency_converter | currency_converter ||
| projects/password_gen_eval | password_gen ||

### Detailed Test Results:

| Project | Evaluation | Test | Pass |
|:----------------------------|:-------------------|:------------------------------------|:-------|
| projects/currency_converter | currency_converter | check_executable_exits_normally ||
| projects/currency_converter | currency_converter | check_executable_satisfies_function ||
| projects/password_gen_eval | password_gen | check_executable_exits_normally ||
| projects/password_gen_eval | password_gen | check_executable_satisfies_function ||
2 changes: 1 addition & 1 deletion evals/eval_tools.py
@@ -96,7 +96,7 @@ def check_executable_satisfies_function(eval_d: dict) -> bool:
output_satisfies: "tf = lambda a : len(a) == 10"
"""
process = run_executable(eval_d=eval_d)
- process_output = process.communicate()[0].strip()
+ process_output = str(process.communicate()[0].strip(), "utf-8")

exec(eval_d["output_satisfies"])
checking_function_ref = locals().get("tf")
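
The decoding step in this hunk matters because `Popen.communicate()` returns `bytes` when the pipe is not opened in text mode, and string-based checks such as `a.replace('.', '')` fail on `bytes`. The following is a minimal sketch, assuming `run_executable` opens the process with `stdout=subprocess.PIPE` and no text mode (its body is not shown in this diff). The child command is a hypothetical stand-in for the evaluated program.

```python
import subprocess
import sys

# Hypothetical stand-in for the evaluated program: it prints one number.
process = subprocess.Popen(
    [sys.executable, "-c", "print('13.57')"],
    stdout=subprocess.PIPE,  # no text mode, so output arrives as bytes
)

raw_output = process.communicate()[0].strip()  # bytes
decoded_output = str(raw_output, "utf-8")      # str, same form as in the hunk

# Same shape of check as the currency_converter entry in the YAML config.
tf = lambda a: a.replace('.', '').isnumeric()

# Calling the check on bytes raises, because bytes.replace() rejects str arguments.
try:
    tf(raw_output)
    bytes_check_failed = False
except (TypeError, AttributeError):
    bytes_check_failed = True

print(type(raw_output).__name__, type(decoded_output).__name__)
print("check on bytes failed:", bytes_check_failed)
print("check on str passed:", tf(decoded_output))
```

`raw_output.decode("utf-8")` would be an equivalent spelling of the same conversion.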
2 changes: 1 addition & 1 deletion evals/evals_new_code.py
@@ -58,7 +58,7 @@ def single_evaluate(eval_ob: dict) -> list[bool]:
process.wait() # we want to wait until it finishes.

print("running tests on the newly generated code")
- # TODO: test the code we should have an executable name
+ # test the code with the executable name in the config file
evaluation_results = []
for test_case in eval_ob["expected_results"]:
print(f"checking: {test_case['type']}")
11 changes: 11 additions & 0 deletions evals/new_code_eval.yaml
@@ -1,4 +1,15 @@
evaluations:
- name: currency_converter
project_root: "projects/currency_converter"
code_prompt: "Build a currency converter CLI tool in Python using an API for exchange rates. The currency converter should be a python program named currency.py with three required arguments: base currency symbol, target currency symbol and base currency amount. The currency converter will convert the amount in base currency amount to the target currency. The output of the program should only be the amount of target currency. For example the following command: `python currency.py USD CNY 1` should return a number like 7.5."
expected_results:
- type: check_executable_exits_normally
executable_name: "python currency.py"
executable_arguments: "USD CAD 10"
- type: check_executable_satisfies_function
executable_name: "python currency.py"
executable_arguments: "USD CAD 10"
output_satisfies: "tf = lambda a : a.replace('.', '').isnumeric()"
- name: password_gen
project_root: "projects/password_gen_eval"
code_prompt: "Create a password generator CLI tool in Python that generates strong, random passwords based on user-specified criteria, such as length and character types (letters, numbers, symbols). The password generator should be a python program named passwordgenerator.py with two arguments: length, and character types. The character types argument can be one or more of the the following: l for lowercase, u for uppercase, d for digits, and s for symbols."
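
To see what the `output_satisfies` strings accept, here is a rough sketch that loads such a snippet and applies it to sample outputs. The helper `load_check` is hypothetical and is not part of `eval_tools.py`. It passes an explicit namespace to `exec` instead of reading `locals()` as the repository code does, since an explicit dict behaves the same across Python versions.

```python
from typing import Callable


def load_check(source: str) -> Callable[[str], bool]:
    """Hypothetical helper: run an `output_satisfies` snippet and return its `tf`."""
    namespace: dict = {}
    exec(source, namespace)  # only run snippets that come from trusted config
    return namespace["tf"]


# Snippets copied from the diff above.
currency_check = load_check("tf = lambda a : a.replace('.', '').isnumeric()")
length_check = load_check("tf = lambda a : len(a) == 10")

samples = ["7.5", "13.57", "-3.2", "1.2.3", "", "abc"]
currency_results = {s: currency_check(s) for s in samples}

for sample, passed in currency_results.items():
    print(repr(sample), passed)

print(length_check("abcdefghij"))
```

Note that the currency check is permissive: it rejects a leading minus sign and empty output, but it accepts a string with several dots such as `1.2.3`.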