Commit

add jsonl filepaths
Signed-off-by: Khaled Sulayman <[email protected]>
khaledsulayman committed Sep 26, 2024
1 parent a5a8ae6 commit 598d8ce
Showing 1 changed file with 4 additions and 3 deletions.
src/instructlab/sdg/utils/chunking.py: 7 changes (4 additions, 3 deletions)
@@ -138,10 +138,11 @@ def chunk_pdfs(pdf_docs: List, leaf_node_path: Path):
     parsed_pdfs = converter.convert(pdf_docs)
     parsed_dicts = [p.render_as_dict() for p in parsed_pdfs]
 
-    docling_jsons_path = Path("TODO")
+    docling_jsons_path = Path("~/docling-jsonls")
 
-    for pd in parsed_dicts:
-        fp = docling_jsons_path / "TODO.jsonl"
+    # TODO name files better
+    for i, pd in enumerate(parsed_dicts):
+        fp = docling_jsons_path / f"docling_{i}.jsonl"
 
         with open(fp, "w") as jsonl_file:
             for entry in pd:

Check warning on line 147 in src/instructlab/sdg/utils/chunking.py (GitHub Actions / pylint):
W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
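The new output path has two rough edges. First, pathlib does not expand `~` by itself, so `Path("~/docling-jsonls")` makes `open()` look for a literal `~` directory relative to the working directory, and nothing in this hunk creates the directory. Second, pylint flags the `open()` call on line 147 for lacking an explicit encoding. Below is a minimal sketch of how the loop could handle both. It assumes each element of `parsed_dicts` is an iterable of JSON-serializable entries, because the loop body after `for entry in pd:` is not shown in this diff. The helper name `write_docling_jsonls` and its `out_dir` parameter are hypothetical and are not part of the instructlab codebase.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Iterable, List


def write_docling_jsonls(parsed_dicts: List[Iterable[Any]], out_dir: str) -> List[Path]:
    """Write each parsed document to its own JSONL file, one entry per line.

    Hypothetical helper: the name, the signature, and the assumption that
    every entry is JSON-serializable are illustrative, not from the commit.
    """
    # pathlib leaves "~" untouched; expanduser() resolves it to the home dir.
    docling_jsons_path = Path(out_dir).expanduser()
    # open(..., "w") raises FileNotFoundError if the directory is missing.
    docling_jsons_path.mkdir(parents=True, exist_ok=True)

    written = []
    for i, pd in enumerate(parsed_dicts):
        # Same index-based naming scheme as the commit: docling_0.jsonl, ...
        fp = docling_jsons_path / f"docling_{i}.jsonl"
        # An explicit encoding satisfies pylint W1514 and avoids relying on
        # the platform's locale-dependent default encoding.
        with open(fp, "w", encoding="utf-8") as jsonl_file:
            for entry in pd:
                # JSONL: one JSON document per line.
                jsonl_file.write(json.dumps(entry) + "\n")
        written.append(fp)
    return written


if __name__ == "__main__":
    # Example: two small "documents" written into a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_docling_jsonls([[{"a": 1}], [{"b": 2}, {"c": 3}]], tmp):
            print(path.name, path.read_text(encoding="utf-8").splitlines())
```

Creating the directory with `exist_ok=True` keeps repeated runs idempotent. The index-based file names still share the limitation the commit's own `# TODO name files better` comment points out: they carry no information about which source PDF each file came from.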
