Description
Hi! I am having issues using the Code Map operation in the DocETL Python SDK.
I can't import it from docetl.api because it is not exported there, so I have to use the import below from an internal module. That route is awkward because it requires me to provide the runner, the config, and more:
from docetl.operations.code_operations import CodeMapOperation
Is there a more graceful way of using the docetl codemap operations?
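One possible way to avoid constructing CodeMapOperation by hand, offered as a sketch under assumptions rather than as DocETL's documented answer, is to declare the code_map operation in a pipeline config and let DSLRunner build the operation itself. This also matches the last traceback, which shows that DSLRunner needs a config. The sketch makes several assumptions that you should check against your installed version: it follows DocETL's declarative pipeline layout (default_model, datasets, operations, pipeline.steps, pipeline.output), it treats code_map as calling transform once per document and expecting a dict back, and it relies on DSLRunner(config, max_threads=...) and load_run_save() being available. The file paths, dataset name, step name, and clause_text field are hypothetical placeholders. Because each document is handled on its own, the ID below is a content hash rather than a sequential number, so identical clause texts would receive the same ID.

```python
import os

# Code string handed to the code_map operation. Assumption: DocETL execs this
# string and calls transform(doc) once per document, expecting a dict back.
ASSIGN_IDS_CODE = '''
def transform(doc: dict) -> dict:
    import hashlib
    # "clause_text" is a hypothetical field name; use your real clause field.
    text = doc.get("clause_text", "")
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # Deterministic per-document ID (not sequential across the batch).
    return {"c_id": "C-" + digest[:10]}
'''


def build_config() -> dict:
    """Build a pipeline config in DocETL's declarative layout (assumed schema)."""
    return {
        "default_model": "gemini/gemini-2.5-pro",
        "datasets": {
            # Hypothetical dataset name and input path.
            "contracts": {"type": "file", "path": "clauses.json"},
        },
        "operations": [
            {"name": "assign_ids", "type": "code_map", "code": ASSIGN_IDS_CODE},
        ],
        "pipeline": {
            "steps": [
                {
                    "name": "assign_ids_step",  # hypothetical step name
                    "input": "contracts",
                    "operations": ["assign_ids"],
                },
            ],
            # Hypothetical output path.
            "output": {"type": "file", "path": "clauses_with_ids.json"},
        },
    }


def run_pipeline(config: dict) -> None:
    # Imported lazily so the config can be built and inspected without DocETL.
    # Assumption: DSLRunner accepts the config dict plus max_threads, and
    # load_run_save() loads the datasets, runs the steps, and writes the output.
    from docetl.runner import DSLRunner

    runner = DSLRunner(config, max_threads=os.cpu_count())
    runner.load_run_save()
```

With this layout, the runner, config, default model, and thread count are supplied once at the pipeline level, so none of them has to be passed to individual operation constructors.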
(docetl-pipeline) ➜ docetl_pipeline git:(master) ✗ PYTHONPATH=. python app/core/large_contract_group_clauses.py
Found 5 potential Articles in source text.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/bukunmi/qanooni/docetl_pipeline/app/core/large_contract_group_clauses.py:265 in │
│ │
│ 262 │ │ traceback.print_exc() │
│ 263 │
│ 264 if __name__ == "__main__": │
│ ❱ 265 │ build_and_run_pipeline() │
│ 266 │
│ │
│ /home/bukunmi/qanooni/docetl_pipeline/app/core/large_contract_group_clauses.py:182 in │
│ build_and_run_pipeline │
│ │
│ 179 │ │ │ ), │
│ 180 │ │ │ │
│ 181 │ │ │ # 5. CODE OP (Sequential IDs) │
│ ❱ 182 │ │ │ CodeMapOperation( │
│ 183 │ │ │ │ name="assign_ids", │
│ 184 │ │ │ │ type="code_map", │
│ 185 │ │ │ │ inputs=["flatten_clauses"], │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: BaseOperation.__init__() missing 4 required positional arguments: 'runner', 'config', 'default_model', and 'max_threads'
(docetl-pipeline) ➜ docetl_pipeline git:(master) ✗ PYTHONPATH=. python app/core/large_contract_group_clauses.py
Found 5 potential Articles in source text.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/bukunmi/qanooni/docetl_pipeline/app/core/large_contract_group_clauses.py:267 in │
│ │
│ 264 │ │ traceback.print_exc() │
│ 265 │
│ 266 if __name__ == "__main__": │
│ ❱ 267 │ build_and_run_pipeline() │
│ 268 │
│ │
│ /home/bukunmi/qanooni/docetl_pipeline/app/core/large_contract_group_clauses.py:182 in │
│ build_and_run_pipeline │
│ │
│ 179 │ │ │ ), │
│ 180 │ │ │ │
│ 181 │ │ │ # 5. CODE OP (Sequential IDs) │
│ ❱ 182 │ │ │ CodeMapOperation( │
│ 183 │ │ │ │ name="assign_ids", │
│ 184 │ │ │ │ type="code_map", │
│ 185 │ │ │ │ inputs=["flatten_clauses"], │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: BaseOperation.__init__() missing 2 required positional arguments: 'runner' and 'config'
(docetl-pipeline) ➜ docetl_pipeline git:(master) ✗ PYTHONPATH=. python app/core/large_contract_group_clauses.py
Found 5 potential Articles in source text.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/bukunmi/qanooni/docetl_pipeline/app/core/large_contract_group_clauses.py:269 in │
│ │
│ 266 │ │ traceback.print_exc() │
│ 267 │
│ 268 if __name__ == "__main__": │
│ ❱ 269 │ build_and_run_pipeline() │
│ 270 │
│ │
│ /home/bukunmi/qanooni/docetl_pipeline/app/core/large_contract_group_clauses.py:189 in │
│ build_and_run_pipeline │
│ │
│ 186 │ │ │ │ default_model="gemini/gemini-2.5-pro", │
│ 187 │ │ │ │ max_threads=os.cpu_count(), │
│ 188 │ │ │ │ config={}, │
│ ❱ 189 │ │ │ │ runner=DSLRunner(), │
│ 190 │ │ │ │ code=""" │
│ 191 │ │ │ │ │ def transform(df): │
│ 192 │ │ │ │ │ │ # Deterministic ID generation for the whole batch │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: DSLRunner.__init__() missing 1 required positional argument: 'config'
# 5. CODE OP (Sequential IDs)
CodeMapOperation(
    name="assign_ids",
    type="code_map",
    inputs=["flatten_clauses"],
    default_model="gemini/gemini-2.5-pro",
    max_threads=os.cpu_count(),
    config={},
    runner=DSLRunner(),
    code="""
        def transform(df):
            # Deterministic ID generation for the whole batch
            df['c_id'] = [f"C-{i+1:05d}" for i in range(len(df))]
            return df
    """
),
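DocETL's documentation describes code_map as applying transform to one document at a time, though this is worth confirming for your version. If that holds, a function like transform(df) never sees the whole batch, so it cannot compute a batch-wide running counter. A minimal alternative, sketched here under that assumption, assigns the sequential IDs in plain Python over the full list of records, for example after the pipeline has written its JSON output. The helpers assign_sequential_ids and add_ids_to_file are hypothetical, and the list-of-dicts output shape is an assumption.

```python
import json
from pathlib import Path


def assign_sequential_ids(records: list[dict], prefix: str = "C-", width: int = 5) -> list[dict]:
    """Return copies of records, each with a 1-based, zero-padded sequential "c_id".

    This follows the same numbering as the snippet above (C-00001, C-00002, ...)
    but runs over the whole list at once, so the counter is batch-wide.
    The input dicts are not mutated.
    """
    return [
        {**record, "c_id": f"{prefix}{i:0{width}d}"}
        for i, record in enumerate(records, start=1)
    ]


def add_ids_to_file(in_path: str, out_path: str) -> None:
    # Hypothetical post-processing step: read a JSON list of records (assumed
    # to be the pipeline's output shape), number them in file order, and write
    # the result to a new file.
    records = json.loads(Path(in_path).read_text(encoding="utf-8"))
    numbered = assign_sequential_ids(records)
    Path(out_path).write_text(json.dumps(numbered, indent=2), encoding="utf-8")
```

For example, assign_sequential_ids([{"t": "a"}, {"t": "b"}]) returns [{"t": "a", "c_id": "C-00001"}, {"t": "b", "c_id": "C-00002"}]. The numbering follows list order, so it is only as deterministic as the order in which the records are written.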