Code Map Operations for sdk not found #461

@Bukunmi2108

Hi! I am having issues using the Code Map operation in the docetl Python SDK.

I can't import it from docetl.api because it is not exported there, and the import below is problematic: I have to provide the runner, the config, and the other constructor arguments myself.

from docetl.operations.code_operations import CodeMapOperation

Is there a more graceful way of using the docetl codemap operations?

(docetl-pipeline) ➜ docetl_pipeline git:(master) ✗ PYTHONPATH=. python app/core/large_contract_group_clauses.py
Found 5 potential Articles in source text.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/bukunmi/qanooni/docetl_pipeline/app/core/large_contract_group_clauses.py:265 in │
│ │
│ 262 │ │ traceback.print_exc() │
│ 263 │
│ 264 if __name__ == "__main__": │
│ ❱ 265 │ build_and_run_pipeline() │
│ 266 │
│ │
│ /home/bukunmi/qanooni/docetl_pipeline/app/core/large_contract_group_clauses.py:182 in │
│ build_and_run_pipeline │
│ │
│ 179 │ │ │ ), │
│ 180 │ │ │ │
│ 181 │ │ │ # 5. CODE OP (Sequential IDs) │
│ ❱ 182 │ │ │ CodeMapOperation( │
│ 183 │ │ │ │ name="assign_ids", │
│ 184 │ │ │ │ type="code_map", │
│ 185 │ │ │ │ inputs=["flatten_clauses"], │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: BaseOperation.__init__() missing 4 required positional arguments: 'runner', 'config', 'default_model', and 'max_threads'
(docetl-pipeline) ➜ docetl_pipeline git:(master) ✗ PYTHONPATH=. python app/core/large_contract_group_clauses.py
Found 5 potential Articles in source text.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/bukunmi/qanooni/docetl_pipeline/app/core/large_contract_group_clauses.py:267 in │
│ │
│ 264 │ │ traceback.print_exc() │
│ 265 │
│ 266 if __name__ == "__main__": │
│ ❱ 267 │ build_and_run_pipeline() │
│ 268 │
│ │
│ /home/bukunmi/qanooni/docetl_pipeline/app/core/large_contract_group_clauses.py:182 in │
│ build_and_run_pipeline │
│ │
│ 179 │ │ │ ), │
│ 180 │ │ │ │
│ 181 │ │ │ # 5. CODE OP (Sequential IDs) │
│ ❱ 182 │ │ │ CodeMapOperation( │
│ 183 │ │ │ │ name="assign_ids", │
│ 184 │ │ │ │ type="code_map", │
│ 185 │ │ │ │ inputs=["flatten_clauses"], │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: BaseOperation.__init__() missing 2 required positional arguments: 'runner' and 'config'
(docetl-pipeline) ➜ docetl_pipeline git:(master) ✗ PYTHONPATH=. python app/core/large_contract_group_clauses.py
Found 5 potential Articles in source text.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/bukunmi/qanooni/docetl_pipeline/app/core/large_contract_group_clauses.py:269 in │
│ │
│ 266 │ │ traceback.print_exc() │
│ 267 │
│ 268 if __name__ == "__main__": │
│ ❱ 269 │ build_and_run_pipeline() │
│ 270 │
│ │
│ /home/bukunmi/qanooni/docetl_pipeline/app/core/large_contract_group_clauses.py:189 in │
│ build_and_run_pipeline │
│ │
│ 186 │ │ │ │ default_model="gemini/gemini-2.5-pro", │
│ 187 │ │ │ │ max_threads=os.cpu_count(), │
│ 188 │ │ │ │ config={}, │
│ ❱ 189 │ │ │ │ runner=DSLRunner(), │
│ 190 │ │ │ │ code=""" │
│ 191 │ │ │ │ │ def transform(df): │
│ 192 │ │ │ │ │ │ # Deterministic ID generation for the whole batch │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: DSLRunner.__init__() missing 1 required positional argument: 'config'
(docetl-pipeline) ➜ docetl_pipeline git:(master) ✗

# 5. CODE OP (Sequential IDs)
CodeMapOperation(
    name="assign_ids",
    type="code_map",
    inputs=["flatten_clauses"],
    default_model="gemini/gemini-2.5-pro",
    max_threads=os.cpu_count(),
    config={},
    runner=DSLRunner(),
    code="""
        def transform(df):
            # Deterministic ID generation for the whole batch
            df['c_id'] = [f"C-{i+1:05d}" for i in range(len(df))]
            return df
    """
),
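
For reference, the ID scheme itself can be checked outside of docetl. This is only a minimal sketch, assuming the flattened clauses arrive as a list of plain dicts. The function name `assign_sequential_ids` and the sample records are hypothetical and not part of docetl.

```python
from typing import Any


def assign_sequential_ids(
    records: list[dict[str, Any]], prefix: str = "C"
) -> list[dict[str, Any]]:
    """Return copies of the records with a zero-padded sequential 'c_id'.

    Hypothetical helper for illustration only; it mirrors the f"C-{i+1:05d}"
    format used in the snippet above and does not touch docetl.
    """
    return [
        {**record, "c_id": f"{prefix}-{i:05d}"}  # numbering starts at 1
        for i, record in enumerate(records, start=1)
    ]


# Made-up sample input standing in for the output of "flatten_clauses".
clauses = [{"text": "Article 1"}, {"text": "Article 2"}]
result = assign_sequential_ids(clauses)
print(result)
```

The helper builds new dicts rather than mutating its input, so the same batch can be re-run and will produce the same IDs.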
