fix: fix predict_proba with context data #254

abon-mostly · 2025-12-12T15:16:59Z

No description provided.

Copilot

Pull request overview

This PR fixes a bug in `predict_proba` that occurs when context data is used and the input X contains only the join key. The fix keeps the sample count correct throughout the probability computation pipeline.

Key Changes

- Added an explicit `n_samples` parameter to `_generate_marginal_probs`, so the sample count is no longer derived from potentially concatenated or batched data
- Refactored DataFrame construction in the multi-target probability computation to use explicit indexing and a dict-based approach
- Added test coverage for single-target and multi-target probability prediction with context-only input
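The following is a minimal sketch of why an explicit `n_samples` helps. It assumes, for illustration only, that per-row logits are produced in fixed-size batches and that the last batch is padded. The function is a hypothetical stand-in for `_generate_marginal_probs`, not the engine's actual code. If the row count were inferred from the concatenated output, the helper would return the padded length. Passing `n_samples` explicitly trims the output back to the real samples.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis (the category axis).
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def generate_marginal_probs(batch_logits: list[np.ndarray], n_samples: int) -> np.ndarray:
    """Hypothetical stand-in for `_generate_marginal_probs` (illustrative only).

    `batch_logits` holds one (batch_size, n_categories) array per batch. The
    last batch is assumed to be padded to a full batch, so the concatenated
    array can have more rows than there are real samples. The caller therefore
    passes `n_samples` explicitly instead of letting the helper infer it.
    """
    stacked = np.concatenate(batch_logits, axis=0)
    if stacked.shape[0] < n_samples:
        raise ValueError(f"expected at least {n_samples} rows, got {stacked.shape[0]}")
    # Keep only the real samples; any trailing rows are padding.
    return softmax(stacked[:n_samples])


# Usage: 10 real samples processed in batches of 4 -> 3 batches, 12 rows in total.
rng = np.random.default_rng(0)
batches = [rng.normal(size=(4, 3)) for _ in range(3)]
probs = generate_marginal_probs(batches, n_samples=10)
print(probs.shape)  # (10, 3)
```

Inferring the count as `len(np.concatenate(batches))` would yield 12 rows here, two more than the 10 real samples.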

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `uv.lock` | Version bump from 2.3.1 to 2.3.3 reflecting the bug fix |
| `mostlyai/engine/_tabular/probability.py` | Core fix: added `n_samples` parameter and refactored DataFrame construction to handle context-only scenarios correctly |
| `tests/end_to_end/test_tabular_interface.py` | Added test case validating `predict_proba` behavior when X contains only join keys with context data |
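The dict-based construction mentioned for the multi-target case could look roughly like the sketch below. It is illustrative only and is not the code in `probability.py`. The helper name `build_multi_target_frame`, the tuple-keyed column layout and the input shapes are all assumptions. NumPy arrays carry no index, so pandas places them by position and attaches the explicit `index`. As a result, rows keep the caller's labels even when those labels are not 0..n-1, and no NaN-producing realignment occurs.

```python
import itertools

import numpy as np
import pandas as pd


def build_multi_target_frame(
    joint_probs: np.ndarray,
    categories: dict[str, list[str]],
    index: pd.Index,
) -> pd.DataFrame:
    """Hypothetical helper: one column per combination of target values."""
    # Enumerate all value combinations in a fixed, deterministic order.
    combos = list(itertools.product(*categories.values()))
    if joint_probs.shape != (len(index), len(combos)):
        raise ValueError(f"expected shape {(len(index), len(combos))}, got {joint_probs.shape}")
    # Dict of plain arrays keyed by combination; positions, not labels, decide row order.
    data = {combo: joint_probs[:, j] for j, combo in enumerate(combos)}
    frame = pd.DataFrame(data, index=index)
    # Name the column levels after the targets for readability.
    frame.columns = pd.MultiIndex.from_tuples(combos, names=list(categories))
    return frame


# Usage with a non-default index to show that row labels are preserved.
categories = {"target_b": ["b1", "b2"], "target_c": ["c1", "c2", "c3"]}
index = pd.Index([10, 11, 12], name="row")
raw = np.arange(18, dtype=float).reshape(3, 6)
joint = raw / raw.sum(axis=1, keepdims=True)
proba = build_multi_target_frame(joint, categories, index)
print(proba.shape)  # (3, 6)
```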


Copilot · 2025-12-12T16:23:42Z

tests/end_to_end/test_tabular_interface.py

+            X=test_df[["id"]], ctx_data=test_df[["id", "ctx_a"]], target=["target_b", "target_c"]
+        )
+        assert proba_multi.shape[0] == 10
+        assert proba_multi.shape[1] == 16  # 4 × 4 = 16 combinations


The comment states "4 × 4 = 16 combinations", but the test data defines both target_b and target_c with only 3 unique values each (["b1", "b2", "b3"] and ["c1", "c2", "c3"]). That gives 3 × 3 = 9 combinations, not 16. However, the model's encoding may add extra categories, such as special tokens for unknown values or padding, so the asserted value of 16 may still be correct. Either correct the comment to explain where the 16 combinations come from, or use a >= comparison as the single-target test does.

Suggested change

-        assert proba_multi.shape[1] == 16  # 4 × 4 = 16 combinations
+        assert proba_multi.shape[1] >= 9  # 3 × 3 = 9 combinations (may include extra tokens)
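The arithmetic behind the two candidate assertions can be made concrete. The number of joint-probability columns is the product of the per-target category counts. The sketch assumes, purely for illustration, that the encoding might add one extra token per target. The placeholder `<unk>` and the helper `joint_columns` are hypothetical and are not taken from the library. With 3 observed values per target, this gives 3 × 3 = 9 columns without an extra token and 4 × 4 = 16 columns with one.

```python
import itertools


def joint_columns(
    values_per_target: dict[str, list[str]],
    extra_tokens: tuple[str, ...] = (),
) -> list[tuple[str, ...]]:
    # Each target contributes its observed values plus any (hypothetical) extra tokens.
    per_target = [list(values) + list(extra_tokens) for values in values_per_target.values()]
    # One joint column per combination of per-target values.
    return list(itertools.product(*per_target))


targets = {"target_b": ["b1", "b2", "b3"], "target_c": ["c1", "c2", "c3"]}
print(len(joint_columns(targets)))  # 9
print(len(joint_columns(targets, extra_tokens=("<unk>",))))  # 16
```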

abon-mostly added 2 commits December 12, 2025 16:15

wip · 8e651f2

wip · 58a12cd

mplatzer changed the title ~~Fix predict_proba with context data~~ fix: fix predict_proba with context data Dec 12, 2025

abon-mostly added 2 commits December 12, 2025 16:37

wip · 5281600

wip · cb96430

mplatzer requested a review from Copilot December 12, 2025 16:21

Copilot started reviewing on behalf of mplatzer December 12, 2025 16:21

Copilot AI reviewed Dec 12, 2025


wip · a336bd2

mplatzer merged commit f1a884f into main Dec 12, 2025
7 checks passed

mplatzer deleted the fix-predict-proba-with-ctx branch December 12, 2025 17:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: fix predict_proba with context data #254

fix: fix predict_proba with context data #254

Uh oh!

abon-mostly commented Dec 12, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 12, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants
