rcap107 (Member) commented Jan 19, 2026

This PR improves the dataset fetcher functions. It addresses #1422 by adding the path to the dataset file to the Bunch object returned by each fetcher.

I am also adding skrub's default data folder to the configuration file, and deprecating the original name SKRUB_DATA_DIRECTORY in favor of SKB_DATA_DIRECTORY, so that it follows the same format as the other environment variables set by skrub.
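
As a rough illustration of the rename, a fallback lookup could look like the sketch below. Only the two variable names come from this PR; the function name `resolve_data_dir`, the `environ` parameter, the default folder, and the choice of `FutureWarning` are assumptions made for the example and may not match the actual implementation.

```python
import os
import warnings
from pathlib import Path

NEW_VAR = "SKB_DATA_DIRECTORY"
OLD_VAR = "SKRUB_DATA_DIRECTORY"  # deprecated name


def resolve_data_dir(default="~/skrub_data", environ=None):
    """Return the data directory, preferring the new variable name.

    Hypothetical helper: the name, signature, and default are assumptions.
    """
    environ = os.environ if environ is None else environ
    if NEW_VAR in environ:
        # The new name always wins, even if the old one is also set.
        return Path(environ[NEW_VAR]).expanduser()
    if OLD_VAR in environ:
        # Still honor the old name, but tell the user to migrate.
        warnings.warn(
            f"{OLD_VAR} is deprecated; use {NEW_VAR} instead.",
            FutureWarning,
            stacklevel=2,
        )
        return Path(environ[OLD_VAR]).expanduser()
    return Path(default).expanduser()
```

Passing `environ` explicitly is only there to make the lookup easy to test without touching the real environment.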

rcap107 (Member, Author) commented Jan 20, 2026

Something I did not consider is that some datasets have multiple paths (such as fetch_plane_delays).

For the moment I am returning a list of paths, but as a result single-file datasets become clunky to load, for example:

import pandas as pd
from skrub.datasets import fetch_employee_salaries
data = fetch_employee_salaries()
df = pd.read_csv(data["paths"][0])

I'm not sure what the best way to deal with this is.
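
To make the trade-off concrete, here is a minimal sketch of one possible shape, using a plain dict in place of the Bunch. The helper `make_bunch`, the extra `path` key, and the file names are all hypothetical and only meant to illustrate the idea; this is not what the PR currently does.

```python
from pathlib import Path


def make_bunch(paths):
    """Build a dict standing in for the Bunch returned by a fetcher.

    Hypothetical helper: "paths" is always a list, and "path" is added
    as a shortcut only when the dataset consists of a single file.
    """
    paths = [Path(p) for p in paths]
    bunch = {"paths": paths}
    if len(paths) == 1:
        bunch["path"] = paths[0]
    return bunch


# Placeholder file names, not the real dataset files.
single = make_bunch(["employee_salaries.csv"])
multi = make_bunch(["flights.csv", "airports.csv"])
```

With this shape a single-file dataset could be read via `single["path"]`, while multi-file datasets would still go through `multi["paths"]`. The downside is that the set of keys then differs between fetchers.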
