-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add shuffle
kwarg to GroupBy.map
#9706
Draft
dcherian
wants to merge
47
commits into
pydata:main
Choose a base branch
from
dcherian:groupby-map-shuffle
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
+610
−46
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* main: Revise (pydata#9366) Fix rechunking to a frequency with empty bins. (pydata#9364) whats-new entry for dropping python 3.9 (pydata#9359) drop support for `python=3.9` (pydata#8937) Revise (pydata#9357) try to fix scheduled hypothesis test (pydata#9358)
* main: Improve error message for missing coordinate index (pydata#9370) Add flaky to TestNetCDF4ViaDaskData (pydata#9373) Make chunk manager an option in `set_options` (pydata#9362) Revise (pydata#9371) Remove duplicate word from docs (pydata#9367) Adding open_groups to BackendEntryPointEngine, NetCDF4BackendEntrypoint, and H5netcdfBackendEntrypoint (pydata#9243)
* main: Adds copy parameter to __array__ for numpy 2.0 (pydata#9393) `numpy 2` compatibility in the `pydap` backend (pydata#9391) pyarrow dependency added to doc environment (pydata#9394) Extend padding functionalities (pydata#9353) refactor GroupBy internals (pydata#9389) Combine `UnsignedIntegerCoder` and `CFMaskCoder` (pydata#9274) passing missing parameters to ZarrStore.open_store when opening a datatree (pydata#9377) Fix tests on big-endian systems (pydata#9380) Improve error message on `ds['x', 'y']` (pydata#9375)
* main: Accessibility: Add keyboard handling for XArray HTML view (pydata#9412) [pre-commit.ci] pre-commit autoupdate (pydata#9316) [skip-ci] Speed up docs build by limiting toctrees (pydata#9395) fix the failing `pre-commit.ci` runs (pydata#9411) Update benchmarks.yml (pydata#9406) GroupBy(multiple groupers) (pydata#9372) Encode/decode property tests use variables() (pydata#9401)
This reverts commit 7a99c8f.
* main: (29 commits) Release notes for v2024.09.0 (pydata#9480) Fix `DataTree.coords.__setitem__` by adding `DataTreeCoordinates` class (pydata#9451) Rename DataTree's "ds" and "data" to "dataset" (pydata#9476) Update DataTree repr to indicate inheritance (pydata#9470) Bump pypa/gh-action-pypi-publish in the actions group (pydata#9460) Repo checker (pydata#9450) Add days_in_year and decimal_year to dt accessor (pydata#9105) remove parent argument from DataTree.__init__ (pydata#9465) Fix inheritance in DataTree.copy() (pydata#9457) Implement `DataTree.__delitem__` (pydata#9453) Add ASV for datatree.from_dict (pydata#9459) Make the first argument in DataTree.from_dict positional only (pydata#9446) Fix typos across the code, doc and comments (pydata#9443) DataTree should not be "Generic" (pydata#9445) Disallow passing a DataArray as data into the DataTree constructor (pydata#9444) Support additional dtypes in `resample` (pydata#9413) Shallow copy parent and children in DataTree constructor (pydata#9297) Bump minimum versions for dependencies (pydata#9434) Always include at least one category in random test data (pydata#9436) Avoid deep-copy when constructing groupby codes (pydata#9429) ...
* main: Opt out of floor division for float dtype time encoding (pydata#9497) fixed formatting for whats-new (pydata#9493) Forbid modifying names of DataTree objects with parents (pydata#9494) DAS-2155 - Merge datatree documentation into main docs. (pydata#9033) Make illegal path-like variable names when constructing a DataTree from a Dataset (pydata#9378) Ensure TreeNode doesn't copy in-place (pydata#9482) `open_groups` for zarr backends (pydata#9469) Update pyproject.toml (pydata#9484) New whatsnew section (pydata#9483)
* main: Turn off survey banner (pydata#9512) Stateful test: silence DeprecationWarning from drop_dims (pydata#9508)
* main: (85 commits) Refactor out utility functions from to_zarr (pydata#9695) Use the same function to floatize coords in polyfit and polyval (pydata#9691) Add `DataTree.persist` (pydata#9682) Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688) Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689) Fix inadvertent deep-copying of child data in DataTree (pydata#9684) new blank whatsnew (pydata#9679) v2024.10.0 release summary (pydata#9678) drop the length from `numpy`'s fixed-width string dtypes (pydata#9586) fixing behaviour for group parameter in `open_datatree` (pydata#9666) Use zarr v3 dimension_names (pydata#9669) fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673) implement `dask` methods on `DataTree` (pydata#9670) support `chunks` in `open_groups` and `open_datatree` (pydata#9660) Compatibility for zarr-python 3.x (pydata#9552) Update to_dataframe doc to match current behavior (pydata#9662) Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658) Add close() method to DataTree and use it to clean-up open files in tests (pydata#9651) Change URL for pydap test (pydata#9655) Fix multiple grouping with missing groups (pydata#9650) ...
* main: update mypy to 1.13 (pydata#9687)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
whats-new.rst
api.rst
When
shuffle=True
, we call.shuffle()
and then apply the UDF usingmap_blocks
. This turns out to be a bit involved:template
is not trivialmap_blocks
requires that any new dimension that is added must be of the same size in all blocks. This does not work for e.g.groupby('label').mean()
where the result has a new dimensionlabel
that may be chunked in the output.TODO:
template