Update deps #14
Open
kayibal wants to merge 87 commits into kayibal:master from datarevenue-berlin:update-deps
Conversation
distributed assign support
Add property columns and index to dask.SparseFrame and increase version to 0.8.0
…ly method for various data types
…ules downgrade to 0.19 until dask releases patch
Hotfix/add signature
Added fillna method
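On a sparse container, fillna typically only needs to touch the explicitly stored values. A minimal sketch of that idea using scipy (the helper name and the scipy-based layout are assumptions for illustration, not sparsity's actual implementation):

```python
import numpy as np
from scipy import sparse

def fillna(csr: sparse.csr_matrix, value: float) -> sparse.csr_matrix:
    # Hypothetical sketch: only explicitly stored entries can hold NaN,
    # so replacing NaNs in .data is enough; implicit zeros are untouched.
    out = csr.copy()
    out.data = np.where(np.isnan(out.data), value, out.data)
    return out

m = sparse.csr_matrix(np.array([[1.0, np.nan], [0.0, 2.0]]))
filled = fillna(m, 0.0)
```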
This is useful mainly to avoid dask processes sharing really big arrays in case the categories get really big
loc was broken if it returned a single location or if integers were used as indexers.
Removes some deprecation warnings by updating calls from _keys() to __dask_keys__(), as well as updating the import from dask.optimize to dask.optimization.
* Update indexer instantiation; allow loc on index with duplicates.
* Support latest versions of pandas (>=0.23.0).
* Update CircleCI configuration to v2.
* Fix indexing error with older scipy versions (<1.0.0).
* Support column indexing in _xs method.
* Raise error if sparse frame is indexed (__getitem__) with None.
This resolves problems that appeared after changing drtools' FileSystems behaviour. Eventually this should be handled more elegantly. Currently there is some duplicated code, identical to the filesystem module in drtools. Maybe we should make FileSystems a separate (open-source) package and use it in both sparsity and drtools?
* Raise error when initializing with unaligned indices
Now it detects whether pandas appended 2 description rows at the end and removes them only if necessary.
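The conditional trim can be sketched with a hypothetical pandas helper (the function name and the marker check are assumptions, not the actual code): the last two rows are dropped only when they look like appended description rows.

```python
import pandas as pd

def trim_description_rows(df: pd.DataFrame,
                          markers=("count", "mean")) -> pd.DataFrame:
    # Hypothetical sketch: strip the trailing two rows only when their index
    # labels match the appended description rows; otherwise keep the frame.
    if len(df) >= 2 and tuple(df.index[-2:]) == markers:
        return df.iloc[:-2]
    return df

plain = pd.DataFrame({"x": [1.0, 2.0]}, index=["a", "b"])
tagged = pd.DataFrame({"x": [1.0, 2.0, 2.0, 1.5]},
                      index=["a", "b", "count", "mean"])
trimmed = trim_description_rows(tagged)
```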
Previously, the original DataFrame's index/columns would be preserved and the passed index/columns would be ignored. Now the passed index/columns are used, but a SyntaxWarning is issued. Fixes #52.
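The new precedence can be sketched as follows (a hypothetical helper for illustration; the function name and warning message are assumptions): explicitly passed index/columns win over the DataFrame's own, and the caller is warned.

```python
import warnings
import pandas as pd

def resolve_axes(df: pd.DataFrame, index=None, columns=None):
    # Hypothetical sketch: passed index/columns override the DataFrame's own,
    # but the override is flagged with a SyntaxWarning.
    if index is not None or columns is not None:
        warnings.warn("passed index/columns override the DataFrame's own",
                      SyntaxWarning)
    return (index if index is not None else df.index,
            columns if columns is not None else df.columns)

df = pd.DataFrame([[1, 2]], index=["old"], columns=["a", "b"])
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    idx, cols = resolve_axes(df, index=["new"])
```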
`data` currently can't be a list anyway. Its `.shape` attribute is used at the very beginning of init method, so it has to be array-like.
And add a better docstring.
- Column names are preserved in groupby_agg.
- When groupby_agg is used with a MultiIndex and level=, the resulting index has values only for the specified level.
- When grouping by a column, this column is not present in the result.
Fixes #58.
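The level= behaviour described above mirrors plain pandas; a small pandas illustration (not sparsity code) of grouping on one level of a MultiIndex:

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1)], names=["outer", "inner"])
df = pd.DataFrame({"v": [1, 2, 3]}, index=mi)

# Grouping on one level keeps only that level in the result's index,
# and column names survive the aggregation.
summed = df.groupby(level="outer").sum()
```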
* Enable tracking on documentation page.
* Update documentation link.
* Implement distributed groupby sum and apply_concat_apply function for SparseFrame.
* Add test for different index datatypes.
* Implement sort_index.
* Implement __len__.
* Implement rename; optimize groupby_sum and join. Implements a distributed rename method and adds quicker routines to groupby_sum if divisions are known. Adds support for joining sp.SparseFrames onto a distributed SparseFrame.
* Implement distributed set_index.
* Number of lines output in __repr__ changed.
* Create folders when writing to local filesystem.
* Fix empty dtype.
* Implement distributed drop.
* Always add npz extension when writing SparseFrame to npz format.
* Fix metadata handling in set_index method.
* Add method for dask SparseFrame and tuple divisions type.
* Support empty divisions.
* Pass on divisions in sort_index.
* More restrictive pandas version, as the .drop method fails with pandas==0.20.3.
* Fix bug where an empty dataframe would create a wrongly sized shuffle array.
* Fix bug where a join with an in-memory sparse frame would return rows from meta_nonempty.
* Update dask version in setup.py.
* Update deprecated set_options call.
* Fix moto and boto versions.
* Update test dependencies.
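The apply_concat_apply pattern mentioned above can be sketched generically in pure Python (a simplified model of the tree reduction, not sparsity's dask implementation): reduce each partition with a chunk function, gather the results, then aggregate.

```python
from typing import Callable, Iterable, List

def apply_concat_apply(partitions: Iterable[List[int]],
                       chunk: Callable, aggregate: Callable):
    # Simplified model of a tree reduction:
    # 1) chunk: reduce each partition independently,
    # 2) concat: gather the per-partition results,
    # 3) aggregate: combine them into the final answer.
    per_partition = [chunk(p) for p in partitions]
    return aggregate(per_partition)

# e.g. a distributed sum: chunk and aggregate are both `sum`
total = apply_concat_apply([[1, 2], [3], [4, 5]], chunk=sum, aggregate=sum)
```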
Fix behaviour when passing an Index to __getitem__. Fixes #74.
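For comparison, this is how plain pandas handles an Index passed to __getitem__ (it selects those columns, just like a list of labels); the fix aligns sparsity with this behaviour:

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})
# Passing a pd.Index to __getitem__ selects the named columns.
sel = df[pd.Index(["a", "c"])]
```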
* Rename io modules to io_ and fix some version conflicts. Numpy 1.16.* is not compatible with sparsity 0.20.*, so we need to fix setup.py. When using scipy<1.0.0, empty column access does not work, so that dependency had to be adjusted here as well. This also renames the io modules to io_ to avoid clashes with Python's internal io module.
* Fix incompatibility with numpy>=1.16.0 and a potential security issue. Due to a security issue (CVE-2019-6446), numpy changed the default value of allow_pickle in np.load to False; this led to errors when reading sparse frames from npz archives. This commit fixes it by explicitly allowing pickled objects, so reading sparse frames from unknown sources is still a security risk.
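The allow_pickle behaviour can be seen with plain numpy (illustration only): since numpy 1.16.3, np.load refuses to deserialize object arrays unless the caller opts in with allow_pickle=True.

```python
import io
import numpy as np

buf = io.BytesIO()
# Object arrays (as used inside some npz archives) need pickling to save.
np.savez(buf, meta=np.array(["a", "b"], dtype=object))
buf.seek(0)

try:
    np.load(buf)["meta"]  # default allow_pickle=False -> ValueError
    loaded_without_flag = True
except ValueError:
    loaded_without_flag = False

buf.seek(0)
meta = np.load(buf, allow_pickle=True)["meta"]  # explicit opt-in works
```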
* Add support for dask persist. This adds support for the dask persist method.
* Test persist functionality.
* PRETTY: rename import.
* Check for type of meta in `apply_and_enforce`. It was possible that, although the computed type was SparseFrame, another type was returned (if meta was not a SparseFrame). Imports are not changed, just reorganized.
* Simple __getitem__ for dask SparseFrames. Support for dsp[index] syntax. Doesn't aim to work the same as in pandas, just maps __getitem__ onto partitions.
* Add getitem test with empty frame.
* todense() returns a Series when there is one empty column. Previously it returned a DataFrame, even though a 1-column non-empty SparseFrame returned a Series. Imports are only reorganized.
* Add .todense() method to dask SparseFrame. It works by mapping SparseFrame.todense onto partitions. It was necessary to allow `map_partitions` to return types other than SparseFrame, so the kwarg `cls` was added. This implies that one cannot use `cls` as an argument to the mapped function (because it will be consumed by `map_partitions` and not passed on).
* More elegant way to implement the todense function (#80). This leverages the dask.delayed object API to achieve the same result, which was previously a hack between map_partitions and initializing a dd.DataFrame directly.
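The Series-vs-DataFrame rule for todense can be sketched with scipy and pandas (a hypothetical helper, not the actual implementation): one column densifies to a Series, even when empty; more columns become a DataFrame.

```python
import numpy as np
import pandas as pd
from scipy import sparse

def todense(csr: sparse.csr_matrix, columns):
    # Hypothetical sketch: a single column becomes a Series (even if empty),
    # multiple columns become a DataFrame.
    dense = np.asarray(csr.todense())
    if len(columns) == 1:
        return pd.Series(dense[:, 0], name=columns[0])
    return pd.DataFrame(dense, columns=columns)

one_col = todense(sparse.csr_matrix((3, 1)), ["a"])
two_col = todense(sparse.csr_matrix((3, 2)), ["a", "b"])
```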
* Bugfix: sf['missing column'] raises KeyError. Previously it returned the last column.
* Add test for dask version.
This change adds support for pandas>0.23, including 0.24 and 0.25.0.
Other cases import a name without underscore.
There is no such argument in pandas 0.23.4
This PR updates sparsity to work with the latest pandas version (0.25.0) and the latest Dask version (2.2.0). Be sure to have the latest versions of the filesystem packages installed, too.