Releases: SebKrantz/collapse
collapse version 1.6.4
A patch for 1.6.0 which fixes (minor) issues flagged by CRAN and adds a few handy extras.
Bug Fixes
- Puts examples using the new base pipe `|>` inside `\donttest{}` so that they don't fail CRAN tests on older R versions.
- Fixes an LTO issue caused by a small mistake in a header file (which has no implications for the user but was detected by CRAN checks).
- Checks on the gcc11 compiler flagged an additional issue with a pointer pointing to element -1 of a C array (which I had done on purpose to index it with an R integer vector).
- Fixes a valgrind issue caused by comparing an uninitialized value to something.
Additions
- Added a function `fcomputev`, which allows selecting columns and transforming them with a function in one go. The `keep` argument can be used to add columns to the selection that are not transformed.
- Added a function `setLabels` as a wrapper around `vlabels<-` to facilitate setting variable labels inside pipes.
- Function `rm_stub` now has an argument `regex = TRUE` which triggers a call to `gsub` and allows general removing of character sequences in column names on the fly.
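As a rough illustration of the three additions above, the following sketch uses the built-in `mtcars` data. The object names (`lg`, `labelled`, `fixed_rm`, `regex_rm`) and the label strings are made up for the example, and columns are selected by character name on the assumption that this form works across package versions.

```r
library(collapse)

# fcomputev: log-transform two columns and carry `cyl` along untransformed
lg <- fcomputev(mtcars, c("mpg", "hp"), log, keep = "cyl")

# setLabels: attach variable labels in a function-call (pipe-friendly) style
labelled <- setLabels(mtcars[1:2], c("Miles per gallon", "Cylinders"))
labs <- vlabels(labelled)

# rm_stub: remove a fixed prefix, or any regex match when regex = TRUE
fixed_rm <- rm_stub(c("log.mpg", "log.hp"), "log.")
regex_rm <- rm_stub(c("mpg_1", "hp_22"), "_[0-9]+$", regex = TRUE)
```

Here `lg` should hold only `cyl` plus the logged `mpg` and `hp`, and both `rm_stub` calls should return the bare names `mpg` and `hp`.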
Improvements
- `vlabels<-` and `setLabels` now support a list of variable labels or other attributes (i.e. the `value` is internally subset using `[[`, not `[`). Thus they are now general functions to attach a vector or list of attributes to columns in a list / data frame.
Other Changes
- CRAN maintainers have asked me to remove a line in a Makevars file intended to reduce the size of Rcpp object files (which has been there since version 1.4). So the installed size of the package may now be larger.
collapse version 1.6.0
collapse 1.6.0, released end of June 2021, presents some significant improvements in the user-friendliness, compatibility and programmability of the package, as well as a few function additions.
Changes to Functionality
- `ffirst`, `flast`, `fnobs`, `fsum`, `fmin` and `fmax` were rewritten in C. The former three now also support list columns (where `NULL` or empty list elements are considered missing values when `na.rm = TRUE`), and are extremely fast for grouped aggregation if `na.rm = FALSE`. The latter three also support and return integers, with significant performance gains, even compared to base R. Code using these functions that expects an error for list columns, or double output even if the input is integer, should be adjusted.
- collapse now directly supports sf data frames through functions like `fselect`, `fsubset`, `num_vars`, `qsu`, `descr`, `varying`, `funique`, `roworder`, `rsplit`, `fcompute` etc., which will take along the geometry column even if it is not explicitly selected (mirroring dplyr methods for sf data frames). This is mostly done internally at C-level, so functions remain simple and fast. Existing code that explicitly selects the geometry column is unaffected by the change, but code of the form `sf_data %>% num_vars %>% qDF %>% ...`, where columns excluding geometry were selected and the object later converted to a data frame, needs to be rewritten as `sf_data %>% qDF %>% num_vars %>% ...`. A short vignette was added describing the integration of collapse and sf.
- I've received several requests for increased namespace consistency. collapse functions were named to be consistent with base R, dplyr and data.table, resulting in names like `is.Date`, `fgroup_by` or `settransformv`. To me this makes sense, but I've been convinced that a bit more consistency is advantageous. Towards that end I have decided to eliminate the '.' notation of base R and to remove some unexpected capitalizations in function names that gave some people the impression I was using camel case. The following functions are renamed: `fNobs` -> `fnobs`, `fNdistinct` -> `fndistinct`, `pwNobs` -> `pwnobs`, `fHDwithin` -> `fhdwithin`, `fHDbetween` -> `fhdbetween`, `as.factor_GRP` -> `as_factor_GRP`, `as.factor_qG` -> `as_factor_qG`, `is.GRP` -> `is_GRP`, `is.qG` -> `is_qG`, `is.unlistable` -> `is_unlistable`, `is.categorical` -> `is_categorical`, `is.Date` -> `is_date`, `as.numeric_factor` -> `as_numeric_factor`, `as.character_factor` -> `as_character_factor`, `Date_vars` -> `date_vars`.

  This is done in a very careful manner: the old names will stick around for a long while (end of 2022), and the generics of `fNobs`, `fNdistinct`, `fHDbetween` and `fHDwithin` will be kept in the package for an indeterminate period, but their core methods will not be exported beyond 2022. I will start warning about these renamed functions in 2022. In the future I will undogmatically stick to a function naming style with lowercase function names and underscores where words need to be split. Other function names will be kept. To say something about this: the quick-conversion functions `qDF`, `qDT`, `qM`, `qF`, `qG` are consistent and in line with data.table (`setDT` etc.), and similarly the operators `L`, `F`, `D`, `Dlog`, `G`, `B`, `W`, `HDB`, `HDW`. I'll keep `GRP`, `BY` and `TRA`, for lack of better names, parsimony and because they are central to the package. The camel case will be kept in helper functions `setDimnames` etc. because they work like stats `setNames` and do not modify the argument by reference (as `settransform` or `setrename` and various data.table functions do). Functions `copyAttrib` and `copyMostAttrib` are exports of like-named functions in the C API and thus kept as they are. Finally, I want to keep `fFtest` the way it is because the F-distribution is widely recognized by a capital F.

- I've updated the `wlddev` dataset with the latest data from the World Bank, and also added a variable giving the total population (which may be useful e.g. for population-weighted aggregations across regions). The extra column could invalidate code that uses the dataset for demonstrations (I had to adjust some examples, tests and code in vignettes).
Additions
- Added a function `fcumsum` (written in C), permitting flexible (grouped, ordered) cumulative summations on matrix-like objects (integer or double typed) with extra methods for grouped data frames and panel series and data frames. Apart from the internal grouping, and an ordering argument allowing cumulative sums in a different order than the data appear, `fcumsum` has 2 options to deal with missing values. The default (`na.rm = TRUE`) is to skip (preserve) missing values, whereas setting `fill = TRUE` allows missing values to be populated with the previous value of the cumulative sum (starting from 0).
- Added a function `alloc` to efficiently generate vectors initialized with any value (faster than `rep_len`).
- Added a function `pad` to efficiently pad vectors / matrices / data.frames with a value (default is `NA`). This function was mainly created to make it easy to expand results coming from a statistical model fitted on data with missing values to the original length. For example let `data <- na_insert(mtcars); mod <- lm(mpg ~ cyl, data)`, then we can do `settransform(data, resid = pad(resid(mod), mod$na.action))`, or we could do `pad(model.matrix(mod), mod$na.action)` or `pad(model.frame(mod), mod$na.action)` to receive matrices and data frames from model data matching the rows of `data`. `pad` is a general function that will also work with mixed-type data. It is also possible to pass a vector of indices matching the rows of the data to `pad`, in which case `pad` will fill gaps in those indices with a value/row in the data.
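A small sketch of the three additions, assuming only the arguments named in the notes above (`g`, `na.rm`, `fill`) and reusing the `pad` example from the text. The input vectors and object names are invented for illustration.

```r
library(collapse)

x <- c(1, NA, 2, 3)
cs_skip <- fcumsum(x)               # default na.rm = TRUE: the NA is skipped and stays NA
cs_fill <- fcumsum(x, fill = TRUE)  # the NA is filled with the previous running total
cs_grp  <- fcumsum(c(1, 2, 3, 4), g = c(1, 1, 2, 2))  # the sum restarts in each group

zeros <- alloc(0, 5)                # a vector of five zeros

# pad: expand model residuals back to the length of the original data
set.seed(1)
data <- na_insert(mtcars)           # randomly inserts missing values
mod  <- lm(mpg ~ cyl, data)         # lm() drops the incomplete rows
res  <- pad(resid(mod), mod$na.action)
```

With this input `cs_skip` is `1, NA, 3, 6`, `cs_fill` is `1, 1, 3, 6` and `cs_grp` is `1, 3, 3, 7`; `res` has one entry per row of `data`, with `NA` in the rows that `lm` dropped.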
Improvements
- Full data.table support, including reference semantics (`set*`, `:=`)!! There is some complex C-level programming behind data.table's operations by reference. Notably, additional (hidden) column pointers are allocated to be able to add columns without taking a shallow copy of the data.table, and an `".internal.selfref"` attribute containing an external pointer is used to check if any shallow copy was made using base R commands like `<-`. This is done to avoid even a shallow copy of the data.table in manipulations using `:=` (and is in my opinion not worth it as even large tables are shallow copied by base R (>=3.1.0) within microseconds and all of this complicates development immensely). Previously, collapse treated data.tables like any other data frame, using shallow copies in manipulations and preserving the attributes (thus ignoring how data.table works internally). This produced a warning whenever you wanted to use data.table reference semantics (`set*`, `:=`) after passing the data.table through a collapse function such as `collap`, `fselect`, `fsubset`, `fgroup_by` etc. From v1.6.0, I have adopted essential C code from data.table to do the overallocation and generate the `".internal.selfref"` attribute, thus seamless workflows combining collapse and data.table are now possible. This comes at a cost of about 2-3 microseconds per function, as to do this I have to shallow copy the data.table again and add extra column pointers and an `".internal.selfref"` attribute telling data.table that this table was not copied (it seems to be the only way to do it for now). This integration encompasses all data manipulation functions in collapse, but not the Fast Statistical Functions themselves. Thus you can do `agDT <- DT %>% fselect(id, col1:coln) %>% collap(~id, fsum); agDT[, newcol := 1]`, but you would need to add a `qDT` after a function like `fsum` if you want to use reference semantics without incurring a warning: `agDT <- DT %>% fselect(id, col1:coln) %>% fgroup_by(id) %>% fsum %>% qDT; agDT[, newcol := 1]`. collapse appears to be the first package that attempts to account for data.table's internal workings without importing data.table, and `qDT` is now the fastest way to create a fully functional data.table from any R object. A global option `"collapse_DT_alloccol"` was added to regulate how many columns collapse overallocates when creating data.tables. The default is 100, which is lower than the data.table default of 1024. This was done to increase efficiency of the additional shallow copies, and may be changed by the user.
- Programming enabled with `fselect` and `fgroup_by` (you can now pass vectors containing column names or indices). Note that instead of `fselect` you should use `get_vars` for standard eval programming.
- `fselect` and `fsubset` support in-place renaming, e.g. `fselect(data, newname = var1, var3:varN)`, `fsubset(data, vark > varp, newname = var1, var3:varN)`.
- `collap` supports renaming columns in the custom argument, e.g. `collap(data, ~ id, custom = list(fmean = c(newname = "var1", "var2"), fmode = c(newname = 3), flast = is_date))`.
- Performance improvements: `fsubset` / `ss` return the data or perform a simple column subset without deep copying the data if all rows are selected through a logical expression. `fselect` and `get_vars`, `num_vars` etc. are slightly faster through data frame subsetting done fully in C. `ftransform` / `fcompute` use `alloc` instead of `base::rep` to replicate a scalar value, which is slightly more efficient.
- `fcompute` now has a `keep` argument, to preserve several existing columns when computing columns on a data frame.
- `replace_NA` now has a `cols` argument, so we can do `replace_NA(data, cols = is.numeric)`, to replace `NA`'s in numeric columns. I note that for big numeric data `data.table::setnafill` is the most efficient solution.
- `fhdbetween` and `fhdwithin` have an `effect` argument in plm methods, allowing centering on selected identifiers. The default is still to center on all panel identifiers.
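The renaming and `cols` features above can be tried on `mtcars`. This is only a sketch: the new column names (`miles`, `avg_mpg`, `max_wt`) and the toy data frame `df` are invented for the example, and the `custom` list follows the pattern shown in the notes.

```r
library(collapse)

# In-place renaming while selecting / subsetting
sel <- fselect(mtcars, miles = mpg, cyl:hp)
sub <- fsubset(mtcars, mpg > 25, miles = mpg, cyl)

# Renaming inside collap's custom argument (by column name and by index)
agg <- collap(mtcars, ~ cyl,
              custom = list(fmean = c(avg_mpg = "mpg", "hp"),
                            fmax  = c(max_wt = 6)))

# replace_NA restricted to numeric columns; the character column keeps its NA
df <- data.frame(a = c(1, NA, 3), b = c("x", NA, "z"))
filled <- replace_NA(df, value = 0, cols = is.numeric)
```

Only the numeric column `a` is filled here, which is the point of the `cols` selector.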
...
collapse version 1.5.3
Changes to Functionality
- The first argument of `ftransform` was renamed to `.data` from `X`. This was done to enable the user to transform columns named "X". For the same reason the first argument of `frename` was renamed to `.x` from `x` (not `.data`, to make it explicit that `.x` can be any R object with a "names" attribute). It is not possible to deprecate `X` and `x` without at the same time undoing the benefits of the argument renaming, thus this change is immediate and code breaking in rare cases where the first argument is explicitly set.
- The function `is.regular` to check whether an R object is atomic or list-like is deprecated and will be removed before the end of the year. This was done to avoid a namespace clash with the zoo package (#127).
Bug Fixes
- For reasons of efficiency, most statistical and transformation functions used the C macro `SHALLOW_DUPLICATE_ATTRIB` to copy column attributes in a data frame. Since this macro does not copy S4 object bits, it caused some problems with S4 object columns such as POSIXct (e.g. computing lags/leads, first and last values on these columns). This is now fixed: all statistical functions (apart from `fvar` and `fsd`) now use `DUPLICATE_ATTRIB` and thus preserve S4 object columns (#91).
- `unlist2d` produced a subsetting error if an empty list was present in the list-tree. This is now fixed: empty or `NULL` elements in the list-tree are simply ignored (#99).
Additions
- A function `fsummarise` was added to facilitate translating dplyr / data.table code to collapse. Like `collap`, it is only very fast when used with the Fast Statistical Functions.
- A function `t_list` is made available to efficiently transpose lists of lists.
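For orientation, a minimal sketch of both functions; the summary column names and the nested list are made up for the example.

```r
library(collapse)

# fsummarise on a grouped data frame, using Fast Statistical Functions
by_cyl <- fsummarise(fgroup_by(mtcars, cyl),
                     mean_mpg = fmean(mpg),
                     total_hp = fsum(hp))

# t_list swaps the two nesting levels of a list of lists
nested  <- list(a = list(x = 1, y = 2), b = list(x = 3, y = 4))
flipped <- t_list(nested)   # now indexed as flipped$x$a, flipped$x$b, ...
```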
Improvements
- C files are compiled with -O3 on Windows, which gives a boost of around 20% for the grouping mechanism applied to character data.
collapse version 1.5.1
A small patch for 1.5.0 that:
- Fixes a numeric precision issue when grouping doubles (e.g. before `qF(wlddev$LIFEEX)` gave an error, now it works).
- Fixes a minor issue with `fHDwithin` when applied to pseries and `fill = FALSE`.
collapse version 1.5.0
collapse 1.5.0, released early January 2021, presents important refinements and some additional functionality.
Back to CRAN
- I apologize for inconveniences caused by the temporary archival of collapse from December 19, 2020. This archival was caused by the archival of the important lfe package on the 4th of December. collapse depended on lfe for higher-dimensional centering, providing the `fHDbetween` / `fHDwithin` functions for generalized linear projecting / partialling out. To remedy the damage caused by the removal of lfe, I had to rewrite `fHDbetween` / `fHDwithin` to take advantage of the demeaning algorithm provided by fixest, which has some quite different mechanics. Beforehand, I made some significant changes to `fixest::demean` itself to make this integration happen. The CRAN deadline was the 18th of December, and I realized too late that I would not make this. A request to CRAN for extension was declined, so collapse got archived on the 19th. I have learned from this experience, and collapse is now sufficiently insulated that it will not be taken off CRAN even if all suggested packages were removed from CRAN.
Bug Fixes
- Segfaults in several Fast Statistical Functions when passed `numeric(0)` are fixed (thanks to @eshom and @acylam, #101). The default behavior is that all collapse functions return `numeric(0)` again, except for `fNobs`, `fNdistinct` which return `0L`, and `fvar`, `fsd` which return `NA_real_`.
Changes to Functionality
- Functions `fHDwithin` / `HDW` and `fHDbetween` / `HDB` have been reworked, delivering higher performance and greater functionality: For higher-dimensional centering and heterogeneous slopes, the `demean` function from the fixest package is imported (conditional on the availability of that package). The linear prediction and partialling out functionality is now built around `flm` and also allows for weights and different fitting methods.
- In `collap`, the default behavior of `give.names = "auto"` was altered when used together with the `custom` argument. Before, the function name was always added to the column names. Now it is only added if a column is aggregated with two different functions. I apologize if this breaks any code dependent on the old names, but this behavior just better reflects most common use (applying only one function per column), as well as STATA's collapse.
- For list processing functions like `get_elem`, `has_elem` etc. the default for the argument `DF.as.list` was changed from `TRUE` to `FALSE`. This means if a nested list contains data frames, these data frames will not be searched for matching elements. This default also reflects the more common usage of these functions (extracting entire data frames or computed quantities from nested lists rather than searching / subsetting lists of data frames). The change also delivers a considerable performance gain.
- Vignettes were outsourced to the website, and also made available as PDF versions for download there. This nearly halves the size of the source package, and should induce users to appreciate the built-in documentation. The website also makes for much more convenient reading and navigation of these book-style vignettes.
Additions
- Added a set of 10 operators `%rr%`, `%r+%`, `%r-%`, `%r*%`, `%r/%`, `%cr%`, `%c+%`, `%c-%`, `%c*%`, `%c/%` to facilitate and speed up row- and column-wise arithmetic operations involving a vector and a matrix / data frame / list. For example `X %r*% v` efficiently multiplies every row of `X` with `v`. Note that more advanced functionality is already provided in `TRA()`, `dapply()` and the Fast Statistical Functions, but these operators are intuitive and very convenient to use in matrix or matrix-style code, or in piped expressions.
- Added function `missing_cases` (opposite of `complete.cases` and faster for data frames / lists).
- Added function `allNA` for atomic vectors.
- New vignette about using collapse together with data.table, available online.
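To make the row- versus column-wise distinction concrete, here is a small sketch on a 2 x 3 matrix. The matrix, vectors and data frame are invented for the example; the base R expressions in the comments are the equivalents the operators are compared against.

```r
library(collapse)

X <- matrix(as.double(1:6), nrow = 2)  # 2 rows, 3 columns
v <- c(1, 10, 100)                     # one value per column
w <- c(1, 2)                           # one value per row

row_mul <- X %r*% v   # multiply every row of X by v, like t(t(X) * v)
col_add <- X %c+% w   # add w to every column of X, like X + w

df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
miss <- missing_cases(df)        # TRUE for rows with a missing value
all_missing <- allNA(c(NA, NA))  # TRUE only if every element is NA
```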
Improvements
- Time series functions and operators `flag` / `L` / `F`, `fdiff` / `D` / `Dlog` and `fgrowth` / `G` now natively support irregular time series and panels, and feature a 'complete approach' i.e. values are shifted around taking full account of the underlying time-dimension!
- Functions `pwcor` and `pwcov` can now compute weighted correlations on the pairwise or complete observations, supported by C-code that is (conditionally) imported from the weights package.
- `fFtest` now also supports weights.
- `collap` now provides an easy workaround to aggregate some columns using weights and others without. The user may simply append the names of Fast Statistical Functions with `_uw` to disable weights. Example: `collapse::collap(mtcars, ~ cyl, custom = list(fmean_uw = 3:4, fmean = 8:10), w = ~ wt)` aggregates columns 3 through 4 using a simple mean and columns 8 through 10 using the weighted mean.
- The parallelism in `collap` using `parallel::mclapply` has been reworked to operate at the column-level, and not at the function level as before. It is still not available for Windows though. The default number of cores was set to `mc.cores = 2L`, which now gives an error on Windows if `parallel = TRUE`.
- Function `recode_char` now has additional options `ignore.case` and `fixed` (passed to `grepl`), for enhanced recoding of character data based on regular expressions.
- `rapply2d` now has a `classes` argument permitting more flexible use.
- `na_rm` and some other internal functions were rewritten in C. `na_rm` is now 2x faster than `x[!is.na(x)]` with missing values and 10x faster without missing values.
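The `_uw` example from the notes can be checked against base R. In this sketch the call itself is taken from the text above, while the comparison values `plain_disp` and `wtd_gear` are illustrative names introduced here.

```r
library(collapse)

# Columns 3:4 (disp, hp) use a simple mean; columns 8:10 (vs, am, gear) are weighted by wt
agg <- collap(mtcars, ~ cyl,
              custom = list(fmean_uw = 3:4, fmean = 8:10),
              w = ~ wt)

# Base R equivalents for one unweighted and one weighted column
plain_disp <- tapply(mtcars$disp, mtcars$cyl, mean)
wtd_gear   <- sapply(split(mtcars, mtcars$cyl),
                     function(d) weighted.mean(d$gear, d$wt))

# na_rm drops missing values, like x[!is.na(x)]
clean <- na_rm(c(1, NA, 3))
```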
collapse version 1.4.2
collapse 1.4.2, released mid November 2020, presents some important refinements, particularly in the domain of attribute handling, as well as some additional functionality. The changes make collapse smarter, more broadly compatible and more secure, and should not break existing code.
Changes to Functionality
- Deep Matrix Dispatch / Extended Time Series Support: The default methods of all statistical and transformation functions dispatch to the matrix method if `is.matrix(x) && !inherits(x, "matrix")` evaluates to `TRUE`. This specification avoids invoking the default method on classed matrix-based objects (such as multivariate time series of the xts / zoo class) not inheriting a 'matrix' class, while still allowing the user to manually call the default method on matrices (objects with implicit or explicit 'matrix' class). The change implies that collapse's generic statistical functions are now well suited to transform xts / zoo and many other time series and matrix-based classes.
- Fully Non-Destructive Piped Workflow: `fgroup_by(x, ...)` now only adds a class grouped_df, not classes table_df, tbl, grouped_df, and preserves all classes of `x`. This implies that workflows such as `x %>% fgroup_by(...) %>% fmean` etc. yield an object `xAG` of the same class and attributes as `x`, not a tibble as before. collapse aims to be as broadly compatible, class-agnostic and attribute preserving as possible.
- Thorough and Controlled Object Conversions: Quick conversion functions `qDF`, `qDT` and `qM` now have additional arguments `keep.attr` and `class` providing precise user control over object conversions in terms of classes and other attributes assigned / maintained. The default (`keep.attr = FALSE`) yields hard conversions removing all but essential attributes from the object. E.g. before `qM(EuStockMarkets)` would just have returned `EuStockMarkets` (because `is.matrix(EuStockMarkets)` is `TRUE`) whereas now the time series class and 'tsp' attribute are removed. `qM(EuStockMarkets, keep.attr = TRUE)` returns `EuStockMarkets` as before.
- Smarter Attribute Handling: Drawing on the guidance given in the R Internals manual, the following standards for optimal non-destructive attribute handling are formalized and communicated to the user:
  - The default and matrix methods of the Fast Statistical Functions preserve attributes of the input in grouped aggregations ('names', 'dim' and 'dimnames' are suitably modified). If inputs are classed objects (e.g. factors, time series, checked by `is.object`), the class and other attributes are dropped. Simple (non-grouped) aggregations of vectors and matrices do not preserve attributes, unless `drop = FALSE` in the matrix method. An exception is made in the default methods of functions `ffirst`, `flast` and `fmode`, which always preserve the attributes (as the input could well be a factor or date variable).
  - The data frame methods are unaltered: All attributes of the data frame and columns in the data frame are preserved unless the computation result from each column is a scalar (not computing by groups) and `drop = TRUE` (the default).
  - Transformations with functions like `flag`, `fwithin`, `fscale` etc. are also unaltered: All attributes of the input are preserved in the output (regardless of whether the input is a vector, matrix, data.frame or related classed object). The same holds for transformation options modifying the input ("-", "-+", "/", "+", "*", "%%", "-%%") when using the `TRA()` function or the `TRA = "..."` argument to the Fast Statistical Functions.
  - For `TRA` 'replace' and 'replace_fill' options, the data type of the STATS is preserved, not of x. This provides better results particularly with functions like `fNobs` and `fNdistinct`. E.g. previously `fNobs(letters, TRA = "replace")` would have returned the observation counts coerced to character, because `letters` is character. Now the result is integer typed. For attribute handling this means that the attributes of x are preserved unless x is a classed object and the data types of x and STATS do not match. An exception to this rule is made if x is a factor and an integer (non-factor) replacement is offered to STATS. In that case the attributes of x are copied except for the 'class' and 'levels' attributes, e.g. so that `fNobs(iris$Species, TRA = "replace")` gives an integer vector, not a (malformed) factor. In the unlikely event that STATS is a classed object, the attributes of STATS are preserved and the attributes of x discarded.
- Reduced Dependency Burden: The dependency on the lfe package was made optional. Functions `fHDwithin` / `fHDbetween` can only perform higher-dimensional centering if lfe is available. Linear prediction and centering with a single factor (among a list of covariates) is still possible without installing lfe. This change means that collapse now only depends on base R and Rcpp and is supported down to R version 2.10.
Additions
- Added function `rsplit` for efficient (recursive) splitting of vectors and data frames.
- Added function `fdroplevels` for very fast missing level removal + added argument `drop` to `qF` and `GRP.factor`, the default is `drop = FALSE`. The addition of `fdroplevels` also enhances the speed of the `fFtest` function.
- `fgrowth` supports annualizing / compounding growth rates through added `power` argument.
- A function `flm` was added for barebones (weighted) linear regression fitting using different efficient methods: 4 from base R (`.lm.fit`, `solve`, `qr`, `chol`), using `fastLm` from RcppArmadillo (if installed), or `fastLm` from RcppEigen (if installed).
- Added function `qTBL` to quickly convert R objects to tibble.
- Helpers `setAttrib`, `copyAttrib` and `copyMostAttrib` exported for fast attribute handling in R (similar to `attributes<-()`, these functions return a shallow copy of the first argument with the set of attributes replaced, but do not perform checks for attribute validity like `attributes<-()`. This can yield large performance gains with big objects).
- Helper `cinv` added wrapping the expression `chol2inv(chol(x))` (efficient inverse of a symmetric, positive definite matrix via Choleski factorization).
- A shortcut `gby` is now available to abbreviate the frequently used `fgroup_by` function.
- Adds a method `[.GRP_df` that allows robust subsetting of grouped objects created with `fgroup_by` (thanks to Patrice Kiener for flagging this).
- A print method for grouped data frames of any class was added.
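A brief sketch of four of these additions, compared against base R where a direct equivalent exists. The object names and the small matrix `S` are invented for the example, and `flm` is called with an explicit intercept column, which is an assumption about how one would typically set up the design matrix.

```r
library(collapse)

# rsplit: split a vector by a grouping variable
parts <- rsplit(mtcars$mpg, mtcars$cyl)

# fdroplevels: remove unused factor levels
f <- factor(c("a", "b"), levels = c("a", "b", "c"))
f_used <- fdroplevels(f)

# flm: bare-bones linear regression returning only the coefficients
Xmat  <- cbind(Intercept = 1, wt = mtcars$wt)
coefs <- flm(mtcars$mpg, Xmat)

# cinv: inverse of a symmetric positive definite matrix via chol2inv(chol(x))
S     <- matrix(c(2, 1, 1, 3), nrow = 2)
S_inv <- cinv(S)
```

The coefficients from `flm` should agree with `coef(lm(mpg ~ wt, mtcars))`, and `S_inv` with `solve(S)`.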
Improvements
- Faster internal methods for factors for `funique`, `fmode` and `fNdistinct`.
- The grouped_df methods for `flag`, `fdiff`, `fgrowth` now also support multiple time variables to identify a panel e.g. `data %>% fgroup_by(region, person_id) %>% flag(1:2, list(month, day))`.
- More security features for `fsubset.data.frame` / `ss`, `ss` is now internal generic and also supports subsetting matrices.
- In some functions (like `na_omit`), passing double values (e.g. `1` instead of integer `1L`) or negative indices to the `cols` argument produced an error or unexpected behavior. This is now fixed in all functions.
- Fixed a bug in helper function `all_obj_equal` occurring if objects are not all equal.
- Some performance improvements through increased use of pointers and C API functions.
- Some documentation updates by Kevin Tappe.
collapse version 1.3.2
collapse 1.3.2, released mid September 2020, is a minor update:
- Fixed a small bug in `fNdistinct` for grouped distinct value counts on logical vectors.
- Additional security for `ftransform`, which now efficiently checks the names of the data and replacement arguments for uniqueness, and also allows computing and transforming list-columns.
- Added function `ftransformv` to facilitate transforming selected columns with a function - a very efficient replacement for `dplyr::mutate_if` and `dplyr::mutate_at`.
- `frename` now allows additional arguments to be passed to a renaming function.
collapse version 1.3.1
collapse 1.3.1, released end of August 2020, is a minor patch for 1.3.0:
- Adjusted unit tests that fail on certain operating systems (mostly because of numeric precision issues). This update contains no changes to code or functionality.
collapse version 1.3.0
collapse 1.3.0, released mid August 2020, is another major update:
Changes to Functionality
- `dapply` and `BY` now drop all unnecessary attributes if `return = "matrix"` or `return = "data.frame"` are explicitly requested (the default `return = "same"` still seeks to preserve the input data structure).
- `unlist2d` now saves integer rownames if `row.names = TRUE` and a list of matrices without rownames is passed, and `id.factor = TRUE` generates a normal factor not an ordered factor. It is however possible to write `id.factor = "ordered"` to get an ordered factor id.
- `fdiff` argument `logdiff` renamed to `log`, and taking logs is now done in R (reduces size of C++ code and does not generate as many NaN's). `logdiff` may still be used, but it may be deactivated in the future. Also in the matrix and data.frame methods for `flag`, `fdiff` and `fgrowth`, columns are only stub-renamed if more than one lag/difference/growth rate is computed.
Additions
- Added `fnth` for fast (grouped, weighted) n'th element/quantile computations.
- Added `roworder(v)` and `colorder(v)` for fast row and column reordering.
- Added `frename` and `setrename` for fast and flexible renaming (by reference).
- Added function `fungroup`, as replacement for `dplyr::ungroup`, intended for use with `fgroup_by`.
- The shortcut `gvr` was created for `get_vars(..., regex = TRUE)`. Also a helper `.c` was introduced for non-standard concatenation (i.e. `.c(a, b) == c("a", "b")`).
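The additions above can be sketched on `mtcars` as follows. The object names and the new column name `miles` are illustrative only.

```r
library(collapse)

# fnth: n'th element / quantile; n = 0.5 gives the median
med <- fnth(c(5, 1, 3, 2, 4), 0.5)

# roworder / colorder: sort rows (a leading minus means descending), move columns to the front
sorted    <- roworder(mtcars, cyl, -mpg)
reordered <- colorder(mtcars, hp, cyl)

# frename: rename with oldname = newname
renamed <- frename(mtcars, mpg = miles)

# gvr: select columns by regular expression; .c: unquoted names to a character vector
d_cols <- gvr(mtcars, "^d")
chr    <- .c(a, b)
```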
Improvements
- `fmedian` now supports weights, computing a decently fast (grouped) weighted median based on radix ordering.
- `fmode` now has the option to compute min and max mode, the default is still simply the first mode.
- `fwithin` now supports quasi-demeaning (added argument `theta`) and can thus be used to manually estimate random-effects models.
- `fmode` and `fNdistinct` have become a bit faster.
- `fgroup_by` now preserves data.tables.
- `funique` is now generic with a default vector and data.frame method, providing fast unique values and rows of data. The default was changed to `sort = FALSE`.
- `ftransform` now also supports a data.frame as replacement argument, which automatically replaces matching columns and adds unmatched ones. Also `ftransform<-` was created as a more formal replacement method for this feature.
- `collap` columns selected through `cols` argument are returned in the order selected if `keep.col.order = FALSE`. Argument `sort.row` is deprecated, and replaced by argument `sort`. In addition the `decreasing` and `na.last` arguments were added and handed down to `GRP.default`.
- `radixorder` 'sorted' attribute is now always attached.
- `stats::D`, which is masked when collapse is attached, is now preserved through methods `D.expression` and `D.call`.
- `GRP` option `call = FALSE` to omit a call to `match.call` -> minor performance improvement.
- Several small performance improvements through rewriting some internal helper functions in C and reworking some R code.
- Performance improvements for some helper functions, `setRownames` / `setColnames`, `na_insert` etc.
- Increased scope of testing statistical functions. The functionality of the package is now secured by 7700 unit tests covering all central bits and pieces.
collapse version 1.2.1
collapse 1.2.1, released end of May 2020, is a patch for 1.2.0:
- Minor fixes for 1.2.0 issues that prevented correct installation on Mac OS X and a vignette rebuilding error on Solaris.
- `fmode.grouped_df` with groups and weights now saves the sum of the weights instead of the max (this makes more sense as the max only applies if all elements are unique).