- Use stingy instead of frugal (#594).
- Improved support for handling large data from files and S3: ingestion with `read_parquet_duckdb()` and others, and materialization with `as_duckdb_tibble()`, `compute.duckplyr_df()` and `compute_file()`. See `vignette("large")` for details.
- Control automatic materialization of duckplyr frames with the new `prudence` argument to `as_duckdb_tibble()`, `duckdb_tibble()`, `compute.duckplyr_df()` and `compute_file()`. See `vignette("prudence")` for details.
- `read_csv_duckdb()` and others, deprecating `duckplyr_df_from_csv()` and `df_from_csv()` (#210, #396, #459).
- `read_sql_duckdb()` (experimental) to run SQL queries against the default DuckDB connection and return the result as a duckplyr frame (duckdb/duckdb-r#32, #397).
- `db_exec()` to execute configuration queries against the default duckdb connection (#39, #165, #227, #404, #459).
- `duckdb_tibble()` (#382, #457).
- `as_duckdb_tibble()`, replaces `as_duckplyr_tibble()` and `as_duckplyr_df()` (#383, #457) and supports dbplyr connections to a duckdb database (#86, #211, #226).
- `compute_parquet()` and `compute_csv()`, implement `compute.duckplyr_df()` (#409, #430).
- `fallback_config()` to create a configuration file for the settings that do not affect behavior (#216, #426).
- `is_duckdb_tibble()`, deprecates `is_duckplyr_df()` (#391, #392).
- `last_rel()` to retrieve the last relation object used in materialization (#209, #375).
- Add `"prudent_duckplyr_df"` class that stops automatic materialization and requires `collect()` (#381, #390).
- Partial support for `across()` in `mutate()` and `summarise()` (#296, #306, #318, @lionel-, @DavisVaughan).
- Implement `na.rm` handling for `sum()`, `min()`, `max()`, `any()` and `all()`, with fallback for window functions (#205, #566).
- Add support for `sub()` and `gsub()` (@toppyy, #420).
- Handle `dplyr::desc()` (#550).
- Avoid forwarding `is.na()` to `is.nan()` to support non-numeric data, avoid checking roundtrip for timestamp data (#482).
- Correctly handle missing values in `if_else()`.
- Limit number of items that can be handled with `%in%` (#319).
- `duckdb_tibble()` checks if columns can be represented in DuckDB (#537).
- Fall back to dplyr when passing `multiple` with joins (#323).
- Improve fallback error message by explicitly materializing (#432, #456).
- Point to the native CSV reader if encountering data frames read with readr (#127, #469).
- Improve `as_duckdb_tibble()` error message for invalid `x` (@maelle, #339).
- Depend on dplyr instead of reexporting all generics (#405). Nothing changes for users in scripts. When using duckplyr in a package, you now also need to import dplyr.
- Fallback logging is now on by default, can be disabled with configuration (#422).
- The default DuckDB connection is now based on a file, the location defaults to a subdirectory of `tempdir()` and can be controlled with the `DUCKPLYR_TEMP_DIR` environment variable (#439, #448, #561).
- `collect()` returns a tibble (#438, #447).
- `explain()` returns the input, invisibly (#331).
- Compute ptype only for join columns in a safe way without materialization, not for the entire data frame (#289).
- Internal `expr_scrub()` (used for telemetry) can handle function-definitions (@toppyy, #268, #271).
- Harden telemetry code against invalid arguments (#321).
- New articles: `vignette("large")`, `vignette("prudence")`, `vignette("fallback")`, `vignette("limits")`, `vignette("developers")`, `vignette("telemetry")` (#207, #504).
- New `flights_df()` used instead of `palmerpenguins::penguins` (#408).
- Move to the tidyverse GitHub organization, new repository URL https://github.com/tidyverse/duckplyr/ (#225).
- Avoid base pipe in examples for compatibility with R 4.0.0 (#463, #466).
- Comparison expressions are translated in a way that allows them to be pushed down to Parquet (@toppyy, #270).
- Printing a duckplyr frame no longer materializes (#255, #378).
- Prefer `vctrs::new_data_frame()` over `tibble()` (#500).
- `df_from_file()` and related functions support multiple files (#194, #195), show a clear error message for non-string `path` arguments (#182), and create a tibble by default (#177).
- New `as_duckplyr_tibble()` to convert a data frame to a duckplyr tibble (#177).
- Support descending sort for character and other non-numeric data (@toppyy, #92, #175).
- Avoid setting memory limit (#193).
- Check compatibility of join columns (#168, #185).
- Explicitly list supported functions, add contributing guide, add analysis scripts for GitHub activity data (#179).
- Add contributing guide (#179).
- Show a startup message at package load if telemetry is not configured (#188, #198).
- `?df_from_file` shows how to read multiple files (#181, #186) and how to specify CSV column types (#140, #189), and is shown correctly in reference index (#173, #190).
- Discuss dbplyr in README (#145, #191).
- Add analysis scripts for GitHub activity data (#179).
- Use built-in rfuns extension to implement equality and inequality operators, improve translation for `as.integer()`, `NA` and `%in%` (#83, #154, #148, #155, #159, #160).
- Reexport non-deprecated dplyr functions (#144, #163).
- `library(duckplyr)` calls `methods_overwrite()` (#164).
- Only allow constant patterns in `grepl()`.
- Explicitly reject calls with named arguments for now.
- Reduce default memory limit to 1 GB.
- Stricter type checks in the set operations `intersect()`, `setdiff()`, `symdiff()`, `union()`, and `union_all()` (#169).
- Distinguish between constant `NA` and those used in an expression (#157).
- `head(-1)` forwards to the default implementation (#131, #156).
- Fix cli syntax for internal error message (#151).
- More careful detection of row names in data frame.
- Always check roundtrip for timestamp columns.
- `left_join()` and other join functions call `auto_copy()`.
- Only reset expression depth if it has been set before.
- Require fallback if the result contains duplicate column names when ignoring case.
- `row_number()` returns integer.
- `is.na(NaN)` is `TRUE`.
- `summarise(count = n(), count = n())` creates only one column named `count`.
- Correct wording in instructions for enabling fallback logging (@TimTaylor, #141).
- Remove styler dependency (#137, #138).
- Avoid error from stats collection.
- Mention wildcards to read multiple files in `?df_from_file` (@andreranza, #133, #134).
- Reenable tests that now run successfully (#166).
- Synchronize tests (#153).
- Test that `vec_ptype()` does not materialize (#149).
- Improve telemetry tests.
- Promote equality checks to `expect_identical()` to capture differences between doubles and integers.
- Run autoupload in function so that it will be checked by static analysis (#122).
- New `df_to_parquet()` to write to Parquet, new convenience functions `df_from_csv()`, `duckdb_df_from_csv()`, `df_from_parquet()` and `duckdb_df_from_parquet()` (#87, #89, #96, #128).
- Forbid reuse of new columns created in `summarise()` (#72, #106).
- `summarise()` no longer restores subclass.
- Disambiguate computation of `log10()` and `log()`.
- Fix division by zero for positive and negative numbers.
- New `fallback_sitrep()` and related functionality for collecting telemetry data (#102, #107, #110, #111, #115). No data is collected by default, only a message is displayed once per session and then every eight hours. Opt in or opt out by setting environment variables.
- Implement `group_by()` and other methods to collect fallback information (#94, #104, #105).
- Set memory limit and temporary directory for duckdb.
- Implement `suppressWarnings()` as the identity function.
- Prefer `cli::cli_abort()` over `stop()` or `rlang::abort()` (#114).
- Translate `.data$a` and `.env$a`.
- Strict checks for column class, only supporting `integer`, `numeric`, `logical`, `Date`, `POSIXct`, and `difftime` for now.
- If the environment variable `DUCKPLYR_METHODS_OVERWRITE` is set to `TRUE`, loading duckplyr automatically calls `methods_overwrite()`.
- Better duckdb tests.
- Use standalone purrr for dplyr compatibility.
- Add tests for correct base of `log()` and `log10()`.
- `methods_overwrite()` and `methods_restore()` show a message.
- `grepl(x = NA)` gives correct results.
- Fix `auto_copy()` for non-data-frame input.
- Add output order preservation for filters.
- `distinct()` now preserves order in corner cases (#77, #78).
- Consistent computation of `log(0)` and `log(-1)` (#75, #76).
- Only allow constants in `mutate()` that are actually representable in duckdb (#73).
- Avoid translating `ifelse()`, support `if_else()` (#79).
- Separate and explain the new relational examples (@wibeasley, #84).
- Add test that TPC-H queries can be processed.
- Sync with dplyr 1.1.4 (#82).
- Remove `dplyr_reconstruct()` method (#48).
- Render README.
- Fix code generated by `meta_replay()`.
- Bump constructive dependency.
- Fix output order for `arrange()` in case of ties.
- Update duckdb tests.
- Only implement newer `slice_sample()`, not `sample_n()` or `sample_frac()` (#74).
- Sync generated files (#71).
- Join using `IS NOT DISTINCT FROM` for faster execution (duckdb/duckdb-r#41, #68).
- Add stability to README output (@maelle, #62, #65).
- `summarise()` keeps `"duckplyr_df"` class (#63, #64).
- Fix compatibility with duckdb >= 0.9.1.
- Skip tests that give different output on dev tidyselect.
- Import `utils::globalVariables()`.
- Small README improvements (@maelle, #34, #57).
- Fix 301 in README.
- Improve documentation.
- Work around problem with `dplyr_reconstruct()` in R 4.3.
- Rename `duckdb_from_file()` to `df_from_file()`.
- Unexport private `duckdb_rel_from_df()`, `rel_from_df()`, `wrap_df()` and `wrap_integer()`.
- Reexport `%>%` and `tibble()`.
- Implement relational API for DuckDB.
- Fix examples.
- Add CRAN install instructions.
- Satisfy `R CMD check`.
- Document argument.
- Error on NOTE.
- Remove `relexpr_window()` for now.
- Clean up reference.
Initial version, exporting:
- `new_relational()` to construct objects of class `"relational"`
- Generics `rel_aggregate()`, `rel_distinct()`, `rel_filter()`, `rel_join()`, `rel_limit()`, `rel_names()`, `rel_order()`, `rel_project()`, `rel_set_diff()`, `rel_set_intersect()`, `rel_set_symdiff()`, `rel_to_df()`, `rel_union_all()`
- `new_relexpr()` to construct objects of class `"relational_relexpr"`
- Expression builders `relexpr_constant()`, `relexpr_function()`, `relexpr_reference()`, `relexpr_set_alias()`, `relexpr_window()`