Description
Following a comment in #104:
> I wonder why the default value of `class` is `c("tbl", "data.frame")` and not `c("tbl_df", "tbl", "data.frame")`? It seems to me that `tbl` is more of an "interface" than a class, since all actual instances of `tbl`s that I have encountered previously are actually also other classes such as either `tbl_df` or more complex types (such as those found in `dbplyr`). And the returned object is structurally a `tbl_df`, right, so it would be useful for any functions taking it to be able to use the most specific class information?
I arrived here because of this:
```r
tmp <- tempfile(fileext = ".parquet")
tibble::tibble(a = 1) |>
  nanoparquet::write_parquet(tmp)
df <- nanoparquet::read_parquet(tmp)
class(df)
#> [1] "tbl"        "data.frame"
class(tibble::as_tibble(df))
#> [1] "tbl_df"     "tbl"        "data.frame"
df |>
  dplyr::summarise() |>
  class()
#> [1] "data.frame"
df |>
  tibble::as_tibble() |>
  dplyr::summarise() |>
  class()
#> [1] "tbl_df"     "tbl"        "data.frame"
```

Created on 2025-07-16 with reprex v2.1.1
Functions like `dplyr::summarise()` use `tibble::is_tibble()` internally to check whether something is a tibble, but `is_tibble()` checks specifically for the `tbl_df` class, not `tbl`. Because that check returns `FALSE` for the object from `read_parquet()`, the result is silently downgraded to a plain `data.frame`.
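A minimal sketch of the distinction, assuming only the tibble package is installed. The object below is built by hand with `structure()` to mimic the class vector that `read_parquet()` returned in the reprex. No Parquet file is read here, so it is a stand-in, not the actual nanoparquet output.

```r
# Hand-built stand-in for the read_parquet() result (assumption: same class
# vector as in the reprex above; no Parquet file is involved)
df <- structure(data.frame(a = 1), class = c("tbl", "data.frame"))

inherits(df, "tbl")
#> [1] TRUE
tibble::is_tibble(df)    # looks for "tbl_df", which is absent here
#> [1] FALSE

# Converting to a full tibble prepends "tbl_df" to the class vector
df2 <- tibble::as_tibble(df)
class(df2)
#> [1] "tbl_df"     "tbl"        "data.frame"
tibble::is_tibble(df2)
#> [1] TRUE
```

The generic `tbl` class alone is not enough for `is_tibble()`; only the presence of `tbl_df` makes the check pass.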
Is there any reason why the object returned by `read_parquet()` doesn't have the more specific `tbl_df` class? If it did, situations like the one I described would be clearer and less error-prone, because users wouldn't have to add this class manually with `tibble::as_tibble()`.