
FR: Make read_parquet() return a tibble of class tbl_df #144

@lbm364dl


Following a comment in #104:

I wonder why the default value of class is c("tbl", "data.frame") and not c("tbl_df", "tbl", "data.frame")? It seems to me that tbl is more of an "interface" than a class, since all actual instances of tbls that I have encountered previously are actually also other classes such as either tbl_df or more complex types (such as those found in dbplyr). And the returned object is structurally a tbl_df, right, so it would be useful for any functions taking it to be able to use the most specific class information?

I arrived here because of this:

tmp <- tempfile(fileext = ".parquet")

tibble::tibble(a = 1) |>
  nanoparquet::write_parquet(tmp)

df <- nanoparquet::read_parquet(tmp)

class(df)
#> [1] "tbl"        "data.frame"
class(tibble::as_tibble(df))
#> [1] "tbl_df"     "tbl"        "data.frame"

df |>
  dplyr::summarise() |>
  class()
#> [1] "data.frame"

df |>
  tibble::as_tibble() |>
  dplyr::summarise() |>
  class()
#> [1] "tbl_df"     "tbl"        "data.frame"

Created on 2025-07-16 with reprex v2.1.1

Functions like dplyr::summarise() use tibble::is_tibble() internally to check whether their input is a tibble, and that function checks specifically for class tbl_df, not tbl. Because the check returns FALSE, the tibble is silently converted to a plain data.frame.
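A minimal base-R sketch of the class check described above, with no packages needed. It assumes (as far as I know) that tibble::is_tibble() boils down to inherits(x, "tbl_df"); the object `x` is a hypothetical stand-in for what read_parquet() currently returns, not the actual nanoparquet output.

```r
# Hypothetical stand-in for the current read_parquet() result:
# a plain data frame whose class vector is only c("tbl", "data.frame").
x <- structure(
  list(a = 1),
  class = c("tbl", "data.frame"),
  row.names = 1L
)

# Assumed equivalent of tibble::is_tibble(): it looks for "tbl_df",
# so carrying "tbl" alone is not enough.
inherits(x, "tbl")     # TRUE
inherits(x, "tbl_df")  # FALSE

# Adding the more specific class by hand, which is what
# tibble::as_tibble() ends up providing in the reprex, makes the check pass.
class(x) <- c("tbl_df", "tbl", "data.frame")
inherits(x, "tbl_df")  # TRUE
```

This is what the request amounts to: have read_parquet() include "tbl_df" in the class vector by default so the last step is unnecessary.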

Is there any reason why the object returned by read_parquet() doesn't have the more specific tbl_df class? If it did, situations like the one I described would be clearer and less error-prone, i.e., users wouldn't have to add this class manually with tibble::as_tibble().
