Open
Labels: feature (a feature request or enhancement)
Description
I've just noticed some unexpected behaviour with POSIXct columns: nanoparquet completely ignores the time zone attribute (tzone) in both read_parquet and write_parquet. Below is a small reprex comparing nanoparquet with arrow. As you can see, arrow::write_parquet preserves the time zone, while nanoparquet::write_parquet drops it and falls back to the default (UTC in my case).
Interestingly, nanoparquet::read_parquet also fails to pick up the time zone from a file written with arrow::write_parquet, whereas arrow::read_parquet reads it as expected.
library(dplyr)
library(lubridate)
library(nanoparquet)
library(arrow)

set.seed(123)

df <-
  storms |>
  dplyr::slice_sample(n = 1000) |>
  dplyr::mutate(
    name,
    datetime = lubridate::make_datetime(year, month, day, hour, tz = "UTC"),
    datetime_tz = lubridate::with_tz(datetime, "America/New_York"),
    .keep = "none"
  )

# Compare timezones
waldo::compare(
  lubridate::tz(df$datetime),
  lubridate::tz(df$datetime_tz)
)
#> `old`: "UTC"
#> `new`: "America/New_York"

# {nanoparquet} approach
tmp_nano <- tempfile(fileext = ".parquet")
nanoparquet::write_parquet(df, tmp_nano)
df_nano <- nanoparquet::read_parquet(tmp_nano)
waldo::compare(
  lubridate::tz(df_nano$datetime),
  lubridate::tz(df_nano$datetime_tz)
)
#> ✔ No differences

# {arrow} approach
tmp_arrow <- tempfile(fileext = ".parquet")
arrow::write_parquet(df, tmp_arrow)
df_arrow <- arrow::read_parquet(tmp_arrow)
waldo::compare(
  lubridate::tz(df_arrow$datetime),
  lubridate::tz(df_arrow$datetime_tz)
)
#> `old`: "UTC"
#> `new`: "America/New_York"

# Read parquet file written with {nanoparquet} using {arrow}
arrow::read_parquet(tmp_nano) |>
  dplyr::pull(datetime_tz) |>
  lubridate::tz() # Should be America/New_York
#> [1] "UTC"

# Read parquet file written with {arrow} using {nanoparquet}
nanoparquet::read_parquet(tmp_arrow) |>
  dplyr::pull(datetime_tz) |>
  lubridate::tz() # Should be America/New_York
#> [1] "UTC"

Created on 2025-06-19 with reprex v2.1.1
I am using nanoparquet 0.4.2 and arrow 20.0.0.2.
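Until this is supported, a possible stopgap is to reapply the time zone after reading. The sketch below uses base R only and assumes that the instants themselves survive the round trip and only the tzone attribute is reset to UTC, which the reprex above does not check. The helper restore_tz and its tz_map argument are hypothetical names made up for illustration; they are not part of nanoparquet or arrow.

```r
# Hypothetical helper (not part of nanoparquet): reapply time zones to
# POSIXct columns after reading a file.
# `tz_map` is a named character vector: column name -> time zone name.
restore_tz <- function(df, tz_map) {
  for (col in names(tz_map)) {
    stopifnot(inherits(df[[col]], "POSIXct"))
    # Changing "tzone" only affects how the instant is displayed;
    # the underlying seconds since the epoch stay the same.
    attr(df[[col]], "tzone") <- tz_map[[col]]
  }
  df
}

original <- data.frame(
  datetime_tz = as.POSIXct("2025-06-19 12:00:00", tz = "America/New_York")
)

# Simulate the reported round trip (assumption: same instant, tzone now UTC).
round_tripped <- original
attr(round_tripped$datetime_tz, "tzone") <- "UTC"

fixed <- restore_tz(round_tripped, c(datetime_tz = "America/New_York"))
attr(fixed$datetime_tz, "tzone")
identical(as.numeric(fixed$datetime_tz), as.numeric(original$datetime_tz))
```

This only helps when the intended time zone of each column is known in advance, so it does not replace having the time zone stored in and read back from the Parquet file itself.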