Timezone is ignored during I/O operations #140

@atsyplenkov

Description

I've just noticed some unexpected behaviour with POSIXct columns: nanoparquet completely ignores their timezone in both read_parquet and write_parquet. Below is a small reprex comparing the behaviour of the nanoparquet and arrow packages. As it shows, arrow::write_parquet preserves the timezone, while nanoparquet::write_parquet drops it and falls back to the default (UTC in my case).

Interestingly, nanoparquet::read_parquet also fails to recover the timezone from a Parquet file written with arrow::write_parquet, whereas arrow::read_parquet reads it as expected.

library(dplyr)
library(lubridate)
library(nanoparquet)
library(arrow)

set.seed(123)
df <-
  storms |>
  dplyr::slice_sample(n = 1000) |>
  dplyr::mutate(
    name,
    datetime = lubridate::make_datetime(year, month, day, hour, tz = "UTC"),
    datetime_tz = lubridate::with_tz(datetime, "America/New_York"),
    .keep = "none"
  )

# Compare timezones
waldo::compare(
  lubridate::tz(df$datetime),
  lubridate::tz(df$datetime_tz)
)
#> `old`: "UTC"             
#> `new`: "America/New_York"

# {nanoparquet} approach
tmp_nano <- tempfile(fileext = ".parquet")
nanoparquet::write_parquet(df, tmp_nano)

df_nano <- nanoparquet::read_parquet(tmp_nano)

waldo::compare(
  lubridate::tz(df_nano$datetime),
  lubridate::tz(df_nano$datetime_tz)
)
#> ✔ No differences

# {arrow} approach
tmp_arrow <- tempfile(fileext = ".parquet")
arrow::write_parquet(df, tmp_arrow)
df_arrow <- arrow::read_parquet(tmp_arrow)

waldo::compare(
  lubridate::tz(df_arrow$datetime),
  lubridate::tz(df_arrow$datetime_tz)
)
#> `old`: "UTC"             
#> `new`: "America/New_York"

# Read parquet file written with {nanoparquet} using {arrow}
arrow::read_parquet(tmp_nano) |>
  dplyr::pull(datetime_tz) |>
  lubridate::tz() # Should be America/New_York
#> [1] "UTC"

# Read parquet file written with {arrow} using {nanoparquet}
nanoparquet::read_parquet(tmp_arrow) |>
  dplyr::pull(datetime_tz) |>
  lubridate::tz() # Should be America/New_York
#> [1] "UTC"

Created on 2025-06-19 with reprex v2.1.1

I am using nanoparquet 0.4.2 and arrow 20.0.0.2.

Labels: feature (a feature request or enhancement)
