Commit b10386e

serramatutu, lidavidm, jorisvandenbossche, and felipecrv authored
apacheGH-44248: [Format] Add TimestampWithOffset canonical extension type (apache#48002)
### Rationale for this change

Closes apache#44248

Arrow has no built-in canonical way of representing the `TIMESTAMP WITH TIME ZONE` SQL type, which is present across multiple different database systems. Not having a native way to represent this forces users to either convert to UTC and drop the time zone, which may have correctness implications, or use bespoke workarounds. A new `arrow.timestamp_with_offset` extension type would introduce a standard canonical way of representing that information.

Rust implementation: apache/arrow-rs#8743

Go implementation: apache/arrow-go#558

[DISCUSS] [thread in the mailing list](https://lists.apache.org/thread/yhbr3rj9l59yoxv92o2s6dqlop16sfnk).

### What changes are included in this PR?

Proposal and documentation for `arrow.timestamp_with_offset` canonical extension type.

### Are these changes tested?

N/A

### Are there any user-facing changes?

Yes, this is an extension to the arrow format.

* GitHub Issue: apache#44248

---------

Co-authored-by: David Li <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Felipe Oliveira Carvalho <[email protected]>
1 parent ca5cb92 commit b10386e

1 file changed, +27 -0 lines changed
docs/source/format/CanonicalExtensions.rst

Lines changed: 27 additions & 0 deletions
@@ -544,6 +544,33 @@ Primitive Type Mappings
 | UUID extension type  | UUID                   |
 +----------------------+------------------------+
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=====================
+This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets normally range from -779 (-12:59) to +780 (+13:00).
+
+* Extension type parameters:
+
+  This type does not have any parameters.
+
+* Description of the serialization:
+
+  Extension metadata is an empty string.
+
+.. note::
+
+   It is also *permissible* for the ``offset_minutes`` field to be dictionary-encoded or run-end-encoded.
+
 Community Extension Types
 =========================
 
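
The storage layout added in this diff pairs a UTC instant with the original offset in minutes. As a rough illustration of what that means for one value, the sketch below splits a timezone-aware Python ``datetime`` into such a pair and reconstructs it. It uses only the standard library and does not touch any Arrow API; the helper names ``encode`` and ``decode`` and the choice of microseconds as the time unit are assumptions made for this example, not part of the proposal.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Hypothetical helpers for illustration only; not an Arrow API.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode(local: datetime) -> Tuple[int, int]:
    """Split an aware datetime into (UTC microseconds since epoch, offset in minutes)."""
    offset = local.utcoffset()
    if offset is None:
        raise ValueError("expected a timezone-aware datetime")
    # Whole minutes east (positive) or west (negative) of UTC.
    offset_minutes = offset // timedelta(minutes=1)
    # Subtracting two aware datetimes compares the underlying instants.
    timestamp_us = (local - EPOCH) // timedelta(microseconds=1)
    return timestamp_us, offset_minutes


def decode(timestamp_us: int, offset_minutes: int) -> datetime:
    """Rebuild the original wall-clock time from the stored pair."""
    utc = EPOCH + timedelta(microseconds=timestamp_us)
    return utc.astimezone(timezone(timedelta(minutes=offset_minutes)))


local = datetime(2024, 3, 10, 9, 30, tzinfo=timezone(timedelta(hours=-5)))
timestamp_us, offset_minutes = encode(local)
restored = decode(timestamp_us, offset_minutes)
print(offset_minutes, restored.isoformat())
```

Because the instant is kept in UTC, two values with different offsets can still be compared or sorted on the ``timestamp`` field alone, while ``offset_minutes`` preserves the wall-clock presentation.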

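
The note in the diff allows ``offset_minutes`` to be dictionary-encoded or run-end-encoded. One plausible reason is that offsets tend to repeat across long stretches of a column. The pure-Python sketch below shows the idea behind run-end encoding, where each run is stored as a value plus the cumulative index at which the run ends. It is an illustration of the concept only: the function names are hypothetical, and it does not produce or consume real Arrow arrays.

```python
from typing import List, Tuple


def run_end_encode(values: List[int]) -> Tuple[List[int], List[int]]:
    """Collapse consecutive repeats into (run_ends, run_values)."""
    run_ends: List[int] = []
    run_values: List[int] = []
    for i, value in enumerate(values):
        if run_values and run_values[-1] == value:
            # Same value as the current run: extend the run's end position.
            run_ends[-1] = i + 1
        else:
            # New run: record its value and its (exclusive) end position.
            run_values.append(value)
            run_ends.append(i + 1)
    return run_ends, run_values


def run_end_decode(run_ends: List[int], run_values: List[int]) -> List[int]:
    """Expand (run_ends, run_values) back into the full list."""
    out: List[int] = []
    start = 0
    for end, value in zip(run_ends, run_values):
        out.extend([value] * (end - start))
        start = end
    return out


# Offsets in minutes for six rows: three at -05:00, two at +01:00, one at -05:00.
offsets = [-300, -300, -300, 60, 60, -300]
run_ends, run_values = run_end_encode(offsets)
print(run_ends, run_values)
```

For a column where most rows share one offset, the encoded form stays small regardless of the row count, which is why allowing these encodings for ``offset_minutes`` can be attractive.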