Breaking change: <haven_labelled> objects no longer support arithmetic — destroys reproducibility and punishes labelled workflows #785

@nickharrigan

Summary
I am reporting a major regression introduced through vctrs and propagated into haven. Code that worked for years with <haven_labelled> vectors now throws hard errors such as:

Error in `vec_arith()`:
! <haven_labelled> - <haven_labelled> is not permitted

This breaks reproducibility for researchers who invested heavily in labelled data workflows. What previously worked seamlessly (statistical tests, arithmetic, plotting) now forces us to strip labels and discard metadata.

What changed

  • In earlier versions of haven and vctrs, labelled vectors behaved as numeric in arithmetic contexts (e.g. t-tests, regression, means). Labels provided metadata, but math worked.
  • As of recent vctrs releases, arithmetic on <haven_labelled> is explicitly forbidden. Instead of coercion, code now halts with errors.
  • This breaks pipelines that previously ran without issue, especially in research where labelling is core to data integrity.
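
The issue carries the "needs a minimal reproducible example" label. A reprex along the following lines would let the maintainers see the failure directly. It assumes haven's `labelled()` constructor and two labelled numeric vectors that share the same value labels. Whether the subtraction errors or returns a result depends on the installed haven and vctrs versions. For that reason the script prints both versions and captures the outcome instead of halting.

```r
# Minimal reprex sketch: subtract one <haven_labelled> vector from another.
library(haven)

# Record the versions involved, since the reported behaviour is version-dependent.
print(packageVersion("haven"))
print(packageVersion("vctrs"))

# Two labelled numeric vectors sharing the same value labels.
x <- labelled(c(1, 2, 1), labels = c(Yes = 1, No = 2))
y <- labelled(c(2, 2, 1), labels = c(Yes = 1, No = 2))

# Capture either the result or the error message instead of stopping,
# so the outcome can be pasted into the issue as-is.
result <- tryCatch(x - y, error = function(e) conditionMessage(e))
print(result)
```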

Why this matters

I (and many others) have put huge effort into carefully labelling survey data (hundreds of variables, hundreds of hours).

Labelling is not decoration — it is intellectual work: preserving codebooks, ensuring correct interpretation, protecting against mistakes.

Under the new rules, all that investment becomes a liability. To continue analysis, I must either:

  • Strip all labels (zap_labels()), losing metadata, or
  • Rewrite large amounts of code to wrap every variable in as.numeric().

This is an unacceptable cost and undermines reproducibility. Code that produced results five years ago no longer runs today.

Impact on reproducibility

  • Published pipelines, teaching materials, and collaborative projects now fail.
  • Results cannot be regenerated without altering code and discarding metadata.
  • This undermines trust in R as a stable scientific environment.

Request / Proposal

  • Restore arithmetic support for <haven_labelled> objects when underlying values are numeric.
  • At minimum: allow safe coercion to numeric by default, with a warning if labels are present.
  • This was the historical behaviour and respected both metadata and usability.
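
Users can approximate the proposed "coerce with a warning" behaviour in their own code today. The sketch below is hypothetical: `coerce_labelled()` is not a haven or vctrs function. It builds stand-in labelled vectors with `structure()` so it runs without haven loaded. It assumes value labels live in the `"labels"` attribute, as haven stores them.

```r
# Hypothetical helper (not part of haven or vctrs): return the underlying
# numeric values of a labelled vector, warning when value labels are dropped.
coerce_labelled <- function(x) {
  labels <- attr(x, "labels", exact = TRUE)
  if (!is.null(labels)) {
    warning("Dropping ", length(labels), " value label(s) for arithmetic.",
            call. = FALSE)
  }
  # as.vector() on an atomic vector removes all remaining attributes.
  as.vector(unclass(x))
}

# Stand-ins for labelled doubles, mimicking haven's attribute layout (assumed).
x <- structure(c(1, 2, 1), labels = c(Yes = 1, No = 2),
               class = c("haven_labelled", "vctrs_vctr", "double"))
y <- structure(c(2, 2, 1), labels = c(Yes = 1, No = 2),
               class = c("haven_labelled", "vctrs_vctr", "double"))

d <- suppressWarnings(coerce_labelled(x) - coerce_labelled(y))
print(d)  # -1 0 0
```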

If you will not restore this behaviour, provide:

  • A global option (e.g. options(haven.arithmetic = "coerce"))
  • Or a helper (e.g. as_numeric_with_labels()) that strips labels at analysis time but keeps a retrievable dictionary.
  • Communicate clearly in release notes and migration guides that arithmetic on labelled data is now disabled, with explicit recommendations for migration.
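
The helper requested above could look roughly like the following minimal base-R sketch. It is not an existing haven function. `as_numeric_with_labels()` is only the hypothetical name from the proposal. The sketch assumes variable and value labels are stored in the `"label"` and `"labels"` attributes, as haven stores them. It returns the stripped data together with a per-column dictionary, so the metadata can still be looked up after analysis.

```r
# Hypothetical helper sketched from the proposal; not a haven function.
# Strips labels from every labelled column of a data frame and returns the
# plain data plus a dictionary of the labels that were removed.
as_numeric_with_labels <- function(df) {
  dictionary <- lapply(df, function(col) {
    list(label  = attr(col, "label",  exact = TRUE),
         labels = attr(col, "labels", exact = TRUE))
  })
  plain <- df
  plain[] <- lapply(df, function(col) {
    if (is.null(attr(col, "labels", exact = TRUE)) &&
        is.null(attr(col, "label", exact = TRUE))) {
      col                       # leave unlabelled columns untouched
    } else {
      as.vector(unclass(col))   # drop class and label attributes
    }
  })
  list(data = plain, dictionary = dictionary)
}

# Stand-in survey with one labelled column (attribute layout assumed).
survey <- data.frame(id = 1:3)
survey$q1 <- structure(c(1, 2, 1), labels = c(Yes = 1, No = 2),
                       label = "Agrees?",
                       class = c("haven_labelled", "vctrs_vctr", "double"))

res <- as_numeric_with_labels(survey)
print(mean(res$data$q1))           # plain numeric arithmetic: 1.333333
print(res$dictionary$q1$labels)    # value labels still retrievable
```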

Closing
This is not a minor breaking change. It directly punishes users who took the time to carefully label data, and forces us to choose between junking labels and junking code. That is hostile to research workflows.

I urge the maintainers to consider how this decision impacts reproducibility, and to provide a path forward that does not discard the enormous investment researchers have made in labelled datasets.

Metadata

Labels: reprex (needs a minimal reproducible example)
