[BUG] Spark 4.0.0 v2 Implementation Header and Data Address Options #986

@kaylee-misuraca

Description

Am I using the newest version of the library?

  • I have made sure that I'm using the latest version of the library.

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

  1. The v2 implementation does not appear to honor the header option. Whether the option is set or omitted, the result is the same: the header row is read as the first row of data.
  2. The v2 implementation does not read past the start cell given in the dataAddress option.

Expected Behavior

  1. I expect the header row to be used for the column names when the option is passed as header = True.
  2. I expect a start cell passed to the dataAddress option to return that cell plus all rows below it and all columns to its right.
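
To make the second expectation concrete, here is a minimal pure-Python sketch of the semantics described above. It is not spark-excel code and makes no claim about how the library is implemented. The helpers `parse_start_cell` and `read_from` are hypothetical, and the grid is made-up sample data, not the contents of issue_example.xlsx.

```python
import re


def parse_start_cell(data_address):
    """Split an address such as "'Sheet1'!A2" into (sheet, row_index, col_index).

    Indices are zero-based. Only a single start cell is handled in this sketch.
    """
    match = re.fullmatch(r"(?:'([^']+)'!)?([A-Za-z]+)([1-9]\d*)", data_address)
    if match is None:
        raise ValueError(f"unsupported data address: {data_address!r}")
    sheet, col_letters, row_number = match.groups()
    # Convert column letters to a number: A -> 1, Z -> 26, AA -> 27.
    col_number = 0
    for ch in col_letters.upper():
        col_number = col_number * 26 + (ord(ch) - ord("A") + 1)
    return sheet, int(row_number) - 1, col_number - 1


def read_from(grid, data_address, header=True):
    """Return (columns, rows) for the start cell, everything below and to the right."""
    _, row0, col0 = parse_start_cell(data_address)
    window = [row[col0:] for row in grid[row0:]]
    if header and window:
        # With header enabled, the first row of the window names the columns.
        return window[0], window[1:]
    return None, window


# Hypothetical sheet: a title row, then a header row, then two data rows.
grid = [
    ["title", None, None],
    ["id", "name", "qty"],
    [1, "a", 10],
    [2, "b", 20],
]

columns, rows = read_from(grid, "'Sheet1'!A2", header=True)
print(columns)
print(rows)
```

In this sketch, starting at A2 with the header enabled treats row 2 as the column names and returns every row beneath it, which is the behavior expected from the v2 reader.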

Steps To Reproduce

Read in downloaded file:

issue_example.xlsx

spark.read.format("excel").option("header", "true").load([downloaded file from above])

Attempt to start reading at second row:

spark.read.format("excel").option("header", "true").option("dataAddress", "'Sheet1'!A2").load([downloaded file from above])

Environment

- Spark version: 4.0.0
- Spark-Excel version: 4.0.0_0.31.2
- OS: Azure Databricks Ubuntu 24.04.2 LTS
- Cluster environment: DBR 17.0

Anything else?

The v1 implementation seems to be working as expected.
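
For anyone comparing the two implementations side by side, a small helper along these lines may be useful. It is a sketch under assumptions: `read_excel` is a hypothetical wrapper, it only chains the `format`, `option`, and `load` calls already used in the steps above, and the v1 source name has to be supplied by the caller because it depends on the installed release (older releases used `com.crealytics.spark.excel`; check the README for your version).

```python
def read_excel(spark, path, source="excel", header="true", data_address=None):
    """Read an Excel file through the given data source name.

    `source` defaults to "excel", the name used for the v2 reader in this issue.
    Pass the v1 source name explicitly to compare results.
    """
    reader = spark.read.format(source).option("header", header)
    if data_address is not None:
        # Only set dataAddress when the caller asked for a specific start cell.
        reader = reader.option("dataAddress", data_address)
    return reader.load(path)


# Example usage (requires a live SparkSession with spark-excel installed):
# v2_df = read_excel(spark, path, data_address="'Sheet1'!A2")
# v1_df = read_excel(spark, path, source=v1_source_name, data_address="'Sheet1'!A2")
# v2_df.show(); v1_df.show()
```

Running both reads against the same file and start cell makes it easy to see whether the column names and row counts differ between v1 and v2.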
