Excel formatted Date come out as 2 year digit even when the Excel date format is 4 year digit #879

@tamilselvanyes

Am I using the newest version of the library?

  • I have made sure that I'm using the latest version of the library.

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When reading an Excel file that contains dates in the format MM/DD/YYYY with the code below, the dates come back with a 2-digit year:

data_frame = (
    spark.read.format("excel")
    .option("header", "true")
    .option("maxByteArraySize", 2147483647)
    .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
    .option("setErrorCellsToFallbackValues", "true")
    .option("maxRowsInMemory", 200)
    .load('ExcelReaderProblemExcel')
)

Data frame result:
[Row(Date MM/DD/YYYY='3/29/20'),
Row(Date MM/DD/YYYY='3/14/21'),
Row(Date MM/DD/YYYY='3/15/12'),
Row(Date MM/DD/YYYY='3/16/00'),
Row(Date MM/DD/YYYY='3/29/04'),
Row(Date MM/DD/YYYY='3/29/04'),
]
[Row(UTF-8 strings='Portégé'),
Row(UTF-8 strings='Portégé'),
Row(UTF-8 strings='Portégé'),
Row(UTF-8 strings='Portégé'),
Row(UTF-8 strings='Portégé'),
Row(UTF-8 strings='Portégé'),
]

Since the file contains dates from 2100, I cannot use the above dates directly: a 2-digit year does not say which century a date belongs to.
ExcelReaderProblemExcel.xlsx
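
To illustrate why the 2-digit years cannot simply be parsed back, here is a minimal sketch in plain Python (no Spark needed). It assumes the strings are parsed with `datetime.strptime` and `%y`; the 2100-03-16 value is a hypothetical example, since the output above does not show which rows are from 2100.

```python
from datetime import datetime

# Strings exactly as they appear in the data frame result above.
two_digit = ["3/29/20", "3/14/21", "3/15/12", "3/16/00", "3/29/04"]


def parse_two_digit(value: str) -> datetime:
    # %y uses the POSIX pivot: 00-68 -> 2000-2068, 69-99 -> 1969-1999.
    return datetime.strptime(value, "%m/%d/%y")


years = [parse_two_digit(v).year for v in two_digit]
print(years)

# Hypothetical date from 2100 (assumption, for illustration only).
original = datetime(2100, 3, 16)
rendered = original.strftime("%m/%d/%y")   # the century is dropped here
round_trip = parse_two_digit(rendered)
print(original.year, "->", rendered, "->", round_trip.year)
```

Every value lands in 2000-2068 (or 1969-1999), so a date from 2100 silently comes back as one from 2000. The century has to come from the cell value itself, not from the formatted string.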

Expected Behavior

The date string should come out the same as it is shown in Excel, with a 4-digit year.
Instead, the date string comes out with only a 2-digit year.

Steps To Reproduce

No response

Environment

- Spark version: 3.5.0
- Spark-Excel version: spark-excel_2.12-3.5.0_0.20.3
- OS: Windows
- Cluster environment: -

Anything else?

There is a similar issue, which is still open:

#351
