Description
Am I using the newest version of the library?
- I have made sure that I'm using the latest version of the library.
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
When reading an Excel file that contains dates in the format MM/DD/YYYY with the code below, the year comes back with only two digits:
data_frame = (
    spark.read.format("excel")
    .option("header", "true")
    .option("maxByteArraySize", 2147483647)
    .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
    .option("setErrorCellsToFallbackValues", "true")
    .option("maxRowsInMemory", 200)
    .load('ExcelReaderProblemExcel')
)
Data frame result:
[Row(Date MM/DD/YYYY='3/29/20'),
Row(Date MM/DD/YYYY='3/14/21'),
Row(Date MM/DD/YYYY='3/15/12'),
Row(Date MM/DD/YYYY='3/16/00'),
Row(Date MM/DD/YYYY='3/29/04'),
Row(Date MM/DD/YYYY='3/29/04'),
]
[Row(UTF-8 strings='Portégé'),
Row(UTF-8 strings='Portégé'),
Row(UTF-8 strings='Portégé'),
Row(UTF-8 strings='Portégé'),
Row(UTF-8 strings='Portégé'),
Row(UTF-8 strings='Portégé'),
]
Since the data includes dates from 2100, I cannot use the dates above directly: a two-digit year does not tell 2000 apart from 2100.
ExcelReaderProblemExcel.xlsx
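To illustrate why the two-digit years in the result above cannot be recovered after the fact, here is a minimal sketch in plain Python (no Spark involved). It assumes the strings are re-parsed with the standard `%y` directive, which maps 00-68 to 2000-2068 and 69-99 to 1969-1999. The helper name `parse_two_digit_year` is hypothetical and used only for this illustration.

```python
from datetime import datetime


def parse_two_digit_year(text: str) -> datetime:
    """Parse an M/D/YY string using Python's %y pivot rule (hypothetical helper)."""
    return datetime.strptime(text, "%m/%d/%y")


# Strings taken from the data frame result in this report.
samples = ["3/29/20", "3/16/00", "3/29/04"]
parsed = [parse_two_digit_year(s) for s in samples]

for s, p in zip(samples, parsed):
    # Every value lands in 1969-2068, so a cell that really holds a
    # date in 2100 would be silently read back as 2000.
    print(s, "->", p.date().isoformat())
```

Whatever pivot rule is chosen, the century has to be guessed, which is why the four-digit year needs to survive the read itself.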
Expected Behavior
The date string should come out the same as it is shown in Excel, with a four-digit year.
Instead, it comes out with only a two-digit year.
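As a rough sketch of the expected output, the snippet below formats a date with a four-digit year. It assumes the cell holds a standard Excel serial date in the 1900 date system (days counted from 1899-12-30, valid for dates after February 1900). This is an assumption about the workbook, not something confirmed by the report, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Day zero of the Excel 1900 date system as commonly used for conversion.
# Assumption: the workbook does not use the 1904 date system.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial: int) -> date:
    """Convert an Excel serial day number to a date (hypothetical helper)."""
    return EXCEL_EPOCH + timedelta(days=serial)


def format_mdy_four_digit(d: date) -> str:
    """Render a date as M/D/YYYY, keeping the full four-digit year."""
    return f"{d.month}/{d.day}/{d.year:04d}"


# A date in 2100 stays distinguishable from the same day in 2000.
print(format_mdy_four_digit(date(2100, 3, 16)))
print(format_mdy_four_digit(date(2000, 3, 16)))
```

The point of the sketch is only that the stored value carries the full year, so a string rendering with four digits is possible without guessing the century.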
Steps To Reproduce
No response
Environment
- Spark version: 3.5.0
- Spark-Excel version: spark-excel_2.12-3.5.0_0.20.3
- OS: Windows
- Cluster environment:
Anything else?
There is a similar issue related to this one, which is still open.
