TFRecord files are too big! 10x the size of Parquet #47

See similar GitHub issues:
https://github.com/tensorflow/ecosystem/issues/61#issuecomment-363577011
https://github.com/tensorflow/ecosystem/issues/61
https://github.com/tensorflow/ecosystem/issues/106

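This is how I'm writing a PySpark DataFrame to TFRecords in an S3 bucket:
```python
# Assumes `df` is an existing PySpark DataFrame and that a Spark data source
# registered as "tfrecord" (e.g. LinkedIn's spark-tfrecord) is on the classpath.
s3_path = "s3://Shuks/dataframe_tf_records"
df.write.mode("overwrite").format("tfrecord").option("recordType", "Example").save(s3_path)
```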

This creates a new key ("directory") on S3 at s3://Shuks/dataframe_tf_records/, and all the TFRecord files are written under it.

How do I specify a compression type during the conversion?
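A minimal sketch of one possible approach, under assumptions: it assumes the `tfrecord` data source accepts a `codec` write option that takes a Hadoop compression codec class name (spark-tensorflow-connector exposes an option of this name; check the README of the connector you actually use before relying on it). `write_tfrecords` and `GZIP_CODEC` are hypothetical names introduced only for illustration.

```python
# Hypothetical helper. ASSUMPTION: the "tfrecord" data source honours a
# "codec" option naming a Hadoop compression codec class.
GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec"


def write_tfrecords(df, path, codec=GZIP_CODEC, record_type="Example"):
    """Write `df` as TFRecords under `path`, compressing part files with `codec`.

    Pass codec=None to write uncompressed files (the original behaviour).
    """
    writer = (
        df.write.mode("overwrite")
        .format("tfrecord")
        .option("recordType", record_type)
    )
    if codec:
        # Only add the compression option when a codec was requested.
        writer = writer.option("codec", codec)
    writer.save(path)
```

Usage would be `write_tfrecords(df, "s3://Shuks/dataframe_tf_records")`. If the part files come out gzip-compressed, the reader has to be told as well, for example `tf.data.TFRecordDataset(filenames, compression_type="GZIP")` in TensorFlow.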
