For TFDS 4.9.7 on Dataflow 2.60.0, I have a company-internal Dataflow job that fails. The input collection to train_write/GroupShards reports:

Elements added: 332,090
Estimated size: 1.74 TB

while the output collection reports:

Elements added: 2
Estimated size: 1.8 GB

and the job then fails on the next element with:

"E0123 207 recordwriter.cc:401] Record exceeds maximum record size (1096571470 > 1073741823)."
Workaround
By installing the TFDS prerelease after 3700745 and setting --num_shards=4096 explicitly (auto-detection chose 2048), the DatasetBuilder runs to completion on Dataflow. I'm curious why the auto-detection didn't choose more file shards, however, since all training examples should be roughly the same size in this DatasetBuilder.
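As a rough model of why the auto-detection might stop at ~2048: a size-based heuristic that targets 0.9 × 1 GiB per shard needs only about 1800 shards for this input. This is a simplified sketch, not the actual shard_utils logic; the 1 GiB cap, the 0.9 factor (the constant the suggested fix below asks about), and the rounding are assumptions on my part:

```python
import math

# Simplified sketch (assumed, not the actual TFDS shard_utils logic): choose
# the smallest shard count whose average shard stays under a target size.

TOTAL_BYTES = 1.74e12          # input collection size from the job counters
MAX_SHARD_SIZE = 1 << 30       # assumed 1 GiB cap, matching the record limit
TARGET = 0.9 * MAX_SHARD_SIZE  # assumed headroom factor (see suggested fix)

min_shards = math.ceil(TOTAL_BYTES / TARGET)
print(f"~{TARGET / 2**20:.0f} MiB target -> at least {min_shards} shards")
print(f"explicit 4096 shards -> average ~{TOTAL_BYTES / 4096 / 2**20:.0f} MiB per shard")
```

Under this model, 2048 is a plausible outcome of the estimate, yet it leaves the average shard only about 20% below the hard record limit, which matches the failure above, while 4096 roughly halves it.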
Suggested fix

Maybe this

datasets/tensorflow_datasets/core/utils/shard_utils.py
Line 79 in 9969ce5

is too little headroom for the training examples. The FeatureDict in this particular DatasetBuilder is large, and perhaps the key overhead is unusually large. Should that number be 0.8 instead? Or whether

datasets/tensorflow_datasets/core/utils/shard_utils.py
Line 54 in 9969ce5
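To put rough numbers on the 0.9 → 0.8 question, here is the same simplified model as above evaluated at both factors (the 1 GiB cap and the rounding are assumptions, and any per-example overhead term in the real code is ignored):

```python
import math

TOTAL_BYTES = 1.74e12       # input collection size from the job counters
MAX_SHARD_SIZE = 1 << 30    # assumed hard cap, matching the logged limit

# Compare the current headroom factor (assumed 0.9) with the proposed 0.8.
for factor in (0.9, 0.8):
    min_shards = math.ceil(TOTAL_BYTES / (factor * MAX_SHARD_SIZE))
    avg = TOTAL_BYTES / min_shards
    margin = (MAX_SHARD_SIZE - avg) / 2**20
    print(f"factor {factor}: >= {min_shards} shards, "
          f"average shard ~{avg / 2**20:.0f} MiB, "
          f"margin to the limit ~{margin:.0f} MiB")
```

Whether ~200 MiB of margin is enough then depends on how much the FeatureDict key overhead and shard-size skew add on top of the size estimate, which is exactly the uncertainty raised above.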
Side remark
Surprisingly, the Dataflow limits mention

which doesn't seem to be true in practice, since the GroupBy fails at ~1 GB as per the logged error.