Delete the downloaded zip and folder in retrieve_dataset #2150
Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2150 +/- ##
==========================================
+ Coverage 91.02% 92.21% +1.19%
==========================================
Files 115 103 -12
Lines 6058 5141 -917
==========================================
- Hits 5514 4741 -773
+ Misses 544 400 -144
Flags with carried forward coverage won't be shown.
import shutil
import uuid

DOWNLOADED_DATASET_DIR = 'datasets.4.27.2021'
That seems oddly specific. Isn't it going to change?
We use a specific zip folder, named with the date it was created, for these datasets (which are stored in our blob storage for tests).
Right, so if the dataset changes, this needs to be updated. How will we remember?
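One possible way to make a dataset bump hard to miss (a hypothetical safeguard, not something this PR adds): before extracting, check that the zip's top-level folder matches DOWNLOADED_DATASET_DIR, and fail with a message naming the constant to update. extract_and_check is an illustrative name; only the standard-library zipfile and os modules are assumed.

```python
import os
import zipfile

# Mirrors the constant introduced in this PR.
DOWNLOADED_DATASET_DIR = 'datasets.4.27.2021'


def extract_and_check(zip_path, expected_dir=DOWNLOADED_DATASET_DIR, dest='.'):
    """Extract zip_path into dest, failing loudly if expected_dir is absent.

    Hypothetical helper, not part of the PR.
    """
    with zipfile.ZipFile(zip_path) as zf:
        # Top-level names in the archive, e.g. {'datasets.4.27.2021'}.
        top_level = {name.split('/', 1)[0] for name in zf.namelist()}
        if expected_dir not in top_level:
            # A newer zip on blob storage would carry a different dated folder.
            raise RuntimeError(
                f'Expected folder {expected_dir!r} in {zip_path!r}, '
                f'found {sorted(top_level)}; update DOWNLOADED_DATASET_DIR.'
            )
        zf.extractall(dest)
    return os.path.join(dest, expected_dir)
```

With this check, replacing the blob-storage zip without updating the constant fails at extraction time with the actual folder name, rather than later with a file-not-found error deep inside a test.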
  # if data not extracted, download zip and extract
- outdirname = 'datasets.4.27.2021'
+ outdirname = DOWNLOADED_DATASET_DIR
  if not os.path.exists(outdirname):
If the dataset has already been downloaded, we don't re-download it here; we just reuse the existing copy.
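A minimal sketch of the reuse behavior described in this reply, assuming the download and unzip steps are wrapped in a hypothetical download_and_extract callable: the network is touched only when outdirname does not exist yet.

```python
import os


def retrieve_cached(outdirname, download_and_extract):
    """Return outdirname, calling download_and_extract only if it is missing.

    Hypothetical sketch; download_and_extract stands in for the real
    download-and-unzip code in retrieve_dataset.
    """
    if not os.path.exists(outdirname):
        # First call in this working directory: fetch from blob storage.
        download_and_extract(outdirname)
    # Later calls skip the network entirely and reuse the extracted folder.
    return outdirname
```

Calling this twice in the same working directory downloads once. If the folder is deleted with shutil.rmtree(outdirname) after each use, the os.path.exists check fails on every call and every call downloads again.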
else:
    raise Exception('Unrecognized file extension: ' + extension)

shutil.rmtree(outdirname)
Won't this increase network calls and also the chance that tests fail due to networking issues? I also think this might increase test time a lot. We currently reuse the downloaded file across all test cases instead of re-downloading it every time.
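One possible middle ground between this PR and the concern above, sketched under the assumption that all tests can share a single extracted copy: download once into a temporary directory, hand the same path to every test, and delete it once when the session ends. session_dataset and download_and_extract are hypothetical names; in a pytest suite the same shape would typically live in a fixture declared with scope="session".

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def session_dataset(dirname, download_and_extract):
    """Download once, share across many tests, delete once at the end.

    Hypothetical alternative to calling shutil.rmtree inside every
    retrieve_dataset call.
    """
    root = tempfile.mkdtemp()
    outdir = os.path.join(root, dirname)
    try:
        # Network cost is paid once for the whole session.
        download_and_extract(outdir)
        # Every test reads from the same extracted folder.
        yield outdir
    finally:
        # Cleanup runs even if the download or a test raises.
        shutil.rmtree(root, ignore_errors=True)
```

This keeps the working tree clean, as the PR intends, while avoiding one download per test case.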
Description
Delete the downloaded zip and folder in retrieve_dataset.