Globus support: download optimizations #11125

Merged
merged 22 commits into from
Mar 7, 2025

Contributor

@landreev landreev commented Dec 30, 2024

What this PR does / why we need it:

The PR improves the reliability of the Globus download framework, primarily by extending the new task-monitoring system first implemented for uploads in #10781. There are a few other improvements. For example, it fixes a somewhat exotic bug where Globus downloads weren't counted, but only for multi-file downloads and only when a guestbook popup was enabled.
The improvements are based on our experience with the NESE Data Storage integration with Dataverse at IQSS. Most of them are already in prod. use there, deployed as an experimental beta build.

Which issue(s) this PR closes:

Special notes for your reviewer:

The single most important improvement is that, similarly to what was implemented in #10781 for uploads, the ongoing download tasks can now be monitored asynchronously, with the state saved in the database. This makes the management of the temporary access rules more robust, guaranteeing that Dataverse will register the completion of each task, even if there was a server restart in between.
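As a rough illustration of the idea above (not the code in this PR), here is a minimal sketch of a monitor that keeps task state in a store and picks up where it left off after a restart. Every name in it (`TaskStore`, `TaskMonitor`, `TaskStatus`, `pollOnce`) is hypothetical, and the in-memory map merely stands in for a database table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical names throughout; this is not Dataverse's actual schema or API.
enum TaskStatus { ACTIVE, SUCCEEDED, FAILED }

// Stand-in for a database table of ongoing Globus tasks (a Map here for brevity).
class TaskStore {
    private final Map<String, TaskStatus> rows = new LinkedHashMap<>();

    void save(String taskId, TaskStatus status) { rows.put(taskId, status); }

    TaskStatus get(String taskId) { return rows.get(taskId); }

    // Returns a fresh list, so callers may update the store while iterating over it.
    List<String> findActive() {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, TaskStatus> e : rows.entrySet()) {
            if (e.getValue() == TaskStatus.ACTIVE) ids.add(e.getKey());
        }
        return ids;
    }
}

class TaskMonitor {
    private final TaskStore store;
    private final Function<String, TaskStatus> remoteStatus; // asks the transfer service

    TaskMonitor(TaskStore store, Function<String, TaskStatus> remoteStatus) {
        this.store = store;
        this.remoteStatus = remoteStatus;
    }

    // One polling pass; returns the ids of tasks that finished during this pass.
    List<String> pollOnce() {
        List<String> finished = new ArrayList<>();
        for (String id : store.findActive()) {
            TaskStatus now = remoteStatus.apply(id);
            if (now != TaskStatus.ACTIVE) {
                store.save(id, now); // completion is recorded in the store
                finished.add(id);
            }
        }
        return finished;
    }
}
```

Because the monitor holds no task state of its own, a new `TaskMonitor` created after a restart over the same store sees the same active tasks on its first `pollOnce()` call.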

Assorted other fixes and improvements were added. For example, a user can now have simultaneous downloads running on files from the same dataset: with the task state saved in the database, it is easy to check upon completion of a task whether any other active tasks are using the same access rule and, if so, avoid deleting it. I also fixed something misguided I did in #10781 when I first implemented saving the task state in the database: I misunderstood how the client tokens worked and tried to save the token for each task, thinking that it needed to be reused throughout the life of the task. In reality, the same token can be used for multiple tasks on the same endpoint; also, for a long-running task the saved token has a good chance of expiring, and I didn't have a provision for that. Now the monitoring service simply caches the access tokens for each endpoint that it manages and refreshes them as needed. General error handling, logging and state checking have been improved.
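To make the two mechanisms in the paragraph above concrete, here is a small sketch under stated assumptions; it is not the PR's implementation. `EndpointTokenCache`, `CachedToken`, `DownloadTask` and `AccessRules.canDelete` are hypothetical names, the token fetcher is passed in as a plain function rather than a real Globus call, and the 60-second refresh margin is an arbitrary choice.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical types; none of these names come from Dataverse or the Globus SDK.
final class CachedToken {
    final String value;
    final Instant expiresAt;

    CachedToken(String value, Instant expiresAt) {
        this.value = value;
        this.expiresAt = expiresAt;
    }
}

// Caches one access token per endpoint and refreshes it shortly before it expires.
class EndpointTokenCache {
    private final Map<String, CachedToken> byEndpoint = new HashMap<>();
    private final Function<String, CachedToken> fetchToken; // would call the auth service

    EndpointTokenCache(Function<String, CachedToken> fetchToken) {
        this.fetchToken = fetchToken;
    }

    synchronized String tokenFor(String endpointId, Instant now) {
        CachedToken cached = byEndpoint.get(endpointId);
        // Refresh when missing or within 60 s of expiry (the margin is an assumption).
        if (cached == null || !now.plusSeconds(60).isBefore(cached.expiresAt)) {
            cached = fetchToken.apply(endpointId);
            byEndpoint.put(endpointId, cached);
        }
        return cached.value;
    }
}

// A download task as it might be stored in the database.
final class DownloadTask {
    final String taskId;
    final String ruleId;
    final boolean active;

    DownloadTask(String taskId, String ruleId, boolean active) {
        this.taskId = taskId;
        this.ruleId = ruleId;
        this.active = active;
    }
}

class AccessRules {
    // A rule may be deleted only when no other active task still relies on it.
    static boolean canDelete(String ruleId, String finishedTaskId, List<DownloadTask> tasks) {
        for (DownloadTask t : tasks) {
            if (t.active && t.ruleId.equals(ruleId) && !t.taskId.equals(finishedTaskId)) {
                return false;
            }
        }
        return true;
    }
}
```

Keying the cache by endpoint rather than by task means a long-running task never holds a stale token: each call asks the cache, which refreshes on demand.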

Suggestions on how to test this:

dataverse-internal has an active Globus configuration tied to a fully functioning remote storage volume at NESE, identical to our prod. volume there. There are existing datasets with multiple large-ish (100MB+) Globus-stored files (for example: https://dataverse-internal.iq.harvard.edu/dataset.xhtml?persistentId=doi:10.70122/FK2/QZQPQE); more data can be uploaded for testing as needed. QA would involve verifying that such files can be downloaded, with an emphasis on repeating downloads from the same dataset, and perhaps parallel downloads of different files from the same dataset at the same time. For the end user, everything should work as described in the Globus download instruction (written for the prod. users, but fully applicable to the test configuration on internal).

One extra thing not directly related to the Globus tech fixed in this PR: there was a glaring inefficiency in the dataset page code in how it was looking up various external tools and previewers for each file on the page, introduced 6 releases ago (!) but then forgotten about. It was re-discovered during this effort and I committed the fixes since they were trivial. So, please check on the previewers and such, before and after, and confirm that they are still shown on the page properly.
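The description above does not say how the lookup was fixed, so the following is only a generic sketch of one way to avoid repeating an expensive lookup for every file on a page: memoizing the result per content type. `ToolLookupCache` and `toolsFor` are hypothetical names, and keying on content type is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper; the real page bean and tool service are not shown in this PR.
class ToolLookupCache {
    private final Map<String, List<String>> toolsByContentType = new HashMap<>();
    private final Function<String, List<String>> expensiveLookup; // e.g. a database query

    ToolLookupCache(Function<String, List<String>> expensiveLookup) {
        this.expensiveLookup = expensiveLookup;
    }

    // Runs the expensive lookup once per content type instead of once per file.
    List<String> toolsFor(String contentType) {
        return toolsByContentType.computeIfAbsent(contentType, expensiveLookup);
    }
}
```

With such a cache scoped to a single page render, a dataset with many files of the same type triggers one lookup for that type rather than one per file.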

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

@landreev landreev added this to the 6.6 milestone Feb 12, 2025
@landreev landreev self-assigned this Feb 12, 2025
@landreev landreev marked this pull request as ready for review February 25, 2025 16:55
@landreev
Contributor Author

I un-drafted the PR; waiting for a Jenkins test, will put the PR on the board if it passes.

@landreev landreev removed their assignment Feb 25, 2025
@stevenwinship stevenwinship self-assigned this Feb 25, 2025
@landreev landreev requested a review from qqmyers February 25, 2025 18:44
@landreev
Contributor Author

I just killed the last Jenkins build (no. 14), because it was triggered by a commit of a typo fix in a comment.

@stevenwinship stevenwinship added the Size: 30 (a percentage of a sprint; 21 hours; formerly size:33), Feature: Globus, and FY25 Sprint 17 (2025-02-12 - 2025-02-26) labels Feb 26, 2025

…l debt from 1.5 yrs

and 6 releases ago; an annoying inefficiency in the dataset page that we forgot about,
that came up during the work on the pr. #11057

@cmbz cmbz added the FY25 Sprint 18 (2025-02-26 - 2025-03-12) label Feb 27, 2025

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:11057-globus-downloads
ghcr.io/gdcc/configbaker:11057-globus-downloads

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@landreev landreev removed their assignment Feb 27, 2025
@ofahimIQSS
Contributor

ofahimIQSS commented Mar 5, 2025

Getting internal server error when I push PR to internal:
image

@qqmyers
Member

qqmyers commented Mar 5, 2025

Looks like the error is not related to the PR - a consequence of testing #11068 first which drops a datasetversion column.

@ofahimIQSS
Contributor

ofahimIQSS commented Mar 6, 2025

@landreev How would I get past this?:

Screen.Recording.2025-03-06.at.12.34.45.PM.mov

Edit: Jim hooked me up. All's I needed to do was download connect-personal and set that up.

@ofahimIQSS
Contributor

ofahimIQSS commented Mar 6, 2025

Testing evidence:

image
image

Parallel Upload successful with large files:
image
image

Parallel Downloading:

https://github.com/user-attachments/assets/4540dda7-2cb0-4989-8ee3-c6bb49c876cf
image

Getting Globus Emails:
image

@ofahimIQSS
Contributor

ofahimIQSS commented Mar 6, 2025

Maybe not related to the PR, but here is an observation I made:
When too many ineligible files are selected for a Globus download, the message box doesn't allow the user to scroll down and hit the Continue button.

Screen.Recording.2025-03-06.at.2.32.47.PM.mov

@ofahimIQSS
Contributor

ofahimIQSS commented Mar 7, 2025

Maybe not related to the PR, but here is an observation I made: When too many ineligible files are selected for a Globus download, the message box doesn't allow the user to scroll down and hit the Continue button.

A separate issue has been opened to track this: #11316. Merging PR.

@ofahimIQSS ofahimIQSS merged commit 1a4a43c into develop Mar 7, 2025
20 checks passed
@ofahimIQSS ofahimIQSS deleted the 11057-globus-downloads branch March 7, 2025 19:47
@ofahimIQSS ofahimIQSS removed their assignment Mar 7, 2025
@landreev
Contributor Author

landreev commented Mar 7, 2025

Thank you for merging the PR and for the detailed report!
Sorry for the delay, I just got around to looking into this.
(remember - I am, technically, off today 🥲)
Special thanks to @qqmyers for the assistance with testing.

Opening the new issue was the right thing to do, that's what I would have recommended myself. I'll add a comment there.

Labels
Feature: Globus; FY25 Sprint 17 (2025-02-12 - 2025-02-26); FY25 Sprint 18 (2025-02-26 - 2025-03-12); Size: 30 (a percentage of a sprint; 21 hours; formerly size:33)
Projects
Status: Merged 🚀
Development

Successfully merging this pull request may close these issues.

Globus integration: improve handling of downloads
5 participants