Downloading resources are wasted if threads are downloading the same file #481

Open
jessebrennan opened this issue Nov 4, 2019 · 0 comments
Labels
orange Done by the Azul, Data Browser and Portal team

Comments

@jessebrennan
Collaborator

With the current architecture, multiple threads can end up downloading the same file at the same time.

This does not affect correctness of the download because of the filestore layout. It does affect efficiency. Because of "copy forward", the same files appear in both the primary and secondary bundles. If both are adjacent in the same manifest, the duplicate download becomes much more likely.

One idea for a solution would be to keep a global table of all of the files that are currently downloading or already downloaded. Threads can check this table and sleep if the file is already in it.
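
To make the global table idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `DownloadRegistry`, `download_once`, and the `download` callback are names made up for illustration, not part of the existing code. It assumes all downloader threads live in one process and that a file can be identified by a hashable key.

```python
import threading


class DownloadRegistry:
    """Hypothetical global table of files that are downloading or downloaded."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = {}  # file key -> Event, set when the owner finishes

    def download_once(self, key, download):
        """Run download(key) unless another thread has already claimed key.

        Returns True if this thread did the download, False if it only waited.
        """
        while True:
            with self._lock:
                event = self._events.get(key)
                owner = event is None
                if owner:
                    # First thread to ask for this key claims it.
                    event = self._events[key] = threading.Event()
            if not owner:
                event.wait()  # sleep until the owning thread is done
                with self._lock:
                    if self._events.get(key) is event:
                        return False  # the owner finished successfully
                continue  # the owner failed and dropped the entry, so retry
            try:
                download(key)
            except BaseException:
                # Drop the entry so that another thread can retry the download.
                with self._lock:
                    del self._events[key]
                raise
            finally:
                event.set()  # wake up any waiting threads
            return True
```

Each worker would call `registry.download_once(key, download)` instead of downloading directly. The table only helps threads within one process; it does nothing for two separate invocations writing to the same filestore.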

Another idea would be to have a .tmp version of the file that exists until the download is complete. I have not thought through all of the implications of this design.
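
A rough sketch of the `.tmp` idea, again in Python. The names `download_via_tmp` and `fetch` are hypothetical, and the sketch assumes the final path is known before the download starts. The marker file is created with `O_EXCL`, so only one caller can create it; everyone else skips the file.

```python
import os


def download_via_tmp(dest, fetch):
    """Download into dest + '.tmp', then rename to dest when complete.

    fetch is a hypothetical callback that writes the file body to the
    binary file object it is given. Returns True if this caller did the
    download, False if the file is complete or someone else is on it.
    """
    if os.path.exists(dest):
        return False  # already downloaded
    tmp = dest + ".tmp"
    try:
        # O_EXCL makes creation fail if the .tmp file already exists.
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False  # another thread or process is downloading this file
    try:
        with os.fdopen(fd, "wb") as f:
            fetch(f)
        os.replace(tmp, dest)  # atomically publish the finished file
    except BaseException:
        os.unlink(tmp)  # clean up so that a later attempt can retry
        raise
    return True
```

Some implications this sketch leaves open: a crashed process leaves a stale `.tmp` file that blocks later attempts until it is removed, and a caller that sees the `.tmp` file skips the download rather than waiting for it to finish. There is also a small window between the `os.path.exists` check and the creation of the `.tmp` file in which a just-finished file can be downloaded again, which is wasteful but not incorrect.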

@theathorn theathorn added the orange Done by the Azul, Data Browser and Portal team label Nov 7, 2019
@theathorn theathorn added this to the Q4 2019 Milestone 3 milestone Nov 19, 2019
@theathorn theathorn removed this from the Q4 2019 Milestone 3 milestone Jan 28, 2020
2 participants