Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2830 persistent endpoints for downloads #2936

Open
wants to merge 15 commits into
base: development
Choose a base branch
from

Conversation

mjy
Copy link
Member

@mjy mjy commented Apr 21, 2022

This adds the following endpoints, allowing for the creation of DwC Archive dumps via the external api:

  • POST http://127.0.0.1:3000/api/v1/downloads/build?project_token=<>&token=<>&type=Download::DwcArchive::Complete
  • DELETE http://127.0.0.1:3000/api/v1/downloads/123?project_token=<>&token=<>
  • GET http://127.0.0.1:3000/api/v1/downloads/123?project_token=<>
  • GET http://127.0.0.1:3000/api/v1/downloads/123/file?project_token=<> # return the binary

TODO:

  • Confirm CSRF modifications to allow delete and post to /api/v1 are not otherwise dangerous
  • Test time to calculate sha2 for large exports
  • Test that the SHA2 calculation (on the zipfile) is repeatble by other non-Ruby calculators
  • Allows data_attributes params to pass through in requests
  • Ensure basic, non external Download models are not impacts (auto tests should do this)
  • Confirm that ActiveJobs are cancelled (raise quietly, but see if we can log if noisy) when the corresponding serialized Download object is missing from the database. This is the easiest way to cancel the job
  • Determine whether we need to cancel jobs "in progress". If needed the likely way to do this is to 1) update the Download instance at the start of build time to indicate builds are in-progress (new Download attribute/migration) then 2) prevent DELETE requests if "in progress" . We could extend this idea to use a % complete style "in progress" tracker to indicate roughly how far we are along. Not nil in this field would indicate the build is in progress.

Changelog

Added

  • Download API enpoints and model to facilitate DwC Archive building remotely #2830
  • Option to include project creators as members on new project creation #2410

@codecov-commenter
Copy link

codecov-commenter commented Apr 21, 2022

Codecov Report

Attention: 15 lines in your changes are missing coverage. Please review.

Comparison is base (5c9679b) 85.80% compared to head (8d16bad) 85.74%.

❗ Current head 8d16bad differs from pull request most recent head 6670fc1. Consider uploading reports for the commit 6670fc1 to get more accurate results

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@               Coverage Diff               @@
##           development    #2936      +/-   ##
===============================================
- Coverage        85.80%   85.74%   -0.07%     
===============================================
  Files             2013     2015       +2     
  Lines            73745    73622     -123     
===============================================
- Hits             63277    63126     -151     
- Misses           10468    10496      +28     
Files Coverage Δ
app/controllers/projects_controller.rb 78.26% <ø> (ø)
app/jobs/dwca_create_download_job.rb 70.00% <ø> (ø)
app/models/download/dwc_archive/complete.rb 100.00% <100.00%> (ø)
app/models/project.rb 76.92% <100.00%> (+0.36%) ⬆️
lib/export/download.rb 97.61% <ø> (ø)
lib/export/dwca.rb 88.46% <ø> (ø)
app/helpers/projects_helper.rb 39.13% <66.66%> (+1.25%) ⬆️
app/models/cached_map.rb 66.66% <50.00%> (-1.76%) ⬇️
app/models/download.rb 96.00% <90.90%> (-1.50%) ⬇️
spec/models/download/dwc_archive/complete_spec.rb 94.11% <94.11%> (ø)
... and 2 more

... and 5 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@LocoDelAssembly
Copy link
Contributor

Is my understanding correct that POST/DELETE require user token and project token alone is not enough to authorize access?

Was about to raise concerns about download request flooding but found out that there is protection against that by not allowing more than a download. Perhaps will need some security measures to prevent supplying any download type, but at least I tried with type=Download::Text&name=test and it still marked that name cannot be blank.

@mjy
Copy link
Member Author

mjy commented Jun 9, 2022

Is my understanding correct that POST/DELETE require user token and project token alone is not enough to authorize access?

Yes, POST/PATCH should require user.

Only specific download types are allowed, and n=1 right now, so hopefully that is not an issue.

Thanks for testing.

@LocoDelAssembly
Copy link
Contributor

@LordFlashmeow I think this is tracking your file download requirements?

Allow data_attributes params to pass through in requests
Would that mean filtering by attributes of objects or rather by DataAttribute associations?

@LocoDelAssembly
Copy link
Contributor

* [ ]  Test time to calculate sha2 for large exports
3.1.2 :048 > Benchmark.bm do |x|
3.1.2 :049 >   (1..32).each do |power|
3.1.2 :050 >     data = SecureRandom.random_bytes(2**power)
3.1.2 :051 >     x.report(number_to_human_size(2**power)) { Digest::SHA256.hexdigest(data) }
3.1.2 :052 >   end
3.1.2 :053 > end
       user     system      total        real
2 Bytes  0.000002   0.000008   0.000010 (  0.000009)
4 Bytes  0.000001   0.000003   0.000004 (  0.000004)
8 Bytes  0.000017   0.000000   0.000017 (  0.000003)
16 Bytes  0.000003   0.000000   0.000003 (  0.000003)
32 Bytes  0.000003   0.000000   0.000003 (  0.000003)
64 Bytes  0.000004   0.000000   0.000004 (  0.000003)
128 Bytes  0.000004   0.000000   0.000004 (  0.000003)
256 Bytes  0.000005   0.000000   0.000005 (  0.000004)
512 Bytes  0.000006   0.000000   0.000006 (  0.000005)
1 KB  0.000008   0.000000   0.000008 (  0.000007)
2 KB  0.000011   0.000000   0.000011 (  0.000011)
4 KB  0.000019   0.000000   0.000019 (  0.000018)
8 KB  0.000034   0.000000   0.000034 (  0.000033)
16 KB  0.000063   0.000000   0.000063 (  0.000062)
32 KB  0.000121   0.000000   0.000121 (  0.000121)
64 KB  0.000000   0.000238   0.000238 (  0.000238)
128 KB  0.000000   0.000504   0.000504 (  0.000503)
256 KB  0.000988   0.000000   0.000988 (  0.000987)
512 KB  0.000379   0.001533   0.001912 (  0.001912)
1 MB  0.003588   0.001058   0.004646 (  0.004630)
2 MB  0.008464   0.000000   0.008464 (  0.008475)
4 MB  0.015879   0.000000   0.015879 (  0.015879)
8 MB  0.031285   0.000000   0.031285 (  0.031285)
16 MB  0.060420   0.001607   0.062027 (  0.062027)
32 MB  0.121886   0.000000   0.121886 (  0.121886)
64 MB  0.245944   0.000000   0.245944 (  0.245946)
128 MB  0.488286   0.000393   0.488679 (  0.488710)
256 MB  0.980587   0.000887   0.981474 (  0.981545)
512 MB  1.968927   0.000845   1.969772 (  1.969840)
1 GB  3.945651   0.001307   3.946958 (  3.947090)
2 GB  7.932234   0.000000   7.932234 (  7.932557)
4 GB 16.372113   0.008870  16.380983 ( 16.388204)

@LordFlashmeow
Copy link
Contributor

@LordFlashmeow I think this is tracking your file download requirements?

Allow data_attributes params to pass through in requests
Would that mean filtering by attributes of objects or rather by DataAttribute associations?

Yes. Ideally we could provide a list of predicate IDs to include as a parameter to the request.

@mjy
Copy link
Member Author

mjy commented Aug 22, 2022

@LordFlashmeow Predicate params should be there already, I should have say "Allows", i.e. they need testing, not implementation to pass that check:

    params.permit(collecting_event_predicate_id: [], collection_object_predicate_id: [] ).transform_keys(&:to_sym).to_h

@mjy mjy force-pushed the 2830_persistent_endpoints_for_downloads branch from 3395b65 to c7f7e2b Compare September 4, 2022 15:55
@mjy
Copy link
Member Author

mjy commented Oct 5, 2022

@LocoDelAssembly @LordFlashmeow I'd like to merge this, but do need help confirming the last 3 checks are OK.


after_action -> { set_pagination_headers(:downloads) }, only: [:api_index], if: :json_request?

skip_forgery_protection only: [:api_build, :api_destroy]

Check failure

Code scanning / CodeQL

CSRF protection weakened or disabled

Potential CSRF vulnerability due to forgery protection being disabled or weakened.
@gdower
Copy link
Contributor

gdower commented Jun 8, 2023

I tested COLDP and DarwinCore dashboard downloads with development merged and both still worked.

@gdower
Copy link
Contributor

gdower commented Jun 8, 2023

If you delete a Download while it is running, the job completes and the download file is generated (but it may not be a problem as discussed with @mjy). At least you can't access the file for a deleted download at: http://localhost:3000/api/v1/downloads/{id}/file

I'm not sure how to do a call with the data_attributes or couldn't get it to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

5 participants