
feat: archivistactl import #358

Open: wants to merge 1 commit into main

kairoaraujo (Collaborator) commented Aug 19, 2024

What this PR does / why we need it

Introduces a new archivistactl import command.

It allows Archivista users to import DSSE Envelopes directly into the Archivista database.
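A DSSE envelope is a JSON object with a base64-encoded `payload`, a `payloadType`, and a list of `signatures`, each carrying an optional `keyid` and a base64 `sig`. The minimal Go sketch below decodes and sanity-checks one envelope, as an importer might do before storing it. The `Envelope` type and `parseEnvelope` function are hypothetical names, not the code in cmd/archivistactl/cmd/import.go, and the sketch assumes standard (not URL-safe) base64.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// Signature mirrors one entry of the DSSE "signatures" array.
type Signature struct {
	KeyID string `json:"keyid"` // optional in DSSE
	Sig   string `json:"sig"`   // base64-encoded signature bytes
}

// Envelope mirrors the DSSE JSON envelope (hypothetical type name).
type Envelope struct {
	PayloadType string      `json:"payloadType"`
	Payload     string      `json:"payload"` // base64-encoded payload
	Signatures  []Signature `json:"signatures"`
}

// parseEnvelope decodes raw JSON, checks the required fields, and
// returns the envelope together with its decoded payload bytes.
func parseEnvelope(raw []byte) (Envelope, []byte, error) {
	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return Envelope{}, nil, fmt.Errorf("decode envelope: %w", err)
	}
	if env.PayloadType == "" {
		return Envelope{}, nil, errors.New("missing payloadType")
	}
	if len(env.Signatures) == 0 {
		return Envelope{}, nil, errors.New("envelope has no signatures")
	}
	// Assumption: standard base64; DSSE also permits URL-safe encoding.
	payload, err := base64.StdEncoding.DecodeString(env.Payload)
	if err != nil {
		return Envelope{}, nil, fmt.Errorf("decode payload: %w", err)
	}
	return env, payload, nil
}

func main() {
	raw := []byte(`{"payloadType":"application/vnd.in-toto+json","payload":"eyJhIjoxfQ==","signatures":[{"keyid":"k1","sig":"c2ln"}]}`)
	env, payload, err := parseEnvelope(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(env.PayloadType, string(payload), len(env.Signatures))
}
```

Rejecting envelopes without a payload type or signatures up front keeps obviously malformed files from reaching the database.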

Because the import writes directly to the database and processes envelopes concurrently (using goroutines), it is well suited to importing large amounts of data.

Performance examples

Importing 2100 new DSSE Envelopes

Using the default (--max-concurrent 3):
[Screenshot 2024-08-19 at 13 46 23]

Using --max-concurrent 10:
[Screenshot 2024-08-19 at 13 51 28]

Which issue(s) this PR fixes (optional)


Related #319

Acceptance Criteria Met

  • Docs changes if needed
  • Testing changes if needed
  • All workflow checks passing (automatically enforced)
  • All review conversations resolved (automatically enforced)
  • DCO Sign-off

Special notes for your reviewer:

TODO:

  • Unit Tests
  • Sub-features (nice to have)
    • --exist-first: fail on the first import error instead of skipping it

I can add the TODO items as new commits or as separate PRs.


codecov bot commented Aug 19, 2024

Codecov Report

Attention: Patch coverage is 7.00000% with 93 lines in your changes missing coverage. Please review.

Project coverage is 1.64%. Comparing base (a035c62) to head (f670f06).
Report is 152 commits behind head on main.

File                               Patch %   Lines
cmd/archivistactl/cmd/import.go    7.00%     93 missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #358       +/-   ##
==========================================
- Coverage   82.40%   1.64%   -80.76%     
==========================================
  Files          10     121      +111     
  Lines         358   28956    +28598     
==========================================
+ Hits          295     477      +182     
- Misses         43   28422    +28379     
- Partials       20      57       +37     


kairoaraujo marked this pull request as draft August 19, 2024 13:30
kairoaraujo force-pushed the feat/import branch 2 times, most recently from d8ff65a to fe986e1, August 20, 2024 14:01
kairoaraujo marked this pull request as ready for review August 21, 2024 05:43
Introduces a new `archivistactl import` command

It allows Archivista users to import DSSE Envelopes directly to the
Archivista database.

The feature allows direct import, which can help when importing huge amounts
of data, as it uses concurrency (goroutines) to process them.

Signed-off-by: Kairo Araujo <[email protected]>

mikhailswift (Member) commented Nov 11, 2024

Definitely like this. There have been a few times where I've hacked this in but never actually got it into a mergeable state.

I think one thing we may eventually want to do is also support smarter bulk loading on the server side. We can do better SQL optimizations when we know we're about to load a lot of data, but we can worry about that later.
