Commit

Merge pull request #316 from uoregon-libraries/release/v4.4.0
v4.4.0 Release
jechols authored May 30, 2024
2 parents 7b6abbb + 300377e commit 4d09de9
Showing 10 changed files with 42 additions and 30 deletions.
34 changes: 34 additions & 0 deletions CHANGELOG.md
@@ -34,6 +34,40 @@ Brief description, if necessary
### Migration
-->

+## v4.4.0
+
+Better batching, and a tiny bit of cleanup.
+
+Batch generation is now much faster, as the SHA sums are generated after each
+issue is curated instead of all at once on batch generation. Additionally,
+batches minimum/maximum page count can be overridden on the command line to
+allow one-off batching runs without modifying settings.
+
+### Fixed
+
+- tests: reports run on separate days should be a lot easier to compare
+- tests: passed-in test names work again
+
+### Added
+
+- `queue-batches` now optionally takes command-line flags to override the
+min/max batch size settings. This was done to allow cron jobs that behave
+differently than a manual run.
+- jobs: added simple unit tests for bagit jobs
+- The "Find Issues" view explains why a new batch's issues might not show up
+for a while
+
+### Changed
+
+- After curation, a `.manifest` file is generated in the issue's directory
+which contains the SHA256 sums of all files in the issue. This data is then
+used when batches are generated to significantly reduce the time it takes to
+finish generating a batch.
+
+### Removed
+
+- Removed a variety of dead functions and structs
+
## v4.3.2

Minor improvements.
5 changes: 0 additions & 5 deletions changelogs/2024-04-12-batch-cli-flags.md

This file was deleted.

4 changes: 0 additions & 4 deletions changelogs/2024-04-17-tests.md

This file was deleted.

3 changes: 0 additions & 3 deletions changelogs/2024-04-18-job-tests.md

This file was deleted.

3 changes: 0 additions & 3 deletions changelogs/2024-05-17-deadcode.md

This file was deleted.

4 changes: 0 additions & 4 deletions changelogs/2024-05-20-explain-batch-delay.md

This file was deleted.

6 changes: 0 additions & 6 deletions changelogs/2024-05-20-precompute-sha.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/contributing/dev-howto/add-job-types/index.html
@@ -316,7 +316,7 @@ <h2 id="table-of-contents" class="sr-only">Table Of Contents</h2>
<li>Queue a job of the new type.
<ul>
<li>See <a href="https://github.com/uoregon-libraries/newspaper-curation-app/blob/main/src/jobs/queue.go"><code>src/jobs/queue.go</code></a></li>
-<li>You might need to create a new arg value in <code>src/models/pipeline.go</code>, like
+<li>You might need to create a new arg value in <code>src/jobs/queue.go</code>, like
<code>JobArgSource</code>, <code>JobArgWorkflowStep</code>, etc.</li>
<li>You will certainly need to create the job and push it into a queue. This
happens in a <code>Queue...</code> function (e.g., <code>QueueBatchForDeletion</code>).</li>
9 changes: 6 additions & 3 deletions docs/setup/services/index.html
@@ -23,7 +23,7 @@
<meta itemprop="name" content="Services and Apps">
<meta itemprop="description" content="The services in the NCA suite">

-<meta itemprop="wordCount" content="1049">
+<meta itemprop="wordCount" content="1077">
<meta itemprop="keywords" content="" /></head>
<body>
<div class="container"><div class="skip"><a href="#maincontent">Skip to main content</a></div>
@@ -292,8 +292,8 @@ <h2 id="overview">Overview</h2>
reverse-engineer by reading the various docker files to see what you need to
install, and potentially how to install it.</p>
<p><strong>Note</strong>: If you do go manual, the repository contains working examples for
-RHEL 7 systemd services to start the job runner as well as the workflow http
-server: <a href="https://github.com/uoregon-libraries/newspaper-curation-app/tree/main/rhel7">https://github.com/uoregon-libraries/newspaper-curation-app/tree/main/rhel7</a>.
+systemd services to start the job runner as well as the workflow http
+server: <a href="https://github.com/uoregon-libraries/newspaper-curation-app/tree/main/deploy">https://github.com/uoregon-libraries/newspaper-curation-app/tree/main/deploy</a>.
Consider looking at these to better understand how you might manage a
production installation.</p>
<h2 id="http-server">HTTP Server</h2>
@@ -350,6 +350,9 @@ <h2 id="batch-queue">Batch Queue</h2>
your configured <code>BATCH_OUTPUT_PATH</code> and syncing to the <code>BATCH_PRODUCTION_PATH</code>.
The batch status page in NCA will show which batches have finished processing
and are ready for ingest into staging.</p>
+<p>The tool can be given flags for <code>--min-batch-size</code> and <code>--max-batch-size</code> in
+order to override the standard settings, e.g., if you need cronned batch
+generation to behave differently than manual runs.</p>
<h2 id="bulk-upload-queue">Bulk Upload Queue</h2>
<p>The <code>bulk-issue-queue</code> tool allows you to push uploaded issues into the
workflow in bulk. This should <em>only</em> be used when you have some other
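
The docs hunk above says `--min-batch-size` and `--max-batch-size` override the standard settings for a single run. One way such an override could work is sketched below with Go's standard `flag` package. This is an assumption, not NCA's actual flag handling: `batchLimits`, `parseLimits`, and the default values 100 and 500 are hypothetical. Only the two flag names come from the docs.

```go
package main

import (
	"flag"
	"fmt"
)

// batchLimits holds the page-count limits for a batch. Hypothetical
// stand-in for whatever the real settings structure looks like.
type batchLimits struct {
	Min int
	Max int
}

// parseLimits starts from the configured defaults and applies any
// command-line overrides found in args. Flags that are not given keep
// their configured values.
func parseLimits(defaults batchLimits, args []string) (batchLimits, error) {
	fs := flag.NewFlagSet("queue-batches", flag.ContinueOnError)
	minSize := fs.Int("min-batch-size", defaults.Min, "minimum pages per batch")
	maxSize := fs.Int("max-batch-size", defaults.Max, "maximum pages per batch")

	if err := fs.Parse(args); err != nil {
		return defaults, err
	}
	if *minSize > *maxSize {
		return defaults, fmt.Errorf("min batch size %d exceeds max batch size %d", *minSize, *maxSize)
	}
	return batchLimits{Min: *minSize, Max: *maxSize}, nil
}

func main() {
	// Hypothetical configured values standing in for the settings file.
	defaults := batchLimits{Min: 100, Max: 500}

	// A real tool would pass os.Args[1:]; a fixed slice keeps this
	// sketch deterministic.
	limits, err := parseLimits(defaults, []string{"--max-batch-size", "2000"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("min=%d max=%d\n", limits.Min, limits.Max)
}
```

Because the configured values serve as the flag defaults, a cron entry can pass one flag while a manual run passes none, and neither needs a settings change.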
2 changes: 1 addition & 1 deletion src/internal/lastmod/lastmod.go
@@ -41,7 +41,7 @@ func Time(pth string) (time.Time, error) {

// Different existing manifest (including not having an existing manifest)?
// Write new data and return the current time.
-if !existing.Equiv(refreshed) {
+if err != nil || !existing.Equiv(refreshed) {
err = refreshed.Write()
if err != nil {
return time.Now(), err
