Releases: fair-research/bdbag
BDBag release 1.5.5
Release Notes
Bugfix release.
- Ensure tag file manifest entries for additional tag files use the denormalized (Unix-style `/`) path separator, consistent with payload file manifest entries.
- Return the resulting bag path from the `materialize()` function.
- Don't use strict mode when guessing MIME types, to allow for user-extended types.
- Dropped Python 3.3 support.
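Two of the 1.5.5 fixes can be illustrated with the Python standard library. This is a minimal sketch, not bdbag's implementation, and the helper names `manifest_path` and `guess_mime_type` are hypothetical. `PurePath.as_posix()` renders a bag-relative path with `/` separators on any host OS. `mimetypes.guess_type(..., strict=False)` also consults non-standard mappings, such as those registered through `mimetypes.add_type(..., strict=False)`.

```python
import mimetypes
from pathlib import PurePath
from typing import Optional


def manifest_path(relative: PurePath) -> str:
    """Render a bag-relative path with '/' separators regardless of host OS."""
    # as_posix() replaces the path flavour's separator (e.g. '\\' on Windows) with '/'.
    return relative.as_posix()


def guess_mime_type(filename: str) -> Optional[str]:
    """Guess a MIME type, also consulting non-strict (user-added) mappings."""
    # With strict=False, guess_type falls back to types registered via
    # mimetypes.add_type(..., strict=False) when no standard mapping exists.
    mime_type, _encoding = mimetypes.guess_type(filename, strict=False)
    return mime_type
```

For example, `manifest_path(PureWindowsPath("metadata", "ro.json"))` yields `metadata/ro.json` even though the input uses Windows separators.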
BDBag release 1.5.4
BDBag release 1.5.3
Release Notes
Compatibility and feature micro release.
- Added a monkeypatch for `hashlib.algorithms_guaranteed` prior to the import of any `bagit` code, so that `bagit-1.7.0` can still be used by `bdbag` on systems that only have Python 2.7.0 to 2.7.8 installed. (`bagit-1.7.0` assumes `algorithms_guaranteed` is present, although it only consistently exists on Python 2.7.9 or greater.) Lifted the strict pin on Python>=2.7.9. Note that this won't make standalone `bagit` installations work on these systems, but it will allow `bdbag` to successfully import and use `bagit` as a library. Additional notes here.
- Added code to properly URL-encode whitespace and other illegal characters in the `filename` field of `fetch.txt`, per the `bagit` spec. These characters are automatically encoded when `bdbag` generates a bag from a `remote-file-manifest`, and automatically decoded when attempting to resolve files via fetch. Added a corresponding unit test.
- Added a new CLI validate option: `--completeness`. This is consistent with the `bagit` CLI options and is useful primarily for determining which files in `fetch.txt` have not yet been retrieved. Added a corresponding unit test.
- Added code in the CLIs to print stack traces when `--debug` is specified.
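The following minimal sketch shows the encode/decode round trip described for the `filename` field of `fetch.txt`. It uses `urllib.parse.quote` and `unquote` from the standard library. The helper names are hypothetical, and the exact character set `bdbag` escapes is an assumption. With `safe="/"`, `quote` percent-encodes every character outside the unreserved set (letters, digits, `_.-~`) except the path separator.

```python
from urllib.parse import quote, unquote


def encode_fetch_filename(filename: str) -> str:
    """Percent-encode a bag-relative filename for a fetch.txt line."""
    # Keep '/' unescaped so the directory structure stays readable;
    # whitespace, '#', CR/LF and similar characters become %XX escapes.
    return quote(filename, safe="/")


def decode_fetch_filename(encoded: str) -> str:
    """Reverse encode_fetch_filename() before writing the fetched file to disk."""
    return unquote(encoded)
```

Encoding at bag creation and decoding at fetch time keeps each `fetch.txt` line splittable on whitespace, even for filenames that contain spaces.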
BDBag release 1.5.1
Release Notes
Bugfix release
- Fixed a bug in `bdbagit.save()` and the "strict mode" version check logic. The bug prohibited mixing checksum types for payload files when the `bagit` specification version of the bag being updated was < 1.0. Added a unit test that would have caught it.
BDBag release 1.5.0
Release Notes
Milestone feature release
- Added the `materialize` CLI and API function. The materialize function is essentially a bag bootstrapper. When invoked, it attempts to fully reconstitute a bag by performing multiple actions, depending on the context of the input `path` parameter:
  - If `path` is an actionable URL or a URI of a resolvable identifier scheme, the file it references is first downloaded to the current directory.
  - Next, if the `path` value (or the previously downloaded file) is a local path to a supported archive format, the archive is extracted to the current directory.
  - Then, if the `path` value (or the previously extracted file) is a valid bag directory, an attempt is made to resolve any remote file references contained in the bag's `fetch.txt` file.
  - Finally, full validation is run on the materialized bag.
  - If any one of these steps fails, an error is raised.
- Refactored identifier resolution into a modular plug-in system. Added support for DOI and DataGUID identifier schemes in addition to the existing ARK/Minid schemes. Additional schemes can be supported by creating a compliant "plug-in" resolver class and configuring it via the `bdbag.json` configuration file.
- Bagit specification version compliance is now configurable. The default specification version is `0.97`, which permits heterogeneous mixing of checksums in bag payload manifests. Fixes #27 and reverts the restriction introduced in release 1.3.0.
- Implemented cloud storage fetch transports for access to secured Amazon S3 and Google Cloud Storage via the `boto3` library.
  - GCS bucket and object access via `boto3` is only supported when the target GCS bucket is set to "interoperability mode".
  - The `boto3` library is an optional runtime dependency. It only needs to be installed if automatic download of `S3` or `GS` URLs from `fetch.txt` entries is desired.
  - Various parameters relating to the operation of this fetch handler are exposed via the `bdbag.json` configuration file and can be tuned accordingly. Fixes #25.
- Numerous improvements to the HTTP fetch handler:
  - Support for "Authorization" header based authentication via the `keychain.json` configuration file. This authentication mode allows for Bearer Token authentication scenarios, such as those used in OAuth 2.0 authorization flows.
  - Improved handling for cookie-based authentication. Added a configurable mechanism that scans for multiple Mozilla/Netscape/CURL/WGET compatible cookie files, merges them, and automatically uses them in outbound HTTP fetch requests.
  - Exposed some of the `requests` module's session parameters in the `bdbag.json` configuration file. This allows tuning of values such as the connect/read retry count, the backoff factor, and the status code retry forcelist. It also provides the option of disabling automatic redirect following.
- Refactored `bdbag.json` configuration file processing into a separate module and significantly increased the scope of the configuration file. Added a basic mechanism for versioning the configuration file. The mechanism also upgrades existing config files to newer versions, preserving forward-compatible configuration settings when possible.
- Improved unit test coverage.
- Updated documentation.
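The cookie-file merging described for the HTTP fetch handler in release 1.5.0 can be sketched with the standard library's `http.cookiejar`. This is not bdbag's code. The helper names and the `*cookies.txt` glob pattern are hypothetical. Only Netscape-format files, which is what `MozillaCookieJar` reads, are handled. Unreadable or malformed files are skipped rather than aborting the merge.

```python
import glob
import os
from http.cookiejar import CookieJar, MozillaCookieJar
from typing import Iterable, List


def find_cookie_files(search_dirs: Iterable[str], pattern: str = "*cookies.txt") -> List[str]:
    """Collect candidate cookie files from each directory (hypothetical pattern)."""
    found: List[str] = []
    for directory in search_dirs:
        found.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return found


def merge_cookie_files(paths: Iterable[str]) -> CookieJar:
    """Load each Netscape-format cookie file and merge its cookies into one jar."""
    merged = CookieJar()
    for path in paths:
        jar = MozillaCookieJar()
        try:
            # ignore_discard=True keeps session cookies; expired ones are dropped.
            jar.load(path, ignore_discard=True)
        except OSError:
            # LoadError (bad file format) is an OSError subclass; skip such files.
            continue
        for cookie in jar:
            merged.set_cookie(cookie)
    return merged
```

The merged jar could then be attached to outbound requests, for example with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))`.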
BDBag release 1.4.1
Release Notes
Bugfix release
- Fixed a bug when no `expr` is passed to `filter_dict()`, missed during a code refactor.
BDBag release 1.4.0
Release Notes
Minor feature release
- Add partial (selective) fetch functionality to API and CLI per #20.
- Add an API and CLI function to automatically generate a basic RO manifest via bag introspection.
- Add 'Bagging-Time' as a default bag-info metadata element.
- Allow 'url' field in remote file manifest to be an array as well as a string, but only read array[0] when generating fetch.txt.
- Add logic to allow an RO manifest object to be "updated" without generating new unique URNs for existing nodes.
- Fixed some issues with keychain handling and HTTP fetch handler.
- Changed `globus-sdk` to a run-time dependency.
- Numerous functional changes to `bdbag-utils`. Added a new `create-rfm-from-file` function that can create an RFM (remote file manifest) by parsing a CSV file. Added documentation here.
- Added additional unit tests (copied over from https://github.com/LibraryOfCongress/bagit-python/blob/master/test.py, with modifications) for additional coverage of bdbagit.py.
- Improved unit test coverage.
- Update docs.
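The 1.4.0 rule for array-valued `url` fields ("only read array[0] when generating fetch.txt") can be sketched as follows. This is an assumption-laden illustration, not bdbag's code. It assumes a remote-file-manifest entry is a JSON object with `url`, `length`, and `filename` keys, and that `filename` is already the bag-relative path. The helper names are hypothetical.

```python
from typing import Any, Mapping


def primary_url(entry: Mapping[str, Any]) -> str:
    """Return the URL to write to fetch.txt for one remote-file-manifest entry."""
    url = entry["url"]
    if isinstance(url, (list, tuple)):
        if not url:
            raise ValueError(f"empty 'url' array for {entry.get('filename')!r}")
        return url[0]  # only the first element is used; the rest are ignored
    return url


def fetch_line(entry: Mapping[str, Any]) -> str:
    """Format a 'URL LENGTH FILENAME' line (filename assumed bag-relative)."""
    return f"{primary_url(entry)} {entry['length']} {entry['filename']}"
```

Accepting both shapes lets a manifest list mirror URLs without changing how `fetch.txt` is written.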
BDBag release 1.3.0
Release Notes
- Enhanced RO/JSON-LD tagfile metadata support. Additions to the CLI and API now support creating the RO tagfile metadata directory and any associated JSON-LD files from a single JSON "meta-manifest". Coupled with `remote-file-manifest`-based bag creation, this allows for entirely remote payloads with local RO/JSON-LD metadata, using only two metadata input files.
- Refactored the overridden manifest saving functions in `bdbagit.py` to be more in line with the current `bagit` approach and upcoming `bagit` 1.0 spec changes. IMPORTANT NOTES:
  - Due to this change, it is no longer possible to create/update bags using multiple checksum manifests unless every file in the payload is listed in every payload manifest. In other words, specifying more than one checksum algorithm (e.g., both `md5` and `sha256`) is now an error unless all specified checksum types can be calculated or provided for each payload file, including those listed in `fetch.txt`.
  - The primary impact of this change is on the creation of bags via `remote-file-manifest`. The checksums for these files must be known a priori, so all remote file references must provide the same checksum algorithm type(s) uniformly across the entire set of payload files.
- Allow the `bag-info.txt` metadata value `Contact-Orcid` to be specified when using the CLI via the argument `--contact-orcid`.
- Fixed an issue with the handling of the `metadata` and `metadata_file` arguments of `make_bag` that allowed arbitrarily complex JSON content as `bag-info.txt` lines. Per the `bagit` spec, only string values are supported.
- Ensure URL escaping (of whitespace only) in generated `fetch.txt` URLs, per the `bagit` spec.
- Build universal (Python 2 and 3 compatible) wheels by default. Fixes #19.
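The uniform-checksum requirement in the 1.3.0 IMPORTANT NOTES can be checked up front, before any manifest is written. The following minimal sketch, which is not bdbag's code, assumes each remote-file-manifest entry carries one key per checksum algorithm it supplies (for example `md5` and `sha256`). The function name is hypothetical.

```python
from typing import Any, Iterable, Mapping, Sequence


def check_uniform_checksums(
    entries: Sequence[Mapping[str, Any]], algorithms: Iterable[str]
) -> None:
    """Raise ValueError if any entry lacks one of the requested checksum types."""
    required = set(algorithms)
    for entry in entries:
        # Iterating a mapping yields its keys; a checksum type counts as
        # present when the entry has a key with that algorithm's name.
        missing = required.difference(entry)
        if missing:
            raise ValueError(
                f"{entry.get('filename', '<unnamed>')}: missing checksum(s) "
                f"{', '.join(sorted(missing))}"
            )
```

Failing fast on the first incomplete entry names the offending file. This is easier to act on than a manifest that silently omits some payload files.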
BDBag release 1.2.4
Release Notes
- Handle "-" when found as the length field in fetch.txt, per the bagit spec. BDBag can read and resolve files in bags that have unspecified content lengths. However, it will not allow such bags to be created via remote-file-manifests, because Payload-Oxum cannot be reliably determined without byte counts for all referenced files. BDBag throws an exception during the creation or update of a bag when an unspecified length is encountered.
- Fix duplicate manifest entry issue when creating/updating bags that have remote file references for payload files that are already present in the bag. It is now a conflict for a bag to have both a file in the local payload and in fetch.txt during create/update, and an exception will be thrown when this condition is detected.
- Ensure URL escaping in fetch.txt, per bagit spec.
- Don't emit blank lines when CLI is in quiet mode.
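The 1.2.4 handling of "-" lengths and its interaction with Payload-Oxum can be sketched as follows. This is a minimal illustration, not bdbag's code, and the names are hypothetical. Each `fetch.txt` line is parsed as `URL LENGTH FILENAME`, with "-" mapped to an unknown length. Computing the `OctetCount.StreamCount` value then refuses unknown lengths. In a real bag, the lengths would cover every payload file, local as well as remote.

```python
from typing import Iterable, NamedTuple, Optional


class FetchEntry(NamedTuple):
    url: str
    length: Optional[int]  # None when the length field is "-" (unspecified)
    filename: str


def parse_fetch_line(line: str) -> FetchEntry:
    """Parse one 'URL LENGTH FILENAME' line from fetch.txt."""
    # maxsplit=2 keeps the third (filename) field intact even if it has spaces.
    parts = line.strip().split(None, 2)
    if len(parts) != 3:
        raise ValueError(f"malformed fetch.txt line: {line!r}")
    url, length, filename = parts
    return FetchEntry(url, None if length == "-" else int(length), filename)


def payload_oxum(lengths: Iterable[Optional[int]]) -> str:
    """Return 'OctetCount.StreamCount'; an unknown (None) length is an error."""
    octets = streams = 0
    for length in lengths:
        if length is None:
            raise ValueError("cannot compute Payload-Oxum with an unspecified length")
        octets += length
        streams += 1
    return f"{octets}.{streams}"
```

Reading tolerates "-", while creation rejects it. This mirrors the asymmetry described in the release note.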
BDBag release 1.2.3
Release Notes
- Fix issue with bag extraction and directory nesting.