From d4745a380731eb8de71d80da4a32f0993863f90a Mon Sep 17 00:00:00 2001
From: Roberto Di Cosmo
Date: Wed, 1 Nov 2023 17:53:45 +0100
Subject: [PATCH 01/10] Expand FOSS acronym.

---
 Chapters/5.Core_identifiers.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Chapters/5.Core_identifiers.md b/Chapters/5.Core_identifiers.md
index f3e417f..05f9785 100644
--- a/Chapters/5.Core_identifiers.md
+++ b/Chapters/5.Core_identifiers.md
@@ -204,7 +204,7 @@ the [Darktable release 2.3.0](https://archive.softwareheritage.org/swh:1:rel:22e
 Any kind of software origin offers multiple pointers to the “current” state of a development project. In the case of VCS this is reflected by branches (e.g., master, development, but also so called feature branches dedicated to extending the software in a specific direction); in the case of package distributions by notions such as suites that correspond to different maturity levels of individual packages (e.g., stable, development, etc.).
-A “snapshot” of a given software origin records all entry points found there and where each of them was pointing at the time. For example, a snapshot object might track the commit where the master branch was pointing to at any given time, as well as the most recent release of a given package in the stable suite of a FOSS distribution.
+A “snapshot” of a given software origin records all entry points found there and where each of them was pointing at the time. For example, a snapshot object might track the commit where the master branch was pointing to at any given time, as well as the most recent release of a given package in the stable suite of a free and open source software (FOSS) distribution.
 Practically, a snapshot is a list of named branches pointing at objects of any of the known types (content, directory, revision, release or snapshot).
 A branch can also be an alias to another (named) branch, for instance the default `"HEAD"` branch can point at another, more specific, `"refs/heads/main"` branch.

From e2d1e55df9f25a92a183d273df0c9bf574038940 Mon Sep 17 00:00:00 2001
From: Robbie Morrison
Date: Tue, 31 Oct 2023 18:46:17 +0100
Subject: [PATCH 02/10] added missing ":" colon

---
 Chapters/5.Core_identifiers.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Chapters/5.Core_identifiers.md b/Chapters/5.Core_identifiers.md
index 05f9785..66db662 100644
--- a/Chapters/5.Core_identifiers.md
+++ b/Chapters/5.Core_identifiers.md
@@ -26,7 +26,7 @@ computed from the content and relevant metadata of the object.
 A *content* is an uninterpreted byte sequence, typically, the content of a file. For this type of object the intrinsic identifier is the `sha1_git` hash of it,
-i.e. the SHA1 of the byte sequence obtained by juxtaposing
+i.e. the SHA1 of the byte sequence obtained by juxtaposing:
 - the ASCII string `"blob"` (4 bytes),
 - an ASCII space,

From 06bfe3583c0944e1e8e2daad4195f8581b060e12 Mon Sep 17 00:00:00 2001
From: Robbie Morrison
Date: Tue, 31 Oct 2023 21:21:14 +0100
Subject: [PATCH 03/10] added URL and SPDX identifier to the license notice

---
 Chapters/index.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Chapters/index.md b/Chapters/index.md
index a721104..2ad0853 100644
--- a/Chapters/index.md
+++ b/Chapters/index.md
@@ -1,8 +1,8 @@
 # The SWHID Specification
 Version 1.1
-Copyright © 2022-2023 SWHID Contributors.
+Copyright © 2022–2023 SWHID Contributors.

-This work is licensed under the Community Specification License 1.0.
+This work is licensed under the [Community Specification License 1.0](https://spdx.org/licenses/Community-Spec-1.0.html) (Community‑Spec‑1.0).
 With thanks to Alexios Zavras,

From 6240508c089ce928276583069fe4ba17d5f44eee Mon Sep 17 00:00:00 2001
From: Robbie Morrison
Date: Tue, 31 Oct 2023 21:31:27 +0100
Subject: [PATCH 04/10] copy-edits

---
 Chapters/1.Scope.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Chapters/1.Scope.md b/Chapters/1.Scope.md
index b9876f4..a007fb4 100644
--- a/Chapters/1.Scope.md
+++ b/Chapters/1.Scope.md
@@ -2,7 +2,7 @@
 This SoftWare Hash IDentifier (SWHID) specification defines a standard data format for referencing digital artifacts that
-fit in the data model of modern distributed version control systems.
+match the data model of modern distributed version control systems.
 This includes the typical tree-like structure of a filesystem hierarchy, but also special nodes to track revisions and releases, as well as the
@@ -11,7 +11,7 @@ branches.
 A key property of SWHIDs is that they can be computed using cryptographically strong functions directly from the digital objects they refer to, by anyone that
-has access to a copy of them. This enables decentralised and independent
+has access to a copy of those objects. This enables decentralised and independent
 verification of integrity, without relying on a registry or a central authority. The computation of the SWHID identifiers is based on Merkle Acyclic Directed

From 8e0449763f0ed85f515f454a3630952ec7b00a61 Mon Sep 17 00:00:00 2001
From: Robbie Morrison
Date: Tue, 31 Oct 2023 21:34:11 +0100
Subject: [PATCH 05/10] copy-edits

---
 Chapters/3.Terms_and_definitions.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Chapters/3.Terms_and_definitions.md b/Chapters/3.Terms_and_definitions.md
index f0013ca..c578d80 100644
--- a/Chapters/3.Terms_and_definitions.md
+++ b/Chapters/3.Terms_and_definitions.md
@@ -21,15 +21,15 @@ Git is a distributed version control system created by Linus Torvalds in 2005. I
 ## 3.3 hierarchical file system
-A hierarchical file system is a method of organizing and managing files in a computer where data is stored hierarchically (in a structure often visualized as a tree). It uses directories (or 'folders') to organize files into a tree structure. Each directory can contain more files and directories, thus forming a hierarchical structure.
+A hierarchical file system is a method of organizing and managing files in a computer where data is stored hierarchically. It uses directories (or 'folders') to organize files into a tree structure. Each directory can contain more files and directories, thus forming a hierarchical structure.
 ## 3.4 intrinsic identifier
-An identifier that can be computed directly from the object that it identifies, without needing a registry. Typical examples are cryptographically strong hashes.
+An identifier that can be computed directly from the object that it identifies, without needing access to a registry. Typical examples are cryptographically strong hashes.
 ## 3.5 repository
-In the context of version control systems, a repository is a storage location for software development artifacts including but not limited to source code, build scripts, documentation, etc. It often includes metadata about the stored items, such as version number, author, date of the last modification, etc. Repositories can be local or remote and are managed by version control systems like Git.
+In the context of version control systems, a repository is a storage location for software development artifacts including but not limited to source code, build scripts, and documentation. It often includes metadata about the stored items, such as version number, author, and date of the last modification. Repositories can be local or remote and are managed by version control systems like Git.
 ## 3.6 SHA1
@@ -49,4 +49,4 @@ Note that in most cases SHA1 in this specification are computed on objects after
 ## 3.7 version control system
-A version control system (VCS), also known as source control or revision control, is a software tool that helps manage different versions of software development artifacts. It keeps track of all changes made to the code, allows multiple developers to work on the same codebase, and provides mechanisms for merging changes, reverting changes, and branching and merging of code. Examples include Git, Mercurial, and Subversion.
+A version control system (VCS), also known as source control or revision control, is a software tool that helps manage different versions of software development artifacts. It keeps track of all changes made to the code, allows multiple developers to work on the same codebase, and provides mechanisms for merging changes, reverting changes, and the branching and merging of code. Examples include Git, Mercurial, and Subversion.

From f593eba5a32a1def6a66e08349adfe9d0e78d94a Mon Sep 17 00:00:00 2001
From: Robbie Morrison
Date: Tue, 31 Oct 2023 21:39:09 +0100
Subject: [PATCH 06/10] copy-edits

---
 Chapters/5.Core_identifiers.md      | 4 ++--
 Chapters/6.Qualified_identifiers.md | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Chapters/5.Core_identifiers.md b/Chapters/5.Core_identifiers.md
index 66db662..19060c9 100644
--- a/Chapters/5.Core_identifiers.md
+++ b/Chapters/5.Core_identifiers.md
@@ -148,9 +148,9 @@ As an example, `swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d` is the SWHID
 ## 5.4 Releases
-Some revisions get selected by developers as denoting important project milestones known as “releases”. Each release points to the last commit in project history corresponding to the release and carries metadata: release name and version, release message, cryptographic signatures, etc. If they're not attached to development history (e.g. if they've been imported from bare tarballs), releases can also point directly to a root directory instead of a full revision with metadata.
+Some revisions get selected by developers as denoting important project milestones known as “releases”. Each release points to the last commit in project history corresponding to that release and carries metadata: release name and version, release message, cryptographic signatures, etc. If they're not attached to development history (e.g. if they've been imported from bare tarballs), releases can also point directly to a root directory instead of a full revision with metadata.

-The supported metadata is as follows:
+The metadata fields supported by SWHID are as follows:
 - name (arbitrary byte sequence, mandatory): a name identifying the release
 - author (arbitrary byte sequence): generally contains the name and email address of the author of the release.
 - author timestamp (decimal timestamp from the Unix epoch): the date at which the release was authored.
diff --git a/Chapters/6.Qualified_identifiers.md b/Chapters/6.Qualified_identifiers.md
index c644976..2672171 100644
--- a/Chapters/6.Qualified_identifiers.md
+++ b/Chapters/6.Qualified_identifiers.md
@@ -70,7 +70,7 @@ For example, [`swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;origin=https:/
 indicates that the content seen previously with the function `generate_input_stream` has been seen in the Git repository at `https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git`
-This qualifier may be helpful to get hold of the full repository where a
+This qualifier may help to get hold of the full repository where a
 content has been found, but there is no guarantee of success, as an origin can change or disappear over time (as is the case in the example above, since gitorious.org was shut down in 2015).

From 5c67cf7975ba17733160696fa179a0f4779dd652 Mon Sep 17 00:00:00 2001
From: Robbie Morrison
Date: Tue, 31 Oct 2023 21:46:09 +0100
Subject: [PATCH 07/10] expanded abbreviations for i.e., e.g., and etc.

---
 Chapters/1.Scope.md                 |  2 +-
 Chapters/4.Syntax.md                |  2 +-
 Chapters/5.Core_identifiers.md      | 10 +++++-----
 Chapters/6.Qualified_identifiers.md |  4 ++--
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/Chapters/1.Scope.md b/Chapters/1.Scope.md
index a007fb4..006ce01 100644
--- a/Chapters/1.Scope.md
+++ b/Chapters/1.Scope.md
@@ -17,5 +17,5 @@ verification of integrity, without relying on a registry or a central authority.
 The computation of the SWHID identifiers is based on Merkle Acyclic Directed Graphs, a natural generalization of Merkle trees.
-The resolution of SWHIDs, i.e. the process of obtaining a copy of a digital
+The resolution of SWHIDs, that is, the process of obtaining a copy of a digital
 artifact corresponding to a given SWHID, is out of the scope of this specification.
diff --git a/Chapters/4.Syntax.md b/Chapters/4.Syntax.md
index 87d6228..fc80cc4 100644
--- a/Chapters/4.Syntax.md
+++ b/Chapters/4.Syntax.md
@@ -53,5 +53,5 @@ The last two symbols are defined as:
 In both of these, all occurrences of `;` (and `%`, as required by the RFC) have been percent-encoded (as `%3B` and `%25` respectively). Other
-characters *may* be percent-encoded, e.g., to improve readability and/or
+characters *may* be percent-encoded, for example, to improve readability and/or
 embeddability of SWHID in other contexts.
diff --git a/Chapters/5.Core_identifiers.md b/Chapters/5.Core_identifiers.md
index 19060c9..8990d5d 100644
--- a/Chapters/5.Core_identifiers.md
+++ b/Chapters/5.Core_identifiers.md
@@ -26,7 +26,7 @@ computed from the content and relevant metadata of the object.
 A *content* is an uninterpreted byte sequence, typically, the content of a file. For this type of object the intrinsic identifier is the `sha1_git` hash of it,
-i.e. the SHA1 of the byte sequence obtained by juxtaposing:
+that is, the SHA1 of the byte sequence obtained by juxtaposing:
 - the ASCII string `"blob"` (4 bytes),
 - an ASCII space,
@@ -79,7 +79,7 @@ a given point in time of its development on May 4th 2017.
 Software development within a specific project is essentially a time-indexed series of copies of a single “root” directory that contains the entire project source code. Software evolves when a developer modifies the content of one or more files in that directory and records their changes.
-Each recorded copy of the root directory is known as a “revision”. It points to a single fully-determined directory and is equipped with arbitrary metadata. Some of those are added manually by the developer (e.g., revision message), others are automatically synthesized (timestamps, parent revision(s), etc).
+Each recorded copy of the root directory is known as a “revision”. It points to a single fully-determined directory and is equipped with arbitrary metadata. Some of those are added manually by the developer (for example, a revision message), others are automatically synthesized (timestamps, parent revision(s), and so forth).
 The supported metadata is as follows:
@@ -148,7 +148,7 @@ As an example, `swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d` is the SWHID
 ## 5.4 Releases
-Some revisions get selected by developers as denoting important project milestones known as “releases”. Each release points to the last commit in project history corresponding to that release and carries metadata: release name and version, release message, cryptographic signatures, etc. If they're not attached to development history (e.g. if they've been imported from bare tarballs), releases can also point directly to a root directory instead of a full revision with metadata.
+Some revisions get selected by developers as denoting important project milestones known as “releases”. Each release points to the last commit in project history corresponding to that release and carries metadata: release name and version, release message, cryptographic signatures, and so forth. If they're not attached to development history (for instance, if they've been imported from bare tarballs), releases can also point directly to a root directory instead of a full revision with metadata.
 The metadata fields supported by SWHID are as follows:
 - name (arbitrary byte sequence, mandatory): a name identifying the release
@@ -202,7 +202,7 @@ the [Darktable release 2.3.0](https://archive.softwareheritage.org/swh:1:rel:22e
 ## 5.5 Snapshots
-Any kind of software origin offers multiple pointers to the “current” state of a development project. In the case of VCS this is reflected by branches (e.g., master, development, but also so called feature branches dedicated to extending the software in a specific direction); in the case of package distributions by notions such as suites that correspond to different maturity levels of individual packages (e.g., stable, development, etc.).
+Any kind of software origin offers multiple pointers to the “current” state of a development project. In the case of VCS this is reflected by branches (for instance, master, development, but also so called feature branches dedicated to extending the software in a specific direction); in the case of package distributions by notions such as suites that correspond to different maturity levels of individual packages (for example, stable, development, and so forth).
 A “snapshot” of a given software origin records all entry points found there and where each of them was pointing at the time. For example, a snapshot object might track the commit where the master branch was pointing to at any given time, as well as the most recent release of a given package in the stable suite of a free and open source software (FOSS) distribution.
@@ -249,7 +249,7 @@ proceeds for
 [computing identifiers](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects) for its objects. The `` part of a SWHID for a content object is the Git blob identifier of any file with the same content; for a revision it is the Git
-commit identifier for the same revision, etc. This is not the case for snapshot
+commit identifier for the same revision, and so forth. This is not the case for snapshot
 identifiers, as Git does not have a corresponding object type. Git compatibility is practical, but incidental and is not guaranteed to be
diff --git a/Chapters/6.Qualified_identifiers.md b/Chapters/6.Qualified_identifiers.md
index 2672171..fb13390 100644
--- a/Chapters/6.Qualified_identifiers.md
+++ b/Chapters/6.Qualified_identifiers.md
@@ -36,7 +36,7 @@ by ignoring the `lines` qualifier when the `bytes` qualifier is present.
 A "line" in the context of a file content refers to a sequence of characters that ends with a line break. This line can contain text, code, or any other form of data. In this specification, the line break is the ASCII LF character. The "lines" qualifier allows to designate a line range inside a content. The range can be a single line number, or a pair of line numbers separated by the ASCII `-` character.
-Line numbers start from 1, and the range is inclusive, i.e. the fragment includes both the lines numbered as the start and end of the range.
+Line numbers start from 1, and the range is inclusive, that is, the fragment includes both the lines numbered as the start and end of the range.
 For example, [`swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;lines=9-15`](https://archive.softwareheritage.org/swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;lines=9-15) designates the function `generate_input_stream` that is found at lines 9 to 15 of the *content* with core SWHID `swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b`.
@@ -48,7 +48,7 @@ may be a binary file, or a file that uses non standard line termination characte
 To overcome the limitations of the lines qualifier, the bytes qualifier allows designation of a byte range inside a content. The range can be a single byte number, or a pair of byte numbers separated by `-`.
-Byte numbers start from 0, and the range is inclusive, i.e. the fragment includes both the bytes numbered as the start and end of the range.
+Byte numbers start from 0, and the range is inclusive, that is, the fragment includes both the bytes numbered as the start and end of the range.
 If the range is a single byte number, it designates the byte at that specific position. For example, `swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;bytes=154-315`

From e1a522eb58c7e9d99134b40c1383268ce802fb44 Mon Sep 17 00:00:00 2001
From: Roberto Di Cosmo
Date: Wed, 1 Nov 2023 18:09:04 +0100
Subject: [PATCH 08/10] Add terms suggested by Robbie, and expand them.

---
 Chapters/3.Terms_and_definitions.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/Chapters/3.Terms_and_definitions.md b/Chapters/3.Terms_and_definitions.md
index c578d80..59c6a23 100644
--- a/Chapters/3.Terms_and_definitions.md
+++ b/Chapters/3.Terms_and_definitions.md
@@ -1,3 +1,4 @@
+t
 # 3 Terms and definitions
 For the purposes of this document,
@@ -50,3 +51,15 @@ Note that in most cases SHA1 in this specification are computed on objects after
 ## 3.7 version control system
 A version control system (VCS), also known as source control or revision control, is a software tool that helps manage different versions of software development artifacts. It keeps track of all changes made to the code, allows multiple developers to work on the same codebase, and provides mechanisms for merging changes, reverting changes, and the branching and merging of code. Examples include Git, Mercurial, and Subversion.
+
+## 3.8 software object, software artifact
+
+A software object, also referred to as a software artifact, represents a distinct entity identifiable by a SWHID. This entity can be as granular as a single line of code within a source file or as expansive as an entire codebase comprising multiple source files. In addition to source files, a software object can also be a binary file resulting from code compilation or multiple binary files linked together to produce an executable file.
+
+## 3.9 metadata
+
+Within the context of this specification, metadata refers to supplementary information associated with a software object. It serves to provide a deeper understanding of the object by detailing attributes such as the programming language used, its functionality, or its dependencies. Metadata can also enumerate the individuals involved in the software's development, elucidate its licensing terms, offer a record of version history, and more. Essentially, metadata encapsulates the broader context, provenance, and attributes of the software object, ensuring a comprehensive understanding of its nature and purpose.
+
+## 3.10 UNIX epoch
+
+The UNIX epoch is a time reference point that denotes the precise moment at 00:00:00 Coordinated Universal Time (UTC) on 1 January 1970. In UNIX-based systems, time is often represented as the total number of seconds that have transpired since this specific moment. This convention is widely used in computing for time-stamping and date-time representations.

From f754a877670c35ffc266a67b37ac41a7e2ed0362 Mon Sep 17 00:00:00 2001
From: Robbie Morrison
Date: Tue, 31 Oct 2023 22:11:07 +0100
Subject: [PATCH 09/10] "tarballs" → "compressed archive files"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 Chapters/5.Core_identifiers.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Chapters/5.Core_identifiers.md b/Chapters/5.Core_identifiers.md
index 8990d5d..0024e70 100644
--- a/Chapters/5.Core_identifiers.md
+++ b/Chapters/5.Core_identifiers.md
@@ -148,7 +148,7 @@ As an example, `swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d` is the SWHID
 ## 5.4 Releases
-Some revisions get selected by developers as denoting important project milestones known as “releases”. Each release points to the last commit in project history corresponding to that release and carries metadata: release name and version, release message, cryptographic signatures, and so forth. If they're not attached to development history (for instance, if they've been imported from bare tarballs), releases can also point directly to a root directory instead of a full revision with metadata.
+Some revisions get selected by developers as denoting important project milestones known as “releases”. Each release points to the last commit in project history corresponding to that release and carries metadata: release name and version, release message, cryptographic signatures, and so forth. If they're not attached to development history (for instance, if they've been imported from bare compressed archive files), releases can also point directly to a root directory instead of a full revision with metadata.
 The metadata fields supported by SWHID are as follows:
 - name (arbitrary byte sequence, mandatory): a name identifying the release

From 1a011124cc04817637bf3b28a6be8011f2d0267f Mon Sep 17 00:00:00 2001
From: Roberto Di Cosmo
Date: Wed, 1 Nov 2023 18:11:20 +0100
Subject: [PATCH 10/10] Archive files are not necessarily compressed

---
 Chapters/5.Core_identifiers.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Chapters/5.Core_identifiers.md b/Chapters/5.Core_identifiers.md
index 0024e70..4f9cdda 100644
--- a/Chapters/5.Core_identifiers.md
+++ b/Chapters/5.Core_identifiers.md
@@ -148,7 +148,7 @@ As an example, `swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d` is the SWHID
 ## 5.4 Releases
-Some revisions get selected by developers as denoting important project milestones known as “releases”. Each release points to the last commit in project history corresponding to that release and carries metadata: release name and version, release message, cryptographic signatures, and so forth. If they're not attached to development history (for instance, if they've been imported from bare compressed archive files), releases can also point directly to a root directory instead of a full revision with metadata.
+Some revisions get selected by developers as denoting important project milestones known as “releases”. Each release points to the last commit in project history corresponding to that release and carries metadata: release name and version, release message, cryptographic signatures, and so forth. If they're not attached to development history (for instance, if they've been imported from bare archive files), releases can also point directly to a root directory instead of a full revision with metadata.
 The metadata fields supported by SWHID are as follows:
 - name (arbitrary byte sequence, mandatory): a name identifying the release
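
Review note on the series: patches 02 and 07 reword the definition of the content identifier and of the `lines` and `bytes` qualifiers. The wording can be checked against a minimal Python sketch such as the one below. It rests on an assumption: the hunk context cuts the juxtaposition list after "an ASCII space", so the rest of the header is taken to follow Git's blob layout (decimal length, a NUL byte, then the raw bytes). That matches the statement in the series that the content digest is the Git blob identifier, but it is not quoted from the hunks. The names `content_swhid`, `select_lines` and `select_bytes` are hypothetical helpers, not part of the specification.

```python
import hashlib
from typing import Optional


def content_swhid(data: bytes) -> str:
    """Core SWHID of a content object: sha1_git over header plus bytes."""
    # Assumed header layout (as for Git blobs): "blob", space, length, NUL.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def select_lines(data: bytes, start: int, end: Optional[int] = None) -> bytes:
    """Fragment for `lines=start` or `lines=start-end` (1-based, inclusive)."""
    end = start if end is None else end
    parts = data.split(b"\n")  # only ASCII LF counts as a line break
    # Re-attach the LF to every terminated line; keep an unterminated tail.
    lines = [part + b"\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])
    return b"".join(lines[start - 1:end])


def select_bytes(data: bytes, start: int, end: Optional[int] = None) -> bytes:
    """Fragment for `bytes=start` or `bytes=start-end` (0-based, inclusive)."""
    end = start if end is None else end
    return data[start:end + 1]


if __name__ == "__main__":
    sample = b"first\nsecond\nthird\n"
    print(content_swhid(sample))
    print(select_lines(sample, 2, 3))
    print(select_bytes(sample, 0, 4))
```

The two selectors differ only in their origin (1 for lines, 0 for bytes); both treat the end of the range as included, as the reworded sentences state.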