doc: various improvements and cleanups
phip1611 committed May 3, 2024
1 parent 51b68c6 commit de553f5
Showing 4 changed files with 112 additions and 61 deletions.
65 changes: 38 additions & 27 deletions README.md
@@ -1,28 +1,35 @@
# `tar-no-std` - Parse Tar Archives (Tarballs)

_Due to historical reasons, there are several formats of tar archives. All of them are based on the same principles,
but have some subtle differences that often make them incompatible with each other._ [0]
_Due to historical reasons, there are several formats of Tar archives. All of
them are based on the same principles, but have some subtle differences that
often make them incompatible with each other._ [(reference)](https://www.gnu.org/software/tar/manual/html_section/Formats.html)

Library to read Tar archives (by GNU Tar) in `no_std` contexts with zero allocations. If you have a standard
environment and need full feature support, I recommend the use of <https://crates.io/crates/tar> instead.
Library to read Tar archives in `no_std` environments with zero allocations. If
you have a standard environment and need full feature support, I recommend the
use of <https://crates.io/crates/tar> instead.

## Limitations
The crate is simple and only supports reading of "basic" archives, therefore no extensions, such
as GNU Longname. The maximum supported file name length is 256 characters excluding the NULL-byte (using the tar name/prefix longname implementation). The maximum supported file size is 8GiB. Directories are supported, but only regular fields are yielded in iteration.

This crate is simple and focuses on reading files and their content from a Tar
archive. Historic basic Tar and ustar [formats](https://www.gnu.org/software/tar/manual/html_section/Formats.html)
are supported. Other formats may work, but likely without all supported
features. GNU Extensions such as sparse files, incremental archives, and long
filename extension are not supported.

The maximum supported file name length is 256 characters excluding the
NULL-byte (using the Tar name/prefix longname implementation of ustar). The
maximum supported file size is 8 GiB. Directories are supported, but only
regular files are yielded in iteration. The path is reflected in their file
name.
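
Both limits follow from the ustar header layout. The sketch below is not code from this crate; it assumes the standard ustar field widths (100-byte `name`, 155-byte `prefix`, 12-byte `size` holding 11 octal digits plus a terminator). The constant names and `parse_octal` are hypothetical and chosen for illustration only.

```rust
/// Width of the `name` header field in bytes (standard ustar layout).
const NAME_LEN: usize = 100;
/// Width of the `prefix` header field in bytes (standard ustar layout).
const PREFIX_LEN: usize = 155;
/// Number of octal digits in the `size` field (12 bytes minus the terminator).
const SIZE_DIGITS: u32 = 11;

/// Longest path expressible as `prefix`, a joining `/`, and `name`.
const MAX_PATH_LEN: usize = PREFIX_LEN + 1 + NAME_LEN;
/// Number of values representable with 11 octal digits: 8^11 = 2^33 = 8 GiB.
const SIZE_LIMIT: u64 = 8u64.pow(SIZE_DIGITS);

/// Parses an octal header field that is terminated by NUL or a space.
/// Returns `None` on a non-octal byte, on overflow, or if no digit is present.
fn parse_octal(field: &[u8]) -> Option<u64> {
    let mut value: u64 = 0;
    let mut seen_digit = false;
    for &byte in field {
        match byte {
            b'0'..=b'7' => {
                value = value
                    .checked_mul(8)?
                    .checked_add(u64::from(byte - b'0'))?;
                seen_digit = true;
            }
            // NUL or space ends the number.
            0 | b' ' => break,
            _ => return None,
        }
    }
    seen_digit.then_some(value)
}

fn main() {
    println!("max path length: {MAX_PATH_LEN} bytes");
    println!("size limit: {SIZE_LIMIT} bytes");
    // The largest size field: eleven times the digit 7.
    println!("largest size: {:?}", parse_octal(b"77777777777\0"));
}
```

The largest encodable size is therefore one byte short of 8 GiB, and a path only reaches 256 characters if it can be split at a `/` so that both parts fit their fields.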

## Use Case

This library is useful, if you write a kernel or a similar low-level application, which needs
"a bunch of files" from an archive ("init ramdisk"). The Tar file could for example come
as a Multiboot2 boot module provided by the bootloader.
This library is useful, if you write a kernel or a similar low-level
application, which needs "a bunch of files" from an archive (like an
"init ramdisk"). The Tar file could for example come as a Multiboot2 boot module
provided by the bootloader.

This crate focuses on extracting files from uncompressed Tar archives created with default options by **GNU Tar**.
GNU Extensions such as sparse files, incremental archives, and long filename extension are not supported yet.
[This link](https://www.gnu.org/software/tar/manual/html_section/Formats.html) gives a good overview over possible
archive formats and their limitations.
## Example

## Example (without `alloc`-feature)
```rust
use tar_no_std::TarArchiveRef;

@@ -33,27 +40,31 @@ fn main() {

    // also works in no_std environment (except the println!, of course)
    let archive = include_bytes!("../tests/gnu_tar_default.tar");
    let archive = TarArchiveRef::new(archive);
    let archive = TarArchiveRef::new(archive).unwrap();
    // Vec needs an allocator of course, but the library itself doesn't need one
    let entries = archive.entries().collect::<Vec<_>>();
    println!("{:#?}", entries);
    println!("content of last file:");
    println!("{:#?}", entries[2].data_as_str().expect("Should be valid UTF-8"));
    println!("content of first file:");
    println!(
        "{:#?}",
        entries[0].data_as_str().expect("Should be valid UTF-8")
    );
}
```

## Alloc Feature
This crate allows the usage of the additional Cargo build time feature `alloc`. When this is used,
the crate also provides the type `TarArchive`, which owns the data on the heap.
## Cargo Feature

This crate allows the usage of the additional Cargo build time feature `alloc`.
When this is active, the crate also provides the type `TarArchive`, which owns
the data on the heap. The `unstable` feature provides additional convenience
only available on the nightly channel.

## Compression (`tar.gz`)
If your tar file is compressed, e.g. by `.tar.gz`/`gzip`, you need to uncompress the bytes first
(e.g. by a *gzip* library). Afterwards, this crate can read the Tar archive format from the uncompressed
bytes.

## MSRV
The MSRV is 1.76.0 stable.
If your Tar file is compressed, e.g. with `gzip` (`.tar.gz`), you need to
decompress the bytes first (e.g. with a *gzip* library). Afterwards, this crate
can read the Tar archive format from the uncompressed bytes.
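
One way to notice a still-compressed archive before parsing is to look at magic bytes. The following is a minimal sketch and not part of this crate's API: `Container`, `sniff`, and the constants are hypothetical names. It assumes gzip's two-byte signature `1f 8b` at the start of the stream and the `ustar` magic at byte offset 257 of the first header. Historic pre-ustar archives carry no magic and are reported as `Unknown`, so treat the result as a hint only.

```rust
/// Rough classification of a byte buffer (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
enum Container {
    /// Starts with the gzip signature; decompress before parsing.
    Gzip,
    /// First header block carries the `ustar` magic.
    Ustar,
    /// Neither signature found (may still be a historic Tar archive).
    Unknown,
}

/// First two bytes of every gzip stream.
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];
/// Offset of the magic field inside a ustar header block.
const USTAR_MAGIC_OFFSET: usize = 257;
/// Common prefix of the POSIX and GNU variants of the magic field.
const USTAR_MAGIC: &[u8] = b"ustar";

/// Guesses the container format from magic bytes only; nothing is validated.
fn sniff(bytes: &[u8]) -> Container {
    if bytes.starts_with(&GZIP_MAGIC) {
        return Container::Gzip;
    }
    // `get` returns `None` for buffers that are too short to hold the magic.
    match bytes.get(USTAR_MAGIC_OFFSET..USTAR_MAGIC_OFFSET + USTAR_MAGIC.len()) {
        Some(magic) if magic == USTAR_MAGIC => Container::Ustar,
        _ => Container::Unknown,
    }
}

fn main() {
    // Synthetic 512-byte header block with only the magic filled in.
    let mut header = [0u8; 512];
    header[257..262].copy_from_slice(b"ustar");
    println!("{:?}", sniff(&header));
    println!("{:?}", sniff(&[0x1f, 0x8b, 0x08]));
    println!("{:?}", sniff(b"too short"));
}
```

The check needs no allocation, so it also fits a `no_std` caller that wants to reject a `.tar.gz` boot module early.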

## MSRV

## References
[0]\: https://www.gnu.org/software/tar/manual/html_section/Formats.html
The MSRV is 1.76.0 stable.
17 changes: 9 additions & 8 deletions src/archive.rs
@@ -101,8 +101,7 @@ impl Display for CorruptDataError {
impl core::error::Error for CorruptDataError {}

/// Type that owns bytes on the heap, that represents a Tar archive.
/// Unlike [`TarArchiveRef`], this type is useful, if you need to own the
/// data as long as you need the archive, but no longer.
/// Unlike [`TarArchiveRef`], this type takes ownership of the data.
///
/// This is only available with the `alloc` feature of this crate.
#[cfg(feature = "alloc")]
@@ -144,8 +143,8 @@ impl From<TarArchive> for Box<[u8]> {
    }
}

/// Wrapper type around bytes, which represents a Tar archive.
/// Unlike [`TarArchive`], this uses only a reference to the data.
/// Wrapper type around bytes, which represents a Tar archive. To iterate the
/// entries, use [`TarArchiveRef::entries`].
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct TarArchiveRef<'a> {
    data: &'a [u8],
@@ -162,9 +161,7 @@ impl<'a> TarArchiveRef<'a> {
            .ok_or(CorruptDataError)
    }

    /// Iterates over all entries of the Tar archive.
    /// Returns items of type [`ArchiveEntry`].
    /// See also [`ArchiveEntryIterator`].
    /// Creates an [`ArchiveEntryIterator`].
    pub fn entries(&self) -> ArchiveEntryIterator {
        ArchiveEntryIterator::new(self.data)
    }
@@ -244,11 +241,15 @@ impl<'a> Iterator for ArchiveHeaderIterator<'a> {
impl<'a> ExactSizeIterator for ArchiveEntryIterator<'a> {}

/// Iterator over the files of the archive.
///
/// Only regular files are supported, but not directories, links, or other
/// special types ([`crate::TypeFlag`]). The full path to files is reflected
/// in their file name.
#[derive(Debug)]
pub struct ArchiveEntryIterator<'a>(ArchiveHeaderIterator<'a>);

impl<'a> ArchiveEntryIterator<'a> {
    pub fn new(archive: &'a [u8]) -> Self {
    fn new(archive: &'a [u8]) -> Self {
        Self(ArchiveHeaderIterator::new(archive))
    }

1 change: 1 addition & 0 deletions src/header.rs
@@ -34,6 +34,7 @@ use crate::{TarFormatDecimal, TarFormatOctal, TarFormatString, BLOCKSIZE, NAME_L
use core::fmt::{Debug, Formatter};
use core::num::ParseIntError;

/// Errors that may happen when parsing the [`ModeFlags`].
#[derive(Debug)]
pub enum ModeError {
    ParseInt(ParseIntError),
90 changes: 64 additions & 26 deletions src/lib.rs
@@ -21,42 +21,80 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/
//! Library to read Tar archives (by GNU Tar) in `no_std` contexts with zero
//! allocations. If you have a standard environment and need full feature
//! support, I recommend the use of <https://crates.io/crates/tar> instead.
//! # `tar-no-std` - Parse Tar Archives (Tarballs)
//!
//! The crate is simple and only supports reading of "basic" archives, therefore
//! no extensions, such as GNU Longname. The maximum supported file name length
//! is 100 characters including the NULL-byte. The maximum supported file size
//! is 8 GiB. Also, directories are not supported yet but only flat collections
//! of files.
//! _Due to historical reasons, there are several formats of Tar archives. All of
//! them are based on the same principles, but have some subtle differences that
//! often make them incompatible with each other._ [(reference)](https://www.gnu.org/software/tar/manual/html_section/Formats.html)
//!
//! Library to read Tar archives in `no_std` environments with zero allocations. If
//! you have a standard environment and need full feature support, I recommend the
//! use of <https://crates.io/crates/tar> instead.
//!
//! ## TL;DR
//!
//! Look at the [`TarArchiveRef`] type.
//!
//! ## Limitations
//!
//! This crate is simple and focuses on reading files and their content from a Tar
//! archive. Historic basic Tar and ustar [formats](https://www.gnu.org/software/tar/manual/html_section/Formats.html)
//! are supported. Other formats may work, but likely without all supported
//! features. GNU Extensions such as sparse files, incremental archives, and
//! long filename extension are not supported.
//!
//! The maximum supported file name length is 256 characters excluding the
//! NULL-byte (using the Tar name/prefix longname implementation of ustar). The
//! maximum supported file size is 8 GiB. Directories are supported, but only
//! regular files are yielded in iteration. The path is reflected in their file
//! name.
//!
//! ## Use Case
//!
//! This library is useful, if you write a kernel or a similar low-level
//! application, which needs "a bunch of files" from an archive ("init ram
//! disk"). The Tar file could for example come as a Multiboot2 boot module
//! application, which needs "a bunch of files" from an archive (like an
//! "init ramdisk"). The Tar file could for example come as a Multiboot2 boot module
//! provided by the bootloader.
//!
//! This crate focuses on extracting files from uncompressed Tar archives
//! created with default options by **GNU Tar**. GNU Extensions such as sparse
//! files, incremental archives, and long filename extension are not supported
//! yet. [gnu.org](https://www.gnu.org/software/tar/manual/html_section/Formats.html)
//! provides a good overview over possible archive formats and their
//! limitations.
//! ## Example
//!
//! # Example
//! ```rust
//! use tar_no_std::TarArchiveRef;
//!
//! // also works in no_std environment (except the println!, of course)
//! let archive = include_bytes!("../tests/gnu_tar_default.tar");
//! let archive = TarArchiveRef::new(archive).unwrap();
//! // Vec needs an allocator of course, but the library itself doesn't need one
//! let entries = archive.entries().collect::<Vec<_>>();
//! println!("{:#?}", entries);
//! println!("content of last file:");
//! let last_file_content = unsafe { core::str::from_utf8_unchecked(entries[2].data()) };
//! println!("{:#?}", last_file_content);
//! fn main() {
//!     // log: not mandatory
//!     std::env::set_var("RUST_LOG", "trace");
//!     env_logger::init();
//!
//!     // also works in no_std environment (except the println!, of course)
//!     let archive = include_bytes!("../tests/gnu_tar_default.tar");
//!     let archive = TarArchiveRef::new(archive).unwrap();
//!     // Vec needs an allocator of course, but the library itself doesn't need one
//!     let entries = archive.entries().collect::<Vec<_>>();
//!     println!("{:#?}", entries);
//!     println!("content of first file:");
//!     println!(
//!         "{:#?}",
//!         entries[0].data_as_str().expect("Should be valid UTF-8")
//!     );
//! }
//! ```
//!
//! ## Cargo Feature
//!
//! This crate allows the usage of the additional Cargo build time feature `alloc`.
//! When this is active, the crate also provides the type `TarArchive`, which owns
//! the data on the heap. The `unstable` feature provides additional convenience
//! only available on the nightly channel.
//!
//! ## Compression (`tar.gz`)
//!
//! If your Tar file is compressed, e.g. with `gzip` (`.tar.gz`), you need to
//! decompress the bytes first (e.g. with a *gzip* library). Afterwards, this
//! crate can read the Tar archive format from the uncompressed bytes.
//!
//! ## MSRV
//!
//! The MSRV is 1.76.0 stable.
#![cfg_attr(feature = "unstable", feature(error_in_core))]
#![cfg_attr(not(test), no_std)]
