Add 256 filename support (prefix + name) and add support for space terminated numbers and non-null terminated names (#10)

* Add 256 filename support (prefix + name) and add support for space terminated numbers.

* Format check passes. Add missing file.

* Act on clippy issues found

* Fix nit and use idiomatic conditional compilation

* Clippy is wrong as from_utf8 is not yet stable for const fn usage.

* Move the long and deep filenames needed to create the test tar files into a tar file. They are not normally needed, as the source uses the .tar files directly. The tests themselves note which files they depend on.

* Upgrade the MSRV, as SemVer-compatible updates have moved dependencies in the tree past
what the Rust 1.60 compiler supports. Example from an indirectly dependent crate on Linux:

error: package `rustix v0.38.11` cannot be built because it requires rustc 1.63 or newer, while the currently active rustc version is 1.60.0
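
The "space terminated numbers" in the first bullet refers to tar's numeric header fields, which hold ASCII octal digits and may end in either a NUL byte or a space. The following is a minimal sketch of such a parser, not the crate's implementation; the function name `parse_octal_field` is hypothetical, and leading-space padding is tolerated as an assumption about what some tar writers emit.

```rust
/// Parses a tar numeric header field (hypothetical helper, not crate API).
///
/// The field holds ASCII octal digits. Parsing stops at the first NUL byte
/// or space after the digits, or at the end of the field.
fn parse_octal_field(field: &[u8]) -> Option<u64> {
    // Skip leading spaces, which are treated here as padding (assumption).
    let start = field.iter().position(|&b| b != b' ')?;

    let mut value: u64 = 0;
    let mut digits = 0usize;
    for &b in &field[start..] {
        match b {
            // Accumulate one octal digit, rejecting overflow.
            b'0'..=b'7' => {
                value = value.checked_mul(8)?.checked_add(u64::from(b - b'0'))?;
                digits += 1;
            }
            // A space or a NUL byte terminates the number.
            b' ' | 0 => break,
            // Anything else means the field is not a valid octal number.
            _ => return None,
        }
    }

    // A field without any digit carries no number.
    if digits == 0 {
        None
    } else {
        Some(value)
    }
}
```

With this sketch, a NUL-terminated size field such as `b"00000001001\0"` and a space-terminated one such as `b"00000000014 "` are both accepted, while a field containing the digit `9` is rejected.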
schnoberts1 authored May 3, 2024
1 parent 57a0f95 commit 14097e0
Showing 14 changed files with 429 additions and 157 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/rust.yml
@@ -16,7 +16,7 @@ jobs:
rust:
- stable
- nightly
- 1.60.0 # MSVR
- 1.63.0 # MSVR
steps:
- uses: actions/checkout@v2
# Important preparation step: override the latest default Rust version in GitHub CI
@@ -41,7 +41,7 @@ jobs:
strategy:
matrix:
rust:
- 1.60.0
- 1.63.0
steps:
- uses: actions/checkout@v2
# Important preparation step: override the latest default Rust version in GitHub CI
15 changes: 10 additions & 5 deletions Cargo.toml
@@ -3,9 +3,9 @@ name = "tar-no-std"
description = """
Library to read Tar archives (by GNU Tar) in `no_std` contexts with zero allocations.
The crate is simple and only supports reading of "basic" archives, therefore no extensions, such
as GNU Longname. The maximum supported file name length is 100 characters including the NULL-byte.
The maximum supported file size is 8GiB. Also, directories are not supported yet but only flat
collections of files.
as GNU Longname. The maximum supported file name length is 256 characters excluding the NULL-byte
(using the tar name/prefix longname implementation).The maximum supported file size is 8GiB.
Directories are supported, but only regular fields are yielded in iteration.
"""
version = "0.2.0"
edition = "2021"
@@ -24,10 +24,15 @@ resolver = "2"
default = []
alloc = []

[[example]]
name = "alloc_feature"
required-features = ["alloc"]

[dependencies]
arrayvec = { version = "0.7", default-features = false }
bitflags = "2.0"
log = { version = "0.4", default-features = false }
memchr = { version = "2.6.3", default-features = false }
num-traits = { version = "0.2.16", default-features = false }

[dev-dependencies]
env_logger = "0.10"
env_logger = "0.10"
5 changes: 2 additions & 3 deletions README.md
@@ -8,9 +8,8 @@ environment and need full feature support, I recommend the use of <https://crate

## Limitations
The crate is simple and only supports reading of "basic" archives, therefore no extensions, such
as *GNU Longname*. The maximum supported file name length is 100 characters including the NULL-byte.
The maximum supported file size is 8GiB. Also, directories are not supported yet but only flat
collections of files.
as GNU Longname. The maximum supported file name length is 256 characters excluding the NULL-byte (using the tar name/prefix longname implementation). The maximum supported file size is 8GiB. Directories are supported, but only regular fields are yielded in iteration.


## Use Case

2 changes: 1 addition & 1 deletion examples/alloc_feature.rs
@@ -21,9 +21,9 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/

use tar_no_std::TarArchive;

/// This example needs the `alloc` feature.
fn main() {
// log: not mandatory
std::env::set_var("RUST_LOG", "trace");
162 changes: 130 additions & 32 deletions src/archive.rs
@@ -25,25 +25,25 @@ SOFTWARE.
//! also exports `TarArchive`, which owns data on the heap.
use crate::header::PosixHeader;
use crate::{TypeFlag, BLOCKSIZE, FILENAME_MAX_LEN};
use crate::tar_format_types::TarFormatString;
use crate::{TypeFlag, BLOCKSIZE, POSIX_1003_MAX_FILENAME_LEN};
#[cfg(feature = "alloc")]
use alloc::boxed::Box;
use arrayvec::ArrayString;
use core::fmt::{Debug, Formatter};
use core::str::{FromStr, Utf8Error};
use core::str::Utf8Error;
use log::warn;

/// Describes an entry in an archive.
/// Currently only supports files but no directories.
pub struct ArchiveEntry<'a> {
filename: ArrayString<FILENAME_MAX_LEN>,
filename: TarFormatString<POSIX_1003_MAX_FILENAME_LEN>,
data: &'a [u8],
size: usize,
}

#[allow(unused)]
impl<'a> ArchiveEntry<'a> {
const fn new(filename: ArrayString<FILENAME_MAX_LEN>, data: &'a [u8]) -> Self {
const fn new(filename: TarFormatString<POSIX_1003_MAX_FILENAME_LEN>, data: &'a [u8]) -> Self {
ArchiveEntry {
filename,
data,
@@ -53,7 +53,7 @@ impl<'a> ArchiveEntry<'a> {

/// Filename of the entry with a maximum of 100 characters (including the
/// terminating NULL-byte).
pub const fn filename(&self) -> ArrayString<{ FILENAME_MAX_LEN }> {
pub const fn filename(&self) -> TarFormatString<{ POSIX_1003_MAX_FILENAME_LEN }> {
self.filename
}

@@ -63,6 +63,7 @@ impl<'a> ArchiveEntry<'a> {
}

/// Data of the file as string slice, if data is valid UTF-8.
#[allow(clippy::missing_const_for_fn)]
pub fn data_as_str(&self) -> Result<&'a str, Utf8Error> {
core::str::from_utf8(self.data)
}
@@ -192,19 +193,35 @@ impl<'a> Iterator for ArchiveIterator<'a> {
return None;
}

let hdr = self.next_hdr(self.block_index);
let mut hdr = self.next_hdr(self.block_index);

loop {
// check if we found end of archive
if hdr.is_zero_block() {
let next_hdr = self.next_hdr(self.block_index + 1);
if next_hdr.is_zero_block() {
// gracefully terminated Archive
log::debug!("End of Tar archive with two zero blocks!");
} else {
log::warn!(
"Zero block found at end of Tar archive, but only one instead of two!"
);
}
// end of archive
return None;
}

// check if we found end of archive
if hdr.is_zero_block() {
let next_hdr = self.next_hdr(self.block_index + 1);
if next_hdr.is_zero_block() {
// gracefully terminated Archive
log::debug!("End of Tar archive with two zero blocks!");
} else {
log::warn!("Zero block found at end of Tar archive, but only one instead of two!");
// Ignore directory entries, i.e. yield only regular files. Works as
// filenames in tarballs are fully specified, e.g. dirA/dirB/file1
if hdr.typeflag != TypeFlag::DIRTYPE {
break;
}
// end of archive
return None;

// in next iteration: start at next Archive entry header
// +1 for current hdr block itself + all data blocks
let data_block_count: usize = hdr.payload_block_count().unwrap();
self.block_index += data_block_count + 1;
hdr = self.next_hdr(self.block_index);
}

if hdr.typeflag != TypeFlag::AREGTYPE && hdr.typeflag != TypeFlag::REGTYPE {
@@ -219,7 +236,7 @@ impl<'a> Iterator for ArchiveIterator<'a> {
warn!("Found empty file name",);
}

let hdr_size = hdr.size.val();
let hdr_size = hdr.size.as_number::<usize>();
if let Err(e) = hdr_size {
warn!("Can't parse the file size from the header block. Stop iterating Tar archive. {e:#?}");
return None;
@@ -245,10 +262,13 @@ impl<'a> Iterator for ArchiveIterator<'a> {
// +1 for current hdr block itself + all data blocks
self.block_index += data_block_count + 1;

let filename = ArrayString::from_str(hdr.name.as_string().as_str());
// .unwrap is fine as the capacity is MUST be ok.
let filename = filename.unwrap();

let mut filename: TarFormatString<256> =
TarFormatString::<POSIX_1003_MAX_FILENAME_LEN>::new([0; POSIX_1003_MAX_FILENAME_LEN]);
if hdr.magic.as_str() == "ustar" && hdr.version.as_str() == "00" && !hdr.prefix.is_empty() {
filename.append(&hdr.prefix);
filename.append(&TarFormatString::<1>::new([b'/']));
}
filename.append(&hdr.name);
Some(ArchiveEntry::new(filename, file_bytes))
}
}
Expand All @@ -264,7 +284,6 @@ mod tests {
let entries = archive.entries().collect::<Vec<_>>();
println!("{:#?}", entries);
}

/// Tests to read the entries from existing archives in various Tar flavors.
#[test]
fn test_archive_entries() {
@@ -299,6 +318,54 @@ mod tests {
assert_archive_content(&entries);
}

/// Tests to read the entries from an existing tarball with a directory in it
#[test]
fn test_archive_with_long_dir_entries() {
// tarball created with:
// $ cd tests; gtar --format=ustar -cf gnu_tar_ustar_long.tar 012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678 01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234/ABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJ
let archive = TarArchiveRef::new(include_bytes!("../tests/gnu_tar_ustar_long.tar"));
let entries = archive.entries().collect::<Vec<_>>();

assert_eq!(entries.len(), 2);
// Maximum length of a directory and name when the directory itself is tar'd
assert_entry_content(&entries[0], "012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678/ABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJ", 7);
// Maximum length of a directory and name when only the file is tar'd.
assert_entry_content(&entries[1], "01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234/ABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJABCDEFGHIJ", 7);
}

#[test]
fn test_archive_with_deep_dir_entries() {
// tarball created with:
// $ cd tests; gtar --format=ustar -cf gnu_tar_ustar_deep.tar 0123456789
let archive = TarArchiveRef::new(include_bytes!("../tests/gnu_tar_ustar_deep.tar"));
let entries = archive.entries().collect::<Vec<_>>();

assert_eq!(entries.len(), 1);
assert_entry_content(&entries[0], "0123456789/0123456789/0123456789/0123456789/0123456789/0123456789/0123456789/0123456789/0123456789/0123456789/0123456789/0123456789/empty", 0);
}

#[test]
fn test_archive_with_dir_entries() {
// tarball created with:
// $ gtar -cf tests/gnu_tar_default_with_dir.tar --exclude '*.tar' --exclude '012345678*' tests
{
let archive =
TarArchiveRef::new(include_bytes!("../tests/gnu_tar_default_with_dir.tar"));
let entries = archive.entries().collect::<Vec<_>>();

assert_archive_with_dir_content(&entries);
}

// tarball created with:
// $(osx) tar -cf tests/mac_tar_ustar_with_dir.tar --format=ustar --exclude '*.tar' --exclude '012345678*' tests
{
let archive = TarArchiveRef::new(include_bytes!("../tests/mac_tar_ustar_with_dir.tar"));
let entries = archive.entries().collect::<Vec<_>>();

assert_archive_with_dir_content(&entries);
}
}

/// Like [`test_archive_entries`] but with additional `alloc` functionality.
#[cfg(feature = "alloc")]
#[test]
@@ -314,15 +381,20 @@ mod tests {
assert_eq!(data, archive.into());
}

/// Test that the entry's contents match the expected content.
fn assert_entry_content(entry: &ArchiveEntry, filename: &str, size: usize) {
assert_eq!(entry.filename().as_str(), filename);
assert_eq!(entry.size(), size);
assert_eq!(entry.data().len(), size);
}

/// Tests that the parsed archive matches the expected order. The tarballs
/// the tests directory were created once by me with files in the order
/// specified in this test.
fn assert_archive_content(entries: &[ArchiveEntry]) {
assert_eq!(entries.len(), 3);

assert_eq!(entries[0].filename().as_str(), "bye_world_513b.txt");
assert_eq!(entries[0].size(), 513);
assert_eq!(entries[0].data().len(), 513);
assert_entry_content(&entries[0], "bye_world_513b.txt", 513);
assert_eq!(
entries[0].data_as_str().expect("Should be valid UTF-8"),
// .replace: Ensure that the test also works on Windows
@@ -331,22 +403,48 @@ mod tests {

// Test that an entry that needs two 512 byte data blocks is read
// properly.
assert_eq!(entries[1].filename().as_str(), "hello_world_513b.txt");
assert_eq!(entries[1].size(), 513);
assert_eq!(entries[1].data().len(), 513);
assert_entry_content(&entries[1], "hello_world_513b.txt", 513);
assert_eq!(
entries[1].data_as_str().expect("Should be valid UTF-8"),
// .replace: Ensure that the test also works on Windows
include_str!("../tests/hello_world_513b.txt").replace("\r\n", "\n")
);

assert_eq!(entries[2].filename().as_str(), "hello_world.txt");
assert_eq!(entries[2].size(), 12);
assert_eq!(entries[2].data().len(), 12);
assert_entry_content(&entries[2], "hello_world.txt", 12);
assert_eq!(
entries[2].data_as_str().expect("Should be valid UTF-8"),
"Hello World\n",
"file content must match"
);
}

/// Tests that the parsed archive matches the expected order and the filename includes
/// the directory name. The tarballs the tests directory were created once by me with files
/// in the order specified in this test.
fn assert_archive_with_dir_content(entries: &[ArchiveEntry]) {
assert_eq!(entries.len(), 3);

assert_entry_content(&entries[0], "tests/hello_world.txt", 12);
assert_eq!(
entries[0].data_as_str().expect("Should be valid UTF-8"),
"Hello World\n",
"file content must match"
);

// Test that an entry that needs two 512 byte data blocks is read
// properly.
assert_entry_content(&entries[1], "tests/bye_world_513b.txt", 513);
assert_eq!(
entries[1].data_as_str().expect("Should be valid UTF-8"),
// .replace: Ensure that the test also works on Windows
include_str!("../tests/bye_world_513b.txt").replace("\r\n", "\n")
);

assert_entry_content(&entries[2], "tests/hello_world_513b.txt", 513);
assert_eq!(
entries[2].data_as_str().expect("Should be valid UTF-8"),
// .replace: Ensure that the test also works on Windows
include_str!("../tests/hello_world_513b.txt").replace("\r\n", "\n")
);
}
}
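
The filename assembly in the `ArchiveIterator::next` hunk above (append `prefix`, a `/`, then `name`, but only for ustar headers with a non-empty prefix) can be sketched in isolation. This is a hedged illustration using plain byte arrays instead of the crate's `TarFormatString`; the helper names `field_bytes` and `full_path` and the `is_ustar` flag are hypothetical. The field widths of 155 and 100 bytes are those of the ustar header, which is where the 256-byte limit (155 + 1 + 100) comes from.

```rust
/// Returns the bytes of a header field up to the first NUL byte, or the
/// whole field if it is completely filled and therefore not NUL-terminated.
fn field_bytes(field: &[u8]) -> &[u8] {
    let end = field.iter().position(|&b| b == 0).unwrap_or(field.len());
    &field[..end]
}

/// Joins the ustar `prefix` and `name` fields into `out` and returns the
/// filled part of `out` (hypothetical helper, not crate API).
///
/// `is_ustar` stands in for the magic/version check done by the caller.
fn full_path<'a>(
    is_ustar: bool,
    prefix: &[u8; 155],
    name: &[u8; 100],
    out: &'a mut [u8; 256],
) -> &'a [u8] {
    let prefix = field_bytes(prefix);
    let name = field_bytes(name);
    let mut len = 0;

    // Only ustar headers carry a meaningful prefix field.
    if is_ustar && !prefix.is_empty() {
        out[..prefix.len()].copy_from_slice(prefix);
        len = prefix.len();
        // The separator is not stored in the header and must be re-added.
        out[len] = b'/';
        len += 1;
    }

    // Worst case: 155 + 1 + 100 = 256 bytes, which exactly fills `out`.
    out[len..len + name.len()].copy_from_slice(name);
    len += name.len();
    &out[..len]
}
```

A fixed 256-byte output buffer keeps the sketch allocation-free, in line with the crate's `no_std` goal; a completely filled `name` or `prefix` field has no NUL terminator, which is why `field_bytes` falls back to the field length.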