Merge pull request #4 from source-cooperative/upload-s3
Added ability to upload outputs to S3
Showing 4 changed files with 234 additions and 126 deletions.
```diff
@@ -1,7 +1,7 @@
 [package]
 name = "s3-manifest"
 authors = ["Kevin Booth <[email protected]>"]
-version = "0.0.5"
+version = "0.0.7"
 edition = "2021"

 [dependencies]
@@ -14,5 +14,7 @@ parquet = "52.2.0"
 rusoto_core = "0.48.0"
 rusoto_s3 = "0.48.0"
 tokio = { version = "1.0", features = ["full"] }
-tokio-retry = "0.3"
 url = "2.2.2"
+tempfile = "3.2"
+tokio-retry = { version = "0.3", features = [] }
+bytes = "1.0"
```
```diff
@@ -1,84 +1,95 @@
-# S3 Manifest Generator
+# S3 Manifest Generator

-S3 Manifest Generator is a Rust-based command-line tool that creates a Parquet manifest file for objects in an S3 bucket or a specific prefix within a bucket. This tool is useful for quickly generating an inventory of S3 objects, including their metadata, in a compact and efficiently queryable format.
+S3 Manifest Generator is a Rust-based command-line tool that creates a Parquet manifest file for objects in an S3 bucket or a specific prefix within a bucket. This tool is useful for quickly generating an inventory of S3 objects, including their metadata, in a compact and efficiently queryable format.

-## Features
+## Features

-- Generate a Parquet manifest file for S3 objects
-- Support for custom S3-compatible endpoints
-- Configurable delimiter for file name extraction
-- Progress bar with real-time statistics
-- Retry mechanism for S3 API calls
-- Efficient batch processing of S3 objects
+- Generate a Parquet manifest file for S3 objects
+- Support for custom S3-compatible endpoints for both source and destination
+- Configurable delimiter for file name extraction
+- Progress bar with real-time statistics
+- Retry mechanism for S3 API calls
+- Efficient batch processing of S3 objects
+- Option to output to local file or directly to S3
+- Support for separate credentials for source and destination buckets

-## Installation
+## Installation

-To install the S3 Manifest Generator, you need to have Rust and Cargo installed on your system. Then, you can build the project from source:
+To install the S3 Manifest Generator, you need to have Rust and Cargo installed on your system. Then, you can build the project from source:

-```bash
-git clone https://github.com/source-cooperative/s3-manifest.git
-cd s3-manifest
-cargo build --release
-```
+```bash
+git clone https://github.com/source-cooperative/s3-manifest.git
+cd s3-manifest
+cargo build --release
+```

-The compiled binary will be available in the `target/release` directory.
+The compiled binary will be available in the `target/release` directory.

-## Usage
+## Usage

-```bash
-s3-manifest [OPTIONS] <S3_URI>
-```
+```bash
+s3-manifest [OPTIONS] <S3_URI>
+```

-### Arguments
+### Arguments

-- `<S3_URI>`: S3 URI containing both bucket and prefix (e.g., s3://bucket-name/prefix)
+- `<S3_URI>`: S3 URI containing both bucket and prefix (e.g., s3://bucket-name/prefix)

-### Options
+### Options

-- `-o, --output <OUTPUT>`: Output file name for the Parquet manifest [default: manifest.parquet]
-- `--endpoint-url <ENDPOINT_URL>`: Custom S3 endpoint URL (optional, use for S3-compatible services)
-- `-d, --delimiter <DELIMITER>`: Delimiter to use for extracting file name [default: "/"]
-- `-h, --help`: Print help information
-- `-V, --version`: Print version information
+- `-o, --output <OUTPUT>`: Output file name for the Parquet manifest (local path or S3 URI)
+- `--source-endpoint <SOURCE_ENDPOINT>`: Custom S3 endpoint URL for source bucket (optional, use for S3-compatible services)
+- `--dest-endpoint <DEST_ENDPOINT>`: Custom S3 endpoint URL for destination bucket (optional, use for S3-compatible services)
+- `-d, --delimiter <DELIMITER>`: Delimiter to use for extracting file name [default: "/"]
+- `--source-access-key <SOURCE_ACCESS_KEY>`: AWS Access Key ID for the source bucket
+- `--source-secret-key <SOURCE_SECRET_KEY>`: AWS Secret Access Key for the source bucket
+- `--dest-access-key <DEST_ACCESS_KEY>`: AWS Access Key ID for the destination bucket
+- `--dest-secret-key <DEST_SECRET_KEY>`: AWS Secret Access Key for the destination bucket
+- `-h, --help`: Print help information
+- `-V, --version`: Print version information

-### Example
+### Example

-```bash
-s3-manifest s3://my-bucket/my-prefix -o my-manifest.parquet --delimiter "/"
-```
+```bash
+s3-manifest s3://my-bucket/my-prefix -o s3://output-bucket/my-manifest.parquet --delimiter "/" --source-endpoint https://custom-s3.example.com
+```

-This command will generate a Parquet manifest file named `my-manifest.parquet` for all objects in the `my-prefix` of the `my-bucket` S3 bucket, using "/" as the delimiter for file name extraction.
+This command will generate a Parquet manifest file named `my-manifest.parquet` in the `output-bucket` S3 bucket for all objects in the `my-prefix` of the `my-bucket` S3 bucket, using "/" as the delimiter for file name extraction and a custom S3 endpoint for the source bucket.

-## Output
+## Output

-The generated Parquet file contains the following columns:
+The generated Parquet file contains the following columns:

-- Bucket: The name of the S3 bucket
-- Key: The full key of the S3 object
-- FileName: The extracted file name based on the specified delimiter
-- Size: The size of the object in bytes
-- LastModified: The last modified timestamp of the object
+- Bucket: The name of the S3 bucket
+- Key: The full key of the S3 object
+- FileName: The extracted file name based on the specified delimiter
+- Size: The size of the object in bytes
+- LastModified: The last modified timestamp of the object

-## Dependencies
+## Dependencies

-This project relies on several Rust crates, including:
+This project relies on several Rust crates, including:

-- arrow
-- chrono
-- clap
-- indicatif
-- parquet
-- rusoto_core
-- rusoto_s3
-- tokio
-- url
+- arrow
+- chrono
+- clap
+- futures
+- indicatif
+- parquet
+- rusoto_core
+- rusoto_s3
+- tokio
+- url
+- tempfile
+- tokio-retry
+- bytes

-For a complete list of dependencies and their versions, please refer to the `Cargo.toml` file.
+For a complete list of dependencies and their versions, please refer to the `Cargo.toml` file.

-## License
+## License

-[MIT License](LICENSE)
+[MIT License](LICENSE)

-## Contributing
+## Contributing

-Contributions are welcome! Please feel free to submit a Pull Request.
+Contributions are welcome! Please feel free to submit a Pull Request.
```
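The README in this diff says that `-o, --output` accepts either a local path or an S3 URI, and that the `FileName` column is extracted from the object key using the configured delimiter. The source files that implement this are not shown here, so the following is only a minimal sketch of how those two rules might work, using the standard library alone. The function names `parse_s3_uri` and `file_name` are hypothetical, and the real tool may handle edge cases (trailing delimiters, empty prefixes) differently.

```rust
/// Hypothetical helper: split an `s3://bucket/prefix` URI into (bucket, prefix).
/// Returns `None` for anything that is not an S3 URI, which a caller could
/// treat as a local output path.
fn parse_s3_uri(uri: &str) -> Option<(String, String)> {
    let rest = uri.strip_prefix("s3://")?;
    // Everything before the first '/' is the bucket; the remainder is the prefix.
    let (bucket, prefix) = match rest.split_once('/') {
        Some((bucket, prefix)) => (bucket, prefix),
        None => (rest, ""),
    };
    if bucket.is_empty() {
        return None;
    }
    Some((bucket.to_string(), prefix.to_string()))
}

/// Hypothetical helper: take the part of `key` after the last `delimiter`.
/// Assumption: a key without the delimiter is returned unchanged.
fn file_name<'a>(key: &'a str, delimiter: &str) -> &'a str {
    if delimiter.is_empty() {
        return key;
    }
    key.rsplit(delimiter).next().unwrap_or(key)
}
```

Under these assumptions, `parse_s3_uri("s3://output-bucket/my-manifest.parquet")` yields the bucket `output-bucket` and the key `my-manifest.parquet`, while a plain `my-manifest.parquet` yields `None` and would be written locally. A key ending in the delimiter produces an empty file name in this sketch.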