Automated gcsfuse micro benchmarking #41
Conversation
Summary of Changes
Hello @anushka567, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new, automated micro-benchmarking framework for gcsfuse. The system is designed to streamline performance testing by automating the setup of cloud resources (GCS buckets and GCE VMs), executing configurable I/O workloads using FIO, collecting detailed performance metrics, and generating visual reports for comparison and analysis. This framework aims to provide a robust and repeatable method for evaluating gcsfuse performance under various conditions.
Highlights
- Automated Micro-benchmarking Suite: This PR introduces a new, automated micro-benchmarking suite for gcsfuse, enabling systematic performance evaluation.
- FIO Integration for Workload Generation: The suite integrates FIO (Flexible I/O Tester) to define and execute diverse I/O workloads, allowing for granular control over block size, file size, I/O depth, and thread count (a hedged example sketch follows after this list).
- Automated Cloud Resource Management: It automates the provisioning and cleanup of Google Cloud Storage buckets and Google Compute Engine VMs, streamlining the setup and teardown of benchmark environments.
- Comprehensive Performance Metric Collection: The system collects and processes key performance metrics, including FIO throughput, latency, IOPS, and VM CPU utilization, providing statistical summaries (averages and standard deviations).
- Result Visualization and Reporting: A dedicated Python script is included to compare different benchmark runs and generate visual plots and detailed tabular reports for easy analysis and visualization of performance trends.
- Support for Custom FIO Engines: The `startup_script.sh` demonstrates support for building and utilizing a custom C++ FIO engine, offering flexibility for specialized I/O testing scenarios.
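
To make the workload knobs and the statistical summaries concrete, here is a minimal, hedged Python sketch. It is not the PR's actual code: the FIO flags are standard, but the mount point, job name, and parameter values are assumptions made for illustration.

```
import json
import statistics
import subprocess

# Hypothetical workload parameters; the real suite reads these from its benchmark config.
workload = {"bs": "1M", "size": "1G", "iodepth": 64, "numjobs": 8}

cmd = [
    "fio", "--name=seq_read",
    "--directory=/mnt/gcsfuse",          # assumed gcsfuse mount point
    "--rw=read", "--ioengine=libaio",
    f"--bs={workload['bs']}", f"--size={workload['size']}",
    f"--iodepth={workload['iodepth']}", f"--numjobs={workload['numjobs']}",
    "--output-format=json",
]
out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
jobs = json.loads(out)["jobs"]

# Reduce per-job read bandwidth (KiB/s) to an average and a standard deviation.
bw = [job["read"]["bw"] for job in jobs]
print(f"read bw: avg={statistics.mean(bw):.1f} KiB/s stdev={statistics.stdev(bw):.1f} KiB/s")
```

In the actual suite these values come from the benchmark config file, and the collected metrics also cover latency, IOPS, and VM CPU utilization.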
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request introduces a comprehensive micro-benchmarking suite for gcsfuse. The changes include scripts for setting up the environment, running benchmarks using fio, parsing results, and comparing different benchmark runs. The code is well-structured into helper modules for managing cloud resources, parsing configurations, and generating reports. I've identified a few critical issues, such as a syntax error in a shell script and a `NameError` in a Python script's example usage, which could cause failures. I've also provided suggestions to improve robustness, such as better handling of optional arguments, ensuring output directories exist, and using standard timezone formats. Additionally, there are several typos and formatting errors in the README that should be corrected for clarity.
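
For illustration only, the kinds of robustness fixes suggested above might look like the following minimal sketch; the function, argument, and path names here are hypothetical and not taken from the PR.

```
import os
from datetime import datetime, timezone

def write_result(output_dir: str, benchmark_id: str, payload: str) -> str:
    # Create the output directory up front so a missing path cannot fail the run.
    os.makedirs(output_dir, exist_ok=True)

    # Use a timezone-aware UTC timestamp instead of a hard-coded local offset.
    timestamp = datetime.now(timezone.utc).isoformat()

    result_path = os.path.join(output_dir, f"{benchmark_id}_result.txt")
    with open(result_path, "w") as f:
        f.write(f"{timestamp}\n{payload}\n")
    return result_path
```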
gcsfuse-micro-benchmarking/defaults/speed-of-light/startup_script.sh
Running the script can be blocking and any failure (for e.g. SSH issues of the local machine from which the script is triggered, etc.) can cause the entire script to retriggered , thus it is advised to run the benchmark in a tmux session. \
Note: tmx2 is recommended as tmux doesn't work well with propagated ssh-keys. To install tmx2, please run the following commands: `sudo apt install tmux gnubby-wrappers`

### 5. Setup the virual environment
```
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
### 5. Run the benchmark
```
python3 main.py --benchmark_id={benchmark_id} --config_filepath={path/to/benchmark_config_file} --bench_type={bench_type}
```
Note: Please ensure Google Cloud SDK is updated as creating zonal buckets is not supported for older versions.

### 6. Cleanup
Whenever necessary, a GCE VM of name `{benchmark_id}-vm` and a GCS bucket of name `{benchmark_id}-bkt}` is created at runtime.

Cleanup is handled as part of the script itself if the resources are created in runtime and explicitly stated via the config to delete after use. In case of tool failure, the resources are persisted.

### 7. Benchmark Results
The results from the benchmark run is available at the location `results/{benchmark_id}_result.txt}` locally, at the end of benchmarking and remotely, in the artifacts bucket at `gs://{ARTIFACTS_BUCKET}/{benchmark_id}/result.json`

The raw results are also persisted in the artifacts bucket at `gs://{ARTIFACTS_BUCKET}/{benchmark_id}/raw-results/`

### 8. Compare Benchmark Runs
With identical benchmark runs for baseline/topline/feature , the results can be compared using the following steps:
```
cd compare_runs
python3 main.py --benchmark_ids=id1,id2,... --output_dir=output_dir
```

Visual plots are generated and stored under `output_dir/`

#### Note:
* The benchmark_id passed as argument to the script, is used for creating the test bucket and VM instance if required, hence ensure the benchmark_id is complaint with the naming guidelines for such resources
* In case the GCE VM instance is pre-existing, please ensure that the VM scope is set to `https://www.googleapis.com/auth/cloud-platform` for full access to all Cloud APIs
* For future reference, the benchmark ids are also stored in the artifacts bucket at `gs://{ARTIFACTS_BUCKET}/${user}$/runs.json` . The runs can be labelled by setting the bench_type flag passed to the script`.
There are a few formatting and syntax issues in the instructions:
- On line 43, the step numbering is incorrect. This should be step 6, and subsequent steps should be renumbered accordingly.
- On lines 50 and 56, there are extra closing braces `}` at the end of the paths.
- On line 73, the variable `${user}$` seems incorrect. It should probably be `${user}`.
print("Mismatch in the number of records for fio metrics and timestamps") | ||
exit() |
Using `exit()` here is abrupt and prevents the calling function from handling the error gracefully. It's better practice for a helper function to raise an exception, allowing the caller to decide how to proceed.
print("Mismatch in the number of records for fio metrics and timestamps") | |
exit() | |
raise ValueError("Mismatch in the number of records for fio metrics and timestamps") |
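
As a hedged follow-up sketch (hypothetical helper name, not the PR's actual parser), the caller could then catch the exception and decide whether to skip the run or abort:

```
def load_metrics(fio_metrics, timestamps):
    # Mirrors the suggestion above: signal malformed input to the caller instead of exiting.
    if len(fio_metrics) != len(timestamps):
        raise ValueError("Mismatch in the number of records for fio metrics and timestamps")
    return list(zip(timestamps, fio_metrics))

try:
    records = load_metrics([10, 20], [1.0])  # deliberately mismatched lengths
except ValueError as err:
    # The caller can log and skip this result set instead of terminating the whole benchmark.
    print(f"Skipping malformed result set: {err}")
```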
…ipt.sh Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Details: go/gcsfuse-microbenchmark-suite