Commit 1984e40

Merge pull request #18 from GoogleCloudPlatform/fio-to-bigquery
Add script to upload fio results to bigquery
2 parents 27df1fb + 14405de

File tree: 8 files changed, +583 −70 lines changed

Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
# GCSFuse Benchmarking Framework

## Overview

This directory provides a framework to:

- Set up and tear down a Google Compute Engine (GCE) VM for benchmarking.
- Install necessary tools such as FIO, GCSFuse, and the Google Cloud SDK on the VM.
- Run FIO benchmarks using predefined job files.
- Monitor GCSFuse CPU and memory usage during benchmarks.
- Upload FIO output and monitoring metrics to a Google Cloud Storage (GCS) bucket and BigQuery.

---

## Setup

Before running the benchmarks, ensure you have:

- **Google Cloud SDK (`gcloud`)**:
  - Authenticated with `gcloud auth login`.
  - Configured for the correct project:

    ```bash
    gcloud config set project <PROJECT_ID>
    ```

- **Permissions**:
  - Read access to `gs://gcsfuse-release-benchmark-fio-data`.
  - Read/write access to a GCS bucket for results (e.g., `gs://gcsfuse-release-benchmarks-results`).
  - Permission to create and delete GCE VMs, GCS buckets, and BigQuery datasets/tables within your specified project.

- **Python Dependencies**:
  Install the required Python packages:

  ```bash
  pip install -r perf-benchmarking-for-releases/requirements.txt
  ```

  The key Python packages include:

  - `google-cloud-bigquery`
  - `google-cloud-monitoring`
  - `requests`

---
## Usage

The main script to run the benchmarks is `run-benchmarks.sh`.
It should be executed from the `perf-benchmarking-for-releases` directory.

**Note:** This framework currently only supports benchmarking against regional GCS buckets with a flat object namespace (i.e., non-hierarchical).

### Syntax

```bash
bash run-benchmarks.sh <GCSFUSE_VERSION> <PROJECT_ID> <REGION> <MACHINE_TYPE> <IMAGE_FAMILY> <IMAGE_PROJECT>
```

### Arguments:

- `<GCSFUSE_VERSION>`: A Git tag (e.g., `v1.0.0`), branch name (e.g., `main`), or a commit ID on the GCSFuse master branch.
- `<PROJECT_ID>`: The Google Cloud project in which the VM and buckets will be created.
- `<REGION>`: The GCP region where the VM and GCS buckets will be created (e.g., `us-south1`).
- `<MACHINE_TYPE>`: The GCE machine type for the benchmark VM (e.g., `n2-standard-96`). The script supports attaching 16 local NVMe SSDs (375 GB each) for LSSD-supported machine types.
  - **Note:** If your machine type supports LSSD but is not included in the `LSSD_SUPPORTED_MACHINES` array in `run-benchmarks.sh`, you may need to add it manually to ensure LSSDs are attached.
- `<IMAGE_FAMILY>`: The image family for the VM (e.g., `ubuntu-2504-amd64`).
- `<IMAGE_PROJECT>`: The image project for the VM (e.g., `ubuntu-os-cloud`).
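The LSSD note above amounts to a membership check against the array in `run-benchmarks.sh`; a standalone sketch of that logic (the `is_lssd_supported` helper is illustrative — the array contents come from the script):

```bash
# Machine types known to support local NVMe SSDs, as listed in run-benchmarks.sh.
LSSD_SUPPORTED_MACHINES=("n2-standard-96" "c2-standard-60" "c2d-standard-112" "c3-standard-88" "c3d-standard-180")

# Illustrative helper: returns 0 (success) if the given machine type is in the array.
is_lssd_supported() {
  local m
  for m in "${LSSD_SUPPORTED_MACHINES[@]}"; do
    [ "$m" = "$1" ] && return 0
  done
  return 1
}

if is_lssd_supported "n2-standard-96"; then LSSD_ENABLED="true"; else LSSD_ENABLED="false"; fi
echo "LSSD_ENABLED=${LSSD_ENABLED}"
```

If LSSDs should be attached for a type the array misses, extending `LSSD_SUPPORTED_MACHINES` is the intended fix.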
### Example:

```bash
bash run-benchmarks.sh master gcs-fuse-test us-south1 n2-standard-96 ubuntu-2504-amd64 ubuntu-os-cloud
```

---
## Workflow

1. **Unique ID Generation**:
   A unique ID is generated from a timestamp and a random suffix, and is used to name the VM and related GCS buckets.
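For reference, the ID is constructed in `run-benchmarks.sh` roughly as follows (resource names shown for context):

```bash
# Timestamp plus an 8-character random suffix makes collisions across runs unlikely.
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
RAND_SUFFIX=$(head /dev/urandom | tr -dc a-z0-9 | head -c 8)
UNIQUE_ID="${TIMESTAMP}-${RAND_SUFFIX}"

# The ID is embedded in every per-run resource name.
VM_NAME="gcsfuse-perf-benchmark-${UNIQUE_ID}"
GCS_BUCKET_WITH_FIO_TEST_DATA="gcsfuse-release-benchmark-data-${UNIQUE_ID}"
echo "${UNIQUE_ID}"
```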
82+
83+
2. **GCS Bucket Creation**:
84+
A GCS bucket `gcsfuse-release-benchmark-data-<UNIQUE_ID>` is created in the specified region to store FIO test data.
85+
86+
3. **FIO Job File Upload**:
87+
All `.fio` job files from the local `fio-job-files/` directory are uploaded to the results bucket.
88+
89+
4. **Data Transfer**:
90+
A Storage Transfer Service job copies read data from `gs://gcsfuse-release-benchmark-fio-data` to the newly created test data bucket.
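Step 4 is a single synchronous Storage Transfer Service call; a sketch that only assembles the command without running it (bucket and project values below are placeholders — the real script derives them from its arguments and the unique ID):

```bash
# Placeholders for illustration only.
GCS_BUCKET_WITH_FIO_TEST_DATA="gcsfuse-release-benchmark-data-example"
PROJECT_ID="my-project"

# Only objects under the "read" prefix are copied; --no-async blocks until done.
transfer_cmd=(gcloud transfer jobs create
  gs://gcsfuse-release-benchmark-fio-data
  "gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}"
  --include-prefixes=read
  "--project=${PROJECT_ID}"
  --no-async)

echo "${transfer_cmd[@]}"   # inspect the command
# "${transfer_cmd[@]}"      # uncomment to run (needs gcloud auth and permissions)
```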
91+
92+
5. **VM Creation**:
93+
- A GCE VM is created with the specified machine type.
94+
- Boot disk size: 1000GB.
95+
96+
6. **`starter-script.sh` Execution**:
97+
This script runs on the VM after creation. It:
98+
- Installs common dependencies (e.g., git, fio, python3-pip).
99+
- Builds GCSFuse from the specified version.
100+
- Sets up local SSDs if enabled.
101+
- Downloads FIO job files.
102+
- Mounts the GCS bucket using the built GCSFuse binary.
103+
- Monitors GCSFuse CPU and memory usage during FIO runs.
104+
- Executes each FIO job and saves the JSON output.
105+
- Uploads the FIO results and monitoring logs to GCS.
106+
- Calls `upload_fio_output_to_bigquery.py` to push results to BigQuery.
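The monitoring bullet is implemented inside `starter-script.sh`, which is not shown in this diff; a minimal, Linux-only sketch of the general idea — sampling a process's resident memory from `/proc` (the real script may collect metrics differently):

```bash
# Read the resident set size (VmRSS, in kB) of a PID from /proc (Linux only).
rss_kb() {
  awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# In the real setup the PID would be the gcsfuse process, e.g. "$(pgrep gcsfuse)";
# here the current shell stands in so the sketch is self-contained.
PID=$$
for _ in 1 2 3; do
  echo "$(date +%s) rss_kb=$(rss_kb "$PID")"
  sleep 0.1
done
```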
7. **Cleanup**:
   A cleanup function is registered (via a shell `trap`) to run on exit, ensuring the VM and the created GCS test data bucket are deleted.
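The cleanup step uses the standard shell `trap … EXIT` pattern; a self-contained demonstration with a temp file standing in for the VM and bucket (the real function issues `gcloud … delete` calls):

```bash
# A temp file stands in for the per-run VM and bucket resources.
SCRATCH=$(mktemp)

cleanup() {
  # Delete only if it still exists, mirroring the existence checks in the script.
  if [ -e "$SCRATCH" ]; then
    rm -f "$SCRATCH"
  fi
}
# Run cleanup on any exit: normal completion, error, or interrupt.
trap cleanup EXIT

echo "working with ${SCRATCH}"
```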
---
## Output

### BigQuery

FIO benchmark results, including I/O statistics, latencies, and system resource usage (CPU/memory), are uploaded to a BigQuery table with:

- **Project ID**: `gcs-fuse-test-ml`
- **Dataset ID**: `gke_test_tool_outputs`
- **Table ID**: `fio_outputs`
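The uploaded rows can be inspected with standard BigQuery tooling, for example the `bq` CLI; the query below is illustrative — the column set depends on the uploader's schema, so check the table before relying on specific fields:

```bash
# Full table ID used by the framework.
BQ_TABLE="gcs-fuse-test-ml.gke_test_tool_outputs.fio_outputs"

# Assemble an illustrative query; run it with:
#   bq query --use_legacy_sql=false "$QUERY"
QUERY="SELECT * FROM \`${BQ_TABLE}\` LIMIT 10"
echo "$QUERY"
```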
---

### Google Cloud Storage

- **FIO test data:** The FIO test data (copied from `gs://gcsfuse-release-benchmark-fio-data`) is uploaded to a newly created bucket dynamically named `gcsfuse-release-benchmark-data-<UNIQUE_ID>`.
- **Benchmark results and FIO job files:** FIO JSON output files, benchmark logs, and FIO job files are uploaded to the `gs://gcsfuse-release-benchmarks-results` bucket, under the path `gs://gcsfuse-release-benchmarks-results/<GCSFUSE_VERSION>-<UNIQUE_ID>/`.
- A `success.txt` file is uploaded to GCS upon successful completion of all benchmarks.
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

google-cloud-bigquery
google-cloud-monitoring
requests

perf-benchmarking-for-releases/run-benchmarks.sh

Lines changed: 21 additions & 37 deletions

@@ -1,5 +1,4 @@
 #!/bin/bash
-
 # Copyright 2025 Google LLC
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -23,7 +22,7 @@ if [ "$#" -ne 6 ]; then
   echo "This script should be run from the 'perf-benchmarking-for-releases' directory."
   echo ""
   echo "Example:"
-  echo " bash run-benchmarks.sh v2.12.0 gcs-fuse-test us-south1 n2-standard-96 ubuntu-2004-lts ubuntu-os-cloud"
+  echo " bash run-benchmarks.sh master gcs-fuse-test us-south1 n2-standard-96 ubuntu-2504-amd64 ubuntu-os-cloud"
   exit 1
 fi
 
@@ -48,9 +47,12 @@ TIMESTAMP=$(date +%Y%m%d-%H%M%S)
 RAND_SUFFIX=$(head /dev/urandom | tr -dc a-z0-9 | head -c 8)
 UNIQUE_ID="${TIMESTAMP}-${RAND_SUFFIX}"
 
-VM_NAME="gcsfuse-perf-benchmark-${IMAGE_FAMILY}-${UNIQUE_ID}"
+VM_NAME="gcsfuse-perf-benchmark-${UNIQUE_ID}"
 GCS_BUCKET_WITH_FIO_TEST_DATA="gcsfuse-release-benchmark-data-${UNIQUE_ID}"
 RESULTS_BUCKET_NAME="gcsfuse-release-benchmarks-results"
+BQ_TABLE="gcs-fuse-test-ml.gke_test_tool_outputs.fio_outputs"
+RESULT_PATH="gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}-${UNIQUE_ID}"
+
 
 # For VM creation, we need a zone within the specified region.
 # We will pick the first available zone, typically ending with '-a'.
@@ -60,13 +62,13 @@ echo "Starting GCSFuse performance benchmarking for version: ${GCSFUSE_VERSION}"
 echo "VM Name: ${VM_NAME}"
 echo "Test Data Bucket: gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}"
 echo "Results Bucket: gs://${RESULTS_BUCKET_NAME}"
+echo "Result Path: ${RESULT_PATH}"
 echo "Project ID: ${PROJECT_ID}"
 echo "Region: ${REGION}"
-echo "VM Zone: ${VM_ZONE}"
+echo "VM Zone: ${VM_ZONE}"
 echo "Machine Type: ${MACHINE_TYPE}"
 
-# Array for LSSD supported machines
-# Add machine types that support local SSDs (NVMe) here
+# Array for LSSD supported machines. If a machine type supports LSSD but is not listed here, please add it manually.
 LSSD_SUPPORTED_MACHINES=("n2-standard-96" "c2-standard-60" "c2d-standard-112" "c3-standard-88" "c3d-standard-180")
 
 # Check if the chosen machine type is directly present in the LSSD_SUPPORTED_MACHINES array
@@ -88,42 +90,27 @@ fi
 
 # Cleanup function to be called on exit
 cleanup() {
-  echo "Initiating cleanup..."
-
   # Delete VM if it exists
   if gcloud compute instances describe "${VM_NAME}" --zone="${VM_ZONE}" --project="${PROJECT_ID}" >/dev/null 2>&1; then
-    echo "Deleting VM: ${VM_NAME}"
-    gcloud compute instances delete "${VM_NAME}" --zone="${VM_ZONE}" --project="${PROJECT_ID}" --delete-disks=all -q >/dev/null
-  else
-    echo "VM '${VM_NAME}' not found; skipping deletion."
+    gcloud compute instances delete "${VM_NAME}" --zone="${VM_ZONE}" --project="${PROJECT_ID}" --delete-disks=all -q >/dev/null 2>&1
   fi
 
   # Delete GCS bucket with test data if it exists
   if gcloud storage buckets list --project="${PROJECT_ID}" --filter="name:(${GCS_BUCKET_WITH_FIO_TEST_DATA})" --format="value(name)" | grep -q "^${GCS_BUCKET_WITH_FIO_TEST_DATA}$"; then
-    echo "Deleting GCS bucket: ${GCS_BUCKET_WITH_FIO_TEST_DATA}"
-    gcloud storage rm -r "gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}" -q >/dev/null
-  else
-    echo "Bucket '${GCS_BUCKET_WITH_FIO_TEST_DATA}' not found; skipping deletion."
+    gcloud storage rm -r "gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}" -q >/dev/null 2>&1
  fi
-
-  echo "Cleanup complete."
 }
 
-
 # Register the cleanup function to run on EXIT signal
 trap cleanup EXIT
 
 # Create the GCS bucket for FIO test data in the specified REGION
 echo "Creating GCS test data bucket: gs://${GCS_BUCKET_WITH_FIO_TEST_DATA} in region: ${REGION}"
 gcloud storage buckets create "gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}" --project="${PROJECT_ID}" --location="${REGION}"
 
-# Clear the existing GCSFUSE_VERSION directory in the results bucket
-echo "Clearing previous data in gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/..."
-gcloud storage rm -r "gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/**" --quiet || true
-
 # Upload FIO job files to the results bucket for the VM to download
-echo "Uploading all .fio job files from local 'fio-job-files/' directory to gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/fio-job-files/..."
-gcloud storage cp fio-job-files/*.fio "gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/fio-job-files/"
+echo "Uploading all .fio job files from local 'fio-job-files/' directory to ${RESULT_PATH}/fio-job-files/..."
+gcloud storage cp fio-job-files/*.fio "${RESULT_PATH}/fio-job-files/"
 echo "FIO job files uploaded."
 
 # Get the project number
@@ -142,15 +129,13 @@ gcloud storage buckets add-iam-policy-binding "gs://${GCS_BUCKET_WITH_FIO_TEST_D
 # job to transfer test data from a fixed GCS bucket to the newly created bucket.
 # Note : We need to copy only read data.
 echo "Creating storage transfer job to copy read data to gs://${GCS_BUCKET_WITH_FIO_TEST_DATA}..."
-
-TRANSFER_JOB_NAME=$(gcloud transfer jobs create \
+gcloud transfer jobs create \
   gs://gcsfuse-release-benchmark-fio-data \
  gs://${GCS_BUCKET_WITH_FIO_TEST_DATA} \
   --include-prefixes=read \
   --project="${PROJECT_ID}" \
   --format="value(name)" \
-  --no-async)
-
+  --no-async
 echo "Transfer completed."
 
 
@@ -162,19 +147,19 @@ gcloud compute instances create "${VM_NAME}" \
   --machine-type="${MACHINE_TYPE}" \
   --image-project="${IMAGE_PROJECT}" \
   --zone="${VM_ZONE}" \
-  --boot-disk-size=1000GB \
+  --boot-disk-size=100GB \
   --network-interface=network-tier=PREMIUM,nic-type=GVNIC \
   --scopes=https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/devstorage.read_write \
   --network-performance-configs=total-egress-bandwidth-tier=TIER_1 \
-  --metadata GCSFUSE_VERSION="${GCSFUSE_VERSION}",GCS_BUCKET_WITH_FIO_TEST_DATA="${GCS_BUCKET_WITH_FIO_TEST_DATA}",RESULTS_BUCKET_NAME="${RESULTS_BUCKET_NAME}",LSSD_ENABLED="${LSSD_ENABLED}" \
+  --metadata GCSFUSE_VERSION="${GCSFUSE_VERSION}",GCS_BUCKET_WITH_FIO_TEST_DATA="${GCS_BUCKET_WITH_FIO_TEST_DATA}",RESULT_PATH="${RESULT_PATH}",LSSD_ENABLED="${LSSD_ENABLED}",MACHINE_TYPE="${MACHINE_TYPE}",PROJECT_ID="${PROJECT_ID}",UNIQUE_ID="${UNIQUE_ID}" \
   --metadata-from-file=startup-script=starter-script.sh \
   ${VM_LOCAL_SSD_ARGS}
 echo "VM created. Benchmarks will run on the VM."
 
-echo "Waiting for benchmarks to complete on VM (polling for success.txt)..."
+echo "Waiting for benchmarks to complete on VM..."
 
-SUCCESS_FILE_PATH="gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/success.txt"
-LOG_FILE_PATH="gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/benchmark_run.log"
+SUCCESS_FILE_PATH="${RESULT_PATH}/success.txt"
+LOG_FILE_PATH="${RESULT_PATH}/benchmark_run.log"
 SLEEP_TIME=300 # 5 minutes
 sleep "$SLEEP_TIME"
 #max 18 retries amounting to ~1hr30mins time
@@ -183,13 +168,13 @@ MAX_RETRIES=18
 for ((i=1; i<=MAX_RETRIES; i++)); do
   if gcloud storage objects describe "${SUCCESS_FILE_PATH}" &> /dev/null; then
     echo "Benchmarks completed. success.txt found."
-    echo "Results are available in gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/"
+    echo "Results are available in BigQuery: ${BQ_TABLE}"
     echo "Benchmark log file: $LOG_FILE_PATH"
     exit 0
   fi
 
   # Check for early failure indicators
-  if gcloud storage objects describe "gs://${RESULTS_BUCKET_NAME}/${GCSFUSE_VERSION}/details.txt" &> /dev/null || \
+  if gcloud storage objects describe "${RESULT_PATH}/details.txt" &> /dev/null || \
     gcloud storage objects describe "$LOG_FILE_PATH" &> /dev/null; then
     echo "Benchmark log or details.txt found, but success.txt is missing. Possible error in benchmark execution."
     echo "Check logs at: $LOG_FILE_PATH"
@@ -200,7 +185,6 @@ for ((i=1; i<=MAX_RETRIES; i++)); do
   sleep "$SLEEP_TIME"
 done
 
-
 # Failure case: success.txt was not found after retries
 echo "Timed out waiting for success.txt after $((MAX_RETRIES * SLEEP_TIME / 60)) minutes. Perhaps there is some error."
 echo "Benchmark log file (for troubleshooting): $LOG_FILE_PATH"
