I want to sync a table from Google BigQuery, apply some conversions to the exported Parquet files, and store the results in a new set of files. However, considering the usage generated by querying and downloading, I would like to keep each stream simple by splitting the work into two streams (one that downloads the files to the local machine, and one that stores the converted data in new files) instead of completing all the tasks in a single stream. How should I proceed?
input:
  label: "bigquery_package_versions"
  sequence:
    inputs:
      - gcp_bigquery_select:
          project: xxxxx
          table: bigquery-public-data.deps_dev_v1.PackageVersions
          columns:
            - "*"
          where: SnapshotAt >= ? AND SnapshotAt < ? # No default (optional)
          auto_replay_nacks: true
          # job_labels: {}
          # priority: ""
          args_mapping: |
            root = ["2024-09-30", "2024-10-07"]
          prefix: |
            EXPORT DATA OPTIONS(
              uri='gs://ysdb-asia-east-1/package_versions/2024-09-30/*.parquet',
              format='parquet',
              compression='zstd',
              overwrite=true) AS
          suffix: |
            ORDER BY System asc, Name asc
      - gcp_cloud_storage:
          bucket: ysdb-asia-east-1
          prefix: package_versions
          delete_objects: true

# parquet_decode, mapping, parquet_encode process

output:
  file:
    path: ${YSDB_WORKER_DATA_HOME:/opt/ysdb_worker/data}/${! meta("gcs_key") }
    codec: all-bytes
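
To make the intended split concrete, here is a rough sketch of what the second (conversion) config might look like, assuming the config above is kept as the first stream and only downloads the raw files. The glob pattern, the `converted` output directory, the two-column mapping, and the schema are all placeholders of mine, not something I have working; `System` and `Name` are used only because they appear in the `ORDER BY` clause above. It also assumes a version recent enough to support `scanner` on the `file` input.

```yaml
# Hypothetical second config: reads the files written by the download
# stream, converts them, and writes new Parquet files.
input:
  file:
    # Assumed layout: <data home>/package_versions/<date>/<file>.parquet
    paths:
      - ${YSDB_WORKER_DATA_HOME:/opt/ysdb_worker/data}/package_versions/*/*.parquet
    scanner:
      to_the_end: {} # consume each file as a single message

pipeline:
  processors:
    # One message per row of the source file.
    - parquet_decode: {}
    # Placeholder conversion: keep only two columns.
    - mapping: |
        root.System = this.System
        root.Name = this.Name
    # Re-encode the batch of rows into a single Parquet file.
    - parquet_encode:
        default_compression: zstd
        schema: # placeholder schema matching the mapping above
          - name: System
            type: UTF8
          - name: Name
            type: UTF8

output:
  file:
    # "converted" is an assumed directory; the file name is taken from
    # the input file's path metadata.
    path: ${YSDB_WORKER_DATA_HOME:/opt/ysdb_worker/data}/converted/${! meta("path").filepath_split().index(1) }
    codec: all-bytes
```

With this layout the two configs would be run one after the other, since the `file` input only picks up files that already exist when it starts. Each source file is also decoded in memory as a whole, which may matter for large exports.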