[Feature] [Regression] support non-literal `batch_id` config for python models on dataproc #1321
amychen1776 added the python label (Pull requests that update Python code) and removed the triage label on Aug 28, 2024
@maxmckittrick Thank you for opening up the issue.
amychen1776 added the python_models label and removed the python label on Aug 28, 2024
amychen1776 changed the title from "[Feature] support non-literal batch_id config for python models on dataproc" to "[Feature] [Regression] support non-literal batch_id config for python models on dataproc" on Aug 28, 2024
@amychen1776 yes, it'd be very helpful for us to see descriptive batch names when viewing the dataproc console; we typically run a few dozen python models per day in production, and there's no way to easily identify which batch is associated with which dbt model.
Is this your first time submitting a feature request?
Describe the feature
currently, the default batch ID that's included for python models submitted to dataproc is simply `str(uuid.uuid4())`; this was last changed with #1020. this works, and is sufficient to avoid `409 Already exists: Failed to create batch` errors from dataproc when attempting to submit batches with duplicate names, but after the test changes included in #1014, attempting to pass any non-literal `batch_id` in the model config will cause a parsing error. this makes passing any non-default `batch_id` more or less impossible, as using a var to assign a dynamic batch ID at runtime will throw an error from `literal_eval`
, and setting a static batch ID will allow a model to run on dataproc only once before throwing a 409 error.

Describe alternatives you've considered
one alternative would be to amend the `default_batch_id` config to prepend the model name with either a uuid, or with a non-static dbt env var, maybe `invocation_id` (unsure if this would only work on dbt Cloud)? this would avoid the previous errors when using `created_at`
as mentioned in #1006.

Who will this benefit?
everyone who wants to see descriptive batch names in dataproc!
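a rough sketch of what the prepend-the-model-name alternative could look like (this is a hypothetical helper, not the actual dbt-bigquery implementation; I'm assuming Dataproc batch IDs are limited to lowercase letters, digits, and hyphens, so the model name is normalized first):

```python
import re
import uuid


def default_batch_id(model_name: str) -> str:
    """Hypothetical default: model-name prefix plus a per-run uuid suffix.

    The prefix makes each batch identifiable in the dataproc console;
    the uuid suffix keeps IDs unique across runs, avoiding 409 errors.
    """
    # assumption: Dataproc batch IDs allow only lowercase letters,
    # digits, and hyphens, so replace anything else with a hyphen
    safe_name = re.sub(r"[^a-z0-9-]", "-", model_name.lower())
    return f"{safe_name}-{uuid.uuid4()}"


print(default_batch_id("my_python_model"))  # e.g. my-python-model-<random uuid>
```

the same idea would work with `invocation_id` in place of the uuid, as long as the suffix changes on every run.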
Are you interested in contributing this feature?
yes, I'm a regular dbt user but haven't contributed anything here before :)
Anything else?
I've confirmed this is broken in both dbt-core v1.8.5/dbt-bigquery v1.8.2 and dbt-core v1.7.16/dbt-bigquery v1.7.9
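for reference, the parsing failure can be reproduced in isolation, assuming the config value is parsed with `ast.literal_eval` (the `var(...)` expression below is just a made-up stand-in for any non-literal config value):

```python
import ast

# a plain string literal evaluates fine, so a static batch_id parses:
assert ast.literal_eval("'static-batch-id'") == "static-batch-id"

# a non-literal expression, e.g. a call that would supply a dynamic
# batch ID at runtime, is rejected by literal_eval with a ValueError:
try:
    ast.literal_eval("var('my_batch_id')")  # hypothetical dynamic config value
except ValueError as err:
    print(f"parsing error: {err}")
```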