Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add submit_taskgraph for a common UX #641

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

gspowley
Copy link
Member

This PR adds tiledb.cloud.utilities.submit_taskgraph as a proposed common function to launch ingestion (and other) taskgraphs.

If tqdm is available, a progress timer is displayed.

Output while running:

Taskgraph submitted to TileDB - https://cloud.tiledb.com/activity/taskgraphs/...
Running:  5:30 min:sec 

Output when complete:

Taskgraph submitted to TileDB - https://cloud.tiledb.com/activity/taskgraphs/...
Completed:  10:32 min:sec 

Copy link
Contributor

@JohnMoutafis JohnMoutafis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@spencerseale
Copy link
Collaborator

This looks like a better version of run_dag (source).

Can we make a note to deprecate that eventually? Maybe at least add a warning to run_dag for users to start moving that over to submit_taskgraph

@gspowley
Copy link
Member Author

This looks like a better version of run_dag (source).

Can we make a note to deprecate that eventually?

Yes, it's a good idea to converge on one convenience function for running taskgraphs. The debug/_print_logs functionality is useful and can be added to submit_taskgraph. I believe the retry mechanism in run_dag is legacy and adding a RetryStrategy to the DAG is the preferred approach.

Copy link
Collaborator

@sgillies sgillies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gspowley I left a few suggestions for how we could make this more Pythonic (for lack of a better term). Run silently by default and return a tuple, basically.

I like tqdm in command line programs but not in library functions so much, but that's my personal taste. I can see how this function and its visual feedback would be good for Notebook users.

How about run_taskgraph() since this is much more than graph.compute()?

Last suggestion: let's export this from tiledb.cloud.taskgraph or tiledb.cloud.dag, in a new runner.py or wrapper.py module.


# Display a link to the log in TileDB, if available
if graph.server_graph_uuid:
print(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's opt in to printing. I don't think a library function should print by default. I can't think of anything in Python's standard lib that prints except print().

Like if verbose and graph.server_graph_uuid:


if wait:
# Display the progress bar in stdout, so the output is not red in a notebook
pbar = tqdm(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I think a progress bar should be explicitly opt-in and not conditioned on waiting. I can easily imagine wanting to waiting silently. How about if verbose: pbar = ...?

except TimeoutError:
# Still running
pbar.set_description_str(str(graph.status))
continue
Copy link
Collaborator

@sgillies sgillies Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about what graph.wait() does. Is the status set only when it returns? This is just a question stemming from my ignorance, not a blocker.

pbar.set_description_str(str(graph.status))
break

return {"status": str(graph.status), "graph_id": str(graph.server_graph_uuid)}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more natural for a Python function to return a tuple, no? More like standard lib functions is what i mean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants