Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api, robot-server): get historical and current command errors #16697

Open
wants to merge 57 commits into
base: edge
Choose a base branch
from

Conversation

TamarZanzouri
Copy link
Contributor

@TamarZanzouri TamarZanzouri commented Nov 5, 2024

Overview

closes https://opentrons.atlassian.net/browse/EXEC-655.
GET historical command errors and refactor current command errors.

Test Plan and Hands on Testing

run a protocol with ER failed commands.

  • GET /runs/{runId}/commandsErros while the run is active and make sure you get the failed commands errors.
  • GET /runs/{runId}/commandsErros when the run is finished and make sure you get the failed commands errors.

Changelog

  • added db schema 8.
  • added command_error, command_status field to commands_table.
  • insert command_error, command_status when inserting a command to commands table.
  • get_commands_errors_slice
  • added missing indexes from previous migrations.

Review requests

changes make sense?

Risk assessment

low.

@TamarZanzouri TamarZanzouri requested a review from a team as a code owner November 5, 2024 21:23
@TamarZanzouri TamarZanzouri marked this pull request as draft November 5, 2024 21:23
@TamarZanzouri TamarZanzouri marked this pull request as ready for review November 6, 2024 21:14
Copy link
Contributor

@SyntaxColoring SyntaxColoring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

api/src/opentrons/protocol_engine/state/command_history.py Outdated Show resolved Hide resolved
Comment on lines 127 to 128
CREATE INDEX ix_run_command_command_error ON run_command (command_error)
""",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not tested this, but I don't think this index is what we want. (This index is what happens when you just do index=True on the new column declaration.)

This will, across all commands from all runs, maintain an index that is a lexicographic sort of the errors' JSON strings.

That does not help with efficiently serving the SELECT statement that you've written in RunStore, which is something like, "get all the errors from a single run, in order."

So I think what we want instead is either:

  • A compound index on (run_id, has_error, index_in_run), where has_error is a true/false value computed from command_error being non-NULL. I don't know how to do this off the top of my head with SQLAlchemy, but here is SQLite documentation that looks relevant.
  • A compound index on (run_id, command_status, index_in_run), where command_status is a new column storing the command's succeeded/failed status. And then the SELECT statement filters based on command_status == "failed".

Copy link
Contributor

@SyntaxColoring SyntaxColoring Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also test whether it's fast enough to just use the existing (run_id, index_in_run) compound index. From there, filtering for just the failed commands will be an O(n) linear scan, but...it's a linear scan implemented by optimized C, so maybe it's good enough.

Copy link

codecov bot commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.43%. Comparing base (ec7641c) to head (ec64c12).
Report is 140 commits behind head on edge.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             edge   #16697   +/-   ##
=======================================
  Coverage   92.43%   92.43%           
=======================================
  Files          77       77           
  Lines        1283     1283           
=======================================
  Hits         1186     1186           
  Misses         97       97           
Flag Coverage Δ
g-code-testing 92.43% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

@SyntaxColoring
Copy link
Contributor

SyntaxColoring commented Nov 15, 2024

This is pending some merge conflict resolution with edge. @TamarZanzouri and I will pick this up again when 8.2.0 release stuff chills out. Re-drafting until then.

@SyntaxColoring SyntaxColoring marked this pull request as draft November 15, 2024 16:51
@SyntaxColoring SyntaxColoring force-pushed the EXEC-655-store-commands-error-list-in-db branch from bf9b701 to 5af324e Compare November 20, 2024 20:17
@TamarZanzouri TamarZanzouri marked this pull request as ready for review November 20, 2024 20:52
Copy link
Contributor

@SyntaxColoring SyntaxColoring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good, but I'm not sure about the query.

)
commands_to_update.append(
{
"_id": row.row_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Does "_id" really need to have that underscore? Or is it only there because that's how the column was named in the example code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that as long as we dont use a defined prop name we are good

Comment on lines +54 to +70
add_column(
dest_engine,
schema_8.run_command_table.name,
schema_8.run_command_table.c.command_error,
)

add_column(
dest_engine,
schema_8.run_command_table.name,
schema_8.run_command_table.c.command_status,
)

_add_missing_indexes(dest_transaction=dest_transaction)

_migrate_command_table_with_new_command_error_col_and_command_status(
dest_transaction=dest_transaction
)
Copy link
Contributor

@SyntaxColoring SyntaxColoring Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No action needed, just some thoughts.

Early on in this PR, we made the decision to add the new column with ALTER TABLE, instead of letting SQLAlchemy create the new one from scratch.

I don't think that decision is panning out well.

  1. We have to manually create indexes.
  2. We have to manually create constraints. (And we're not doing that right now—see my # todo comment in schema_8.py).
  3. The command_status column needs to be nullable even when that's not a good match for the data we're storing in it.

1 and 2 are thankfully caught by tests now, and perhaps we can improve them by fleshing out the add_column() function. But 3 is fundamental to SQLite. We talked about fixing 3 with Alembic, but I've looked into it and I'm skeptical that that will be easy.

Meanwhile, is ALTER TABLE really saving that much overhead compared to creating the new table from scratch? We're still iterating over every single command and parsing it as JSON, and I imagine that cost will dominate the savings from not having to copy the data.

I think we should stick with what you have now, but in the future, let's try the other way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other way means creating all tables from scratch? we still need to insert the data.

robot-server/robot_server/runs/run_store.py Outdated Show resolved Hide resolved
robot-server/robot_server/runs/run_store.py Show resolved Hide resolved
robot-server/robot_server/runs/run_store.py Outdated Show resolved Hide resolved
robot-server/robot_server/runs/run_store.py Outdated Show resolved Hide resolved
Comment on lines +74 to +75
# todo(2024-11-20): Probably add the indexes missing from prior migrations here.
# https://opentrons.atlassian.net/browse/EXEC-827
Copy link
Contributor

@SyntaxColoring SyntaxColoring Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TamarZanzouri FYI I think that it was correct for you to have had the other fixes in here, but when I was resolving merge conflicts, it was feeling like a bit too much to try to do at once, so I removed them. Let's do it in a follow-up?

Comment on lines +235 to +242
# nullable=True because it was easier for the migration to add the column
# this way. This is not intended to ever be null in practice.
nullable=True,
# todo(mm, 2024-11-20): We want create_constraint=True here. Something
# about the way we compare SQL in test_tables.py is making that difficult--
# even when we correctly add the constraint in the migration, the SQL
# doesn't compare equal to what create_constraint=True here would emit.
create_constraint=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TamarZanzouri FYI, I made these changes when resolving merge conflicts:

  1. Made command_status nullable to match the ALTER TABLE statement that you were doing in the migration.
  2. Removed the command_status enum constraint because I couldn't figure out how to get the migration to match it in a way that would pass the tests. We should figure that out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tried setting create_constraint=True but its acting weird in the test as you mentioned. maybe we should add a CREATE CONSTRAINT to the migration script?

@TamarZanzouri
Copy link
Contributor Author

tested the migration on a FLEX and it works as expected. @SyntaxColoring

Comment on lines +616 to +620
select_command_errors = (
sqlalchemy.select(run_command_table)
.where(run_command_table.c.command_error is not None)
.subquery()
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, heads up, this is a SQLAlchemy trap. Python doesn't let SQLAlchemy overload the is not operator, so we have to use is_not().

Right now, this is equivalent to .where(True). This should be easy to catch in test_update_run_state_command_with_errors() unit tests if you add some input commands that aren't status: "failed".

Comment on lines +621 to +638
select_slice = (
sqlalchemy.select(run_command_table.c.command_error)
.where(
and_(
run_command_table.c.run_id == run_id,
run_command_table.c.index_in_run >= actual_cursor,
run_command_table.c.index_in_run < actual_cursor + length,
)
)
.join_from(
run_command_table,
select_command_errors,
onclause=run_command_table.c.index_in_run
== select_command_errors.c.index_in_run,
)
.order_by(run_command_table.c.index_in_run)
)
slice_result = transaction.execute(select_slice).all()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Joins are not my strong suit, but is this right? It looks like we're still doing run_command_table.c.index_in_run >= actual_cursor. Don't we want actual_cursor to decide the offset within select_command_errors?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants