feat(api, robot-server): get historical and current command errors #16697
base: edge
Conversation
robot-server/robot_server/persistence/file_and_directory_names.py
Thank you!
CREATE INDEX ix_run_command_command_error ON run_command (command_error)
""",
I have not tested this, but I don't think this index is what we want. (This index is what happens when you just do `index=True` on the new column declaration.)

This will, across all commands from all runs, maintain an index that is a lexicographic sort of the errors' JSON strings. That does not help with efficiently serving the `SELECT` statement that you've written in `RunStore`, which is something like, "get all the errors from a single run, in order."

So I think what we want instead is either:

- A compound index on `(run_id, has_error, index_in_run)`, where `has_error` is a true/false value computed from `command_error` being non-`NULL`. I don't know how to do this off the top of my head with SQLAlchemy, but here is SQLite documentation that looks relevant.
- A compound index on `(run_id, command_status, index_in_run)`, where `command_status` is a new column storing the command's `succeeded`/`failed` status. And then the `SELECT` statement filters based on `command_status == "failed"`.
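For illustration, here is a hedged sketch of how a compound index restricted to failed commands could be declared with SQLAlchemy's SQLite dialect, using the `sqlite_where` keyword to emit SQLite's partial-index form. The table definition below is a simplified stand-in, not the real `schema_8` table:

```python
import sqlalchemy

metadata = sqlalchemy.MetaData()

# Simplified stand-in for the run_command table discussed above.
run_command_table = sqlalchemy.Table(
    "run_command",
    metadata,
    sqlalchemy.Column("row_id", sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column("run_id", sqlalchemy.String, nullable=False),
    sqlalchemy.Column("index_in_run", sqlalchemy.Integer, nullable=False),
    sqlalchemy.Column("command_error", sqlalchemy.String, nullable=True),
)

# A compound index on (run_id, index_in_run), restricted to rows where
# command_error is non-NULL -- i.e. an index over only the failed commands.
# sqlite_where emits SQLite's "CREATE INDEX ... WHERE ..." partial-index form.
sqlalchemy.Index(
    "ix_run_command_failed",
    run_command_table.c.run_id,
    run_command_table.c.index_in_run,
    sqlite_where=run_command_table.c.command_error.is_not(None),
)

engine = sqlalchemy.create_engine("sqlite://")
metadata.create_all(engine)
```

With something like this in place, a query filtered on `run_id` and `command_error IS NOT NULL` and ordered by `index_in_run` can be served directly from the index.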
We could also test whether it's fast enough to just use the existing `(run_id, index_in_run)` compound index. From there, filtering for just the failed commands will be an O(n) linear scan, but... it's a linear scan implemented by optimized C, so maybe it's good enough.
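One cheap way to check that tradeoff is to ask SQLite how it would run the query with `EXPLAIN QUERY PLAN`. A sketch, with illustrative table and index names rather than the project's real schema:

```python
import sqlalchemy

engine = sqlalchemy.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(sqlalchemy.text(
        "CREATE TABLE run_command"
        " (run_id TEXT, index_in_run INTEGER, command_error TEXT)"
    ))
    # The existing-style compound index being discussed.
    conn.execute(sqlalchemy.text(
        "CREATE INDEX ix_run_id_index_in_run"
        " ON run_command (run_id, index_in_run)"
    ))
    plan = conn.execute(sqlalchemy.text(
        "EXPLAIN QUERY PLAN"
        " SELECT command_error FROM run_command"
        " WHERE run_id = 'run-1' AND command_error IS NOT NULL"
        " ORDER BY index_in_run"
    )).all()

# The plan should show a SEARCH using the compound index for run_id;
# the IS NOT NULL filter is then applied per row (the linear scan above).
for row in plan:
    print(row)
```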
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@ Coverage Diff @@
##             edge   #16697   +/-   ##
=======================================
  Coverage   92.43%   92.43%
=======================================
  Files          77       77
  Lines        1283     1283
=======================================
  Hits         1186     1186
  Misses         97       97
This reverts commit 3a29bfd.
Ensure it's based on a copy-paste of the latest (post-merge) schema_7.py.
Ensure EXPECTED_STATEMENTS_V8 is based on a copy-paste of the latest EXPECTED_STATEMENTS_V7.
Force-pushed from bf9b701 to 5af324e.
Mostly looks good, but I'm not sure about the query.
)
commands_to_update.append(
    {
        "_id": row.row_id,
Nitpick: Does `"_id"` really need to have that underscore? Or is it only there because that's how the column was named in the example code?
I think that as long as we don't use a defined prop name, we're good.
add_column(
    dest_engine,
    schema_8.run_command_table.name,
    schema_8.run_command_table.c.command_error,
)
add_column(
    dest_engine,
    schema_8.run_command_table.name,
    schema_8.run_command_table.c.command_status,
)
_add_missing_indexes(dest_transaction=dest_transaction)
_migrate_command_table_with_new_command_error_col_and_command_status(
    dest_transaction=dest_transaction
)
No action needed, just some thoughts.

Early on in this PR, we made the decision to add the new column with `ALTER TABLE`, instead of letting SQLAlchemy create the new one from scratch. I don't think that decision is panning out well.

1. We have to manually create indexes.
2. We have to manually create constraints. (And we're not doing that right now: see my `# todo` comment in `schema_8.py`.)
3. The `command_status` column needs to be nullable even when that's not a good match for the data we're storing in it.

1 and 2 are thankfully caught by tests now, and perhaps we can improve them by fleshing out the `add_column()` function. But 3 is fundamental to SQLite. We talked about fixing 3 with Alembic, but I've looked into it and I'm skeptical that that will be easy.

Meanwhile, is `ALTER TABLE` really saving that much overhead compared to creating the new table from scratch? We're still iterating over every single command and parsing it as JSON, and I imagine that cost will dominate the savings from not having to copy the data.

I think we should stick with what you have now, but in the future, let's try the other way?
The other way means creating all tables from scratch? We still need to insert the data.
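For concreteness, a minimal sketch of that from-scratch shape, assuming simplified stand-in tables (not the real `schema_7`/`schema_8` definitions): let SQLAlchemy create the new table with the desired constraints, then copy and enrich each row during the insert, which is also where the per-command JSON parsing cost lives either way:

```python
import json

import sqlalchemy

# Simplified stand-ins for the old and new schemas.
src_metadata = sqlalchemy.MetaData()
old_table = sqlalchemy.Table(
    "run_command",
    src_metadata,
    sqlalchemy.Column("row_id", sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column("run_id", sqlalchemy.String),
    sqlalchemy.Column("command", sqlalchemy.String),
)

dest_metadata = sqlalchemy.MetaData()
new_table = sqlalchemy.Table(
    "run_command",
    dest_metadata,
    sqlalchemy.Column("row_id", sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column("run_id", sqlalchemy.String),
    sqlalchemy.Column("command", sqlalchemy.String),
    # Can be NOT NULL, since every row gets a value during the copy.
    sqlalchemy.Column("command_status", sqlalchemy.String, nullable=False),
)

src_engine = sqlalchemy.create_engine("sqlite://")
dest_engine = sqlalchemy.create_engine("sqlite://")
src_metadata.create_all(src_engine)
dest_metadata.create_all(dest_engine)  # New table created from scratch.

with src_engine.begin() as src:
    src.execute(
        old_table.insert(),
        [{"run_id": "run-1", "command": '{"status": "failed"}'}],
    )

# Copy every row, parsing the command JSON once to fill the new column.
with src_engine.connect() as src, dest_engine.begin() as dest:
    for row in src.execute(sqlalchemy.select(old_table)):
        parsed = json.loads(row.command)
        dest.execute(
            new_table.insert().values(
                row_id=row.row_id,
                run_id=row.run_id,
                command=row.command,
                command_status=parsed["status"],
            )
        )
```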
# todo(2024-11-20): Probably add the indexes missing from prior migrations here.
# https://opentrons.atlassian.net/browse/EXEC-827
@TamarZanzouri FYI I think it was correct for you to have had the other fixes in here, but when I was resolving merge conflicts, it felt like a bit too much to try to do at once, so I removed them. Let's do it in a follow-up?
# nullable=True because it was easier for the migration to add the column
# this way. This is not intended to ever be null in practice.
nullable=True,
# todo(mm, 2024-11-20): We want create_constraint=True here. Something
# about the way we compare SQL in test_tables.py is making that difficult--
# even when we correctly add the constraint in the migration, the SQL
# doesn't compare equal to what create_constraint=True here would emit.
create_constraint=False,
@TamarZanzouri FYI, I made these changes when resolving merge conflicts:

- Made `command_status` nullable to match the `ALTER TABLE` statement that you were doing in the migration.
- Removed the `command_status` enum constraint because I couldn't figure out how to get the migration to match it in a way that would pass the tests. We should figure that out.
Tried setting `create_constraint=True`, but it's acting weird in the tests as you mentioned. Maybe we should add a CREATE CONSTRAINT to the migration script?
Co-authored-by: Max Marrone <[email protected]>
Tested the migration on a FLEX and it works as expected. @SyntaxColoring
select_command_errors = (
    sqlalchemy.select(run_command_table)
    .where(run_command_table.c.command_error is not None)
    .subquery()
)
Oh, heads up, this is a SQLAlchemy trap. Python doesn't let SQLAlchemy overload the `is not` operator, so we have to use `.is_not()`.

Right now, this is equivalent to `.where(True)`. This should be easy to catch in the `test_update_run_state_command_with_errors()` unit tests if you add some input commands that aren't `status: "failed"`.
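A self-contained demonstration of the trap, using a simplified stand-in table rather than the PR's real schema:

```python
import sqlalchemy

metadata = sqlalchemy.MetaData()
t = sqlalchemy.Table(
    "run_command",
    metadata,
    sqlalchemy.Column("index_in_run", sqlalchemy.Integer),
    sqlalchemy.Column("command_error", sqlalchemy.String, nullable=True),
)

engine = sqlalchemy.create_engine("sqlite://")
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(
        t.insert(),
        [
            {"index_in_run": 0, "command_error": None},
            {"index_in_run": 1, "command_error": '{"detail": "boom"}'},
        ],
    )

with engine.connect() as conn:
    # Buggy: `is not None` is evaluated by Python itself, yielding plain
    # True, so this is .where(True) and matches every row.
    buggy = conn.execute(
        sqlalchemy.select(t).where(t.c.command_error is not None)
    ).all()
    # Correct: .is_not(None) renders as "command_error IS NOT NULL".
    fixed = conn.execute(
        sqlalchemy.select(t).where(t.c.command_error.is_not(None))
    ).all()

print(len(buggy), len(fixed))  # buggy matches both rows; fixed only the error row
```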
select_slice = (
    sqlalchemy.select(run_command_table.c.command_error)
    .where(
        and_(
            run_command_table.c.run_id == run_id,
            run_command_table.c.index_in_run >= actual_cursor,
            run_command_table.c.index_in_run < actual_cursor + length,
        )
    )
    .join_from(
        run_command_table,
        select_command_errors,
        onclause=run_command_table.c.index_in_run
        == select_command_errors.c.index_in_run,
    )
    .order_by(run_command_table.c.index_in_run)
)
slice_result = transaction.execute(select_slice).all()
Joins are not my strong suit, but is this right? It looks like we're still doing `run_command_table.c.index_in_run >= actual_cursor`. Don't we want `actual_cursor` to decide the offset within `select_command_errors`?
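One possible shape for what's being asked, as a hypothetical sketch rather than the PR's actual fix: filter down to the failed commands first, order them, and let the cursor page through that filtered set with LIMIT/OFFSET. Table and helper names here are illustrative:

```python
import sqlalchemy

metadata = sqlalchemy.MetaData()
run_command_table = sqlalchemy.Table(
    "run_command",
    metadata,
    sqlalchemy.Column("run_id", sqlalchemy.String),
    sqlalchemy.Column("index_in_run", sqlalchemy.Integer),
    sqlalchemy.Column("command_error", sqlalchemy.String, nullable=True),
)


def select_error_slice(run_id, cursor, length):
    """Page through only the failed commands of one run.

    OFFSET is applied after filtering, so the cursor indexes into the
    list of errors rather than into the full command sequence.
    """
    return (
        sqlalchemy.select(run_command_table.c.command_error)
        .where(
            run_command_table.c.run_id == run_id,
            run_command_table.c.command_error.is_not(None),
        )
        .order_by(run_command_table.c.index_in_run)
        .limit(length)
        .offset(cursor)
    )
```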
Overview

Closes https://opentrons.atlassian.net/browse/EXEC-655.

GET historical command errors and refactor current command errors.

Test Plan and Hands on Testing

- Run a protocol with ER failed commands.
- GET `/runs/{runId}/commandsErrors` while the run is active and make sure you get the failed commands' errors.
- GET `/runs/{runId}/commandsErrors` when the run is finished and make sure you get the failed commands' errors.

Changelog

- Added `command_error` and `command_status` fields to `commands_table`.
- Insert `command_error` and `command_status` when inserting a command into the commands table.
- Added `get_commands_errors_slice`.

Review requests

Do the changes make sense?

Risk assessment

Low.