feat(api, robot-server): get historical and current command errors #16697

TamarZanzouri · 2024-11-05T21:23:14Z

Overview

closes https://opentrons.atlassian.net/browse/EXEC-655.
GET historical command errors and refactor current command errors.

Test Plan and Hands on Testing

Run a protocol with ER failed commands.

GET /runs/{runId}/commandErrors while the run is active and make sure you get the failed commands' errors.
GET /runs/{runId}/commandErrors when the run is finished and make sure you get the failed commands' errors.

Changelog

added db schema 8.
added command_error and command_status fields to commands_table.
insert command_error and command_status when inserting a command into the commands table.
get_commands_errors_slice
added missing indexes from previous migrations.

Review requests

changes make sense?

Risk assessment

low.

SyntaxColoring

Thank you!

SyntaxColoring · 2024-11-06T21:51:04Z

robot-server/tests/persistence/test_tables.py

+    CREATE INDEX ix_run_command_command_error ON run_command (command_error)
+    """,


I have not tested this, but I don't think this index is what we want. (This index is what happens when you just do index=True on the new column declaration.)

This will, across all commands from all runs, maintain an index that is a lexicographic sort of the errors' JSON strings.

That does not help with efficiently serving the SELECT statement that you've written in RunStore, which is something like, "get all the errors from a single run, in order."

So I think what we want instead is either:

- A compound index on (run_id, has_error, index_in_run), where has_error is a true/false value computed from command_error being non-NULL. I don't know how to do this off the top of my head with SQLAlchemy, but here is SQLite documentation that looks relevant.

- A compound index on (run_id, command_status, index_in_run), where command_status is a new column storing the command's succeeded/failed status. And then the SELECT statement filters based on command_status == "failed".

We could also test whether it's fast enough to just use the existing (run_id, index_in_run) compound index. From there, filtering for just the failed commands will be an O(n) linear scan, but...it's a linear scan implemented by optimized C, so maybe it's good enough.
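
To make the second option concrete, here is a minimal sketch using the standard-library `sqlite3` module rather than SQLAlchemy. The table is a cut-down stand-in for `run_command`, and the index name, sample rows, and error payloads are made up for illustration. It is not the schema from this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, hypothetical version of the run_command table.
    CREATE TABLE run_command (
        row_id INTEGER PRIMARY KEY,
        run_id VARCHAR NOT NULL,
        index_in_run INTEGER NOT NULL,
        command_status VARCHAR,
        command_error VARCHAR
    );
    -- Compound index: run first, then status, then position within the run.
    CREATE INDEX ix_run_command_run_id_status_index
        ON run_command (run_id, command_status, index_in_run);
    """
)

conn.executemany(
    "INSERT INTO run_command (run_id, index_in_run, command_status, command_error)"
    " VALUES (?, ?, ?, ?)",
    [
        ("run-1", 0, "succeeded", None),
        ("run-1", 1, "failed", '{"errorType": "a"}'),
        ("run-1", 2, "succeeded", None),
        ("run-1", 3, "failed", '{"errorType": "b"}'),
        ("run-2", 0, "failed", '{"errorType": "c"}'),
    ],
)

QUERY = """
    SELECT index_in_run, command_error FROM run_command
    WHERE run_id = ? AND command_status = 'failed'
    ORDER BY index_in_run
"""
failed_in_run_1 = conn.execute(QUERY, ("run-1",)).fetchall()

# Print whatever plan this SQLite build picks; the wording varies by version.
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("run-1",)):
    print(plan_row[-1])
```

Running `EXPLAIN QUERY PLAN` with and without the index on a realistically sized database would be one way to check whether the existing (run_id, index_in_run) index is already fast enough.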

codecov · 2024-11-14T01:58:30Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.43%. Comparing base (ec7641c) to head (ec64c12).
Report is 140 commits behind head on edge.

Additional details and impacted files

@@           Coverage Diff           @@
##             edge   #16697   +/-   ##
=======================================
  Coverage   92.43%   92.43%           
=======================================
  Files          77       77           
  Lines        1283     1283           
=======================================
  Hits         1186     1186           
  Misses         97       97

| Flag | Coverage Δ |
| --- | --- |
| g-code-testing | `92.43% <ø> (ø)` |

SyntaxColoring · 2024-11-15T16:51:40Z

~~This is pending some merge conflict resolution with edge. @TamarZanzouri and I will pick this up again when 8.2.0 release stuff chills out. Re-drafting until then.~~

SyntaxColoring

Mostly looks good, but I'm not sure about the query.

SyntaxColoring · 2024-11-13T19:58:20Z

robot-server/robot_server/persistence/_migrations/v7_to_v8.py

+        )
+        commands_to_update.append(
+            {
+                "_id": row.row_id,


Nitpick: Does "_id" really need to have that underscore? Or is it only there because that's how the column was named in the example code?

I think that as long as we don't use a defined prop name, we are good.

SyntaxColoring · 2024-11-20T20:51:14Z

robot-server/robot_server/persistence/_migrations/v7_to_v8.py

+            add_column(
+                dest_engine,
+                schema_8.run_command_table.name,
+                schema_8.run_command_table.c.command_error,
+            )
+
+            add_column(
+                dest_engine,
+                schema_8.run_command_table.name,
+                schema_8.run_command_table.c.command_status,
+            )
+
+            _add_missing_indexes(dest_transaction=dest_transaction)
+
+            _migrate_command_table_with_new_command_error_col_and_command_status(
+                dest_transaction=dest_transaction
+            )


No action needed, just some thoughts.

Early on in this PR, we made the decision to add the new column with ALTER TABLE, instead of letting SQLAlchemy create the new one from scratch.

I don't think that decision is panning out well.

1. We have to manually create indexes.

2. We have to manually create constraints. (And we're not doing that right now; see my # todo comment in schema_8.py.)

3. The command_status column needs to be nullable even when that's not a good match for the data we're storing in it.

1 and 2 are thankfully caught by tests now, and perhaps we can improve them by fleshing out the add_column() function. But 3 is fundamental to SQLite. We talked about fixing 3 with Alembic, but I've looked into it and I'm skeptical that that will be easy.

Meanwhile, is ALTER TABLE really saving that much overhead compared to creating the new table from scratch? We're still iterating over every single command and parsing it as JSON, and I imagine that cost will dominate the savings from not having to copy the data.

I think we should stick with what you have now, but in the future, let's try the other way?

Does the other way mean creating all tables from scratch? We would still need to insert the data.
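
For reference, the nullable limitation discussed in this thread can be reproduced with plain `sqlite3`. The sketch below uses a hypothetical two-column table, not the real migration code. It first shows SQLite rejecting `ADD COLUMN ... NOT NULL` when there is no default, then shows the "build the new table from scratch and copy" alternative, which still has to parse each command's JSON.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE run_command (row_id INTEGER PRIMARY KEY, command VARCHAR)")
conn.execute(
    "INSERT INTO run_command (command) VALUES (?)",
    ('{"status": "failed"}',),
)

# SQLite refuses to add a NOT NULL column unless it has a non-NULL default.
try:
    conn.execute("ALTER TABLE run_command ADD COLUMN command_status VARCHAR NOT NULL")
    alter_error = None
except sqlite3.OperationalError as exc:
    alter_error = str(exc)

# Alternative: create the new table with the constraint, then copy rows over.
conn.execute(
    "CREATE TABLE run_command_new ("
    "row_id INTEGER PRIMARY KEY, command VARCHAR, command_status VARCHAR NOT NULL)"
)
old_rows = conn.execute("SELECT row_id, command FROM run_command").fetchall()
for row_id, command in old_rows:
    # Either approach still pays for parsing every command as JSON.
    status = json.loads(command)["status"]
    conn.execute(
        "INSERT INTO run_command_new (row_id, command, command_status) VALUES (?, ?, ?)",
        (row_id, command, status),
    )
conn.execute("DROP TABLE run_command")
conn.execute("ALTER TABLE run_command_new RENAME TO run_command")

migrated = conn.execute("SELECT row_id, command_status FROM run_command").fetchall()
```

With the rebuild approach the column can be declared `NOT NULL` from the start, at the cost of copying every row.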

SyntaxColoring · 2024-11-20T22:13:32Z

robot-server/robot_server/persistence/_migrations/v7_to_v8.py

+    # todo(2024-11-20): Probably add the indexes missing from prior migrations here.
+    # https://opentrons.atlassian.net/browse/EXEC-827


@TamarZanzouri FYI I think that it was correct for you to have had the other fixes in here, but when I was resolving merge conflicts, it was feeling like a bit too much to try to do at once, so I removed them. Let's do it in a follow-up?

SyntaxColoring · 2024-11-20T22:15:16Z

robot-server/robot_server/persistence/tables/schema_8.py

+            # nullable=True because it was easier for the migration to add the column
+            # this way. This is not intended to ever be null in practice.
+            nullable=True,
+            # todo(mm, 2024-11-20): We want create_constraint=True here. Something
+            # about the way we compare SQL in test_tables.py is making that difficult--
+            # even when we correctly add the constraint in the migration, the SQL
+            # doesn't compare equal to what create_constraint=True here would emit.
+            create_constraint=False,


@TamarZanzouri FYI, I made these changes when resolving merge conflicts:

Made command_status nullable to match the ALTER TABLE statement that you were doing in the migration.

Removed the command_status enum constraint because I couldn't figure out how to get the migration to match it in a way that would pass the tests. We should figure that out.

I tried setting create_constraint=True, but it's acting weird in the test, as you mentioned. Maybe we should add a CREATE CONSTRAINT to the migration script?
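
One thing to keep in mind: SQLite's ALTER TABLE has no form for adding a constraint to an existing column, so the check would probably have to ride along with the ADD COLUMN statement itself. The sketch below shows that with plain `sqlite3`. The table is a hypothetical stand-in and the list of status values is an assumption. Whether the resulting DDL compares equal to what `create_constraint=True` emits in test_tables.py is exactly the open question, and this does not answer it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE run_command (row_id INTEGER PRIMARY KEY, command VARCHAR)")
conn.execute("INSERT INTO run_command (command) VALUES ('{}')")

# Attach the CHECK constraint inline while adding the column.
# The allowed values here are assumed for illustration.
conn.execute(
    """
    ALTER TABLE run_command ADD COLUMN command_status VARCHAR
    CHECK (command_status IN ('queued', 'running', 'succeeded', 'failed'))
    """
)

# Existing rows get NULL, which a CHECK constraint does not reject.
conn.execute("UPDATE run_command SET command_status = 'failed' WHERE row_id = 1")

try:
    conn.execute("UPDATE run_command SET command_status = 'bogus' WHERE row_id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

final_status = conn.execute(
    "SELECT command_status FROM run_command WHERE row_id = 1"
).fetchone()[0]
```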

TamarZanzouri · 2024-11-21T19:43:28Z

Tested the migration on a Flex, and it works as expected. @SyntaxColoring

SyntaxColoring · 2024-11-21T21:44:01Z

robot-server/robot_server/runs/run_store.py

+            select_command_errors = (
+                sqlalchemy.select(run_command_table)
+                .where(run_command_table.c.command_error is not None)
+                .subquery()
+            )


Oh, heads up, this is a SQLAlchemy trap. Python doesn't let SQLAlchemy overload the is not operator, so we have to use is_not().

Right now, this is equivalent to .where(True). This should be easy to catch in test_update_run_state_command_with_errors() unit tests if you add some input commands that aren't status: "failed".
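
The trap can be seen without SQLAlchemy at all. In the sketch below, `Column` is a toy stand-in written for this example (it is not SQLAlchemy's class, and it returns strings instead of expression objects). It overloads `!=` and offers an `is_not()` method, but Python gives it no hook for `is not`.

```python
class Column:
    """Toy stand-in for a SQL column expression (hypothetical, not SQLAlchemy)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __ne__(self, other: object) -> str:
        # `!=` can be overloaded, so it can build SQL instead of a bool.
        if other is None:
            return f"{self.name} IS NOT NULL"
        return f"{self.name} != {other!r}"

    def is_not(self, other: object) -> str:
        # Explicit method for the cases Python's operators cannot cover.
        return self.__ne__(other)


command_error = Column("command_error")

# `is not` is an identity check that Python evaluates immediately.
# The column object exists, so this is simply True.
trap = command_error is not None

# These two go through code the class controls and produce SQL text.
explicit = command_error.is_not(None)
operator_form = command_error != None  # noqa: E711
```

Here `trap` is the plain bool `True`, which is why the original `.where(...)` filters nothing, while `explicit` and `operator_form` both carry the intended `IS NOT NULL` condition.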

SyntaxColoring · 2024-11-21T21:48:38Z

robot-server/robot_server/runs/run_store.py

+            select_slice = (
+                sqlalchemy.select(run_command_table.c.command_error)
+                .where(
+                    and_(
+                        run_command_table.c.run_id == run_id,
+                        run_command_table.c.index_in_run >= actual_cursor,
+                        run_command_table.c.index_in_run < actual_cursor + length,
+                    )
+                )
+                .join_from(
+                    run_command_table,
+                    select_command_errors,
+                    onclause=run_command_table.c.index_in_run
+                    == select_command_errors.c.index_in_run,
+                )
+                .order_by(run_command_table.c.index_in_run)
+            )
+            slice_result = transaction.execute(select_slice).all()


Joins are not my strong suit, but is this right? It looks like we're still doing run_command_table.c.index_in_run >= actual_cursor. Don't we want actual_cursor to decide the offset within select_command_errors?
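
To illustrate the difference being asked about, here is a small `sqlite3` sketch with made-up data: ten commands in one run, with errors at positions 2, 5, 7, and 9. It compares slicing by `index_in_run` against slicing by position within the filtered errors. The table layout and the use of LIMIT/OFFSET are assumptions for illustration, not the query in this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE run_command ("
    "run_id VARCHAR, index_in_run INTEGER, command_error VARCHAR)"
)
error_indexes = {2, 5, 7, 9}
conn.executemany(
    "INSERT INTO run_command VALUES (?, ?, ?)",
    [("run-1", i, f"error-{i}" if i in error_indexes else None) for i in range(10)],
)

cursor, length = 1, 2

# Slicing on index_in_run: the cursor counts all commands, failed or not.
by_command_index = conn.execute(
    "SELECT command_error FROM run_command"
    " WHERE run_id = ? AND command_error IS NOT NULL"
    " AND index_in_run >= ? AND index_in_run < ?"
    " ORDER BY index_in_run",
    ("run-1", cursor, cursor + length),
).fetchall()

# Slicing within the filtered errors: the cursor counts only failed commands.
by_error_position = conn.execute(
    "SELECT command_error FROM run_command"
    " WHERE run_id = ? AND command_error IS NOT NULL"
    " ORDER BY index_in_run LIMIT ? OFFSET ?",
    ("run-1", length, cursor),
).fetchall()
```

With `cursor=1, length=2`, the first query returns only the error at command 2, while the second returns the second and third errors in the run (commands 5 and 7).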

TamarZanzouri added 9 commits November 1, 2024 12:45

WIP intial schema and orchestrator changes

282c99c

changed schema and imports to 8

1bb8d7c

insert the errors to the DB

77902c6

refactor into commands table

15db246

select from commands

b88e722

linting and command error slice done

b6d33b6

added tests and fixed bugs

85eda17

changed logic for current command errors

5bc0500

removed failed_command_errors from state

e423f37

TamarZanzouri requested a review from a team as a code owner November 5, 2024 21:23

TamarZanzouri marked this pull request as draft November 5, 2024 21:23

TamarZanzouri and others added 2 commits November 5, 2024 16:52

refactor get command errors

e29d95f

Merge branch 'edge' into EXEC-655-store-commands-error-list-in-db

c275c35

TamarZanzouri commented Nov 6, 2024

TamarZanzouri and others added 8 commits November 6, 2024 11:49

migration and initialization fixes

a950cb8

Update robot-server/robot_server/runs/run_store.py

55a809a

Merge branch 'edge' into EXEC-655-store-commands-error-list-in-db

1ccaba3

change get_run_errors to get count

9fa97ba

Merge branch 'edge' into EXEC-655-store-commands-error-list-in-db

f5fe016

tables tests fixed

13c5b87

added test for historical run

a88e95d

Merge branch 'edge' into EXEC-655-store-commands-error-list-in-db

6fea08a

sanni-t reviewed Nov 6, 2024

TamarZanzouri marked this pull request as ready for review November 6, 2024 21:14

fixed persistance path

1bea28e

SyntaxColoring requested changes Nov 6, 2024

TamarZanzouri added 4 commits November 7, 2024 10:40

fixed import from schema 7

8f73bc1

rollback add_column change

c87dc65

append_failed_command_id when setting a failed command

5405ef3

append_failed_command_id when setting a failed command

f888364

lint

de02268

TamarZanzouri requested a review from SyntaxColoring November 13, 2024 20:43

SyntaxColoring mentioned this pull request Nov 13, 2024

refactor(robot-server): Adjust SQL declarations to match reality #16799

Merged

TamarZanzouri and others added 3 commits November 13, 2024 16:23

change conversion into enum

085d4bb

Merge branch 'edge' into EXEC-655-store-commands-error-list-in-db

378eec1

fixing merge conflicts - WIP

ec64c12

WIP - fix test_tables failing

b285c99

SyntaxColoring marked this pull request as draft November 15, 2024 16:51

TamarZanzouri and others added 8 commits November 20, 2024 13:23

update test table

3a29bfd

Revert "update test table"

dca8269

This reverts commit 3a29bfd.

Fix schema_8.py after merge.

4d91d6f

Ensure it's based on a copy-paste of the latest (post-merge) schema_7.py.

Fix up test_tables.py after merge.

fd78ad3

Ensure EXPECTED_STATEMENTS_V8 is based on a copy-paste of the latest EXPECTED_STATEMENTS_V7.

Remove attempts to fix prior migrations, for now.

b59601e

Make command_status column nullable to match what the migration does.

6496d3c

Remove enum constraint, for now.

11aa280

Merge branch 'edge' into EXEC-655-store-commands-error-list-in-db

5af324e

SyntaxColoring force-pushed the EXEC-655-store-commands-error-list-in-db branch from bf9b701 to 5af324e Compare November 20, 2024 20:17

TamarZanzouri marked this pull request as ready for review November 20, 2024 20:52

SyntaxColoring requested changes Nov 20, 2024

SyntaxColoring reviewed Nov 20, 2024

SyntaxColoring mentioned this pull request Nov 21, 2024

refactor(robot-server): Avoid features that will be removed in SQLAlchemy 2.0 #16926

Open

TamarZanzouri and others added 3 commits November 21, 2024 10:46

Update robot-server/robot_server/persistence/_migrations/v7_to_v8.py

6669e61

Co-authored-by: Max Marrone <[email protected]>

Update robot-server/robot_server/runs/run_store.py

ce3fbb3

Co-authored-by: Max Marrone <[email protected]>

select subquery and minor fixes

e0aacbd

TamarZanzouri requested a review from SyntaxColoring November 21, 2024 19:10

SyntaxColoring requested changes Nov 21, 2024

SyntaxColoring reviewed Nov 21, 2024
