I think you may actually want a deep copy of the state to ensure nothing is being updated from another thread:
https://docs.pydantic.dev/latest/api/base_model/#pydantic.BaseModel.model_copy
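For reference, a minimal sketch of what that deep copy could look like (the `Event`/`State` models here are made-up stand-ins, not Zino's actual classes):

```python
from pydantic import BaseModel

# Hypothetical stand-ins for the real state models
class Event(BaseModel):
    id: int
    description: str

class State(BaseModel):
    events: dict[int, Event] = {}

state = State(events={1: Event(id=1, description="link down")})

# model_copy(deep=True) recursively copies nested models and containers,
# so later mutations of the live state do not leak into the snapshot
snapshot = state.model_copy(deep=True)
state.events[1].description = "mutated"

print(snapshot.events[1].description)  # still "link down"
```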
If I try to use a deep copy, I get the following exception:
Oooh, that seems pretty weird... what could be putting a `Future` object into this structure?
I'm investigating it - it is not `FlappingStates` - I checked out a commit from before it was added and the same error happens.
Same with `PlannedMaintenances`.
Alright, I have figured out that the problem lies with `ZinoState.events` - I will investigate further which of them lead to problems.
But then again, I'm not sure this actually solves the issue at hand. I still get the occasional error about dicts changing size while Pydantic iterates them to copy them, which was the reason for making a copy in the first place. And of course this makes sense: the copy is made in the same thread that does the dumping, so if another thread changes the structure, the same error occurs during the copy instead of during the dump.

The only way to guarantee we don't get this error is to ensure that no other thread is concurrently modifying the data structure while it's being serialized to disk (or while it's being copied). That means either making sure we're not running multiple threads, or adding locks (which would complicate things quite a bit).

It could also be "worked around" by simply retrying the serialization several times when it fails with these errors - which would be safe enough once #366 is merged, as that would protect us from leaving a corrupt state file behind when the error inevitably occurs.
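A sketch of that retry workaround - the `dump_with_retry` helper and the simulated flaky dump are illustrative, not actual Zino code:

```python
def dump_with_retry(dump_func, attempts=3):
    """Retry a dump that can fail with RuntimeError, which is what
    CPython raises for "dictionary changed size during iteration"."""
    for attempt in range(attempts):
        try:
            return dump_func()
        except RuntimeError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt

# Simulate a dump that hits the concurrent-mutation error twice
calls = {"n": 0}

def flaky_dump():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("dictionary changed size during iteration")
    return "dumped"

print(dump_with_retry(flaky_dump))  # succeeds on the third attempt
```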
I do believe the reason the dumping is happening in a separate thread is that the method `dump_state_to_file()` is not declared as `async`. When this function is scheduled to run using APScheduler, APScheduler will automatically defer its execution to a separate worker thread, because a non-async function would otherwise block the entire event loop.

I think the safest way to run the function would be to declare it as an `async` function, which lets it run in the main event-loop thread. It will, of course, block the event loop while serializing - which could be OK if it's fast, disastrous if it's slow. A more complicated implementation would copy the state in the main thread, and then defer the actual serialization of that copy to a worker thread (since the disk I/O is what would block the most).
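That second approach could look roughly like this - sketched with a plain dict and `copy.deepcopy` standing in for the real pydantic state, and `write_snapshot` as a made-up helper:

```python
import asyncio
import copy
import json
import os
import tempfile

def write_snapshot(snapshot, path):
    # Blocking disk I/O: safe to run in a worker thread, since it only
    # touches the private snapshot, never the live state
    with open(path, "w") as f:
        json.dump(snapshot, f)

async def dump_state(state, path):
    # The copy happens in the event-loop thread, so no other coroutine
    # can mutate `state` mid-copy (control only switches at await points)
    snapshot = copy.deepcopy(state)
    loop = asyncio.get_running_loop()
    # Only the slow serialization/disk write is deferred to a worker thread
    await loop.run_in_executor(None, write_snapshot, snapshot, path)

state = {"events": {"1": {"id": 1, "description": "link down"}}}
path = os.path.join(tempfile.mkdtemp(), "state.json")
asyncio.run(dump_state(state, path))
print(open(path).read())
```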