Use copy of state for state dumps to file #365

Status: Open. Wants to merge 1 commit into base: master.
changelog.d/+use-copy-for-statedump.fixed.md (1 addition, 0 deletions)

@@ -0,0 +1 @@
+Use a copy of the Zino state for saving it to a file to avoid errors due to changes to the state in the middle of saving

src/zino/state.py (2 additions, 1 deletion)

@@ -42,8 +42,9 @@ class ZinoState(BaseModel):
     def dump_state_to_file(self, filename: str):
         """Dumps the full state to a file in JSON format"""
         _log.debug("dumping state to %s", filename)
+        copied_state = self.model_copy()
Member:
I think you may actually want a deep copy of the state to ensure nothing is being updated from another thread:

https://docs.pydantic.dev/latest/api/base_model/#pydantic.BaseModel.model_copy
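The distinction the reviewer is pointing at can be illustrated with the standard-library `copy` module (a stand-in for Pydantic's `model_copy()` vs. `model_copy(deep=True)`; the `State` class and its `events` dict here are illustrative, not Zino's actual model):

```python
import copy


class State:
    """Toy stand-in for a Pydantic model with a nested events container."""

    def __init__(self):
        self.events = {"1": {"status": "open"}}


state = State()

# Shallow copy (the default model_copy() behaviour): the copy points at
# the *same* events dict, so mutations from another thread still show
# through while the copy is being serialized.
shallow = copy.copy(state)
state.events["2"] = {"status": "open"}
assert "2" in shallow.events  # shared container

# Deep copy (model_copy(deep=True)): nested containers are duplicated,
# so later mutations of the original are invisible to the copy.
deep = copy.deepcopy(state)
state.events["3"] = {"status": "open"}
assert "3" not in deep.events  # independent container
```

A shallow copy therefore only protects the top-level attribute references, not the nested dicts that the dump actually iterates over.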

Contributor Author:
If I try to use a deep copy, I get the following exception:

2024-09-03 09:48:07,045 - ERROR - apscheduler.executors.default (asyncio_0) - Job "ZinoState.dump_state_to_file (trigger: interval[0:05:00], next run at: 2024-09-03 09:53:07 CEST)" raised an exception
Traceback (most recent call last):
  File "/home/johanna/zino/.venv/lib/python3.10/site-packages/apscheduler/executors/base.py", line 125, in run_job
    retval = job.func(*job.args, **job.kwargs)
  File "/home/johanna/zino/src/zino/utils.py", line 33, in wrapper
    result = func(*args, **kwargs)
  File "/home/johanna/zino/src/zino/state.py", line 45, in dump_state_to_file
    copied_state = self.model_copy(deep=True)
  File "/home/johanna/zino/.venv/lib/python3.10/site-packages/pydantic/main.py", line 311, in model_copy
    copied = self.__deepcopy__() if deep else self.__copy__()
  File "/home/johanna/zino/.venv/lib/python3.10/site-packages/pydantic/main.py", line 777, in __deepcopy__
    _object_setattr(m, '__dict__', deepcopy(self.__dict__, memo=memo))
  File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python3.10/copy.py", line 153, in deepcopy
    y = copier(memo)
  File "/home/johanna/zino/.venv/lib/python3.10/site-packages/pydantic/main.py", line 789, in __deepcopy__
    deepcopy({k: v for k, v in self.__pydantic_private__.items() if v is not PydanticUndefined}, memo=memo),
  File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.10/copy.py", line 206, in _deepcopy_list
    append(deepcopy(a, memo))
  File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.10/copy.py", line 238, in _deepcopy_method
    return type(x)(x.__func__, deepcopy(x.__self__, memo))
  File "/usr/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.10/copy.py", line 206, in _deepcopy_list
    append(deepcopy(a, memo))
  File "/usr/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.10/copy.py", line 211, in _deepcopy_tuple
    y = [deepcopy(a, memo) for a in x]
  File "/usr/lib/python3.10/copy.py", line 211, in <listcomp>
    y = [deepcopy(a, memo) for a in x]
  File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.10/copy.py", line 211, in _deepcopy_tuple
    y = [deepcopy(a, memo) for a in x]
  File "/usr/lib/python3.10/copy.py", line 211, in <listcomp>
    y = [deepcopy(a, memo) for a in x]
  File "/usr/lib/python3.10/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle '_asyncio.Future' object
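The failure mode at the bottom of that traceback is easy to reproduce in isolation: `deepcopy` falls back to the pickle protocol for types it doesn't know how to copy, and asyncio Futures refuse to be pickled. This sketch (with a made-up `state` dict, not Zino's real structure) shows the same `TypeError`:

```python
import asyncio
import copy


async def main() -> str:
    # Any asyncio.Future buried inside the state structure makes the
    # whole structure un-deepcopyable: deepcopy reaches the Future,
    # tries to pickle it, and the Future raises TypeError.
    future = asyncio.get_running_loop().create_future()
    state = {"events": {"1": {"pending": future}}}
    try:
        copy.deepcopy(state)
    except TypeError as error:
        return str(error)
    return "no error"


message = asyncio.run(main())
print(message)
```

In the PR's traceback the Future is not stored directly; it is reached via a bound method's `__self__` (`_deepcopy_method` in the trace), which is why it was hard to spot.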

Member:

Oooh, that seems pretty weird... what could be putting a Future object into this structure?

Contributor Author:

I'm investigating it. It is not FlappingStates: I checked out a commit from before it was added, and the same error happens.

Contributor Author:

Same with PlannedMaintenances

Contributor Author:

Alright, I have figured out that the problem lies with ZinoState.events. I will investigate further which of the events lead to problems.

Member:

But then again, I'm not sure that this actually solves the issue at hand. I still get the occasional error about dicts changing size while Pydantic is iterating them to copy them, which was the reason for doing a copy to begin with. And, of course, this makes sense. The copy is being made in the same thread that is doing the dumping, so if the structure is being changed by another thread, the same problem will still occur.

The only way to ensure we don't get this error is to ensure that no other threads are concurrently modifying the data structure while it's being serialized to disk (or while it's being copied). This can be done by either making sure we're not running multiple threads, or by adding locks (which will complicate things quite a bit).

It could also be "worked around", by simply retrying the serialization several times if it fails due to these errors - which would be safe enough when #366 is merged, as that would protect us from leaving a corrupt state file when the error inevitably occurs.
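The retry workaround could look something like the sketch below. `dump_with_retries` is a hypothetical helper (not part of Zino); it assumes the concurrent-mutation failure surfaces as a `RuntimeError` such as "dictionary changed size during iteration", and it would only be safe once atomic writes (the referenced PR #366) prevent a failed attempt from leaving a corrupt state file:

```python
import logging
import time

_log = logging.getLogger(__name__)


def dump_with_retries(dump, attempts=3, delay=0.1):
    """Call a state-dumping callable, retrying when concurrent mutation
    makes serialization fail mid-iteration."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return dump()
        except RuntimeError as error:
            last_error = error
            _log.warning("state dump attempt %d failed: %s", attempt, error)
            time.sleep(delay)
    raise last_error


# Simulated flaky dump: fails twice, then succeeds.
calls = []


def flaky_dump():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("dictionary changed size during iteration")
    return "ok"


result = dump_with_retries(flaky_dump)
print(result)
```

This trades correctness guarantees for simplicity: it does not eliminate the race, it just makes a transient failure unlikely to persist across several attempts.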

Member:

I do believe the reason the dumping happens in a separate thread is that the method dump_state_to_file() is not declared as async. When this function is scheduled to run using APScheduler, APScheduler automatically defers its execution to a separate worker thread, because a non-async function would otherwise block the entire event loop.

I think the safest way to run the function would be to declare it as an async function, which lets it run in the main event loop thread. It will, of course, block the event loop while serializing - which could be ok if it's fast, disastrous if it is slow. A more complicated implementation would be to implement it as a function that copies the state in the main thread, and then defers the actual serialization of this copy to a worker thread (since the disk i/o is what would block the most).
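That "copy on the loop thread, serialize in a worker" idea might be sketched as follows. This is a toy stand-in (plain dict plus `json.dump`, not Zino's Pydantic model, so `deepcopy` works here because no Futures are present; the real code would use `model_copy` or similar):

```python
import asyncio
import copy
import json
import os
import tempfile


class State:
    """Toy stand-in for ZinoState, holding a plain dict of events."""

    def __init__(self):
        self.events = {"1": {"status": "open"}}

    async def dump_state_to_file(self, filename: str):
        # The copy happens on the event-loop thread, where no other
        # coroutine can run concurrently, so the structure cannot
        # change mid-copy.
        snapshot = copy.deepcopy(self.events)
        # The blocking disk I/O is deferred to the default thread-pool
        # executor so it does not stall the event loop.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, self._write, snapshot, filename)

    @staticmethod
    def _write(snapshot, filename):
        with open(filename, "w") as statefile:
            json.dump(snapshot, statefile, indent=2)


async def main():
    state = State()
    path = os.path.join(tempfile.mkdtemp(), "state.json")
    await state.dump_state_to_file(path)
    with open(path) as statefile:
        return json.load(statefile)


loaded = asyncio.run(main())
print(loaded)
```

Because only the copy runs on the loop thread, the event loop is blocked for as long as the copy takes, which is usually much shorter than the full serialize-and-write.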

     with open(filename, "w") as statefile:
-        statefile.write(self.model_dump_json(exclude_none=True, indent=2))
+        statefile.write(copied_state.model_dump_json(exclude_none=True, indent=2))

     @classmethod
     @log_time_spent()