-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upload
and AssetBlob
garbage collection implementation
#2087
base: master
Are you sure you want to change the base?
Conversation
324610c
to
93b64c0
Compare
|
||
|
||
class GarbageCollectionEvent(models.Model): | ||
created = models.DateTimeField(auto_now_add=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is super minor but I feel like timestamp
would be a better word to describe this. If we're ever going through these events for some reason, seeing the term "created" thrown about might cause some confusion, as really things are being deleted (except for this record of such).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree timestamp
is clearer.
type = models.CharField( | ||
max_length=255, help_text='The model name of the records that were garbage collected.' | ||
) | ||
records = models.JSONField( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason you chose to store a collection of records in one "Event" object, rather than one row per object? I.e. records
would become record
, and garbage_collection_event_id
could be a foreign key to a separate table that stores the events.
One thing that it seems would be easier in that model is searching for a particular garbage collected upload or asset blob.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I realize this is how it's outlined in the design doc, but this question is just coming to me now.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original intention was to just have a single GarbageCollectionEvent
when garbage collection runs, and just shove the opaque serialized model into a JSONField for simplicity. But then the issue of JSONField size limits came up, so I switched to the chunking approach.
I agree that just making it one row per object makes more sense, I think I just didn't consider it because the starting point was having everything in one column.
# Chunk the queryset to avoid creating a single | ||
# GarbageCollectionEvent with a huge JSONField |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may also be something that's easier with an Event <- Record
model, as I mentioned in my other comment. You could have many records pointing to one event, which would allow for just one event record to be stored per "actual" event. In the current case, there could be 10 GarbageCollectionEvent
rows for what's really one "event', i.e. one invocation of _garbage_collect_uploads
or _garbage_collect_asset_blobs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also allow you to point both asset blobs and uploads back to the same single event record, rather than two separate records.
32dcaf1
to
b3f574a
Compare
b3f574a
to
10b3cf2
Compare
Implements the design outlined in #2068.