Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: tag cleanup job #12654

Merged
merged 1 commit into from
Sep 16, 2024
Merged

feat: tag cleanup job #12654

merged 1 commit into from
Sep 16, 2024

Conversation

jrasm91
Copy link
Contributor

@jrasm91 jrasm91 commented Sep 13, 2024

  • Add a job to remove empty tags
  • Add a button in the UI to manually run a few different clean up jobs
Light Dark
image image
image image
image image

Comment on lines +182 to +209
async deleteEmptyTags() {
await this.dataSource.transaction(async (manager) => {
const ids = new Set<string>();
const tags = await manager.find(TagEntity);
for (const tag of tags) {
const count = await manager
.createQueryBuilder('assets', 'asset')
.innerJoin(
'asset.tags',
'asset_tags',
'asset_tags.id IN (SELECT id_descendant FROM tags_closure WHERE id_ancestor = :tagId)',
{ tagId: tag.id },
)
.getCount();

if (count === 0) {
this.logger.debug(`Found empty tag: ${tag.id} - ${tag.value}`);
ids.add(tag.id);
}
}

if (ids.size > 0) {
await manager.delete(TagEntity, { id: In([...ids]) });
this.logger.log(`Deleted ${ids.size} empty tags`);
}
});
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mertalev if you know a better way to do this, feel free to replace it.

Copy link
Member

@danieldietzler danieldietzler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks slick 🚀

Comment on lines +183 to +207
await this.dataSource.transaction(async (manager) => {
const ids = new Set<string>();
const tags = await manager.find(TagEntity);
for (const tag of tags) {
const count = await manager
.createQueryBuilder('assets', 'asset')
.innerJoin(
'asset.tags',
'asset_tags',
'asset_tags.id IN (SELECT id_descendant FROM tags_closure WHERE id_ancestor = :tagId)',
{ tagId: tag.id },
)
.getCount();

if (count === 0) {
this.logger.debug(`Found empty tag: ${tag.id} - ${tag.value}`);
ids.add(tag.id);
}
}

if (ids.size > 0) {
await manager.delete(TagEntity, { id: In([...ids]) });
this.logger.log(`Deleted ${ids.size} empty tags`);
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
await this.dataSource.transaction(async (manager) => {
const ids = new Set<string>();
const tags = await manager.find(TagEntity);
for (const tag of tags) {
const count = await manager
.createQueryBuilder('assets', 'asset')
.innerJoin(
'asset.tags',
'asset_tags',
'asset_tags.id IN (SELECT id_descendant FROM tags_closure WHERE id_ancestor = :tagId)',
{ tagId: tag.id },
)
.getCount();
if (count === 0) {
this.logger.debug(`Found empty tag: ${tag.id} - ${tag.value}`);
ids.add(tag.id);
}
}
if (ids.size > 0) {
await manager.delete(TagEntity, { id: In([...ids]) });
this.logger.log(`Deleted ${ids.size} empty tags`);
}
});
await this.dataSource.query(`
delete from tags
where not exists(
select 1
from tag_asset
where tag_asset."tagsId" = tags.id
)`);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I have a tag Parent/Child I don't want to delete Parent or Child if Child is connected to an asset. Does this handle that case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, assuming that if the child has a row in tag_asset, the parent also has a row.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parent doesn't necessarily have a row.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, why not? It isn't a parent/child relationship if the parent doesn't always include the child.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We link a single tag "Event/Year/Title" to a single asset. We don't want to have to link every tag in the hierarchy, eg Event and Event/Year and Event/Year/Title - that is implied by the fact that Event/Year/Title is the tag name. In terms of the database we track that relationship in a separate table. We have one link per tag hierarchy, instead of one record per asset/nested/tag combination, which would be a lot more records to read, write, and sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how to exactly write the not exists query with the current query builder, but I can try to take a look at this in the future.

@alextran1502
Copy link
Contributor

I find the button Create job confusing. Maybe we can have a Clean up job card with three buttons corresponding to the three jobs we have here?

@jrasm91
Copy link
Contributor Author

jrasm91 commented Sep 14, 2024

Maybe. I think there are plans to add more manual jobs and potentially trigger all jobs to be run through a separate interface like this. I'm open to different UX though.

The current problem is that we only have two options - force=true or force=false - and not all of our jobs work with just that option.

Some examples:

  • all the clean up jobs
  • metadata extraction just needs to be run on videos
  • any job just needs to run on assets processed after a certain date
  • changing between thumbnail formats for one thumbnail type doesn't need to regenerate everything

Maybe we can work towards separating the queue stats from the way we queue jobs and the types of jobs we can queue.

@alextran1502
Copy link
Contributor

@jrasm91 sounds good, we can think of a better way of representing these new jobs in a different PR

@jrasm91 jrasm91 merged commit b74b208 into main Sep 16, 2024
35 checks passed
@jrasm91 jrasm91 deleted the fix/delete-empty-tags branch September 16, 2024 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants