Calculate hashes of images and include it in image_details field to improve image caching #5238

asudox · 2024-11-29T09:51:44Z

Requirements

Is this a feature request? For questions or discussions use https://lemmy.ml/c/lemmy_support
Did you check to see if this issue already exists?
Is this only a feature request? Do not put multiple feature requests in one issue.
Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.
Do you agree to follow the rules in our Code of Conduct?

Is your proposal related to a problem?

The image_details field in, for example, a GetPost JSON response does not include a hash of the image. A content hash would let clients cache images more effectively.

I assume the link field in image_details could be used for image caching, but a link-based cache cannot recognize duplicate images, whether they are on the same instance or on other instances.

Describe the solution you'd like.

A SHA-256 hash would be calculated and stored when an image is uploaded to an instance, and then returned in the image_details field.
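A minimal sketch of the idea, assuming a hypothetical `sha256` key next to `link` (this key does not exist in Lemmy today) and a hypothetical `build_image_details` helper. Python's standard `hashlib` stands in for whatever the backend would actually use, and the instance URLs are made up:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the raw image bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def build_image_details(link: str, data: bytes) -> dict:
    """Hypothetical response fragment; only `link` comes from the issue text."""
    return {"link": link, "sha256": sha256_hex(data)}


# The same bytes uploaded to two different (made-up) instances
cat = b"same cat picture bytes"
a = build_image_details("https://instance-x.example/pictrs/image/aaa.png", cat)
b = build_image_details("https://instance-y.example/pictrs/image/bbb.png", cat)

# The links differ, but the digest depends only on the bytes
print(a["link"] == b["link"], a["sha256"] == b["sha256"])
```

The digest only matches for byte-identical files, so a re-encoded or resized copy of the same picture would get a different hash.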

Describe alternatives you've considered.

None.

Additional context

No response

Nutomic · 2024-11-29T13:09:21Z

Images are already served with all the necessary headers for caching:

cache-control: public, max-age=86400, immutable
etag: W/"1167f-193300d43e0"
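To illustrate what these headers permit, here is a rough sketch of a freshness check based on `max-age`. The helper names `parse_cache_control` and `is_fresh` are made up for this example, and the logic is a simplification of real HTTP caching rules, not how any particular client implements them:

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header into {directive: argument-or-True}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.lower()] = arg.strip('"') if sep else True
    return directives


def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """True if a stored response may be reused without contacting the server."""
    d = parse_cache_control(cache_control)
    if "no-store" in d or "no-cache" in d:
        return False
    max_age = d.get("max-age")
    return isinstance(max_age, str) and max_age.isdigit() and age_seconds < int(max_age)


header = "public, max-age=86400, immutable"
print(is_fresh(header, 3600))   # one hour old
print(is_fresh(header, 90000))  # older than one day
```

Note that an HTTP cache stores responses per URL, so these headers only help when the same link is requested again.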

asudox · 2024-11-29T14:15:58Z

> Images are already served with all the necessary headers for caching:
> cache-control: public, max-age=86400, immutable
> etag: W/"1167f-193300d43e0"

That only caches the image at that unique link. If there are duplicates of the same image, it does not help.

If, for instance, the same image is uploaded again by another user (on the same instance or on another one), the client would not use the cached copy. It would make a new request for the same image, even though an image with the same hash is already in its cache, because the duplicate has a different link.

dessalines · 2024-11-29T14:20:24Z

Seems like pictrs could handle this case, maybe via redirects or something on duplicate hashes to the same image.

cc @asonix

asudox · 2024-11-29T14:23:45Z

> Seems like pictrs could handle this case, maybe via redirects or something on duplicate hashes to the same image.

Well, I guess that would save storage and solve this problem when the duplicate image is on the same instance.
However, with hashes, it wouldn't matter whether the image was uploaded to instance X or instance Y; it would still work.

What you suggested could probably be a separate feature request for saving storage, as it does not quite achieve what I meant in mine.

Nutomic · 2024-12-02T09:39:02Z

Pictrs already deduplicates images if they are identical, although this doesn't seem to be documented. This is only for storage; I believe the API serves the full binary data for each duplicate instead of a redirect. Anyway, improvements for this should be suggested to pictrs directly.

https://git.asonix.dog/asonix/pict-rs

https://matrix.to/#/%23pictrs:matrix.asonix.dog?via=matrix.asonix.dog
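The storage-only deduplication described above can be pictured with a toy content-addressed store. This is an illustration of the general technique, not pict-rs's actual implementation; the `DedupStore` class and the alias names are hypothetical:

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: many aliases, one copy of each blob."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}    # digest -> image bytes
        self.aliases: dict[str, str] = {}    # public alias -> digest

    def upload(self, alias: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # keep the bytes only once
        self.aliases[alias] = digest

    def serve(self, alias: str) -> bytes:
        # Every alias returns the full bytes; no redirect to a shared URL
        return self.blobs[self.aliases[alias]]


store = DedupStore()
store.upload("aaa.png", b"cat")
store.upload("bbb.png", b"cat")
print(len(store.blobs), len(store.aliases))
```

Disk usage is shared, but a client still sees two unrelated links and downloads the bytes twice.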

asudox · 2024-12-02T11:03:52Z

> Pictrs already deduplicates images if they are identical, although this doesn't seem to be documented. This is only for storage; I believe the API serves the full binary data for each duplicate instead of a redirect. Anyway, improvements for this should be suggested to pictrs directly.
>
> https://git.asonix.dog/asonix/pict-rs
>
> https://matrix.to/#/%23pictrs:matrix.asonix.dog?via=matrix.asonix.dog

No no, that is not what my feature request is about. That is what dessalines suggested.

This feature request is for the Lemmy clients out there, so that image caching across different instances becomes possible.

For example, I (on instance X) see a cat post in a Lemmy community, download the image, and post it in another community. The image gets uploaded to my instance and the post is published. Now another user (on instance Y) downloads the cat picture from my post and posts it in yet another community, so it gets uploaded to instance Y.

An hour later, a Lemmy user scrolls through their feed and sees two identical cat pictures in two different Lemmy communities. Since no hash is delivered in image_details, the lemming's client fetches the first image and then fetches the second one as well, even though they are the same image, just hosted on different instances. If Lemmy's backend included an image hash in image_details, the client could fetch and cache the first cat picture, then load the second one from its cache simply by comparing the hashes of its cached images with the hash in the second post's image_details.

With only the link of the image, there is no way to solve this.

I also thought of maybe using the BlurHash from #5142? It could probably be used for caching instead of a traditional hash.
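The client-side behaviour described above could look roughly like the following sketch. Everything here is an assumption for illustration: the `HashKeyedImageCache` class, the idea that each post carries a SHA-256 digest, and the made-up URLs. The fetch function is faked with a dictionary so the example runs offline:

```python
import hashlib
from typing import Callable


class HashKeyedImageCache:
    """Hypothetical client cache keyed by content hash instead of URL."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                      # callable: url -> bytes
        self._by_hash: dict[str, bytes] = {}     # digest -> image bytes
        self.network_requests = 0

    def get(self, url: str, advertised_sha256: str) -> bytes:
        cached = self._by_hash.get(advertised_sha256)
        if cached is not None:
            return cached                        # same bytes seen under another link
        self.network_requests += 1
        data = self._fetch(url)
        # Cache only if the bytes match the advertised digest, so a wrong
        # or malicious hash cannot poison the cache for other posts.
        if hashlib.sha256(data).hexdigest() == advertised_sha256:
            self._by_hash[advertised_sha256] = data
        return data


CAT = b"identical cat picture"
URL_X = "https://instance-x.example/pictrs/image/aaa.png"
URL_Y = "https://instance-y.example/pictrs/image/bbb.png"
REMOTE = {URL_X: CAT, URL_Y: CAT}                # stand-in for the network
digest = hashlib.sha256(CAT).hexdigest()

cache = HashKeyedImageCache(REMOTE.__getitem__)
cache.get(URL_X, digest)
cache.get(URL_Y, digest)
print(cache.network_requests)
```

Verifying the digest after download is a design choice in this sketch: the hash comes from a remote instance, so the client should not trust it blindly.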

Nutomic · 2024-12-02T13:19:36Z

So this would only help in the specific case where a user browses two different Lemmy instances from the same app, and then views posts with identical images but different URLs. That's a very minor use case, and I don't think it's worth the effort to optimize for it.

asudox · 2024-12-02T14:29:14Z

> So this would only help in the specific case where a user browses two different Lemmy instances from the same app, and then views posts with identical images but different URLs. That's a very minor use case, and I don't think it's worth the effort to optimize for it.

Yep, I do know that it happens though, with multiple communities that serve the same purpose and all that. But like I said at the end, I think the blurhash field, which seems likely to be added, can be used for this case.

dessalines · 2024-12-02T17:54:03Z

Yep, blurhash will be added in the next pictrs and lemmy release.

Image hosting in general badly needs a decentralized hosted option, ideally one based on torrents or IPFS, because the situation right now is horrible. The exact same image gets shared to tons of sites and platforms, each having to host their own copy, while sharing none of the bandwidth to serve them, and wasting tons of disk space. We're just exacerbating that problem with lemmy (although the new proxying image feature of pictrs helps).

If I had a lot more time I'd work on something.

asudox added the enhancement New feature or request label Nov 29, 2024

asudox changed the title ~~Calculate hash of images and include it in image_details field to improve image caching~~ Calculate hashes of images and include it in image_details field to improve image caching Nov 29, 2024

Nutomic closed this as completed Dec 2, 2024
