Guess image mime type from file extension (fixes #5196) #5212

Nutomic · 2024-11-19T09:15:12Z

No description provided.

Nothing4You · 2024-11-19T09:36:02Z

I think this is a rather bad solution.

If we want a fallback, we should instead use MIME type sniffing, with something like https://github.com/flier/rust-mime-sniffer (I don't know if this is the best library; it was just a quick search result).

Nutomic · 2024-11-19T09:53:29Z

This is exactly the same approach which was used by lemmy-ui before 0.19.6 (see here). By moving the logic into the backend, other frontends/clients can also benefit from it.

Downloading and parsing the full file may be more accurate in some edge cases, but it would also use a lot of server resources, similar to #4957, so it's not a real option.

Nothing4You · 2024-11-19T10:40:30Z

We don't need to download the full file.

We already limit the download to 1 MiB for OpenGraph metadata extraction (once #5208 (comment) is merged), so we could add a similar fallback that fetches e.g. the first 512 bytes (https://github.com/flier/rust-mime-sniffer/blob/6413ad0a853aa8ce273ab5370020ec0dc36bbf50/src/magic.rs#L107-L109) or even 4 KiB (https://github.com/flier/rust-mime-sniffer/blob/6413ad0a853aa8ce273ab5370020ec0dc36bbf50/src/magic.rs#L379).
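To make the sniffing idea concrete, here is a minimal std-only sketch of magic-number detection over a response prefix. It is not rust-mime-sniffer's API: `sniff_mime` and its small signature table (PNG, JPEG, GIF, WebP, plus a crude HTML check) are assumptions chosen for illustration.

```rust
/// Number of leading bytes to inspect (the 512-byte prefix mentioned above).
const SNIFF_LEN: usize = 512;

/// Hypothetical helper: guesses a MIME type from the first bytes of a body.
/// Returns None when no known signature matches.
fn sniff_mime(body: &[u8]) -> Option<&'static str> {
    let head = &body[..body.len().min(SNIFF_LEN)];

    // Well-known magic numbers for common image formats.
    if head.starts_with(b"\x89PNG\r\n\x1a\n") {
        return Some("image/png");
    }
    if head.starts_with(&[0xFF, 0xD8, 0xFF]) {
        return Some("image/jpeg");
    }
    if head.starts_with(b"GIF87a") || head.starts_with(b"GIF89a") {
        return Some("image/gif");
    }
    // WebP: "RIFF", a 4-byte little-endian size, then "WEBP".
    if head.len() >= 12 && &head[0..4] == b"RIFF" && &head[8..12] == b"WEBP" {
        return Some("image/webp");
    }

    // Crude HTML heuristic: skip leading whitespace, then look for a doctype or <html tag.
    let start = head
        .iter()
        .position(|b| !b.is_ascii_whitespace())
        .unwrap_or(head.len());
    let text = head[start..].to_ascii_lowercase();
    if text.starts_with(b"<!doctype html") || text.starts_with(b"<html") {
        return Some("text/html");
    }
    None
}
```

With a check like this, an HTML page served under a URL that merely looks like an image would be reported as `text/html`, regardless of the extension.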

dessalines · 2024-11-19T13:44:38Z

crates/api_common/src/request.rs

+  let mut content_type: Option<Mime> = response
    .headers()
    .get(CONTENT_TYPE)
    .and_then(|h| h.to_str().ok())
    .and_then(|h| h.parse().ok());

+  // In some cases servers send a wrong mime type for images, which prevents thumbnail
+  // generation. To avoid this we also try to guess the mime type from file extension.
+  let guess = mime_guess::from_path(url.path());
+  if let Some(guess) = guess.first() {
+    if guess.type_() == mime::IMAGE {
+      content_type = Some(guess);
+    }
+  }


I cleaned this up a bit: #5213

* Mime check fixes.
* Adding back comment.

dessalines · 2024-11-19T13:50:11Z

Not a big deal for this one, but in the future, put the "fixes #X" message in the body of the commit message, not the first line. I always used to do it that way too, but sleepless noted that GitHub doesn't link issues as cleanly when it's in the first line / commit title.

Nutomic · 2024-11-19T13:53:55Z

@dessalines In what way? The linking seems to work fine here.

dessalines · 2024-11-19T13:59:29Z

I suppose the main thing is that it doesn't put a linkable issue in the body of the first comment of a PR (I do see it further down, though). There might be some other things, but I'm forgetting them.

Nothing4You · 2024-11-19T14:19:51Z

Even if you don't want to sniff the content type from the response bytes, why not use the extension guess as a fallback, only when no suitable MIME type could be determined from the header/content + OpenGraph, instead of making it the preferred option?

dessalines · 2024-11-19T14:22:43Z

Probably because services can send the wrong MIME type for images, as mentioned in #5196, and it's up to us to handle their misconfigurations.

Nothing4You · 2024-11-19T14:48:46Z

In the civitai case mentioned in #5196, the server returns a MIME type that the other metadata logic wouldn't handle anyway.

With the logic as currently implemented, there is no point in sending a request at all if the URL is determined to be an image by the file extension heuristic; a request only needs to be issued when the extension doesn't match an image file type.
In #5196 it was also pointed out that civitai isn't even using a file extension that reflects the actual response content type. Yes, it is still an image format, but not the one being served.
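A rough sketch of the reorganisation implied here: classify the URL path by its extension first, and only plan a request when the extension is not an image type. `FetchPlan`, `plan_for_url` and the extension table are hypothetical; the PR itself uses the `mime_guess` crate rather than a hard-coded list.

```rust
/// What to do with a link before any network traffic happens (hypothetical).
#[derive(Debug, PartialEq)]
enum FetchPlan {
    /// Extension looks like an image: use the guessed type without a request.
    AssumeImage(&'static str),
    /// Anything else: fetch the page, e.g. to read OpenGraph tags.
    Fetch,
}

/// Hypothetical stand-in for mime_guess, limited to a few image extensions.
fn image_type_from_extension(path: &str) -> Option<&'static str> {
    // Only look at the last path segment, so "/v1.2/page" has no extension.
    let file_name = path.rsplit('/').next()?;
    let (_, ext) = file_name.rsplit_once('.')?;
    match ext.to_ascii_lowercase().as_str() {
        "png" => Some("image/png"),
        "jpg" | "jpeg" => Some("image/jpeg"),
        "gif" => Some("image/gif"),
        "webp" => Some("image/webp"),
        _ => None,
    }
}

/// Decides up front whether a request is needed at all.
fn plan_for_url(path: &str) -> FetchPlan {
    match image_type_from_extension(path) {
        Some(mime) => FetchPlan::AssumeImage(mime),
        None => FetchPlan::Fetch,
    }
}
```

The flip side, raised in this thread, is that an HTML page whose URL ends in an image extension would never be fetched at all.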

There are plenty of cases where image hosts serve HTML while the URL ends in an image file extension, such as https://pasteboard.co/BlkUDi1cB5hi.png. This is not at all uncommon, and we shouldn't prioritize misbehaving services over well-behaved ones.

The logic here could be changed to take the response content type first: if it's an image, video or HTML, assume it is correct; if it's any other content type, MIME type detection could be performed.
Depending on the content types Lemmy wants to support properly, I'd probably even consider performing auto-detection only when certain known-imprecise content types are returned, such as application/octet-stream, which is the default for unknown binary data, or binary/octet-stream, which seems to be an AWS S3 default based on comments in this issue.
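A minimal sketch of the narrower variant: trust any specific Content-Type and run detection only when the header is missing or is one of the generic octet-stream types. `effective_type` and its `detect` callback are hypothetical names; `detect` could be backed by byte sniffing or by extension guessing.

```rust
/// Content types that say nothing about the actual payload.
/// binary/octet-stream is listed because S3 reportedly uses it as a default.
const GENERIC_TYPES: [&str; 2] = ["application/octet-stream", "binary/octet-stream"];

/// Hypothetical helper: returns the effective content type, calling `detect`
/// only when the header is missing, empty, or generic.
fn effective_type<F>(header: Option<&str>, detect: F) -> Option<String>
where
    F: FnOnce() -> Option<String>,
{
    // Drop parameters such as "; charset=utf-8" and normalise case.
    let essence = header
        .map(|h| h.split(';').next().unwrap_or("").trim().to_ascii_lowercase())
        .filter(|t| !t.is_empty());

    match essence {
        // No usable header at all: detection is the only option.
        None => detect(),
        // Known-imprecise default types: don't trust them, detect instead.
        Some(t) if GENERIC_TYPES.contains(&t.as_str()) => detect(),
        // Anything specific (image/*, video/*, text/html, ...) is trusted as sent.
        Some(t) => Some(t),
    }
}
```

Taking `detect` as a closure means the extra work (a ranged request, or a lookup by extension) only happens on the fallback path.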

If we were using MIME type sniffing on, e.g., the first 512 bytes of the response, the result would likely be more accurate than the server's Content-Type header, and I would suggest preferring it in that case. The file name, however, is likely less accurate than the Content-Type header for most values that header would have.
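The precedence proposed here (sniffed bytes, then the header, then the file name) maps directly onto chaining `Option::or`. `pick_mime` is a hypothetical helper and assumes the three candidates were computed elsewhere.

```rust
/// Hypothetical helper: picks a MIME type with the precedence
/// sniffed bytes > Content-Type header > file extension.
/// Each argument is None when that source produced nothing usable.
fn pick_mime<'a>(
    sniffed: Option<&'a str>,
    header: Option<&'a str>,
    from_extension: Option<&'a str>,
) -> Option<&'a str> {
    // Option::or keeps the first Some, so the chain order encodes the priority.
    sniffed.or(header).or(from_extension)
}
```

When both the sniffed type and the header are unavailable, as with an unresponsive server, only the extension guess remains.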

dessalines · 2024-11-19T15:02:02Z

I've re-opened the other issue. I don't have a hard stance on this, so I'll let others decide. There's a tradeoff between flexibility and strictness, and both have edge cases that aren't perfect.

I'd also be up for reorganizing this logic a bit to only fetch when it's HTML (to get OpenGraph tags), if we stick with guessing images from the path.

Nutomic · 2024-11-20T11:36:13Z

@Nothing4You The logic in this PR is the same as what lemmy-ui used before 0.19.6, and that was working just fine. Detecting the MIME type from file content might work slightly better, but it would also take more work to implement, and I don't see any reason why that would be necessary. If you want to work on that, go ahead, but I have other things to do.


* Revert "Guess image mime type from file extension (fixes #5196) (#5212)"
  This reverts commit 63ea99d.
* Use magic numbers to determine file type.
* fmt
* Don't wrap response in an option
* Regen Cargo.lock
* Clean-up + guess mime type from extension if server is unresponsive
* Move some things about.
* Some cleanup.
* Removing comment lines.

---------

Co-authored-by: Dessalines <[email protected]>

Nutomic requested review from dessalines, phiresky, dullbananas and SleeplessOne1917 as code owners November 19, 2024 09:15

Nutomic force-pushed the guess-mime branch from 5e43a84 to 237851e November 19, 2024 09:16

Guess image mime type from file extension (fixes #5196)

73f516d

Nutomic force-pushed the guess-mime branch from 237851e to 73f516d November 19, 2024 09:16

dessalines reviewed Nov 19, 2024


Mime check fixes. (#5213)

170e3b1


dessalines approved these changes Nov 19, 2024


dessalines enabled auto-merge (squash) November 19, 2024 13:47

dessalines merged commit 63ea99d into main Nov 19, 2024
1 of 2 checks passed

SleeplessOne1917 deleted the guess-mime branch November 19, 2024 20:05

flamingo-cant-draw added a commit to flamingo-cant-draw/lemmy that referenced this pull request Nov 25, 2024

Revert "Guess image mime type from file extension (fixes LemmyNet#5196)…

20fe9ee

… (LemmyNet#5212)" This reverts commit 63ea99d.

flamingo-cant-draw added a commit to flamingo-cant-draw/lemmy that referenced this pull request Nov 25, 2024

Revert "Guess image mime type from file extension (fixes LemmyNet#5196)…

6a41fce

… (LemmyNet#5212)" This reverts commit 63ea99d.

flamingo-cant-draw added a commit to flamingo-cant-draw/lemmy that referenced this pull request Nov 25, 2024

Revert "Guess image mime type from file extension (fixes LemmyNet#5196)…

878e8b8

… (LemmyNet#5212)" This reverts commit 63ea99d.
