-
-
Notifications
You must be signed in to change notification settings - Fork 883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Guess image mime type from file extension (fixes #5196) #5212
Conversation
i think this is a rather bad solution. if we want a fallback we should instead use mimetype sniffing, something like https://github.com/flier/rust-mime-sniffer (don't know if this is the best library, just a quick search result) |
This is exactly the same approach which was used by lemmy-ui before 0.19.6 (see here). By moving the logic into the backend, other frontends/clients can also benefit from it. Downloading and parsing the full file may be more accurate in some edge cases, but it would also use a lot of server resources similar to #4957, so its not a real option. |
We don't need to download the full file. We already limit it to 1 MiB (once #5208 (comment) is merged) for opengraph metadata extraction, we could do a similar fallback option that fetches e.g. the first 512 bytes (https://github.com/flier/rust-mime-sniffer/blob/6413ad0a853aa8ce273ab5370020ec0dc36bbf50/src/magic.rs#L107-L109) or even 4 KiB (https://github.com/flier/rust-mime-sniffer/blob/6413ad0a853aa8ce273ab5370020ec0dc36bbf50/src/magic.rs#L379). |
crates/api_common/src/request.rs
Outdated
let mut content_type: Option<Mime> = response | ||
.headers() | ||
.get(CONTENT_TYPE) | ||
.and_then(|h| h.to_str().ok()) | ||
.and_then(|h| h.parse().ok()); | ||
|
||
// In some cases servers send a wrong mime type for images, which prevents thumbnail | ||
// generation. To avoid this we also try to guess the mime type from file extension. | ||
let guess = mime_guess::from_path(url.path()); | ||
if let Some(guess) = guess.first() { | ||
if guess.type_() == mime::IMAGE { | ||
content_type = Some(guess); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cleaned this up a bit: #5213
* Mime check fixes. * Adding back comment.
Not a big deal for this one, but in the future put the |
@dessalines In what way? The linking seems to work fine here. |
I spose the main thing, is that it doesn't put a linkable issue in the body of the first comment of a PR ( I do see it lower though ). There might be some other things but I'm forgetting. |
even if you don't want to sniff content type from response bytes, why not use that as fallback only if no suitable mime type could be determined from header/content + opengraph instead of using it as preferred option? |
Probably since services can send the wrong mime type for images, as mentioned in #5196 , and its up to us to handle their misconfigurations. |
the problem with civitai mentioned in #5196 returns a mime-type that wouldn't be handled by other metadata logic though. with the way the logic is currently implemented, there is no point in even sending a request at all if it's determined to be an image based on file extension heuristic, only issue a request after it's not matching an image file type. there are plenty of cases where image hosters serve html while the URL ends in an image file extension, such as https://pasteboard.co/BlkUDi1cB5hi.png. this is not at all uncommon and we shouldn't prioritize misbehaving services over well-behaving ones. the logic here could be changed to first take the response content type, then if it's an image, video or html assume that that is correct, if the content type is another one mime-type detection could be performed. if we were using mime-type sniffing by analyzing the first e.g. 512 bytes of the response, this would likely have a more accurate result than the content-type header provided by the server, and I would suggest preferring it in that case, but the file name is likely less accurate than the content-type header for most values that a content-type header would have. |
I've re-opened the other issue. I don't have a hard stance on this, so I'll let others decide. There's a tradeoff between flexibility and strictness, and both have edge cases that aren't perfect. I'd also be up for re-organizing this logic a bit to only fetch when its html (to get opengraph tags), if we stick with the image guessing from the path. |
@Nothing4You The logic in this PR is the same which was used by lemmy-ui before 0.19.6, and that was working just fine. Detecting mime type from file content might work slightly better, but it would also take more work to implement and I dont see any reason why that would be necessary. If you want to work on that go ahead, but I have other things to do. |
… (LemmyNet#5212)" This reverts commit 63ea99d.
… (LemmyNet#5212)" This reverts commit 63ea99d.
… (LemmyNet#5212)" This reverts commit 63ea99d.
* Revert "Guess image mime type from file extension (fixes #5196) (#5212)" This reverts commit 63ea99d. * Use magic numbers to determine file type. * fmt * Don't wrap response in an option * Regen Cargo.lock * Clean-up + guess mime type from extension if server is unresponsive * Move some things about. * Some cleanup. * Removing comment lines. --------- Co-authored-by: Dessalines <[email protected]>
* Revert "Guess image mime type from file extension (fixes #5196) (#5212)" This reverts commit 63ea99d. * Use magic numbers to determine file type. * fmt * Don't wrap response in an option * Regen Cargo.lock * Clean-up + guess mime type from extension if server is unresponsive * Move some things about. * Some cleanup. * Removing comment lines. --------- Co-authored-by: Dessalines <[email protected]>
No description provided.