-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MP3 without ID3 algorithm has various problems #70
Comments
cc @padenot |
The precision of this algorithm looks odd compared to the other heuristics - it detects MP3s very accurately - and requires 1.5K of input data as a result. The rest of the sniffing specification is expected to produce less accurate results - including MP3 with ID3 (see #69). As mentioned above Chrome just uses a 2-byte signature match for this - Chrome uses a 1024 byte sniffing buffer which was chosen to fit in 1 ethernet packet along with headers: https://src.chromium.org/viewvc/chrome/trunk/src/net/base/mime_sniffer.cc?revision=HEAD&view=markup#l680 |
Thanks for your comments, I'll have a look and fix the various issues you've outlined. We need a bit more precision here, because This reasoning applies to WebM as well. |
@padenot - that makes sense. Does that mean there's a problem with MP3 with ID3 detection in the spec? Currently a text file starting "ID3 is easy to detect" is detected as "audio/mpeg" (mp3sniff.c does it much more accurately). Also if it's important to detect MP3s as accurately as this should MP3 detection happen earlier in the algorithm to prevent any of the other heuristics matching? It looks like an MP3 without ID3 will be detected by the FF FE pattern for UTF-16LE BOM in identifying-a-resource-with-an-unknown-mime-type. |
I filed WebAudio/web-audio-api#1563 to get more clarity on that method. |
Similar to the concerns I mentioned with WebM in #93, I also wonder if this algorithm is too complex for the purpose it is intending to serve. The mimesniff spec is not meant to be replicating the logic on how to process or play an MP3 file; it is merely supposed to be identifying that a file is indeed an MP3 file. Neither #4 nor the documents it links to give much insight into whether this algorithm is as minimal as it needs to be. In fact, the initial Firefox implementation had multiple hiccups that suggest how complex this algorithm is. For reference, here are the implementation bugs for Firefox: And the current implementation: This document suggests that there are very few pieces of information that signal "MPEG Audio Layer III": How much parsing of "MPEG" do we really have to do to identify that "Audio Layer III" exists? |
Come to think of it, the resulting MIME type is "audio/mpeg". We don't actually have to identify the "Layer III" part at all. |
Well, hmm... This kind of comes down to what the point of the MIME sniffing spec is. It can't be purely to turn byte streams into MIME types, can it? Because the MIME types aren't directly what anyone cares about. Instead I think the MIME types are being used as a proxy for various decisions. Probably most notably, "what decoder to use" when trying to actually present the user with this content. So I was kind of under the impression you'd use a different decoder for MP3 than for other MPEG audio formats, and so it might be worth differentiating? |
The introduction (mostly written by @abarth) suggests otherwise. My opinion is that once you identify what type of resource something is, how to actually deal with that resource is outside the scope of this spec (usually because another spec already exists). It's why, for example, we documented the idiosyncrasies of GIF on the wiki instead of here. If this point is contentious, though, let's spin off a separate issue for discussion.
For MPEG audio in particular, this point may be moot. My naive understanding is that MP1, MP2, and MP3 have significant enough similarities that they could be decoded with the same decoder. |
There's two places where we have a need for sniffing in the web platform as I understand it:
And there will be a third: https://github.com/annevk/orb. |
That seems sufficient enough to spin off #152. |
Looking through that issue and what became of it, I've concluded that the current "MP3 without ID3" algorithm is misplaced, confusing, and potentially harmful. We should not be sniffing MP3 data so deeply in this context. |
The algorithm looks like it's based on mp3sniff.c
https://dxr.mozilla.org/mozilla-central/source/toolkit/components/mediasniffer/mp3sniff.c
which was added to fix this issue:
https://bugzilla.mozilla.org/show_bug.cgi?id=862088
The algorithm is described in
https://mimesniff.spec.whatwg.org/#signature-for-mp3-without-id3
Bitfield operator precedence
The bitfield extraction operations like
sequence[s + 2] & 0x0c >> 2
should have brackets(sequence[s + 2] & 0x0c) >> 2
(like mp3sniff.c) since in C/C++/C#/Java/JavaScript>>
has higher precedence than&
so0x0c >> 2
is evaluated before the&
Step 2 in match-an-mp3-header should read 'or' instead of 'and'
This describes is_mp3() in mp3sniff.c. Step 2 reads "If sequence[s] is not equal to 0xff and sequence[s + 1] & 0xe0 is not equal to 0xe0, return false." It should read "If sequence[s] is not equal to 0xff or sequence[s + 1] & 0xe0 is not equal to 0xe0, return false."
Step 7 looks wrong because s - length is always negative and skipped-bytes >= 0 so the algorithm always returns false at this step.
Step 7 reads "If skipped-bytes is less than 4, or skipped-bytes is greater than s - length, return false."
Confusion over naming
Raw MP3s are structured like this:
The "match an mp3 header" and "parse an mp3 frame" computations both operate on the same data structure - the 4-byte MP3 frame header. This would be much clearer if they were named "match an mp3 frame header" and "parse an mp3 frame header". There are also variables named
sample-rate
which is the same assamplerate-index
andsamplerate
which is not the same assample-rate
.Confusion over return values
The "match an mp3 header" is equivalent to is_mp3() in mp3sniff.c which returns true/false and "parse an mp3 frame" is equivalent to mp3_parse() which returns an mp3_header struct. All the values used by the compute-an-mp3-frame-size calculation are computed in parse-an-mp3-frame except
freq
which is computed in match-an-mp3-header (freq was missing from the original commit 998b959 of this algorithm, and was added later in commit d954dab).Note: Chrome doesn't implement any of this and detects MP3s using two signatures:
"ID3" (for MP3 in ID3)
FF E0 / FF E0 (match raw MP3 frame header - equivalent to step 2 in match-an-mp3-header)
The text was updated successfully, but these errors were encountered: