Auto-detect archive format fails if the file extension does not match the actual file format #236
Comments
Hi! This is strange, and I cannot seem to be able to replicate this issue 🤔. I'm using the same archive you attached, with the following simple code:
try {
    Bit7zLibrary lib{R"(C:\Program Files\7-Zip\7z.dll)"};
    BitArchiveReader reader{lib, R"(E:\Downloads\TestMismatchOfFileExtensionAndActualFormat.7z.zip)"};
    std::println("Detected format: {}", reader.detectedFormat().value());
    for (const auto& entry : reader) {
        std::println(" - {}", entry.path());
    }
} catch (const BitException& ex) {
    std::println("\n{}", ex.what());
}
Output:
What is the value of
This is currently not possible and should not be needed. I suppose I could add a new build option that disables detection by extension, but I'm a bit hesitant because this library already has many build options, and testing them all is becoming increasingly complex. I'll continue to investigate the issue, but without being able to replicate it, that is a bit difficult. |
I am using bit7z Here is my code:
try
{
    const BitInFormat& archiveFormat = GetInputArchiveFormat(); // this returns 👉 bit7z::BitFormat::Auto
    Bit7zLibrary lib(this->m_7ZipDllPath);
    BitArchiveReader reader(lib, archivePath, archiveFormat); // throws an exception for a .zip extn. 👉 Failed to detect the format of the file: No known signature found. error_code = 15;
    auto detectedFormat = reader.detectedFormat().value(); // After changing the extension to .xyz, the value is 👉 7!
    entries = reader.items();
}
catch (const std::exception& ex)
{
    return OpResult(ex); // failure
}
return OpResult(); // success
Sorry - I by mistake mentioned that the When I change the file extension from Thanks for your prompt replies 🙏 |
Thanks for the further details!
This is really strange. If the file is indeed a 7z archive, bit7z should find the signature. The code for detecting the archive format is quite old now and has been tested many times, not just by me. Which compiler and architecture are you using?
This is even stranger, because what does not seem to work is the signature format detection 🤔. |
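For readers following along, here is a minimal sketch of what detection by signature means, independent of the file extension. It is not bit7z's implementation: `formatFromSignature` and `readHeader` are hypothetical helpers, and only the two well-known magic numbers relevant to this thread (7z and zip) are checked.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of bit7z: names a format from the first bytes of a file.
inline std::string_view formatFromSignature(const std::vector<unsigned char>& header) {
    // Well-known magic numbers: 7z starts with '7' 'z' BC AF 27 1C,
    // a zip local file header starts with 'P' 'K' 03 04.
    static constexpr std::array<unsigned char, 6> sevenZip{0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C};
    static constexpr std::array<unsigned char, 4> zip{0x50, 0x4B, 0x03, 0x04};

    const auto startsWith = [&header](const auto& signature) {
        return header.size() >= signature.size()
            && std::memcmp(header.data(), signature.data(), signature.size()) == 0;
    };
    if (startsWith(sevenZip)) return "7z";
    if (startsWith(zip)) return "zip";
    return "unknown";
}

// Hypothetical helper: reads up to `count` leading bytes; the extension is never consulted.
inline std::vector<unsigned char> readHeader(const std::string& path, std::size_t count = 8) {
    std::ifstream file(path, std::ios::binary);
    std::vector<unsigned char> buffer(count);
    file.read(reinterpret_cast<char*>(buffer.data()), static_cast<std::streamsize>(count));
    // On a short read or a missing file, keep only the bytes that were actually read.
    buffer.resize(file ? count : static_cast<std::size_t>(file.gcount()));
    return buffer;
}
```

Calling `formatFromSignature(readHeader(path))` on a 7z archive that carries a .zip extension would be expected to return "7z", since only the leading bytes are inspected.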
My Env:
See if this gives you any clues. I have also attached a test app / project that will help in reproducing the problem: Bit7Issue_FileExtensionAndFormatMismatch.zip
Thanks again for your support! 🙏 |
No problem at all! Thanks for all the details and test project you provided, it really helped me troubleshoot! 🙏
I have good news: thanks to your test project, I was finally able to reproduce the problem. And I also found the cause: you are using an old version of bit7z, specifically This also explains why I could not replicate the problem: I assumed you were using the If you replace your version of bit7z with the latest I strongly recommend using the latest stable version |
Thanks a ton... 🤗 I will try with the latest version v4.0.8 and give you feedback. Quick question: how can I tell from the source files which exact version I am using? |
You can usually check it in the As a side note, it has happened in the past that I have simply forgotten to update the version in the |
OK - I verified with v4.0.8 that my test program works ! :-) Then I wanted to use it in my application for which I am trying to (re)build bit7z with PS: I have manually modified the bit7z.vcxproj to suit my project - attached herewith. I might have missed or messed something. Is it ok to use Thanks for your support! |
Sorry for the late reply!
Perfect!
Yeah, the
That is a good question. I have been looking for an answer for a long time. |
Thanks for your reply and detailed explanation - it helps. |
Today, one of our testers reported that if the archive contains a file name with special characters e.g.
The callstack shows that the error is thrown by I tested with v4.0.8 and it seems to work fine! So it appears that I will have to migrate to this version! But I have some questions / doubts:
Sorry for these questions, but I am really worried now, as our application needs to process files from the users all over the world in any language! And any failure in zipping/unzipping becomes a blocker! Thanks again for your continuing support! 🙏 🙏 |
This is both a fix and an improvement I made some time ago. First, it fixes problems like the one your tester found in the old Also, this improves the performance of the code: the
On Windows, the native char type is
I think I need to clarify some details: since you are using bit7z with the This is because bit7z, 7-Zip, and MSVC's The call to On Windows, symbolic links are not so common, so I'm not sure if this applies to your use cases. In any case, I think I've answered most of your questions already, but I'll address each of them anyway.
As I said, when using
Wide strings are UTF-16 encoded on Windows, so they support the full set of Unicode codepoints. Note, however, that Windows limits the set of characters that can be used for filenames and paths. If your program only needs to handle archives created on Windows, this is not a problem. Without the Other than that, there's no other restriction.
The UNICODE, _UNICODE flags are needed for bit7z to work properly with 7-Zip, so I would not remove them. As for converting to UTF-8, I already replied above, but there should be no loss of information as far as I know. MSVC's
No problem! I hope I have clarified the issue and reassured you on this matter. And just to be clear, even without the
You're welcome! |
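To illustrate the point above that converting between UTF-16 wide strings and UTF-8 loses no information, here is a small, portable round-trip sketch. It is not the conversion code used by bit7z or MSVC: `utf16ToUtf8` and `utf8ToUtf16` are hypothetical helpers written for this example, and both assume well-formed input (no lone surrogates, no truncated sequences).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Illustrative only: encodes well-formed UTF-16 as UTF-8.
inline std::string utf16ToUtf8(const std::u16string& input) {
    std::string output;
    for (std::size_t i = 0; i < input.size(); ++i) {
        std::uint32_t codePoint = input[i];
        // A high surrogate followed by a low surrogate encodes one code point above U+FFFF.
        if (codePoint >= 0xD800 && codePoint <= 0xDBFF && i + 1 < input.size()) {
            const std::uint32_t low = input[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                codePoint = 0x10000 + ((codePoint - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        if (codePoint < 0x80) {
            output += static_cast<char>(codePoint);
        } else if (codePoint < 0x800) {
            output += static_cast<char>(0xC0 | (codePoint >> 6));
            output += static_cast<char>(0x80 | (codePoint & 0x3F));
        } else if (codePoint < 0x10000) {
            output += static_cast<char>(0xE0 | (codePoint >> 12));
            output += static_cast<char>(0x80 | ((codePoint >> 6) & 0x3F));
            output += static_cast<char>(0x80 | (codePoint & 0x3F));
        } else {
            output += static_cast<char>(0xF0 | (codePoint >> 18));
            output += static_cast<char>(0x80 | ((codePoint >> 12) & 0x3F));
            output += static_cast<char>(0x80 | ((codePoint >> 6) & 0x3F));
            output += static_cast<char>(0x80 | (codePoint & 0x3F));
        }
    }
    return output;
}

// Inverse direction, again assuming well-formed UTF-8 input.
inline std::u16string utf8ToUtf16(const std::string& input) {
    std::u16string output;
    std::size_t i = 0;
    while (i < input.size()) {
        const auto lead = static_cast<unsigned char>(input[i]);
        // The lead byte tells how many continuation bytes follow.
        const std::size_t extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        std::uint32_t codePoint =
            extra == 0 ? lead : static_cast<std::uint32_t>(lead & (0x3F >> extra));
        for (std::size_t k = 1; k <= extra && i + k < input.size(); ++k) {
            codePoint = (codePoint << 6) | (static_cast<unsigned char>(input[i + k]) & 0x3F);
        }
        i += extra + 1;
        if (codePoint < 0x10000) {
            output += static_cast<char16_t>(codePoint);
        } else {
            // Code points above U+FFFF are split back into a surrogate pair.
            codePoint -= 0x10000;
            output += static_cast<char16_t>(0xD800 + (codePoint >> 10));
            output += static_cast<char16_t>(0xDC00 + (codePoint & 0x3FF));
        }
    }
    return output;
}
```

A file name mixing Latin, CJK, and emoji characters, such as `u"\u00FC\u65E5\U0001F600.txt"`, should come back unchanged from `utf8ToUtf16(utf16ToUtf8(name))`, which is the property the discussion relies on.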
Thank you very much for these clarifications and reassurance! 🙏 🙏 |
You're welcome! 😄 |
With I have attached a sample of such data here that contains the same file with different extensions. My test app reports the following:
I request you to please reopen this issue. I sincerely request you to please provide a way of always using the file's signature to determine the archive format, instead of file's extension, otherwise it produces disastrous results. Thank you 🙏 |
Hi! In the picture, I have highlighted the signature of the 7z archive in red, and that of the zip archive in blue. Since bit7z recognised the zip format from the file extension, it tries to open the file using that format. Then, 7-Zip searches for the archive start and detects the signature highlighted in blue, confirming that it has found a zip archive, and opens it successfully (for some reason).
As for why the extraction fails and some of the extracted files are corrupted, my hunch is that it fails because the parent 7z file is compressed (unlike that other tar archive): basically, the zip file is compressed twice (once as zip and then as 7z), but 7-Zip doesn't know this and just tries to extract it using the zip format.
As I said in the other issue, this is not a problem with format detection by extension per se, but rather the default behaviour of 7-Zip. This can be turned off, and I'm currently working on an API that will do just that.
enum struct ArchiveStart : std::uint8_t {
    ScanFile, ///< Search the whole input file for the archive's start (if the file format supports it).
    FileStart ///< Check only the file start for the archive's start.
};
//...
// Throwing an error if the file is not a zip archive from the start.
BitArchiveReader( lib, "path-to-archive.zip", ArchiveStart::FileStart, BitFormat::Zip );
To keep the backward compatibility with older versions of bit7z, the default behavior will still be
// Searching for the archive's start by seeking through the input file.
BitArchiveReader( lib, "path-to-archive.zip", BitFormat::Zip );
I'll keep this issue open and close it once this feature ships. |
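The difference between the two modes can be sketched on an in-memory buffer. This is only an illustration of the idea, not how bit7z or 7-Zip implement it: the enum is redeclared locally to mirror the proposal above, and `findArchiveStart` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Mirrors the enum proposed above; redeclared here only for illustration.
enum struct ArchiveStart : std::uint8_t {
    ScanFile,
    FileStart
};

using Bytes = std::vector<unsigned char>;

// Hypothetical helper: returns the offset at which `signature` is accepted, if any.
inline std::optional<std::size_t> findArchiveStart(const Bytes& file,
                                                   const Bytes& signature,
                                                   ArchiveStart mode) {
    if (signature.empty() || file.size() < signature.size()) {
        return std::nullopt;
    }
    if (mode == ArchiveStart::FileStart) {
        // Only offset 0 is acceptable.
        const bool atStart = std::equal(signature.begin(), signature.end(), file.begin());
        return atStart ? std::optional<std::size_t>{0} : std::nullopt;
    }
    // ScanFile: accept the first occurrence anywhere in the file.
    const auto match = std::search(file.begin(), file.end(), signature.begin(), signature.end());
    if (match == file.end()) {
        return std::nullopt;
    }
    return static_cast<std::size_t>(match - file.begin());
}
```

With a buffer that begins with the 7z signature and contains the zip signature `50 4B 03 04` further in, `ScanFile` reports a zip start at that inner offset, while `FileStart` reports no zip archive at all. That matches the behaviour described above, where a 7z file named .zip is opened as a zip archive because the scan finds a zip signature inside it.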
Thanks for your detailed explanation. Please also test using the file I gave you 😊 Thanks! |
I have a real use case where my legacy application produces an archive with 7z format but gives it a .zip extension!
Sorry, but this is beyond my control :-(
Observations:
- If the format is bit7z::BitFormat::Auto, then the BitArchiveReader is not able to read items from this file: auto entries = reader.items(); fails.
- If the extension is changed to .xyz, then it works!
It appears that the Reader tries to find the format from the file extension (.zip) and then gets confused along the way!
❓ Would it be possible for the Reader to solely use the actual signature of the file format instead of the extension?
Here is a sample archive file with actual format of 7z, but extension .zip: TestMismatchOfFileExtensionAndActualFormat.7z.zip
In some cases it was observed that some such files (actual format = 7z, and extension = .zip) actually get opened, but wrong entries are returned. This could be similar to the issue: #235