Sequence file input validator does not accept gzipped files #77
Comments
Hey @sarahet, thanks for the report. This is indeed not supported at the moment and would be a new feature. It needs some thought to do correctly, though. We will discuss this and see how we can put it on our roadmap. |
Great, thanks. A related issue is that input validators generally cannot allow file endings that include multiple dots, like |
Yes, I guess that is somewhat related. Is the multiple-dot version restricted to two dots, assuming one valid extension and one compression extension? Or are more complex cases possible? |
So far I can only think of one extension plus a compression extension, but of course there could be other use cases that I currently don't have in mind. |
Is it a use case to compress your file multiple times, e.g. |
I think |
Would it be a solution to have a constant list of "compression file extensions" similar to the sequence file extensions?
I suggest not to add the compression extensions to the help page, as the product of [file ext.] x [compression ext.] is too large and not helpful to repeat for each input file parameter. We can document somewhere that files with certain extensions get implicitly extracted. |
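The constant-list idea could be sketched roughly as follows, using only the standard library. This is not seqan3 or sharg code: the two lists are shortened, and the names `format_extensions`, `compression_extensions`, `pop_extension` and `has_valid_extension` are hypothetical. It assumes exactly one format extension, optionally followed by one compression extension, as discussed above.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Hypothetical, shortened lists. A real implementation would derive them from the library.
inline constexpr std::array<std::string_view, 3> format_extensions{"fasta", "fa", "fastq"};
inline constexpr std::array<std::string_view, 3> compression_extensions{"gz", "bgzf", "bz2"};

template <typename list_t>
bool contains(list_t const & list, std::string_view const value)
{
    return std::find(list.begin(), list.end(), value) != list.end();
}

// Removes and returns the part after the last dot. Returns an empty view if there is no dot.
std::string_view pop_extension(std::string_view & filename)
{
    auto const pos = filename.rfind('.');
    if (pos == std::string_view::npos)
        return {};

    std::string_view const extension = filename.substr(pos + 1);
    filename = filename.substr(0, pos);
    return extension;
}

// Accepts "<name>.<format>" and "<name>.<format>.<compression>", nothing else.
bool has_valid_extension(std::string_view filename)
{
    std::string_view extension = pop_extension(filename);

    if (contains(compression_extensions, extension))
        extension = pop_extension(filename); // At most one compression layer is stripped.

    return contains(format_extensions, extension);
}
```

With this shape, the help page only needs to list the format extensions once, and a file such as `reads.fasta.gz` is accepted while `reads.gz` and `reads.fasta.gz.gz` are rejected.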
Core Meeting 2022-03-22
We will defer this feature until the seqan3 I/O design is fixed. In the meantime, @eseiler will post a workaround here: |
My workaround
#include <string>
#include <vector>

#include <seqan3/argument_parser/all.hpp>
#include <seqan3/io/sequence_file/input.hpp>

class my_validator : public seqan3::input_file_validator<void> // No template param in sharg
{
public:
    // Delegates to the constructor below. `combined_extensions` is passed by reference and only
    // read in that constructor's body, after the members have been initialised.
    my_validator() : my_validator{combined_extensions} {}
    my_validator(my_validator const &) = default;
    my_validator & operator=(my_validator const &) = default;
    my_validator(my_validator &&) = default;
    my_validator & operator=(my_validator &&) = default;
    ~my_validator() = default;

    explicit my_validator(std::vector<std::string> const & extensions)
    {
        // my_validator::extensions_str = sharg::detail::to_string(extensions); // Sharg only
        my_validator::extensions = extensions; // Copy into the base class member.
    }

    // Optional for readable help page:
    std::string get_help_page_message() const
    {
        return seqan3::detail::to_string("The input file must exist and read permissions must be granted. Valid file extensions are: ",
                                         sequence_extensions,
#if defined(SEQAN3_HAS_BZIP2) || defined(SEQAN3_HAS_ZLIB)
                                         ", possibly followed by ", compression_extensions,
#endif
                                         '.');
    }

private:
    // All extensions of the formats supported by seqan3::sequence_file_input.
    std::vector<std::string> sequence_extensions{seqan3::detail::valid_file_extensions<typename seqan3::sequence_file_input<>::valid_formats>()};

    // Compression extensions, depending on which libraries are available.
    std::vector<std::string> compression_extensions{[&] ()
    {
        std::vector<std::string> result;
#ifdef SEQAN3_HAS_BZIP2
        result.push_back("bz2");
#endif
#ifdef SEQAN3_HAS_ZLIB
        result.push_back("gz");
        result.push_back("bgzf");
#endif
        return result;
    }()};

    // Every sequence extension, alone and followed by each compression extension.
    std::vector<std::string> combined_extensions{[&] ()
    {
        if (compression_extensions.empty())
            return sequence_extensions;

        std::vector<std::string> result;
        for (auto && sequence_extension : sequence_extensions)
        {
            result.push_back(sequence_extension);
            for (auto && compression_extension : compression_extensions)
                result.push_back(sequence_extension + std::string{'.'} + compression_extension);
        }
        return result;
    }()};
};

int main()
{
    std::string some_path{};
    const char * argv[] = {"./test", "-h"};
    seqan3::argument_parser parser{"test_parser", 2, argv, seqan3::update_notifications::off};
    parser.add_option(some_path, 'i', "input", "Fancy description.", seqan3::option_spec::required, my_validator{});
    parser.parse();
}
Possible output
Works for both sharg and seqan3; I added two comments where the code for the two differs. Also, you would need to use the SHARG macros instead of the SEQAN3 ones. The output for a failed validation is quite noisy for seqan3 (it will print all combinations of |
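To get a feeling for how long the combined list becomes, the combination step of the workaround can be reproduced with the standard library alone. This is only a sketch: `combine_extensions` is a hypothetical free function that mirrors the lambda initialising `combined_extensions` above, and it does not touch seqan3 or sharg.

```cpp
#include <string>
#include <vector>

// Returns every format extension, alone and followed by each compression extension.
std::vector<std::string> combine_extensions(std::vector<std::string> const & formats,
                                            std::vector<std::string> const & compressions)
{
    std::vector<std::string> result;
    result.reserve(formats.size() * (1 + compressions.size()));

    for (std::string const & format : formats)
    {
        result.push_back(format);
        for (std::string const & compression : compressions)
            result.push_back(format + '.' + compression);
    }

    return result;
}
```

For the 13 sequence file extensions quoted in this issue and the three compression extensions `bz2`, `gz` and `bgzf`, this yields 13 × (1 + 3) = 52 entries, which is why an error message listing all of them is hard to read.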
When defining an input file with a validator for sequence input files the following way:
it does not accept .gz files but only [embl,fasta,fa,fna,ffn,faa,frn,fastq,fq,genbank,gb,gbk,sam].
Shouldn't this be possible as most sequence input files are actually compressed?