Create a .qsv file format that is an implementation of W3C's CSV on the Web #1982
Comments
Also consider https://digital-preservation.github.io/csv-schema/
Experimenting with this: Sample
For comparison, note that several popular file formats are actually compressed "packages":
It may be nice if the .qsv file is guaranteed to be validated, or if there's a flag that can be checked quickly to see whether it is, along with whether an index is available.
Right @rzmk! The … Further, we can assign a Digital Object Identifier (DOI) to each qsv file so we can track and trace its provenance and, possibly, its downstream use.
If done properly, even with all the extra metadata in the …
The qsv file will contain the cache file (#2097).
Related to #1705.
It's worth experimenting with different compression algorithms. We have found Zstandard to work very well with CSV files.
Thanks @Orcomp, do you have any benchmarks/metrics you can share for Zstandard and the other compression algorithms you considered?
You can check out https://morotti.github.io/lzbench-web. From my personal experience, zstd has a good balance between compression ratio and compression/decompression speed. (I looked into this 2-3 years ago, so things might have changed a bit since.)
Instead of just signing the qsv file using conventional techniques, "explore using two emerging standards: the W3C Verifiable Credentials Data Model 2.0 and Decentralized Identifiers (DIDs) v1.0 that leverage NIST's FIPS 186-5 but also align well with DCAT RDF model, making both human and machine readable."
Currently, qsv creates, consumes and validates CSV files, hewing closely to the RFC 4180 specification as interpreted by the csv crate.
However, it doesn't allow us to save additional metadata, either about the CSV file itself (dialect, delimiter used, comments, DOI, URL, etc.) or about the data it contains (summary statistics, data dictionary, creator, last-updated date, hash of the data, etc.).
The request is to create a .qsv file format that implements W3C's CSV on the Web (CSVW) specification, following the guidance at https://csvw.org. The qsv file would store schemata, metadata and data: not just the schema info, but also summary and frequency statistics, and it would act as a container for DCAT 3/CKAN package and resource metadata, etc.
Doing so will unlock additional capabilities in qsv, qsv pro, Datapusher+ and CKAN.
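For reference, a CSVW table description is a JSON-LD document that uses the `http://www.w3.org/ns/csvw` context and can record both the dialect (e.g. the delimiter and whether there is a header row) and a `tableSchema` listing each column's name and datatype. The Rust sketch below builds a minimal one using only the standard library. The `Column` struct, the `csvw_metadata` helper and the sample `data.csv` are hypothetical illustrations, not part of qsv; a real implementation would more likely use a JSON crate and fill the datatypes from qsv's own type inference.

```rust
/// One column of a CSVW `tableSchema` (hypothetical helper type).
struct Column {
    name: String,
    // CSVW built-in datatype name, e.g. "integer", "decimal", "string", "date"
    datatype: String,
}

/// Quote and escape a string as a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build a minimal CSVW table description (JSON-LD) for the CSV at `url`.
fn csvw_metadata(url: &str, delimiter: char, columns: &[Column]) -> String {
    let cols: Vec<String> = columns
        .iter()
        .map(|c| {
            format!(
                "{{\"name\": {}, \"titles\": {}, \"datatype\": {}}}",
                json_escape(&c.name),
                json_escape(&c.name),
                json_escape(&c.datatype)
            )
        })
        .collect();

    let mut json = String::from("{\n");
    json.push_str("  \"@context\": \"http://www.w3.org/ns/csvw\",\n");
    json.push_str(&format!("  \"url\": {},\n", json_escape(url)));
    // Dialect: how the CSV is laid out (delimiter, header row)
    json.push_str(&format!(
        "  \"dialect\": {{\"delimiter\": {}, \"header\": true}},\n",
        json_escape(&delimiter.to_string())
    ));
    // Schema: one entry per column
    json.push_str(&format!(
        "  \"tableSchema\": {{\"columns\": [{}]}}\n",
        cols.join(", ")
    ));
    json.push('}');
    json
}

fn main() {
    let columns = vec![
        Column { name: "id".into(), datatype: "integer".into() },
        Column { name: "city".into(), datatype: "string".into() },
    ];
    println!("{}", csvw_metadata("data.csv", ',', &columns));
}
```

Using the header text as both `name` and `titles` is a simplification: CSVW restricts `name` to URI-template-safe characters, so headers with spaces or punctuation would need to be encoded first.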
It will also allow us to clean up and consolidate the "metadata" files that qsv creates (the stats cache files, the index file, etc.) and package the CSV and its associated metadata in one container as a signed zip file.
It will also make "harvesting" and federation with CKAN easier and more robust, as all the needed data and metadata are in one container.
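To illustrate how one container could consolidate these files and also expose the quick-check flags suggested in the comments (whether the file has been validated and whether an index is available), here is a minimal Rust sketch of a manifest that could sit next to the CSV inside the zip. Everything in it is hypothetical: the `Manifest` and `Role` types, the `key=value` text layout and the member file names (`data.csv-metadata.json` follows the CSVW default metadata naming convention; the others are placeholders). It models only the manifest; actually writing the zip, compressing members (e.g. with Zstandard) and signing would need external crates.

```rust
use std::collections::BTreeMap;

/// Role a member file plays inside the (hypothetical) .qsv container.
#[derive(Debug, PartialEq, Eq)]
enum Role {
    Data,
    CsvwMetadata,
    StatsCache,
    Index,
}

/// Hypothetical manifest stored next to the CSV inside the zip.
struct Manifest {
    validated: bool,
    members: BTreeMap<String, Role>, // path inside the container -> role
}

impl Manifest {
    fn new() -> Self {
        Manifest { validated: false, members: BTreeMap::new() }
    }

    fn add(&mut self, path: &str, role: Role) {
        self.members.insert(path.to_string(), role);
    }

    /// An index is "available" if any member has the Index role.
    fn has_index(&self) -> bool {
        self.members.values().any(|r| *r == Role::Index)
    }

    /// Flags first, then one line per member, so readers can stop early.
    fn to_text(&self) -> String {
        let mut out = format!("validated={}\nindexed={}\n", self.validated, self.has_index());
        for (path, role) in &self.members {
            out.push_str(&format!("member={:?}:{}\n", role, path));
        }
        out
    }
}

/// Read (validated, indexed) from manifest text without touching the CSV.
fn quick_check(manifest_text: &str) -> Option<(bool, bool)> {
    let mut validated: Option<bool> = None;
    let mut indexed: Option<bool> = None;
    for line in manifest_text.lines() {
        if let Some(v) = line.strip_prefix("validated=") {
            validated = v.parse::<bool>().ok();
        } else if let Some(v) = line.strip_prefix("indexed=") {
            indexed = v.parse::<bool>().ok();
        }
        // Stop as soon as both flags are known
        if let (Some(v), Some(i)) = (validated, indexed) {
            return Some((v, i));
        }
    }
    None
}

fn main() {
    let mut m = Manifest::new();
    m.add("data.csv", Role::Data);
    m.add("data.csv-metadata.json", Role::CsvwMetadata);
    m.add("data.stats.csv", Role::StatsCache);
    m.add("data.csv.idx", Role::Index);
    m.validated = true;

    let text = m.to_text();
    print!("{}", text);
    println!("quick check: {:?}", quick_check(&text));
}
```

Putting the flags at the top of a small member means a tool could answer "validated? indexed?" by reading a few bytes from the archive instead of re-scanning the CSV. A real design might instead store these flags in the CSVW/DCAT JSON metadata itself.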