KGE files / archives should have md5 and/or sha256 hashes generated and available for download #35
The one hash function currently in use by the Unsecret Agent team is sha1. E.g.:
What's entailed here is to compute the sha1sum of the file in the client browser before the upload, upload the hash, and then have the server recheck the uploaded data against it. The sha1sum "file" should be added to the KGE file archive. Any archive created on the server side for downloading would also have a sha1sum computed and available for independent download by the UI (and/or CLI and/or program library).
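A minimal Python sketch of the hash-and-recheck flow, assuming the server has the uploaded file on local disk. The browser-side step would run in JavaScript; this only illustrates the digest, the `sha1sum`-style line, and the server recheck. `sha1_of_file`, `sha1sum_line`, and `verify_upload` are hypothetical helper names, not part of the KGE codebase.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large KGE files never sit fully in memory


def sha1_of_file(path: Path) -> str:
    """Return the lowercase hex sha1 digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def sha1sum_line(path: Path) -> str:
    """Format one line in the layout `sha1sum` prints: digest, two spaces, file name."""
    return f"{sha1_of_file(path)}  {path.name}\n"


def verify_upload(path: Path, client_sha1: str) -> bool:
    """Server-side recheck: recompute the digest and compare it to the client's claim."""
    # Normalize the client value (trailing newline, uppercase hex) before comparing.
    return sha1_of_file(path) == client_sha1.strip().lower()
```

Because the output of `sha1sum_line` uses the same layout as the `sha1sum` tool, a file containing it could be checked later with `sha1sum -c`.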
Hi @jeffhhk, I'd just like to clarify something. In your mind, does
Hash before the upload? What would be the benefit? Hashing before the upload would compound the significant performance problems in the upload implementation. It would also close off the possibility of labeling the upload with extra information.
@kbruskiewicz Great question. The purpose of the sha1 hash is to track the identity of a particular incarnation of a particular knowledge graph. Thus, if we observe an artifact with a certain sha1 in our system, and we see the same sha1 in KGE, then we can know (with high probability) that we do not have to download or reprocess that artifact. The only thing our system knows how to process is a whole knowledge graph. We do not have a use case for processing one file of a File Set.
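The skip-if-already-seen check described in that comment might look like this in Python. `ArtifactRegistry` is a hypothetical in-memory stand-in for whatever index the consuming system actually keeps, and the digest is assumed to cover a whole knowledge-graph archive, not an individual file of a File Set.

```python
from dataclasses import dataclass, field


@dataclass
class ArtifactRegistry:
    """Hypothetical local index of already-processed knowledge graphs, keyed by sha1 hex."""

    processed: dict[str, str] = field(default_factory=dict)  # sha1 hex -> local path

    def record(self, sha1_hex: str, local_path: str) -> None:
        # Store digests in lowercase so lookups are case-insensitive.
        self.processed[sha1_hex.lower()] = local_path

    def needs_download(self, kge_sha1: str) -> bool:
        # A matching digest means identical bytes with overwhelming probability,
        # so the artifact can be skipped instead of downloaded and reprocessed.
        return kge_sha1.lower() not in self.processed
```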
Richard is referring to some spit-balling we did when we were first thinking through the issue. I asked about handling the archive versus handling the files inside the archive because it affects my implementation strategy: our current plan is to hash server-side, at the point where an archive is generated. I will continue any broader thoughts in #45.
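Hashing server-side at the point where an archive is generated could be sketched as follows. This is a hypothetical illustration, not the KGE implementation: `build_archive_with_digests` and the choice of md5, sha1, and sha256 (drawn from the issue title and this thread) are assumptions. One design point matters for the identity use case. Tar headers record each member's mtime and owner, so unless those are normalized, rebuilding an identical File Set would yield a different digest. The sketch zeros them out and leaves the archive uncompressed, since gzip also embeds a timestamp in its header.

```python
import hashlib
import tarfile
from pathlib import Path

ALGORITHMS = ("md5", "sha1", "sha256")  # assumption: the set of digests to publish


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    # Zero out per-member metadata so rebuilding the same files yields the same tar bytes.
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def build_archive_with_digests(files: list[Path], archive: Path) -> dict[str, str]:
    """Write an uncompressed tar of `files`, then one `<archive>.<algo>` sidecar per algorithm."""
    with tarfile.open(archive, "w") as tar:
        # Sort by name so member order does not depend on how the caller listed the files.
        for f in sorted(files, key=lambda p: p.name):
            tar.add(f, arcname=f.name, filter=_normalize)

    # Compute every digest in a single pass over the archive bytes.
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    with open(archive, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            for h in hashers.values():
                h.update(chunk)

    digests = {name: h.hexdigest() for name, h in hashers.items()}
    for name, hexdigest in digests.items():
        # "<digest>  <filename>" is the layout md5sum/sha1sum/sha256sum print and accept with -c.
        sidecar = archive.with_name(f"{archive.name}.{name}")
        sidecar.write_text(f"{hexdigest}  {archive.name}\n")
    return digests
```

The sidecar files can then be served alongside the archive for independent download by the UI, CLI, or a program library.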
Done!
... generated in the post-processing step after data set uploads