KGE files / archives should have md5 and/or sha256 hashes generated and available for download #35

RichardBruskiewich · 2021-05-25T19:25:58Z

... generated in the post-processing step after data set uploads

jeffhhk · 2021-07-20T22:23:07Z

The one hash function currently in use by the Unsecret Agent team is sha1. E.g.:

sha1sum ~/Downloads/semmed.data.zip
e7276d5afac1d13b2909a05618ca14fc07f88c95  semmed.data.zip

RichardBruskiewich · 2021-07-26T23:00:47Z

What's entailed here is to compute the sha1sum of the file in the client browser before the upload, upload the hash along with the file, and then have the server recheck the uploaded data against it. The sha1sum "file" should be added to the KGE file archive.

Any archive created on the server side, for downloading, would also have a sha1sum computed and available for independent downloading by the UI (and/or CLI and/or program library).
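As a rough illustration of the server-side half of this proposal, the sketch below hashes a file in chunks, compares the result against a hash supplied by the client, and writes a small companion file in the same `<hash>  <filename>` layout that `sha1sum` prints. The function names and the `.sha1` suffix are assumptions made for the example; they are not taken from the KGE code base.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_upload(path, claimed_sha1):
    """Recheck an uploaded file against the hash the client claims to have computed."""
    return sha1_of_file(path) == claimed_sha1.strip().lower()


def write_sha1_sidecar(path):
    """Write '<hash>  <filename>' next to the file (hypothetical '.sha1' suffix)."""
    path = Path(path)
    sidecar = path.with_name(path.name + ".sha1")
    sidecar.write_text(f"{sha1_of_file(path)}  {path.name}\n")
    return sidecar
```

Because the sidecar uses the `sha1sum` line format, a downloader could in principle check it with `sha1sum -c`, assuming the archive and the sidecar sit in the same directory.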

kennethbruskiewicz · 2021-07-31T01:05:20Z

> The one hash function currently in use by the Unsecret Agent team is sha1. E.g.:
>
> sha1sum ~/Downloads/semmed.data.zip
> e7276d5afac1d13b2909a05618ca14fc07f88c95  semmed.data.zip

Hi @jeffhhk, I'd just like to clarify something. In your mind, does semmed.data.zip include both the nodes and the edges you use in your reasoner? In other words, with the hash, are you tracking the uniqueness of the knowledge graph as a whole?

jeffhhk · 2021-08-02T17:38:38Z

@RichardBruskiewich

> compute the sha1sum on the file in the client browser before the upload

Hash before the upload? What would be the benefit? Hashing before upload would compound the significant performance problems in the upload implementation. It would also close off the possibility of labeling the upload with extra information.

jeffhhk · 2021-08-02T18:17:53Z

@kbruskiewicz Great question. The purpose of the sha1 hash is to track the identity of a particular incarnation of a particular knowledge graph. Thus, if we observe an artifact with a certain sha1 in our system, and we see the same sha1 in KGE, then we know (with high probability) that we do not have to download or reprocess that artifact.

The only thing our system knows how to process is a whole knowledge graph. We do not have a use case for processing one file of a File Set.
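The identity check described above amounts to a set-membership test on whole-archive hashes. A minimal sketch follows, assuming the consumer keeps a set of sha1 values it has already processed; `needs_download` and `processed_sha1s` are hypothetical names, and the hash is the one quoted earlier in this thread for semmed.data.zip.

```python
def needs_download(published_sha1, processed_sha1s):
    """Return True when the published archive hash has not been seen locally."""
    return published_sha1.strip().lower() not in processed_sha1s


# Hashes of archives this (hypothetical) consumer has already processed.
processed_sha1s = {"e7276d5afac1d13b2909a05618ca14fc07f88c95"}

# Same hash as a processed artifact: no download or reprocessing needed.
already_have = not needs_download(
    "e7276d5afac1d13b2909a05618ca14fc07f88c95", processed_sha1s
)
```

This only works if the hash covers the whole knowledge graph archive, which matches the use case stated here: a per-file hash would not tell the consumer whether the graph as a whole has changed.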

kennethbruskiewicz · 2021-08-03T16:23:07Z

> @RichardBruskiewich
>
> > compute the sha1sum on the file in the client browser before the upload
>
> Hash before the upload? What would be the benefit? Hashing before upload would compound the significant performance problems in the upload implementation. It would also close off the possibility of labeling the upload with extra information.

Richard is referring to some spit-balling we did when we were first thinking through the issue.

I asked the question about handling the archive versus handling the files within it because the answer affects my implementation strategy. Our current understanding is that we should hash server-side, at the point where an archive is generated. I will continue any broader thoughts in #45.

RichardBruskiewich · 2021-09-20T17:16:34Z

Done!

RichardBruskiewich assigned RichardBruskiewich and kennethbruskiewicz Jul 26, 2021

jeffhhk mentioned this issue Aug 2, 2021

Support upload of multi-file archives #45

Open

RichardBruskiewich closed this as completed Sep 20, 2021
