Publishing tiny .tar.gz file stays in processing state for over 1 hour #63

Closed
jeffhhk opened this issue Sep 20, 2021 · 3 comments

Comments

@jeffhhk
Contributor

jeffhhk commented Sep 20, 2021

Repro:

  1. create a new KG (I named mine test-tgz-upload)

  2. create a .tar.gz file containing the first 10000 lines of each file in rtx-kg2. Contents:

     $ ls -l
     total 1668
     -rw-rw-r-- 1 jeff jeff 457784 Aug 20 14:38 rtx_kg2.edgeprop.tsv
     -rw-rw-r-- 1 jeff jeff 364168 Aug 20 14:38 rtx_kg2.edge.tsv
     -rw-rw-r-- 1 jeff jeff 727841 Aug 20 14:44 rtx_kg2.nodeprop.tsv
     -rw-rw-r-- 1 jeff jeff 154579 Aug 20 14:41 rtx_kg2.node.tsv
     $ wc -l *
       10000 rtx_kg2.edgeprop.tsv
       10000 rtx_kg2.edge.tsv
       10000 rtx_kg2.nodeprop.tsv
       10000 rtx_kg2.node.tsv
       40000 total
    
  3. Register new fileset, version 1.0.

  4. Select "nodes". Upload the .tar.gz file prepared above.

  5. After uploading, select "Done Uploading" (performed at around Sep 20 11:20am PST.)

Result: File set has had "status": "Processing" for over an hour.
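For anyone trying to reproduce step 2, here is a minimal sketch of how such a sample archive could be built. It is not the command that was actually used for this report; the function name `build_sample_archive`, its parameters, and the `*.tsv` glob pattern are all assumptions made for illustration.

```python
import io
import tarfile
from itertools import islice
from pathlib import Path


def build_sample_archive(src_dir, archive_path, n_lines=10000, pattern="*.tsv"):
    """Write the first n_lines of each matching file in src_dir into a .tar.gz.

    Hypothetical helper: names and defaults are illustrative assumptions.
    Returns the list of member names added to the archive.
    """
    names = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for src in sorted(Path(src_dir).glob(pattern)):
            # Read in binary mode so line endings and encoding are preserved.
            with src.open("rb") as fh:
                head = b"".join(islice(fh, n_lines))
            info = tarfile.TarInfo(name=src.name)
            info.size = len(head)
            tar.addfile(info, io.BytesIO(head))
            names.append(src.name)
    return names
```

Calling `build_sample_archive("rtx-kg2", "rtx-kg2-202108-lines10000.tar.gz")` would then produce an archive with the same layout as the listing above, assuming the four `.tsv` files sit in a directory named `rtx-kg2`.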

@jeffhhk
Contributor Author

jeffhhk commented Sep 21, 2021

When I polled GET https://archive.translator.ncats.io/archive/test-tgz-upload/1.0/metadata every 30 seconds, the server returned:

  {
    "provider": {
      "kg_id": "test-tgz-upload",
      "kg_name": "test-tgz-upload",
      "kg_description": "    ",
      "translator_component": "ARA",
      "translator_team": "Unsecret Agent",
      "submitter_name": "Jeff Henrikson",
      "submitter_email": "[email protected]",
      "license_name": "Other",
      "license_url": "",
      "terms_of_service": ""
    },
    "fileset": {
      "biolink_model_release": "2.2.4",
      "fileset_version": "1.0",
      "date_stamp": "2021-09-20",
      "submitter_name": "Jeff Henrikson",
      "submitter_email": "[email protected]",
      "status": "Processing",
      "files": [
        {
          "original_name": "rtx-kg2-202108-lines10000.tar.gz"
        }
      ],
      "size": 0.2545785903930664
    }
  }

This response was returned throughout the window beginning Mon 20 Sep 2021 06:35:49 PM UTC and ending Mon 20 Sep 2021 07:54:46 PM UTC.
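The 30-second sampling of the metadata endpoint can be sketched roughly as follows. This is not the script that produced the observations in this comment. The helper names `fetch_metadata` and `poll_status` are hypothetical, and treating any status other than "Processing" as final is an assumption, since the thread does not list the other status values the server can return.

```python
import json
import time
import urllib.request

METADATA_URL = "https://archive.translator.ncats.io/archive/test-tgz-upload/1.0/metadata"


def fetch_metadata(url=METADATA_URL, timeout=10):
    """GET the metadata document and decode it as JSON (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def poll_status(fetch=fetch_metadata, interval_s=30, max_polls=120, sleep=time.sleep):
    """Sample the fileset status until it leaves "Processing" or max_polls is hit.

    Assumption: any status other than "Processing" means processing has ended.
    Returns a list of (unix_timestamp, status) samples.
    """
    samples = []
    for i in range(max_polls):
        status = fetch()["fileset"]["status"]
        samples.append((time.time(), status))
        if status != "Processing":
            break
        # Do not sleep after the final sample.
        if i < max_polls - 1:
            sleep(interval_s)
    return samples
```

A non-2xx response such as a 500 makes `urllib.request.urlopen` raise `urllib.error.HTTPError`, so a longer-running monitor would probably want to catch that and record it as a sample instead of stopping.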

Today, at Tue 21 Sep 2021 03:55:39 PM UTC, in response to the same request, the server returns:

  {"type": "about:blank", "title": "Internal Server Error", "detail": "Server got itself in trouble", "status": 500}

@RichardBruskiewich
Collaborator

Fixed the bug that threw the exception, but the underlying cause of the error condition was the premature (manual) removal of the "test" knowledge graph from the system (mea culpa!).

This suggests that we consider distinguishing between private and public datasets, where private can include 'test' datasets.

See Issue #64

@RichardBruskiewich
Collaborator

The upload crashed for an obscure reason: manual removal of the given dataset when its presence was still expected.

An associated bug relating to graceful reporting of the failure has been fixed, though.
