Fasta API as a microservice #630

Open
wants to merge 23 commits into
base: main

Conversation

ctcncgr
Member

@ctcncgr ctcncgr commented Nov 18, 2024

For #629

@ctcncgr ctcncgr added the enhancement New feature or request label Nov 18, 2024
@nathanweeks
Contributor

It appears that the target URL needs to be URL-encoded, or a 404 results. Could support for non-URL-encoded URLs be retained, to enhance usability (e.g., make URLs easier to copy/paste, and verify by visual inspection)?

Otherwise, there are a few places where ALLOWED_HOSTS should be updated to change www.soybase.org/data/v2/ to data.soybase.org/

@adf-ncgr
Contributor

@nathanweeks regarding the URL encoding, @ctcncgr had asked me about this a while back and I opined that we ought to stick to URL standards (i.e., not allowing disallowed characters without encoding); as I understood it, this had to do with URLs passed as query string parameters. But I'm not sure I entirely understand the use cases or what would be involved in bending the rules, so I'm open to further discussion about it if you have strong opinions.

@nathanweeks
Contributor

It's mainly about user experience: try starting fasta_api and submitting requests using a few different target URLs/filesystem paths. Having to translate those URLs/paths with a software layer (or by hand) when constructing the complete fasta_api URL-path is a noticeable speed bump.

There are no query parameters per se, as there is no ? in the URL, so I'd opine that URL standards could still be adhered to (no "bending" required) by not requiring URL-encoding.
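To make the usability point concrete, here is a minimal sketch of what percent-encoding does to a target URL before it can be embedded in a request. It uses only Python's standard `urllib.parse`; the helper name `encode_target` is hypothetical and is not part of fasta_api.

```python
from urllib.parse import quote, unquote


def encode_target(url: str) -> str:
    """Percent-encode a target URL so it can be embedded in another URL.

    safe="" makes quote() encode "/" as well, which it leaves alone by default.
    """
    return quote(url, safe="")


target = (
    "https://data.legumeinfo.org/Glycine/max/genomes/"
    "Wm82.gnm2.DTC4/glyma.Wm82.gnm2.DTC4.genome_main.fna.gz"
)
encoded = encode_target(target)
print(encoded)  # every ":" becomes %3A and every "/" becomes %2F

# Decoding restores the original, human-readable URL.
assert unquote(encoded) == target
```

The encoded form is unambiguous for the server, but it is much harder to copy, paste, and check by eye than the original.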

@adf-ncgr
Contributor

OK, sounds like I misunderstood what was being described when I gave @ctcncgr my opinion. Maybe he can elaborate on what he was having issues with.

@ctcncgr
Member Author

ctcncgr commented Nov 21, 2024

@nathanweeks @adf-ncgr Initially I really didn't like including things like colon and slash in any of this. I kind of think the query should be broken into something more like param1/param2/param3 (removing things like "-" and ":"), since all are required, encoded as URL parameters instead of parts of a dynamic URL, or sent as part of a POST request. I am however not willing to die on any of these hills and was asked to maintain the original behavior, so will change it to support this np (I think). I also asked the GPT if it was valid in the URI, which it was OK with, HOWEVER, when I asked it how best to use URLs as part of a URI it recommended that I encode it... Which was kinda funny lol.

@ctcncgr ctcncgr marked this pull request as draft November 21, 2024 00:38
@ctcncgr
Member Author

ctcncgr commented Nov 21, 2024

Changed to "Draft" after review.

The following needs to be addressed:

  1. The API should accept URLs as part of the URI.
  2. ALLOWED_HOSTS should now use data.soybase.org/ instead of www.soybase.org/data/v2/
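
As a rough illustration of the second item, the following sketch shows one way an allow-list check could look. The shape of `ALLOWED_HOSTS` (scheme-less "host/path-prefix" strings), the `is_allowed` helper, and the inclusion of `data.legumeinfo.org/` are all assumptions made for this example; the actual fasta_api configuration may differ.

```python
from urllib.parse import urlsplit

# Assumed format: "host/path-prefix" without a scheme (hypothetical values).
ALLOWED_HOSTS = ("data.soybase.org/", "data.legumeinfo.org/")


def is_allowed(url: str) -> bool:
    """Return True if url is http(s) and starts with an allowed host prefix."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # Compare "host/path" against the prefixes; host names are case-insensitive.
    location = f"{parts.netloc.lower()}{parts.path}"
    return location.startswith(ALLOWED_HOSTS)


print(is_allowed("https://data.soybase.org/Glycine/max/example.fna.gz"))
print(is_allowed("https://www.soybase.org/data/v2/Glycine/max/example.fna.gz"))
```

Keeping the trailing slash in each prefix means a look-alike host such as `data.soybase.org.example.com` does not match.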

@ctcncgr
Member Author

ctcncgr commented Nov 25, 2024

@nathanweeks @adf-ncgr Technically, embedding a path inside a dynamic URL path parameter goes against OAS (the OpenAPI Specification), as noted in the FastAPI docs.

From https://fastapi.tiangolo.com/tutorial/path-params/#openapi-support: "OpenAPI doesn't support a way to declare a path parameter to contain a path inside, as that could lead to scenarios that are difficult to test and define. Nevertheless, you can still do it in FastAPI, using one of the internal tools from Starlette."

I am going to settle on using url as a query parameter instead of as part of a dynamic path parameter to not violate OAS:

url = request.rel_url.query['url']

This allows you to put an unencoded path value into the request without violating OAS.
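
A small sketch of why this works, using only the standard library rather than aiohttp so that it runs on its own: everything after the first `?` is the query string, so the `:` and `/` characters of an unencoded target URL survive parsing. The helper name `split_request` is hypothetical, and the request URL is the one used in the curl examples in this thread.

```python
from urllib.parse import parse_qs, urlsplit


def split_request(request_url: str) -> tuple[str, str]:
    """Return (path, target URL) from a request URL with a ?url=... parameter."""
    parts = urlsplit(request_url)
    # parse_qs maps each parameter name to a list of values; take the first.
    target = parse_qs(parts.query)["url"][0]
    return parts.path, target


path, target = split_request(
    "http://localhost:8080/fasta/fetch/glyma.Wm82.gnm2.Gm01:1-100"
    "?url=https://data.legumeinfo.org/Glycine/max/genomes/"
    "Wm82.gnm2.DTC4/glyma.Wm82.gnm2.DTC4.genome_main.fna.gz"
)
print(path)
print(target)
```

One caveat: a target URL that itself contains `&`, `#`, or `+` would still need those characters percent-encoded, since they have special meaning in a query string.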

@adf-ncgr
Contributor

sounds like a good solution to me, let us know your thoughts @nathanweeks

@nathanweeks
Contributor

That works OK for me!

@ctcncgr
Member Author

ctcncgr commented Dec 2, 2024

> That works OK for me!

@nathanweeks awesome, thanks for being flexible with this. I'll get this done rq and remove the draft status. It should then prompt for review again.

@ctcncgr ctcncgr marked this pull request as ready for review December 2, 2024 18:39
@ctcncgr
Member Author

ctcncgr commented Dec 2, 2024

hey @nathanweeks and @alancleary. This is ready to go. I removed the dependency on fastapi completely, so it just uses our uvloop setup. Things should work now as discussed. Thanks!

@nathanweeks
Contributor

A few initial observations:

README.md

  • The docker compose commands reference compose.yaml, whereas the file is currently called "compose.yml" (which is also inconsistent with compose.prod.yaml and compose.dev.yaml); suggest standardizing on either ".yaml" or ".yml".
  • As the shell performs pathname expansion on the "?" character, the URL for the example curl command should be enclosed in single quotes (particularly to avoid a zsh error when no match is found: https://stackoverflow.com/q/52467094)
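
For anyone generating the example command from a script, a minimal sketch of the quoting point: Python's standard `shlex.quote` wraps any string containing shell-special characters such as `?` in single quotes. The URL below is the one used in the curl examples in this thread.

```python
import shlex

url = (
    "http://localhost:8080/fasta/fetch/glyma.Wm82.gnm2.Gm01:1-100"
    "?url=https://data.legumeinfo.org/Glycine/max/genomes/"
    "Wm82.gnm2.DTC4/glyma.Wm82.gnm2.DTC4.genome_main.fna.gz"
)

# "?" is not in shlex's set of safe characters, so the URL gets single-quoted.
command = "curl " + shlex.quote(url)
print(command)
```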

Regarding the implementation:

  • URLs with integer ranges seem to fail; e.g.:
% curl 'http://localhost:8080/fasta/fetch/glyma.Wm82.gnm2.Gm01:1-100?url=https://data.legumeinfo.org/Glycine/max/genomes/Wm82.gnm2.DTC4/glyma.Wm82.gnm2.DTC4.genome_main.fna.gz'
500 Internal Server Error

Server got itself in trouble

Stack trace:

fasta_api-1  | Traceback (most recent call last):
fasta_api-1  |   File "/usr/local/lib/python3.10/site-packages/aiohttp/web_protocol.py", line 452, in _handle_request
fasta_api-1  |     resp = await request_handler(request)
fasta_api-1  |   File "/usr/local/lib/python3.10/site-packages/aiohttp/web_app.py", line 543, in _handle
fasta_api-1  |     resp = await handler(request)
fasta_api-1  |   File "/usr/local/lib/python3.10/site-packages/fasta_api/http_server.py", line 21, in http_fasta_range
fasta_api-1  |     range = handler.fasta_range(url, seqid, start, end)
fasta_api-1  |   File "/usr/local/lib/python3.10/site-packages/fasta_api/request_handler.py", line 36, in fasta_range
fasta_api-1  |     seq = pysam.FastaFile(url).fetch(reference=seqid, start=start, end=end)
fasta_api-1  |   File "pysam/libcfaidx.pyx", line 288, in pysam.libcfaidx.FastaFile.fetch
fasta_api-1  |   File "pysam/libcutils.pyx", line 235, in pysam.libcutils.parse_region
fasta_api-1  | TypeError: an integer is required
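
The `TypeError: an integer is required` at the bottom of the trace appears consistent with `start` and `end` reaching pysam as strings taken straight from the URL path. A minimal sketch of parsing the path segment into typed values first; the `parse_range` helper and its regular expression are hypothetical and not taken from the fasta_api source.

```python
import re

# Hypothetical pattern for a "<seqid>:<start>-<end>" path segment.
# The greedy ".+" means the split happens at the last ":" in the segment.
_RANGE_RE = re.compile(r"^(?P<seqid>.+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_range(segment: str) -> tuple[str, int, int]:
    """Split 'seqid:start-end' into (seqid, start, end) with integer bounds."""
    match = _RANGE_RE.match(segment)
    if match is None:
        raise ValueError(f"expected '<seqid>:<start>-<end>', got {segment!r}")
    return match["seqid"], int(match["start"]), int(match["end"])


seqid, start, end = parse_range("glyma.Wm82.gnm2.Gm01:1-100")
print(seqid, start, end)
```

Rejecting malformed segments with a `ValueError` also gives the HTTP layer something it could translate into a 400 response instead of a 500.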

@ctcncgr
Member Author

ctcncgr commented Dec 5, 2024

For consistency's sake, @nathanweeks do you think it would make more sense to just include all of the parameters as URL parameters as opposed to part of a dynamic URL?

@nathanweeks
Contributor

I would prefer maintaining the current URL path convention, instead of a single endpoint ("/") with required URL parameters. While seemingly less-"flexible", I think it's easier to document, more-predictable/readable, and simpler to use interactively.

@ctcncgr
Member Author

ctcncgr commented Dec 10, 2024

> I would prefer maintaining the current URL path convention, instead of a single endpoint ("/") with required URL parameters. While seemingly less-"flexible", I think it's easier to document, more-predictable/readable, and simpler to use interactively.

Gotcha, let me fix the issues and I'll get back to you soon.

@ctcncgr
Member Author

ctcncgr commented Dec 11, 2024

@nathanweeks I think we're good to go. Let me know if you run into more issues or have additional comments. Thanks

@nathanweeks
Contributor

Thanks! Still seeing an error with this request (that doesn't involve a range):

$ curl 'http://localhost:8080/fasta/fetch/glyma.Wm82.gnm2.scaffold_2709?url=https://data.legumeinfo.org/Glycine/max/genomes/Wm82.gnm2.DTC4/glyma.Wm82.gnm2.DTC4.genome_main.fna.gz'
500 Internal Server Error

Server got itself in trouble

stack trace from the backend process:

fasta_api-1  | Traceback (most recent call last):
fasta_api-1  |   File "/usr/local/lib/python3.10/site-packages/aiohttp/web_protocol.py", line 452, in _handle_request
fasta_api-1  |     resp = await request_handler(request)
fasta_api-1  |   File "/usr/local/lib/python3.10/site-packages/aiohttp/web_app.py", line 543, in _handle
fasta_api-1  |     resp = await handler(request)
fasta_api-1  |   File "/usr/local/lib/python3.10/site-packages/fasta_api/http_server.py", line 21, in http_fasta_range
fasta_api-1  |     range = handler.fasta_range(url, seqid, int(start), int(end))
fasta_api-1  | TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
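
This trace suggests that `start` and `end` are `None` when the request has no range, so the unconditional `int(...)` calls fail. A minimal sketch of converting only when a value is present; the helpers `optional_int` and `fetch_kwargs` are hypothetical. To my understanding, `pysam.FastaFile.fetch` accepts `None` for `start` and `end`, but that should be verified against the pysam version in use.

```python
from typing import Optional


def optional_int(value: Optional[str]) -> Optional[int]:
    """Convert a string to int, passing None through unchanged."""
    return None if value is None else int(value)


def fetch_kwargs(
    seqid: str, start: Optional[str] = None, end: Optional[str] = None
) -> dict:
    """Build keyword arguments for a fetch call, with or without a range."""
    return {
        "reference": seqid,
        "start": optional_int(start),
        "end": optional_int(end),
    }


print(fetch_kwargs("glyma.Wm82.gnm2.scaffold_2709"))
print(fetch_kwargs("glyma.Wm82.gnm2.Gm01", "1", "100"))
```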

Labels
enhancement New feature or request
3 participants