Document viewer #486
base: main
Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Nick Byrne <[email protected]>
ragna/deploy/_api/core.py
with get_session() as session:
    _, metadata = database.get_document(session, user=user, id=id)
    if "path" not in metadata:
        raise HTTPException(
            status_code=400,
            detail="Document path not found",
        )
async with aiofiles.open(metadata["path"], "rb") as file:
    while content := await file.read(1024):
        yield content
Our documents have a .read() method that should be used here:
Lines 75 to 77 in 2066dcd
@abc.abstractmethod
def read(self) -> bytes: ...
It is not async for now, but there is no need to require documents to be on the local file system. We just need the conversion to the Document object:
ragna/ragna/deploy/_api/schemas.py
Lines 29 to 37 in 2066dcd
def to_core(self) -> ragna.core.Document:
    return ragna.core.LocalDocument(
        id=self.id,
        name=self.name,
        # TEMP: setting an empty metadata dict for now.
        # Will be resolved as part of the "managed ragna" work:
        # https://github.com/Quansight/ragna/issues/256
        metadata={},
    )
Maybe this is also a good time to resolve this TODO first?
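A minimal sketch of the suggestion above, assuming the endpoint only needs the bytes returned by Document.read(). The Document class here is a simplified stand-in for ragna.core.Document, and InMemoryDocument, stream_document, and collect_chunks are hypothetical names for illustration. Because read() is synchronous, the sketch offloads it with asyncio.to_thread (Python 3.9+) and then yields fixed-size chunks:

```python
import abc
import asyncio
from typing import AsyncIterator


class Document(abc.ABC):
    # Simplified stand-in for ragna.core.Document; only read() matters here.
    @abc.abstractmethod
    def read(self) -> bytes: ...


class InMemoryDocument(Document):
    # Hypothetical document that does not live on the local file system.
    def __init__(self, content: bytes) -> None:
        self._content = content

    def read(self) -> bytes:
        return self._content


async def stream_document(
    document: Document, chunk_size: int = 1024
) -> AsyncIterator[bytes]:
    # read() is synchronous, so run it in a worker thread to keep the event loop free.
    content = await asyncio.to_thread(document.read)
    for start in range(0, len(content), chunk_size):
        yield content[start : start + chunk_size]


async def collect_chunks(document: Document, chunk_size: int) -> list[bytes]:
    # Demonstration helper: drain the async generator into a list.
    return [chunk async for chunk in stream_document(document, chunk_size)]


if __name__ == "__main__":
    print(asyncio.run(collect_chunks(InMemoryDocument(b"hello world"), 4)))
```

In an endpoint, the generator returned by stream_document could be handed to FastAPI's StreamingResponse. Note that read() still loads the whole document into memory; chunking only affects how it is sent.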
2b38d59 to 8f7c25e
Co-authored-by: Philip Meier <[email protected]>
@pmeier I have two outstanding questions before being ready for a full review:
Here's a video of how it looks so far: Screencast.from.2024-08-26.11-33-04.mp4
Does the browser handle this automatically? Meaning, we just pass it a blob and the browser figures out whether it can show it (
FastAPI has a
Looks good. #466 also states that
Is that possible / planned?
I think so, as long as the right MIME type is set in the response's Content-Type header. I will confirm and let you know what I find out.
Okay, I'd be interested in seeing the level of effort involved in building our own. It seems like FileResponse is mostly just a wrapper around StreamingResponse that sets the right headers.
I need to think a bit more about how this could work. It might be best suited for a follow-up PR.
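To give a rough sense of the effort involved in building our own, here is a minimal sketch of the headers a hand-rolled streaming response might set. It assumes only Content-Type and Content-Disposition are needed. document_response_headers is a hypothetical helper, and its result could be passed as the headers argument of FastAPI's StreamingResponse. It also places the document's name, rather than an ID, in the download file name:

```python
from urllib.parse import quote


def document_response_headers(
    name: str, mime_type: str, inline: bool = True
) -> dict[str, str]:
    # "inline" asks the browser to display the file; "attachment" asks it to download.
    disposition = "inline" if inline else "attachment"
    quoted = quote(name)
    if quoted != name:
        # Names with spaces or non-ASCII characters use the RFC 5987 extended form.
        disposition += f"; filename*=utf-8''{quoted}"
    else:
        disposition += f'; filename="{name}"'
    return {"Content-Type": mime_type, "Content-Disposition": disposition}


if __name__ == "__main__":
    print(document_response_headers("report.pdf", "application/pdf"))
```

A full FileResponse replacement would set more than this, for example Content-Length and caching headers such as ETag and Last-Modified. The sketch deliberately leaves those out.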
68d7973 to 0d17d06
Co-authored-by: Philip Meier <[email protected]>
@pmeier This is ready for another look! This is what the source viewer looks like now, after putting the accordion widget back and adding a new button. I also added MIME types to our supported documents so the browser knows what to do with them when it receives the blob. In my browser, PDFs open in a new tab, text is displayed as HTML, and Word and PowerPoint files are downloaded. For sources with page numbers, there's now also an anchor that scrolls in-browser sources to the right page.
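Here is a minimal sketch of how such a page anchor could work. It assumes the anchor relies on the #page=N URL fragment, which common built-in browser PDF viewers honor. That is an assumption about the mechanism, not something confirmed in this PR. source_link and the /documents/1234/content path are hypothetical:

```python
from typing import Optional


def source_link(document_url: str, mime_type: str, page: Optional[int] = None) -> str:
    # Browser PDF viewers interpret "#page=N"; other types get the plain URL.
    if page is not None and mime_type == "application/pdf":
        return f"{document_url}#page={page}"
    return document_url


if __name__ == "__main__":
    print(source_link("/documents/1234/content", "application/pdf", page=3))
```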
This looks great, thanks Blake! One question though: when the browser downloads the file, I get something like "{uuid.UUID}.docx". Why isn't this the proper file name?
That's a typo, thanks for catching it!
@@ -24,6 +24,15 @@ class DocumentUploadParameters(BaseModel):
    data: dict


_MIME_TYPES = {
As discussed offline, let's use mimetypes from the standard library.
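Here is a minimal sketch of that suggestion: a standalone function named after the parse_mime_type method in the diff. Its body is an assumption, not the PR's implementation. Explicitly registering the Office types is also a hypothetical choice, meant to guard against Python installs or systems whose MIME tables lack them:

```python
import mimetypes

# Some Python versions and minimal systems may not know the Office formats,
# so register them explicitly (an assumption made for this sketch).
mimetypes.add_type(
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document", ".docx"
)
mimetypes.add_type(
    "application/vnd.openxmlformats-officedocument.presentationml.presentation", ".pptx"
)


def parse_mime_type(name: str) -> str:
    # guess_type() works from the file name alone; it never opens the file.
    mime_type, _ = mimetypes.guess_type(name)
    # Fall back to the generic binary type so browsers download unknown files.
    return mime_type or "application/octet-stream"


if __name__ == "__main__":
    print(parse_mime_type("report.pdf"))
```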
    ):
        self.id = id or uuid.uuid4()
        self.name = name
        self.metadata = metadata
        self.handler = handler or self.get_handler(name)
        self.mime_type = mime_type or self.parse_mime_type(name)
We need to store this in the DB as well. Otherwise any MIME type set by the user will be overridden as soon as we pull it from the DB, because the default would be used.
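Here is a minimal sketch of the fix. It uses the standard library's sqlite3 in place of whatever database layer ragna actually uses. The table layout, the simplified Document class, and the create_schema, save_document, and load_document helpers are all hypothetical. The idea is to persist the MIME type in its own column and pass it back to the constructor on load, so a user-supplied value is not replaced by the name-based guess:

```python
import mimetypes
import sqlite3
import uuid
from typing import Optional


class Document:
    # Simplified stand-in: without an explicit MIME type, guess one from the name.
    def __init__(
        self, name: str, mime_type: Optional[str] = None, id: Optional[uuid.UUID] = None
    ) -> None:
        self.id = id or uuid.uuid4()
        self.name = name
        self.mime_type = mime_type or (
            mimetypes.guess_type(name)[0] or "application/octet-stream"
        )


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table; the key point is the dedicated mime_type column.
    conn.execute(
        "CREATE TABLE documents "
        "(id TEXT PRIMARY KEY, name TEXT NOT NULL, mime_type TEXT NOT NULL)"
    )


def save_document(conn: sqlite3.Connection, document: Document) -> None:
    conn.execute(
        "INSERT INTO documents (id, name, mime_type) VALUES (?, ?, ?)",
        (str(document.id), document.name, document.mime_type),
    )


def load_document(conn: sqlite3.Connection, id: uuid.UUID) -> Document:
    name, mime_type = conn.execute(
        "SELECT name, mime_type FROM documents WHERE id = ?", (str(id),)
    ).fetchone()
    # Passing the stored value keeps the name-based default from overriding it.
    return Document(name, mime_type=mime_type, id=id)
```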
Closes #466