Retrieve an answer to a question given the context of a webpage or text, using a pretrained machine learning model that runs in the browser.
QuestionMark uses Hugging Face's Transformers.js under the hood.
The question-answering model `Xenova/distilbert-base-cased-distilled-squad` is used to retrieve the answer.
Loading models for the first time can take a while.
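As a rough sketch (not the project's actual code), creating such a pipeline with Transformers.js could look like the following; the `progress_callback` option is one way to surface loading progress while the model downloads:

```js
import { pipeline } from '@xenova/transformers';

// The first call downloads the model files; subsequent loads
// are normally served from the browser cache.
const qa = await pipeline(
  'question-answering',
  'Xenova/distilbert-base-cased-distilled-squad',
  // Report download/loading progress, since the first load can take a while
  { progress_callback: (p) => console.log(p.status, p.progress ?? '') }
);
```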
For example, asking a question to the author of a blog post:
questionmark-demo.mov
The subsequent retrieval of answers is much quicker:
questionmark-demo-2.mov
Note that it is an extractive question-answering model, so the answer is not generated but extracted from the given context. It therefore tends to do better with factoid questions than with open-ended ones. A confidence score for the predicted answer is shown alongside it.
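Continuing the sketch above (the question, context, and score here are made up for illustration), the pipeline's output pairs the extracted span with that confidence score:

```js
// Ask a question against some context (hypothetical values):
const output = await qa(
  'Who wrote the post?',
  'This post was written by Ada Lovelace in 1843.'
);
// output has the shape { answer: 'Ada Lovelace', score: 0.98 },
// where score is the model's confidence in the extracted span.
```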
The model is loaded and run directly in the browser, off the main thread, using a web worker.
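A minimal sketch of that pattern, assuming a hypothetical `worker.js` and message shape (the project's actual worker code may differ):

```js
// worker.js (hypothetical file name): runs off the main thread
import { pipeline } from '@xenova/transformers';

let qa = null;

self.onmessage = async ({ data: { question, context } }) => {
  // Lazily create the pipeline on the first message
  qa ??= await pipeline(
    'question-answering',
    'Xenova/distilbert-base-cased-distilled-squad'
  );
  // Post the { answer, score } result back to the main thread
  self.postMessage(await qa(question, context));
};

// main.js: spawn the worker and exchange messages with it
const worker = new Worker(new URL('./worker.js', import.meta.url), {
  type: 'module',
});
worker.onmessage = (e) => console.log(e.data);
worker.postMessage({
  question: 'Who wrote the post?',
  context: 'This post was written by Ada Lovelace in 1843.',
});
```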
> [!IMPORTANT]
> For now, the models have to be re-downloaded on each HMR reload instead of being loaded from the browser cache, because of an issue that seems to occur with bundlers; it may be fixed in Transformers.js v3.
From a given URL, the HTML string of the website is first sanitized using `isomorphic-dompurify`, then the text content is parsed with Mozilla's Readability.js. The result is the context for the question the user provides, and both are passed as arguments to the `QuestionAnsweringPipeline`.
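Under stated assumptions (a hypothetical `contextFromUrl` helper, and the `qa` pipeline from the earlier sketch), that flow could look roughly like this:

```js
import DOMPurify from 'isomorphic-dompurify';
import { Readability } from '@mozilla/readability';

// Hypothetical helper: turn a URL into clean, readable article text.
async function contextFromUrl(url) {
  const html = await (await fetch(url)).text();
  // 1. Sanitize the raw HTML string
  const clean = DOMPurify.sanitize(html);
  // 2. Parse it and extract the main text content with Readability
  const doc = new DOMParser().parseFromString(clean, 'text/html');
  return new Readability(doc).parse()?.textContent ?? '';
}

// The question and the extracted context both go to the pipeline:
const context = await contextFromUrl('https://example.com/blog-post');
const output = await qa('Who wrote this post?', context);
```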
A given text is first sanitized using `isomorphic-dompurify`; the sanitized text is the context for the question the user provides, and both are passed as arguments to the `QuestionAnsweringPipeline`.
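The same idea for pasted text, as a tiny sketch (`userText`, `question`, and `qa` are assumed names, not the project's own):

```js
import DOMPurify from 'isomorphic-dompurify';

// userText and question come from the user; qa is the pipeline from above
const context = DOMPurify.sanitize(userText);
const output = await qa(question, context);
```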
Clone the repository:

```sh
git clone git@github.com:rivea0/questionmark.git
```

`cd` into it:

```sh
cd questionmark
```

Install dependencies:

```sh
npm install
```

Run the server:

```sh
npm run dev
```
MIT