| title | emoji | colorFrom | colorTo | sdk | pinned | app_port |
|---|---|---|---|---|---|---|
| AskBookie API | 🔥 | blue | purple | docker | false | 7860 |
Production-grade retrieval-augmented generation service for document-based question answering. The system operates on pre-vectorised document clusters stored in Qdrant, with semantic retrieval feeding into instruction-tuned language model inference.
Base URL: https://pmmdot-askbookie.hf.space
Interactive Documentation: /docs (Swagger UI) | /redoc (ReDoc)
- Authentication
- Rate Limits
- Core Endpoints
- Frontend Integration Guide
- System Endpoints
- Admin Endpoints
- Error Handling
| Component | Technology |
|---|---|
| Web Framework | FastAPI |
| Vector Database | Qdrant Cloud |
| Embedding Model | HuggingFace gte-modernbert-base |
| Language Models | Gemini 3 Flash/Pro, GPT-4o-mini, Claude-3-Haiku |
| RAG Orchestration | LangChain |
| Metadata Storage | MongoDB Atlas |
| PDF Processing | PyPDFLoader |
| Model ID | Name | Description |
|---|---|---|
| 1 | Gemini-3-flash | Gemini Primary API Key |
| 2 | Gemini-3-flash (Back-up) | Gemini Secondary API Key |
| 3 | Gemini-3-Pro | Gemini Primary API Key |
| 4 | GPT-4o-mini | DuckDuckGo (Free) |
| 5 | Claude-3-Haiku | DuckDuckGo (Free) |
All endpoints except /health and / require HMAC-SHA256 request signing.
| Header | Description |
|---|---|
| `X-API-Key-Id` | Unique identifier for your API key |
| `X-API-Timestamp` | Current Unix timestamp (seconds) |
| `X-API-Signature` | HMAC-SHA256 signature of the request |
The signature message follows the format:
{timestamp}\n{HTTP_METHOD}\n{path}
JavaScript Implementation:
async function generateAuthHeaders(method, path, keyId, secret) {
  const timestamp = Math.floor(Date.now() / 1000).toString();
  const message = `${timestamp}\n${method.toUpperCase()}\n${path}`;
  const encoder = new TextEncoder();
  // Import the shared secret as an HMAC-SHA256 signing key
  const key = await crypto.subtle.importKey(
    'raw', encoder.encode(secret),
    { name: 'HMAC', hash: 'SHA-256' }, false, ['sign']
  );
  const sig = await crypto.subtle.sign('HMAC', key, encoder.encode(message));
  // Hex-encode the raw signature bytes
  const signature = Array.from(new Uint8Array(sig))
    .map(b => b.toString(16).padStart(2, '0')).join('');
  return {
    'X-API-Key-Id': keyId,
    'X-API-Timestamp': timestamp,
    'X-API-Signature': signature
  };
}
- Timestamp tolerance: 300 seconds (5 minutes)
- Failed auth lockout: 5 attempts per IP (5-minute window)
- Constant-time signature comparison (timing attack prevention)
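The signing rules above can be sketched as three small helpers. This is an illustration only, not the service's actual implementation: the names `buildSignatureMessage`, `isTimestampFresh`, and `constantTimeEqual` are hypothetical, and treating a timestamp exactly 300 seconds old as still valid is an assumption.

```typescript
// Hypothetical helpers illustrating the documented signing rules.

/** Builds the string that gets signed: "{timestamp}\n{HTTP_METHOD}\n{path}". */
function buildSignatureMessage(timestamp: string, method: string, path: string): string {
  return `${timestamp}\n${method.toUpperCase()}\n${path}`;
}

/** True when the timestamp (Unix seconds) lies within the tolerance of nowSeconds. */
function isTimestampFresh(timestamp: string, nowSeconds: number, toleranceSeconds = 300): boolean {
  const ts = Number(timestamp);
  if (!Number.isInteger(ts)) return false; // rejects "abc", "1.5", and similar
  // Assumption: the boundary (exactly 300 s) is inclusive.
  return Math.abs(nowSeconds - ts) <= toleranceSeconds;
}

/** Compares two hex signatures without returning early on the first mismatch. */
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}
```

A client can use `isTimestampFresh` to detect clock drift before sending a request that would otherwise fail with 401 and count toward the lockout.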
| Endpoint | Limit | Window |
|---|---|---|
| `/ask` | 30 requests | 60 seconds |
| `/upload` | 2 requests | 60 seconds |
| All other endpoints | 50 requests | 60 seconds |
Rate-limited responses (HTTP 429) include a `Retry-After: 60` header.
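A client can turn that header into a wait time before retrying. The sketch below is one possible approach; `retryDelayMs` is a hypothetical name, and falling back to 60 seconds when the header is missing or not a plain number is an assumption based on the 60-second window in the table.

```typescript
/**
 * Returns how long to wait (in ms) before retrying,
 * or null when the response is not a 429.
 */
function retryDelayMs(
  status: number,
  retryAfterHeader: string | null,
  fallbackSeconds = 60 // assumption: matches the documented 60-second window
): number | null {
  if (status !== 429) return null;
  const raw = (retryAfterHeader ?? "").trim();
  const seconds = raw === "" ? NaN : Number(raw);
  const valid = Number.isFinite(seconds) && seconds >= 0;
  return (valid ? seconds : fallbackSeconds) * 1000;
}
```

With `fetch`, the header value would come from `response.headers.get('Retry-After')`, which returns `null` when the header is absent.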
Query documents using semantic retrieval + LLM synthesis.
Important
Two query modes exist:
- Standard Mode: Query pre-indexed university materials using `subject` + `unit`
- Custom Upload Mode: Query user-uploaded PDFs using `cluster` (returned from `/upload`)
| Field | Type | Required | Constraints | Description |
|---|---|---|---|---|
| `query` | string | Yes | 1-1000 chars | Natural language question |
| `subject` | string | Conditional | 1-100 chars, alphanumeric + `_-` | Subject collection (e.g., `evs`, `physics`) |
| `unit` | integer | Conditional | 1-4 | Unit number within the subject |
| `cluster` | string | Conditional | max 100 chars | Temp cluster from `/upload` response |
| `context_limit` | integer | No | 1-20, default 5 | Number of context chunks |
Warning
Mutual Exclusivity: Either provide cluster OR provide BOTH subject AND unit. Never mix them.
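These constraints can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: `AskPayload`, `isIntInRange`, and `validateAskPayload` are hypothetical names, "alphanumeric" is read as ASCII letters and digits, and the server remains the authority on what is accepted.

```typescript
// Request body for POST /ask, using the documented field names.
interface AskPayload {
  query: string;
  subject?: string;
  unit?: number;
  cluster?: string;
  context_limit?: number;
}

const isIntInRange = (n: number, min: number, max: number): boolean =>
  Number.isInteger(n) && n >= min && n <= max;

/** Returns a list of problems; an empty list means the payload fits the documented constraints. */
function validateAskPayload(p: AskPayload): string[] {
  const errors: string[] = [];
  const { query, subject, unit, cluster, context_limit } = p;

  if (query.length < 1 || query.length > 1000) {
    errors.push("query must be 1-1000 chars");
  }

  if (cluster !== undefined && (subject !== undefined || unit !== undefined)) {
    // Mutual exclusivity: the two query modes must not be mixed.
    errors.push("provide either cluster or subject + unit, not both");
  } else if (cluster !== undefined) {
    if (cluster.length > 100) errors.push("cluster must be at most 100 chars");
  } else if (subject === undefined || unit === undefined) {
    errors.push("subject and unit are both required when cluster is absent");
  } else {
    // Assumption: "alphanumeric" means ASCII letters and digits.
    if (!/^[A-Za-z0-9_-]{1,100}$/.test(subject)) {
      errors.push("subject must be 1-100 chars of letters, digits, _ or -");
    }
    if (!isIntInRange(unit, 1, 4)) errors.push("unit must be an integer from 1 to 4");
  }

  if (context_limit !== undefined && !isIntInRange(context_limit, 1, 20)) {
    errors.push("context_limit must be an integer from 1 to 20");
  }
  return errors;
}
```

Returning a list rather than throwing on the first problem lets a form show every issue at once.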
Standard Mode request:
POST /ask HTTP/1.1
Content-Type: application/json

{
  "query": "What are the different types of ecosystems?",
  "subject": "evs",
  "unit": 2,
  "context_limit": 5
}
Custom Upload Mode request:
POST /ask HTTP/1.1
Content-Type: application/json

{
  "query": "Summarize the main findings",
  "cluster": "temp_a1b2c3d4e5f6g7h8i9j0k1l2"
}
Response:
{
  "answer": "Ecosystems are classified into terrestrial and aquatic...",
  "sources": [
    "evs_chapter3.pdf: Slide 12",
    "evs_chapter3.pdf: Slide 15"
  ],
  "collection": "askbookie_evs_unit-2",
  "request_id": "a1b2c3d4e5f6g7h8"
}
| Field | Description |
|---|---|
| `answer` | LLM-generated response (Markdown formatted, LaTeX supported) |
| `sources` | List of source references: `"filename: Slide N"` |
| `collection` | The Qdrant collection queried |
| `request_id` | Unique identifier for debugging |
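To show sources as structured citations, a frontend can split each `"filename: Slide N"` string into its parts. This is a sketch assuming every entry follows that exact shape; `SourceRef` and `parseSource` are hypothetical names, and entries in any other shape are returned as `null` rather than guessed at.

```typescript
interface SourceRef {
  filename: string;
  slide: number;
}

/** Parses "filename: Slide N"; returns null when the string does not match that shape. */
function parseSource(source: string): SourceRef | null {
  const marker = ": Slide ";
  // lastIndexOf tolerates filenames that themselves contain ": Slide ".
  const idx = source.lastIndexOf(marker);
  if (idx <= 0) return null;
  const slideText = source.slice(idx + marker.length);
  if (!/^\d+$/.test(slideText)) return null;
  return { filename: source.slice(0, idx), slide: Number(slideText) };
}
```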
Upload a PDF document for custom RAG queries. Processing is asynchronous.
POST /upload HTTP/1.1
Content-Type: multipart/form-data

file: [binary PDF data]
| Field | Type | Required | Constraints |
|---|---|---|---|
| `file` | binary | Yes | PDF only, max 10MB, must start with `%PDF` magic bytes |
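The size and magic-byte constraints can be pre-checked in the browser to avoid spending one of the two uploads allowed per minute on a file that will be rejected. This is a sketch, not the server's validation: `precheckPdf` and `MAX_PDF_BYTES` are hypothetical names, and reading "10MB" as 10 × 1024 × 1024 bytes is an assumption.

```typescript
// Assumption: "10MB" means 10 MiB; the server's exact limit may differ.
const MAX_PDF_BYTES = 10 * 1024 * 1024;

/** Returns null when the bytes pass the pre-checks, otherwise a reason string. */
function precheckPdf(bytes: Uint8Array): string | null {
  if (bytes.length > MAX_PDF_BYTES) return "file exceeds 10MB";
  const magic = [0x25, 0x50, 0x44, 0x46]; // ASCII "%PDF"
  if (bytes.length < magic.length) return "file does not start with %PDF";
  for (let i = 0; i < magic.length; i++) {
    if (bytes[i] !== magic[i]) return "file does not start with %PDF";
  }
  return null;
}
```

In a browser, the bytes could come from `new Uint8Array(await file.arrayBuffer())` for a `File` chosen by the user.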
{
  "job_id": "a1b2c3d4e5f6g7h8i9j0k1l2",
  "status": "queued",
  "filename": "my_notes.pdf",
  "size": 2457600,
  "temp_cluster": "temp_a1b2c3d4e5f6g7h8i9j0k1l2"
}
Important
Critical fields for frontend:
- `job_id`: Use this to poll `/jobs/{job_id}` for processing status
- `temp_cluster`: SAVE THIS! Use it in `/ask` requests to query this PDF
Processing pipeline:
- Validation: MIME type, magic bytes, size check
- Chunking: Split by page boundaries with context overlap
- Embedding: Vectorize using `gte-modernbert-base`
- Storage: Upsert to Qdrant under the `temp_cluster` collection
| Status | Description |
|---|---|
| `queued` | Accepted, awaiting processing |
| `processing` | Currently being chunked/embedded |
| `done` | Ready for queries |
| `failed` | Check `error` field for details |
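The four statuses map naturally onto what a polling client should do next. The sketch below keeps that decision in a pure function so it can be tested without a network. `Job`, `PollStep`, and `decidePollStep` are hypothetical names, and the attempt cap and 2.5-second delay are illustrative defaults, not service requirements.

```typescript
type JobStatus = "queued" | "processing" | "done" | "failed";

// Shape of a job record, using the documented field names.
interface Job {
  job_id: string;
  status: JobStatus;
  temp_cluster: string;
  filename: string;
  error: string | null;
}

type PollStep =
  | { kind: "wait"; delayMs: number }
  | { kind: "ready"; cluster: string }
  | { kind: "failed"; reason: string }
  | { kind: "timeout" };

/** Decides what the client should do after receiving a job record. */
function decidePollStep(job: Job, attempt: number, maxAttempts = 60, delayMs = 2500): PollStep {
  if (job.status === "done") {
    return { kind: "ready", cluster: job.temp_cluster };
  }
  if (job.status === "failed") {
    return { kind: "failed", reason: job.error ?? "unknown error" };
  }
  // "queued" or "processing": keep waiting until the attempt budget runs out.
  return attempt >= maxAttempts ? { kind: "timeout" } : { kind: "wait", delayMs };
}
```

Capping the number of attempts keeps a stuck job from polling forever and from eating into the 50 requests per minute shared by non-`/ask`, non-`/upload` endpoints.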
Poll the status of a PDF processing job with `GET /jobs/{job_id}`.
{
  "job_id": "a1b2c3d4e5f6g7h8i9j0k1l2",
  "status": "done",
  "temp_cluster": "temp_a1b2c3d4e5f6g7h8i9j0k1l2",
  "filename": "my_notes.pdf",
  "error": null
}
List all jobs for the authenticated API key.
Important
This section provides implementation guidance for frontend developers.
interface ChatSession {
  // User selection (standard mode)
  subject: string | null;      // e.g., "evs", "physics"
  unit: number | null;         // 1-4
  // Custom upload (custom mode)
  tempCluster: string | null;  // From /upload response
  uploadJobId: string | null;  // For status polling
  // Mode lock
  isCustomMode: boolean;       // Once PDF uploaded, lock to custom mode
}
┌─────────────────────────────────────────────────────────┐
│ User selects Subject: [EVS ▼] and Unit: [2 ▼] │
│ ───────────────────────────────────────────────────── │
│ User types: "What are ecosystem types?" │
│ │
│ → POST /ask { query, subject: "evs", unit: 2 } │
│ ← Response with answer + sources │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ Step 1: User uploads PDF │
│ ───────────────────────────────────────────────────── │
│ → POST /upload (multipart/form-data) │
│ ← { job_id, temp_cluster, status: "queued" } │
│ │
│ ⚠️ SAVE: temp_cluster = "temp_abc123..." │
│ ⚠️ LOCK: subject/unit dropdowns (disable them) │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Step 2: Poll for completion │
│ ───────────────────────────────────────────────────── │
│ Loop every 2-3 seconds: │
│ → GET /jobs/{job_id} │
│ ← { status: "processing" | "done" | "failed" } │
│ │
│ When status === "done": Enable chat input │
│ When status === "failed": Show error, unlock dropdowns │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Step 3: Query the uploaded PDF │
│ ───────────────────────────────────────────────────── │
│ User types: "Summarize the main points" │
│ │
│ → POST /ask { query, cluster: "temp_abc123..." } │
│ ← Response with answer + sources from their PDF │
│ │
│ ⚠️ Keep using the same temp_cluster for all queries │
│ in this chat session │
└─────────────────────────────────────────────────────────┘
// keyId and secret are your API credentials, assumed to be in scope.
// session, disableSubjectUnitSelectors and pollJobStatus are app-specific.

// When user uploads a PDF
async function handlePdfUpload(file: File) {
  const formData = new FormData();
  formData.append('file', file);
  const response = await fetch('/upload', {
    method: 'POST',
    // generateAuthHeaders is async: await it and pass the credentials
    headers: await generateAuthHeaders('POST', '/upload', keyId, secret),
    body: formData
  });
  const data = await response.json();
  // Errors return { "detail": "..." }; do not store fields from a failed upload
  if (!response.ok) throw new Error(data.detail);
  // CRITICAL: Store these values in session state
  session.tempCluster = data.temp_cluster;  // ← SAVE THIS
  session.uploadJobId = data.job_id;
  session.isCustomMode = true;              // ← LOCK MODE
  // Disable subject/unit dropdowns in UI
  disableSubjectUnitSelectors();
  // Start polling
  pollJobStatus(data.job_id);
}

// When sending a query
async function sendQuery(query: string) {
  let payload;
  if (session.isCustomMode && session.tempCluster) {
    // Custom mode: use cluster
    payload = {
      query: query,
      cluster: session.tempCluster  // ← USE STORED VALUE
    };
  } else {
    // Standard mode: use subject + unit
    payload = {
      query: query,
      subject: session.subject,
      unit: session.unit
    };
  }
  const response = await fetch('/ask', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      ...(await generateAuthHeaders('POST', '/ask', keyId, secret))
    },
    body: JSON.stringify(payload)
  });
  return await response.json();
}
| Scenario | Subject Dropdown | Unit Dropdown | Upload Button |
|---|---|---|---|
| Fresh chat session | ✅ Enabled | ✅ Enabled | ✅ Enabled |
| After selecting subject/unit | ✅ Enabled (can change) | ✅ Enabled | ✅ Enabled |
| After uploading PDF | ❌ Disabled | ❌ Disabled | ❌ Disabled |
| After upload fails | ✅ Re-enabled | ✅ Re-enabled | ✅ Re-enabled |
| New chat started | ✅ Enabled (reset) | ✅ Enabled | ✅ Enabled |
Caution
Once a user uploads a PDF in a chat session, ALL subsequent queries in that session MUST use the cluster parameter, not subject/unit. The temp_cluster is tied to their uploaded document.
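The locking rules in the table above can be modelled as a small state reducer. This is a sketch of one possible frontend design, not a required one: `SessionEvent`, `freshSession`, `reduceSession`, and `controlsEnabled` are hypothetical names, and keeping the previously selected subject and unit after a failed upload is an assumption.

```typescript
interface ChatSession {
  subject: string | null;
  unit: number | null;
  tempCluster: string | null;
  uploadJobId: string | null;
  isCustomMode: boolean;
}

type SessionEvent =
  | { type: "select"; subject: string; unit: number }
  | { type: "uploaded"; tempCluster: string; jobId: string }
  | { type: "uploadFailed" }
  | { type: "newChat" };

const freshSession: ChatSession = {
  subject: null, unit: null, tempCluster: null, uploadJobId: null, isCustomMode: false,
};

/** Returns the next session state; the input state is never mutated. */
function reduceSession(s: ChatSession, e: SessionEvent): ChatSession {
  if (e.type === "newChat") return { ...freshSession };
  if (e.type === "uploaded") {
    // Lock to custom mode and remember the cluster for all later queries.
    return { ...s, tempCluster: e.tempCluster, uploadJobId: e.jobId, isCustomMode: true };
  }
  if (e.type === "uploadFailed") {
    // Unlock; assumption: the earlier subject/unit selection is kept.
    return { ...s, tempCluster: null, uploadJobId: null, isCustomMode: false };
  }
  // "select": ignored in custom mode because the dropdowns are disabled there.
  return s.isCustomMode ? s : { ...s, subject: e.subject, unit: e.unit };
}

/** Subject dropdown, unit dropdown, and upload button share one enabled flag. */
function controlsEnabled(s: ChatSession): boolean {
  return !s.isCustomMode;
}
```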
| Subject | Units Available | Collection Pattern |
|---|---|---|
| `evs` | 1, 2, 3, 4 | `askbookie_evs_unit-{N}` |
| `physics` | 1, 2, 3, 4 | `askbookie_physics_unit-{N}` |
| other subjects | 1-4 | `askbookie_{subject}_unit-{N}` |
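The pattern is handy for checking that the `collection` field of an `/ask` response matches the selection the user made. A minimal sketch, with `collectionName` as a hypothetical helper; whether the service normalises case in `subject` is not documented, so the value is used as given.

```typescript
/** Builds the collection name from the documented pattern askbookie_{subject}_unit-{N}. */
function collectionName(subject: string, unit: number): string {
  if (!/^[A-Za-z0-9_-]{1,100}$/.test(subject)) {
    throw new RangeError("subject must be 1-100 chars of letters, digits, _ or -");
  }
  if (!Number.isInteger(unit) || unit < 1 || unit > 4) {
    throw new RangeError("unit must be an integer from 1 to 4");
  }
  return `askbookie_${subject}_unit-${unit}`;
}
```

For example, `collectionName("evs", 2)` yields `askbookie_evs_unit-2`, the value shown in the sample `/ask` response.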
Answers are returned in Markdown with LaTeX support:
- Inline math: `$E = mc^2$`
- Block math: `$$\int_0^1 x^2 dx$$`
Use a Markdown renderer with KaTeX/MathJax integration.
Service health check. No authentication required.
{
  "status": "healthy",
  "uptime_hours": 48.5,
  "current_model": {
    "model_id": 1,
    "name": "Gemini-3-flash",
    "description": "Gemini Primary API Key"
  }
}
Returns dashboard HTML or service metadata.
Note
All admin endpoints require the admin API key.
Paginated query history across all users.
List all API keys with status.
Re-enable a disabled key.
Disable an API key (cannot disable admin).
Get current active model.
{ "model_id": 2 }
Switch to a different LLM backend (1-5).
All errors return:
{ "detail": "Error description" }
| Code | Meaning |
|---|---|
| 400 | Bad Request - Missing/invalid parameters |
| 401 | Unauthorized - Invalid signature or expired key |
| 403 | Forbidden - Admin endpoint accessed with non-admin key |
| 404 | Not Found - Job doesn't exist or wrong owner |
| 413 | Payload Too Large - PDF > 10MB or JSON > 16KB |
| 429 | Rate Limited - See Retry-After header |
| 500 | Internal Error - RAG pipeline failure |
| Detail Message | Cause | Frontend Action |
|---|---|---|
| `"Rate limit exceeded"` | Too many requests | Wait 60s, show countdown |
| `"Too many concurrent uploads"` | 3+ uploads in progress | Wait for pending jobs |
| `"LLM quota exhausted"` | Model API limit hit | Retry in 1hr or notify user |
| `"Too many failed attempts"` | Auth lockout | Wait 5 minutes |
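The table above can be expressed as a lookup from an error response to a frontend action. This is a sketch under assumptions: `FrontendAction` and `actionForError` are hypothetical names, and matching `detail` by substring is a guess at how stable the server's messages are.

```typescript
type FrontendAction =
  | { kind: "retryAfter"; seconds: number; message: string }
  | { kind: "waitForJobs"; message: string }
  | { kind: "showError"; message: string };

/** Maps an error response (HTTP status + "detail" string) to a UI action. */
function actionForError(status: number, detail: string): FrontendAction {
  // Assumption: the server's detail text contains these phrases verbatim.
  if (detail.includes("Rate limit exceeded")) {
    return { kind: "retryAfter", seconds: 60, message: detail };
  }
  if (detail.includes("Too many failed attempts")) {
    return { kind: "retryAfter", seconds: 300, message: detail }; // 5-minute lockout
  }
  if (detail.includes("LLM quota exhausted")) {
    return { kind: "retryAfter", seconds: 3600, message: detail }; // retry in 1 hour
  }
  if (detail.includes("Too many concurrent uploads")) {
    return { kind: "waitForJobs", message: detail };
  }
  // Any other 429 still gets the documented 60-second window.
  if (status === 429) {
    return { kind: "retryAfter", seconds: 60, message: detail };
  }
  return { kind: "showError", message: detail };
}
```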
Standard Mode:
{
  "query": "Your question here",
  "subject": "evs",
  "unit": 2
}
Custom Upload Mode:
{
  "query": "Your question here",
  "cluster": "temp_a1b2c3d4e5f6g7h8i9j0k1l2"
}
❌ Invalid (mixing modes):
{
  "query": "question",
  "subject": "evs",
  "cluster": "temp_..."
}