---
title: AskBookie API
emoji: 🔥
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---

AskBookie API

Production-grade retrieval-augmented generation service for document-based question answering. The system operates on pre-vectorized document clusters stored in Qdrant, with semantic retrieval feeding into instruction-tuned language model inference.

Base URL: https://pmmdot-askbookie.hf.space
Interactive Documentation: /docs (Swagger UI) | /redoc (ReDoc)


Table of Contents

  1. Technical Stack
  2. Available Models
  3. Authentication
  4. Rate Limits
  5. Core Endpoints
  6. Frontend Integration Guide
  7. System Endpoints
  8. Admin Endpoints
  9. Error Handling

Technical Stack

| Component | Technology |
|---|---|
| Web Framework | FastAPI |
| Vector Database | Qdrant Cloud |
| Embedding Model | HuggingFace gte-modernbert-base |
| Language Models | Gemini 3 Flash/Pro, GPT-4o-mini, Claude-3-Haiku |
| RAG Orchestration | LangChain |
| Metadata Storage | MongoDB Atlas |
| PDF Processing | PyPDFLoader |

Available Models

| Model ID | Name | Description |
|---|---|---|
| 1 | Gemini-3-flash | Gemini Primary API Key |
| 2 | Gemini-3-flash (Back-up) | Gemini Secondary API Key |
| 3 | Gemini-3-Pro | Gemini Primary API Key |
| 4 | GPT-4o-mini | DuckDuckGo (Free) |
| 5 | Claude-3-Haiku | DuckDuckGo (Free) |

Authentication

All endpoints except /health and / require HMAC-SHA256 request signing.

Required Headers

| Header | Description |
|---|---|
| X-API-Key-Id | Unique identifier for your API key |
| X-API-Timestamp | Current Unix timestamp (seconds) |
| X-API-Signature | HMAC-SHA256 signature of the request |

Signature Construction

The signature message follows the format:

{timestamp}\n{HTTP_METHOD}\n{path}

JavaScript Implementation:

async function generateAuthHeaders(method, path, keyId, secret) {
    const timestamp = Math.floor(Date.now() / 1000).toString();
    // Message format: {timestamp}\n{HTTP_METHOD}\n{path}
    const message = `${timestamp}\n${method.toUpperCase()}\n${path}`;
    
    // Sign the message with HMAC-SHA256 via the Web Crypto API
    const encoder = new TextEncoder();
    const key = await crypto.subtle.importKey(
        'raw', encoder.encode(secret),
        { name: 'HMAC', hash: 'SHA-256' }, false, ['sign']
    );
    const sig = await crypto.subtle.sign('HMAC', key, encoder.encode(message));
    // Hex-encode the raw signature bytes
    const signature = Array.from(new Uint8Array(sig))
        .map(b => b.toString(16).padStart(2, '0')).join('');
    
    return {
        'X-API-Key-Id': keyId,
        'X-API-Timestamp': timestamp,
        'X-API-Signature': signature
    };
}
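
For example, signing a request to POST /ask (the key ID and secret below are placeholders for your own credentials):

const headers = await generateAuthHeaders('POST', '/ask', 'your-key-id', 'your-secret');
// → { 'X-API-Key-Id': ..., 'X-API-Timestamp': ..., 'X-API-Signature': ... }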

Security Constraints

  • Timestamp tolerance: 300 seconds (5 minutes)
  • Failed auth lockout: 5 attempts per IP (5-minute window)
  • Constant-time signature comparison (timing attack prevention)

Rate Limits

| Endpoint | Limit | Window |
|---|---|---|
| /ask | 30 requests | 60 seconds |
| /upload | 2 requests | 60 seconds |
| All other endpoints | 50 requests | 60 seconds |

When rate limited, responses include a Retry-After: 60 header.
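
On the client, one way to respect this is a thin wrapper that waits for the Retry-After delay before retrying once; the helper below is a sketch, not part of the API:

// Sketch: retry a rate-limited request once after the server-indicated delay.
async function fetchWithRetry(input: RequestInfo, init?: RequestInit): Promise<Response> {
    const response = await fetch(input, init);
    if (response.status !== 429) return response;

    // Fall back to 60 seconds if the header is missing or unparsable.
    const retryAfter = parseInt(response.headers.get('Retry-After') ?? '60', 10) || 60;
    await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
    return fetch(input, init);
}

A 60-second wait keeps a signed timestamp well within the 300-second tolerance, so the same headers can be reused on the retry.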


Core Endpoints

POST /ask

Query documents using semantic retrieval + LLM synthesis.

Important

Two query modes exist:

  1. Standard Mode: Query pre-indexed university materials using subject + unit
  2. Custom Upload Mode: Query user-uploaded PDFs using cluster (returned from /upload)

Request Schema

| Field | Type | Required | Constraints | Description |
|---|---|---|---|---|
| query | string | Yes | 1-1000 chars | Natural language question |
| subject | string | Conditional | 1-100 chars, alphanumeric + _- | Subject collection (e.g., evs, physics) |
| unit | integer | Conditional | 1-4 | Unit number within the subject |
| cluster | string | Conditional | max 100 chars | Temp cluster from /upload response |
| context_limit | integer | No | 1-20, default 5 | Number of context chunks |

Warning

Mutual Exclusivity: Either provide cluster OR provide BOTH subject AND unit. Never mix them.
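
One way to enforce this on the client is to build the /ask request body through a single helper that rejects mixed input; the function below is a sketch, not part of the API:

// Sketch: build an /ask request body, refusing mixed modes (helper name is illustrative).
function buildAskPayload(query: string, opts: { subject?: string; unit?: number; cluster?: string }) {
    const hasCluster = Boolean(opts.cluster);
    const hasSubjectUnit = Boolean(opts.subject) && Number.isInteger(opts.unit);

    if (hasCluster && (opts.subject !== undefined || opts.unit !== undefined)) {
        throw new Error('Provide either cluster OR subject + unit, never both');
    }
    if (!hasCluster && !hasSubjectUnit) {
        throw new Error('Provide cluster, or both subject and unit');
    }
    return hasCluster
        ? { query, cluster: opts.cluster }
        : { query, subject: opts.subject, unit: opts.unit };
}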

Example 1: Standard Mode (Pre-indexed Materials)

POST /ask HTTP/1.1
Content-Type: application/json

{
    "query": "What are the different types of ecosystems?",
    "subject": "evs",
    "unit": 2,
    "context_limit": 5
}

Example 2: Custom Upload Mode (User PDF)

POST /ask HTTP/1.1
Content-Type: application/json

{
    "query": "Summarize the main findings",
    "cluster": "temp_a1b2c3d4e5f6g7h8i9j0k1l2"
}

Response (200 OK)

{
    "answer": "Ecosystems are classified into terrestrial and aquatic...",
    "sources": [
        "evs_chapter3.pdf: Slide 12",
        "evs_chapter3.pdf: Slide 15"
    ],
    "collection": "askbookie_evs_unit-2",
    "request_id": "a1b2c3d4e5f6g7h8"
}

| Field | Description |
|---|---|
| answer | LLM-generated response (Markdown formatted, LaTeX supported) |
| sources | List of source references: "filename: Slide N" |
| collection | The Qdrant collection queried |
| request_id | Unique identifier for debugging |

POST /upload

Upload a PDF document for custom RAG queries. Processing is asynchronous.

Request

POST /upload HTTP/1.1
Content-Type: multipart/form-data

file: [binary PDF data]

| Field | Type | Required | Constraints |
|---|---|---|---|
| file | binary | Yes | PDF only, max 10MB, must start with %PDF magic bytes |

Response (200 OK)

{
    "job_id": "a1b2c3d4e5f6g7h8i9j0k1l2",
    "status": "queued",
    "filename": "my_notes.pdf",
    "size": 2457600,
    "temp_cluster": "temp_a1b2c3d4e5f6g7h8i9j0k1l2"
}

Important

Critical fields for frontend:

  • job_id: Use this to poll /jobs/{job_id} for processing status
  • temp_cluster: SAVE THIS! Use it in /ask requests to query this PDF

Processing Pipeline

  1. Validation: MIME type, magic bytes, size check
  2. Chunking: Split by page boundaries with context overlap
  3. Embedding: Vectorize using gte-modernbert-base
  4. Storage: Upsert to Qdrant under temp_cluster collection

Job Status Values

| Status | Description |
|---|---|
| queued | Accepted, awaiting processing |
| processing | Currently being chunked/embedded |
| done | Ready for queries |
| failed | Check error field for details |

GET /jobs/{job_id}

Poll the status of a PDF processing job.

{
    "job_id": "a1b2c3d4e5f6g7h8i9j0k1l2",
    "status": "done",
    "temp_cluster": "temp_a1b2c3d4e5f6g7h8i9j0k1l2",
    "filename": "my_notes.pdf",
    "error": null
}

GET /jobs

List all jobs for the authenticated API key.


Frontend Integration Guide

Important

This section provides implementation guidance for frontend developers.

Chat Session State Model

interface ChatSession {
    // User selection (standard mode)
    subject: string | null;      // e.g., "evs", "physics"
    unit: number | null;         // 1-4
    
    // Custom upload (custom mode)  
    tempCluster: string | null;  // From /upload response
    uploadJobId: string | null;  // For status polling
    
    // Mode lock
    isCustomMode: boolean;       // Once PDF uploaded, lock to custom mode
}
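
A new session starts with everything unset and custom mode off; the factory below is a sketch of that initial state (the function name is not part of any API):

// Sketch: initial state for a fresh chat session (also usable to reset after "New chat").
function newChatSession(): ChatSession {
    return {
        subject: null,
        unit: null,
        tempCluster: null,
        uploadJobId: null,
        isCustomMode: false
    };
}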

Flow 1: Standard Query (Pre-indexed Materials)

┌─────────────────────────────────────────────────────────┐
│  User selects Subject: [EVS ▼] and Unit: [2 ▼]          │
│  ─────────────────────────────────────────────────────  │
│  User types: "What are ecosystem types?"                │
│                                                         │
│  → POST /ask { query, subject: "evs", unit: 2 }         │
│  ← Response with answer + sources                       │
└─────────────────────────────────────────────────────────┘

Flow 2: Custom PDF Upload

┌─────────────────────────────────────────────────────────┐
│  Step 1: User uploads PDF                               │
│  ─────────────────────────────────────────────────────  │
│  → POST /upload (multipart/form-data)                   │
│  ← { job_id, temp_cluster, status: "queued" }           │
│                                                         │
│  ⚠️  SAVE: temp_cluster = "temp_abc123..."              │
│  ⚠️  LOCK: subject/unit dropdowns (disable them)        │
└─────────────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────────────┐
│  Step 2: Poll for completion                            │
│  ─────────────────────────────────────────────────────  │
│  Loop every 2-3 seconds:                                │
│  → GET /jobs/{job_id}                                   │
│  ← { status: "processing" | "done" | "failed" }         │
│                                                         │
│  When status === "done": Enable chat input              │
│  When status === "failed": Show error, unlock dropdowns │
└─────────────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────────────┐
│  Step 3: Query the uploaded PDF                         │
│  ─────────────────────────────────────────────────────  │
│  User types: "Summarize the main points"                │
│                                                         │
│  → POST /ask { query, cluster: "temp_abc123..." }       │
│  ← Response with answer + sources from their PDF        │
│                                                         │
│  ⚠️  Keep using the same temp_cluster for all queries   │
│      in this chat session                               │
└─────────────────────────────────────────────────────────┘

UI State Logic

// When user uploads a PDF
async function handlePdfUpload(file: File) {
    const formData = new FormData();
    formData.append('file', file);
    
    // KEY_ID and SECRET are placeholder names for your API key configuration
    const response = await fetch('/upload', {
        method: 'POST',
        headers: await generateAuthHeaders('POST', '/upload', KEY_ID, SECRET),
        body: formData
    });
    const data = await response.json();
    
    // CRITICAL: Store these values in session state
    session.tempCluster = data.temp_cluster;  // ← SAVE THIS
    session.uploadJobId = data.job_id;
    session.isCustomMode = true;              // ← LOCK MODE
    
    // Disable subject/unit dropdowns in UI
    disableSubjectUnitSelectors();
    
    // Start polling
    pollJobStatus(data.job_id);
}

// When sending a query
async function sendQuery(query: string) {
    let payload;
    
    if (session.isCustomMode && session.tempCluster) {
        // Custom mode: use cluster
        payload = {
            query: query,
            cluster: session.tempCluster  // ← USE STORED VALUE
        };
    } else {
        // Standard mode: use subject + unit
        payload = {
            query: query,
            subject: session.subject,
            unit: session.unit
        };
    }
    
    const response = await fetch('/ask', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            // Spread the resolved signed headers (KEY_ID / SECRET as above)
            ...(await generateAuthHeaders('POST', '/ask', KEY_ID, SECRET))
        },
        body: JSON.stringify(payload)
    });
    
    return await response.json();
}
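
The pollJobStatus helper referenced above is not defined by the API; a minimal sketch, assuming the signing helper from the Authentication section, a 2-3 second interval, and placeholder UI functions (enableChatInput, showError, enableSubjectUnitSelectors), could look like this:

// Sketch: poll GET /jobs/{job_id} until the uploaded PDF is ready or has failed.
async function pollJobStatus(jobId: string) {
    while (true) {
        const response = await fetch(`/jobs/${jobId}`, {
            headers: await generateAuthHeaders('GET', `/jobs/${jobId}`, KEY_ID, SECRET)
        });
        const job = await response.json();

        if (job.status === 'done') {
            enableChatInput();               // ready to query via cluster
            return;
        }
        if (job.status === 'failed') {
            showError(job.error);            // surface the error field
            enableSubjectUnitSelectors();    // unlock standard mode again
            session.isCustomMode = false;
            return;
        }
        // Still "queued" or "processing": wait and try again.
        await new Promise(resolve => setTimeout(resolve, 2500));
    }
}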

Subject/Unit Locking Rules

| Scenario | Subject Dropdown | Unit Dropdown | Upload Button |
|---|---|---|---|
| Fresh chat session | ✅ Enabled | ✅ Enabled | ✅ Enabled |
| After selecting subject/unit | ✅ Enabled (can change) | ✅ Enabled | ✅ Enabled |
| After uploading PDF | ❌ Disabled | ❌ Disabled | ❌ Disabled |
| After upload fails | ✅ Re-enabled | ✅ Re-enabled | ✅ Re-enabled |
| New chat started | ✅ Enabled (reset) | ✅ Enabled | ✅ Enabled |

Caution

Once a user uploads a PDF in a chat session, ALL subsequent queries in that session MUST use the cluster parameter, not subject/unit. The temp_cluster is tied to their uploaded document.

Available Subjects & Units

| Subject | Units Available | Collection Pattern |
|---|---|---|
| evs | 1, 2, 3, 4 | askbookie_evs_unit-{N} |
| physics | 1, 2, 3, 4 | askbookie_physics_unit-{N} |
| other subjects | 1-4 | askbookie_{subject}_unit-{N} |

Answer Formatting

Answers are returned in Markdown with LaTeX support:

  • Inline math: $E = mc^2$
  • Block math: $$\int_0^1 x^2 dx$$

Use a Markdown renderer with KaTeX/MathJax integration.
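
For example, in a React frontend the answer field can be rendered with react-markdown plus remark-math and rehype-katex; this is one common setup, not a requirement of the API:

import ReactMarkdown from 'react-markdown';
import remarkMath from 'remark-math';
import rehypeKatex from 'rehype-katex';
import 'katex/dist/katex.min.css';

// Renders the `answer` field from /ask, including $...$ and $$...$$ math.
function AnswerView({ answer }: { answer: string }) {
    return (
        <ReactMarkdown remarkPlugins={[remarkMath]} rehypePlugins={[rehypeKatex]}>
            {answer}
        </ReactMarkdown>
    );
}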


System Endpoints

GET /health

Service health check. No authentication required.

{
    "status": "healthy",
    "uptime_hours": 48.5,
    "current_model": {
        "model_id": 1,
        "name": "Gemini-3-flash",
        "description": "Gemini Primary API Key"
    }
}
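
Because /health is unauthenticated, a plain fetch is enough, e.g. to display the active model in a status badge:

// No auth headers required for /health.
const health = await (await fetch('https://pmmdot-askbookie.hf.space/health')).json();
console.log(health.status, health.current_model.name);  // e.g. "healthy" "Gemini-3-flash"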

GET /

Returns dashboard HTML or service metadata.


Admin Endpoints

Note

All admin endpoints require the admin API key.

GET /history

Paginated query history across all users.

GET /admin/keys

List all API keys with status.

POST /admin/keys/{key_id}/enable

Re-enable a disabled key.

POST /admin/keys/{key_id}/disable

Disable an API key (cannot disable admin).

GET /admin/models/current

Get current active model.

POST /admin/models/switch

{ "model_id": 2 }

Switch to a different LLM backend (1-5).
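
For example, switching to model 2 with the admin credentials (ADMIN_KEY_ID and ADMIN_SECRET are placeholders):

// Admin-only: switch the active LLM backend (model_id 1-5).
const response = await fetch('/admin/models/switch', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        ...(await generateAuthHeaders('POST', '/admin/models/switch', ADMIN_KEY_ID, ADMIN_SECRET))
    },
    body: JSON.stringify({ model_id: 2 })
});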


Error Handling

All errors return:

{ "detail": "Error description" }

| Code | Meaning |
|---|---|
| 400 | Bad Request - Missing/invalid parameters |
| 401 | Unauthorized - Invalid signature or expired key |
| 403 | Forbidden - Admin endpoint accessed with non-admin key |
| 404 | Not Found - Job doesn't exist or wrong owner |
| 413 | Payload Too Large - PDF > 10MB or JSON > 16KB |
| 429 | Rate Limited - See Retry-After header |
| 500 | Internal Error - RAG pipeline failure |

Special 429 Cases

| Detail Message | Cause | Frontend Action |
|---|---|---|
| "Rate limit exceeded" | Too many requests | Wait 60s, show countdown |
| "Too many concurrent uploads" | 3+ uploads in progress | Wait for pending jobs |
| "LLM quota exhausted" | Model API limit hit | Retry in 1hr or notify user |
| "Too many failed attempts" | Auth lockout | Wait 5 minutes |

Quick Reference: /ask Request Bodies

Standard Mode:

{
    "query": "Your question here",
    "subject": "evs",
    "unit": 2
}

Custom Upload Mode:

{
    "query": "Your question here", 
    "cluster": "temp_a1b2c3d4e5f6g7h8i9j0k1l2"
}

❌ Invalid (mixing modes):

{
    "query": "question",
    "subject": "evs",
    "cluster": "temp_..."
}
