API Reference

Complete API documentation for integrating with SageAI. Use it to build custom interfaces, write automation scripts, or connect SageAI to your existing research tools.

Quick Facts

Protocol: REST API over HTTP
Format: JSON request/response
Default Port: 8000
API Version: v1

Authentication

Current Version: SageAI runs locally and does not require authentication, so every endpoint is accessible without credentials. For production deployments, consider adding authentication via a reverse proxy.

Base URL

All API endpoints are prefixed with:

http://localhost:8000/api/v1

💡 If you change the port in Docker Compose, update the base URL accordingly.

Query Papers

POST /api/v1/query

Submit a natural language question and receive an AI-generated answer with citations from your uploaded papers.

Request Body

{
  "question": "What methodology was used in the transformer paper?",
  "top_k": 5,
  "paper_ids": [1, 3]
}

Parameters

Parameter	Type	Description
`question` *required	string	The question to ask about your papers
`top_k`	integer	Number of relevant chunks to retrieve (1-10, default: 5)
`paper_ids`	array	Limit search to specific paper IDs (omit for all papers)
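
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of SageAI: `build_query_payload` is a hypothetical helper, and rejecting an empty question is an assumption about what the server would consider invalid.

```python
def build_query_payload(question, top_k=5, paper_ids=None):
    """Build the JSON body for POST /api/v1/query (hypothetical helper)."""
    # Assumption: an empty or whitespace-only question is treated as missing.
    if not isinstance(question, str) or not question.strip():
        raise ValueError("question is required and must be a non-empty string")
    # The documented range for top_k is 1-10; bool is excluded because it is an int subclass.
    if isinstance(top_k, bool) or not isinstance(top_k, int) or not 1 <= top_k <= 10:
        raise ValueError("top_k must be an integer between 1 and 10")
    payload = {"question": question, "top_k": top_k}
    # Omitting paper_ids searches all papers, so only include it when given.
    if paper_ids is not None:
        payload["paper_ids"] = list(paper_ids)
    return payload
```

The returned dictionary can be passed as the `json=` argument of `requests.post`.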

Response

{
  "answer": "The transformer paper uses a self-attention mechanism...",
  "citations": [
    {
      "paper_title": "Attention is All You Need",
      "section": "Methodology",
      "page": 3,
      "relevance_score": 0.89,
      "chunk_text": "..."
    }
  ],
  "sources_used": ["paper3_nlp_transformers.pdf"],
  "confidence": 0.85,
  "query_time_ms": 1250,
  "cached": false
}
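
As an illustration of consuming this response, the sketch below orders the citations by `relevance_score` and renders each as one line of text. `format_citations` is a hypothetical helper, and the output format is an arbitrary choice rather than anything SageAI prescribes.

```python
def format_citations(response):
    """Return one display string per citation, most relevant first."""
    citations = sorted(
        response.get("citations", []),
        key=lambda c: c["relevance_score"],
        reverse=True,
    )
    return [
        f'{c["paper_title"]}, {c["section"]}, p. {c["page"]} '
        f'(score {c["relevance_score"]:.2f})'
        for c in citations
    ]
```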

List All Papers

GET /api/v1/papers

Retrieve a list of all uploaded papers with metadata and processing status.

Response

{
  "papers": [
    {
      "id": 1,
      "title": "Attention is All You Need",
      "filename": "transformer_paper.pdf",
      "status": "indexed",
      "chunk_count": 42,
      "vector_count": 42,
      "upload_date": "2025-10-31T10:30:00Z",
      "indexed_date": "2025-10-31T10:31:15Z",
      "file_size_mb": 2.4
    }
  ],
  "total": 1
}
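
A common use of this listing is to pick the papers that are ready to be queried. The sketch below assumes that `"indexed"` is the status of a fully processed paper, as in the example above; `indexed_paper_ids` is a hypothetical helper.

```python
def indexed_paper_ids(listing):
    """Return the IDs of papers whose status is "indexed"."""
    return [p["id"] for p in listing.get("papers", []) if p.get("status") == "indexed"]
```

The result can be used as the `paper_ids` value of a query request.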

Upload Paper

POST /api/v1/papers/upload

Upload a PDF research paper. The embedder service will extract text, chunk it, and create vector embeddings for semantic search.

Request

Content-Type: multipart/form-data
Form Field: file (PDF file)

Example (cURL)

curl -X POST http://localhost:8000/api/v1/papers/upload \
  -F "file=@/path/to/paper.pdf"

Response

{
  "paper_id": 5,
  "title": "BERT: Pre-training of Deep Bidirectional Transformers",
  "filename": "bert_paper.pdf",
  "status": "processing",
  "message": "Paper uploaded successfully. Processing in background."
}
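
Because processing happens in the background, a client usually has to wait before the new paper can be queried. The sketch below polls until the status becomes `"indexed"`. It assumes the status can be read from an endpoint such as `GET /api/v1/papers/:id/stats` while the paper is still processing; `wait_until_indexed` and its `fetch_stats` callback are hypothetical names, and the timeout and interval defaults are arbitrary.

```python
import time


def wait_until_indexed(paper_id, fetch_stats, timeout_s=300.0, interval_s=2.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_stats(paper_id) until its "status" is "indexed".

    fetch_stats is any callable that returns the paper's stats as a dict.
    Raises TimeoutError if the paper is not indexed within timeout_s seconds.
    """
    deadline = clock() + timeout_s
    while True:
        stats = fetch_stats(paper_id)
        if stats["status"] == "indexed":
            return stats
        if clock() >= deadline:
            raise TimeoutError(
                f"paper {paper_id} still {stats['status']!r} after {timeout_s}s"
            )
        sleep(interval_s)
```

With the `requests` library, `fetch_stats` could be `lambda pid: requests.get(f"{BASE_URL}/papers/{pid}/stats").json()`.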

Delete Paper

DELETE /api/v1/papers/:id

Remove a paper and all associated data (chunks, vectors, metadata).

Response

{
  "message": "Paper deleted successfully",
  "paper_id": 5
}
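
A minimal sketch of calling this endpoint with the Python standard library. The base URL is the default documented above; `build_delete_request` and `delete_paper` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"


def build_delete_request(paper_id, base_url=BASE_URL):
    """Create (but do not send) a DELETE request for one paper."""
    return urllib.request.Request(f"{base_url}/papers/{int(paper_id)}", method="DELETE")


def delete_paper(paper_id, base_url=BASE_URL):
    """Send the DELETE request and return the parsed JSON response."""
    with urllib.request.urlopen(build_delete_request(paper_id, base_url)) as resp:
        return json.load(resp)
```

Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, such as a 404 for an unknown paper ID.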

Paper Statistics

GET /api/v1/papers/:id/stats

Get detailed statistics for a specific paper.

Response

{
  "paper_id": 1,
  "title": "Attention is All You Need",
  "filename": "transformer_paper.pdf",
  "status": "indexed",
  "chunk_count": 42,
  "vector_count": 42,
  "upload_date": "2025-10-31T10:30:00Z",
  "indexed_date": "2025-10-31T10:31:15Z",
  "processing_time_ms": 75000,
  "file_size_mb": 2.4,
  "sections": ["Abstract", "Introduction", "Methods", "Results", "Conclusion"],
  "query_count": 87
}

`query_count` is the number of times this paper was referenced.

Query History

GET /api/v1/queries/history?limit=20&offset=0&include_answer=true

Retrieve paginated query history with optional full answers.

Query Parameters

Parameter	Type	Description
`limit`	integer	Results per page (default: 20)
`offset`	integer	Skip N results (default: 0)
`include_answer`	boolean	Include full answer text (default: false)

Response

{
  "queries": [
    {
      "query_id": "q_123",
      "question": "What is self-attention?",
      "timestamp": "2025-10-31T14:22:00Z",
      "confidence": 0.92,
      "rating": 5,
      "query_time_ms": 1200,
      "answer": "Self-attention is..."
    }
  ],
  "total": 150,
  "limit": 20,
  "offset": 0
}
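
The `limit`, `offset`, and `total` fields are enough to walk the full history page by page. The sketch below shows one way to do that; `iter_history` and its `fetch_page` callback are hypothetical names, and the callback is expected to perform the actual HTTP request.

```python
def iter_history(fetch_page, limit=20):
    """Yield every history entry, requesting one page at a time.

    fetch_page(limit=..., offset=...) must return a dict shaped like the
    response above, with "queries" and "total" keys.
    """
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        queries = page["queries"]
        yield from queries
        offset += len(queries)
        # Stop on an empty page or once every entry has been seen.
        if not queries or offset >= page["total"]:
            break
```

With the `requests` library, `fetch_page` could be `lambda limit, offset: requests.get(f"{BASE_URL}/queries/history", params={"limit": limit, "offset": offset}).json()`.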

Rate Answer

PATCH /api/v1/queries/:id/rating

Submit a user rating (1-5 stars) for a query response.

Request Body

{
  "rating": 4
}

Response

{
  "message": "Rating saved successfully",
  "query_id": "q_123",
  "rating": 4
}
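
A minimal sketch of building this request with the Python standard library, including a client-side check of the 1-5 range. `build_rating_request` is a hypothetical helper, and the base URL is the documented default.

```python
import json
import urllib.request


def build_rating_request(query_id, rating, base_url="http://localhost:8000/api/v1"):
    """Create (but do not send) a PATCH request that rates a query response."""
    # bool is excluded explicitly because it is a subclass of int.
    if isinstance(rating, bool) or not isinstance(rating, int) or not 1 <= rating <= 5:
        raise ValueError("rating must be an integer between 1 and 5")
    body = json.dumps({"rating": rating}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/queries/{query_id}/rating",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request.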

Popular Analytics

GET /api/v1/analytics/popular

Get aggregated data on the most frequently asked questions and the most referenced papers.

Response

{
  "top_questions": [
    { "question": "What is self-attention?", "count": 15 },
    { "question": "How does BERT work?", "count": 12 }
  ],
  "top_papers": [
    { "paper_id": 1, "title": "Attention is All You Need", "reference_count": 45 },
    { "paper_id": 3, "title": "BERT Paper", "reference_count": 38 }
  ]
}

Health Checks

Liveness Check

GET /api/v1/health/healthz

Returns 200 if the API server is alive.

{ "status": "ok" }

Readiness Check

GET /api/v1/health/readyz

Returns 200 if the API is ready to serve requests. The check covers all dependencies: MongoDB, Redis, Qdrant, and Ollama.

{
  "status": "ready",
  "dependencies": {
    "mongo": { "status": "healthy", "response_time_ms": 5 },
    "redis": { "status": "healthy", "response_time_ms": 2 },
    "qdrant": { "status": "healthy", "response_time_ms": 10 },
    "ollama": { "status": "healthy", "response_time_ms": 50 }
  }
}
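
When the readiness check fails, the `dependencies` map can be used to find out which service is responsible. The sketch below assumes that a failing dependency reports a `status` other than `"healthy"`; the exact value is not documented here, and `unhealthy_dependencies` is a hypothetical helper.

```python
def unhealthy_dependencies(readiness):
    """Return the sorted names of dependencies whose status is not "healthy"."""
    return sorted(
        name
        for name, info in readiness.get("dependencies", {}).items()
        if info.get("status") != "healthy"
    )
```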

Error Codes

Code	Status	Description
`400`	Bad Request	Invalid parameters or malformed request
`404`	Not Found	Paper or query ID doesn't exist
`413`	Payload Too Large	PDF file exceeds size limit
`429`	Too Many Requests	Rate limit exceeded
`500`	Internal Server Error	Unexpected failure on the server
`503`	Service Unavailable	Dependent service (Ollama, Qdrant, etc.) is down

Error Response Format

{
  "error": "Invalid top_k value",
  "message": "top_k must be between 1 and 10",
  "code": 400
}
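
Since every error shares this shape, a client can convert it into a single exception type. The following is a sketch only: `SageAPIError` and `raise_for_error` are hypothetical names, and the fallbacks for missing fields are assumptions.

```python
class SageAPIError(Exception):
    """Hypothetical exception wrapping the documented error body."""

    def __init__(self, code, error, message):
        super().__init__(f"{code} {error}: {message}")
        self.code = code
        self.error = error
        self.message = message


def raise_for_error(status_code, body):
    """Return body unchanged for success; raise SageAPIError for 4xx and 5xx."""
    if status_code >= 400:
        raise SageAPIError(
            body.get("code", status_code),          # fall back to the HTTP status
            body.get("error", "Unknown error"),     # assumed fallback text
            body.get("message", ""),
        )
    return body
```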

Rate Limits

Default Limits

Window: 60 seconds
Max Requests: 120 per window
Configurable: Via RATE_LIMIT_* environment variables

When the rate limit is exceeded, the API responds with:

{
  "error": "Rate limit exceeded",
  "message": "Too many requests. Please try again in 30 seconds.",
  "code": 429,
  "retry_after": 30
}
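
A client can use `retry_after` to wait before trying again. The sketch below retries a request a bounded number of times. `call_with_retry` and its `send` callback are hypothetical names, and the one-second fallback used when `retry_after` is missing is an assumption.

```python
import time


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send() must return a (status_code, body_dict) tuple.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts:
            # Wait for the interval the server asked for before retrying.
            sleep(body.get("retry_after", 1))
    # Still rate limited after the last attempt: hand the 429 back to the caller.
    return status, body
```

With the `requests` library, `send` could wrap a call such as `requests.post(...)` and return `(response.status_code, response.json())`.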

Code Examples

JavaScript (Fetch)

// Query papers
async function queryPapers(question, topK = 5) {
  const response = await fetch('http://localhost:8000/api/v1/query', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question, top_k: topK })
  });
  
  if (!response.ok) {
    throw new Error(`HTTP error! status: ${response.status}`);
  }
  
  const data = await response.json();
  console.log('Answer:', data.answer);
  console.log('Confidence:', data.confidence);
  return data;
}

// Usage
queryPapers('What is self-attention?')
  .then(result => console.log(result))
  .catch(error => console.error('Error:', error));

Python (Requests)

import requests

BASE_URL = "http://localhost:8000/api/v1"

# Query papers
def query_papers(question, top_k=5, paper_ids=None):
    payload = {
        "question": question,
        "top_k": top_k
    }
    if paper_ids:
        payload["paper_ids"] = paper_ids
    
    response = requests.post(f"{BASE_URL}/query", json=payload)
    response.raise_for_status()
    return response.json()

# Upload paper
def upload_paper(file_path):
    with open(file_path, 'rb') as f:
        files = {'file': f}
        response = requests.post(f"{BASE_URL}/papers/upload", files=files)
        response.raise_for_status()
        return response.json()

# Usage
result = query_papers("What is self-attention?")
print(f"Answer: {result['answer']}")
print(f"Confidence: {result['confidence']}")

cURL

# Query papers
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is self-attention?",
    "top_k": 5
  }'

# Upload paper
curl -X POST http://localhost:8000/api/v1/papers/upload \
  -F "file=@/path/to/paper.pdf"

# Get query history
curl "http://localhost:8000/api/v1/queries/history?limit=10&include_answer=true"

# Health check
curl http://localhost:8000/api/v1/health/readyz
