API Reference
Complete API documentation for integrating with SageAI. Build custom interfaces, automation scripts, or integrate with your existing research tools.
Quick Facts
- Protocol: REST API over HTTP
- Format: JSON request/response
- Default Port: 8000
- API Version: v1
Authentication
Current Version: SageAI runs locally without authentication. All endpoints are accessible without credentials. For production deployments, consider adding authentication via a reverse proxy.
Base URL
All API endpoints are prefixed with:
http://localhost:8000/api/v1
💡 If you change the port in Docker Compose, update the base URL accordingly.
Query Papers
POST /api/v1/query
Submit a natural language question and receive an AI-generated answer with citations from your uploaded papers.
Request Body
{
  "question": "What methodology was used in the transformer paper?",
  "top_k": 5,
  "paper_ids": [1, 3] // optional: limit to specific papers
}
Parameters
| Parameter | Type | Description |
|---|---|---|
| question (required) | string | The question to ask about your papers |
| top_k | integer | Number of relevant chunks to retrieve (1-10, default: 5) |
| paper_ids | array | Limit search to specific paper IDs (omit for all papers) |
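The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of SageAI itself: `build_query_payload` is a hypothetical helper name, and the checks mirror only the limits stated in the table.

```python
def build_query_payload(question, top_k=5, paper_ids=None):
    """Build the JSON body for POST /api/v1/query.

    Hypothetical helper: enforces only the documented rules
    (question is required, top_k is an integer from 1 to 10).
    """
    if not isinstance(question, str) or not question.strip():
        raise ValueError("question is required and must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(top_k, bool) or not isinstance(top_k, int) or not 1 <= top_k <= 10:
        raise ValueError("top_k must be an integer between 1 and 10")

    payload = {"question": question, "top_k": top_k}
    if paper_ids:
        # Leaving the key out searches all papers.
        payload["paper_ids"] = list(paper_ids)
    return payload
```

Validating locally avoids a round trip that would otherwise end in a 400 response.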
Response
{
  "answer": "The transformer paper uses a self-attention mechanism...",
  "citations": [
    {
      "paper_title": "Attention is All You Need",
      "section": "Methodology",
      "page": 3,
      "relevance_score": 0.89,
      "chunk_text": "..."
    }
  ],
  "sources_used": ["paper3_nlp_transformers.pdf"],
  "confidence": 0.85,
  "query_time_ms": 1250,
  "cached": false
}
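As an illustration of consuming this response, the sketch below turns the `citations` array into short display strings, most relevant first. `format_citations` is a hypothetical helper, and it assumes every citation carries the four fields shown in the example response.

```python
def format_citations(response):
    """Return one display string per citation, highest relevance first.

    Assumes each citation has paper_title, section, page and
    relevance_score, as in the example response.
    """
    citations = sorted(
        response.get("citations", []),
        key=lambda c: c["relevance_score"],
        reverse=True,
    )
    return [
        f'{c["paper_title"]}, {c["section"]}, p. {c["page"]} '
        f'(score {c["relevance_score"]:.2f})'
        for c in citations
    ]
```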
List All Papers
GET /api/v1/papers
Retrieve a list of all uploaded papers with metadata and processing status.
Response
{
  "papers": [
    {
      "id": 1,
      "title": "Attention is All You Need",
      "filename": "transformer_paper.pdf",
      "status": "indexed",
      "chunk_count": 42,
      "vector_count": 42,
      "upload_date": "2025-10-31T10:30:00Z",
      "indexed_date": "2025-10-31T10:31:15Z",
      "file_size_mb": 2.4
    }
  ],
  "total": 1
}
Upload Paper
POST /api/v1/papers/upload
Upload a PDF research paper. The embedder service will extract text, chunk it, and create vector embeddings for semantic search.
Request
Content-Type: multipart/form-data
Form Field: file (PDF file)
Example (cURL)
curl -X POST http://localhost:8000/api/v1/papers/upload \
  -F "file=@/path/to/paper.pdf"
Response
{
  "paper_id": 5,
  "title": "BERT: Pre-training of Deep Bidirectional Transformers",
  "filename": "bert_paper.pdf",
  "status": "processing",
  "message": "Paper uploaded successfully. Processing in background."
}
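Because processing happens in the background, a client usually has to poll before querying a new paper. The sketch below polls the Paper Statistics endpoint (GET /api/v1/papers/:id/stats) until `status` becomes "indexed". The function names, timeout and polling interval are assumptions, and since this reference only shows the "processing" and "indexed" values, any other status is simply treated as "not ready yet".

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # adjust if you changed the port


def fetch_stats(paper_id):
    """GET /papers/:id/stats and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}/papers/{paper_id}/stats") as resp:
        return json.load(resp)


def wait_until_indexed(paper_id, fetch=fetch_stats, timeout_s=300.0,
                       interval_s=2.0, sleep=time.sleep):
    """Poll until the paper reports status "indexed" or the timeout expires.

    fetch and sleep are injectable so the loop can be tested offline.
    timeout_s and interval_s are illustrative defaults, not documented values.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        stats = fetch(paper_id)
        if stats.get("status") == "indexed":
            return stats
        if time.monotonic() >= deadline:
            raise TimeoutError(f"paper {paper_id} not indexed after {timeout_s}s")
        sleep(interval_s)
```

Calling `wait_until_indexed(5)` after the upload above would block until paper 5 is searchable or five minutes have passed.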
Delete Paper
DELETE /api/v1/papers/:id
Remove a paper and all associated data (chunks, vectors, metadata).
Response
{
  "message": "Paper deleted successfully",
  "paper_id": 5
}
Paper Statistics
GET /api/v1/papers/:id/stats
Get detailed statistics for a specific paper.
Response
{
  "paper_id": 1,
  "title": "Attention is All You Need",
  "filename": "transformer_paper.pdf",
  "status": "indexed",
  "chunk_count": 42,
  "vector_count": 42,
  "upload_date": "2025-10-31T10:30:00Z",
  "indexed_date": "2025-10-31T10:31:15Z",
  "processing_time_ms": 75000,
  "file_size_mb": 2.4,
  "sections": ["Abstract", "Introduction", "Methods", "Results", "Conclusion"],
  "query_count": 87 // how many times this paper was referenced
}
Query History
GET /api/v1/queries/history?limit=20&offset=0&include_answer=true
Retrieve paginated query history with optional full answers.
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| limit | integer | Results per page (default: 20) |
| offset | integer | Skip N results (default: 0) |
| include_answer | boolean | Include full answer text (default: false) |
Response
{
  "queries": [
    {
      "query_id": "q_123",
      "question": "What is self-attention?",
      "timestamp": "2025-10-31T14:22:00Z",
      "confidence": 0.92,
      "rating": 5,
      "query_time_ms": 1200,
      "answer": "Self-attention is..." // only if include_answer=true
    }
  ],
  "total": 150,
  "limit": 20,
  "offset": 0
}
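The `limit`, `offset` and `total` fields are enough to walk the full history page by page. A minimal sketch, assuming `total` is the number of records across all pages; the helper names are hypothetical, and the page fetcher is injectable so the loop can be exercised without a running server.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # adjust if you changed the port


def fetch_history_page(limit, offset, include_answer):
    """GET one page of /queries/history and decode the JSON body."""
    params = urllib.parse.urlencode({
        "limit": limit,
        "offset": offset,
        "include_answer": "true" if include_answer else "false",
    })
    with urllib.request.urlopen(f"{BASE_URL}/queries/history?{params}") as resp:
        return json.load(resp)


def iter_query_history(fetch_page=fetch_history_page, page_size=20,
                       include_answer=False):
    """Yield every query record, requesting pages until total is reached."""
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset,
                          include_answer=include_answer)
        queries = page.get("queries", [])
        yield from queries
        offset += len(queries)
        # Stop on an empty page too, in case total is missing or stale.
        if not queries or offset >= page.get("total", 0):
            break
```

Usage: `for q in iter_query_history(): print(q["question"])`.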
Rate Answer
PATCH /api/v1/queries/:id/rating
Submit a user rating (1-5 stars) for a query response.
Request Body
{
  "rating": 4
}
Response
{
  "message": "Rating saved successfully",
  "query_id": "q_123",
  "rating": 4
}
Popular Analytics
GET /api/v1/analytics/popular
Get aggregated data on most queried topics and most referenced papers.
Response
{
  "top_questions": [
    { "question": "What is self-attention?", "count": 15 },
    { "question": "How does BERT work?", "count": 12 }
  ],
  "top_papers": [
    { "paper_id": 1, "title": "Attention is All You Need", "reference_count": 45 },
    { "paper_id": 3, "title": "BERT Paper", "reference_count": 38 }
  ]
}
Health Checks
Liveness Check
GET /api/v1/health/healthz
Returns 200 if the API server is alive.
{ "status": "ok" }
Readiness Check
GET /api/v1/health/readyz
Returns 200 if the API is ready to serve requests. Checks all dependencies.
{
  "status": "ready",
  "dependencies": {
    "mongo": { "status": "healthy", "response_time_ms": 5 },
    "redis": { "status": "healthy", "response_time_ms": 2 },
    "qdrant": { "status": "healthy", "response_time_ms": 10 },
    "ollama": { "status": "healthy", "response_time_ms": 50 }
  }
}
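A monitoring script might want to know which dependency is holding up readiness. The sketch below lists every dependency whose status is not "healthy". It assumes a not-ready response has the same shape as the example above, which this reference does not confirm, and `unhealthy_dependencies` is a hypothetical helper name.

```python
def unhealthy_dependencies(readiness):
    """Return the sorted names of dependencies not reporting "healthy".

    Assumes the readiness body keeps the documented shape even when
    the service is not ready.
    """
    dependencies = readiness.get("dependencies", {})
    return sorted(
        name for name, info in dependencies.items()
        if info.get("status") != "healthy"
    )
```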
Error Codes
| Code | Status | Description |
|---|---|---|
| 400 | Bad Request | Invalid parameters or malformed request |
| 404 | Not Found | Paper or query ID doesn't exist |
| 413 | Payload Too Large | PDF file exceeds size limit |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Something went wrong on the server |
| 503 | Service Unavailable | Dependent service (Ollama, Qdrant, etc.) is down |
Error Response Format
{
  "error": "Invalid top_k value",
  "message": "top_k must be between 1 and 10",
  "code": 400
}
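Since every error shares this envelope, a client can map it onto a single exception type. A minimal sketch: `SageAPIError` and `raise_for_error` are hypothetical names, and the fallback values for missing fields are assumptions.

```python
class SageAPIError(Exception):
    """Hypothetical exception carrying the documented error envelope."""

    def __init__(self, code, error, message):
        super().__init__(f"{code} {error}: {message}")
        self.code = code
        self.error = error
        self.message = message


def raise_for_error(status_code, body):
    """Return body unchanged for statuses below 400, else raise SageAPIError."""
    if status_code < 400:
        return body
    raise SageAPIError(
        code=body.get("code", status_code),          # fall back to the HTTP status
        error=body.get("error", "Unknown error"),    # assumed fallback text
        message=body.get("message", ""),
    )
```

With the `requests` library, this could be called as `raise_for_error(response.status_code, response.json())`.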
Rate Limits
Default Limits
- Window: 60 seconds
- Max Requests: 120 per window
- Configurable: Via RATE_LIMIT_* environment variables
When rate limit is exceeded, you'll receive:
{
  "error": "Rate limit exceeded",
  "message": "Too many requests. Please try again in 30 seconds.",
  "code": 429,
  "retry_after": 30
}
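A client can honor `retry_after` by sleeping for that many seconds before trying again. The sketch below wraps any request function in such a retry loop. `send_with_retry` is a hypothetical helper, `send` is a caller-supplied function returning a `(status_code, json_body)` pair, and the three-attempt cap and one-second fallback are assumptions rather than documented behavior.

```python
import time


def send_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    Waits body["retry_after"] seconds between attempts, falling back to
    1 second (an assumed default) if the field is missing.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        # Hand back the last response as-is once attempts are exhausted.
        if status != 429 or attempt == max_attempts:
            return status, body
        sleep(body.get("retry_after", 1))
```

With the `requests` library, `send` could be written as a small function that posts to the query endpoint and returns `(response.status_code, response.json())`.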
Code Examples
JavaScript (Fetch)
// Query papers
async function queryPapers(question, topK = 5) {
  const response = await fetch('http://localhost:8000/api/v1/query', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question, top_k: topK })
  });

  if (!response.ok) {
    throw new Error(`HTTP error! status: ${response.status}`);
  }

  const data = await response.json();
  console.log('Answer:', data.answer);
  console.log('Confidence:', data.confidence);
  return data;
}

// Usage
queryPapers('What is self-attention?')
  .then(result => console.log(result))
  .catch(error => console.error('Error:', error));
Python (Requests)
import requests

BASE_URL = "http://localhost:8000/api/v1"

# Query papers
def query_papers(question, top_k=5, paper_ids=None):
    payload = {
        "question": question,
        "top_k": top_k
    }
    if paper_ids:
        payload["paper_ids"] = paper_ids
    response = requests.post(f"{BASE_URL}/query", json=payload)
    response.raise_for_status()
    return response.json()

# Upload paper
def upload_paper(file_path):
    with open(file_path, 'rb') as f:
        files = {'file': f}
        response = requests.post(f"{BASE_URL}/papers/upload", files=files)
    response.raise_for_status()
    return response.json()

# Usage
result = query_papers("What is self-attention?")
print(f"Answer: {result['answer']}")
print(f"Confidence: {result['confidence']}")
cURL
# Query papers
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is self-attention?",
    "top_k": 5
  }'

# Upload paper
curl -X POST http://localhost:8000/api/v1/papers/upload \
  -F "file=@/path/to/paper.pdf"

# Get query history
curl "http://localhost:8000/api/v1/queries/history?limit=10&include_answer=true"

# Health check
curl http://localhost:8000/api/v1/health/readyz