SageAI Documentation
Welcome to SageAI, your private research assistant powered by state-of-the-art RAG (Retrieval-Augmented Generation) technology. This documentation will guide you through every feature and capability.
What is SageAI?
SageAI is a research paper assistant that runs entirely on your local machine. It combines vector search, natural language processing, and large language models to help you understand and analyze academic papers like never before.
- 100% Private: Your data never leaves your machine
- Offline Capable: Works without internet after setup
- Context-Aware: Understands relationships across papers
- Citation-Backed: Every answer includes sources
Architecture
Diagram: SageAI system architecture
Backend Stack
- Node.js + Express: REST API server
- MongoDB: Paper metadata & query history
- Redis: Response caching
- Qdrant: Vector database for similarity search
- Ollama: Local LLM inference
Frontend Stack
- React 18: UI framework
- TypeScript: Type safety
- Vite: Build tool
- ShadCN UI: Component library
- Tailwind CSS: Styling
Embedder Service
- Python + FastAPI: Async API
- PyMuPDF: PDF parsing
- fastembed: Text embeddings
- Section Detection: Smart chunking
Infrastructure
- Docker Compose: Orchestration
- Health Checks: Service monitoring
- Volume Persistence: Data retention
- Network Isolation: Security
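To see how these pieces work together at query time, here is a minimal sketch of the retrieval-augmented flow the backend performs. The embedder route, collection name, payload field, and model name are illustrative assumptions; the Qdrant search and Ollama generate calls use those services' standard HTTP APIs.

```typescript
// Minimal sketch of the RAG flow: embed the question, retrieve similar
// chunks from Qdrant, and ask the local LLM to answer from that context.
const EMBEDDER_URL = process.env.EMBEDDER_URL ?? "http://embedder:9100";
const QDRANT_URL = process.env.QDRANT_URL ?? "http://qdrant:6333";
const OLLAMA_URL = process.env.OLLAMA_BASE_URL; // platform-dependent, see the compose files

async function answerQuestion(question: string): Promise<string> {
  // 1. Turn the question into an embedding vector via the embedder service
  //    (the /embed route and request shape are assumptions).
  const embedRes = await fetch(`${EMBEDDER_URL}/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ texts: [question] }),
  });
  const [queryVector]: number[][] = (await embedRes.json()).embeddings;

  // 2. Retrieve the most similar chunks from Qdrant ("papers" collection
  //    name and "text" payload field are assumptions).
  const searchRes = await fetch(`${QDRANT_URL}/collections/papers/points/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ vector: queryVector, limit: 5, with_payload: true }),
  });
  const hits: { payload: { text: string } }[] = (await searchRes.json()).result;
  const context = hits.map((h) => h.payload.text).join("\n\n");

  // 3. Ask the local LLM to answer using only the retrieved context.
  const genRes = await fetch(`${OLLAMA_URL}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3",
      prompt: `Answer using only this context:\n\n${context}\n\nQuestion: ${question}`,
      stream: false,
    }),
  });
  return (await genRes.json()).response;
}
```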
Core Features
Natural Language Chat
Ask questions in plain English. SageAI understands context, handles follow-ups, and provides detailed answers with confidence scores.
Vector-Based Search
Uses semantic similarity, not just keywords. Finds relevant content even when papers use different terminology.
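Under the hood, both your question and every chunk are embedding vectors, and "relevance" is a geometric comparison rather than keyword overlap. Here is a minimal illustration of cosine similarity, a common metric for this kind of search (the example vectors are made up):

```typescript
// Cosine similarity between two embedding vectors: values near 1.0 mean the
// texts point in almost the same semantic direction, values near 0 mean they
// are unrelated, regardless of which exact words they use.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up vectors for two differently worded but related phrases:
console.log(cosineSimilarity([0.9, 0.1, 0.3], [0.85, 0.15, 0.35])); // ≈ 0.99
```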
Smart PDF Processing
Automatically extracts text, detects sections (Abstract, Introduction, Methods, etc.), and chunks intelligently to preserve context.
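The embedder itself is a Python service (PyMuPDF + fastembed); the sketch below is written in TypeScript to match the other examples and only illustrates the section-aware chunking idea. The heading list and chunk sizes are illustrative, not the embedder's actual values.

```typescript
// Section-aware chunking sketch: split the extracted text at common section
// headings, then cut each section into overlapping chunks so no passage
// loses its surrounding context.
function splitIntoSections(fullText: string): string[] {
  const headingRe = /^(Abstract|Introduction|Methods?|Results|Discussion|Conclusion|References)\b.*$/gim;
  const starts = [0];
  for (const m of fullText.matchAll(headingRe)) starts.push(m.index!);
  return starts
    .map((s, i) => fullText.slice(s, starts[i + 1]))
    .filter((s) => s.trim().length > 0);
}

function chunkSection(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

function chunkPaper(fullText: string): string[] {
  return splitIntoSections(fullText).flatMap((section) => chunkSection(section));
}
```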
Answer Rating System
Rate responses from 1-5 stars. Your feedback helps track quality and identify areas for improvement.
Chat Interface
How to Use
1. Type your question in the composer at the bottom of the page
2. Optionally adjust Top-K (how many chunks to retrieve) and select specific papers
3. Press Enter or click Send to get your answer
4. View citations, rate the response, and ask follow-up questions
Top-K Selector
Controls how many relevant chunks to include (1-10). Higher values give more context but may dilute relevance.
Paper Filter
Limit search to specific papers. Useful when comparing or focusing on particular research.
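For scripted access, a chat request might look like the hypothetical example below. The endpoint path and field names are illustrative assumptions; check the API reference for the actual contract.

```typescript
// Hypothetical chat request showing the Top-K and paper-filter options.
const res = await fetch("http://localhost:8000/api/query", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    question: "How does the attention mechanism scale with sequence length?",
    topK: 5,                     // Top-K: number of chunks to retrieve (1-10)
    paperIds: ["paper_abc123"],  // optional: restrict search to these papers
  }),
});
const { answer, citations, confidence } = await res.json();
console.log(answer, citations, confidence);
```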
File Management
Uploading Papers
1. Click the "Upload PDF" button in the sidebar
2. Select your PDF file (up to 50MB recommended)
3. Wait for processing—the embedder extracts text and creates chunks
4. Once complete, the paper appears in your sidebar and is ready to query
Processing Time: ~30 seconds per paper (varies by length)
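If you prefer to script uploads instead of using the sidebar, a request could look like the sketch below (Node 18+). The endpoint path and form field name are assumptions for illustration.

```typescript
// Hypothetical scripted upload (Node 18+, which ships fetch, FormData, and Blob).
import { readFile } from "node:fs/promises";

const pdfBytes = await readFile("papers/example-paper.pdf");
const form = new FormData();
form.append("file", new Blob([pdfBytes], { type: "application/pdf" }), "example-paper.pdf");

const res = await fetch("http://localhost:8000/api/papers/upload", {
  method: "POST",
  body: form, // fetch sets the multipart boundary automatically
});
console.log(await res.json()); // e.g. paper id and processing status
```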
Managing Papers
Delete
Click the delete icon next to any paper. This removes the paper, its chunks, and all vector embeddings.
View Stats
See detailed stats in the "Paper Stats" page: chunk count, vector count, status, and indexed time.
Analytics & History
Query History
Access all your past queries with timestamps, confidence scores, and ratings. Click "View answer" to see the full response again.
Analytics Dashboard
See aggregated insights about your research patterns:
- Top Questions: Most frequently asked queries
- Top Papers: Most referenced papers in answers
- Usage Patterns: Query volume over time
Paper Statistics
Detailed view of each paper's processing status, chunk count, vector embeddings, and metadata. Useful for debugging or understanding your library composition.
Service Health Monitoring
The Service Health page provides real-time status of all system components. Each service is checked independently with color-coded status pills.
Healthy
Service is running and responding correctly. All systems operational.
Degraded
Service is responding slowly or partially. May impact performance.
Down
Service is not responding. Check logs or restart the container.
Unknown
Unable to determine status. May indicate network issues.
Monitored Services
- Backend API (liveness & readiness)
- MongoDB
- Redis
- Qdrant
- Ollama
- Embedder Service
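If you want to script your own checks, the backend liveness endpoint shown in the troubleshooting section can be polled directly. The sketch below only illustrates the pattern behind the status pills; the per-service endpoints used by the Service Health page may differ.

```typescript
// Minimal poller for the backend liveness endpoint.
type Status = "healthy" | "degraded" | "down";

async function checkBackend(timeoutMs = 2000): Promise<Status> {
  const started = Date.now();
  try {
    const res = await fetch("http://localhost:8000/health/healthz", {
      signal: AbortSignal.timeout(timeoutMs),
    });
    if (!res.ok) return "down";
    // Treat a slow but successful response as degraded rather than healthy.
    return Date.now() - started > 1000 ? "degraded" : "healthy";
  } catch {
    return "down"; // connection refused or timed out
  }
}

// Poll every 10 seconds and log the result.
setInterval(async () => console.log("backend:", await checkBackend()), 10_000);
```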
Configuration
Most users won't need to change anything. The default Docker Compose configurations work out of the box and are tuned for local use.
Docker Compose Variants
| File | OS | Mode | Use Case |
|---|---|---|---|
| `docker-compose.linux.dev.yml` | Linux | Dev | Local development on Linux |
| `docker-compose.mac-win.dev.yml` | Mac/Win | Dev | Local development on macOS/Windows |
| `docker-compose.linux.prod.yml` | Linux | Prod | Production deployment on Linux |
| `docker-compose.mac-win.prod.yml` | Mac/Win | Prod | Production deployment on macOS/Windows |
Environment Variables
These are pre-configured in the Docker Compose files. Advanced users can customize them.
Backend Variables
```
API_PORT=8000                      # Backend port
LOG_LEVEL=info                     # Logging verbosity
MONGO_URI=mongodb://mongo:27017    # MongoDB connection
QDRANT_URL=http://qdrant:6333      # Qdrant vector DB
REDIS_URL=redis://redis:6379       # Redis cache
OLLAMA_BASE_URL=http://...         # Ollama API endpoint
EMBEDDER_URL=http://embedder:9100  # Embedder service
RATE_LIMIT_WINDOW_MS=60000         # Rate limit window
RATE_LIMIT_MAX=120                 # Max requests per window
```
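Here is a sketch of how the Node backend might read these settings; the config object shape is illustrative, and the defaults mirror the values above.

```typescript
// Read backend settings from the environment with the documented defaults.
const config = {
  apiPort: Number(process.env.API_PORT ?? 8000),
  logLevel: process.env.LOG_LEVEL ?? "info",
  mongoUri: process.env.MONGO_URI ?? "mongodb://mongo:27017",
  qdrantUrl: process.env.QDRANT_URL ?? "http://qdrant:6333",
  redisUrl: process.env.REDIS_URL ?? "redis://redis:6379",
  ollamaBaseUrl: process.env.OLLAMA_BASE_URL, // platform-dependent, see the compose files
  embedderUrl: process.env.EMBEDDER_URL ?? "http://embedder:9100",
  rateLimitWindowMs: Number(process.env.RATE_LIMIT_WINDOW_MS ?? 60000),
  rateLimitMax: Number(process.env.RATE_LIMIT_MAX ?? 120),
};
```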
Frontend Variables (Vite)
```
VITE_API_BASE=http://localhost:8000       # Backend API base
VITE_EMBEDDER_BASE=http://localhost:9100  # Embedder service base
VITE_SERVICE_BASE=http://localhost:8000   # Service health checks
```
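Vite exposes VITE_-prefixed variables on import.meta.env at build time. A sketch of using the API base from the frontend (the /api/papers path is an illustrative assumption):

```typescript
// Read the backend base URL injected by Vite, falling back to the default.
const apiBase = import.meta.env.VITE_API_BASE ?? "http://localhost:8000";

export async function fetchPapers() {
  const res = await fetch(`${apiBase}/api/papers`);
  if (!res.ok) throw new Error(`Backend returned ${res.status}`);
  return res.json();
}
```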
Docker Setup Details
Included Services
MongoDB
Stores paper metadata, query history, and user ratings
Port: 27017 | Volume: mongo_data
Qdrant
Vector database for semantic similarity search
Port: 6333 | Volume: qdrant_data
Redis
Caches responses for faster repeat queries
Port: 6379 | In-memory
Embedder
Python service for PDF parsing and text embedding
Port: 9100 | Built from source
Backend
Node.js REST API orchestrating all services
Port: 8000 | Built from source
Frontend
React UI served via Nginx
Port: 8080 | Built from source
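The Redis service above exists purely for response caching. A minimal sketch of the cache-aside pattern it enables, using the node-redis client; the key scheme and the one-hour TTL are illustrative assumptions:

```typescript
// Cache-aside: serve repeat questions from Redis, fall back to the RAG pipeline.
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL ?? "redis://redis:6379" });
await redis.connect();

async function cachedAnswer(question: string, compute: () => Promise<string>): Promise<string> {
  const key = `answer:${question}`;
  const cached = await redis.get(key);
  if (cached !== null) return cached;          // repeat query: served from Redis

  const answer = await compute();              // first time: run the full RAG pipeline
  await redis.set(key, answer, { EX: 3600 });  // keep the answer for an hour
  return answer;
}
```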
Useful Commands
Run these from the project root.

Start all services (detached mode):
```
docker compose -f infra/docker-compose.linux.dev.yml up -d
```
View logs:
```
docker compose -f infra/docker-compose.linux.dev.yml logs -f
```
Stop all services:
```
docker compose -f infra/docker-compose.linux.dev.yml down
```
Rebuild after code changes:
```
docker compose -f infra/docker-compose.linux.dev.yml up --build
```
Troubleshooting
Ollama connection failed
Symptoms: Backend logs show "Ollama not ready" or connection errors.
Solutions:
- Ensure Ollama is running: `ollama serve`
- Check if it's accessible: `curl http://localhost:11434`
- On Linux: use `docker-compose.linux.dev.yml` (host networking)
- On Mac/Windows: make sure you're using `host.docker.internal` in the compose file
Port already in use
Symptoms: Docker fails to start, says port is already allocated.
Solutions:
- Find what's using the port: `lsof -i :8080` (Mac/Linux) or `netstat -ano | findstr :8080` (Windows)
- Kill the process or modify the port in your compose file
- Common conflicts: Other dev servers, MongoDB instances, Redis
PDF upload fails or hangs
Symptoms: Upload progress bar stalls, or error message appears.
Solutions:
- Check the embedder service logs: `docker logs rag_embedder`
- Ensure the PDF is valid and not corrupted
- Large PDFs (>50MB) may take longer; wait patiently
- Check disk space: `df -h`
Slow or low-quality responses
Symptoms: Answers take forever, or quality is poor.
Solutions:
- Try a smaller model: `ollama pull llama3.2:1b`
- Adjust the Top-K slider to fewer chunks (e.g., 3 instead of 10)
- Check system resources: Ollama needs RAM (8GB+ recommended)
- Clear the Redis cache: `docker exec rag_redis redis-cli FLUSHALL`
Frontend shows "Cannot connect to backend"
Symptoms: UI loads but shows connection errors.
Solutions:
- Check if the backend is running: `curl http://localhost:8000/health/healthz`
- Verify environment variables in the frontend build (VITE_API_BASE)
- Check CORS settings in the backend if accessing from a different domain
- Rebuild the frontend: `docker compose ... up --build frontend`
Frequently Asked Questions
Do I need internet after setup?
No! Once you've downloaded the Docker images and Ollama models, SageAI runs completely offline. Your data never leaves your machine.
Can I use different LLM models?
Yes! Pull any Ollama-compatible model and update the backend configuration. Popular choices: `llama3`, `mistral`, `codellama`.
How much disk space do I need?
Minimum 15GB free space. This includes Docker images (~5GB), Ollama models (~5GB), and room for your papers and vectors. More papers = more space needed.
Is my data persistent?
Yes! Docker volumes persist MongoDB (papers, queries) and Qdrant (vectors) data. Your data survives container restarts. To reset, use `docker compose down -v`.
Can I deploy this to a server?
Absolutely! Use the production compose files. Consider adding authentication, HTTPS (via reverse proxy like Nginx), and firewall rules. Perfect for lab servers or research group deployments.
What paper formats are supported?
Currently PDF only. The embedder uses PyMuPDF for robust text extraction. Works with most academic PDFs, including complex layouts and multi-column formats.
Need More Help?
Join our community or check out the API reference for integration details.