SageAI Documentation
Welcome to SageAI, your private research assistant powered by state-of-the-art RAG (Retrieval-Augmented Generation) technology. This documentation will guide you through every feature and capability.
What is SageAI?
SageAI is a research paper assistant that runs entirely on your local machine. It combines vector search, natural language processing, and large language models to help you understand and analyze academic papers like never before.
- 100% Private: Your data never leaves your machine
- Offline Capable: Works without internet after setup
- Context-Aware: Understands relationships across papers
- Citation-Backed: Every answer includes sources
Architecture
Diagram: SageAI system architecture
Backend Stack
- Node.js + Express: REST API server
- MongoDB: Paper metadata & query history
- Redis: Response caching
- Qdrant: Vector database for similarity search
- Ollama: Local LLM inference
Frontend Stack
- React 18: UI framework
- TypeScript: Type safety
- Vite: Build tool
- ShadCN UI: Component library
- Tailwind CSS: Styling
Embedder Service
- Python + FastAPI: Async API
- PyMuPDF: PDF parsing
- fastembed: Text embeddings
- Section Detection: Smart chunking
Infrastructure
- Docker Compose: Orchestration
- Health Checks: Service monitoring
- Volume Persistence: Data retention
- Network Isolation: Security
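To see how these pieces work together at query time, here is a minimal sketch of the retrieval-augmented flow the backend performs. The embedder route, collection name, payload field, and model name are illustrative assumptions; the Qdrant search and Ollama generate calls use those services' standard HTTP APIs.

```typescript
// Minimal sketch of the RAG flow: embed the question, retrieve similar
// chunks from Qdrant, and ask the local LLM to answer from that context.
const EMBEDDER_URL = process.env.EMBEDDER_URL ?? "http://embedder:9100";
const QDRANT_URL = process.env.QDRANT_URL ?? "http://qdrant:6333";
const OLLAMA_URL = process.env.OLLAMA_BASE_URL; // platform-dependent, see the compose files

async function answerQuestion(question: string): Promise<string> {
  // 1. Turn the question into an embedding vector via the embedder service
  //    (the /embed route and request shape are assumptions).
  const embedRes = await fetch(`${EMBEDDER_URL}/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ texts: [question] }),
  });
  const [queryVector]: number[][] = (await embedRes.json()).embeddings;

  // 2. Retrieve the most similar chunks from Qdrant ("papers" collection
  //    name and "text" payload field are assumptions).
  const searchRes = await fetch(`${QDRANT_URL}/collections/papers/points/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ vector: queryVector, limit: 5, with_payload: true }),
  });
  const hits: { payload: { text: string } }[] = (await searchRes.json()).result;
  const context = hits.map((h) => h.payload.text).join("\n\n");

  // 3. Ask the local LLM to answer using only the retrieved context.
  const genRes = await fetch(`${OLLAMA_URL}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3",
      prompt: `Answer using only this context:\n\n${context}\n\nQuestion: ${question}`,
      stream: false,
    }),
  });
  return (await genRes.json()).response;
}
```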
Core Features
Natural Language Chat
Ask questions in plain English. SageAI understands context, handles follow-ups, and provides detailed answers with confidence scores.
Vector-Based Search
Uses semantic similarity, not just keywords. Finds relevant content even when papers use different terminology.
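Under the hood, both your question and every chunk are embedding vectors, and "relevance" is a geometric comparison rather than keyword overlap. Here is a minimal illustration of cosine similarity, a common metric for this kind of search (the example vectors are made up):

```typescript
// Cosine similarity between two embedding vectors: values near 1.0 mean the
// texts point in almost the same semantic direction, values near 0 mean they
// are unrelated, regardless of which exact words they use.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up vectors for two differently worded but related phrases:
console.log(cosineSimilarity([0.9, 0.1, 0.3], [0.85, 0.15, 0.35])); // ≈ 0.99
```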
Smart PDF Processing
Automatically extracts text, detects sections (Abstract, Introduction, Methods, etc.), and chunks intelligently to preserve context.
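The embedder itself is a Python service (PyMuPDF + fastembed); the sketch below is written in TypeScript to match the other examples and only illustrates the section-aware chunking idea. The heading list and chunk sizes are illustrative, not the embedder's actual values.

```typescript
// Section-aware chunking sketch: split the extracted text at common section
// headings, then cut each section into overlapping chunks so no passage
// loses its surrounding context.
function splitIntoSections(fullText: string): string[] {
  const headingRe = /^(Abstract|Introduction|Methods?|Results|Discussion|Conclusion|References)\b.*$/gim;
  const starts = [0];
  for (const m of fullText.matchAll(headingRe)) starts.push(m.index!);
  return starts
    .map((s, i) => fullText.slice(s, starts[i + 1]))
    .filter((s) => s.trim().length > 0);
}

function chunkSection(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

function chunkPaper(fullText: string): string[] {
  return splitIntoSections(fullText).flatMap((section) => chunkSection(section));
}
```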
Answer Rating System
Rate responses from 1-5 stars. Your feedback helps track quality and identify areas for improvement.
Chat Interface
How to Use
1. Type your question in the composer at the bottom of the page
2. Optionally adjust Top-K (how many chunks to retrieve) and select specific papers
3. Press Enter or click Send to get your answer
4. View citations, rate the response, and ask follow-up questions
Top-K Selector
Controls how many relevant chunks to include (1-10). Higher values give more context but may dilute relevance.
Paper Filter
Limit search to specific papers. Useful when comparing or focusing on particular research.
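For scripted access, a chat request might look like the hypothetical example below. The endpoint path and field names are illustrative assumptions; check the API reference for the actual contract.

```typescript
// Hypothetical chat request showing the Top-K and paper-filter options.
const res = await fetch("http://localhost:8000/api/query", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    question: "How does the attention mechanism scale with sequence length?",
    topK: 5,                     // Top-K: number of chunks to retrieve (1-10)
    paperIds: ["paper_abc123"],  // optional: restrict search to these papers
  }),
});
const { answer, citations, confidence } = await res.json();
console.log(answer, citations, confidence);
```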
File Management
Uploading Papers
1. Click the "Upload PDF" button in the sidebar
2. Select your PDF file (up to 50MB recommended)
3. Wait for processing—the embedder extracts text and creates chunks
4. Once complete, the paper appears in your sidebar and is ready to query
Processing Time: ~30 seconds per paper (varies by length)
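If you prefer to script uploads instead of using the sidebar, a request could look like the sketch below (Node 18+). The endpoint path and form field name are assumptions for illustration.

```typescript
// Hypothetical scripted upload (Node 18+, which ships fetch, FormData, and Blob).
import { readFile } from "node:fs/promises";

const pdfBytes = await readFile("papers/example-paper.pdf");
const form = new FormData();
form.append("file", new Blob([pdfBytes], { type: "application/pdf" }), "example-paper.pdf");

const res = await fetch("http://localhost:8000/api/papers/upload", {
  method: "POST",
  body: form, // fetch sets the multipart boundary automatically
});
console.log(await res.json()); // e.g. paper id and processing status
```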
Managing Papers
Delete
Click the delete icon next to any paper. This removes the paper, its chunks, and all vector embeddings.
View Stats
See detailed stats in the "Paper Stats" page: chunk count, vector count, status, and indexed time.
Analytics & History
Query History
Access all your past queries with timestamps, confidence scores, and ratings. Click "View answer" to see the full response again.
Analytics Dashboard
See aggregated insights about your research patterns:
- Top Questions: Most frequently asked queries
- Top Papers: Most referenced papers in answers
- Usage Patterns: Query volume over time
Paper Statistics
Detailed view of each paper's processing status, chunk count, vector embeddings, and metadata. Useful for debugging or understanding your library composition.
Service Health Monitoring
The Service Health page provides real-time status of all system components. Each service is checked independently with color-coded status pills.
Healthy
Service is running and responding correctly. All systems operational.
Degraded
Service is responding slowly or partially. May impact performance.
Down
Service is not responding. Check logs or restart the container.
Unknown
Unable to determine status. May indicate network issues.
Monitored Services
- Backend API (liveness & readiness)
- MongoDB
- Redis
- Qdrant
- Ollama
- Embedder Service
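If you want to script your own checks, the backend liveness endpoint shown in the troubleshooting section can be polled directly. The sketch below only illustrates the pattern behind the status pills; the per-service endpoints used by the Service Health page may differ.

```typescript
// Minimal poller for the backend liveness endpoint.
type Status = "healthy" | "degraded" | "down";

async function checkBackend(timeoutMs = 2000): Promise<Status> {
  const started = Date.now();
  try {
    const res = await fetch("http://localhost:8000/health/healthz", {
      signal: AbortSignal.timeout(timeoutMs),
    });
    if (!res.ok) return "down";
    // Treat a slow but successful response as degraded rather than healthy.
    return Date.now() - started > 1000 ? "degraded" : "healthy";
  } catch {
    return "down"; // connection refused or timed out
  }
}

// Poll every 10 seconds and log the result.
setInterval(async () => console.log("backend:", await checkBackend()), 10_000);
```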
Configuration
Most users won't need to change anything. The default Docker Compose configurations work out of the box and are tuned for local use.
Docker Compose Variants
| File | OS | Mode | Use Case |
|---|---|---|---|
| `docker-compose.linux.dev.yml` | Linux | Dev | Local development on Linux |
| `docker-compose.mac-win.dev.yml` | Mac/Win | Dev | Local development on macOS/Windows |
| `docker-compose.linux.prod.yml` | Linux | Prod | Production deployment on Linux |
| `docker-compose.mac-win.prod.yml` | Mac/Win | Prod | Production deployment on macOS/Windows |
Environment Variables
These are pre-configured in the Docker Compose files. Advanced users can customize them.
Backend Variables
```
API_PORT=8000                      # Backend port
LOG_LEVEL=info                     # Logging verbosity
MONGO_URI=mongodb://mongo:27017    # MongoDB connection
QDRANT_URL=http://qdrant:6333      # Qdrant vector DB
REDIS_URL=redis://redis:6379       # Redis cache
OLLAMA_BASE_URL=http://...         # Ollama API endpoint
EMBEDDER_URL=http://embedder:9100  # Embedder service
RATE_LIMIT_WINDOW_MS=60000         # Rate limit window
RATE_LIMIT_MAX=120                 # Max requests per window
```
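Here is a sketch of how the Node backend might read these settings; the config object shape is illustrative, and the defaults mirror the values above.

```typescript
// Read backend settings from the environment with the documented defaults.
const config = {
  apiPort: Number(process.env.API_PORT ?? 8000),
  logLevel: process.env.LOG_LEVEL ?? "info",
  mongoUri: process.env.MONGO_URI ?? "mongodb://mongo:27017",
  qdrantUrl: process.env.QDRANT_URL ?? "http://qdrant:6333",
  redisUrl: process.env.REDIS_URL ?? "redis://redis:6379",
  ollamaBaseUrl: process.env.OLLAMA_BASE_URL, // platform-dependent, see the compose files
  embedderUrl: process.env.EMBEDDER_URL ?? "http://embedder:9100",
  rateLimitWindowMs: Number(process.env.RATE_LIMIT_WINDOW_MS ?? 60000),
  rateLimitMax: Number(process.env.RATE_LIMIT_MAX ?? 120),
};
```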
Frontend Variables (Vite)
```
VITE_API_BASE=http://localhost:8000       # Backend API base
VITE_EMBEDDER_BASE=http://localhost:9100  # Embedder service base
VITE_SERVICE_BASE=http://localhost:8000   # Service health checks
```
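Vite exposes VITE_-prefixed variables on import.meta.env at build time. A sketch of using the API base from the frontend (the /api/papers path is an illustrative assumption):

```typescript
// Read the backend base URL injected by Vite, falling back to the default.
const apiBase = import.meta.env.VITE_API_BASE ?? "http://localhost:8000";

export async function fetchPapers() {
  const res = await fetch(`${apiBase}/api/papers`);
  if (!res.ok) throw new Error(`Backend returned ${res.status}`);
  return res.json();
}
```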
Docker Setup Details
Included Services
MongoDB
Stores paper metadata, query history, and user ratings
Port: 27017 | Volume: mongo_data
Qdrant
Vector database for semantic similarity search
Port: 6333 | Volume: qdrant_data
Redis
Caches responses for faster repeat queries
Port: 6379 | In-memory
Embedder
Python service for PDF parsing and text embedding
Port: 9100 | Built from source
Backend
Node.js REST API orchestrating all services
Port: 8000 | Built from source
Frontend
React UI served via Nginx
Port: 8080 | Built from source
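The Redis service above exists purely for response caching. A minimal sketch of the cache-aside pattern it enables, using the node-redis client; the key scheme and the one-hour TTL are illustrative assumptions:

```typescript
// Cache-aside: serve repeat questions from Redis, fall back to the RAG pipeline.
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL ?? "redis://redis:6379" });
await redis.connect();

async function cachedAnswer(question: string, compute: () => Promise<string>): Promise<string> {
  const key = `answer:${question}`;
  const cached = await redis.get(key);
  if (cached !== null) return cached;          // repeat query: served from Redis

  const answer = await compute();              // first time: run the full RAG pipeline
  await redis.set(key, answer, { EX: 3600 });  // keep the answer for an hour
  return answer;
}
```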
Useful Commands
Run these from the project root.

Start all services (detached mode):
```
docker compose -f infra/docker-compose.linux.dev.yml up -d
```
View logs:
```
docker compose -f infra/docker-compose.linux.dev.yml logs -f
```
Stop all services:
```
docker compose -f infra/docker-compose.linux.dev.yml down
```
Rebuild after code changes:
```
docker compose -f infra/docker-compose.linux.dev.yml up --build
```
Troubleshooting
Ollama connection failed
Symptoms: Backend logs show "Ollama not ready" or connection errors.
Solutions:
- Ensure Ollama is running: `ollama serve`
- Check if it's accessible: `curl http://localhost:11434`
- On Linux: use `docker-compose.linux.dev.yml` (host networking)
- On Mac/Windows: make sure you're using `host.docker.internal` in the compose file
Port already in use
Symptoms: Docker fails to start, says port is already allocated.
Solutions:
- Find what's using the port: `lsof -i :8080` (Mac/Linux) or `netstat -ano | findstr :8080` (Windows)
- Kill the process or modify the port in your compose file
- Common conflicts: Other dev servers, MongoDB instances, Redis
PDF upload fails or hangs
Symptoms: Upload progress bar stalls, or error message appears.
Solutions:
- Check the embedder service logs: `docker logs rag_embedder`
- Ensure the PDF is valid and not corrupted
- Large PDFs (>50MB) may take longer; wait patiently
- Check disk space: `df -h`
Slow or low-quality responses
Symptoms: Answers take forever, or quality is poor.
Solutions:
- Try a smaller model: `ollama pull llama3.2:1b`
- Adjust the Top-K slider to fewer chunks (e.g., 3 instead of 10)
- Check system resources: Ollama needs RAM (8GB+ recommended)
- Clear the Redis cache: `docker exec rag_redis redis-cli FLUSHALL`
Frontend shows "Cannot connect to backend"
Symptoms: UI loads but shows connection errors.
Solutions:
- Check if the backend is running: `curl http://localhost:8000/health/healthz`
- Verify environment variables in the frontend build (VITE_API_BASE)
- Check CORS settings in the backend if accessing from a different domain
- Rebuild the frontend: `docker compose ... up --build frontend`
Frequently Asked Questions
Do I need internet after setup?
No! Once you've downloaded the Docker images and Ollama models, SageAI runs completely offline. Your data never leaves your machine.
Can I use different LLM models?
Yes! Pull any Ollama-compatible model and update the backend configuration. Popular choices: `llama3`, `mistral`, `codellama`.
How much disk space do I need?
Minimum 15GB free space. This includes Docker images (~5GB), Ollama models (~5GB), and room for your papers and vectors. More papers = more space needed.
Is my data persistent?
Yes! Docker volumes persist MongoDB (papers, queries) and Qdrant (vectors) data. Your data survives container restarts. To reset, use `docker compose down -v`.
Can I deploy this to a server?
Absolutely! Use the production compose files. Consider adding authentication, HTTPS (via reverse proxy like Nginx), and firewall rules. Perfect for lab servers or research group deployments.
What paper formats are supported?
Currently PDF only. The embedder uses PyMuPDF for robust text extraction. Works with most academic PDFs, including complex layouts and multi-column formats.
Need More Help?
Join our community or check out the API reference for integration details.