SageAI Documentation

Welcome to SageAI, your private research assistant powered by state-of-the-art RAG (Retrieval-Augmented Generation) technology. This documentation will guide you through every feature and capability.

What is SageAI?

SageAI is a research paper assistant that runs entirely on your local machine. It combines vector search, natural language processing, and large language models to help you understand and analyze academic papers like never before.

  • 100% Private: Your data never leaves your machine
  • Offline Capable: Works without internet after setup
  • Context-Aware: Understands relationships across papers
  • Citation-Backed: Every answer includes sources

Architecture

[Diagram: SageAI system architecture]

Backend Stack

  • Node.js + Express: REST API server
  • MongoDB: Paper metadata & query history
  • Redis: Response caching
  • Qdrant: Vector database for similarity search
  • Ollama: Local LLM inference

Frontend Stack

  • React 18: UI framework
  • TypeScript: Type safety
  • Vite: Build tool
  • ShadCN UI: Component library
  • Tailwind CSS: Styling

Embedder Service

  • Python + FastAPI: Async API
  • PyMuPDF: PDF parsing
  • fastembed: Text embeddings
  • Section Detection: Smart chunking
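
For a feel of the embedding step, here is a minimal sketch built on fastembed. The model name and sample chunks are illustrative assumptions, not SageAI's actual configuration:

# Minimal embedding sketch (assumed: model choice, sample text)
from fastembed import TextEmbedding

model = TextEmbedding("BAAI/bge-small-en-v1.5")  # fastembed's default model
chunks = [
    "Abstract: We propose a novel attention mechanism...",
    "Methods: We train on 8 GPUs for 300k steps...",
]
vectors = list(model.embed(chunks))  # embed() yields one numpy vector per chunk
print(len(vectors), vectors[0].shape)  # e.g. 2 (384,)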

Infrastructure

  • Docker Compose: Orchestration
  • Health Checks: Service monitoring
  • Volume Persistence: Data retention
  • Network Isolation: Security

Core Features

Natural Language Chat

Ask questions in plain English. SageAI understands context, handles follow-ups, and provides detailed answers with confidence scores.

Example: "What methodology did the transformer paper use?" → Get a detailed answer with citations to specific sections.

Vector-Based Search

Uses semantic similarity, not just keywords. Finds relevant content even when papers use different terminology.

Top-K Selection: Control how many relevant chunks (1-10) to include in context.
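
Under the hood, retrieval is a nearest-neighbor search in Qdrant. Here is a sketch using qdrant-client and fastembed; the "papers" collection name is an assumption:

# Semantic retrieval sketch (assumed: collection name "papers")
from fastembed import TextEmbedding
from qdrant_client import QdrantClient

model = TextEmbedding()
client = QdrantClient(url="http://localhost:6333")
query_vector = next(model.embed(["How does multi-head attention work?"]))

# limit is the Top-K knob: higher values add context, lower values stay focused
hits = client.search(collection_name="papers", query_vector=query_vector.tolist(), limit=5)
for hit in hits:
    print(round(hit.score, 3), hit.payload)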

Smart PDF Processing

Automatically extracts text, detects sections (Abstract, Introduction, Methods, etc.), and chunks intelligently to preserve context.

Supported: Multi-page PDFs, complex layouts, academic formatting.
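
A much-simplified sketch of section detection with PyMuPDF is shown below; the heading regex is an illustrative assumption, and SageAI's real chunker is more involved:

# Toy section detection with PyMuPDF (assumed: heading regex)
import re
import fitz  # PyMuPDF

SECTION_RE = re.compile(r"^(abstract|introduction|methods?|results|discussion|conclusion|references)\b", re.I)

doc = fitz.open("paper.pdf")
text = "\n".join(page.get_text() for page in doc)
sections, current = {}, "front matter"
for line in text.splitlines():
    if SECTION_RE.match(line.strip()):
        current = line.strip()  # start a new section at each detected heading
    sections.setdefault(current, []).append(line)
for name, lines in sections.items():
    print(name, "->", len(lines), "lines")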

Answer Rating System

Rate responses from 1-5 stars. Your feedback helps track quality and identify areas for improvement.

Confidence Scores: Every answer includes an AI-generated confidence level.

Chat Interface

How to Use

  1. Type your question in the composer at the bottom of the page
  2. Optionally adjust Top-K (relevance) and select specific papers
  3. Press Enter or click Send to get your answer
  4. View citations, rate the response, and ask follow-up questions

Top-K Selector

Controls how many relevant chunks to include (1-10). Higher values give more context but may dilute relevance.

Recommended: Start with 5, adjust based on answer quality.

Paper Filter

Limit the search to specific papers. Useful when comparing papers or focusing on a particular line of research.

Default: "All papers" searches across your entire library.

File Management

Uploading Papers

  1. Click the "Upload PDF" button in the sidebar
  2. Select your PDF file (up to 50MB recommended)
  3. Wait for processing while the embedder extracts text and creates chunks
  4. Once complete, the paper appears in your sidebar and is ready to query

Processing Time: ~30 seconds per paper (varies by length)
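
Uploads can also be scripted. The multipart route below is an assumption for illustration; check the API reference for the real endpoint:

# Hypothetical scripted upload (route and field name assumed)
import requests

with open("attention_is_all_you_need.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/api/papers/upload",
        files={"file": f},
        timeout=300,  # embedding a long paper takes a while
    )
resp.raise_for_status()
print(resp.json())  # expected: the new paper's id and processing status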

Managing Papers

Delete

Click the delete icon next to any paper. This removes the paper, its chunks, and all vector embeddings.
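
Deletion likely maps to a single API call if you are scripting; the route below is an assumption for illustration:

# Hypothetical scripted delete (route assumed)
import requests

requests.delete("http://localhost:8000/api/papers/<paper-id>", timeout=30).raise_for_status()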

View Stats

See detailed stats in the "Paper Stats" page: chunk count, vector count, status, and indexed time.

Analytics & History

Query History

Access all your past queries with timestamps, confidence scores, and ratings. Click "View answer" to see the full response again.

Use Case: Review your research trail, find previous insights, or revisit interesting questions.

Analytics Dashboard

See aggregated insights about your research patterns:

  • Top Questions: Most frequently asked queries
  • Top Papers: Most referenced papers in answers
  • Usage Patterns: Query volume over time

Paper Statistics

Detailed view of each paper's processing status, chunk count, vector embeddings, and metadata. Useful for debugging or understanding your library composition.

Service Health Monitoring

The Service Health page provides real-time status of all system components. Each service is checked independently with color-coded status pills.

Healthy

Service is running and responding correctly. All systems operational.

Degraded

Service is responding slowly or partially. May impact performance.

Down

Service is not responding. Check logs or restart the container.

Unknown

Unable to determine status. May indicate network issues.

Monitored Services

  • Backend API (liveness & readiness)
  • MongoDB
  • Redis
  • Qdrant
  • Ollama
  • Embedder Service
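
You can replicate these checks from a script. The backend path appears in Troubleshooting below; the other HTTP paths are assumptions, and MongoDB and Redis are omitted because they speak their own protocols rather than HTTP:

# Poll service health over HTTP (assumed: Qdrant and embedder paths)
import requests

ENDPOINTS = {
    "backend": "http://localhost:8000/health/healthz",
    "qdrant": "http://localhost:6333/healthz",
    "embedder": "http://localhost:9100/health",  # assumed path
    "ollama": "http://localhost:11434",          # root returns 200 when up
}
for name, url in ENDPOINTS.items():
    try:
        status = "healthy" if requests.get(url, timeout=2).ok else "degraded"
    except requests.RequestException:
        status = "down"
    print(f"{name}: {status}")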

Configuration

Most users won't need to change anything. The default Docker Compose configurations are production-ready and optimized for local use.

Docker Compose Variants

File                             OS       Mode   Use Case
docker-compose.linux.dev.yml     Linux    Dev    Local development on Linux
docker-compose.mac-win.dev.yml   Mac/Win  Dev    Local development on macOS/Windows
docker-compose.linux.prod.yml    Linux    Prod   Production deployment on Linux
docker-compose.mac-win.prod.yml  Mac/Win  Prod   Production deployment on macOS/Windows

Environment Variables

These are pre-configured in the Docker Compose files. Advanced users can customize them.

Backend Variables

API_PORT=8000                     # Backend port
LOG_LEVEL=info                    # Logging verbosity
MONGO_URI=mongodb://mongo:27017   # MongoDB connection
QDRANT_URL=http://qdrant:6333     # Qdrant vector DB
REDIS_URL=redis://redis:6379      # Redis cache
OLLAMA_BASE_URL=http://...        # Ollama API endpoint
EMBEDDER_URL=http://embedder:9100 # Embedder service
RATE_LIMIT_WINDOW_MS=60000        # Rate limit window (ms)
RATE_LIMIT_MAX=120                # Max requests per window

Frontend Variables (Vite)

VITE_API_BASE=http://localhost:8000      # Backend API base
VITE_EMBEDDER_BASE=http://localhost:9100 # Embedder service base
VITE_SERVICE_BASE=http://localhost:8000  # Service health checks

Docker Setup Details

Included Services

MongoDB

Stores paper metadata, query history, and user ratings

Port: 27017 | Volume: mongo_data

Qdrant

Vector database for semantic similarity search

Port: 6333 | Volume: qdrant_data

Redis

Caches responses for faster repeat queries

Port: 6379 | In-memory
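
A cache-aside sketch with redis-py illustrates the idea; the key scheme and TTL are assumptions, not SageAI's actual cache policy:

# Cache-aside pattern (assumed: key scheme, 1-hour TTL)
import hashlib
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def cached_answer(question, compute):
    key = "answer:" + hashlib.sha256(question.encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)              # repeat query: served from Redis
    answer = compute(question)              # cache miss: run the full RAG pipeline
    r.setex(key, 3600, json.dumps(answer))  # keep the answer for an hour
    return answer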

Embedder

Python service for PDF parsing and text embedding

Port: 9100 | Built from source

Backend

Node.js REST API orchestrating all services

Port: 8000 | Built from source

Frontend

React UI served via Nginx

Port: 8080 | Built from source

Useful Commands

Start all services (detached mode):

docker compose -f infra/docker-compose.linux.dev.yml up -d   # Run from the project root

View logs:

docker compose -f infra/docker-compose.linux.dev.yml logs -f   # Run from the project root

Stop all services:

docker compose -f infra/docker-compose.linux.dev.yml down

Rebuild after code changes:

docker compose -f infra/docker-compose.linux.dev.yml up --build   # Run from the project root

Troubleshooting

Ollama connection failed

Symptoms: Backend logs show "Ollama not ready" or connection errors.

Solutions:

  • Ensure Ollama is running: ollama serve
  • Check if accessible: curl http://localhost:11434
  • On Linux: Use docker-compose.linux.dev.yml (host networking)
  • On Mac/Windows: Make sure you're using host.docker.internal in the compose file

Port already in use

Symptoms: Docker fails to start, says port is already allocated.

Solutions:

  • Find what's using the port: lsof -i :8080 (Mac/Linux) or netstat -ano | findstr :8080 (Windows)
  • Kill the process or modify the port in your compose file
  • Common conflicts: Other dev servers, MongoDB instances, Redis

PDF upload fails or hangs

Symptoms: Upload progress bar stalls, or error message appears.

Solutions:

  • Check embedder service logs: docker logs rag_embedder
  • Ensure PDF is valid and not corrupted
  • Large PDFs (>50MB) may take longer—wait patiently
  • Check disk space: df -h

Slow or low-quality responses

Symptoms: Answers take forever, or quality is poor.

Solutions:

  • Try a smaller model: ollama pull llama3.2:1b
  • Adjust Top-K slider to fewer chunks (e.g., 3 instead of 10)
  • Check system resources: Ollama needs RAM (8GB+ recommended)
  • Clear Redis cache: docker exec rag_redis redis-cli FLUSHALL

Frontend shows "Cannot connect to backend"

Symptoms: UI loads but shows connection errors.

Solutions:

  • Check if backend is running: curl http://localhost:8000/health/healthz
  • Verify environment variables in frontend build (VITE_API_BASE)
  • Check CORS settings in backend if accessing from different domain
  • Rebuild frontend: docker compose ... up --build frontend

Frequently Asked Questions

Do I need internet after setup?

No! Once you've downloaded the Docker images and Ollama models, SageAI runs completely offline. Your data never leaves your machine.

Can I use different LLM models?

Yes! Pull any Ollama-compatible model and update the backend configuration. Popular choices: llama3, mistral, codellama.

How much disk space do I need?

Minimum 15GB free space. This includes Docker images (~5GB), Ollama models (~5GB), and room for your papers and vectors. More papers = more space needed.

Is my data persistent?

Yes! Docker volumes persist MongoDB (papers, queries) and Qdrant (vectors) data. Your data survives container restarts. To reset, use docker compose down -v.

Can I deploy this to a server?

Absolutely! Use the production compose files. Consider adding authentication, HTTPS (via reverse proxy like Nginx), and firewall rules. Perfect for lab servers or research group deployments.

What paper formats are supported?

Currently PDF only. The embedder uses PyMuPDF for robust text extraction. Works with most academic PDFs, including complex layouts and multi-column formats.

Need More Help?

Join our community or check out the API reference for integration details.