⏱️ Estimated Reading Time: 25 minutes

1. Introduction to MAESTRO

What is MAESTRO?

MAESTRO is an open-source, AI-powered research automation platform built to handle complex research tasks efficiently. Using AI agents, it automates the entire research workflow, from document collection through analysis to report generation.

Key Features

  • AI Agent-Based Research: Automated research pipeline powered by LLMs
  • RAG (Retrieval-Augmented Generation): Vector search-based document processing
  • Real-time WebSocket Communication: Live monitoring of task progress
  • Fully Self-Hosted: Complete independent operation in local environments
  • Multiple LLM Support: OpenAI, local LLMs, and other OpenAI-compatible APIs
  • SearXNG Integration: Private metasearch engine connectivity

Technology Stack

  • Backend: FastAPI, SQLAlchemy, PostgreSQL, pgvector
  • Frontend: React, TypeScript, Vite, Tailwind CSS
  • Infrastructure: Docker Compose, WebSocket
  • AI/ML: Vector embeddings, LLM API integration

2. System Requirements

Minimum Hardware Specifications

# CPU Mode (Minimum)
- CPU: 4+ cores
- RAM: 8GB+
- Storage: 10GB+
- OS: Linux, macOS, Windows (WSL2)

# GPU Mode (Recommended)
- GPU: NVIDIA GPU (CUDA 11.0+)
- VRAM: 8GB+
- RAM: 16GB+
- Storage: 20GB+

Required Software

# Common Requirements
- Docker Desktop (latest version)
- Docker Compose v2
- Git

# Additional for GPU Usage (Linux)
- nvidia-container-toolkit
- NVIDIA Drivers (latest)

# Windows Users
- WSL2 (Ubuntu 20.04+)
- Windows Terminal (recommended)
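
Before proceeding, you can confirm the prerequisites from a terminal (the nvidia-smi line applies to GPU setups only):

# Verify prerequisite tooling
docker --version
docker compose version
git --version
nvidia-smi   # GPU setups only: shows driver and CUDA version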

3. Installation and Initial Setup

3.1 Repository Clone and Basic Setup

# 1. Clone MAESTRO repository
git clone https://github.com/murtaza-nasir/maestro.git
cd maestro

# 2. Grant execution permissions (Linux/macOS)
chmod +x start.sh stop.sh detect_gpu.sh maestro-cli.sh

# 3. Create environment configuration file
cp .env.example .env

3.2 Environment Variables Configuration

Edit the .env file to configure basic settings:

# .env File Key Settings
# =====================

# Database Configuration
POSTGRES_DB=maestro_db
POSTGRES_USER=maestro_user
POSTGRES_PASSWORD=your_secure_password_here

# JWT Security Settings
JWT_SECRET_KEY=your_jwt_secret_key_here
JWT_ALGORITHM=HS256
JWT_ACCESS_TOKEN_EXPIRE_MINUTES=30

# LLM API Settings (for OpenAI)
OPENAI_API_KEY=your_openai_api_key_here
LLM_MODEL=gpt-4

# Local LLM Settings (for Ollama)
LOCAL_LLM_BASE_URL=http://localhost:11434/v1
LOCAL_LLM_MODEL=llama3.1:8b
USE_LOCAL_LLM=true

# Search Engine Configuration
SEARCH_ENGINE=searxng
SEARXNG_BASE_URL=http://searxng:8080

# GPU Configuration
GPU_AVAILABLE=true
BACKEND_GPU_DEVICE=0
DOC_PROCESSOR_GPU_DEVICE=0

# CPU-only Mode (for environments without GPU)
FORCE_CPU_MODE=false
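
Before starting the stack, a minimal sanity check of the file is worthwhile. This sketch assumes the variable names shown above and only warns on missing values:

# Warn if a required variable is missing or empty in .env
for var in POSTGRES_DB POSTGRES_USER POSTGRES_PASSWORD JWT_SECRET_KEY; do
  grep -qE "^${var}=.+" .env || echo "WARNING: ${var} is not set in .env"
done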

3.3 GPU Support Verification

Check GPU support availability and apply optimal settings:

# Run GPU detection script
./detect_gpu.sh

# Example output:
# =========== GPU Detection Results ===========
# Platform: Linux
# GPU Support: Available
# NVIDIA Driver: 525.147.05
# CUDA Version: 12.0
# Recommended Configuration: GPU-enabled
# ===========================================

4. Platform-Specific Installation Guide

4.1 Linux (Ubuntu/Debian) - GPU Support

# 1. Install NVIDIA Container Toolkit (current official repository;
#    the old nvidia-docker repo and apt-key are deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker
sudo nvidia-ctk runtime configure --runtime=docker

# 2. Restart Docker
sudo systemctl restart docker

# 3. Test GPU
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# 4. Start MAESTRO
./start.sh

4.2 macOS (Apple Silicon/Intel)

# 1. Verify Docker Desktop installation
docker --version
docker-compose --version

# 2. Start in CPU-optimized mode
docker-compose -f docker-compose.cpu.yml up -d

# Or set environment variable and start normally
echo "FORCE_CPU_MODE=true" >> .env
./start.sh

4.3 Windows (WSL2)

Run PowerShell as Administrator:

# 1. Verify WSL2 and Ubuntu installation
wsl --list --verbose

# 2. Verify Docker Desktop Windows execution
docker --version

# 3. Clone repository (inside WSL2; cloning into the Linux filesystem,
#    e.g. your home directory, is much faster than /mnt/c and avoids
#    permission issues with the shell scripts)
wsl
cd ~
git clone https://github.com/murtaza-nasir/maestro.git
cd maestro

# 4. Set permissions and start
chmod +x *.sh
./start.sh

# Or use PowerShell script
# .\start.ps1

5. Service Configuration and Startup

5.1 Basic Service Startup

# Start with automatic platform detection
./start.sh

# Or manually run Docker Compose
docker-compose up -d

# CPU-only mode
docker-compose -f docker-compose.cpu.yml up -d

5.2 Service Status Check

# Check container status
docker-compose ps

# Check logs
docker-compose logs -f backend
docker-compose logs -f frontend
docker-compose logs -f postgres
docker-compose logs -f searxng

# Check all logs
docker-compose logs -f

5.3 Database Initialization

# Check database status
./maestro-cli.sh reset-db --check

# Query database statistics
./maestro-cli.sh reset-db --stats

# Reset database with backup (if needed)
./maestro-cli.sh reset-db --backup

6. Web Interface Access and Initial Setup

6.1 First Access and Account Creation

# Access via browser
http://localhost:3000

# Or create admin account via CLI
./maestro-cli.sh create-user admin securepassword123 --admin

6.2 Basic Configuration

Navigate to the Settings menu in the web interface and configure:

# LLM Settings
Model Provider: OpenAI / Local LLM
API Key: [YOUR_API_KEY]
Model Name: gpt-4 / llama3.1:8b
Temperature: 0.7
Max Tokens: 4000

# Search Settings
Search Engine: SearXNG
Categories: 
  - General
  - Science
  - IT
  - News
Results per Query: 10

# Research Parameters
Planning Context: 200000
Max Documents: 50
Chunk Size: 1000
Overlap: 200
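
To make Chunk Size and Overlap concrete: documents are split into overlapping windows so that context straddling a boundary appears in two chunks. A minimal sketch of the resulting boundaries (purely illustrative; the real splitter works on tokens and document structure):

# Chunk boundaries for a 2,500-character document,
# Chunk Size = 1000, Overlap = 200
CHUNK_SIZE=1000; OVERLAP=200; DOC_LEN=2500; START=0
while [ "$START" -lt "$DOC_LEN" ]; do
  END=$((START + CHUNK_SIZE)); [ "$END" -gt "$DOC_LEN" ] && END=$DOC_LEN
  echo "chunk: chars ${START}-${END}"
  [ "$END" -eq "$DOC_LEN" ] && break
  START=$((END - OVERLAP))
done
# -> chars 0-1000, 800-1800, 1600-2500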

7. Local LLM Integration (Ollama)

7.1 Ollama Installation and Setup

# Install Ollama (Linux/macOS)
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download and run the installer from https://ollama.com/download

# Download models
ollama pull llama3.1:8b
ollama pull codellama:7b
ollama pull mistral:7b

# Start Ollama service
ollama serve
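
Before wiring MAESTRO to it, confirm the server is up and the models were pulled:

# List locally available models via Ollama's API
curl http://localhost:11434/api/tags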

7.2 MAESTRO and Ollama Integration

Modify the .env file as follows:

# Local LLM Settings
USE_LOCAL_LLM=true
LOCAL_LLM_BASE_URL=http://host.docker.internal:11434/v1
LOCAL_LLM_MODEL=llama3.1:8b
LOCAL_LLM_API_KEY=ollama

# Use OpenAI-compatible endpoint
LLM_PROVIDER=local
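
Note that host.docker.internal resolves automatically on Docker Desktop (macOS/Windows) but not on plain Linux, where the backend service needs an extra_hosts entry ("host.docker.internal:host-gateway") or the host's LAN IP instead. A quick reachability check from inside the container (the service name backend is an assumption; confirm with docker-compose ps):

# Ollama exposes an OpenAI-compatible model list at /v1/models
docker-compose exec backend curl -s http://host.docker.internal:11434/v1/models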

7.3 Integration Testing

# Test model via CLI
./maestro-cli.sh test-llm

# Or test directly with Python script
python << EOF
import requests

response = requests.post('http://localhost:11434/v1/chat/completions', 
    json={
        'model': 'llama3.1:8b',
        'messages': [{'role': 'user', 'content': 'Hello, MAESTRO!'}],
        'max_tokens': 100
    }
)
print(response.json())
EOF

8. SearXNG Search Engine Configuration

8.1 SearXNG Container Configuration Check

# Check SearXNG container status
docker-compose logs searxng

# Check configuration file
docker-compose exec searxng cat /etc/searxng/settings.yml

8.2 Search Categories Configuration

Customize SearXNG’s settings.yml file. The search: block below is standard; individual engines are enabled in the engines: list, and each engine already carries its category assignment (general, science, it, news, ...):

# searxng/settings.yml
search:
  safe_search: 0
  autocomplete: duckduckgo
  default_lang: ""
  formats:
    - html
    - json  # Required for MAESTRO integration

# Excerpt: enable the engines you want per category
engines:
  - name: google            # general
    disabled: false
  - name: bing              # general
    disabled: false
  - name: duckduckgo        # general
    disabled: false
  - name: arxiv             # science
    disabled: false
  - name: pubmed            # science
    disabled: false
  - name: semantic scholar  # science
    disabled: false
  - name: github            # it
    disabled: false
  - name: stackoverflow     # it
    disabled: false
  - name: google news       # news
    disabled: false
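
Restart the container so the new settings take effect:

docker-compose restart searxng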

8.3 Private Search Testing

# Test SearXNG API
curl "http://localhost:8080/search?q=artificial+intelligence&format=json&category=science"

# Test search in MAESTRO
# Web Interface > Settings > Search > Test Search button

9. Practical Usage Scenarios

9.1 Document Collection and Analysis

# Bulk document upload via CLI
./maestro-cli.sh ingest username ./research_documents

# Supported formats
# - PDF, DOCX, TXT, MD
# - Web URLs, arXiv papers
# - JSON, CSV data

# Monitor upload progress
./maestro-cli.sh status username

9.2 Research Project Creation

Create a new research project in the web interface:

# Research Configuration Example
Project Name: "AI Agent Architecture Analysis"
Research Question: "What are the latest trends in AI agent architectures?"
Scope: 
  - Academic papers (2023-2024)
  - Industry reports
  - Technical documentation
Output Format: "Comprehensive report with citations"

9.3 AI Agent Workflow Execution

# 1. Planning Phase
Research Agent -> Planning Context Analysis
              -> Outline Generation
              -> Resource Identification

# 2. Data Collection Phase  
Search Agent -> Web Search (SearXNG)
             -> Document Retrieval
             -> Content Extraction

# 3. Analysis Phase
Analysis Agent -> RAG-based Analysis
               -> Cross-reference Validation
               -> Insight Generation

# 4. Report Generation Phase
Report Agent -> Content Synthesis
             -> Citation Management
             -> Output Formatting
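
These phases run server-side and are streamed to the UI over WebSocket; they can also be followed in the backend logs. The grep pattern below is illustrative, since the exact log format is implementation-specific:

# Watch the agent pipeline progress through its phases
docker-compose logs -f backend | grep -iE "planning|search|analysis|report"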

10. Advanced Configuration and Optimization

10.1 GPU Memory Optimization

# Monitor GPU memory
nvidia-smi -l 1

# Memory usage optimization settings
# Add to .env file
MAX_GPU_MEMORY=8192  # In MB
BATCH_SIZE=32
CHUNK_OVERLAP=100

10.2 Multi-GPU Configuration

# GPU allocation per service
BACKEND_GPU_DEVICE=0
DOC_PROCESSOR_GPU_DEVICE=1
CLI_GPU_DEVICE=0

# Check GPU load balancing
nvidia-smi topo -m
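
To verify that each container actually sees only its assigned device, list the GPUs from inside each service (service names here are assumptions; check docker-compose ps for the real ones):

docker-compose exec backend nvidia-smi -L
docker-compose exec doc-processor nvidia-smi -L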

10.3 Performance Tuning

# PostgreSQL tuning: the official postgres image does not read tuning
# parameters from environment variables, so pass them as server flags
# in the docker-compose.yml postgres service (log_statement=all is very
# verbose; enable it only while debugging)
command: >
  postgres
  -c shared_preload_libraries=pg_stat_statements,auto_explain
  -c log_statement=all
  -c effective_cache_size=4GB
  -c shared_buffers=1GB

# pgvector index optimization (the documents table and embedding column
# are illustrative; match them to the actual schema)
docker-compose exec postgres psql -U maestro_user -d maestro_db
CREATE INDEX CONCURRENTLY idx_embeddings_cosine ON documents
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
-- Rule of thumb: lists is roughly rows / 1000; at query time, raise
-- ivfflat.probes to trade speed for recall:
SET ivfflat.probes = 10;

11. Troubleshooting Guide

11.1 Common Errors and Solutions

# 1. Port conflict error
Error: Port 3000 already in use
Solution: sudo lsof -i :3000; kill -9 <PID>

# 2. GPU memory shortage
CUDA out of memory
Solution: Set FORCE_CPU_MODE=true or adjust batch size

# 3. Database connection error
Connection refused to PostgreSQL
Solution: docker-compose restart postgres

# 4. Ollama connection failure
Local LLM connection failed
Solution: On Linux, add extra_hosts: "host.docker.internal:host-gateway"
to the backend service in docker-compose.yml, or use the host's actual IP

11.2 Debugging Tools Usage

# Enable detailed logging (set it in .env so Docker Compose passes it
# through to the containers)
echo "MAESTRO_LOG_LEVEL=DEBUG" >> .env
docker-compose up -d

# Access container internals
docker-compose exec backend bash
docker-compose exec postgres psql -U maestro_user -d maestro_db

# Health checks
curl http://localhost:8000/health
curl http://localhost:3000/health

11.3 Data Backup and Recovery

# Database backup
docker-compose exec postgres pg_dump -U maestro_user maestro_db > backup.sql

# Vector data backup (including pgvector extension)
docker-compose exec postgres pg_dump -U maestro_user -Fc maestro_db > backup.dump

# Recovery (plain SQL backup)
docker-compose exec -T postgres psql -U maestro_user -d maestro_db < backup.sql

# Recovery (custom-format dump created with -Fc)
docker-compose exec -T postgres pg_restore -U maestro_user -d maestro_db < backup.dump

12. Security Considerations

12.1 Authentication and Authorization Management

# Generate a strong JWT secret and write it into .env
JWT_SECRET=$(openssl rand -hex 32)
sed -i "s/^JWT_SECRET_KEY=.*/JWT_SECRET_KEY=${JWT_SECRET}/" .env

# User accounts (use strong passwords; the --admin flag matches section 6.1)
./maestro-cli.sh create-user researcher 'Str0ng-Research-Pass!'
./maestro-cli.sh create-user admin 'Str0ng-Admin-Pass!' --admin

# API key rotation
./maestro-cli.sh rotate-keys

12.2 Network Security

# Firewall configuration (Ubuntu/Debian)
sudo ufw allow from 192.168.1.0/24 to any port 3000
sudo ufw allow from 192.168.1.0/24 to any port 8000

# Reverse Proxy configuration (Nginx)
# nginx/maestro.conf
server {
    listen 443 ssl;
    server_name maestro.yourdomain.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass http://localhost:3000;
        # Standard WebSocket upgrade headers
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }

    location /api {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
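
Validate the configuration and reload nginx after editing:

sudo nginx -t && sudo systemctl reload nginx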

13. Monitoring and Maintenance

13.1 System Monitoring

# Resource usage monitoring
docker stats maestro_backend maestro_frontend maestro_postgres

# Log rotation configuration
# Add to docker-compose.yml
logging:
  driver: json-file
  options:
    max-size: "100m"
    max-file: "3"

# Automatic health checks
#!/bin/bash
# healthcheck.sh
curl -f http://localhost:8000/health || exit 1
curl -f http://localhost:3000/ || exit 1

13.2 Regular Maintenance

#!/bin/bash
# weekly_maintenance.sh

# 1. Update containers
docker-compose pull
docker-compose up -d

# 2. Database cleanup
./maestro-cli.sh cleanup-orphaned-docs

# 3. Log compression
find /var/log/maestro -name "*.log" -mtime +7 -exec gzip {} \;

# 4. System status report
./maestro-cli.sh system-report > /var/log/maestro/weekly_report_$(date +%Y%m%d).txt
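
Both scripts can be scheduled with cron; the paths below are placeholders for wherever you keep them:

# crontab -e
*/5 * * * * /opt/maestro/healthcheck.sh
0 3 * * 1 /opt/maestro/weekly_maintenance.sh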

14. Extension and Customization

14.1 Custom AI Agent Development

# maestro_backend/agents/custom_agent.py
from maestro_backend.core.agent_base import BaseAgent

class CustomResearchAgent(BaseAgent):
    def __init__(self, config):
        super().__init__(config)
        self.specialty = "domain_specific_research"
    
    async def process_request(self, request):
        """Implement custom research logic"""
        results = await self.search_documents(request.query)
        analysis = await self.analyze_with_llm(results)
        return await self.generate_report(analysis)
    
    async def search_documents(self, query):
        """Domain-specific search logic"""
        # Replace with real retrieval; returning an empty list keeps
        # the agent runnable until then
        return []

14.2 API Extension

# maestro_backend/api/custom_endpoints.py
from fastapi import APIRouter, Depends
from pydantic import BaseModel

from maestro_backend.core.auth import get_current_user
from maestro_backend.agents.custom_agent import CustomResearchAgent

class DomainResearchRequest(BaseModel):
    """Request body for the custom endpoint."""
    query: str
    max_documents: int = 20

router = APIRouter(prefix="/api/custom", tags=["custom"])

@router.post("/domain-research")
async def domain_research(
    request: DomainResearchRequest,
    current_user=Depends(get_current_user)
):
    """Custom domain research endpoint"""
    agent = CustomResearchAgent(config={})  # substitute the app's real settings
    results = await agent.process_request(request)
    return {"results": results, "status": "completed"}

# Register the router with the FastAPI app, e.g. app.include_router(router)

15. Troubleshooting Checklist

15.1 Post-Installation Checklist

  • All Docker containers running (docker-compose ps)
  • Ports 3000, 8000, 5432, 8080 accessible
  • Database connection normal (./maestro-cli.sh reset-db --check)
  • LLM API connection test passed
  • Web interface login available
  • Search functionality working normally

15.2 Performance Optimization Checklist

  • GPU memory usage monitoring
  • PostgreSQL index optimization
  • SearXNG response speed verification
  • Document processing batch size adjustment
  • Cache configuration verification

16. Conclusion

MAESTRO is a powerful platform that introduces a new paradigm for AI-powered research automation. This tutorial has covered everything from basic installation to advanced configuration and day-to-day operation.

Key Achievements

  • Complete self-hosted environment setup
  • AI agent-based research workflow implementation
  • Local LLM and private search engine integration
  • Scalable architecture understanding

Next Steps

  1. Advanced AI Agent Development: Implement domain-specific research agents
  2. Enterprise Environment Deployment: Consider Kubernetes cluster deployment
  3. API Integration: Expand integration with existing research tools
  4. Community Contribution: Participate in MAESTRO open-source project

Experience revolutionary research productivity improvements with MAESTRO and explore the infinite possibilities of AI agents! 🚀

