⏱️ Estimated Reading Time: 15 minutes

Introduction to UltraRAG

UltraRAG 2.0 is a low-code RAG framework from OpenBMB built on MCP (the Model Context Protocol). With the motto “Less Code, Lower Barrier, Faster Deployment”, UltraRAG lets researchers and developers assemble complex RAG pipelines with minimal coding effort.

Key Features

  • Low-Code Framework: Build sophisticated RAG systems with YAML configuration files
  • MCP Integration: Pipeline components communicate through the Model Context Protocol, so retrievers, generators, and custom tools interoperate over a standard interface
  • Extensive Dataset Support: Built-in support for 17+ popular evaluation datasets
  • Multiple Baseline Methods: Pre-implemented state-of-the-art RAG approaches
  • Docker Support: Easy deployment and containerized environments
  • Modular Architecture: Flexible pipeline components for customization

Supported Baseline Methods

UltraRAG comes with pre-implemented advanced RAG methods:

  • Vanilla RAG: Standard single-pass retrieve-then-generate
  • IRCoT: Interleaving Retrieval with Chain-of-Thought reasoning
  • IterRetGen: Iterative Retrieval-Generation over multiple rounds
  • RankCoT: Knowledge refinement through ranking Chain-of-Thought candidates
  • R1-Searcher: Reinforcement-learning-trained retrieval-augmented reasoning
  • Search-o1: Agentic, search-enhanced reasoning for large reasoning models
  • Search-R1: Reinforcement learning that teaches the model to interleave reasoning with search calls
  • WebNote: Web search with note-taking integration

Prerequisites

Before starting, ensure your system meets these requirements:

# System Requirements
- Python 3.9+
- CUDA support (optional, for GPU acceleration)
- Docker (optional, for containerized deployment)
- Git for repository cloning

Installation Guide

UltraRAG can be installed either locally with the uv package manager or in a Docker container.

Method 1: Local Installation with uv

UltraRAG uses the uv package manager for fast dependency resolution and installation.

Step 1: Install the uv Package Manager

# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Refresh your shell or run:
source ~/.bashrc  # or ~/.zshrc for zsh users

Step 2: Clone the Repository

# Clone UltraRAG repository
git clone https://github.com/OpenBMB/UltraRAG.git
cd UltraRAG

Step 3: Install Dependencies

Choose the installation option that best fits your needs:

# Basic installation (minimal dependencies)
uv pip install -e .

# For LLM hosting support (includes vLLM)
uv pip install -e ".[vllm]"

# For document parsing capabilities
uv pip install -e ".[corpus]"

# Complete installation (all features except FAISS)
uv pip install -e ".[all]"

Step 4: Verify Installation

# Test the installation
ultrarag run examples/sayhello.yaml

Expected Output:

Hello, UltraRAG 2.0!
Welcome to the advanced RAG framework!

Method 2: Docker Installation

For containerized deployment and easier environment management:

Step 1: Build Docker Image

# Clone and navigate to UltraRAG directory
git clone https://github.com/OpenBMB/UltraRAG.git
cd UltraRAG

# Build the Docker image
docker build -t ultrarag:v2.0.0-beta .

Step 2: Run Interactive Container

# Start interactive Docker container with GPU support
docker run -it --rm --gpus all ultrarag:v2.0.0-beta bash

# Inside the container, verify installation
ultrarag run examples/sayhello.yaml

Basic Usage: Your First RAG Pipeline

Understanding UltraRAG Workflow

UltraRAG follows a simple three-step process:

  1. Compile Pipeline: Generate parameter configuration from YAML
  2. Modify Parameters: Customize settings as needed
  3. Run Pipeline: Execute the configured RAG system

Example 1: Basic Vanilla RAG

Let’s start with a simple vanilla RAG implementation:

Step 1: Examine the Configuration

# View the basic RAG example
cat examples/rag.yaml

Step 2: Compile the Pipeline

# Generate configuration parameters
ultrarag compile examples/rag.yaml

This creates a rag_params.yaml file with all configurable parameters.

Step 3: Customize Parameters (Optional)

# Edit the generated parameters file
nano rag_params.yaml

# Key parameters to customize:
# - model_name: LLM model to use
# - retriever_name: Embedding model for retrieval
# - corpus_path: Path to your document corpus
# - dataset_path: Evaluation dataset location
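
If you prefer to adjust parameters programmatically rather than in an editor, a minimal PyYAML sketch like the one below also works. The key names are taken from the comments above and are assumptions; inspect the generated rag_params.yaml for the actual (possibly nested) structure.

# edit_params.py -- tweak the generated parameter file (sketch; key names are assumptions)
import yaml

with open("rag_params.yaml") as f:
    params = yaml.safe_load(f)

# Hypothetical flat keys for illustration; match them to what `ultrarag compile` produced.
params["model_name"] = "Qwen/Qwen2.5-7B-Instruct"
params["retriever_name"] = "BAAI/bge-m3"
params["corpus_path"] = "data/my_corpus"

with open("rag_params.yaml", "w") as f:
    yaml.safe_dump(params, f, sort_keys=False)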

Step 4: Run the Pipeline

# Execute the RAG pipeline
ultrarag run examples/rag.yaml

Example 2: Advanced RAG with Chain-of-Thought

Let’s implement a more sophisticated RAG system using IRCoT (Interleaving Retrieval with Chain-of-Thought):

# Compile IRCoT pipeline
ultrarag compile examples/IRCoT.yaml

# Run IRCoT RAG system
ultrarag run examples/IRCoT.yaml

Working with Datasets and Corpora

Supported Evaluation Datasets

UltraRAG provides built-in support for 17 popular evaluation datasets, including:

Dataset Type        Dataset Name              Original Size   Evaluation Sample
QA                  Natural Questions (NQ)    3,610           1,000
QA                  TriviaQA                  11,313          1,000
QA                  PopQA                     14,267          1,000
Multi-hop QA        HotpotQA                  7,405           1,000
Multi-hop QA        2WikiMultiHopQA           12,576          1,000
Multiple-choice     ARC                       3,548           1,000
Multiple-choice     MMLU                      14,042          1,000
Long-form QA        ASQA                      948             948
Fact-verification   FEVER                     13,332          1,000

Using Custom Datasets

To use your own dataset, follow the UltraRAG data format:

{
  "id": "unique_question_id",
  "question": "Your question text",
  "answers": ["answer1", "answer2"],
  "context": "optional_context_information"
}
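
A few lines of Python are enough to produce records in this shape. Whether the loader expects one JSON object per line (JSONL) or a single JSON array is an assumption here, so check the bundled datasets for the exact file layout; the field names below simply mirror the schema above.

# build_dataset.py -- write custom QA examples in the format shown above (sketch)
import json

examples = [
    {
        "id": "q-0001",
        "question": "Who wrote 'Pride and Prejudice'?",
        "answers": ["Jane Austen"],
        "context": "",  # optional
    },
]

# Assumes one JSON object per line (JSONL); adjust if the loader expects a JSON array.
with open("data/my_dataset.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")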

Setting Up Document Corpus

Option 1: Use Pre-built Corpus

# Download wiki-2018 corpus (21M+ documents)
# Follow the dataset download instructions in the documentation

Option 2: Create Custom Corpus

# Create your document corpus directory
mkdir -p data/my_corpus

# Add your documents (text files)
# UltraRAG supports various formats: .txt, .pdf, .docx
cp your_documents/* data/my_corpus/
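
If your sources are plain text, you can also pre-chunk documents yourself before indexing. The sketch below splits .txt files into overlapping passages and writes them as JSONL; the id/contents field names and the JSONL layout are illustrative assumptions, so align them with the corpus format described in the UltraRAG documentation.

# build_corpus.py -- split .txt documents into passages (sketch; schema is hypothetical)
import json
from pathlib import Path

CHUNK_SIZE = 512   # characters per passage
OVERLAP = 50       # characters shared between consecutive passages

records = []
for path in sorted(Path("data/my_corpus").glob("*.txt")):
    text = path.read_text(encoding="utf-8")
    start, idx = 0, 0
    while start < len(text):
        records.append({"id": f"{path.stem}-{idx}", "contents": text[start:start + CHUNK_SIZE]})
        start += CHUNK_SIZE - OVERLAP
        idx += 1

with open("data/my_corpus.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")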

Deploying Retrievers and LLMs

Setting Up a Retriever Server

UltraRAG can deploy retrieval services for corpus indexing and search:

# Start retriever server
ultrarag serve retriever \
  --model_name BAAI/bge-m3 \
  --corpus_path data/wiki-2018 \
  --port 8001
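
Once the retriever server is running you can query it over HTTP. The route and payload below are hypothetical placeholders for illustration only; confirm the service's actual API (for example, from its startup logs or the project documentation) before building on it.

# query_retriever.py -- call the retriever service over HTTP (sketch; route and payload are hypothetical)
import requests

resp = requests.post(
    "http://localhost:8001/search",                      # hypothetical route
    json={"query": "Who founded OpenBMB?", "top_k": 5},  # hypothetical payload
    timeout=30,
)
resp.raise_for_status()
print(resp.json())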

Deploying LLM Services

Deploy language models using vLLM backend:

# Deploy OpenAI-compatible LLM server
ultrarag serve llm \
  --model_name Qwen/Qwen2.5-72B-Instruct \
  --port 8000 \
  --gpu_memory_utilization 0.8
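
Because the server is OpenAI-compatible, any OpenAI client can talk to it by pointing base_url at the local port. Here is a minimal sketch with the official openai Python package; the api_key value is a placeholder, since a local vLLM server usually does not validate it.

# query_llm.py -- call the locally served, OpenAI-compatible LLM (sketch)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",
    messages=[{"role": "user", "content": "Summarize retrieval-augmented generation in one sentence."}],
    temperature=0.1,
)
print(response.choices[0].message.content)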

Using External API Services

Configure UltraRAG to use external APIs:

# In your pipeline configuration
llm:
  provider: "openai"
  api_key: "your_api_key"
  model: "gpt-4"
  base_url: "https://api.openai.com/v1"

retriever:
  provider: "custom"
  endpoint: "http://your-retriever-service:8001"

Advanced Configuration Examples

Example 3: Multi-Step Reasoning with IterRetGen

# examples/advanced_iterretgen.yaml
pipeline:
  name: "iterative_retrieval_generation"
  components:
    - retriever:
        model: "BAAI/bge-m3"
        top_k: 10
        iterations: 3
    - llm:
        model: "Qwen/Qwen2.5-72B-Instruct"
        temperature: 0.1
        max_tokens: 1024
    - reasoning:
        strategy: "iterative"
        max_iterations: 5
        convergence_threshold: 0.95

evaluation:
  dataset: "hotpotqa"
  metrics: ["exact_match", "f1_score", "retrieval_precision"]

Example 4: Custom Search Strategy

# examples/custom_search.yaml
pipeline:
  name: "custom_search_rag"
  components:
    - search:
        strategy: "search-o1"
        search_depth: 3
        query_expansion: true
        rerank_threshold: 0.7
    - retriever:
        model: "sentence-transformers/all-MiniLM-L6-v2"
        chunk_size: 512
        chunk_overlap: 50
    - generator:
        model: "microsoft/DialoGPT-large"
        response_length: "medium"

Performance Optimization

GPU Acceleration Setup

# Verify CUDA availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

# Configure GPU settings in your pipeline
gpu_config:
  enabled: true
  device_map: "auto"
  memory_fraction: 0.8
  mixed_precision: true

Memory Optimization

# In your pipeline configuration
optimization:
  batch_size: 8
  gradient_checkpointing: true
  cpu_offload: true
  memory_efficient_attention: true

Debugging and Troubleshooting

Common Issues and Solutions

Issue 1: CUDA Out of Memory

# Solution: Reduce batch size or use CPU offload
# In your configuration:
optimization:
  batch_size: 2
  cpu_offload: true

Issue 2: Slow Retrieval Performance

# Solution: Use approximate search or reduce corpus size
retriever:
  search_type: "approximate"
  index_type: "faiss"
  nprobe: 10
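
To make these settings concrete, here is a generic FAISS sketch (independent of UltraRAG's internals) showing how an IVF index trades accuracy for speed through nlist and nprobe:

# faiss_ivf_demo.py -- generic approximate (IVF) search example, not UltraRAG-specific
import numpy as np
import faiss

d, n = 768, 100_000                     # embedding dimension, number of passages
xb = np.random.random((n, d)).astype("float32")

nlist = 1024                            # number of coarse clusters
quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
index.train(xb)                         # IVF indexes must be trained before adding vectors
index.add(xb)

index.nprobe = 10                       # clusters visited per query; higher = slower but more accurate
xq = np.random.random((1, d)).astype("float32")
scores, ids = index.search(xq, 5)
print(ids, scores)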

Issue 3: Model Loading Errors

# Solution: Check model availability and permissions
# Verify model download:
huggingface-cli download Qwen/Qwen2.5-7B-Instruct

Debug Mode

Enable detailed logging for troubleshooting:

# Run with debug output
ultrarag run examples/rag.yaml --debug --verbose

# Check logs
tail -f logs/ultrarag.log

Integration with Other Tools

Jupyter Notebook Integration

# In Jupyter notebook
import ultrarag

# Load and run pipeline
pipeline = ultrarag.load_pipeline("examples/rag.yaml")
results = pipeline.run(question="What is machine learning?")
print(results)

API Integration

# RESTful API usage
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "ultrarag",
        "messages": [{"role": "user", "content": "Explain quantum computing"}],
        "rag_enabled": True
    }
)

Best Practices

1. Configuration Management

  • Use version control for your pipeline configurations
  • Maintain separate configs for development and production
  • Document custom parameters and their effects

2. Data Preparation

  • Ensure consistent document formatting
  • Implement proper text preprocessing
  • Use appropriate chunk sizes for your domain

3. Evaluation Strategy

  • Establish baseline metrics before optimization (see the metrics sketch after this list)
  • Use multiple evaluation datasets
  • Implement A/B testing for configuration changes
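
For a concrete starting point on answer-level metrics, a minimal SQuAD-style exact-match/F1 implementation looks like the sketch below (a generic reference implementation, not UltraRAG's built-in evaluator):

# metrics.py -- minimal exact-match and token-level F1 (generic sketch)
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, answer: str) -> float:
    return float(normalize(prediction) == normalize(answer))

def f1_score(prediction: str, answer: str) -> float:
    pred, gold = normalize(prediction).split(), normalize(answer).split()
    common = Counter(pred) & Counter(gold)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Jane Austen", "jane austen"))        # 1.0
print(f1_score("Jane Austen wrote it", "Jane Austen"))  # ~0.67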

4. Resource Management

  • Monitor GPU memory usage (see the monitoring sketch after this list)
  • Implement proper caching strategies
  • Use batch processing for large-scale evaluation
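
For GPU monitoring, a lightweight sketch using PyTorch's CUDA memory counters (these only track memory allocated by the current process, so use nvidia-smi for a whole-machine view):

# gpu_monitor.py -- report per-device CUDA memory usage for the current process (sketch)
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        total = torch.cuda.get_device_properties(i).total_memory
        allocated = torch.cuda.memory_allocated(i)
        peak = torch.cuda.max_memory_allocated(i)
        print(
            f"GPU {i}: {allocated / 1e9:.2f} GB allocated "
            f"(peak {peak / 1e9:.2f} GB) of {total / 1e9:.2f} GB total"
        )
else:
    print("CUDA is not available; nothing to monitor.")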

Production Deployment

Docker Compose Setup

# docker-compose.yml
version: '3.8'
services:
  ultrarag-api:
    image: ultrarag:v2.0.0-beta
    ports:
      - "8000:8000"
    environment:
      - CUDA_VISIBLE_DEVICES=0
    volumes:
      - ./data:/app/data
      - ./configs:/app/configs
    command: ultrarag serve api --config configs/production.yaml

  retriever:
    image: ultrarag:v2.0.0-beta
    ports:
      - "8001:8001"
    command: ultrarag serve retriever --port 8001

Kubernetes Deployment

# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ultrarag-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ultrarag
  template:
    metadata:
      labels:
        app: ultrarag
    spec:
      containers:
      - name: ultrarag
        image: ultrarag:v2.0.0-beta
        ports:
        - containerPort: 8000
        resources:
          requests:
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1

Conclusion

UltraRAG 2.0 represents a significant advancement in making RAG systems accessible to researchers and developers. With its low-code approach, extensive baseline support, and flexible architecture, you can rapidly prototype and deploy sophisticated RAG applications.

Key Takeaways

  • Easy Setup: Quick installation with uv or Docker
  • Rich Ecosystem: 17+ datasets and multiple baseline methods
  • Flexible Configuration: YAML-based pipeline definition
  • Production Ready: Docker and Kubernetes deployment options
  • Research Friendly: Built-in evaluation and debugging tools

Next Steps

  1. Experiment with different baseline methods
  2. Test on your domain-specific datasets
  3. Customize retrievers and generators for your use case
  4. Deploy in production with monitoring and scaling

For more advanced features and the latest updates, visit the official UltraRAG documentation and GitHub repository.


This tutorial covered the essential aspects of UltraRAG 2.0. For specific implementation details and advanced configurations, refer to the official documentation and example repositories.