⏱️ Estimated Reading Time: 15 minutes

Introduction to UltraRAG

UltraRAG 2.0 is a low-code RAG framework from OpenBMB built on MCP (the Model Context Protocol). With the motto “Less Code, Lower Barrier, Faster Deployment”, UltraRAG lets researchers and developers assemble complex RAG pipelines with minimal coding effort.

Key Features

  • Low-Code Framework: Build sophisticated RAG systems with YAML configuration files
  • MCP Integration: Pipeline components communicate through the Model Context Protocol, so retrievers, generators, and custom tools interoperate over a standard interface
  • Extensive Dataset Support: Built-in support for 17+ popular evaluation datasets
  • Multiple Baseline Methods: Pre-implemented state-of-the-art RAG approaches
  • Docker Support: Easy deployment and containerized environments
  • Modular Architecture: Flexible pipeline components for customization

Supported Baseline Methods

UltraRAG comes with pre-implemented advanced RAG methods:

  • Vanilla RAG: Standard single-pass retrieve-then-generate
  • IRCoT: Interleaving Retrieval with Chain-of-Thought reasoning
  • IterRetGen: Iterative Retrieval-Generation over multiple rounds
  • RankCoT: Knowledge refinement through ranking Chain-of-Thought candidates
  • R1-Searcher: Reinforcement-learning-trained retrieval-augmented reasoning
  • Search-o1: Agentic, search-enhanced reasoning for large reasoning models
  • Search-R1: Reinforcement learning that teaches the model to interleave reasoning with search calls
  • WebNote: Web search with note-taking integration

Prerequisites

Before starting, ensure your system meets these requirements:

# System Requirements
- Python 3.9+
- CUDA support (optional, for GPU acceleration)
- Docker (optional, for containerized deployment)
- Git for repository cloning

Installation Guide

UltraRAG can be installed either locally with the uv package manager or in a Docker container.

Method 1: Local Installation with uv

UltraRAG uses the uv package manager for fast dependency resolution and installation.

Step 1: Install the uv Package Manager

# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Refresh your shell or run:
source ~/.bashrc  # or ~/.zshrc for zsh users

Step 2: Clone the Repository

# Clone UltraRAG repository
git clone https://github.com/OpenBMB/UltraRAG.git
cd UltraRAG

Step 3: Install Dependencies

Choose the installation option that best fits your needs:

# Basic installation (minimal dependencies)
uv pip install -e .

# For LLM hosting support (includes vLLM)
uv pip install -e ".[vllm]"

# For document parsing capabilities
uv pip install -e ".[corpus]"

# Complete installation (all features except FAISS)
uv pip install -e ".[all]"

Step 4: Verify Installation

# Test the installation
ultrarag run examples/sayhello.yaml

Expected Output:

Hello, UltraRAG 2.0!
Welcome to the advanced RAG framework!

Method 2: Docker Installation

For containerized deployment and easier environment management:

Step 1: Build Docker Image

# Clone and navigate to UltraRAG directory
git clone https://github.com/OpenBMB/UltraRAG.git
cd UltraRAG

# Build the Docker image
docker build -t ultrarag:v2.0.0-beta .

Step 2: Run Interactive Container

# Start interactive Docker container with GPU support
docker run -it --rm --gpus all ultrarag:v2.0.0-beta bash

# Inside the container, verify installation
ultrarag run examples/sayhello.yaml

Basic Usage: Your First RAG Pipeline

Understanding UltraRAG Workflow

UltraRAG follows a simple three-step process:

  1. Compile Pipeline: Generate parameter configuration from YAML
  2. Modify Parameters: Customize settings as needed
  3. Run Pipeline: Execute the configured RAG system

Example 1: Basic Vanilla RAG

Let’s start with a simple vanilla RAG implementation:

Step 1: Examine the Configuration

# View the basic RAG example
cat examples/rag.yaml

Step 2: Compile the Pipeline

# Generate configuration parameters
ultrarag compile examples/rag.yaml

This creates a rag_params.yaml file with all configurable parameters.

Step 3: Customize Parameters (Optional)

# Edit the generated parameters file
nano rag_params.yaml

# Key parameters to customize:
# - model_name: LLM model to use
# - retriever_name: Embedding model for retrieval
# - corpus_path: Path to your document corpus
# - dataset_path: Evaluation dataset location
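
If you prefer to adjust parameters programmatically rather than in an editor, a minimal PyYAML sketch like the one below also works. The key names are taken from the comments above and are assumptions; inspect the generated rag_params.yaml for the actual (possibly nested) structure.

# edit_params.py -- tweak the generated parameter file (sketch; key names are assumptions)
import yaml

with open("rag_params.yaml") as f:
    params = yaml.safe_load(f)

# Hypothetical flat keys for illustration; match them to what `ultrarag compile` produced.
params["model_name"] = "Qwen/Qwen2.5-7B-Instruct"
params["retriever_name"] = "BAAI/bge-m3"
params["corpus_path"] = "data/my_corpus"

with open("rag_params.yaml", "w") as f:
    yaml.safe_dump(params, f, sort_keys=False)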

Step 4: Run the Pipeline

# Execute the RAG pipeline
ultrarag run examples/rag.yaml

Example 2: Advanced RAG with Chain-of-Thought

Let’s implement a more sophisticated RAG system using IRCoT (Interleaving Retrieval with Chain-of-Thought):

# Compile IRCoT pipeline
ultrarag compile examples/IRCoT.yaml

# Run IRCoT RAG system
ultrarag run examples/IRCoT.yaml

Working with Datasets and Corpora

Supported Evaluation Datasets

UltraRAG provides built-in support for 17 popular evaluation datasets, including:

Dataset Type        Dataset Name              Original Size   Evaluation Sample
QA                  Natural Questions (NQ)    3,610           1,000
QA                  TriviaQA                  11,313          1,000
QA                  PopQA                     14,267          1,000
Multi-hop QA        HotpotQA                  7,405           1,000
Multi-hop QA        2WikiMultiHopQA           12,576          1,000
Multiple-choice     ARC                       3,548           1,000
Multiple-choice     MMLU                      14,042          1,000
Long-form QA        ASQA                      948             948
Fact-verification   FEVER                     13,332          1,000

Using Custom Datasets

To use your own dataset, follow the UltraRAG data format:

{
  "id": "unique_question_id",
  "question": "Your question text",
  "answers": ["answer1", "answer2"],
  "context": "optional_context_information"
}
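
A few lines of Python are enough to produce records in this shape. Whether the loader expects one JSON object per line (JSONL) or a single JSON array is an assumption here, so check the bundled datasets for the exact file layout; the field names below simply mirror the schema above.

# build_dataset.py -- write custom QA examples in the format shown above (sketch)
import json

examples = [
    {
        "id": "q-0001",
        "question": "Who wrote 'Pride and Prejudice'?",
        "answers": ["Jane Austen"],
        "context": "",  # optional
    },
]

# Assumes one JSON object per line (JSONL); adjust if the loader expects a JSON array.
with open("data/my_dataset.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")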

Setting Up Document Corpus

Option 1: Use Pre-built Corpus

# Download wiki-2018 corpus (21M+ documents)
# Follow the dataset download instructions in the documentation

Option 2: Create Custom Corpus

# Create your document corpus directory
mkdir -p data/my_corpus

# Add your documents (text files)
# UltraRAG supports various formats: .txt, .pdf, .docx
cp your_documents/* data/my_corpus/
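
If your sources are plain text, you can also pre-chunk documents yourself before indexing. The sketch below splits .txt files into overlapping passages and writes them as JSONL; the id/contents field names and the JSONL layout are illustrative assumptions, so align them with the corpus format described in the UltraRAG documentation.

# build_corpus.py -- split .txt documents into passages (sketch; schema is hypothetical)
import json
from pathlib import Path

CHUNK_SIZE = 512   # characters per passage
OVERLAP = 50       # characters shared between consecutive passages

records = []
for path in sorted(Path("data/my_corpus").glob("*.txt")):
    text = path.read_text(encoding="utf-8")
    start, idx = 0, 0
    while start < len(text):
        records.append({"id": f"{path.stem}-{idx}", "contents": text[start:start + CHUNK_SIZE]})
        start += CHUNK_SIZE - OVERLAP
        idx += 1

with open("data/my_corpus.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")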

Deploying Retrievers and LLMs

Setting Up a Retriever Server

UltraRAG can deploy retrieval services for corpus indexing and search:

# Start retriever server
ultrarag serve retriever \
  --model_name BAAI/bge-m3 \
  --corpus_path data/wiki-2018 \
  --port 8001
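
Once the retriever server is running you can query it over HTTP. The route and payload below are hypothetical placeholders for illustration only; confirm the service's actual API (for example, from its startup logs or the project documentation) before building on it.

# query_retriever.py -- call the retriever service over HTTP (sketch; route and payload are hypothetical)
import requests

resp = requests.post(
    "http://localhost:8001/search",                      # hypothetical route
    json={"query": "Who founded OpenBMB?", "top_k": 5},  # hypothetical payload
    timeout=30,
)
resp.raise_for_status()
print(resp.json())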

Deploying LLM Services

Deploy language models using vLLM backend:

# Deploy OpenAI-compatible LLM server
ultrarag serve llm \
  --model_name Qwen/Qwen2.5-72B-Instruct \
  --port 8000 \
  --gpu_memory_utilization 0.8
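
Because the server is OpenAI-compatible, any OpenAI client can talk to it by pointing base_url at the local port. Here is a minimal sketch with the official openai Python package; the api_key value is a placeholder, since a local vLLM server usually does not validate it.

# query_llm.py -- call the locally served, OpenAI-compatible LLM (sketch)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",
    messages=[{"role": "user", "content": "Summarize retrieval-augmented generation in one sentence."}],
    temperature=0.1,
)
print(response.choices[0].message.content)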

Using External API Services

Configure UltraRAG to use external APIs:

# In your pipeline configuration
llm:
  provider: "openai"
  api_key: "your_api_key"
  model: "gpt-4"
  base_url: "https://api.openai.com/v1"

retriever:
  provider: "custom"
  endpoint: "http://your-retriever-service:8001"

Advanced Configuration Examples

Example 3: Multi-Step Reasoning with IterRetGen

# examples/advanced_iterretgen.yaml
pipeline:
  name: "iterative_retrieval_generation"
  components:
    - retriever:
        model: "BAAI/bge-m3"
        top_k: 10
        iterations: 3
    - llm:
        model: "Qwen/Qwen2.5-72B-Instruct"
        temperature: 0.1
        max_tokens: 1024
    - reasoning:
        strategy: "iterative"
        max_iterations: 5
        convergence_threshold: 0.95

evaluation:
  dataset: "hotpotqa"
  metrics: ["exact_match", "f1_score", "retrieval_precision"]

Example 4: Custom Search Strategy

# examples/custom_search.yaml
pipeline:
  name: "custom_search_rag"
  components:
    - search:
        strategy: "search-o1"
        search_depth: 3
        query_expansion: true
        rerank_threshold: 0.7
    - retriever:
        model: "sentence-transformers/all-MiniLM-L6-v2"
        chunk_size: 512
        chunk_overlap: 50
    - generator:
        model: "microsoft/DialoGPT-large"
        response_length: "medium"

Performance Optimization

GPU Acceleration Setup

# Verify CUDA availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

# Configure GPU settings in your pipeline
gpu_config:
  enabled: true
  device_map: "auto"
  memory_fraction: 0.8
  mixed_precision: true

Memory Optimization

# In your pipeline configuration
optimization:
  batch_size: 8
  gradient_checkpointing: true
  cpu_offload: true
  memory_efficient_attention: true

Debugging and Troubleshooting

Common Issues and Solutions

Issue 1: CUDA Out of Memory

# Solution: Reduce batch size or use CPU offload
# In your configuration:
optimization:
  batch_size: 2
  cpu_offload: true

Issue 2: Slow Retrieval Performance

# Solution: Use approximate search or reduce corpus size
retriever:
  search_type: "approximate"
  index_type: "faiss"
  nprobe: 10
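
To make these settings concrete, here is a generic FAISS sketch (independent of UltraRAG's internals) showing how an IVF index trades accuracy for speed through nlist and nprobe:

# faiss_ivf_demo.py -- generic approximate (IVF) search example, not UltraRAG-specific
import numpy as np
import faiss

d, n = 768, 100_000                     # embedding dimension, number of passages
xb = np.random.random((n, d)).astype("float32")

nlist = 1024                            # number of coarse clusters
quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
index.train(xb)                         # IVF indexes must be trained before adding vectors
index.add(xb)

index.nprobe = 10                       # clusters visited per query; higher = slower but more accurate
xq = np.random.random((1, d)).astype("float32")
scores, ids = index.search(xq, 5)
print(ids, scores)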

Issue 3: Model Loading Errors

# Solution: Check model availability and permissions
# Verify model download:
huggingface-cli download Qwen/Qwen2.5-7B-Instruct

Debug Mode

Enable detailed logging for troubleshooting:

# Run with debug output
ultrarag run examples/rag.yaml --debug --verbose

# Check logs
tail -f logs/ultrarag.log

Integration with Other Tools

Jupyter Notebook Integration

# In Jupyter notebook
import ultrarag

# Load and run pipeline
pipeline = ultrarag.load_pipeline("examples/rag.yaml")
results = pipeline.run(question="What is machine learning?")
print(results)

API Integration

# RESTful API usage
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "ultrarag",
        "messages": [{"role": "user", "content": "Explain quantum computing"}],
        "rag_enabled": True
    }
)

Best Practices

1. Configuration Management

  • Use version control for your pipeline configurations
  • Maintain separate configs for development and production
  • Document custom parameters and their effects

2. Data Preparation

  • Ensure consistent document formatting
  • Implement proper text preprocessing
  • Use appropriate chunk sizes for your domain

3. Evaluation Strategy

  • Establish baseline metrics before optimization (see the metrics sketch after this list)
  • Use multiple evaluation datasets
  • Implement A/B testing for configuration changes
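
For a concrete starting point on answer-level metrics, a minimal SQuAD-style exact-match/F1 implementation looks like the sketch below (a generic reference implementation, not UltraRAG's built-in evaluator):

# metrics.py -- minimal exact-match and token-level F1 (generic sketch)
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, answer: str) -> float:
    return float(normalize(prediction) == normalize(answer))

def f1_score(prediction: str, answer: str) -> float:
    pred, gold = normalize(prediction).split(), normalize(answer).split()
    common = Counter(pred) & Counter(gold)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Jane Austen", "jane austen"))        # 1.0
print(f1_score("Jane Austen wrote it", "Jane Austen"))  # ~0.67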

4. Resource Management

  • Monitor GPU memory usage (see the monitoring sketch after this list)
  • Implement proper caching strategies
  • Use batch processing for large-scale evaluation
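
For GPU monitoring, a lightweight sketch using PyTorch's CUDA memory counters (these only track memory allocated by the current process, so use nvidia-smi for a whole-machine view):

# gpu_monitor.py -- report per-device CUDA memory usage for the current process (sketch)
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        total = torch.cuda.get_device_properties(i).total_memory
        allocated = torch.cuda.memory_allocated(i)
        peak = torch.cuda.max_memory_allocated(i)
        print(
            f"GPU {i}: {allocated / 1e9:.2f} GB allocated "
            f"(peak {peak / 1e9:.2f} GB) of {total / 1e9:.2f} GB total"
        )
else:
    print("CUDA is not available; nothing to monitor.")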

Production Deployment

Docker Compose Setup

# docker-compose.yml
version: '3.8'
services:
  ultrarag-api:
    image: ultrarag:v2.0.0-beta
    ports:
      - "8000:8000"
    environment:
      - CUDA_VISIBLE_DEVICES=0
    volumes:
      - ./data:/app/data
      - ./configs:/app/configs
    command: ultrarag serve api --config configs/production.yaml

  retriever:
    image: ultrarag:v2.0.0-beta
    ports:
      - "8001:8001"
    command: ultrarag serve retriever --port 8001

Kubernetes Deployment

# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ultrarag-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ultrarag
  template:
    metadata:
      labels:
        app: ultrarag
    spec:
      containers:
      - name: ultrarag
        image: ultrarag:v2.0.0-beta
        ports:
        - containerPort: 8000
        resources:
          requests:
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1

Conclusion

UltraRAG 2.0 represents a significant advancement in making RAG systems accessible to researchers and developers. With its low-code approach, extensive baseline support, and flexible architecture, you can rapidly prototype and deploy sophisticated RAG applications.

Key Takeaways

  • Easy Setup: Quick installation with uv or Docker
  • Rich Ecosystem: 17+ datasets and multiple baseline methods
  • Flexible Configuration: YAML-based pipeline definition
  • Production Ready: Docker and Kubernetes deployment options
  • Research Friendly: Built-in evaluation and debugging tools

Next Steps

  1. Experiment with different baseline methods
  2. Test on your domain-specific datasets
  3. Customize retrievers and generators for your use case
  4. Deploy in production with monitoring and scaling

For more advanced features and the latest updates, visit the official UltraRAG documentation and GitHub repository.


This tutorial covered the essential aspects of UltraRAG 2.0. For specific implementation details and advanced configurations, refer to the official documentation and example repositories.