UltraRAG Complete Tutorial: Building Advanced RAG Systems with Low-Code Framework
⏱️ Estimated Reading Time: 15 minutes
Introduction to UltraRAG
UltraRAG 2.0 is an MCP (Model Context Protocol)-based low-code RAG framework developed by OpenBMB. True to its motto, “Less Code, Lower Barrier, Faster Deployment”, UltraRAG enables researchers and developers to build complex RAG pipelines with minimal coding effort.
Key Features
- Low-Code Framework: Build sophisticated RAG systems with YAML configuration files
- MCP Integration: Leverages Model Context Protocol for seamless model communication
- Extensive Dataset Support: Built-in support for 17+ popular evaluation datasets
- Multiple Baseline Methods: Pre-implemented state-of-the-art RAG approaches
- Docker Support: Easy deployment and containerized environments
- Modular Architecture: Flexible pipeline components for customization
Supported Baseline Methods
UltraRAG comes with pre-implemented advanced RAG methods:
- Vanilla RAG: Basic retrieval-augmented generation
- IRCoT: Interleaving Retrieval with Chain-of-Thought
- IterRetGen: Iterative Retrieval and Generation
- RankCoT: Ranking-based Chain-of-Thought
- R1-searcher: Reinforcement learning-trained search-augmented reasoning
- Search-o1: Agentic search for large reasoning models
- Search-r1: Reinforcement learning for interleaved reasoning and search
- WebNote: Web-based note-taking integration
Prerequisites
Before starting, ensure your system meets these requirements:
- Python 3.9+
- CUDA support (optional, for GPU acceleration)
- Docker (optional, for containerized deployment)
- Git (for cloning the repository)
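If you want to sanity-check these requirements before installing, a short Python script can do it. This is a convenience sketch, not an UltraRAG command; the CUDA probe assumes PyTorch is installed and is skipped otherwise:
# check_env.py: quick environment sanity check
import sys

print(f"Python: {sys.version.split()[0]}")
assert sys.version_info >= (3, 9), "UltraRAG requires Python 3.9+"

try:
    import torch  # only needed for the optional CUDA check
    print(f"CUDA available: {torch.cuda.is_available()}")
except ImportError:
    print("PyTorch not installed; skipping CUDA check")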
Installation Guide
Method 1: Using the uv Package Manager (Recommended)
UltraRAG uses the modern uv package manager for faster dependency resolution and installation.
Step 1: Install the uv Package Manager
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Refresh your shell or run:
source ~/.bashrc # or ~/.zshrc for zsh users
Step 2: Clone the Repository
# Clone UltraRAG repository
git clone https://github.com/OpenBMB/UltraRAG.git
cd UltraRAG
Step 3: Install Dependencies
Choose the installation option that best fits your needs:
# Basic installation (minimal dependencies)
uv pip install -e .
# For LLM hosting support (includes vLLM)
uv pip install -e ".[vllm]"
# For document parsing capabilities
uv pip install -e ".[corpus]"
# Complete installation (all features except FAISS)
uv pip install -e ".[all]"
Step 4: Verify Installation
# Test the installation
ultrarag run examples/sayhello.yaml
Expected Output:
Hello, UltraRAG 2.0!
Welcome to the advanced RAG framework!
Method 2: Docker Installation
For containerized deployment and easier environment management:
Step 1: Build Docker Image
# Clone and navigate to UltraRAG directory
git clone https://github.com/OpenBMB/UltraRAG.git
cd UltraRAG
# Build the Docker image
docker build -t ultrarag:v2.0.0-beta .
Step 2: Run Interactive Container
# Start interactive Docker container with GPU support
docker run -it --rm --gpus all ultrarag:v2.0.0-beta bash
# Inside the container, verify installation
ultrarag run examples/sayhello.yaml
Basic Usage: Your First RAG Pipeline
Understanding UltraRAG Workflow
UltraRAG follows a simple three-step process:
1. Compile Pipeline: Generate the parameter configuration from YAML
2. Modify Parameters: Customize settings as needed
3. Run Pipeline: Execute the configured RAG system
Example 1: Basic Vanilla RAG
Let’s start with a simple vanilla RAG implementation:
Step 1: Examine the Configuration
# View the basic RAG example
cat examples/rag.yaml
Step 2: Compile the Pipeline
# Generate configuration parameters
ultrarag compile examples/rag.yaml
This creates a rag_params.yaml file with all configurable parameters.
Step 3: Customize Parameters (Optional)
# Edit the generated parameters file
nano rag_params.yaml
# Key parameters to customize:
# - model_name: LLM model to use
# - retriever_name: Embedding model for retrieval
# - corpus_path: Path to your document corpus
# - dataset_path: Evaluation dataset location
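If you prefer scripting over hand-editing, the generated file can also be modified programmatically. The sketch below assumes PyYAML is installed and that the top-level keys match the parameter names listed above; compiled files can differ by pipeline, so verify the keys against your own rag_params.yaml:
# edit_params.py: tweak generated parameters without opening an editor
import yaml

with open("rag_params.yaml") as f:
    params = yaml.safe_load(f)

# Example values; adjust to your setup
params["model_name"] = "Qwen/Qwen2.5-7B-Instruct"
params["corpus_path"] = "data/my_corpus"

with open("rag_params.yaml", "w") as f:
    yaml.safe_dump(params, f, sort_keys=False)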
Step 4: Run the Pipeline
# Execute the RAG pipeline
ultrarag run examples/rag.yaml
Example 2: Advanced RAG with Chain-of-Thought
Let’s implement a more sophisticated RAG system using IRCoT (Interleaving Retrieval with Chain-of-Thought):
# Compile IRCoT pipeline
ultrarag compile examples/IRCoT.yaml
# Run IRCoT RAG system
ultrarag run examples/IRCoT.yaml
Working with Datasets and Corpora
Supported Evaluation Datasets
UltraRAG provides built-in support for 17 popular evaluation datasets; a representative selection is shown below:
| Dataset Type | Dataset Name | Original Size | Evaluation Samples |
|---|---|---|---|
| QA | Natural Questions (NQ) | 3,610 | 1,000 |
| QA | TriviaQA | 11,313 | 1,000 |
| QA | PopQA | 14,267 | 1,000 |
| Multi-hop QA | HotpotQA | 7,405 | 1,000 |
| Multi-hop QA | 2WikiMultiHopQA | 12,576 | 1,000 |
| Multiple-choice | ARC | 3,548 | 1,000 |
| Multiple-choice | MMLU | 14,042 | 1,000 |
| Long-form QA | ASQA | 948 | 948 |
| Fact verification | FEVER | 13,332 | 1,000 |
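The Evaluation Samples column reflects fixed subsets drawn from the full datasets. If you ever need to draw a comparable subset yourself, a seeded random sample keeps runs reproducible; the file paths below are hypothetical, and for comparisons against published numbers you should use the officially provided subsets:
# subsample.py: draw a fixed 1,000-example evaluation subset from JSONL
import json
import random

with open("data/nq_full.jsonl") as f:  # hypothetical input path
    examples = [json.loads(line) for line in f]

random.seed(42)  # fixed seed so the subset is reproducible
subset = random.sample(examples, k=min(1000, len(examples)))

with open("data/nq_eval_1k.jsonl", "w") as f:
    for ex in subset:
        f.write(json.dumps(ex) + "\n")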
Using Custom Datasets
To use your own dataset, follow the UltraRAG data format:
{
  "id": "unique_question_id",
  "question": "Your question text",
  "answers": ["answer1", "answer2"],
  "context": "optional_context_information"
}
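A small script can emit records in this shape. The sketch below writes one JSON object per line (JSONL); whether your pipeline expects JSONL or a single JSON array is worth confirming against the UltraRAG data documentation:
# make_dataset.py: write a custom dataset in the format above
import json

examples = [
    {
        "id": "q001",
        "question": "What is retrieval-augmented generation?",
        "answers": ["RAG combines retrieval with text generation"],
        "context": "",
    },
]

with open("data/my_dataset.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")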
Setting Up Document Corpus
Option 1: Use Pre-built Corpus
# Download wiki-2018 corpus (21M+ documents)
# Follow the dataset download instructions in the documentation
Option 2: Create Custom Corpus
# Create your document corpus directory
mkdir -p data/my_corpus
# Add your documents (text files)
# UltraRAG supports various formats: .txt, .pdf, .docx
cp your_documents/* data/my_corpus/
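For plain-text sources, it is often convenient to collect the files into a single JSONL corpus before indexing. The sketch below uses id and contents as field names; these are placeholders rather than a documented UltraRAG schema, so check what your retriever expects:
# build_corpus.py: collect .txt files into one JSONL corpus file
import json
from pathlib import Path

corpus_dir = Path("data/my_corpus")
with open("data/my_corpus.jsonl", "w") as out:
    for i, path in enumerate(sorted(corpus_dir.glob("*.txt"))):
        text = path.read_text(encoding="utf-8", errors="ignore")
        out.write(json.dumps({"id": str(i), "contents": text}) + "\n")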
Deploying Retrievers and LLMs
Setting Up a Retriever Server
UltraRAG can deploy retrieval services for corpus indexing and search:
# Start retriever server
ultrarag serve retriever \
  --model_name BAAI/bge-m3 \
  --corpus_path data/wiki-2018 \
  --port 8001
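Once the server is up, you can query it over HTTP. The route and payload shape below are assumptions for illustration, not a documented UltraRAG endpoint; consult the retriever server's docs for the actual interface:
# query_retriever.py: hypothetical client for the retriever service
import requests

resp = requests.post(
    "http://localhost:8001/search",  # hypothetical route
    json={"query": "What is machine learning?", "top_k": 5},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())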
Deploying LLM Services
Deploy language models using vLLM backend:
# Deploy OpenAI-compatible LLM server
ultrarag serve llm \
  --model_name Qwen/Qwen2.5-72B-Instruct \
  --port 8000 \
  --gpu_memory_utilization 0.8
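Because the server speaks the OpenAI-compatible chat completions protocol, any OpenAI-style client can talk to it. Here is a dependency-light sketch using requests:
# query_llm.py: call the OpenAI-compatible endpoint started above
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "Qwen/Qwen2.5-72B-Instruct",
        "messages": [{"role": "user", "content": "What is RAG?"}],
        "max_tokens": 256,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])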
Using External API Services
Configure UltraRAG to use external APIs:
# In your pipeline configuration
llm:
  provider: "openai"
  api_key: "your_api_key"
  model: "gpt-4"
  base_url: "https://api.openai.com/v1"

retriever:
  provider: "custom"
  endpoint: "http://your-retriever-service:8001"
Advanced Configuration Examples
Example 3: Multi-Step Reasoning with IterRetGen
# examples/advanced_iterretgen.yaml
pipeline:
  name: "iterative_retrieval_generation"
  components:
    - retriever:
        model: "BAAI/bge-m3"
        top_k: 10
        iterations: 3
    - llm:
        model: "Qwen/Qwen2.5-72B-Instruct"
        temperature: 0.1
        max_tokens: 1024
    - reasoning:
        strategy: "iterative"
        max_iterations: 5
        convergence_threshold: 0.95

evaluation:
  dataset: "hotpotqa"
  metrics: ["exact_match", "f1_score", "retrieval_precision"]
Example 4: Custom Search Strategy
# examples/custom_search.yaml
pipeline:
  name: "custom_search_rag"
  components:
    - search:
        strategy: "search-o1"
        search_depth: 3
        query_expansion: true
        rerank_threshold: 0.7
    - retriever:
        model: "sentence-transformers/all-MiniLM-L6-v2"
        chunk_size: 512
        chunk_overlap: 50
    - generator:
        model: "microsoft/DialoGPT-large"
        response_length: "medium"
Performance Optimization
GPU Acceleration Setup
# Verify CUDA availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
# Configure GPU settings in your pipeline
gpu_config:
  enabled: true
  device_map: "auto"
  memory_fraction: 0.8
  mixed_precision: true
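Before settling on a memory_fraction, it helps to know what each visible GPU actually offers. This check uses only standard PyTorch calls:
# gpu_info.py: list visible GPUs and their total memory
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 2**30:.1f} GiB")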
Memory Optimization
# In your pipeline configuration
optimization:
  batch_size: 8
  gradient_checkpointing: true
  cpu_offload: true
  memory_efficient_attention: true
Debugging and Troubleshooting
Common Issues and Solutions
Issue 1: CUDA Out of Memory
# Solution: Reduce batch size or use CPU offload
# In your configuration:
optimization:
  batch_size: 2
  cpu_offload: true
Issue 2: Slow Retrieval Performance
# Solution: Use approximate search or reduce corpus size
retriever:
  search_type: "approximate"
  index_type: "faiss"
  nprobe: 10
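To see what the nprobe knob actually trades off, here is a minimal standalone FAISS example (requires the faiss-cpu or faiss-gpu package). It illustrates the general IVF mechanism, not UltraRAG's internal retriever code; fewer probed clusters means faster but less exhaustive search:
# ivf_demo.py: approximate nearest-neighbor search with an IVF index
import faiss
import numpy as np

d = 768  # embedding dimension
xb = np.random.rand(10000, d).astype("float32")  # stand-in corpus vectors
xq = np.random.rand(5, d).astype("float32")      # stand-in query vectors

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, 100)  # 100 IVF clusters
index.train(xb)  # learn cluster centroids
index.add(xb)

index.nprobe = 10  # clusters probed per query; higher = slower but more accurate
D, I = index.search(xq, 5)  # distances and indices of top-5 neighbors
print(I)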
Issue 3: Model Loading Errors
# Solution: Check model availability and permissions
# Verify model download:
huggingface-cli download Qwen/Qwen2.5-7B-Instruct
Debug Mode
Enable detailed logging for troubleshooting:
# Run with debug output
ultrarag run examples/rag.yaml --debug --verbose
# Check logs
tail -f logs/ultrarag.log
Integration with Other Tools
Jupyter Notebook Integration
# In Jupyter notebook
import ultrarag
# Load and run pipeline
pipeline = ultrarag.load_pipeline("examples/rag.yaml")
results = pipeline.run(question="What is machine learning?")
print(results)
API Integration
# RESTful API usage
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "ultrarag",
        "messages": [{"role": "user", "content": "Explain quantum computing"}],
        "rag_enabled": True,
    },
)
response.raise_for_status()
print(response.json())
Best Practices
1. Configuration Management
- Use version control for your pipeline configurations
- Maintain separate configs for development and production
- Document custom parameters and their effects
2. Data Preparation
- Ensure consistent document formatting
- Implement proper text preprocessing
- Use appropriate chunk sizes for your domain
3. Evaluation Strategy
- Establish baseline metrics before optimization
- Use multiple evaluation datasets
- Implement A/B testing for configuration changes
4. Resource Management
- Monitor GPU memory usage
- Implement proper caching strategies (see the sketch after this list)
- Use batch processing for large-scale evaluation
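As a concrete starting point for the caching advice above, an in-process LRU cache around the retrieval call avoids recomputing results for repeated evaluation queries. The retrieve function here is a placeholder for your actual retriever client:
# cache_demo.py: memoize retrieval results for repeated queries
from functools import lru_cache

def retrieve(query: str) -> list[str]:
    # Placeholder: replace with a real call to your retriever service.
    return [f"doc for: {query}"]

@lru_cache(maxsize=4096)
def cached_retrieve(query: str) -> tuple[str, ...]:
    # lru_cache needs hashable values, so results are returned as a tuple.
    return tuple(retrieve(query))

print(cached_retrieve("What is machine learning?"))
print(cached_retrieve.cache_info())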
Production Deployment
Docker Compose Setup
# docker-compose.yml
version: '3.8'
services:
  ultrarag-api:
    image: ultrarag:v2.0.0-beta
    ports:
      - "8000:8000"
    environment:
      - CUDA_VISIBLE_DEVICES=0
    volumes:
      - ./data:/app/data
      - ./configs:/app/configs
    command: ultrarag serve api --config configs/production.yaml
  retriever:
    image: ultrarag:v2.0.0-beta
    ports:
      - "8001:8001"
    command: ultrarag serve retriever --port 8001
Kubernetes Deployment
# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ultrarag-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ultrarag
  template:
    metadata:
      labels:
        app: ultrarag
    spec:
      containers:
        - name: ultrarag
          image: ultrarag:v2.0.0-beta
          ports:
            - containerPort: 8000
          resources:
            requests:
              nvidia.com/gpu: 1
            limits:
              nvidia.com/gpu: 1
Conclusion
UltraRAG 2.0 represents a significant advancement in making RAG systems accessible to researchers and developers. With its low-code approach, extensive baseline support, and flexible architecture, you can rapidly prototype and deploy sophisticated RAG applications.
Key Takeaways
- Easy Setup: Quick installation with uv or Docker
- Rich Ecosystem: 17+ datasets and multiple baseline methods
- Flexible Configuration: YAML-based pipeline definition
- Production Ready: Docker and Kubernetes deployment options
- Research Friendly: Built-in evaluation and debugging tools
Next Steps
- Experiment with different baseline methods
- Test on your domain-specific datasets
- Customize retrievers and generators for your use case
- Deploy in production with monitoring and scaling
For more advanced features and the latest updates, visit the official UltraRAG documentation and GitHub repository.
This tutorial covered the essential aspects of UltraRAG 2.0. For specific implementation details and advanced configurations, refer to the official documentation and example repositories.