RAGLight Complete Guide: From Basic RAG to Agentic Workflows
⏱️ Estimated Reading Time: 15 minutes
Introduction
RAGLight is a lightweight, modular Python framework designed to simplify the implementation of Retrieval-Augmented Generation (RAG). By combining document retrieval with large language models (LLMs), RAGLight enables you to build context-aware AI systems that can answer questions based on your own documents and knowledge bases.
In this comprehensive tutorial, you’ll learn how to:
- Set up RAGLight with various LLM providers (Ollama, OpenAI, Mistral)
- Build basic RAG pipelines for document-based question answering
- Implement Agentic RAG for multi-step reasoning tasks
- Use RAT (Retrieval-Augmented Thinking) for enhanced reasoning
- Integrate external tools using MCP (Model Context Protocol)
What Makes RAGLight Special?
RAGLight stands out for its:
- Modular Architecture: Easily swap LLMs, embeddings, and vector stores
- Multiple Provider Support: Ollama, OpenAI, Mistral, LMStudio, vLLM, Google AI
- Advanced Pipelines: Basic RAG, Agentic RAG, and RAT with reasoning layers
- MCP Integration: Connect external tools and data sources seamlessly
- Flexible Configuration: Customize every aspect of your RAG pipeline
Prerequisites
Before starting this tutorial, ensure you have:
1. Python Environment
# Check Python version (3.8 or higher required)
python3 --version
# Create a virtual environment (recommended)
python3 -m venv raglight-env
source raglight-env/bin/activate # On macOS/Linux
# raglight-env\Scripts\activate # On Windows
2. Ollama Installation (for local LLM)
# macOS
brew install ollama
# Or download from https://ollama.ai/download
# Start Ollama service
ollama serve
# Pull a model (in a new terminal)
ollama pull llama3.2:3b
Alternative: use the OpenAI or Mistral API if you prefer a cloud-based LLM.
3. Install RAGLight
pip install raglight
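You can quickly confirm the installation by importing the package:
python -c "import raglight; print('RAGLight imported successfully')"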
Installation and Setup
Environment Configuration
Create a .env file to store your API keys (if using cloud providers):
# .env file
OPENAI_API_KEY=your_openai_key_here
MISTRAL_API_KEY=your_mistral_key_here
Project Structure
Set up your project directory:
mkdir raglight-tutorial
cd raglight-tutorial
mkdir data
mkdir knowledge_base
Sample Data Creation
Create some sample documents for testing:
# data/document1.txt
cat > data/document1.txt << 'EOF'
RAGLight is a modular Python framework for Retrieval-Augmented Generation.
It supports multiple LLM providers including Ollama, OpenAI, and Mistral.
Key features include flexible vector store integration with ChromaDB and FAISS.
EOF
# data/document2.txt
cat > data/document2.txt << 'EOF'
Agentic RAG extends traditional RAG by incorporating autonomous agents.
These agents can perform multi-step reasoning and dynamic information retrieval.
Use cases include complex question answering and research assistants.
EOF
# data/document3.txt
cat > data/document3.txt << 'EOF'
Retrieval-Augmented Thinking (RAT) adds a specialized reasoning layer.
It uses reasoning LLMs to enhance response quality and analytical depth.
RAT is ideal for tasks requiring deep thinking and multi-hop reasoning.
EOF
Basic RAG Pipeline
Understanding the RAG Architecture
The basic RAG pipeline consists of three main components:
- Document Ingestion: Your documents are split into chunks and converted to embeddings
- Vector Storage: Embeddings are stored in a vector database (ChromaDB, FAISS, etc.)
- Retrieval & Generation: When queried, relevant documents are retrieved and passed to the LLM
Implementation
Here’s a complete example of a basic RAG pipeline:
#!/usr/bin/env python3
"""Basic RAG Pipeline with RAGLight"""
from raglight.rag.simple_rag_api import RAGPipeline
from raglight.config.rag_config import RAGConfig
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings
from raglight.models.data_source_model import FolderSource
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# Setup logging
Settings.setup_logging()
# Vector Store Configuration
vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    api_base=Settings.DEFAULT_OLLAMA_CLIENT,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./chroma_db",
    collection_name="my_knowledge_base"
)
# RAG Configuration
config = RAGConfig(
    llm="llama3.2:3b",  # Ollama model
    k=5,  # Number of documents to retrieve
    provider=Settings.OLLAMA,
    system_prompt=Settings.DEFAULT_SYSTEM_PROMPT,
    knowledge_base=[FolderSource(path="./data")]
)
# Initialize and build pipeline
print("Initializing RAG pipeline...")
pipeline = RAGPipeline(config, vector_store_config)
print("Building knowledge base...")
pipeline.build()
# Query the pipeline
query = "What are the key features of RAGLight?"
print(f"\nQuery: {query}")
response = pipeline.generate(query)
print(f"\nResponse:\n{response}")
Key Configuration Options
Vector Store Options:
- database: CHROMA, FAISS, or QDRANT
- provider: HUGGINGFACE, OLLAMA, or OPENAI for embeddings
- persist_directory: Where to store the vector database
RAG Options:
- llm: Model name (e.g., “llama3.2:3b”, “gpt-4”, “mistral-large-2411”)
- k: Number of relevant documents to retrieve
- provider: OLLAMA, OPENAI, MISTRAL, LMSTUDIO, or GOOGLE
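For example, switching the backend from ChromaDB to FAISS should only require changing the database field. The Settings.FAISS constant below is assumed by analogy with Settings.CHROMA; check the constants exposed by your RAGLight version:
# Assumed FAISS-backed configuration (constant name inferred from the options above)
vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    provider=Settings.HUGGINGFACE,
    database=Settings.FAISS,  # assumed, analogous to Settings.CHROMA
    persist_directory="./faiss_db",
    collection_name="my_knowledge_base"
)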
Using Different LLM Providers
OpenAI:
config = RAGConfig(
    llm="gpt-4",
    k=5,
    provider=Settings.OPENAI,
    api_key=Settings.OPENAI_API_KEY,
    knowledge_base=[FolderSource(path="./data")]
)
Mistral:
config = RAGConfig(
    llm="mistral-large-2411",
    k=5,
    provider=Settings.MISTRAL,
    api_key=Settings.MISTRAL_API_KEY,
    knowledge_base=[FolderSource(path="./data")]
)
Agentic RAG Pipeline
What is Agentic RAG?
Agentic RAG extends traditional RAG by incorporating an autonomous agent that can:
- Perform multi-step reasoning
- Decide when to retrieve additional information
- Iterate through multiple retrieval-generation cycles
- Handle complex questions requiring multiple data sources
Implementation
"""Agentic RAG Pipeline with RAGLight"""
from raglight.rag.simple_agentic_rag_api import AgenticRAGPipeline
from raglight.config.agentic_rag_config import AgenticRAGConfig
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings
from raglight.models.data_source_model import FolderSource
from dotenv import load_dotenv
load_dotenv()
Settings.setup_logging()
# Vector Store Configuration
vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    api_base=Settings.DEFAULT_OLLAMA_CLIENT,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./agentic_chroma_db",
    collection_name="agentic_knowledge_base"
)
# Agentic RAG Configuration
config = AgenticRAGConfig(
    provider=Settings.MISTRAL,
    model="mistral-large-2411",
    k=10,
    system_prompt=Settings.DEFAULT_AGENT_PROMPT,
    max_steps=4,  # Maximum reasoning steps
    api_key=Settings.MISTRAL_API_KEY,
    knowledge_base=[FolderSource(path="./data")]
)
# Initialize and build
print("Initializing Agentic RAG pipeline...")
agentic_rag = AgenticRAGPipeline(config, vector_store_config)
print("Building knowledge base...")
agentic_rag.build()
# Complex query requiring multiple steps
query = """
Compare the capabilities of basic RAG and Agentic RAG.
What are the specific use cases where Agentic RAG would be more beneficial?
"""
print(f"\nQuery: {query}")
response = agentic_rag.generate(query)
print(f"\nResponse:\n{response}")
Key Features of Agentic RAG
max_steps: Controls how many reasoning iterations the agent can perform
# Simple query: fewer steps needed
config = AgenticRAGConfig(max_steps=2, ...)
# Complex analysis: more steps allowed
config = AgenticRAGConfig(max_steps=10, ...)
Custom Agent Prompt: Tailor the agent’s behavior
custom_agent_prompt = """
You are a research assistant. When answering questions:
1. Break down complex queries into sub-questions
2. Retrieve relevant information for each sub-question
3. Synthesize findings into a comprehensive answer
4. Cite sources when possible
"""
config = AgenticRAGConfig(
    system_prompt=custom_agent_prompt,
    ...
)
RAT (Retrieval-Augmented Thinking)
Understanding RAT
RAT adds a specialized reasoning layer to the RAG pipeline:
- Retrieval: Fetch relevant documents
- Reasoning: Use a reasoning LLM to analyze retrieved content
- Thinking: Generate intermediate reasoning steps
- Generation: Produce final answer with enhanced context
Implementation
"""RAT Pipeline with RAGLight"""
from raglight.rat.simple_rat_api import RATPipeline
from raglight.config.rat_config import RATConfig
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings
from raglight.models.data_source_model import FolderSource
Settings.setup_logging()
# Vector Store Configuration
vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    api_base=Settings.DEFAULT_OLLAMA_CLIENT,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./rat_chroma_db",
    collection_name="rat_knowledge_base"
)
# RAT Configuration
config = RATConfig(
    cross_encoder_model=Settings.DEFAULT_CROSS_ENCODER_MODEL,
    llm="llama3.2:3b",
    k=Settings.DEFAULT_K,
    provider=Settings.OLLAMA,
    system_prompt=Settings.DEFAULT_SYSTEM_PROMPT,
    reasoning_llm=Settings.DEFAULT_REASONING_LLM,
    reflection=3,  # Number of reasoning iterations
    knowledge_base=[FolderSource(path="./data")]
)
# Initialize and build
print("Initializing RAT pipeline...")
pipeline = RATPipeline(config, vector_store_config)
print("Building knowledge base...")
pipeline.build()
# Query requiring deep reasoning
query = """
Analyze the architectural differences between RAG, Agentic RAG, and RAT.
What are the trade-offs in terms of complexity, performance, and output quality?
"""
print(f"\nQuery: {query}")
response = pipeline.generate(query)
print(f"\nResponse:\n{response}")
RAT Configuration Options
reflection: Number of reasoning iterations
# Quick reasoning
config = RATConfig(reflection=1, ...)
# Deep analytical thinking
config = RATConfig(reflection=5, ...)
cross_encoder_model: Reranking model for better retrieval
config = RATConfig(
    cross_encoder_model="cross-encoder/ms-marco-MiniLM-L-12-v2",
    ...
)
MCP Integration
What is MCP?
Model Context Protocol (MCP) allows your RAG pipeline to interact with external tools and services. This enables:
- Web search integration
- Database queries
- API calls to external services
- Code execution environments
- Custom tool integration
MCP Server Setup
First, configure your MCP server (example using MCPClient):
"""MCP Server Configuration"""
from raglight.rag.simple_agentic_rag_api import AgenticRAGPipeline
from raglight.config.agentic_rag_config import AgenticRAGConfig
from raglight.config.settings import Settings
# Configure MCP server URL
config = AgenticRAGConfig(
    provider=Settings.OPENAI,
    model="gpt-4o",
    k=10,
    mcp_config=[
        {"url": "http://127.0.0.1:8001/sse"}  # Your MCP server URL
    ],
    system_prompt=Settings.DEFAULT_AGENT_PROMPT,
    max_steps=4,
    api_key=Settings.OPENAI_API_KEY
)
# Initialize with MCP
pipeline = AgenticRAGPipeline(config, vector_store_config)
pipeline.build()
# The agent can now use external tools
query = "Search the web for recent updates on RAG frameworks and summarize findings"
response = pipeline.generate(query)
MCP Use Cases
Web Search Integration:
# Agent can search and incorporate web results
query = "What are the latest developments in RAG technology in 2024?"
Database Queries:
# Agent can query databases for real-time data
query = "Retrieve user statistics from our database and analyze trends"
API Integration:
# Agent can call external APIs
query = "Check weather API and recommend activities based on forecast"
Performance Comparison
Pipeline Characteristics
Pipeline Type | Complexity | Response Time | Use Case |
---|---|---|---|
Basic RAG | Low | Fast (< 5s) | Simple Q&A, document lookup |
Agentic RAG | Medium | Moderate (5-15s) | Multi-step reasoning, research |
RAT | High | Slower (15-30s) | Deep analysis, complex reasoning |
RAG + MCP | Variable | Depends on tools | External tool integration |
Choosing the Right Pipeline
Use Basic RAG when:
- You need fast responses
- Questions are straightforward
- Single document lookup is sufficient
Use Agentic RAG when:
- Questions require multiple steps
- You need dynamic retrieval
- Task involves research or exploration
Use RAT when:
- Deep analytical thinking is required
- Quality is more important than speed
- Complex multi-hop reasoning is needed
Use MCP Integration when:
- You need real-time external data
- Task requires tool usage
- Dynamic information is essential
Best Practices
1. Document Preparation
Chunk Size Optimization:
# For technical documents
chunk_size = 512
# For narrative content
chunk_size = 1024
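RAGLight splits your documents into chunks for you during ingestion; the snippet below is only an illustration of what these values mean, using the standalone langchain-text-splitters package rather than any RAGLight API:
# Illustration only, not RAGLight's internal API
from langchain_text_splitters import RecursiveCharacterTextSplitter

text = open("data/document1.txt").read()
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=50)
chunks = splitter.split_text(text)
print(f"Split into {len(chunks)} chunks of at most 512 characters")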
Folder Organization:
knowledge_base/
├── technical_docs/
├── user_manuals/
├── api_reference/
└── faq/
2. Vector Store Management
Persistence:
# Always use persistent storage in production
vector_store_config = VectorStoreConfig(
    persist_directory="./prod_vectordb",
    collection_name="production_kb"
)
Collection Organization:
# Separate collections for different domains
collections = {
    "technical": "tech_docs_collection",
    "business": "business_docs_collection",
    "general": "general_knowledge_collection"
}
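One way to turn that mapping into isolated stores is to build one vector store config per domain, reusing the VectorStoreConfig pattern from earlier (directory names are illustrative):
# One isolated vector store per domain (illustrative paths)
domain_configs = {
    domain: VectorStoreConfig(
        embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
        provider=Settings.HUGGINGFACE,
        database=Settings.CHROMA,
        persist_directory=f"./vectordb/{domain}",
        collection_name=collection_name
    )
    for domain, collection_name in collections.items()
}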
3. LLM Selection
Development:
# Use local models for development
config = RAGConfig(
    llm="llama3.2:3b",
    provider=Settings.OLLAMA
)
Production:
# Use more powerful models for production
config = RAGConfig(
    llm="gpt-4",
    provider=Settings.OPENAI
)
4. Error Handling
"""Robust RAG Pipeline with Error Handling"""
try:
    pipeline = RAGPipeline(config, vector_store_config)
    pipeline.build()
    response = pipeline.generate(query)
except Exception as e:
    print(f"Pipeline error: {e}")
    # Fall back to a plain LLM call without retrieval
    # (fallback_generate is a user-defined helper; see the sketch below)
    response = fallback_generate(query)
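fallback_generate is not part of RAGLight; one possible sketch falls back to the local model directly through the ollama Python client (assumes the ollama package is installed and the Ollama service is running):
# Hypothetical fallback helper: queries the local model directly, without retrieval
import ollama

def fallback_generate(query: str) -> str:
    result = ollama.chat(
        model="llama3.2:3b",
        messages=[{"role": "user", "content": query}]
    )
    return result["message"]["content"]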
5. Ignore Folders Configuration
When indexing code repositories, exclude unnecessary directories:
# Custom ignore folders
custom_ignore_folders = [
    ".venv",
    "venv",
    "node_modules",
    "__pycache__",
    ".git",
    "build",
    "dist",
    "my_custom_folder_to_ignore"
]
config = AgenticRAGConfig(
    ignore_folders=custom_ignore_folders,
    ...
)
6. Monitoring and Logging
"""Enable detailed logging"""
import logging
# Configure logging level
logging.basicConfig(level=logging.INFO)
# Or use RAGLight's setup
Settings.setup_logging()
# Monitor performance
import time
start_time = time.time()
response = pipeline.generate(query)
elapsed_time = time.time() - start_time
print(f"Query processed in {elapsed_time:.2f}s")
Advanced Customization
Custom Pipeline Builder
"""Custom RAG Pipeline with Builder Pattern"""
from raglight.rag.builder import Builder
from raglight.config.settings import Settings
# Build custom pipeline
rag = Builder() \
    .with_embeddings(
        Settings.HUGGINGFACE,
        model_name=Settings.DEFAULT_EMBEDDINGS_MODEL
    ) \
    .with_vector_store(
        Settings.CHROMA,
        persist_directory="./custom_db",
        collection_name="custom_collection"
    ) \
    .with_llm(
        Settings.OLLAMA,
        model_name="llama3.2:3b",
        system_prompt_file="./custom_prompt.txt",
        provider=Settings.OLLAMA
    ) \
    .build_rag(k=5)
# Ingest documents
rag.vector_store.ingest(data_path='./data')
# Query
response = rag.generate("Your question here")
Code Repository Indexing
"""Index code repositories"""
# Index code with signature extraction
rag.vector_store.ingest(repos_path=['./repo1', './repo2'])
# Search code
code_results = rag.vector_store.similarity_search("authentication logic")
# Search class signatures
class_results = rag.vector_store.similarity_search_class("User class definition")
GitHub Repository Integration
"""Index GitHub repositories directly"""
from raglight.models.data_source_model import GitHubSource
knowledge_base = [
    GitHubSource(url="https://github.com/Bessouat40/RAGLight"),
    GitHubSource(url="https://github.com/your-org/your-repo")
]
config = RAGConfig(
    knowledge_base=knowledge_base,
    ...
)
Docker Deployment
Dockerfile Example
FROM python:3.10-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Add host mapping for Ollama/LMStudio
# Run with: docker run --add-host=host.docker.internal:host-gateway your-image
CMD ["python", "app.py"]
Build and Run
# Build image
docker build -t raglight-app .
# Run with host network access (for Ollama)
docker run --add-host=host.docker.internal:host-gateway raglight-app
Troubleshooting
Common Issues
1. Ollama Connection Error:
# Check Ollama is running
# macOS/Linux:
ollama serve
# Update API base if needed
vector_store_config = VectorStoreConfig(
    api_base="http://localhost:11434",  # Default Ollama URL
    ...
)
2. Memory Issues:
# Reduce chunk size and k value
config = RAGConfig(
    k=3,  # Retrieve fewer documents
    ...
)
3. Slow Performance:
# Use smaller embedding models
vector_store_config = VectorStoreConfig(
    embedding_model="all-MiniLM-L6-v2",  # Smaller, faster model
    ...
)
4. Vector Store Errors:
# Clear and rebuild
rm -rf ./chroma_db
python rebuild_kb.py
Conclusion
RAGLight provides a powerful, flexible framework for building retrieval-augmented generation systems. Whether you need simple document Q&A or complex agentic workflows with external tool integration, RAGLight’s modular architecture makes it easy to build and scale.
Key Takeaways
- Start Simple: Begin with Basic RAG and upgrade to Agentic RAG or RAT as needed
- Choose Wisely: Select the right pipeline based on your use case and performance requirements
- Customize Extensively: RAGLight’s modular design allows complete customization
- Scale Gradually: Start locally with Ollama, then move to cloud providers for production
Next Steps
- Experiment: Try different LLM providers and vector stores
- Optimize: Tune k values, chunk sizes, and model selection
- Integrate: Add MCP servers for external tool access
- Deploy: Containerize with Docker for production deployment
Resources
- RAGLight GitHub: https://github.com/Bessouat40/RAGLight
- PyPI Package: https://pypi.org/project/raglight/
- Ollama: https://ollama.ai
- ChromaDB: https://www.trychroma.com
- MCP Protocol: Search “Model Context Protocol” for documentation
Happy building with RAGLight! 🚀