Stanford STORM: Complete Guide to the LLM-Based Knowledge Curation Agent System
Introduction
STORM (Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking), developed at Stanford, is an LLM-based knowledge curation agent system. The project, which has 25.4k GitHub stars, researches a topic automatically and generates a full-length report complete with citations.
STORM is more than a text generator. It is an agent system that asks questions from diverse perspectives, collects information, and organizes it systematically into a Wikipedia-style report.
This guide covers the core features of STORM and Co-STORM, practical usage, and customization methods.
STORM System Overview
Core Features
STORM has the following innovative characteristics:
- Automated research: Multi-angle information gathering on a topic
- Citation system: Source attribution for all content
- Structured organization: Wikipedia-style hierarchical outlines
- Multiple perspectives: Approach from various expert viewpoints
- Collaborative mode: Human-AI collaboration support through Co-STORM
- Customization: Applicable to various domains
STORM vs Co-STORM
graph TD
A["STORM Ecosystem"] --> B["STORM<br/>Automated System"]
A --> C["Co-STORM<br/>Collaborative System"]
B --> D["Fully Automated<br/>Report Generation"]
B --> E["Wikipedia-Style<br/>Outline"]
B --> F["Multi-Perspective<br/>Information Gathering"]
C --> G["Human-AI Collaboration"]
C --> H["Real-Time Dialogue"]
C --> I["User-Engaged<br/>Knowledge Curation"]
STORM Architecture Deep Dive
4 Core Modules
STORM consists of the following 4 modules:
1. Knowledge Curation Module
# Knowledge gathering process
class KnowledgeCurationModule:
    def __init__(self, retriever, llm):
        self.retriever = retriever  # Bing, Google, etc.
        self.llm = llm

    def collect_information(self, topic):
        """Collect information from diverse perspectives"""
        perspectives = self.generate_perspectives(topic)
        collected_info = []
        for perspective in perspectives:
            queries = self.generate_queries(topic, perspective)
            for query in queries:
                results = self.retriever.search(query)
                collected_info.extend(results)
        return self.deduplicate_and_filter(collected_info)
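The `deduplicate_and_filter` step above is left undefined. A minimal sketch of what it could do, assuming each search result is a dict with `url` and `content` keys (an assumed shape, not STORM's actual data model) and using a hypothetical `min_chars` threshold:

```python
def deduplicate_and_filter(results, min_chars=20):
    """Keep the first result per URL and drop results with very short content.

    The result shape ("url", "content") and the min_chars threshold are
    illustrative assumptions, not taken from STORM.
    """
    seen_urls = set()
    kept = []
    for result in results:
        # Treat "https://a.org/x" and "https://a.org/x/" as the same page.
        url = result["url"].rstrip("/")
        if url in seen_urls:
            continue
        if len(result.get("content", "").strip()) < min_chars:
            continue
        seen_urls.add(url)
        kept.append(result)
    return kept


sample = [
    {"url": "https://example.org/a", "content": "A long enough snippet about the topic."},
    {"url": "https://example.org/a/", "content": "Same page again, so this entry is dropped."},
    {"url": "https://example.org/b", "content": "too short"},
]
filtered = deduplicate_and_filter(sample)
print([r["url"] for r in filtered])
```

A real pipeline would likely also filter by source reliability; that is outside the scope of this sketch.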
2. Outline Generation Module
# Hierarchical outline generation
class OutlineGenerationModule:
    def generate_outline(self, collected_info, topic):
        """Systematically organize collected information"""
        key_concepts = self.extract_key_concepts(collected_info)
        hierarchy = self.build_hierarchy(key_concepts)
        outline = {
            "title": topic,
            "sections": []
        }
        for section in hierarchy:
            outline["sections"].append({
                "title": section["title"],
                "subsections": section["subsections"],
                "key_points": section["key_points"]
            })
        return outline
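To make the outline structure above concrete, the following sketch renders such a dict as heading lines. It assumes `subsections` is a list of plain strings, which the code above does not specify, and `render_outline` is a hypothetical helper name:

```python
def render_outline(outline):
    """Render an outline dict as hierarchical heading lines."""
    lines = [f"# {outline['title']}"]
    for section in outline["sections"]:
        lines.append(f"## {section['title']}")
        # Assumption: each subsection is a plain string title.
        for subsection in section.get("subsections", []):
            lines.append(f"### {subsection}")
    return "\n".join(lines)


outline = {
    "title": "Artificial Intelligence in Healthcare",
    "sections": [
        {"title": "Applications", "subsections": ["Diagnostics", "Drug discovery"], "key_points": []},
        {"title": "Challenges", "subsections": [], "key_points": []},
    ],
}
text = render_outline(outline)
print(text)
```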
3. Article Generation Module
# Write article based on the outline
class ArticleGenerationModule:
    def populate_outline(self, outline, knowledge_base):
        """Fill the outline with actual content"""
        article = {
            "title": outline["title"],
            "content": []
        }
        for section in outline["sections"]:
            section_content = self.write_section(
                section,
                knowledge_base,
                citation_style="wikipedia"
            )
            article["content"].append(section_content)
        return article
4. Article Polishing Module
# Final article refinement
class ArticlePolishingModule:
    def polish_article(self, article):
        """Improve quality and consistency of the article"""
        polished = {
            "title": article["title"],
            "content": []
        }
        for section in article["content"]:
            # Unify style, remove duplicates, clean citations
            polished_section = self.improve_writing_quality(section)
            polished_section = self.verify_citations(polished_section)
            polished["content"].append(polished_section)
        return polished
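The `verify_citations` step above is not spelled out. One simple check it might include, sketched here under the assumption that citations appear as bracketed numbers such as `[1]` and that references are numbered from 1 (an illustrative convention; `find_dangling_citations` is a hypothetical name):

```python
import re


def find_dangling_citations(text, num_references):
    """Return sorted citation indices in `text` that have no matching reference.

    Assumes citations look like [1], [2], ... and that valid indices run
    from 1 to num_references.
    """
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", text)}
    return sorted(n for n in cited if n < 1 or n > num_references)


section = "LLMs can draft reports [1]. Retrieval grounds the claims [2][4]."
dangling = find_dangling_citations(section, num_references=3)
print(dangling)
```

This only checks that every cited index exists. Checking that the cited source actually supports the sentence is a much harder problem and is not attempted here.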
Installation and Usage
1. Installation
# Simple installation via pip
pip install knowledge-storm
# Or install from source
git clone https://github.com/stanford-oval/storm.git
cd storm
pip install -r requirements.txt
pip install -e .
2. API Key Setup
# Create secrets.toml file
# ============ language model configurations ============
OPENAI_API_KEY="your_openai_api_key"
# For the OpenAI API, keep the following line:
OPENAI_API_TYPE="openai"
# For Azure OpenAI, replace the line above with the three lines below
# (TOML does not allow the same key to be defined twice):
# OPENAI_API_TYPE="azure"
# AZURE_API_BASE="your_azure_api_base_url"
# AZURE_API_VERSION="your_azure_api_version"
# ============ retriever configurations ============
BING_SEARCH_API_KEY="your_bing_search_api_key"
# ============ encoder configurations ============
ENCODER_API_TYPE="openai"
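# Basic STORM usage example
import os
from knowledge_storm import STORMWikiRunnerArguments, STORMWikiRunner
from knowledge_storm import STORMWikiLMConfigs
from knowledge_storm.rm import BingSearch

# Initialize configuration
# Note: language models still have to be attached to lm_configs
# (for example with lm_configs.set_conv_simulator_lm(...)) before running.
lm_configs = STORMWikiLMConfigs()
runner_args = STORMWikiRunnerArguments(
    output_dir="./storm_output",
    max_conv_turn=5,
    max_perspective=5
)
# Retrieval module (uses the Bing key configured above)
rm = BingSearch(
    bing_search_api_key=os.getenv("BING_SEARCH_API_KEY"),
    k=runner_args.search_top_k
)
# Create the runner from the arguments, LM configuration, and retriever
runner = STORMWikiRunner(runner_args, lm_configs, rm)
# Set topic and run
topic = "Artificial Intelligence in Healthcare"
runner.run(
    topic=topic,
    do_research=True,
    do_generate_outline=True,
    do_generate_article=True,
    do_polish_article=True
)
# Write logs and print a usage summary
runner.post_run()
runner.summary()
# Check results
print(f"Generated article saved to: {runner_args.output_dir}")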
4. Command Line Interface
# Fully automated run
python examples/storm_examples/run_storm_wiki_gpt.py \
--output-dir ./results \
--retriever bing \
--do-research \
--do-generate-outline \
--do-generate-article \
--do-polish-article
# Run only specific steps
python examples/storm_examples/run_storm_wiki_gpt.py \
--output-dir ./results \
--retriever bing \
--do-research # Research only
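The stage flags above correspond to the boolean keyword arguments passed to `runner.run(...)`. The following sketch is not the repository's script; it only illustrates, with hypothetical helper names `build_parser` and `stage_kwargs`, how such flags can be parsed and turned into those keyword arguments:

```python
import argparse

STAGES = ("research", "generate_outline", "generate_article", "polish_article")


def build_parser():
    """Build a parser with one on/off flag per pipeline stage."""
    parser = argparse.ArgumentParser(description="Select STORM pipeline stages")
    parser.add_argument("--output-dir", default="./results")
    parser.add_argument("--retriever", default="bing")
    for stage in STAGES:
        # --do-generate-outline is stored as args.do_generate_outline
        parser.add_argument(f"--do-{stage.replace('_', '-')}", action="store_true")
    return parser


def stage_kwargs(args):
    """Map parsed flags to do_* keyword arguments."""
    return {f"do_{stage}": getattr(args, f"do_{stage}") for stage in STAGES}


args = build_parser().parse_args(["--retriever", "bing", "--do-research"])
print(stage_kwargs(args))
```

Because every flag defaults to off, omitting a flag skips that stage, which matches the "research only" invocation shown above.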
Co-STORM: Collaborative AI System
Co-STORM Overview
Co-STORM is an innovative system for human-AI collaborative knowledge curation:
- Real-time dialogue: Conversation between user and AI agents
- Multiple experts: Several AI experts collaborating
- User participation: Ability to intervene in the dialogue at any time
- Dynamic adjustment: Direction adjusted based on user feedback
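The turn-taking idea in the list above can be illustrated with a toy loop. This is a purely illustrative policy, not Co-STORM's actual scheduler: simulated agents speak in rotation, and the user takes over any turn for which an utterance is supplied. The function and agent names are hypothetical.

```python
from itertools import cycle


def run_dialogue(agents, user_inputs, num_turns):
    """Alternate agent turns, letting the user take any turn.

    `user_inputs` maps a turn index to a user utterance; on those turns the
    user speaks instead of the next agent in the rotation.
    """
    rotation = cycle(agents)
    transcript = []
    for turn in range(num_turns):
        if turn in user_inputs:
            transcript.append(("user", user_inputs[turn]))
        else:
            speaker = next(rotation)
            transcript.append((speaker, f"{speaker} contributes to turn {turn}"))
    return transcript


transcript = run_dialogue(
    ["expert_a", "expert_b", "moderator"],
    {2: "Please focus on costs."},
    num_turns=4,
)
for speaker, utterance in transcript:
    print(f"{speaker}: {utterance}")
```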
Co-STORM Usage
# Initialize and run Co-STORM
# Names below follow the project README. lm_config, runner_argument,
# logging_wrapper, and bing_rm must be constructed beforehand, and the
# constructor arguments may differ between releases.
from knowledge_storm.collaborative_storm.engine import CoStormRunner
# Create Co-STORM runner
costorm_runner = CoStormRunner(
    lm_config=lm_config,
    runner_argument=runner_argument,
    logging_wrapper=logging_wrapper,
    rm=bing_rm
)
# Start collaborative session
costorm_runner.warm_start()
# Progress step by step
conv_turn = costorm_runner.step()  # Observe AI agent dialogue
costorm_runner.step(user_utterance="User opinion added")  # User intervention
# Generate final report
costorm_runner.knowledge_base.reorganize()
article = costorm_runner.generate_report()
Co-STORM Run Example
# Co-STORM run command
python examples/costorm_examples/run_costorm_gpt.py \
--output-dir ./costorm_results \
--retriever bing
Advanced Customization
1. Custom Search Engine
# Implement custom search system
from knowledge_storm.interface import Retriever

class CustomRetriever(Retriever):
    def __init__(self, custom_api_key):
        self.api_key = custom_api_key

    def retrieve(self, query, k=10):
        """Custom search logic"""
        # Integrate your search system
        results = self.search_custom_database(query)
        return [
            {
                "title": result["title"],
                "content": result["content"],
                "url": result["url"]
            }
            for result in results[:k]
        ]

    def search_custom_database(self, query):
        # Custom database search implementation.
        # Must return a list of dicts with "title", "content", and "url" keys;
        # an empty list keeps retrieve() working until this is implemented.
        return []
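To show what the search step of `CustomRetriever` might look like in practice, here is a self-contained sketch that ranks a small in-memory document list by keyword overlap. `InMemoryRetriever` is a hypothetical stand-in that does not inherit from any STORM class, and keyword overlap is a deliberately crude scoring rule:

```python
class InMemoryRetriever:
    """Keyword-overlap search over a small list of documents (illustrative only)."""

    def __init__(self, documents):
        self.documents = documents

    def retrieve(self, query, k=10):
        terms = set(query.lower().split())
        scored = []
        for doc in self.documents:
            words = set(doc["content"].lower().split())
            score = len(terms & words)  # number of query terms found in the document
            if score > 0:
                scored.append((score, doc))
        # Highest score first; Python's sort is stable, so ties keep input order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [
            {"title": doc["title"], "content": doc["content"], "url": doc["url"]}
            for _, doc in scored[:k]
        ]


docs = [
    {"title": "Wind", "content": "wind turbines convert wind into electricity", "url": "https://example.org/wind"},
    {"title": "Solar", "content": "solar panels convert sunlight into electricity", "url": "https://example.org/solar"},
    {"title": "Baking", "content": "bread needs flour and yeast", "url": "https://example.org/baking"},
]
retriever = InMemoryRetriever(docs)
hits = retriever.retrieve("solar electricity", k=2)
print([hit["title"] for hit in hits])
```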
2. Domain-Specific Modules
# Medical domain-specific STORM (conceptual sketch)
class MedicalSTORMRunner(STORMWikiRunner):
    def __init__(self, args, lm_configs, rm):
        super().__init__(args, lm_configs, rm)
        # Medical expert prompt configuration
        self.medical_prompts = {
            "research": "Research from a medical expert perspective...",
            "outline": "Structure in medical textbook style...",
            "article": "Write for easy understanding by medical staff..."
        }

    def customize_for_medical_domain(self):
        """Customization for the medical domain"""
        # Load medical glossary (hypothetical helper, to be implemented)
        self.load_medical_terminology()
        # Set medical citation style (hypothetical helper, to be implemented)
        self.set_medical_citation_style()
3. Multiple Output Formats
# Support for multiple report formats (conceptual sketch)
# The base generate_report() call and the convert_*/create_*/generate_*
# helpers are hypothetical and would need to be implemented.
class MultiFormatSTORM(STORMWikiRunner):
    def generate_report(self, format_type="wikipedia"):
        """Generate reports in various formats"""
        base_content = super().generate_report()
        if format_type == "academic":
            return self.convert_to_academic_paper(base_content)
        elif format_type == "presentation":
            return self.convert_to_slides(base_content)
        elif format_type == "executive_summary":
            return self.create_executive_summary(base_content)
        else:
            return base_content

    def convert_to_academic_paper(self, content):
        """Convert to academic paper format"""
        return {
            "abstract": self.generate_abstract(content),
            "introduction": content["sections"][0],
            "literature_review": self.create_literature_review(content),
            "conclusion": content["sections"][-1],
            "references": content["citations"]
        }
Production Deployment Guide
1. Docker Containerization
# Dockerfile for STORM
FROM python:3.11-slim
# Install system packages
RUN apt-get update && apt-get install -y \
git \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Install STORM
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
RUN pip install knowledge-storm
# Application code
COPY . .
# Environment variables
ENV PYTHONPATH=/app
ENV STORM_OUTPUT_DIR=/app/outputs
# Service port
EXPOSE 8000
# Start service
CMD ["python", "storm_server.py"]
2. Web Service Implementation
# FastAPI-based STORM service
from typing import Optional

from fastapi import FastAPI, BackgroundTasks
from pydantic import BaseModel
from knowledge_storm import STORMWikiRunner, STORMWikiRunnerArguments

app = FastAPI(title="STORM API", version="1.0.0")

class ResearchRequest(BaseModel):
    topic: str
    max_conv_turn: int = 5
    max_perspective: int = 5
    output_format: str = "wikipedia"

class ResearchResponse(BaseModel):
    task_id: str
    status: str
    result_url: Optional[str] = None

@app.post("/research", response_model=ResearchResponse)
async def create_research_task(
    request: ResearchRequest,
    background_tasks: BackgroundTasks
):
    task_id = generate_task_id()
    # Run STORM in the background
    background_tasks.add_task(
        run_storm_research,
        task_id,
        request.topic,
        request.max_conv_turn,
        request.max_perspective
    )
    return ResearchResponse(
        task_id=task_id,
        status="processing"
    )

# A plain (non-async) function, so FastAPI runs it in a worker thread
# instead of blocking the event loop during the long STORM run.
def run_storm_research(task_id, topic, max_conv_turn, max_perspective):
    """Background STORM execution"""
    try:
        args = STORMWikiRunnerArguments(
            output_dir=f"./outputs/{task_id}",
            max_conv_turn=max_conv_turn,
            max_perspective=max_perspective
        )
        # lm_configs and rm are assumed to be created once at startup
        runner = STORMWikiRunner(args, lm_configs, rm)
        runner.run(
            topic=topic,
            do_research=True,
            do_generate_outline=True,
            do_generate_article=True,
            do_polish_article=True
        )
        # Save result: run() writes its files under output_dir
        save_result(task_id, {"output_dir": args.output_dir})
        update_task_status(task_id, "completed")
    except Exception as e:
        update_task_status(task_id, "failed", str(e))

@app.get("/research/{task_id}")
async def get_research_result(task_id: str):
    """Retrieve research result"""
    status = get_task_status(task_id)
    if status["status"] == "completed":
        result = load_result(task_id)
        return {
            "task_id": task_id,
            "status": "completed",
            "result": result
        }
    else:
        return {
            "task_id": task_id,
            "status": status["status"],
            "message": status.get("message")
        }
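The service code above calls `generate_task_id`, `update_task_status`, `get_task_status`, `save_result`, and `load_result` without defining them. A minimal in-memory sketch of these helpers follows. It is an assumption about how they might behave, suitable only for a single process; with several replicas, a shared store such as a database would be needed instead.

```python
import threading
import uuid

_tasks = {}                 # task_id -> {"status", "message", "result"}
_lock = threading.Lock()    # background tasks may run in worker threads


def generate_task_id():
    """Return a random 32-character hexadecimal task id."""
    return uuid.uuid4().hex


def update_task_status(task_id, status, message=None):
    with _lock:
        entry = _tasks.setdefault(task_id, {})
        entry.update({"status": status, "message": message})


def get_task_status(task_id):
    with _lock:
        entry = _tasks.get(task_id)
        if entry is None:
            return {"status": "unknown", "message": None}
        return {"status": entry["status"], "message": entry["message"]}


def save_result(task_id, result):
    with _lock:
        entry = _tasks.setdefault(task_id, {"status": "processing", "message": None})
        entry["result"] = result


def load_result(task_id):
    with _lock:
        return _tasks[task_id].get("result")
```

Returning an `"unknown"` status for missing ids keeps the `GET /research/{task_id}` handler from raising a `KeyError` when a client asks for a task that was never created.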
3. Kubernetes Deployment
# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: storm-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: storm-api
  template:
    metadata:
      labels:
        app: storm-api
    spec:
      containers:
        - name: storm-api
          image: your-registry/storm-api:latest
          ports:
            - containerPort: 8000
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: storm-secrets
                  key: openai-api-key
            - name: BING_SEARCH_API_KEY
              valueFrom:
                secretKeyRef:
                  name: storm-secrets
                  key: bing-api-key
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          volumeMounts:
            - name: storm-storage
              mountPath: /app/outputs
      volumes:
        - name: storm-storage
          persistentVolumeClaim:
            claimName: storm-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: storm-service
spec:
  selector:
    app: storm-api
  ports:
    - port: 80
      targetPort: 8000
  type: LoadBalancer
Performance Optimization and Scaling
1. Parallel Processing Optimization
# Performance improvement through parallel processing
import asyncio
from concurrent.futures import ThreadPoolExecutor

class OptimizedSTORMRunner:
    def __init__(self, max_workers=4):
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

    async def parallel_research(self, topic, perspectives):
        """Multi-perspective parallel research"""
        tasks = []
        for perspective in perspectives:
            task = asyncio.create_task(
                self.research_perspective(topic, perspective)
            )
            tasks.append(task)
        results = await asyncio.gather(*tasks)
        # merge_research_results is left to the implementer
        return self.merge_research_results(results)

    async def research_perspective(self, topic, perspective):
        """Research from a specific perspective"""
        # Run the blocking research call in a worker thread
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            self.executor,
            self.single_perspective_research,  # blocking method, left to the implementer
            topic,
            perspective
        )
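The pattern above can be tried in isolation. In this runnable sketch, `research_one` is a hypothetical stand-in for a blocking search or LLM call, simulated with `time.sleep`; the three calls run in worker threads and `asyncio.gather` returns their results in input order:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def research_one(topic, perspective):
    """Stand-in for a blocking search/LLM call (simulated with a short sleep)."""
    time.sleep(0.1)
    return f"{perspective}: notes on {topic}"


async def research_all(topic, perspectives, max_workers=4):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            loop.run_in_executor(pool, research_one, topic, perspective)
            for perspective in perspectives
        ]
        # gather preserves the order of `perspectives` in its result list
        return await asyncio.gather(*futures)


notes = asyncio.run(research_all("solar power", ["economist", "engineer", "regulator"]))
print(notes)
```

Threads help here because the simulated work is I/O-bound waiting. For real API calls, the practical limit is usually the provider's rate limit rather than `max_workers`.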
2. Caching System
# Efficient caching implementation
import hashlib
import os
import pickle

class STORMCache:
    def __init__(self, cache_dir="./storm_cache"):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def get_cache_key(self, topic, parameters):
        """Generate cache key"""
        content = f"{topic}_{str(sorted(parameters.items()))}"
        return hashlib.md5(content.encode()).hexdigest()

    def get_cached_result(self, cache_key):
        """Look up cached result"""
        cache_file = os.path.join(self.cache_dir, f"{cache_key}.pkl")
        if os.path.exists(cache_file):
            # Only unpickle cache files this program wrote itself
            with open(cache_file, 'rb') as f:
                return pickle.load(f)
        return None

    def save_to_cache(self, cache_key, result):
        """Save result to cache"""
        cache_file = os.path.join(self.cache_dir, f"{cache_key}.pkl")
        with open(cache_file, 'wb') as f:
            pickle.dump(result, f)

# Cached STORM runner
class CachedSTORMRunner(STORMWikiRunner):
    def __init__(self, args, lm_configs, rm):
        super().__init__(args, lm_configs, rm)
        self.cache = STORMCache()

    def run_with_cache(self, topic, **kwargs):
        cache_key = self.cache.get_cache_key(topic, kwargs)
        cached_result = self.cache.get_cached_result(cache_key)
        if cached_result is not None:
            print(f"Using cached result for: {topic}")
            return cached_result
        result = self.run(topic=topic, **kwargs)
        self.cache.save_to_cache(cache_key, result)
        return result
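The cache key above is built by string formatting and hashed with MD5. As an alternative sketch (a design suggestion, not part of STORM), the key can be derived from a canonical JSON encoding and SHA-256, so that keyword order never changes the key and a topic containing an underscore cannot be confused with the parameter part. `make_cache_key` is a hypothetical name:

```python
import hashlib
import json


def make_cache_key(topic, parameters):
    """Derive a deterministic cache key from the topic and run parameters."""
    payload = json.dumps(
        {"topic": topic, "parameters": parameters},
        sort_keys=True,   # same key regardless of keyword order
        default=str,      # fall back to str() for non-JSON values
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = make_cache_key("AI in Healthcare", {"max_conv_turn": 5, "max_perspective": 3})
key_b = make_cache_key("AI in Healthcare", {"max_perspective": 3, "max_conv_turn": 5})
key_c = make_cache_key("AI in Healthcare", {"max_conv_turn": 6, "max_perspective": 3})
print(key_a == key_b, key_a == key_c)
```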
Datasets and Benchmarks
FreshWiki Dataset
FreshWiki is a high-quality dataset for STORM evaluation:
- Scale: 100 high-quality Wikipedia articles
- Period: Most-edited pages from February 2022 to September 2023
- Use: Automated knowledge curation research
# FreshWiki dataset utilization
# Note: the dataset ID and field names below are assumptions; check the
# Hugging Face page linked from the STORM README before running.
from datasets import load_dataset
# Load dataset from Hugging Face
dataset = load_dataset("stanford-oval/FreshWiki")
# Use evaluation data
for item in dataset["train"]:
    topic = item["title"]
    reference_article = item["content"]
    # Generate article with STORM (storm_runner is a configured STORMWikiRunner)
    generated_article = storm_runner.run(topic=topic)
    # Quality evaluation (evaluate_quality is a user-supplied function)
    score = evaluate_quality(generated_article, reference_article)
    print(f"Topic: {topic}, Quality Score: {score}")
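The `evaluate_quality` function above is not defined in this guide. One simple ingredient it could use is an outline-level recall score: the fraction of reference headings that also appear in the generated outline. The sketch below uses exact matching after normalization, a deliberately crude stand-in for softer, embedding-based matching, and `heading_recall` is a hypothetical name:

```python
def _normalize(heading):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(heading.lower().split())


def heading_recall(reference_headings, generated_headings):
    """Fraction of reference headings that also occur in the generated outline."""
    reference = {_normalize(h) for h in reference_headings}
    generated = {_normalize(h) for h in generated_headings}
    if not reference:
        return 0.0
    return len(reference & generated) / len(reference)


score = heading_recall(
    ["History", "Applications", "Criticism"],
    ["history", "Applications ", "Future work"],
)
print(f"{score:.2f}")
```

Exact matching penalizes paraphrased headings such as "Uses" versus "Applications", so this number is best read as a lower bound.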
WildSeek Dataset
WildSeek is a dataset containing complex information-seeking patterns from real users:
# Evaluate Co-STORM with WildSeek dataset (conceptual sketch)
# The dataset ID, field names, CoStormRunner(topic=...), set_user_goal(), and
# collaborative_research() are illustrative assumptions, not the library's API.
wildseek_data = load_dataset("stanford-oval/WildSeek")
for item in wildseek_data["train"]:
    topic = item["topic"]
    user_goal = item["user_goal"]
    # Simulate collaborative session with Co-STORM
    costorm_runner = CoStormRunner(topic=topic)
    costorm_runner.set_user_goal(user_goal)
    result = costorm_runner.collaborative_research()
    print(f"Topic: {topic}")
    print(f"User Goal: {user_goal}")
    print(f"Result Quality: {evaluate_result(result)}")
Comparative Analysis: STORM vs Existing Systems
Feature Comparison
| Feature | STORM | ChatGPT | Claude | Perplexity |
|---|---|---|---|---|
| Automated research | Yes | No | No | Yes |
| Structured outline | Yes | No | No | No |
| Citation system | Yes | No | No | Yes |
| Multiple perspectives | Yes | No | No | No |
| Collaborative mode | Yes | No | No | No |
| Customization | Yes | No | No | No |
Performance Benchmark
# Performance comparison experiment
def benchmark_systems():
    topics = [
        "Quantum Computing Applications",
        "Climate Change Mitigation Strategies",
        "Artificial Intelligence Ethics"
    ]
    results = {
        "STORM": [],
        "ChatGPT": [],
        "Claude": [],
        "Perplexity": []
    }
    for topic in topics:
        # Evaluate STORM
        storm_result = storm_runner.run(topic=topic)
        storm_score = evaluate_comprehensive_quality(storm_result, topic)
        results["STORM"].append(storm_score)
        # Compare with other systems...
    return results

# Evaluation metrics
def evaluate_comprehensive_quality(result, topic):
    """Comprehensive quality evaluation"""
    metrics = {
        "factual_accuracy": evaluate_factual_accuracy(result),
        "completeness": evaluate_completeness(result, topic),
        "citation_quality": evaluate_citations(result),
        "structure_quality": evaluate_structure(result),
        "readability": evaluate_readability(result)
    }
    return sum(metrics.values()) / len(metrics)
Real-World Use Cases
1. Education
# Generate educational materials (conceptual sketch)
# complexity_level, citation_style, and the generate_* helpers are hypothetical
# extensions; they are not parameters or methods of STORMWikiRunner.
class EducationalSTORM(STORMWikiRunner):
    def generate_course_material(self, topic, education_level="university"):
        """Generate learning materials appropriate for the education level"""
        # Customize by education level
        if education_level == "high_school":
            complexity_level = "basic"
            citation_style = "simplified"
        elif education_level == "university":
            complexity_level = "intermediate"
            citation_style = "academic"
        else:
            complexity_level = "advanced"
            citation_style = "scholarly"
        result = self.run(
            topic=topic,
            complexity_level=complexity_level,
            citation_style=citation_style
        )
        # Add learning objectives and quiz
        result["learning_objectives"] = self.generate_learning_objectives(result)
        result["quiz_questions"] = self.generate_quiz(result)
        return result

# Usage example (runner_args and rm are the runner arguments and retrieval module)
edu_storm = EducationalSTORM(runner_args, lm_configs, rm)
course_material = edu_storm.generate_course_material(
    topic="Machine Learning Fundamentals",
    education_level="university"
)
2. Corporate Research
# Market analysis report for enterprise use (conceptual sketch)
# The perspectives and citation_style parameters and the generate_* helpers
# are hypothetical extensions, not part of STORMWikiRunner.
class BusinessSTORM(STORMWikiRunner):
    def generate_market_analysis(self, company_or_industry):
        """Generate market analysis report"""
        # Set business perspectives
        business_perspectives = [
            "Market Size and Growth",
            "Competitive Landscape",
            "Consumer Trends",
            "Regulatory Environment",
            "Technology Disruption",
            "Investment Opportunities"
        ]
        result = self.run(
            topic=company_or_industry,
            perspectives=business_perspectives,
            citation_style="business"
        )
        # Add business insights
        result["executive_summary"] = self.generate_executive_summary(result)
        result["swot_analysis"] = self.generate_swot_analysis(result)
        result["recommendations"] = self.generate_recommendations(result)
        return result

# Usage example (runner_args and rm are the runner arguments and retrieval module)
business_storm = BusinessSTORM(runner_args, lm_configs, rm)
market_report = business_storm.generate_market_analysis("Electric Vehicle Industry")
3. Journalism
# In-depth research for investigative reporting (conceptual sketch)
# The perspectives, fact_checking, and source_verification parameters and the
# helper methods are hypothetical extensions, not part of STORMWikiRunner.
class JournalismSTORM(STORMWikiRunner):
    def investigative_research(self, topic):
        """In-depth research for investigative reporting"""
        journalism_perspectives = [
            "Who (Key Players)",
            "What (Core Facts)",
            "When (Timeline)",
            "Where (Geographic Context)",
            "Why (Motivations and Causes)",
            "How (Mechanisms and Processes)"
        ]
        result = self.run(
            topic=topic,
            perspectives=journalism_perspectives,
            fact_checking=True,
            source_verification=True
        )
        # Add journalism elements
        result["fact_check_report"] = self.generate_fact_check(result)
        result["source_credibility"] = self.assess_source_credibility(result)
        result["follow_up_questions"] = self.generate_follow_up_questions(result)
        return result
Conclusion
Stanford STORM automates the research and writing behind a structured, cited report, and Co-STORM extends that process with human participation.
Key Advantages
- Systematic approach: 4-stage modular pipeline
- Multiple perspectives: Information gathering from various expert viewpoints
- Citation system: Source attribution for all content
- Collaborative features: Human-AI collaboration through Co-STORM
- Customization: Applicable to various domains
- Open source: Free use under MIT license
Recommended Use Cases
- Researchers: Literature review and systematic review writing
- Educators: Educational materials and lecture note generation
- Enterprises: Market analysis and competitor research
- Journalists: Investigative reporting and fact-checking
- Students: Research for assignments and projects
STORM and Co-STORM are more than convenience tools: they have the potential to change how knowledge work is done. The project's 25.4k GitHub stars suggest that many users already see that value.
Try incorporating STORM into your knowledge curation workflow!