Complete Guide to OpenLLMetry: LLM Observability with OpenTelemetry
⏱️ Estimated Reading Time: 15 minutes
Introduction
As Large Language Model (LLM) applications become increasingly complex and production-ready, monitoring and observability have become critical requirements. OpenLLMetry emerges as a comprehensive solution that brings enterprise-grade observability to LLM applications through OpenTelemetry standards.
OpenLLMetry is an open-source observability solution designed specifically for LLM applications. Built on top of OpenTelemetry, it provides detailed visibility into your AI application’s performance, costs, and behavior while remaining compatible with existing observability infrastructure.
Why OpenLLMetry Matters
Traditional monitoring tools fall short when it comes to LLM applications. OpenLLMetry addresses unique challenges such as:
- Token Usage Tracking: Monitor input/output tokens and associated costs
- Latency Analysis: Track response times across different model providers
- Error Monitoring: Capture and analyze LLM-specific errors and failures
- Performance Optimization: Identify bottlenecks in AI workflows
- Cost Management: Monitor spending across multiple LLM providers
Prerequisites
Before diving into OpenLLMetry, ensure you have:
- Python 3.8+ installed on your system
- Basic understanding of OpenTelemetry concepts
- An LLM application to monitor (using OpenAI, Anthropic, or another supported provider)
- An observability backend (optional, for advanced setups)
Part 1: Getting Started with OpenLLMetry
Installation and Basic Setup
Let’s start with the simplest possible setup. OpenLLMetry provides a convenient SDK that makes getting started effortless.
Step 1: Install OpenLLMetry SDK
# Install the core SDK
pip install traceloop-sdk
# For specific integrations, install additional packages
pip install openai anthropic # Example LLM providers
Step 2: Basic Initialization
The most straightforward way to start monitoring your LLM application is with a single line of code:
from traceloop.sdk import Traceloop
# Initialize OpenLLMetry with default settings
Traceloop.init()
For local development, you might want to see traces immediately:
# Disable batch sending for immediate trace visibility
Traceloop.init(disable_batch=True)
Step 3: Your First Monitored LLM Call
Here’s a complete example that demonstrates basic monitoring:
import openai
from traceloop.sdk import Traceloop
# Initialize OpenLLMetry
Traceloop.init(disable_batch=True)
# Configure OpenAI client
client = openai.OpenAI(api_key="your-api-key")
# Make a monitored LLM call
def generate_response(prompt):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": prompt}
        ],
        max_tokens=100
    )
    return response.choices[0].message.content

# Test the monitored function
if __name__ == "__main__":
    result = generate_response("Explain quantum computing in simple terms")
    print(result)
When you run this code, OpenLLMetry automatically:
- Captures the request and response
- Records token usage and costs
- Measures response latency
- Tracks any errors that occur
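To confirm this locally before wiring up a backend, you can print spans straight to the console. The sketch below assumes a recent traceloop-sdk version that accepts a standard OpenTelemetry exporter through an exporter argument of Traceloop.init; if your version differs, check the SDK's init options.
# Minimal local-development sketch (assumes Traceloop.init accepts an
# `exporter` argument; verify against your traceloop-sdk version)
from opentelemetry.sdk.trace.export import ConsoleSpanExporter
from traceloop.sdk import Traceloop

Traceloop.init(disable_batch=True, exporter=ConsoleSpanExporter())

# Any instrumented call made after init(), such as generate_response() above,
# now prints its span, including attributes such as the model and token usage, to stdout.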
Part 2: Advanced Configuration
Custom Tracing with Decorators
OpenLLMetry provides decorators for custom tracing of your application logic:
import openai
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import task, workflow

# Initialize OpenLLMetry
Traceloop.init()

# OpenAI client used by the tasks below (reads OPENAI_API_KEY from the environment)
client = openai.OpenAI()
@workflow(name="document_analysis_pipeline")
def analyze_document(document_text):
    """Main workflow for document analysis"""
    summary = summarize_text(document_text)
    sentiment = analyze_sentiment(document_text)
    keywords = extract_keywords(document_text)
    return {
        "summary": summary,
        "sentiment": sentiment,
        "keywords": keywords
    }

@task(name="text_summarization")
def summarize_text(text):
    """Summarize the input text"""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Summarize the following text concisely."},
            {"role": "user", "content": text}
        ],
        max_tokens=150
    )
    return response.choices[0].message.content

@task(name="sentiment_analysis")
def analyze_sentiment(text):
    """Analyze sentiment of the text"""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Analyze the sentiment of this text. Respond with: positive, negative, or neutral."},
            {"role": "user", "content": text}
        ],
        max_tokens=10
    )
    return response.choices[0].message.content

@task(name="keyword_extraction")
def extract_keywords(text):
    """Extract key terms from the text"""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Extract 5 key terms from this text. Return as comma-separated list."},
            {"role": "user", "content": text}
        ],
        max_tokens=50
    )
    return response.choices[0].message.content
Environment-based Configuration
For production deployments, configure OpenLLMetry through environment variables:
# Set environment variables
export TRACELOOP_API_KEY="your-traceloop-api-key"
export TRACELOOP_BATCH_EXPORT="true"
export TRACELOOP_TELEMETRY="false" # Disable telemetry if needed
import os
from traceloop.sdk import Traceloop
# Production configuration
Traceloop.init(
    api_key=os.getenv("TRACELOOP_API_KEY"),
    disable_batch=os.getenv("TRACELOOP_BATCH_EXPORT", "true").lower() == "false",
    telemetry_enabled=os.getenv("TRACELOOP_TELEMETRY", "true").lower() == "true"
)
Part 3: Integration with Popular LLM Frameworks
LangChain Integration
OpenLLMetry seamlessly integrates with LangChain applications:
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from traceloop.sdk import Traceloop
# Initialize OpenLLMetry
Traceloop.init()
# Create LangChain components
llm = OpenAI(temperature=0.7)
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Write a brief explanation about {topic}"
)
# Create and run chain
chain = LLMChain(llm=llm, prompt=prompt)
# This will be automatically traced
result = chain.run(topic="machine learning")
print(result)
LlamaIndex Integration
For LlamaIndex applications:
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms import OpenAI
from traceloop.sdk import Traceloop
# Initialize OpenLLMetry
Traceloop.init()
# Configure LlamaIndex
llm = OpenAI(model="gpt-3.5-turbo")
# Load documents and create index
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
# Create query engine
query_engine = index.as_query_engine(llm=llm)
# Query with automatic tracing
response = query_engine.query("What are the main topics in these documents?")
print(response)
Part 4: Vector Database Monitoring
OpenLLMetry also monitors vector database operations:
Pinecone Integration
import pinecone
from traceloop.sdk import Traceloop
# Initialize OpenLLMetry
Traceloop.init()
# Initialize Pinecone
pinecone.init(
    api_key="your-pinecone-api-key",
    environment="your-environment"
)

# Create index reference
index = pinecone.Index("your-index-name")

# Monitored vector operations
def search_similar_documents(query_vector, top_k=5):
    """Search for similar documents using vector similarity"""
    results = index.query(
        vector=query_vector,
        top_k=top_k,
        include_metadata=True
    )
    return results

# Monitored upsert operation
def store_document_embedding(doc_id, embedding, metadata):
    """Store document embedding in Pinecone"""
    index.upsert([
        (doc_id, embedding, metadata)
    ])
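To exercise these helpers end to end, embed the query text first and pass the resulting vector in. The sketch below is illustrative: it assumes the OpenAI client from earlier, the text-embedding-ada-002 model, and an index built with matching 1536-dimensional vectors.
# Illustrative usage (assumes the OpenAI `client` from earlier and an index
# that stores 1536-dimensional text-embedding-ada-002 vectors)
question = "How do I rotate my API keys?"
embedding = client.embeddings.create(
    model="text-embedding-ada-002",
    input=question
).data[0].embedding

results = search_similar_documents(embedding, top_k=3)
print(results)  # OpenLLMetry records both the embedding call and the Pinecone query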
Chroma Integration
import chromadb
from traceloop.sdk import Traceloop
# Initialize OpenLLMetry
Traceloop.init()
# Initialize Chroma client
client = chromadb.Client()
# Get or create collection
collection = client.get_or_create_collection("documents")
# Monitored operations
def add_documents(documents, embeddings, ids, metadatas):
    """Add documents to Chroma collection"""
    collection.add(
        documents=documents,
        embeddings=embeddings,
        ids=ids,
        metadatas=metadatas
    )

def query_documents(query_text, n_results=5):
    """Query similar documents from Chroma"""
    results = collection.query(
        query_texts=[query_text],
        n_results=n_results
    )
    return results
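A short, made-up usage example follows. One caveat worth noting: query_documents passes raw text, so Chroma embeds the query with its default embedding function; for consistent similarity scores, the embeddings you store should come from that same model.
# Illustrative usage with fabricated documents; embed() is a placeholder for
# whatever embedding call you use elsewhere in your application.
docs = ["Refund policy: 30 days.", "Shipping takes 3-5 business days."]
ids = ["doc-1", "doc-2"]
metas = [{"source": "policy"}, {"source": "faq"}]
embeddings = [embed(d) for d in docs]

add_documents(docs, embeddings, ids, metas)
print(query_documents("How long does shipping take?", n_results=1))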
Part 5: Observability Backend Integration
Datadog Integration
Connect OpenLLMetry to Datadog for enterprise monitoring. This example uses the (now deprecated) opentelemetry-exporter-datadog package; newer Datadog Agent versions can also ingest OTLP traces directly:
from opentelemetry import trace
from opentelemetry.exporter.datadog import DatadogExportSpanProcessor, DatadogSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from traceloop.sdk import Traceloop
# Configure Datadog exporter
tracer_provider = TracerProvider()
datadog_exporter = DatadogSpanExporter(
    agent_url="http://localhost:8126",  # Datadog Agent URL
    service="llm-application"
)

# Add Datadog span processor
tracer_provider.add_span_processor(
    DatadogExportSpanProcessor(datadog_exporter)
)
# Set tracer provider
trace.set_tracer_provider(tracer_provider)
# Initialize OpenLLMetry with custom tracer
Traceloop.init()
Honeycomb Integration
For Honeycomb observability:
import os
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from traceloop.sdk import Traceloop
# Configure Honeycomb exporter
tracer_provider = TracerProvider()
otlp_exporter = OTLPSpanExporter(
    endpoint="https://api.honeycomb.io/v1/traces",
    headers={
        "x-honeycomb-team": os.getenv("HONEYCOMB_API_KEY"),
        "x-honeycomb-dataset": "llm-traces"
    }
)

# Add batch span processor
tracer_provider.add_span_processor(
    BatchSpanProcessor(otlp_exporter)
)
# Set tracer provider
trace.set_tracer_provider(tracer_provider)
# Initialize OpenLLMetry
Traceloop.init()
Part 6: Custom Metrics and Attributes
Adding Custom Attributes
Enhance traces with custom business logic:
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import task
from opentelemetry import trace
# Initialize OpenLLMetry
Traceloop.init()
@task(name="customer_support_response")
def handle_customer_query(query, customer_id, priority="normal"):
    """Handle customer support query with custom attributes"""
    # Get current span
    current_span = trace.get_current_span()

    # Add custom attributes
    current_span.set_attribute("customer.id", customer_id)
    current_span.set_attribute("query.priority", priority)
    current_span.set_attribute("query.length", len(query))

    # Determine model based on priority
    model = "gpt-4" if priority == "high" else "gpt-3.5-turbo"
    current_span.set_attribute("llm.model", model)

    # Generate response
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful customer support agent."},
            {"role": "user", "content": query}
        ]
    )

    # Add response attributes
    response_text = response.choices[0].message.content
    current_span.set_attribute("response.length", len(response_text))
    current_span.set_attribute("response.satisfactory", "unknown")  # Could be determined by ML model

    return response_text
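Called like the sketch below (the customer ID and query text are invented for illustration), the resulting span carries both the automatic LLM attributes and the business attributes set above:
# Illustrative call; the customer ID and query are made up.
reply = handle_customer_query(
    query="My invoice shows a duplicate charge, can you help?",
    customer_id="cust_4821",
    priority="high"  # routed to gpt-4 and recorded as query.priority on the span
)
print(reply)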
Custom Metrics Collection
Create custom metrics for business KPIs:
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader
import time
# Configure metrics
metric_reader = PeriodicExportingMetricReader(
    ConsoleMetricExporter(),
    export_interval_millis=5000
)
metrics.set_meter_provider(MeterProvider(metric_readers=[metric_reader]))

# Create custom meters
meter = metrics.get_meter("llm_application")

# Create custom metrics
request_counter = meter.create_counter(
    "llm_requests_total",
    description="Total number of LLM requests"
)

response_time_histogram = meter.create_histogram(
    "llm_response_time",
    description="LLM response time in seconds"
)

token_usage_counter = meter.create_counter(
    "llm_tokens_used",
    description="Total tokens consumed"
)
def monitored_llm_call(prompt, model="gpt-3.5-turbo"):
    """LLM call with custom metrics"""
    start_time = time.time()

    try:
        # Increment request counter
        request_counter.add(1, {"model": model})

        # Make LLM call
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}]
        )

        # Record response time
        response_time = time.time() - start_time
        response_time_histogram.record(response_time, {"model": model})

        # Record token usage
        usage = response.usage
        token_usage_counter.add(usage.total_tokens, {
            "model": model,
            "type": "total"
        })
        token_usage_counter.add(usage.prompt_tokens, {
            "model": model,
            "type": "prompt"
        })
        token_usage_counter.add(usage.completion_tokens, {
            "model": model,
            "type": "completion"
        })

        return response.choices[0].message.content

    except Exception as e:
        request_counter.add(1, {"model": model, "status": "error"})
        raise
Part 7: Production Best Practices
Error Handling and Resilience
Implement robust error handling for production environments:
from traceloop.sdk import Traceloop
from opentelemetry import trace
import logging
import openai
import time
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Initialize OpenLLMetry with error handling
try:
    Traceloop.init(
        disable_batch=False,  # Enable batching for production
        telemetry_enabled=False  # Disable telemetry for privacy
    )
    logger.info("OpenLLMetry initialized successfully")
except Exception as e:
    logger.error(f"Failed to initialize OpenLLMetry: {e}")
    # Continue without tracing rather than failing the application
    pass
def safe_llm_call(prompt, max_retries=3, backoff_factor=2):
    """LLM call with retry logic and comprehensive error handling"""
    span = trace.get_current_span()

    for attempt in range(max_retries):
        try:
            span.set_attribute("retry.attempt", attempt + 1)

            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prompt}],
                timeout=30  # Set timeout for production
            )

            span.set_attribute("request.successful", True)
            return response.choices[0].message.content

        except openai.RateLimitError as e:
            span.set_attribute("error.type", "rate_limit")
            span.set_attribute("error.message", str(e))

            if attempt < max_retries - 1:
                wait_time = backoff_factor ** attempt
                logger.warning(f"Rate limit hit, waiting {wait_time}s before retry")
                time.sleep(wait_time)
            else:
                span.set_attribute("request.successful", False)
                raise

        except openai.APIError as e:
            span.set_attribute("error.type", "api_error")
            span.set_attribute("error.message", str(e))
            span.set_attribute("request.successful", False)
            logger.error(f"OpenAI API error: {e}")
            raise

        except Exception as e:
            span.set_attribute("error.type", "unknown")
            span.set_attribute("error.message", str(e))
            span.set_attribute("request.successful", False)
            logger.error(f"Unexpected error: {e}")
            raise
Performance Optimization
Optimize OpenLLMetry for high-throughput applications:
from traceloop.sdk import Traceloop
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.trace import TracerProvider
import os
# High-performance configuration
tracer_provider = TracerProvider()
# Configure batch processor for high throughput
batch_processor = BatchSpanProcessor(
    span_exporter=your_exporter,  # Your chosen exporter
    max_queue_size=2048,  # Increase queue size
    export_timeout_millis=30000,  # 30 second timeout
    max_export_batch_size=512,  # Larger batch sizes
    schedule_delay_millis=500  # More frequent exports
)

tracer_provider.add_span_processor(batch_processor)

# Initialize with performance optimizations
Traceloop.init(
    disable_batch=False,
    resource_attributes={
        "service.name": "llm-application",
        "service.version": "1.0.0",
        "deployment.environment": os.getenv("ENVIRONMENT", "production")
    }
)
Security and Privacy Considerations
Implement security best practices:
from traceloop.sdk import Traceloop
from opentelemetry import trace
import hashlib
import os
import re

# Initialize with privacy-focused configuration
Traceloop.init(
    telemetry_enabled=False,  # Disable telemetry
    api_key=os.getenv("TRACELOOP_API_KEY")  # Use environment variables
)
def sanitize_prompt(prompt):
    """Remove sensitive information from prompts before tracing"""
    # Remove email addresses
    prompt = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b', '[EMAIL]', prompt)

    # Remove phone numbers
    prompt = re.sub(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', '[PHONE]', prompt)

    # Remove credit card numbers
    prompt = re.sub(r'\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b', '[CREDIT_CARD]', prompt)

    return prompt
def secure_llm_call(prompt, include_prompt_in_trace=False):
    """LLM call with privacy protection"""
    span = trace.get_current_span()

    # Hash the original prompt for tracking without exposing content
    prompt_hash = hashlib.sha256(prompt.encode()).hexdigest()[:16]
    span.set_attribute("prompt.hash", prompt_hash)
    span.set_attribute("prompt.length", len(prompt))

    # Optionally include sanitized prompt
    if include_prompt_in_trace:
        sanitized_prompt = sanitize_prompt(prompt)
        span.set_attribute("prompt.sanitized", sanitized_prompt)

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )

    # Don't include response content in traces for privacy
    response_text = response.choices[0].message.content
    span.set_attribute("response.length", len(response_text))
    span.set_attribute("response.hash", hashlib.sha256(response_text.encode()).hexdigest()[:16])

    return response_text
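To see what the sanitizer actually does, here is a quick check with fabricated PII (the email address and phone number below are invented):
# Quick check of sanitize_prompt() with fabricated PII
sample = "Contact Jane at jane.doe@example.com or 555-123-4567 about order 42."
print(sanitize_prompt(sample))
# Contact Jane at [EMAIL] or [PHONE] about order 42.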
Part 8: Monitoring and Alerting
Setting Up Alerts
Configure alerts for common LLM application issues:
from traceloop.sdk import Traceloop
from opentelemetry import trace
import time
# Initialize OpenLLMetry
Traceloop.init()
class LLMMonitor:
    def __init__(self):
        self.error_count = 0
        self.total_requests = 0
        self.total_cost = 0.0
        self.response_times = []

    def track_request(self, success=True, cost=0.0, response_time=0.0):
        """Track request metrics for alerting"""
        self.total_requests += 1
        self.total_cost += cost
        self.response_times.append(response_time)

        if not success:
            self.error_count += 1

        # Keep only last 100 response times for moving average
        if len(self.response_times) > 100:
            self.response_times.pop(0)

        # Check for alert conditions
        self.check_alerts()

    def check_alerts(self):
        """Check for alerting conditions"""
        # High error rate alert
        if self.total_requests > 10:
            error_rate = self.error_count / self.total_requests
            if error_rate > 0.1:  # 10% error rate
                self.send_alert(f"High error rate: {error_rate:.2%}")

        # High response time alert
        if len(self.response_times) > 10:
            avg_response_time = sum(self.response_times[-10:]) / 10
            if avg_response_time > 5.0:  # 5 second average
                self.send_alert(f"High response time: {avg_response_time:.2f}s")

        # Cost alert
        if self.total_cost > 100.0:  # $100 threshold
            self.send_alert(f"High cost: ${self.total_cost:.2f}")

    def send_alert(self, message):
        """Send alert (implement your preferred alerting method)"""
        print(f"ALERT: {message}")
        # Implement Slack, email, or other alerting here

# Global monitor instance
monitor = LLMMonitor()
def monitored_llm_call_with_alerting(prompt):
    """LLM call with monitoring and alerting"""
    start_time = time.time()
    span = trace.get_current_span()

    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}]
        )

        # Calculate metrics
        response_time = time.time() - start_time
        cost = estimate_cost(response.usage)  # Implement cost calculation

        # Track successful request
        monitor.track_request(success=True, cost=cost, response_time=response_time)

        # Add metrics to span
        span.set_attribute("request.cost", cost)
        span.set_attribute("request.response_time", response_time)

        return response.choices[0].message.content

    except Exception as e:
        response_time = time.time() - start_time

        # Track failed request
        monitor.track_request(success=False, response_time=response_time)

        # Add error info to span
        span.set_attribute("request.failed", True)
        span.set_attribute("error.message", str(e))

        raise
def estimate_cost(usage, model="gpt-3.5-turbo"):
    """Estimate request cost based on token usage"""
    # Simplified cost calculation (update with current pricing)
    pricing = {
        "gpt-3.5-turbo": {"input": 0.001, "output": 0.002}  # per 1K tokens
    }

    if model in pricing:
        input_cost = (usage.prompt_tokens / 1000) * pricing[model]["input"]
        output_cost = (usage.completion_tokens / 1000) * pricing[model]["output"]
        return input_cost + output_cost

    return 0.0
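As a quick sanity check of the arithmetic with the illustrative rates above: 500 prompt tokens cost (500 / 1000) * $0.001 = $0.0005, and 200 completion tokens cost (200 / 1000) * $0.002 = $0.0004, for a total of $0.0009.
# Worked example using the illustrative per-1K-token rates above
class FakeUsage:
    prompt_tokens = 500
    completion_tokens = 200

print(round(estimate_cost(FakeUsage()), 6))  # 0.0009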
Testing and Validation
Let’s create a comprehensive test script to validate our OpenLLMetry setup:
#!/usr/bin/env python3
"""
OpenLLMetry Test Script
Run this to validate your OpenLLMetry installation and configuration.
"""
import os
import sys
import time
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import task, workflow
def test_basic_initialization():
    """Test basic OpenLLMetry initialization"""
    print("Testing basic initialization...")
    try:
        Traceloop.init(disable_batch=True)
        print("✅ OpenLLMetry initialized successfully")
        return True
    except Exception as e:
        print(f"❌ Initialization failed: {e}")
        return False

@task(name="test_task")
def test_custom_tracing():
    """Test custom tracing decorators"""
    print("Testing custom tracing...")
    time.sleep(0.1)  # Simulate work
    return "Task completed"

@workflow(name="test_workflow")
def test_workflow_tracing():
    """Test workflow-level tracing"""
    print("Testing workflow tracing...")
    result = test_custom_tracing()
    return f"Workflow result: {result}"

def test_environment_configuration():
    """Test environment-based configuration"""
    print("Testing environment configuration...")

    # Check for environment variables
    required_vars = ["TRACELOOP_API_KEY"]
    optional_vars = ["TRACELOOP_BATCH_EXPORT", "TRACELOOP_TELEMETRY"]

    for var in required_vars:
        if not os.getenv(var):
            print(f"⚠️ Warning: {var} not set")

    for var in optional_vars:
        value = os.getenv(var, "not set")
        print(f"ℹ️ {var}: {value}")

    return True  # Informational check; missing vars only produce warnings

def run_tests():
    """Run all tests"""
    print("🚀 Running OpenLLMetry Tests")
    print("=" * 40)

    tests = [
        test_basic_initialization,
        test_environment_configuration,
        test_workflow_tracing
    ]

    results = []
    for test in tests:
        try:
            result = test()
            results.append(result)
            print()
        except Exception as e:
            print(f"❌ Test {test.__name__} failed: {e}")
            results.append(False)
            print()

    # Summary
    passed = sum(1 for r in results if r)
    total = len(results)

    print("=" * 40)
    print(f"Test Results: {passed}/{total} passed")

    if passed == total:
        print("🎉 All tests passed! OpenLLMetry is ready to use.")
    else:
        print("⚠️ Some tests failed. Check configuration and dependencies.")

if __name__ == "__main__":
    run_tests()
Conclusion
OpenLLMetry provides a comprehensive solution for LLM application observability, offering seamless integration with existing OpenTelemetry infrastructure while addressing the unique monitoring needs of AI applications.
Key Takeaways
- Simple Setup: Get started with just a few lines of code
- Framework Integration: Works seamlessly with LangChain, LlamaIndex, and other popular frameworks
- Production Ready: Includes robust error handling, performance optimization, and security features
- Extensible: Supports custom metrics, attributes, and backend integrations
- Cost Effective: Open source with support for multiple observability backends
Next Steps
- Start Small: Begin with basic monitoring in your development environment
- Add Custom Metrics: Implement business-specific tracking for your use case
- Production Deployment: Configure robust error handling and alerting
- Team Integration: Connect to your existing observability infrastructure
- Continuous Improvement: Use insights to optimize performance and costs
OpenLLMetry transforms LLM application monitoring from an afterthought into a first-class capability, enabling you to build more reliable, performant, and cost-effective AI applications.
For more information, visit the OpenLLMetry GitHub repository or check out the official documentation.
Happy monitoring! 🚀