⏱️ Estimated Reading Time: 15 minutes

Introduction

As Large Language Model (LLM) applications become increasingly complex and production-ready, monitoring and observability have become critical requirements. OpenLLMetry emerges as a comprehensive solution that brings enterprise-grade observability to LLM applications through OpenTelemetry standards.

OpenLLMetry is an open-source observability toolkit designed specifically for LLM applications. Built as a set of extensions on top of OpenTelemetry, it provides detailed visibility into your AI application’s performance, costs, and behavior while remaining compatible with existing observability infrastructure.

Why OpenLLMetry Matters

Traditional monitoring tools fall short when it comes to LLM applications. OpenLLMetry addresses unique challenges such as:

  • Token Usage Tracking: Monitor input/output tokens and associated costs
  • Latency Analysis: Track response times across different model providers
  • Error Monitoring: Capture and analyze LLM-specific errors and failures
  • Performance Optimization: Identify bottlenecks in AI workflows
  • Cost Management: Monitor spending across multiple LLM providers

Prerequisites

Before diving into OpenLLMetry, ensure you have:

  • Python 3.8+ installed on your system
  • Basic understanding of OpenTelemetry concepts
  • LLM application (OpenAI, Anthropic, etc.) to monitor
  • Observability backend (optional, for advanced setups)

Part 1: Getting Started with OpenLLMetry

Installation and Basic Setup

Let’s start with the simplest possible setup. OpenLLMetry provides a convenient SDK that makes getting started effortless.

Step 1: Install OpenLLMetry SDK

# Install the core SDK
pip install traceloop-sdk

# For specific integrations, install additional packages
pip install openai anthropic  # Example LLM providers

Step 2: Basic Initialization

The most straightforward way to start monitoring your LLM application is with a single line of code:

from traceloop.sdk import Traceloop

# Initialize OpenLLMetry with default settings
Traceloop.init()

For local development, you might want to see traces immediately:

# Disable batch sending for immediate trace visibility
Traceloop.init(disable_batch=True)
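
You can also give the service an explicit name so its traces are easy to find later. The `app_name` argument below comes from the Traceloop SDK documentation; the service name itself is just an example:

# Name the service that will appear on exported traces
Traceloop.init(app_name="support-bot", disable_batch=True)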

Step 3: Your First Monitored LLM Call

Here’s a complete example that demonstrates basic monitoring:

import openai
from traceloop.sdk import Traceloop

# Initialize OpenLLMetry
Traceloop.init(disable_batch=True)

# Configure OpenAI client
client = openai.OpenAI(api_key="your-api-key")

# Make a monitored LLM call
def generate_response(prompt):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": prompt}
        ],
        max_tokens=100
    )
    return response.choices[0].message.content

# Test the monitored function
if __name__ == "__main__":
    result = generate_response("Explain quantum computing in simple terms")
    print(result)

When you run this code, OpenLLMetry automatically:

  • Captures the request and response
  • Records token usage and costs
  • Measures response latency
  • Tracks any errors that occur

Part 2: Advanced Configuration

Custom Tracing with Decorators

OpenLLMetry provides decorators for custom tracing of your application logic:

import openai
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import task, workflow

# Initialize OpenLLMetry
Traceloop.init()

# OpenAI client used by the tasks below (reads OPENAI_API_KEY from the environment)
client = openai.OpenAI()

@workflow(name="document_analysis_pipeline")
def analyze_document(document_text):
    """Main workflow for document analysis"""
    summary = summarize_text(document_text)
    sentiment = analyze_sentiment(document_text)
    keywords = extract_keywords(document_text)
    
    return {
        "summary": summary,
        "sentiment": sentiment,
        "keywords": keywords
    }

@task(name="text_summarization")
def summarize_text(text):
    """Summarize the input text"""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Summarize the following text concisely."},
            {"role": "user", "content": text}
        ],
        max_tokens=150
    )
    return response.choices[0].message.content

@task(name="sentiment_analysis")
def analyze_sentiment(text):
    """Analyze sentiment of the text"""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Analyze the sentiment of this text. Respond with: positive, negative, or neutral."},
            {"role": "user", "content": text}
        ],
        max_tokens=10
    )
    return response.choices[0].message.content

@task(name="keyword_extraction")
def extract_keywords(text):
    """Extract key terms from the text"""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Extract 5 key terms from this text. Return as comma-separated list."},
            {"role": "user", "content": text}
        ],
        max_tokens=50
    )
    return response.choices[0].message.content
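
Calling the workflow produces one nested trace: a workflow span wrapping a task span (and the underlying LLM span) for each step. A hypothetical invocation, assuming report.txt is any plain-text document:

if __name__ == "__main__":
    with open("report.txt") as f:
        analysis = analyze_document(f.read())
    
    print(analysis["summary"])
    print(analysis["sentiment"], analysis["keywords"])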

Environment-based Configuration

For production deployments, configure OpenLLMetry through environment variables:

# Set environment variables (shell)
export TRACELOOP_API_KEY="your-traceloop-api-key"
export TRACELOOP_BATCH_EXPORT="true"
export TRACELOOP_TELEMETRY="false"  # Disable telemetry if needed

Then read them when initializing the SDK:

import os
from traceloop.sdk import Traceloop

# Production configuration
batch_enabled = os.getenv("TRACELOOP_BATCH_EXPORT", "true").lower() == "true"

Traceloop.init(
    api_key=os.getenv("TRACELOOP_API_KEY"),
    disable_batch=not batch_enabled,
    telemetry_enabled=os.getenv("TRACELOOP_TELEMETRY", "true").lower() == "true"
)
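
The SDK can also send traces to any OTLP-compatible endpoint instead of the Traceloop backend. The variable names below (TRACELOOP_BASE_URL and TRACELOOP_HEADERS) are taken from the OpenLLMetry documentation, so verify them against your SDK version:

# Point the SDK's exporter at your own OTLP collector or backend
export TRACELOOP_BASE_URL="https://otel-collector.example.com:4318"
export TRACELOOP_HEADERS="x-api-key=your-backend-api-key"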

Part 3: Framework Integrations

LangChain Integration

OpenLLMetry seamlessly integrates with LangChain applications:

from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from traceloop.sdk import Traceloop

# Initialize OpenLLMetry
Traceloop.init()

# Create LangChain components
llm = OpenAI(temperature=0.7)
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Write a brief explanation about {topic}"
)

# Create and run chain
chain = LLMChain(llm=llm, prompt=prompt)

# This will be automatically traced
result = chain.run(topic="machine learning")
print(result)

LlamaIndex Integration

For LlamaIndex applications:

from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms import OpenAI
from traceloop.sdk import Traceloop

# Initialize OpenLLMetry
Traceloop.init()

# Configure LlamaIndex
llm = OpenAI(model="gpt-3.5-turbo")

# Load documents and create index
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Create query engine
query_engine = index.as_query_engine(llm=llm)

# Query with automatic tracing
response = query_engine.query("What are the main topics in these documents?")
print(response)

Part 4: Vector Database Monitoring

OpenLLMetry also monitors vector database operations:

Pinecone Integration

import pinecone
from traceloop.sdk import Traceloop

# Initialize OpenLLMetry
Traceloop.init()

# Initialize Pinecone
pinecone.init(
    api_key="your-pinecone-api-key",
    environment="your-environment"
)

# Create index reference
index = pinecone.Index("your-index-name")

# Monitored vector operations
def search_similar_documents(query_vector, top_k=5):
    """Search for similar documents using vector similarity"""
    results = index.query(
        vector=query_vector,
        top_k=top_k,
        include_metadata=True
    )
    return results

# Monitored upsert operation
def store_document_embedding(doc_id, embedding, metadata):
    """Store document embedding in Pinecone"""
    index.upsert([
        (doc_id, embedding, metadata)
    ])
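
In practice the query vector usually comes from an embedding model, which means the embedding call and the Pinecone query appear in the same trace. A minimal sketch, assuming the OpenAI client from Part 1 and the text-embedding-ada-002 model:

def search_by_text(query_text, top_k=5):
    """Embed the query text, then run a vector search over the index"""
    embedding_response = client.embeddings.create(
        model="text-embedding-ada-002",
        input=query_text
    )
    query_vector = embedding_response.data[0].embedding
    return search_similar_documents(query_vector, top_k=top_k)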

Chroma Integration

import chromadb
from traceloop.sdk import Traceloop

# Initialize OpenLLMetry
Traceloop.init()

# Initialize Chroma client
client = chromadb.Client()

# Get or create collection
collection = client.get_or_create_collection("documents")

# Monitored operations
def add_documents(documents, embeddings, ids, metadatas):
    """Add documents to Chroma collection"""
    collection.add(
        documents=documents,
        embeddings=embeddings,
        ids=ids,
        metadatas=metadatas
    )

def query_documents(query_text, n_results=5):
    """Query similar documents from Chroma"""
    results = collection.query(
        query_texts=[query_text],
        n_results=n_results
    )
    return results
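
A quick usage example: passing embeddings=None lets Chroma compute embeddings with the collection's default embedding function, so the snippet stays self-contained (this assumes the default embedding model can be loaded in your environment):

# Index one document and query it back
add_documents(
    documents=["OpenLLMetry adds tracing to LLM applications."],
    embeddings=None,
    ids=["doc-1"],
    metadatas=[{"source": "blog"}]
)

print(query_documents("How do I trace LLM calls?", n_results=1))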

Part 5: Observability Backend Integration

Datadog Integration

Connect OpenLLMetry to Datadog for enterprise monitoring:

from opentelemetry import trace
from opentelemetry.exporter.datadog import DatadogExporter, DatadogSpanProcessor
from opentelemetry.sdk.trace import TracerProvider
from traceloop.sdk import Traceloop

# Configure Datadog exporter
tracer_provider = TracerProvider()
datadog_exporter = DatadogExporter(
    agent_url="http://localhost:8126",  # Datadog Agent URL
    service="llm-application"
)

# Add Datadog span processor
tracer_provider.add_span_processor(
    DatadogSpanProcessor(datadog_exporter)
)

# Set tracer provider
trace.set_tracer_provider(tracer_provider)

# Initialize OpenLLMetry with custom tracer
Traceloop.init()
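
Note that the dedicated Datadog exporter package has been deprecated in recent OpenTelemetry Python releases. An alternative, sketched below under the assumption that your Datadog Agent has OTLP ingestion enabled on its default HTTP port (4318), is to send standard OTLP traffic to the Agent:

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from traceloop.sdk import Traceloop

# OTLP/HTTP exporter pointed at the local Datadog Agent
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
)
trace.set_tracer_provider(tracer_provider)

# Initialize OpenLLMetry on top of the configured provider
Traceloop.init()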

Honeycomb Integration

For Honeycomb observability:

import os
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from traceloop.sdk import Traceloop

# Configure Honeycomb exporter
tracer_provider = TracerProvider()

otlp_exporter = OTLPSpanExporter(
    endpoint="https://api.honeycomb.io/v1/traces",
    headers={
        "x-honeycomb-team": os.getenv("HONEYCOMB_API_KEY"),
        "x-honeycomb-dataset": "llm-traces"
    }
)

# Add batch span processor
tracer_provider.add_span_processor(
    BatchSpanProcessor(otlp_exporter)
)

# Set tracer provider
trace.set_tracer_provider(tracer_provider)

# Initialize OpenLLMetry
Traceloop.init()

Part 6: Custom Metrics and Attributes

Adding Custom Attributes

Enhance traces with custom business logic:

from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import task
from opentelemetry import trace

# Initialize OpenLLMetry
Traceloop.init()

@task(name="customer_support_response")
def handle_customer_query(query, customer_id, priority="normal"):
    """Handle customer support query with custom attributes"""
    
    # Get current span
    current_span = trace.get_current_span()
    
    # Add custom attributes
    current_span.set_attribute("customer.id", customer_id)
    current_span.set_attribute("query.priority", priority)
    current_span.set_attribute("query.length", len(query))
    
    # Determine model based on priority
    model = "gpt-4" if priority == "high" else "gpt-3.5-turbo"
    current_span.set_attribute("llm.model", model)
    
    # Generate response
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful customer support agent."},
            {"role": "user", "content": query}
        ]
    )
    
    # Add response attributes
    response_text = response.choices[0].message.content
    current_span.set_attribute("response.length", len(response_text))
    current_span.set_attribute("response.satisfactory", "unknown")  # Could be determined by ML model
    
    return response_text
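
A hypothetical call, just to show the attributes that end up on the task span:

reply = handle_customer_query(
    "My invoice total looks wrong, can you check it?",
    customer_id="cust-1042",
    priority="high"  # routes to gpt-4 and is recorded as query.priority
)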

Custom Metrics Collection

Create custom metrics for business KPIs:

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader
import time

# Configure metrics
metric_reader = PeriodicExportingMetricReader(
    ConsoleMetricExporter(), 
    export_interval_millis=5000
)
metrics.set_meter_provider(MeterProvider(metric_readers=[metric_reader]))

# Create custom meters
meter = metrics.get_meter("llm_application")

# Create custom metrics
request_counter = meter.create_counter(
    "llm_requests_total",
    description="Total number of LLM requests"
)

response_time_histogram = meter.create_histogram(
    "llm_response_time",
    description="LLM response time in seconds"
)

token_usage_counter = meter.create_counter(
    "llm_tokens_used",
    description="Total tokens consumed"
)

def monitored_llm_call(prompt, model="gpt-3.5-turbo"):
    """LLM call with custom metrics"""
    start_time = time.time()
    
    try:
        # Make LLM call
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}]
        )
        
        # Count the successful request (failures are counted in the except block below)
        request_counter.add(1, {"model": model, "status": "success"})
        
        # Record response time
        response_time = time.time() - start_time
        response_time_histogram.record(response_time, {"model": model})
        
        # Record token usage
        usage = response.usage
        token_usage_counter.add(usage.total_tokens, {
            "model": model,
            "type": "total"
        })
        token_usage_counter.add(usage.prompt_tokens, {
            "model": model,
            "type": "prompt"
        })
        token_usage_counter.add(usage.completion_tokens, {
            "model": model,
            "type": "completion"
        })
        
        return response.choices[0].message.content
        
    except Exception as e:
        request_counter.add(1, {"model": model, "status": "error"})
        raise
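
Running a few calls makes the counters and histogram show up on the console at the five-second export interval configured above. A tiny driver, assuming the OpenAI client from Part 1:

if __name__ == "__main__":
    for prompt in ["What is OpenTelemetry?", "Explain token pricing briefly."]:
        print(monitored_llm_call(prompt))
    
    # Give the periodic reader a moment to flush the final metrics export
    time.sleep(6)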

Part 7: Production Best Practices

Error Handling and Resilience

Implement robust error handling for production environments:

import logging
import time

import openai
from traceloop.sdk import Traceloop
from opentelemetry import trace

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize OpenLLMetry with error handling
try:
    Traceloop.init(
        disable_batch=False,  # Enable batching for production
        telemetry_enabled=False  # Disable telemetry for privacy
    )
    logger.info("OpenLLMetry initialized successfully")
except Exception as e:
    logger.error(f"Failed to initialize OpenLLMetry: {e}")
    # Continue without tracing rather than failing the application
    pass

def safe_llm_call(prompt, max_retries=3, backoff_factor=2):
    """LLM call with retry logic and comprehensive error handling"""
    
    span = trace.get_current_span()
    
    for attempt in range(max_retries):
        try:
            span.set_attribute("retry.attempt", attempt + 1)
            
            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prompt}],
                timeout=30  # Set timeout for production
            )
            
            span.set_attribute("request.successful", True)
            return response.choices[0].message.content
            
        except openai.RateLimitError as e:
            span.set_attribute("error.type", "rate_limit")
            span.set_attribute("error.message", str(e))
            
            if attempt < max_retries - 1:
                wait_time = backoff_factor ** attempt
                logger.warning(f"Rate limit hit, waiting {wait_time}s before retry")
                time.sleep(wait_time)
            else:
                span.set_attribute("request.successful", False)
                raise
                
        except openai.APIError as e:
            span.set_attribute("error.type", "api_error")
            span.set_attribute("error.message", str(e))
            span.set_attribute("request.successful", False)
            
            logger.error(f"OpenAI API error: {e}")
            raise
            
        except Exception as e:
            span.set_attribute("error.type", "unknown")
            span.set_attribute("error.message", str(e))
            span.set_attribute("request.successful", False)
            
            logger.error(f"Unexpected error: {e}")
            raise

Performance Optimization

Optimize OpenLLMetry for high-throughput applications:

import os

from traceloop.sdk import Traceloop
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# High-performance configuration
tracer_provider = TracerProvider()

# Configure batch processor for high throughput
batch_processor = BatchSpanProcessor(
    span_exporter=your_exporter,  # Your chosen exporter
    max_queue_size=2048,  # Increase queue size
    export_timeout_millis=30000,  # 30 second timeout
    max_export_batch_size=512,  # Larger batch sizes
    schedule_delay_millis=500  # More frequent exports
)

tracer_provider.add_span_processor(batch_processor)

# Register the provider so the batch configuration above takes effect
trace.set_tracer_provider(tracer_provider)

# Initialize with performance optimizations
Traceloop.init(
    disable_batch=False,
    resource_attributes={
        "service.name": "llm-application",
        "service.version": "1.0.0",
        "deployment.environment": os.getenv("ENVIRONMENT", "production")
    }
)
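
For very high request volumes you can also sample traces instead of exporting every span. A minimal sketch using the standard OpenTelemetry ratio-based sampler (the 10% ratio is an arbitrary example; combine it with the batch processor above):

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

# Keep roughly 10% of traces to cap exporter overhead at high throughput
sampled_provider = TracerProvider(sampler=TraceIdRatioBased(0.1))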

Security and Privacy Considerations

Implement security best practices:

import hashlib
import os
import re

from traceloop.sdk import Traceloop
from opentelemetry import trace

# Initialize with privacy-focused configuration
Traceloop.init(
    telemetry_enabled=False,  # Disable telemetry
    api_key=os.getenv("TRACELOOP_API_KEY")  # Use environment variables
)

def sanitize_prompt(prompt):
    """Remove sensitive information from prompts before tracing"""
    
    # Remove email addresses
    prompt = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', '[EMAIL]', prompt)
    
    # Remove phone numbers
    prompt = re.sub(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', '[PHONE]', prompt)
    
    # Remove credit card numbers
    prompt = re.sub(r'\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b', '[CREDIT_CARD]', prompt)
    
    return prompt

def secure_llm_call(prompt, include_prompt_in_trace=False):
    """LLM call with privacy protection"""
    
    span = trace.get_current_span()
    
    # Hash the original prompt for tracking without exposing content
    prompt_hash = hashlib.sha256(prompt.encode()).hexdigest()[:16]
    span.set_attribute("prompt.hash", prompt_hash)
    span.set_attribute("prompt.length", len(prompt))
    
    # Optionally include sanitized prompt
    if include_prompt_in_trace:
        sanitized_prompt = sanitize_prompt(prompt)
        span.set_attribute("prompt.sanitized", sanitized_prompt)
    
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    
    # Don't include response content in traces for privacy
    response_text = response.choices[0].message.content
    span.set_attribute("response.length", len(response_text))
    span.set_attribute("response.hash", hashlib.sha256(response_text.encode()).hexdigest()[:16])
    
    return response_text
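
OpenLLMetry also offers a coarser switch: according to its documentation, setting an environment variable before the SDK starts stops prompt and completion content from being recorded at all, leaving only metadata such as token counts (variable name as documented; verify against your SDK version):

# Do not record prompt/completion content on spans
export TRACELOOP_TRACE_CONTENT=false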

Part 8: Monitoring and Alerting

Setting Up Alerts

Configure alerts for common LLM application issues:

from traceloop.sdk import Traceloop
from opentelemetry import trace
import time

# Initialize OpenLLMetry
Traceloop.init()

class LLMMonitor:
    def __init__(self):
        self.error_count = 0
        self.total_requests = 0
        self.total_cost = 0.0
        self.response_times = []
    
    def track_request(self, success=True, cost=0.0, response_time=0.0):
        """Track request metrics for alerting"""
        self.total_requests += 1
        self.total_cost += cost
        self.response_times.append(response_time)
        
        if not success:
            self.error_count += 1
        
        # Keep only last 100 response times for moving average
        if len(self.response_times) > 100:
            self.response_times.pop(0)
        
        # Check for alert conditions
        self.check_alerts()
    
    def check_alerts(self):
        """Check for alerting conditions"""
        
        # High error rate alert
        if self.total_requests > 10:
            error_rate = self.error_count / self.total_requests
            if error_rate > 0.1:  # 10% error rate
                self.send_alert(f"High error rate: {error_rate:.2%}")
        
        # High response time alert
        if len(self.response_times) > 10:
            avg_response_time = sum(self.response_times[-10:]) / 10
            if avg_response_time > 5.0:  # 5 second average
                self.send_alert(f"High response time: {avg_response_time:.2f}s")
        
        # Cost alert
        if self.total_cost > 100.0:  # $100 threshold
            self.send_alert(f"High cost: ${self.total_cost:.2f}")
    
    def send_alert(self, message):
        """Send alert (implement your preferred alerting method)"""
        print(f"ALERT: {message}")
        # Implement Slack, email, or other alerting here

# Global monitor instance
monitor = LLMMonitor()

def monitored_llm_call_with_alerting(prompt):
    """LLM call with monitoring and alerting"""
    start_time = time.time()
    span = trace.get_current_span()
    
    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}]
        )
        
        # Calculate metrics
        response_time = time.time() - start_time
        cost = estimate_cost(response.usage)  # Implement cost calculation
        
        # Track successful request
        monitor.track_request(success=True, cost=cost, response_time=response_time)
        
        # Add metrics to span
        span.set_attribute("request.cost", cost)
        span.set_attribute("request.response_time", response_time)
        
        return response.choices[0].message.content
        
    except Exception as e:
        response_time = time.time() - start_time
        
        # Track failed request
        monitor.track_request(success=False, response_time=response_time)
        
        # Add error info to span
        span.set_attribute("request.failed", True)
        span.set_attribute("error.message", str(e))
        
        raise

def estimate_cost(usage, model="gpt-3.5-turbo"):
    """Estimate request cost based on token usage"""
    # Simplified cost calculation (update with current pricing)
    pricing = {
        "gpt-3.5-turbo": {"input": 0.001, "output": 0.002}  # per 1K tokens
    }
    
    if model in pricing:
        input_cost = (usage.prompt_tokens / 1000) * pricing[model]["input"]
        output_cost = (usage.completion_tokens / 1000) * pricing[model]["output"]
        return input_cost + output_cost
    
    return 0.0
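
The send_alert stub above only prints. One common replacement is posting to a Slack incoming webhook; the sketch below uses only the standard library and assumes a SLACK_WEBHOOK_URL environment variable points at your webhook (drop it into LLMMonitor in place of the stub):

import json
import os
import urllib.request

def send_alert(self, message):
    """Post the alert text to a Slack incoming webhook"""
    webhook_url = os.getenv("SLACK_WEBHOOK_URL")
    if not webhook_url:
        print(f"ALERT (no webhook configured): {message}")
        return
    
    payload = json.dumps({"text": f"LLM monitor alert: {message}"}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(request, timeout=5)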

Testing and Validation

Let’s create a comprehensive test script to validate our OpenLLMetry setup:

#!/usr/bin/env python3
"""
OpenLLMetry Test Script
Run this to validate your OpenLLMetry installation and configuration.
"""

import os
import sys
import time
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import task, workflow

def test_basic_initialization():
    """Test basic OpenLLMetry initialization"""
    print("Testing basic initialization...")
    
    try:
        Traceloop.init(disable_batch=True)
        print("✅ OpenLLMetry initialized successfully")
        return True
    except Exception as e:
        print(f"❌ Initialization failed: {e}")
        return False

@task(name="test_task")
def test_custom_tracing():
    """Test custom tracing decorators"""
    print("Testing custom tracing...")
    time.sleep(0.1)  # Simulate work
    return "Task completed"

@workflow(name="test_workflow")
def test_workflow_tracing():
    """Test workflow-level tracing"""
    print("Testing workflow tracing...")
    result = test_custom_tracing()
    return f"Workflow result: {result}"

def test_environment_configuration():
    """Test environment-based configuration"""
    print("Testing environment configuration...")
    
    # Check for environment variables
    required_vars = ["TRACELOOP_API_KEY"]
    optional_vars = ["TRACELOOP_BATCH_EXPORT", "TRACELOOP_TELEMETRY"]
    
    for var in required_vars:
        if not os.getenv(var):
            print(f"⚠️  Warning: {var} not set")
    
    for var in optional_vars:
        value = os.getenv(var, "not set")
        print(f"ℹ️  {var}: {value}")
    
    return True

def run_tests():
    """Run all tests"""
    print("🚀 Running OpenLLMetry Tests")
    print("=" * 40)
    
    tests = [
        test_basic_initialization,
        test_environment_configuration,
        test_workflow_tracing
    ]
    
    results = []
    for test in tests:
        try:
            result = test()
            results.append(result)
            print()
        except Exception as e:
            print(f"❌ Test {test.__name__} failed: {e}")
            results.append(False)
            print()
    
    # Summary
    passed = sum(1 for r in results if r)
    total = len(results)
    
    print("=" * 40)
    print(f"Test Results: {passed}/{total} passed")
    
    if passed == total:
        print("🎉 All tests passed! OpenLLMetry is ready to use.")
    else:
        print("⚠️  Some tests failed. Check configuration and dependencies.")

if __name__ == "__main__":
    run_tests()

Conclusion

OpenLLMetry provides a comprehensive solution for LLM application observability, offering seamless integration with existing OpenTelemetry infrastructure while addressing the unique monitoring needs of AI applications.

Key Takeaways

  1. Simple Setup: Get started with just a few lines of code
  2. Framework Integration: Works seamlessly with LangChain, LlamaIndex, and other popular frameworks
  3. Production Ready: Includes robust error handling, performance optimization, and security features
  4. Extensible: Supports custom metrics, attributes, and backend integrations
  5. Cost Effective: Open source with support for multiple observability backends

Next Steps

  1. Start Small: Begin with basic monitoring in your development environment
  2. Add Custom Metrics: Implement business-specific tracking for your use case
  3. Production Deployment: Configure robust error handling and alerting
  4. Team Integration: Connect to your existing observability infrastructure
  5. Continuous Improvement: Use insights to optimize performance and costs

OpenLLMetry transforms LLM application monitoring from an afterthought into a first-class capability, enabling you to build more reliable, performant, and cost-effective AI applications.

For more information, visit the OpenLLMetry GitHub repository or check out the official documentation.


Happy monitoring! 🚀