Container Deployment Guide

This document provides comprehensive build instructions and deployment guidance for all Docker images in the VLLM Evaluation system.

Overview

The VLLM Eval system uses containerized evaluation services with cross-platform build support. All images follow consistent naming conventions and deployment patterns.

Naming Convention

  • macOS builds: Image name ends with -mac (for example, evalchemy-mac:latest)
  • Linux/amd64 builds: Image name ends with -linux (for example, evalchemy-linux:latest)
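
The suffix rule can be captured in a small helper. This is a minimal sketch, assuming (as in the build commands in this guide) that the suffix is appended to the image name and that the tag is latest. The function name image_ref is hypothetical.

```shell
#!/bin/sh
REGISTRY="ghcr.io/thakicloud"

# image_ref NAME PLATFORM: print the full image reference for one build.
# PLATFORM must be "mac" or "linux"; anything else is rejected.
image_ref() {
  name=$1
  platform=$2
  case $platform in
    mac|linux) ;;
    *) echo "unknown platform: $platform" >&2; return 1 ;;
  esac
  printf '%s/%s-%s:latest\n' "$REGISTRY" "$name" "$platform"
}

image_ref evalchemy mac
image_ref nvidia-eval linux
```

Rejecting unknown platforms early keeps a typo from silently producing a new, unintended repository name.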

Prerequisites

  • Docker 24+ with Buildx enabled (for cross-platform builds)
  • Access to ghcr.io/thakicloud/ container registry
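
A quick way to confirm the first prerequisite before building is sketched below. It assumes the docker CLI is on PATH; the function names major_at_least and check_docker are hypothetical, and the script only defines them without running anything.

```shell
#!/bin/sh
# major_at_least VERSION MIN: succeed when the major component of VERSION
# (the part before the first dot) is a number greater than or equal to MIN.
major_at_least() {
  major=${1%%.*}
  case $major in
    ''|*[!0-9]*) return 2 ;;   # empty or not a number
  esac
  [ "$major" -ge "$2" ]
}

# check_docker: verify the Docker client version and the Buildx plugin.
check_docker() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found on PATH" >&2
    return 1
  fi
  version=$(docker version --format '{{.Client.Version}}' 2>/dev/null)
  if ! major_at_least "$version" 24; then
    echo "Docker 24+ required, found: ${version:-unknown}" >&2
    return 1
  fi
  if ! docker buildx version >/dev/null 2>&1; then
    echo "Buildx plugin not available" >&2
    return 1
  fi
  echo "Docker $version with Buildx is available"
}
```

Registry access is usually established separately with docker login ghcr.io using a username and an access token; the exact credentials depend on your GitHub account setup.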

Available Images

1. Evalchemy

Purpose: EleutherAI lm-evaluation-harness based unified benchmark runner

macOS (local arch)

docker build -f docker/evalchemy.Dockerfile -t ghcr.io/thakicloud/evalchemy-mac:latest .
docker push ghcr.io/thakicloud/evalchemy-mac:latest

Linux/amd64

docker buildx build --platform linux/amd64 \
  -f docker/evalchemy.Dockerfile \
  -t ghcr.io/thakicloud/evalchemy-linux:latest .
docker push ghcr.io/thakicloud/evalchemy-linux:latest


2. Standard Evalchemy

Purpose: Standard benchmark evaluation with predefined tasks

macOS (local arch)

docker build -f docker/standard-evalchemy.Dockerfile \
  -t ghcr.io/thakicloud/standard-evalchemy-mac:latest .
docker push ghcr.io/thakicloud/standard-evalchemy-mac:latest

Linux/amd64

docker buildx build --platform linux/amd64 \
  -f docker/standard-evalchemy.Dockerfile \
  -t ghcr.io/thakicloud/standard-evalchemy-linux:latest .
docker push ghcr.io/thakicloud/standard-evalchemy-linux:latest


3. NVIDIA Eval

Purpose: NVIDIA-specific evaluation tasks including AIME and LiveCodeBench

macOS (local arch)

docker build -f docker/nvidia-eval.Dockerfile -t ghcr.io/thakicloud/nvidia-eval-mac:latest .
docker push ghcr.io/thakicloud/nvidia-eval-mac:latest

Linux/amd64

docker buildx build --platform linux/amd64 \
  -f docker/nvidia-eval.Dockerfile \
  -t ghcr.io/thakicloud/nvidia-eval-linux:latest .
docker push ghcr.io/thakicloud/nvidia-eval-linux:latest


4. VLLM Benchmark

Purpose: VLLM performance benchmarking and evaluation

macOS (local arch)

docker build -f docker/vllm-benchmark.Dockerfile -t ghcr.io/thakicloud/vllm-benchmark-mac:latest .
docker push ghcr.io/thakicloud/vllm-benchmark-mac:latest

Linux/amd64

docker buildx build --platform linux/amd64 \
  -f docker/vllm-benchmark.Dockerfile \
  -t ghcr.io/thakicloud/vllm-benchmark-linux:latest .
docker push ghcr.io/thakicloud/vllm-benchmark-linux:latest


5. VLLM Benchmark Linux

Purpose: Linux-optimized VLLM benchmarking

macOS (local arch)

docker build -f docker/vllm-benchmark-linux.Dockerfile -t ghcr.io/thakicloud/vllm-benchmark-linux-mac:latest .
docker push ghcr.io/thakicloud/vllm-benchmark-linux-mac:latest

Linux/amd64

docker buildx build --platform linux/amd64 \
  -f docker/vllm-benchmark-linux.Dockerfile \
  -t ghcr.io/thakicloud/vllm-benchmark-linux:latest .
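docker push ghcr.io/thakicloud/vllm-benchmark-linux:latest

Note: ghcr.io/thakicloud/vllm-benchmark-linux:latest is also the reference used for the Linux/amd64 build of VLLM Benchmark (image 4), so pushing either image overwrites the other. Confirm which Dockerfile should own this name before pushing.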


6. Deepeval

Purpose: PyTest-style LLM evaluation framework with custom metrics

macOS (local arch)

docker build -f docker/deepeval.Dockerfile -t ghcr.io/thakicloud/deepeval-mac:latest .
docker push ghcr.io/thakicloud/deepeval-mac:latest

Linux/amd64

docker buildx build --platform linux/amd64 \
  -f docker/deepeval.Dockerfile \
  -t ghcr.io/thakicloud/deepeval-linux:latest .
docker push ghcr.io/thakicloud/deepeval-linux:latest
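
Because the per-image commands above share one pattern, they can be generated in a loop. The following is a sketch, not the project's build script: the function names and the RUN variable are hypothetical, and it prints commands by default instead of executing them. vllm-benchmark-linux is left out because its documented Linux reference does not follow the name-plus-suffix pattern.

```shell
#!/bin/sh
REGISTRY="ghcr.io/thakicloud"
IMAGES="evalchemy standard-evalchemy nvidia-eval vllm-benchmark deepeval"

# build_cmd NAME PLATFORM: print the build command for one image.
build_cmd() {
  name=$1
  platform=$2
  ref="$REGISTRY/$name-$platform:latest"
  case $platform in
    mac)   echo "docker build -f docker/$name.Dockerfile -t $ref ." ;;
    linux) echo "docker buildx build --platform linux/amd64 -f docker/$name.Dockerfile -t $ref ." ;;
    *)     echo "unknown platform: $platform" >&2; return 1 ;;
  esac
}

# build_all PLATFORM: print the build and push commands for every image,
# or execute them when RUN=1 is set in the environment.
build_all() {
  platform=$1
  for name in $IMAGES; do
    cmd=$(build_cmd "$name" "$platform") || return 1
    push="docker push $REGISTRY/$name-$platform:latest"
    if [ "${RUN:-0}" = 1 ]; then
      sh -c "$cmd" && sh -c "$push" || return 1
    else
      printf '%s\n%s\n' "$cmd" "$push"
    fi
  done
}

build_all linux
```

Printing first makes it easy to review the exact commands; setting RUN=1 runs them from the repository root, where the docker/ directory is expected to live.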


Container Configuration

Security Features

  • Images run as the non-root user evaluser where applicable
  • Minimal base images for reduced attack surface
  • Proper permission handling for mounted volumes

Standard Directories

  • Results: /app/results - Output directory for evaluation results
  • Parsed: /app/parsed - Directory for processed evaluation data
  • Volumes: Typically mounted to these paths in Kubernetes deployments
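
For local runs outside Kubernetes, the same two paths can be bind-mounted from the host. This is a sketch under the assumption that the image needs no further arguments; the function name run_cmd and the /tmp/eval directory are hypothetical, and host paths containing spaces would need extra quoting.

```shell
#!/bin/sh
# run_cmd IMAGE [HOST_DIR]: print a docker run command that bind-mounts
# HOST_DIR/results and HOST_DIR/parsed onto the standard container paths.
# HOST_DIR defaults to the current directory.
run_cmd() {
  image=$1
  host_dir=${2:-$PWD}
  printf 'docker run --rm -v %s/results:/app/results -v %s/parsed:/app/parsed %s\n' \
    "$host_dir" "$host_dir" "$image"
}

run_cmd ghcr.io/thakicloud/evalchemy-mac:latest /tmp/eval
```

Creating the results and parsed directories on the host beforehand (mkdir -p) is advisable, since directories created by the Docker daemon for a -v mount are typically owned by root and may not be writable by evaluser.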

Health Checks

Most images include built-in health checks. See individual Dockerfiles for specific implementation details.
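
Whether a running container reports a health state can be checked with docker inspect. The sketch below assumes a container that is already running; the function names are hypothetical, and containers whose image defines no HEALTHCHECK report "none".

```shell
#!/bin/sh
# health_status CONTAINER: print the health state reported by Docker
# (starting, healthy, unhealthy), or "none" when no health check is defined.
health_status() {
  docker inspect \
    --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
    "$1"
}

# wait_healthy CONTAINER [ATTEMPTS]: poll once per second until the container
# is healthy; fail after ATTEMPTS tries (default 30).
wait_healthy() {
  attempts=${2:-30}
  while [ "$attempts" -gt 0 ]; do
    [ "$(health_status "$1")" = healthy ] && return 0
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}
```

A typical use would be wait_healthy my-container 60 before submitting work to the container, where my-container is whatever name or ID docker ps shows.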


Deployment Patterns

Kubernetes Integration

These images are designed for deployment in Kubernetes environments using:

  • Argo Workflows: For orchestrating evaluation pipelines
  • GPU Scheduling: Images support NVIDIA GPU allocation
  • Resource Limits: Configured for optimal resource utilization

Volume Mounts

# Example Kubernetes volume configuration
volumeMounts:
- name: results-volume
  mountPath: /app/results
- name: parsed-volume
  mountPath: /app/parsed
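
To show where the volumeMounts fragment sits in a complete object, the following sketch prints a minimal Pod manifest. The Pod and container names, the emptyDir volumes, and the single-GPU limit are all assumptions for illustration; a real deployment would more likely use persistent volumes and an Argo Workflows template.

```shell
#!/bin/sh
# pod_manifest IMAGE: print a minimal Pod spec that mounts both standard paths.
# Names, emptyDir volumes, and the GPU limit are illustrative assumptions.
pod_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: eval-example
spec:
  restartPolicy: Never
  containers:
  - name: eval
    image: $1
    resources:
      limits:
        nvidia.com/gpu: 1
    volumeMounts:
    - name: results-volume
      mountPath: /app/results
    - name: parsed-volume
      mountPath: /app/parsed
  volumes:
  - name: results-volume
    emptyDir: {}
  - name: parsed-volume
    emptyDir: {}
EOF
}

pod_manifest ghcr.io/thakicloud/evalchemy-linux:latest
```

The output can be piped to kubectl apply -f - on a cluster that has the NVIDIA device plugin installed. Note that emptyDir contents are deleted with the Pod, so results must be copied out before cleanup.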

Troubleshooting

Common Issues

Exec format error

  • Cause: The image was built for a different CPU architecture than the host running it (for example, an arm64 image run on an amd64 node)
  • Solution: Build Linux images with Buildx and --platform linux/amd64
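
An architecture mismatch can be confirmed before running the image. The sketch below compares the host architecture with the one recorded in a locally available image; the function names are hypothetical, and a mismatch does not always fail when emulation is installed on the host.

```shell
#!/bin/sh
# docker_arch MACHINE: map `uname -m` output to Docker's architecture name.
# Unrecognised values are passed through unchanged.
docker_arch() {
  case $1 in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *)             echo "$1" ;;
  esac
}

# check_arch IMAGE: compare the architecture of a local image with the host's.
# Returns 0 on match, 1 on mismatch, 2 when the image cannot be inspected.
check_arch() {
  host=$(docker_arch "$(uname -m)")
  image=$(docker image inspect --format '{{.Architecture}}' "$1") || return 2
  if [ "$host" = "$image" ]; then
    echo "ok: $1 is $image, matching the host"
  else
    echo "mismatch: $1 is $image but the host is $host" >&2
    return 1
  fi
}
```

For example, check_arch ghcr.io/thakicloud/evalchemy-linux:latest on the target node reports whether a rebuild with --platform is needed.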

Windows line endings

  • Cause: CRLF line endings in scripts
  • Solution: Ensure scripts use LF, or strip carriage returns with GNU sed: sed -i 's/\r$//' /app/*.sh
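
Since sed -i behaves differently on GNU and BSD (macOS) systems, a portable variant can be useful when fixing scripts on the host before building. This is a minimal sketch; the function names has_crlf and strip_cr are hypothetical.

```shell
#!/bin/sh
# A literal carriage-return character, usable in grep and sed patterns.
cr=$(printf '\r')

# has_crlf FILE: succeed when FILE contains at least one carriage return.
has_crlf() {
  grep -q "$cr" "$1"
}

# strip_cr FILE: remove a trailing carriage return from every line of FILE.
# The file is rewritten in place through a temporary copy so that its
# permissions (including the executable bit) are preserved.
strip_cr() {
  tmp=$(mktemp) || return 1
  sed "s/${cr}\$//" "$1" > "$tmp" && cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return $status
}
```

A loop such as for f in scripts/*.sh; do has_crlf "$f" && strip_cr "$f"; done applies the fix only where needed; the scripts/ path here is a placeholder for wherever the shell scripts live in the repository.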

Permission denied

  • Cause: Scripts not executable
  • Solution: Run the container as root to diagnose (for example, with docker run --user root), then run chmod +x /app/*.sh
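
Missing executable bits can also be caught on the host before the image is built, which avoids debugging as root inside the container. The helper below is a sketch; its name is hypothetical and the directory to scan is whatever holds the scripts copied into /app.

```shell
#!/bin/sh
# non_executable_scripts DIR: list *.sh files under DIR whose owner
# does not have execute permission.
non_executable_scripts() {
  find "$1" -type f -name '*.sh' ! -perm -u+x
}
```

Running non_executable_scripts . from the repository root prints the files that need chmod +x; an empty result means every script already has the bit set.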

Network issues

  • Cause: Container networking problems
  • Solution: Use --network host for local testing or verify connectivity

Build Automation

CI/CD Integration

These build commands are integrated into the CI/CD pipeline for:

  • Automatic builds on code changes
  • Multi-platform image generation
  • Container registry deployment
  • Security scanning and validation

Version Management

Images are tagged with:

  • latest for current stable builds
  • release-{version} for specific releases
  • Platform-specific suffixes (-mac, -linux)
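
One way to produce a release tag from an existing latest image is sketched below. It assumes the release tag is applied to the same platform-suffixed repository (for example evalchemy-linux:release-1.2.0); that combination, the function name release_cmds, and the version 1.2.0 are assumptions, not something this guide specifies. The script prints the commands without running them.

```shell
#!/bin/sh
REGISTRY="ghcr.io/thakicloud"

# release_cmds NAME PLATFORM VERSION: print the commands that re-tag the
# :latest image of one platform build as release-VERSION and push it.
release_cmds() {
  repo="$REGISTRY/$1-$2"
  printf 'docker tag %s:latest %s:release-%s\n' "$repo" "$repo" "$3"
  printf 'docker push %s:release-%s\n' "$repo" "$3"
}

release_cmds evalchemy linux 1.2.0
```

Re-tagging an already built image, instead of rebuilding, keeps the release tag pointing at exactly the image that was tested as latest.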